sort in scalar context

schwern on 2006-02-04T02:47:59

I'm going to crack this endless argument open once again because I was bitten by it yet again yesterday and I see myself getting bitten again and again in the future. I wrote "return sort ..." and I've been programming Perl for over 10 years. I grepped our code base at work and found 34 instances of 'return sort', each of them subtle bug waiting to strike. Try grepping your own code for it. The Perl core has 24 instances (9 are in the debugger alone).

For those who don't know, the behavior of sort in scalar context is undefined. I believe its because folks have never agreed on what it should do. The arguments fall into three camps:

1) sort should return the number of items in the list.

For: Then it will work just like map and grep which sort looks and behaves very simlar to. Against: If all you want to know is the length of the list, why sort at all? For: Because this is very common: return sort @foo; Against: You can't assume that because a function returns a list in list context that it will return the length of the list in scalar context. For: Its a pretty safe assumption. return map, grep or @array all work that way. Its extremely common in real world code and a solid default behavior.

Against: There's lots of things which return lists in list context but not their length in scalar context. For: The argument that "X should be inconsistent because Y and Z are inconsistent" is dubious at best, particularly in Perl where there's plenty of inconsistencies to dredge up. But let's look at those inconsistencies, are they really all that inconsistent? Those which don't return their length can be split into several categories: Things which always return a fixed size list (each, getpw*, gm/localtime, select, stat, times, caller); iterators (glob, readline, readdir); list behavior (as opposed to array behavior); other (unpack, qx). For things which return fixed sized lists, its not useful for it to return the length and unlikely to surprise anyone that they do something different. Iterators are a well known pattern. The behavior of lists has always been odd and full of traps for the unwary, so using it as an argument is dubious. As for the handful of outriders, they are just that. Of the things which do return their length in scalar context there is a clear pattern that "foo BLOCK/REGEX, LIST" (map, grep, split) returns length. sort is the exception.

For: What if I'm the subroutine author and want my function to return a sorted list in list context but a length in scalar? I have to write either "my @sorted = sort @foo; return @foo;" or use the archane "force list context" trick of "() = sort @foo;" Neither is natural.

For: People write "return sort" all the time without thinking. Its in the Perl core. Its all over CPAN. Close that trap!

For: Its a trivial patch.

2) sort should do insert escoteric functionality here.

For: sort can return a ratio of already sorted adjacent pairs! (or whatever) Against: That sounds about as useful as keys() returning the bucket ratio.

3) Leave it undefined.

For: We leave it undefined in case a clear solution presents itself in the future. Against: IT IS THE FUTURE! How much longer are you going to wait?

Against: People write "return sort" all the time. By leaving this undefined a large trap has been created.

For: Unlike map or grep, "scalar sort @foo" will return the exact same value as "scalar @foo". So why use sort in scalar context at all? Against: "return sort @foo".



That's how I see it. Feel free to add your own arguments.


Well, duh ...

Ovid on 2006-02-04T09:05:39

Sort in scalar context should return bunnies. Then everybody would be happy. Bunnies are fluffy and cute. If you don't think that, you'll find that they're awfully tasty. If you're a vegetarian, they conveniently return a ratio of already sorted pairs. You can't lose!

Re:Well, duh ...

educated_foo on 2006-03-08T06:19:21

This is scalar context, so you mean "bunny", right?

Perl::Critic

Dom2 on 2006-02-04T11:28:05

In the meanwhile, it sounds like a good thing to add to Perl::Critic...

-Dom

Re:Perl::Critic

phaylon on 2006-02-04T14:10:53

I'd second that, as it is what I was going to suggest. I see return sort .. in a same way as return undef just because one doesn't want something to be returned. It's ambiguous and in my opinion the developer should be aware of that. Would be a lot easier with a Perl-Critic policy, though I don't know how spread that already is "in the wild."

Re:

Aristotle on 2006-02-04T11:55:01

It’s interesting that neither argument against returning the list length presents a use case that would be precluded; both just argue on principles. And the leave-it-undefined stance takes the same basic position.

The magnitude of sorted pair ratios != 1.0 is meaningless, so that’s not even as useful as keys in scalar context.

We have exactly one real-world use case, return sort @foo, so there is exactly one meritorious argument, and it has no compelling counter. Why is anyone even arguing about this?

the "obvious" unexciting thing

rjbs on 2006-02-04T18:01:54

I've always thought the obvious behavior would be to return the first element. It doesn't gain much (saves you typing two parens, maybe), but it's just what I always thought made sense...

Re:the "obvious" unexciting thing

rjbs on 2006-02-04T18:06:18

I also always want to call the first element returned from sort the "most sorted element." I just like the sound of it.

Re:the "obvious" unexciting thing

bart on 2006-02-04T19:18:52

The first (min)... or the last (max)?

Both would have their uses.

Re:the "obvious" unexciting thing

rjbs on 2006-02-04T20:05:22

The first. If you want the last, you'd reverse $a and $b in your sort.

Re:the "obvious" unexciting thing

BooK on 2006-02-06T19:15:05

Except that it's bound to be used as a shortcut for min() or max() (I've seen people write (sort @list)[-1]), which is rather inefficient compared to using List::Util (now in the core) or rolling you own min() or max().

Code examples and benchmarks available there, in French (sorry).

Re:the "obvious" unexciting thing

schwern on 2006-02-08T09:34:55

Here's a short mental gauntlet for any proposal to run though:

If sort() should do X, why not map and grep?

Why is X useful as an implicit behavior?

Why not use a user defined function (in this case List::Util's max/min)? (bonus: no argument about if it should return the first or last element)

Will it help in the common "return sort ..." gotcha?

Can you think of common cases where you want to return the entire list in list context but only do X in scalar?

__

Abigail on 2006-02-04T22:57:33

First of all, I use 'return sort ...' in my code on a regular bases. I don't consider this a bug. If I write a function that's supposed to return a (sorted) list, and someone calls the function in scalar context, then it's the calling code that's buggy, not the subroutine. If I wanted the function to behave sensible when used in scalar context, I'd make it so. But I doubt that it would then return the number of elements of the list. I'm with rjbs on this one - the first (or last) element would be more useful.

Now, what really would be cool is the return an iterator, returning the elements of the list in sorted order. The function wouldn't be called again until the list was exhausted. Example:

my @list = qw [foo bar baz];
sub sortlist {
    print "Hello, world\n";
    return sort @list;
}
for (1 .. 5) {
    print scalar sortlist, "\n";
}
__END__
Hello, world
bar
baz
foo
Hello, world
bar
baz

Re:__

schwern on 2006-02-08T09:28:24


If I wanted the function to behave sensible when used in scalar context, I'd make it so.


The other way to look at that is "I want my functions to behave insensibly in scalar context unless I remember to make it so" which strikes me as a little peevish and definately not DWIM.

That aside, here's something more concrete--how do you do that? Take this sample for example, which is much like the code I recently stumbled over...

        return sort { $a cmp $b }
                      map { normalize_date($_) }
                      keys %days_processed;

How do you rewrite that so it returns the number of elements in scalar context without making it A) convoluted, B) repeat code or C) inefficient?


Now, what really would be cool is the return an iterator, returning the elements of the list in sorted order.


That would be cool (though I'd make the iterator more explicit, your example reminds me too much of each()) except it would work completely unlike every other Perl 5 built-in, particularly the other list processing functions map and grep.

I believe Perl 6 is going the iterator route.

Re:__

Abigail on 2006-02-08T22:12:12

The other way to look at that is "I want my functions to behave insensibly in scalar context unless I remember to make it so" which strikes me as a little peevish and definately not DWIM.
That sounds like for all functions that I design to return a list, I should take the trouble of figuring out what it they should do in list context.

Sorry, but I pass. For functions where it makes sense, I do so. But for many, I won't. Consider the context to be a pre-condition - just like the arguments being passed in.

That aside, here's something more concrete--how do you do that? Take this sample for example, which is much like the code I recently stumbled over...
        return sort { $a cmp $b }
                      map { normalize_date($_) }
                      keys %days_processed;
Assuming that normalize_date returns a single value, I'd write that as:
    return wantarray ? sort {$a cmp $b}
                       map  {normalize_date $_}
                       keys %days_processed
                     : keys %days_processed;

Re:

Aristotle on 2006-02-08T18:15:31

That would require of Perl to do extra magic when it encounters return sort, and it’s non-reentrant. I vote no. If you want an iterator, then an iterator object/closure should be returned. Let’s not create more instances of the each trap.

boolean test

jmm on 2006-02-06T16:05:35

My vote for the meaning of sort in scalar context is to return whether the list was already sorted.

unless( sort @list ) {
    # @list is not sorted, deal with it...
    ...
}

If you want to go beyond a simple boolean, the index of the first out of order element sounds a lot more useful than the proprtion of sorted neighbours - and, like the boolean result, it can short circuit the scan of the list as soon as an out of order pair is found.

Re:boolean test

n1vux on 2006-02-06T19:47:20

to return whether the list was already sorted

An is_sorted @list predicate would be an interesting builtin/XS/Inline function, and should run faster than sort (if coded equally well). It should run about the same speed as a least @list sort-variant optimized to return (sort @_)[0].

However, I'm not sure is_sorted should be called scalar sort, that doesn't seem any better than calling least as scalar sort. Changing sort from verb-transitive to verb-predicate when going from list to scalar context is rather drastic. Someone who expected to get the size of the list -- not knowning the list returned by the function that code containing return sort soemthingelse @_ is returned by sort -- might foolishly trust the 0 and 1 as list size. In merlyn's context of return sort @somelist, I suggest the right answer is either what "everyone" expects (list ops return size in scalar context unless something else is monstrously obvious, however monstrously wrong that assumption is) is the only valid option other than the current undef-and-warn-if-warnings, which is highly defensible, it at least announces it's broken (unless -w is off, in which case the caller is borked anyway).

Debugger okay

pemungkah on 2006-02-06T20:52:41

All of the return sort's in the debugger are called in list context - Term::Readline's completion function in most cases, and a similar bit of code in the options handlling. It's probably not worth patching it.

Re:

Aristotle on 2006-02-08T18:24:12

Three times a year, a new person is caught by surprise by sort in scalar context. The rest of the time, noone cares or notices. Why not spare these three guys the surprise and just give it a reasonable default? Noone will notice if it’s patched.