It's sort of like...

pmichaud on 2008-12-12T17:34:36

Earlier in the week, fw wrote a journal post about his First Perl6 program. I was quite excited to see this, but then read a comment from educated_foo that pointed out that the Perl 5 version is actually shorter to write.

That didn't sit too well with me, because we really ought to make common things simpler, not harder. What really bugged me (about Perl 6, not about fw's post) was the following for sorting a hash by values:

for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair {
    say $pair
}


Sorting hashes by value is a common operation, and although I can shorten the above a little bit

.say for %words.sort: { $^b.value <=> $^a.value };


it's still a bit long for my taste. That { $^b.value <=> $^a.value } just bugs me.

Then Moritz Lenz made a comment on #perl6 that perhaps sort should do something different with arity-1 blocks, and I then had an epiphany that led to the following pattern for sorting hashes by value:

%hash.sort: { .value }


I like this so much, I've gone ahead and implemented it in Rakudo, even though it's not officially part of the spec. (I'm hoping it'll be adopted as such.) So now in Rakudo we have the following:

> my %hash = { c=>5, b=>7, a=>-4, e=>9, d=>0 };
> .say for %hash.sort: { .value }
a       -4
d       0
c       5
b       7
e       9

> .say for %hash.sort: { .key } a -4 b 7 c 5 d 0 e 9


That seems much nicer. The general principal is that if the comparison argument to "sort" takes less than two arguments, then it's used to generate the values to be compared instead of the result of a comparison.

Of course, we aren't limited to just keys or values -- any operation we want to perform on the items being sorted will work. To sort by the magnitude of the values of the hash:

> .say for %hash.sort: { .value.abs }
d       0
a       -4
c       5
b       7
e       9


And of course this generalizes to more than just hashes; if @dogs contains a list of Dog objects we want to sort:
@dogs.sort: { .name }      # sort dogs by name
@dogs.sort: { .age }       # sort dogs by age


This works because the ".name" and ".age" methods are invoked on each of the objects in the list to determine the value to use for that object in the sort.

Or for a simplistic case-insensitive sort:

> my @a = ;
> .say for @a.sort: { .lc }
Apple
apricot
BaNaNa
berry
CHERRY
danish
Fruit


Besides clarity, another big advantage of

@a.sort: { .lc }


over

@a.sort: { $^a.lc leg $^b.lc }


is that in the first version we compute .lc on each element only once for the entire sort, whereas in the second version it's computed once for each comparison. And since we're typically doing O(n^2) comparisons, the first version can save a lot of method or function calls.

So, that was my fun for the morning. Implementing this behavior turned out to be really easy -- in fact, I wrote the algorithm first as Perl 6 before translating it into PIR:

multi method sort(@values: &by) {
    ...
    if &by.arity < 2 {
        my @v     = @values.map(&by);
        my @slice = (0..^@v).sort: { @v[$^a] cmp @v[$^b] };
        return @values[ @slice ];
    }
    ...
}


The code just uses &by to compute the values (@v) to be used in the sort, does a sort on the indexes based on a comparison of those values, and uses the resulting sorted index list to return the (sorted) slice of the original list. Note that the items in the result list are themselves unchanged from the original list -- they simply have a new sequence according to the &by criteria.

Since then PerlJam++ has suggested that we should do similar things for min and max:

my $longest_string = @strings.max( { .chars } );


This would use .chars as the criteria for determining the longest string but returns the longest string itself.

Perl 6 is very cool.

Pm

P.S.: By the way, this means that the solution to the problem in fw's original post becomes:

my %words;
$*IN.slurp.comb.map: { %words{$_}++ };
.say for reverse %words.sort: { .values };


Not all of the above works in Rakudo yet (I think .comb is not yet implemented), but at least it's getting closer to what we'd really like to see here.


schwartzian transforms now a builtin

Eric Wilhelm on 2008-12-12T19:16:25

in the first version we compute .lc on each element only once for the entire sort, whereas in the second version it's computed once for each comparison.

Nice.

or this?

markjugg on 2008-12-12T21:24:19

Maybe this doesn't make sense but some way to stick with method calls appeals to me.

Instead of:


@dogs.sort: { .name }

Something like:


@dogs.sort_by.name

But even as already implemented, I agree it's an improvement and a nice language feature. Thanks, Patrick!

Re:or this?

pmichaud on 2008-12-12T22:15:44

Part of the difference here is that .sort_by (or .sort) is a method call on the list as a whole, and any method call after that really ought to be treated as a method call on the resulting sorted list. For example,

@list.sort.reverse
@list.sort.map { ... }

Also, something like @list.sort_by.name gets the invocant in the wrong place, because in order for this to work the invocants to .name should be the individual elements of @list.

So, I think the information on how to sort (e.g., .name) really deserves to be an argument to the sort method, as opposed to a method itself.

I did think of using the method name .sort_by as a way of distinguishing the "sort by comparison function" from the "sort by value" behavior, but I think/hope the dwimminess of using the block to determine that is sufficient. It's much nicer to always think "sort" than to have to wonder or remember the answer to "do I need '.sort' here or '.sort_by'?"

Of course, Larry and the folks on perl6-language get the strongest vote, and I'll be glad to implement whatever API they finally decide on.

Thanks!

Pm

Nice!

davegaramond on 2008-12-12T23:32:24

Nice. Ever since I learnt Ruby I'm kind of envious that it has a .sort_by method whereas in Perl we have to do fleshed-out Schwartzian transform. This will make me feel at home again in Perl :-)

Aha! It's already in the spec!

pmichaud on 2008-12-13T15:38:38

Turns out that this particular form of sort already exists in the Perl 6 spec -- it was just in an odd place. In Synopis 3, we have:

The advantage of the method form is that it can be used in places that
require tighter precedence than C<~~> provides:
 
    sort { $^a.M <=> $^b.M }, @files
 
though that's a silly example since you could just write:
 
    sort { .M }, @files

So, it's just Synopsis 29 that needs to be updated, and as of this morning Moritz++ has made that change. So this is indeed an official part of the language, and we can go on remarking about how cool Perl 6 is.

Pm

A few problems...

theorbtwo on 2008-12-16T09:01:48

1: Why do you test if the arity

2: Do you really want to compare with cmp always? Your .say for %hash.sort: { .value.abs } example will no longer work if %hash has entries that are more then one digit. (Or did cmp change meaning between 5 and 6 to magically DWIM re numeric vs stringy compares -- I haven't been paying all that much attention for a long while now.)

Re:A few problems...

theorbtwo on 2008-12-18T18:13:29

Sorry for the rather fragmented first issue. It should say something along the lines of "why do you test if the airity is less then 2, rather then testing for it being exactly 1?"