Surprising filename sorting in Mac Finder

ChrisDolan on 2006-12-11T18:20:51

Today I was briefly confused by the Mac Finder. It sorted a collection of files in a surprising order.

screenshot

It appears that if Finder sees a number in a filename, it tokenizes that number and sorts it numerically before progressing to sorting subsequent alpha characters. This leads to unexpected sorting of files named hexadecimally.

Purely out of curiousity, I whipped up an implementation of this sort algorithm. Is there an easier way to do this?

use Test::More tests => 1;
my @items = ;
for (@items) { s/ //g; } # remove use.perl.org spaces
is_deeply([sort macsort @items], \@items);
sub macsort {
   my @a = split /(\d+)/, $a;
   my @b = split /(\d+)/, $b;
   while (@a && @b) {
      my $a_token = shift @a;
      my $b_token = shift @b;
      my $cmp = $a_token =~ /\A\d+\z/ && $b_token =~ /\A\d+\z/
              ? $a_token <=> $b_token
              : $a_token cmp $b_token;
      return $cmp if $cmp;
   }
   return 1 if @a;
   return -1 if @b;
   return 0;
}
__DATA__
0e1f07df-20a8-4b82-a3b3-21a25e6a7bd5.MP
0ee4b3b0-e6d1-4fca-8d27-a64d32b66e49.MP
0f63535c-76e0-407a-9a79-31f86c3c3ea5.MP
4e5fe1ad-00d4-435c-b18c-582e4c942f31.MP
9e4188f3-002a-47ff-bf8f-366c353c2f75.MP
9f510eb5-ce64-4222-aba3-59d6e426120a.MP
56fc7d70-2ed2-421f-adb1-da7b8805f001.MP
941c843b-f4ff-45aa-bb94-f34b4a61ace7.MP
7650cd13-f3f8-42c2-9f33-d8e7c410e7bd.MP
074908bd-895a-48d5-83ae-077872ba644d.MP
102044d9-a9e6-4454-9182-ed579b01897b.MP
a3ac0ca2-18ae-4b02-badd-1533e94d40f1.MP
aaae707d-e01e-4454-a856-3efba6588da4.MP
ab7025d9-b535-40bb-8807-8733ddcb17d0.MP
b357c383-7469-48b1-9c17-0fd2973b7896.MP
e3da5d7d-911d-4ba5-9654-9d579e5e1846.MP



I think they call this a "natural sort"

bart on 2006-12-12T21:01:54

See Sort::Naturally.

Your idea to use

split /(\d+)/
was IMHO an inspired one, so I cannot improve on that idea. Do note that the numeric string sections will always be at the list items with odd index (1, 3, 5, ...) and the string parts at even index (0, 2, 4, ...).

Here's an alternative implementation, IMO somewhat streamlined, and using an OR-cache (AKA an "Orcish Manoeuvre") to store the split array:

my %orcache;
sub macsort {
   my $A = $orcache{$a} ||= [ split /(\d+)/, $a ];
   my $B = $orcache{$b} ||= [ split /(\d+)/, $b ];
   my $cmp;
   for(my $i = 0; $i < @$A && $i < @$B; $i++) {
      $cmp = $i & 1
              ? $A->[$i] <=> $B->[$i]
              : $A->[$i] cmp $B->[$i]
        and return $cmp;
   }
   return @$A cmp @$B;
}
BTW you forgot to chomp the data, which may produce unexpected results when sorting the items, as there's a newline appended to every last array item.

Re:I think they call this a "natural sort"

ChrisDolan on 2006-12-13T01:37:21

Nice implementation. I think the last line is better written as:

return @$A <=> @$B;
but since one of them must be zero, the results are equivalent.

Is i & 1 preferable to i % 1? I consider the latter to be slightly more intuitive.

BTW you forgot to chomp the data

Yeah, I skipped that just for simplicity of the example It doesn't affect the test case.

Re:I think they call this a "natural sort"

bart on 2006-12-13T17:57:31

I think the last line is better written as:

return @$A <=> @$B;
Uh, of course, that's what I planned to write. I had been messing with the implementation showing debugging information during the comparison, and this must be a leftover. That's my story and I'm sticking to it. ;-)

Is i & 1 preferable to i % 1? I consider the latter to be slightly more intuitive.
At least it gives the correct results. :-) Surely you ment to type $i % 2.

A recent thread on Perlmonks by diotalevi shows that nowadays, these instructions take the same time. It used to make a difference, in my favour, 20 years ago — before the numerical coprocessors, that all CPUs have embedded, nowadays.

BTW you forgot to chomp the data
I skipped that just for simplicity of the example It doesn't affect the test case.
But it did have strange effects when I added a test case, where not all strings split into an equal number of parts, and the part that some do have in common is the same. Or so I thought.

Re:I think they call this a "natural sort"

ChrisDolan on 2006-12-13T18:27:24

Is i & 1 preferable to i % 1? I consider the latter to be slightly more intuitive.
At least it gives the correct results. :-) Surely you ment to type $i % 2.
Uh, of course, that's what I planned to write. That's my story and I'm sticking to it. :-P