Smart match beats the hell out of grep() and first()

brian_d_foy on 2007-12-22T19:09:00

Alberto Simões posted an interesting benchmark showing that ~~, the new smart match operator in Perl 5.10, outperforms both grep and List::Util::first() when used to answer "is X in this list?"

Here are my results:

$ perl5.10.0 bench.plx
Benchmark: timing 100000 iterations of first, grep BLOCK, grep EXPR, ~~...

     first: 23 wallclock secs (22.78 usr +  0.03 sys = 22.81 CPU) @ 4384.04/s (n=100000)
grep BLOCK: 18 wallclock secs (18.48 usr +  0.03 sys = 18.51 CPU) @ 5402.49/s (n=100000)
 grep EXPR: 17 wallclock secs (17.56 usr +  0.03 sys = 17.59 CPU) @ 5685.05/s (n=100000)
        ~~: 10 wallclock secs ( 9.97 usr +  0.01 sys =  9.98 CPU) @ 10020.04/s (n=100000)

I believe it illustrates the Perl optimization rule of thumb: the more work you let the opcodes do, the better. ~~ doesn't have to execute a Perl expression over and over again for each element; it can do the comparisons in optimized C.
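For readers new to ~~, here is a minimal sketch of the three membership tests being compared. The helper name membership() and the test strings are hypothetical, chosen only for illustration. It assumes perl 5.10 or later; newer perls flag ~~ as experimental or deprecated, so those warnings are silenced locally.

```perl
use strict;
use warnings;
use 5.010;                      # for say; ~~ also needs 5.10 or later
use List::Util qw(first);

# Hypothetical helper: answer "is $needle in @$list?" three ways.
sub membership {
    my ($needle, $list) = @_;

    # grep runs the Perl block for every element and returns the match count
    my $by_grep = grep { $_ eq $needle } @$list;

    # first runs the block until the first match and returns that element (or undef)
    my $by_first = first { $_ eq $needle } @$list;

    # ~~ walks the array inside the op; string needles like these are compared with eq
    my $by_smart = do {
        no warnings;            # newer perls warn that ~~ is experimental/deprecated
        $needle ~~ $list;
    };

    return map { $_ ? 'yes' : 'no' } ($by_grep, defined $by_first, $by_smart);
}

# Same kind of data as the benchmark: a random character plus 1..1000
my @array = map { chr(64 + int(rand(26))) . $_ } 1 .. 1000;

my @present = membership($array[500], \@array);
my @absent  = membership('Z9999', \@array);

say "present: @present";        # present: yes yes yes
say "absent:  @absent";         # absent:  no no no
```

All three give the same answer; the difference the benchmark measures is how much of the per-element work happens in Perl code versus inside the operator.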

Here's the benchmark code:

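use Benchmark;
use 5.010;                # require perl 5.10, which introduced ~~
use List::Util qw(first);

# 1,000 unique strings: a random character ('@'..'Y') followed by 1..1000
my @array = map { chr(64+int(rand(26)))."$_" } 1..1000;

# Each test builds a random needle the same way and searches @array for it.
timethese(100_000, {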
           'first'   => sub {
               my $needle = chr(64+int(rand(26))).int(rand(1000)+1);
               first { $_ eq $needle } @array;
           },
           'grep BLOCK'   => sub {
               my $needle = chr(64+int(rand(26))).int(rand(1000)+1);
               grep { $_ eq $needle } @array;
           },
           'grep EXPR'   => sub {
               my $needle = chr(64+int(rand(26))).int(rand(1000)+1);
               grep $_ eq $needle, @array;
           },
           '~~' => sub {
               my $needle = chr(64+int(rand(26))).int(rand(1000)+1);
               $needle ~~ @array;
           }
});


Caution Advised

Dom2 on 2007-12-22T21:30:19

I'd be wary of running each of those benchmarks in void context. Perl has been known to "optimise" such code.

Re:Caution Advised

Aristotle on 2007-12-23T03:35:31

It seems to me that if void context were having an effect, the rates would be much higher than roughly 5,000 iterations/sec. The benchmark could be more realistic, but it doesn’t seem fatally flawed as a rough rule-of-thumb measurement.

Caution waived

n1vux on 2007-12-23T07:21:23

Duplicating the tests with "say $NULL ( );" wrapped around each formerly void-context expression (output going to /dev/null, of course) and adding "use warnings;", I do see a "useless in void" warning for the first test, but using say instead of void context changes the times minimally, if at all.
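To take void context off the table entirely, here is a sketch of the same benchmark in which every test assigns its result to a variable, and Benchmark's cmpthese() prints a comparison chart instead of the raw timethese() output. The $sink variable and the 2_000 iteration count are illustrative choices, not part of the original code; rates will vary by machine and perl version.

```perl
use strict;
use warnings;
use 5.010;
use Benchmark qw(cmpthese);
use List::Util qw(first);

my @array = map { chr(64 + int(rand(26))) . $_ } 1 .. 1000;

# Hypothetical sink: every test stores its answer here, so nothing runs in void context.
my $sink;

# 2_000 iterations keeps a trial run short; the original used 100_000.
my $rows = cmpthese(2_000, {
    'first' => sub {
        my $needle = chr(64 + int(rand(26))) . int(rand(1000) + 1);
        $sink = first { $_ eq $needle } @array;
    },
    'grep BLOCK' => sub {
        my $needle = chr(64 + int(rand(26))) . int(rand(1000) + 1);
        $sink = grep { $_ eq $needle } @array;    # scalar context: match count
    },
    'grep EXPR' => sub {
        my $needle = chr(64 + int(rand(26))) . int(rand(1000) + 1);
        $sink = grep $_ eq $needle, @array;       # scalar context: match count
    },
    '~~' => sub {
        my $needle = chr(64 + int(rand(26))) . int(rand(1000) + 1);
        no warnings;    # silence experimental/deprecation warnings on newer perls
        $sink = $needle ~~ @array;
    },
});
```

cmpthese() also returns the chart as a reference to an array of rows (a header row plus one row per test), which is handy if you want to post-process the numbers.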