Perl 5.10.0, (not-so) giganto-times faster than Ruby 1.8.6

jjore on 2009-07-03T21:00:02

Earlier today I wanted to edit a lot of source code to add an Emacs hint about the tab width (a "tab-width: 8" line in each file's Local Variables block). I originally wrote the script in Perl. It works fine and takes about two seconds to edit 150MB of text across 4851 files.

time find . -type f -print0 | xargs -0 /opt/perl-5.10.0/bin/perl tab-width

real	0m1.966s
user	0m1.340s
sys	0m0.354s


Here's the Perl 5 version:

undef $/;    # slurp whole files

my $RX = qr/
    (?<start>
        ^(?<indent>.*)Local\ variables:.*\n
    )
    (?<variables>
        (?:
            \k<indent>.*\n
        )*
    )
    (?<end>
        \k<indent>End:
    )
/mix;

for my $fn ( @ARGV ) {
    next if $fn =~ m{/\.git};

    open my $fh, '+<', $fn or die "Can't open $fn: $!";
    binmode $fh or die $!;

    my $src = <$fh>;
    next unless $src =~ s{$RX}{$+{start}$+{variables}$+{indent}tab-width: 8\n$+{end}};

    # Rewrite the file in place with the edited text.
    seek $fh, 0, 0 or die $!;
    truncate $fh, 0 or die $!;
    print { $fh } $src or die $!;
    close $fh or die "Can't close $fn: $!";
}


I need to learn Ruby better, so I wrote a Ruby version of the same program. I don't know yet how long it takes to run because I started it a while ago and it is still going. While Ruby has been running, I got up, had some pizza and beer, and then wrote this blog post.

time find . -type f -print0 | xargs -0 /opt/ruby-1.8.6/bin/ruby tab-width.rb

real	31m19.763s
user	18m7.957s
sys	12m5.596s


Here's the Ruby version:

RX = Regexp.new(
    "
    (
        ^(.*)Local\\ variables:.*\\n
    )
    (
        (
            \\2.*\\n
        )*
    )
    (
        \\2End:
    )",
    Regexp::EXTENDED | Regexp::MULTILINE
)

ARGV.each do |fn|
    next if fn =~ %r{/\.git}

    open( fn, 'r+' ) do |fh|
        fh.binmode

        src = fh.read
        next unless src.sub!( RX ) {|m| "#{$1}#{$3}#{$2}tab-width: 8\n#{$5}" }

        fh.seek 0, IO::SEEK_SET
        fh.truncate 0
        fh.print src    # print, not puts: puts appends "\n" if src lacks one
    end
end


Update: It's been over half an hour now.
Update: Aha, done now.
Update: I got the .sub!() wrong. Fixed. Presumably I'm waiting another 30 minutes, so I'm going for a bike ride instead of waiting.
Update: I didn't go biking. I asked #ruby-lang for help. Using (?>) and (?!) to control the Ruby regexp engine just seems to trigger stack overflows.
Update: see everything below.

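So I talked to helpful people on #ruby-lang, including larsch, rpag, and Aria. It turns out one major mistake was misusing Regexp::MULTILINE. It's equivalent to Perl's /s, not Perl's /m. The docs don't actually talk about this, so I had to guess, and I guessed wrong. In Ruby, ^ and $ always match at line boundaries, so nothing like Perl's /m is needed. MULTILINE instead lets . match newlines, which presumably let each .* in my pattern run to the end of the file and then backtrack.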
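A few lines of Ruby make the difference visible. This is a minimal sketch written for a modern Ruby (1.9 or later), not code from the script, and the sample string is made up. It shows that ^ anchors at line starts with no flag, that MULTILINE (the /m suffix on a literal) only changes what . matches, and what that does to the ^(.*) indent capture the pattern relies on.

```ruby
text = "code\n;; Local variables:\n"

# Ruby's ^ matches at the start of every line with no flag at all;
# Perl would need /m for this.
line_start = text =~ /^;;/

# Without MULTILINE, '.' stops at "\n"...
plain = text[/code.*/]

# ...with Regexp::MULTILINE (written /m on a literal), '.' also matches
# "\n", which is what Perl calls /s.
dotall = text[Regexp.new('code.*', Regexp::MULTILINE)]

# The indent capture from the tab-width pattern: without the flag it is
# just the comment prefix; with it, (.*) starts at the top of the string
# and swallows the earlier line too.
indent_plain = text[/^(.*)Local/, 1]
indent_multi = text[/^(.*)Local/m, 1]

p line_start    # prints 5
p plain         # prints "code"
p dotall        # prints "code\n;; Local variables:\n"
p indent_plain  # prints ";; "
p indent_multi  # prints "code\n;; "
```

With the flag on, the captured indent is wrong, so the later backreferences to it look for the wrong prefix. Every greedy .* also runs to the end of the input before backing off.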

Now Ruby 1.8.6 is only 3x slower than Perl, and Ruby 1.9 is on par with Perl 5.10.0.
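For comparison, here is a sketch of the same edit with Regexp::MULTILINE dropped. It is not the exact code behind the numbers above, and it rests on a few assumptions. It needs Ruby 1.9 or later so it can use named groups like the Perl version. It adds the i flag to mirror Perl's /mix, because the Ruby version above has no IGNORECASE and would skip a capitalized "Local Variables:" line. TAB_WIDTH_RX and add_tab_width are hypothetical names.

```ruby
# Sketch (Ruby 1.9+): the tab-width pattern without Regexp::MULTILINE.
# ^ already anchors at line starts; 'i' mirrors the /i in Perl's /mix.
TAB_WIDTH_RX = /
  (?<start>
    ^(?<indent>.*)Local\ variables:.*\n
  )
  (?<variables>
    (?:\k<indent>.*\n)*
  )
  (?<end>
    \k<indent>End:
  )
/xi

# Hypothetical helper: returns the edited text, or nil when there is no
# Local Variables block (the same convention as String#sub!).
def add_tab_width(src)
  src.dup.sub!(TAB_WIDTH_RX) do
    m = Regexp.last_match
    "#{m[:start]}#{m[:variables]}#{m[:indent]}tab-width: 8\n#{m[:end]}"
  end
end

sample = "code\n# Local Variables:\n# mode: perl\n# End:\n"
puts add_tab_width(sample)
```

Returning nil from the helper mirrors String#sub!, so a caller can skip rewriting files that need no change, as both scripts above do with next.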