You may have seen the article "Can Java technology beat Perl on its home turf with pattern matching in large files?", which has been debated on both #perl and comp.lang.perl.misc today.
One of the biggest criticisms of the article was that the author hasn't published the Perl code that he is comparing his Java with.
I emailed the author (found his email address through a Google search) and pointed out the unfairness of this comparison. Within half an hour I got a reply from him, including the Perl code.
So here it is. Feel free to optimise it.
#!/home/hoffie/bin/perl
@sunIPs=("192\\.9\\.","192\\.18\\.","192\\.29\\.");
@fileext=("\\.gif","\\.jpg","\\.css","\\.GIF","\\.JPG","\\.CSS");
$filename="$ARGV[0]";
open(IN,$filename) || die "cannot open $ARGV[0] for reading: $!";
open(OUT,">$filename.out") || die "cannot open $filename.out for writing: $!";
LINE: while(<IN>) {
    foreach $fileext (@fileext) {
        next LINE if ($_ =~ /$fileext HTTP/);
    }
    foreach $sunIP (@sunIPs) {
        next LINE if ($_ =~ /^$sunIP/);
    }
    print OUT;
}
Yeah, it's almost always possible to beat bad Perl written by people who don't understand that regexes need to be compiled.
foreach $fileext (@fileext) {
    next LINE if ($_ =~ /$fileext HTTP/);
}
foreach $sunIP (@sunIPs) {
    next LINE if ($_ =~ /^$sunIP/);
}
Simple first pass at it, using qr//:
#!/home/hoffie/bin/perl
@sunIPs=("192\\.9\\.","192\\.18\\.","192\\.29\\.");
@fileext=("\\.gif","\\.jpg","\\.css","\\.GIF","\\.JPG","\\.CSS");
$filename="$ARGV[0]";
open(IN,$filename) || die "cannot open $ARGV[0] for reading: $!";
open(OUT,">$filename.out") || die "cannot open $filename.out for writing: $!";
# compile once
$fileext = join '|', @fileext;
$fileext = qr/(?:$fileext) HTTP/;
$sunIPs = join '|', @sunIPs;
$sunIPs = qr/^(?:$sunIPs)/;
LINE: while(<IN>) {
    next LINE if /$fileext/;
    next LINE if /$sunIPs/;
print OUT;
}
This remains untested, but I'd bet that's faster!
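Since the version above is untested, here is a minimal sketch of how one might check that the single compiled patterns skip the same lines as the original nested loops. The sample log lines and the sub names keep_loop and keep_compiled are invented for illustration; they do not come from the article or its author.

```perl
use strict;
use warnings;

my @sunIPs  = ("192\\.9\\.", "192\\.18\\.", "192\\.29\\.");
my @fileext = ("\\.gif", "\\.jpg", "\\.css", "\\.GIF", "\\.JPG", "\\.CSS");

# Original approach: interpolate each pattern for every line.
sub keep_loop {
    my ($line) = @_;
    foreach my $ext (@fileext) { return 0 if $line =~ /$ext HTTP/; }
    foreach my $ip  (@sunIPs)  { return 0 if $line =~ /^$ip/; }
    return 1;
}

# qr// approach: build one alternation per list and compile it once.
my $ext_re = join '|', @fileext;
$ext_re = qr/(?:$ext_re) HTTP/;
my $ip_re = join '|', @sunIPs;
$ip_re = qr/^(?:$ip_re)/;

sub keep_compiled {
    my ($line) = @_;
    return 0 if $line =~ $ext_re;
    return 0 if $line =~ $ip_re;
    return 1;
}

# Hypothetical access-log lines (made up for this check).
my @sample = (
    '10.0.0.1 - - "GET /index.html HTTP/1.0" 200',
    '10.0.0.1 - - "GET /logo.gif HTTP/1.0" 200',
    '192.18.4.2 - - "GET /index.html HTTP/1.0" 200',
    '10.192.9.1 - - "GET /style.CSS HTTP/1.0" 200',
    '10.192.9.1 - - "GET /docs/ HTTP/1.0" 200',
);

# Report whether both approaches agree on each line.
for my $line (@sample) {
    my $old = keep_loop($line);
    my $new = keep_compiled($line);
    printf "%s keep=%d %s\n", ($old == $new ? "same" : "DIFF"), $new, $line;
}
```

Any line reported as DIFF would mean the rewrite changed the filter's behaviour, not just its speed.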
Not sure about the execution speed of the regexes, but it's a damn sight easier to read.
my $filename = shift;
open(IN,$filename) || die "cannot open $filename for reading: $!";
open(OUT,">$filename.out") || die "cannot open $filename.out for writing: $!";
while ( <IN> ) {
    next if /\.(gif|jpg|css|GIF|JPG|CSS) HTTP/;
    next if /192\.(9|18|29)\./;
print OUT;
}
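To put numbers on the "not sure about the execution speed" remark, one could time the variants with the core Benchmark module. What follows is only a rough harness under stated assumptions: the sample log line, the repetition count, and the sub names are invented, and results will differ by Perl version and machine, so no expected timings are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @fileext = ("\\.gif", "\\.jpg", "\\.css", "\\.GIF", "\\.JPG", "\\.CSS");
my @sunIPs  = ("192\\.9\\.", "192\\.18\\.", "192\\.29\\.");

# Hypothetical input: one invented log line repeated many times.
my @lines = ('10.0.0.1 - - "GET /index.html HTTP/1.0" 200') x 10_000;

# Count kept lines, interpolating each pattern for every line.
sub count_loop {
    my $kept = 0;
    LINE: for my $line (@lines) {
        for my $ext (@fileext) { next LINE if $line =~ /$ext HTTP/; }
        for my $ip  (@sunIPs)  { next LINE if $line =~ /^$ip/; }
        $kept++;
    }
    return $kept;
}

# Count kept lines with the two literal regexes from the comment above.
sub count_literal {
    my $kept = 0;
    for my $line (@lines) {
        next if $line =~ /\.(gif|jpg|css|GIF|JPG|CSS) HTTP/;
        next if $line =~ /192\.(9|18|29)\./;
        $kept++;
    }
    return $kept;
}

# Run each variant for at least one CPU second and print a comparison table.
sub run_benchmark {
    cmpthese(-1, { loop => \&count_loop, literal => \&count_literal });
}
```

Calling run_benchmark() prints the comparison table. A real test would use an actual log file instead of one repeated line, since the share of lines that match affects how much work each variant does.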
Re:This is trivial
koschei on 2002-09-16T15:35:28
Tests? Should be /^192 and it should go faster if you use /o on the regexen.
But, yeah, much easier to read, much faster to write, and much better.
Hmm. I think I should ask pudge to make <tt> text a different colour.
Re:This is trivial
petdance on 2002-09-16T15:44:06
"it should go faster if you use /o on the regexen."
No it won't. The /o only applies to regexes that are based on variables, as in:
my $pattern = "192\.(whatever)";
if ( $foo =~ /$pattern/o )
That's the ONLY time that /o applies.
Re:This is trivial
pdcawley on 2002-09-16T15:45:56
Without wishing to wave the golf stick, may I commend
#!perl -pi.out
$_ = '' if /(?:\.(?:gif|jpg|css|GIF|JPG|CSS)[ ]HTTP |
              192\.(?:9|18|29)\.)/x
to the house?
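petdance's point about /o, a couple of comments up, can be seen in a small sketch. With /o, a pattern built from a variable is interpolated and compiled the first time the match runs, and later changes to the variable are ignored. Without /o, the current value is used each time. The sub names and sample strings here are invented for illustration.

```perl
use strict;
use warnings;

# With /o: the pattern is fixed the first time this match executes.
sub match_frozen {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

# Without /o: the current value of $pattern is used on every call.
sub match_fresh {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

# The first call fixes the /o pattern as ^192\.
print "frozen, first call:  ", match_frozen("192.9.1.1", '^192\.'), "\n";
# A different pattern is passed, but the match still uses ^192\.
print "frozen, second call: ", match_frozen("192.9.1.1", '^10\.'), "\n";
# Here the pattern passed in really is the one tested.
print "fresh:               ", match_fresh("192.9.1.1", '^10\.'), "\n";
```

For a regex with no variables in it, such as the literal patterns in the readable version above, there is nothing to re-interpolate, which is why /o makes no difference there.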
What underutilized lackey has enough time to worry about turning a program that runs in 283 seconds BUT TAKES 5 MINUTES TO WRITE into a program that runs in 137 seconds BUT TAKES 15-30 minutes to write? If the program could be rewritten so that it runs in under 10 seconds (my attention span), THEN the extra effort *might* be worth it. This program is likely to be run from a batch job, so a hyoo-mon isn't likely to be at the terminal waiting for it to finish.
Java cuts into my beer-drinking time.