Yes, it's been yonks since I posted any Perl. Well, today I learned that read() can take an offset saying where to put the data in your $buf, so I can implement what should be an efficient grep for multiple strings in binary data (i.e. where I can't do: while (<$fh>) ). So given $fh and @strings and max() from List::Util, I can do this:
my $max_len = max(map{length} @strings);
my $regexp = "(" . join("|", map {quotemeta} @strings) . ")";
my $buf = '';
while (1) {
    substr($buf, 0, length($buf) - $max_len) = "";
    my $len = read($fh, $buf, 8096, length($buf));
    last unless $len;
    if ($buf =~ /$regexp/o) {
        return $1;
    }
}
I could probably add code to show where in the file it matched, but I don't need that.
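For anyone who does want the match position, here is a minimal sketch of one way to track it. The sub name grep_binary, the $chunk parameter and the $base counter are made up for illustration and are not part of the code above; it also uses qr// rather than /o, and only trims the buffer once it is longer than $max_len.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: returns (matched string, byte offset in the stream),
# or an empty list when none of the strings occurs.
sub grep_binary {
    my ($fh, $strings, $chunk) = @_;
    $chunk ||= 8192;
    my $max_len = max(map { length } @$strings);
    my $alt     = join '|', map { quotemeta } @$strings;
    my $rx      = qr/($alt)/;
    my $buf     = '';
    my $base    = 0;    # stream offset of the first byte held in $buf

    while (1) {
        # Keep the last $max_len bytes so a match that straddles two
        # reads is still seen; remember how many bytes were dropped.
        my $drop = length($buf) - $max_len;
        if ($drop > 0) {
            substr($buf, 0, $drop) = '';
            $base += $drop;
        }
        # The fourth argument appends the new data at the end of $buf.
        my $len = read($fh, $buf, $chunk, length($buf));
        die "read failed: $!" unless defined $len;
        last unless $len;
        # $-[1] is the start of capture group 1 within $buf.
        return ($1, $base + $-[1]) if $buf =~ $rx;
    }
    return;
}

# Try it on an in-memory filehandle with a tiny chunk size, so that
# "needle" is split across two reads.
my $data = ("\0" x 10) . "needle" . ("\xff" x 10);
open my $fh, '<', \$data or die "open failed: $!";
binmode $fh;
my ($hit, $pos) = grep_binary($fh, ['haystack', 'needle'], 4);
print "$hit at $pos\n";    # prints "needle at 10"
```

The offset is counted in bytes from wherever the handle was positioned when the sub was called.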
Re:Use array storage instead?
Matts on 2006-06-01T01:51:30
Think again:
      s/iter chris  matt
chris   4.61    --  -50%
matt    2.32   99%    --
Re:/o is *evil*
Matts on 2006-06-01T14:52:36
True. Laziness was in the way. Changed to qr() now.
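For readers wondering what the fuss over /o is about, here is a small sketch of the trap. The sub names match_o and match_qr are invented for the demonstration. With /o the pattern is compiled the first time that match is executed and never again, so later changes to the interpolated variable are silently ignored; qr// compiles whenever you ask it to.

```perl
use strict;
use warnings;

# Pattern interpolated with /o: frozen after the first call.
sub match_o {
    my ($text, $pat) = @_;
    return $text =~ /$pat/o ? 1 : 0;
}

# Pattern compiled with qr// on every call: always current.
sub match_qr {
    my ($text, $pat) = @_;
    my $rx = qr/$pat/;
    return $text =~ $rx ? 1 : 0;
}

print match_o("foo", "foo"), "\n";     # 1
print match_o("bar", "bar"), "\n";     # 0, still matching against "foo"
print match_qr("foo", "foo"), "\n";    # 1
print match_qr("bar", "bar"), "\n";    # 1
```

In the grep loop this only bites if the code is ever called a second time with a different @strings, which is exactly the kind of bug that is hard to spot later.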
It’s easy to arrange:
my $max_len = max map length, @strings;
my $rx = qr/(@{[ join '|', map quotemeta, @strings ]})/;
my $buf = '';
while ( read $fh, $buf, 8096, length $buf ) {
return $1 if $buf =~ $rx;
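# parens matter: length $buf - $max_len parses as length( $buf - $max_len )
substr( $buf, 0, length($buf) - $max_len, '' ) if length($buf) > $max_len;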
}
As a bonus the code is shorter and clearer.
The following is a tweak to avoid unnecessarily shrinking $buf when the space is needed for the read that immediately follows, and it may or may not be faster by a few percent. I didn’t benchmark (or even test) it.
my $max_len = max map length, @strings;
my $rx = qr/(@{[ join '|', map quotemeta, @strings ]})/;
my $buf = '';
my $offs = 0;
while ( my $read = read $fh, $buf, 8096, $offs ) {
substr( $buf, $offs + $read ) = ''; # throw away residue
return $1 if $buf =~ $rx;
substr( $buf, 0, $max_len ) = substr $buf, -$max_len;
$offs = $max_len;
}
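The tweak leans on how read() treats its OFFSET argument, so here is a quick sketch of that behaviour using an in-memory filehandle (the variable names are just for the demo). As perlfunc describes it, the scalar is grown or shrunk so that the last byte read becomes the last byte of the scalar, and an offset past the end of the scalar gets padded with "\0" bytes.

```perl
use strict;
use warnings;

my $data = 'abcdefgh';
open my $fh, '<', \$data or die "open failed: $!";

# Offset inside an existing buffer: the data lands at position 2
# and everything after the bytes just read is cut off.
my $buf = 'XXXXXXXXXX';              # 10 bytes
my $n   = read($fh, $buf, 3, 2);
print "$n $buf\n";                   # prints "3 XXabc"

# Offset beyond the end of the buffer: the gap is filled with "\0".
my $short = 'Z';
read($fh, $short, 2, 3);
print length($short), "\n";          # prints "5"
print $short eq "Z\0\0de" ? "padded\n" : "not padded\n";   # prints "padded"
```

If that holds, the "throw away residue" line in the tweak looks redundant, since read() has already truncated $buf at $offs + $read. The padding case is the one to watch: if a read ever returns fewer than $max_len bytes into an empty buffer, setting $offs to $max_len would put "\0" bytes into the search window.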