Stripping certain requests from log files

paulm on 2004-06-09T00:32:02

I have a service that performs a HEAD request every few minutes on a few sites I run. The problem is that it inflates my page hit count. This is one of those problems that is too annoying to fix properly, so I was pleased that the following one-liner (line-broken for a shot at clarity) came out surprisingly easily:

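for i in www.paulm.com-access.log.*gz; do
  gunzip "$i"; f=`echo "$i" | perl -pe 's/\.gz$//'`;   # name of the uncompressed file
  perl -ni -e 'print unless /^ftp\.itransact/' "$f";  # drop the monitor's lines in place
  gzip "$f" & echo "$f";                              # recompress in the background
done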



zgrep might be handy too

grantm on 2004-06-09T19:10:29

zgrep -v '^ftp\.itransact'

Of course once you add in a shell loop to iterate over the files and re-gzip them it starts to look like your original code :-)
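For what it's worth, a rough sketch of that loop is below. It assumes a zgrep and gzip on the PATH. The function name strip_lines and the temp-file naming are made up for illustration and are not from the original post. It writes the filtered output to a temporary file and then moves it over the original, so file ownership and timestamps are not preserved.

```shell
# strip_lines PATTERN FILE...   (hypothetical helper, not from the thread)
# Rewrites each gzipped FILE without the lines that match PATTERN.
strip_lines() {
    pattern=$1
    shift
    for f in "$@"; do
        # Skip anything that is not a readable gzip archive.
        gzip -t "$f" 2>/dev/null || { echo "skipping $f" >&2; continue; }
        tmp="$f.tmp.$$"
        # zgrep -v prints the lines that do NOT match; recompress them
        # into a temporary file, then swap it in for the original.
        zgrep -v "$pattern" "$f" | gzip > "$tmp" && mv "$tmp" "$f"
        echo "$f"
    done
}
```

It would be called along the lines of: strip_lines '^ftp\.itransact' www.paulm.com-access.log.*gz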

Re:zgrep might be handy too

paulm on 2004-06-11T15:19:58

Yeah, true. zgrep's pretty handy for scanning but not so much use for in-place mods, so far as I've seen. I'm sure it's possible with some of bash's more esoteric (for me!) constructs... But why bother when perl's there, heh. It's funny watching these little hacks have perl appear in there somewhere (f=`perl ...`), then as more functionality is needed, more of it gets shoe-horned into the perl invocation until *pop* it's all in perl and it gets re-written "properly":
#!/bin/sh
exec perl "$0.pl" "$@"
:-)