So today I rolled out an RPM to 2500 linux servers that had a post install script that tried to HUP my spam scanner. It had this section of code:
my $post = <<"EOT"; ps afxww |grep my_proc_name|grep -v grep|tail --lines=+2|awk '{print$1}'|xargs kill -HUP EOTYou have to be pretty awake to spot the problem - the $1 in the awk script is interpolated in the string, so the awk script ends up being "awk '{print}'". This is very bad indeed. 2500 linux servers had to be rebooted. I owe a lot of beers.
Bummer...
Random thoughts, which may or may not be helpful:
ps afxww |grep 'm[y]_proc_name'|tail --lines=+2|awk '{print$1}'|xargs kill -HUP
killall -HUP 'm[y]_proc_name'
And don't try this at home
runrig on 2004-07-06T22:12:15
I once thought I could get around the "grep", "grep -v", AND the "tail" by doing:But (maybe depending on your awk) awk blows up when there's more than 99 fields, and there were often 'ps' lines which had very long command lines. So it was good to filter before piping to awk (or maybe just do the whole thing in perlps... | awk 'NR > 1 && /[m]y_proc_name/{print $1}' :). Re:Ouch...
josephgrossberg on 2004-07-07T15:46:46
Even better, change:
ps afxww |grep 'm[y]_proc_name'|tail --lines=+2|awk '{print$1}'|xargs kill -HUP
to
ps afxww |grep 'm[y]_proc_name'|tail --lines=+2|awk '{print$1}'|less
the first time you run it, and make sure those are the results you expected!
Re:another perfectly good reason
Matts on 2004-07-07T10:04:53
While that's valid, it's not strictly relevant here as the RPM has to do this, and the programmers have to write the RPM generation code.
The real issue is it not going through QA (and this happened because the code to build this RPM has been in the QA queue for 6 months and keeps getting overlooked in favour of more important projects).
It also shows that developers need sleep. I'm very tired right now and not sleeping well and am making stupid decisions because of this.Re:another perfectly good reason
jmason on 2004-07-07T17:47:54
ouchy ouch.
(1) killall is definitely better
(2) a good argument for using a pidfile and "kill existing daemon" code into the daemon
(3) or an rc script;)
(4) and definitely QARe:another perfectly good reason
Matts on 2004-07-07T17:58:10
killall didn't work because it was a perl process and so I was grepping for the script path. Otherwise I would have used it, yes.Re:another perfectly good reason
vek on 2004-07-07T17:20:11
Because, you know, there have never been any sysadmins who've made a mistake while they're working as root:-)