Really stupid programmer error

Matts on 2004-07-06T21:16:18

So today I rolled out an RPM to 2500 linux servers that had a post install script that tried to HUP my spam scanner. It had this section of code:

    my $post = <<"EOT";
ps afxww |grep my_proc_name|grep -v grep|tail --lines=+2|awk '{print$1}'|xargs kill -HUP
EOT
You have to be pretty awake to spot the problem - the $1 in the awk script is interpolated in the string, so the awk script ends up being "awk '{print}'". This is very bad indeed. 2500 linux servers had to be rebooted. I owe a lot of beers.


Ouch...

babbage on 2004-07-06T21:36:06

Bummer... :-)

Random thoughts, which may or may not be helpful:

  • Skip the "grep -v grep" step:

    ps afxww |grep 'm[y]_proc_name'|tail --lines=+2|awk '{print$1}'|xargs kill -HUP

  • Replace the whole line with killall (if available):

    killall -HUP 'm[y]_proc_name'

  • Don't wrap commands in doublequotes :-)

And don't try this at home

runrig on 2004-07-06T22:12:15

I once thought I could get around the "grep", "grep -v", AND the "tail" by doing:
ps ... | awk 'NR > 1 && /[m]y_proc_name/{print $1}'
But (maybe depending on your awk) awk blows up when there's more than 99 fields, and there were often 'ps' lines which had very long command lines. So it was good to filter before piping to awk (or maybe just do the whole thing in perl :).

Re:Ouch...

josephgrossberg on 2004-07-07T15:46:46

Even better, change:

ps afxww |grep 'm[y]_proc_name'|tail --lines=+2|awk '{print$1}'|xargs kill -HUP

to

ps afxww |grep 'm[y]_proc_name'|tail --lines=+2|awk '{print$1}'|less

the first time you run it, and make sure those are the results you expected!

another perfectly good reason

hfb on 2004-07-07T06:52:26

that programmers should never be given root...not /ever/. I hope those sysadmins guilt you out of enough beer that you begin to feel queasy the next time you even think about root. :)

Re:another perfectly good reason

Matts on 2004-07-07T10:04:53

While that's valid, it's not strictly relevant here as the RPM has to do this, and the programmers have to write the RPM generation code.

The real issue is it not going through QA (and this happened because the code to build this RPM has been in the QA queue for 6 months and keeps getting overlooked in favour of more important projects).

It also shows that developers need sleep. I'm very tired right now and not sleeping well and am making stupid decisions because of this.

Re:another perfectly good reason

jmason on 2004-07-07T17:47:54

ouchy ouch.

(1) killall is definitely better
(2) a good argument for using a pidfile and "kill existing daemon" code into the daemon
(3) or an rc script ;)
(4) and definitely QA

Re:another perfectly good reason

Matts on 2004-07-07T17:58:10

killall didn't work because it was a perl process and so I was grepping for the script path. Otherwise I would have used it, yes.

Re:another perfectly good reason

vek on 2004-07-07T17:20:11

Because, you know, there have never been any sysadmins who've made a mistake while they're working as root :-)