Solid programs II

jplindstrom on 2003-01-11T21:59:00

Adequate programs log errors when things go wrong. Solid programs also notify the admin when things go wrong, and the programmer when insanity strikes.

No-one will actively monitor error logs just for the fun of it. Not for any length of time. So programs need to tell you when they put something in there.

And if the alert e-mails become a major hassle, the problem isn't the numerous e-mails, it's the broken program. Fix the problem (the broken program) to lose the hassle (the e-mail load).
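As a rough illustration of "tell you when they put something in there", here is a minimal Python sketch of a logging handler that forwards errors to a notifier while ordinary records only go to the log. The names `NotifyHandler`, `build_logger` and the `notify` callable are made up for this example; in real use `notify` would be whatever sends your e-mail or message.

```python
import logging
import sys


class NotifyHandler(logging.Handler):
    """Forward records at ERROR or above to a notification callback."""

    def __init__(self, notify):
        super().__init__(level=logging.ERROR)
        self.notify = notify  # hypothetical callable, e.g. a mail sender

    def emit(self, record):
        try:
            self.notify(self.format(record))
        except Exception:
            # A failing notifier must not take the program down with it.
            self.handleError(record)


def build_logger(name, notify):
    """Return a logger that logs everything and also pushes errors to `notify`."""
    fmt = logging.Formatter("%(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False

    log_handler = logging.StreamHandler(sys.stderr)  # stands in for the error log
    log_handler.setFormatter(fmt)
    logger.addHandler(log_handler)

    alert_handler = NotifyHandler(notify)
    alert_handler.setFormatter(fmt)
    logger.addHandler(alert_handler)
    return logger


if __name__ == "__main__":
    log = build_logger("payroll", print)  # print stands in for e-mail
    log.info("started")            # logged only, nobody is bothered
    log.error("database is gone")  # logged, and the admin hears about it
```

The point of the design is that the program's code only ever calls the logger; who gets told, and how, is decided in one place.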


Solid Programs delegate

dws on 2003-01-11T23:33:23

This reminds me of the dictum that all programs expand until they can handle email. It's a slippery-slope argument.

I'd be more inclined to use one of the various monitoring packages that can be configured to look for particular patterns in log files.

Check out Nagios for an Open Source monitoring package, and check out the screenshots.

Re:Solid Programs delegate

jplindstrom on 2003-01-12T00:20:16

I'd say that the concept of active notification is more important than the method.

Who sends the e-mail or ICQ-message (or inserts the row in the database or whatever) isn't that important.

One argument for keeping it outside the program (aside from keeping the program as simple as possible) is stability; by keeping the notification mechanism outside you don't risk that the program itself is so screwed up that it can't send e-mail.

But that may be more of an issue if you write in e.g. C++ or some other language which tends to crash and burn more severely than Perl does.

Re:Solid Programs delegate

koschei on 2003-01-12T01:28:47

I think the obvious thing to do is compartmentalise and keep points of failure minimised.

That is, have the program log somewhere, and have a program such as 'swatch' notify. This way, one could happily replace swatch with something more powerful/flexible, e.g. a POE system that will msg you on IRC, Jabber, or wherever, as well as by email, SMS, whatever.
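To make the "program logs, something else notifies" split concrete, here is a small Python sketch of a swatch-like pass over a log file. It is not how swatch itself works; `PATTERNS`, `scan` and `watch_once` are names invented for the example, and log rotation is ignored.

```python
import re

# Hypothetical patterns; real ones depend on what your program logs.
PATTERNS = [re.compile(r"\bERROR\b"), re.compile(r"\bFATAL\b")]


def scan(lines, patterns=PATTERNS):
    """Yield each line that matches at least one pattern."""
    for line in lines:
        if any(p.search(line) for p in patterns):
            yield line


def watch_once(path, offset, notify, patterns=PATTERNS):
    """Scan `path` from byte `offset`, call `notify` per hit, return the new offset."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Only consume complete lines; a half-written last line waits for the next pass.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    for hit in scan(text.splitlines(), patterns):
        notify(hit)
    return offset + end
```

Call `watch_once` from cron or a sleep loop, keeping the returned offset between passes. Swapping e-mail for IRC or SMS then only means passing a different `notify`.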

Of course, then you have the problem of "what if my watcher dies".

Re:Solid Programs delegate

chromatic on 2003-01-12T01:35:29

Of course, then you have the problem of "what if my watcher dies".

I think another one's called from the pool of available watchers. Someone from London.pm or NY.pm will have to confirm, though.

Re:Solid Programs delegate

jplindstrom on 2003-01-12T02:54:23

Preferably with some kind of authentication so Trojan Watchers can be avoided.

PGP anyone? :)

Re:Solid Programs delegate

jplindstrom on 2003-01-12T02:35:27

Of course, then you have the problem of "what if my watcher dies".

Hehe.

Who watches the watchmen? -- Quis custodiet ipsos custodes?

I wrote a monitoring script once which crashed spontaneously after a measly three or four months on some machines. So I created a watch-process for the monitoring process. I called the watcher-watcher process "Custos Custodum". It basically means (I've been told) "I watch the watcher".

Bonus-URL:
http://www.geocities.com/SoHo/Study/4273/watch.html
http://www.grovel.org.uk/reviews/watchm01/watchm01.htm

Re:Solid Programs delegate

koschei on 2003-01-12T03:10:01

I was running swatch under djb's supervise utils. It worked quite well =) And since supervise is 'watched' by init...

Re:Solid Programs delegate

ask on 2003-01-12T10:56:53

If I don't use supervise I usually use something like this script to keep the daemons going and going and going.

It also (like supervise) makes it easy to implement an option in the daemon to have it restart itself.
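The script linked above isn't reproduced here, so as a stand-in, this is a minimal Python sketch of the general idea: run the daemon, and start it again whenever it exits. `keep_alive` and its parameters are made up for the example; `max_restarts` exists mainly so the loop can be tried out without running forever.

```python
import subprocess
import sys
import time


def keep_alive(cmd, max_restarts=None, delay=1.0,
               run=subprocess.call, sleep=time.sleep):
    """Run `cmd`, restarting it each time it exits; return the exit codes seen."""
    codes = []
    restarts = 0
    while True:
        codes.append(run(cmd))  # blocks until the daemon exits
        if max_restarts is not None and restarts >= max_restarts:
            return codes
        restarts += 1
        sleep(delay)  # avoid a tight loop if the daemon dies instantly


if __name__ == "__main__":
    # A fake "daemon" that exits at once with status 3.
    fake_daemon = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(keep_alive(fake_daemon, max_restarts=2, delay=0.1))
```

With a wrapper like this, "restart yourself" inside the daemon can be as simple as exiting cleanly and letting the loop bring it back.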

    - ask