Nagios is a great monitoring tool. It has a long history and has learned a whole lot since it was created. It's being developed at a good pace but doesn't get ahead of itself (e.g., version 3 was only released recently). It is a (very) stable monitoring solution with tight integration with Perl (there's a built-in Perl interpreter) and with the Perl community (there are Perl modules on CPAN to help write Nagios plugins).
Now, given the previous introduction (Nagios deserves one at least three times as long), I had the problem of needing to monitor a massive number of servers using Nagios. Sitting down and writing the configuration files by hand was out of the question, so instead I wrote a system that inspects our database and creates the configuration files from it.
It works and has a lot of features: configuration file support, a Moose-based design, command-line argument handling, tests, etc.
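To make the "generate the configuration from the database" idea concrete, here is a minimal sketch. The real tool is Perl/Moose; this is Python purely for illustration, and the `servers` table, its columns and the `generic-host` template name are assumptions (the template comes from the Nagios sample configuration), not the schema or output of the original system.

```python
import sqlite3


def host_definitions(conn):
    """Yield one Nagios 'define host' block per row of a hypothetical 'servers' table."""
    rows = conn.execute("SELECT name, address FROM servers ORDER BY name")
    for name, address in rows:
        yield (
            "define host {\n"
            "    use        generic-host\n"  # assumed template from the sample config
            f"    host_name  {name}\n"
            f"    alias      {name}\n"
            f"    address    {address}\n"
            "}\n"
        )


if __name__ == "__main__":
    # Hypothetical inventory standing in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE servers (name TEXT, address TEXT)")
    conn.executemany(
        "INSERT INTO servers VALUES (?, ?)",
        [("web1", "10.0.0.1"), ("db1", "10.0.0.2")],
    )
    print("\n".join(host_definitions(conn)))
```

Generating one block per row keeps the output easy to diff between runs, which helps when the inventory changes often.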
However, one feature I had a difficult time adding was support for a type of dependency that Nagios (and other monitoring applications) isn't used to, and whose developers probably haven't even thought of.
Nagios uses NRPE to run tests on remote servers. It supports SSL and ACLs. However, because of firewall problems on some of the servers, I can't run tests on them through the NRPE service. I want to add a group called "has_nrpe" or "nrpe_working" or whatever, and then run a certain test only on servers that are in both that group and the group for that test. That is, Server1 will be tested for Test1 only if it's in the group "has_nrpe" AND in "needs_to_be_tested_for_test1".
Unfortunately, I'm unable to do that, so instead I'll have the code support configuration options that fetch the servers in both groups manually and list them individually as hosts for the test.
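The workaround can be sketched like this: compute the hosts that belong to every required group, then write them out individually in the service's `host_name` directive instead of using `hostgroup_name`. Again this is an illustrative Python sketch, not the author's Perl code; the host-to-groups mapping, the `generic-service` template and the `check_nrpe!check_test1` command are hypothetical placeholders.

```python
def hosts_in_all_groups(host_groups, required):
    """Return the sorted names of hosts that are members of every group in `required`."""
    required = set(required)
    return sorted(h for h, groups in host_groups.items() if required <= set(groups))


def service_definition(description, check_command, hosts):
    """Render a Nagios 'define service' block that lists the hosts one by one."""
    if not hosts:
        # An empty host_name would make the generated config invalid.
        raise ValueError(f"no hosts matched for service {description!r}")
    return (
        "define service {\n"
        "    use                  generic-service\n"  # assumed template name
        f"    host_name            {','.join(hosts)}\n"
        f"    service_description  {description}\n"
        f"    check_command        {check_command}\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical group memberships, as they might come out of the database.
    host_groups = {
        "server1": ["has_nrpe", "needs_to_be_tested_for_test1"],
        "server2": ["needs_to_be_tested_for_test1"],  # NRPE blocked by firewall
        "server3": ["has_nrpe"],
    }
    hosts = hosts_in_all_groups(host_groups, ["has_nrpe", "needs_to_be_tested_for_test1"])
    print(service_definition("Test1", "check_nrpe!check_test1", hosts))
```

Only server1 ends up in the generated service, because it is the only host in both groups. The downside of this approach is that the service has to be regenerated whenever group membership changes.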
If anyone could advise otherwise, I'd love to know better.
Re:check out opsview
xsawyerx on 2009-06-10T08:08:34
First of all, thanks for showing me this! I came across Opsview a long time ago but didn't give it much thought. It looks very nicely done.
Secondly, I don't know if you're the person to ask, but I noticed they have good support for inheritance and multiple inheritance, which is something I would rather avoid. Do they have a concept of flat groups, as in roles?
I assume they do. The biggest question is whether I can add tests that apply only to combined hostgroups.
Re:check out opsview
oliver on 2009-06-10T08:26:46
Sorry, I'm not an expert by any means. Your best bet is to try the Opsview users mailing list, or #opsview on freenode IRC (I think).