Lately I've been looking at open source network monitoring tools, and I'm not impressed at all by what I've seen so far. Pretty much the lowest common denominator for this type of tool is Nagios, which is not a bad tool (I used it a few years ago), but have you seen its Web interface? It's soooooo 1999 -- think 'Perl CGI scripts'!
A slew of other tools are based on the Nagios engine and try hard to be more pleasing to the eye; Opsview and GroundWork are two examples. Opsview seems to be just a wrapper around Nagios, without many improvements in either functionality or UI.
I looked at the GroundWork screencast and it seemed promising, but when I tried to install it I had a very unpleasant experience. First of all, the install script uses curses (have those guys heard of unattended installs?), and it requires Java 1.5. Although I had both Java 1.5 and 1.6 on my CentOS server, with JAVA_HOME set correctly, that didn't stop the installer from complaining and exiting. Good riddance.
I should say that the first open source network monitoring tool I tried was Zenoss, which is supposed to be the poster child for Python-based monitoring tools. Believe me, I tried hard to like it. I even went back and gave it a second chance after noticing that the other tools aren't any better. But to no avail: I couldn't shake the feeling that it's a half-baked tool, with poor documentation and an obscure user interface. It could work fine if you just want to monitor some devices with SNMP, but as soon as you try to extend it with your own plugins (called Zen Packs), or try to use their agents (called Zen Plugins), you run into a wall. At least I did. I got tired of Python tracebacks, obscure references to 'restarting Zope' (I thought it was based on Twisted), fiddling with values for the so-called zProperties of a device, trying unsuccessfully to get SSH key authentication to work with the Zen Plugins, etc., etc. I'm not the only one who went through these frustrations, either: plenty of other users in the Zenoss forums say they've had it and are going to look for something else. Which is what I did too.
I also tried OpenNMS, which was better than Zenoss, but its Web interface still had a CGI feel to it.
So... for now I've settled on Hyperic. It's a Java-based tool with a modern Web interface and very good documentation, and it's extensible via your own plugins (which you can write in any language you want, as long as you conform to some conventions that are not overly restrictive). Hyperic uses agents that you install on every server you need to monitor. I don't mind this; I find it better than configuring SNMP to death. It does have its quirks: for example, it calls the devices it monitors 'platforms' (instead of just 'devices' or 'servers'), and it calls the plugins that monitor specific services 'servers' (instead of 'services'). Once you get used to it, it's not that bad. However, I wish there were a standard nomenclature for this stuff, as well as a standard way for these tools to interoperate. As it is, you have to learn each tool and train your brain to ignore all the weirdness you encounter. Not an optimal scenario by any means.
I'm very curious to see what tools other people use. If you care to leave a comment about your monitoring tool of choice, please do so!
I'll report back with more stuff about my experiences with Hyperic.