Sunday, November 30, 2008

The sad state of open source monitoring tools

I've been looking lately at open source network monitoring tools, and I'm not impressed at all by what I've seen so far. Pretty much the lowest common denominator when it comes to this type of tool is Nagios, which is not a bad tool (I used it a few years ago), but have you seen its Web interface? It's soooooo 1999 -- think 'Perl CGI scripts'!

A slew of other tools are based on the Nagios engine and are trying hard to be more pleasing to the eye -- Opsview and GroundWork are two examples. Opsview seems to be just a wrapper around Nagios, without much improvement in either functionality or UI.

I looked at the GroundWork screencast and it seemed promising, but when I tried to install it I had a very unpleasant experience. First of all, the install script uses curses (have those guys heard of unattended installs?) and requires Java 1.5. Although I had both Java 1.5 and 1.6 on my CentOS server, and JAVA_HOME set correctly, that didn't stop the installer from complaining and exiting. Good riddance.

I should say that the first open source network monitoring tool that I tried was Zenoss, which is supposed to be the poster child for Python-based monitoring tools. Believe me, I tried hard to like it. I even went back and gave it a second chance, after noticing that other tools aren't any better. But to no avail -- I couldn't get past the sensation that it's a half-baked tool, with poor documentation and an obscure user interface. It could work fine if you just want to monitor some devices with SNMP, but as soon as you try to extend it with your own plugins (called ZenPacks), or if you try to use their agents (called Zen Plugins), you run into a wall. At least I did. I got tired of Python tracebacks, obscure references to 'restarting Zope' (I thought it was based on Twisted), fiddling with values for the so-called zProperties of a device, trying unsuccessfully to get ssh key authentication to work with the Zen Plugins, etc., etc. I'm not the only one who went through these frustrations either -- there are plenty of other users saying in the Zenoss forums that they've had it, and that they're going to look for something else. Which is what I did too.

I also tried OpenNMS, which was better than Zenoss, but its Web interface still had a CGI feel.

So...for now I've settled on Hyperic. It's a Java-based tool with a modern Web interface and very good documentation, and it's extensible via your own plugins (which you can write in any language you want, as long as you conform to some conventions which are not overly restrictive). Hyperic uses agents that you install on every server you need to monitor. I don't mind this; I find it better than configuring SNMP to death. It does have its quirks -- for example, it calls the devices that it monitors 'platforms' (instead of just 'devices' or 'servers'), and it calls the plugins that monitor specific services 'servers' (instead of 'services'). Once you get used to it, it's not that bad. However, I wish there were a standard nomenclature for this stuff, as well as a standard way for these tools to inter-operate. As it is, you have to learn each tool and train your brain to ignore all the weirdness that it encounters. Not an optimal scenario by any means.
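To give a rough idea of what 'write a plugin in any language' can look like, here is a minimal sketch of a standalone metrics script in Python. I'm assuming a convention where the script prints one name=value line per metric on stdout and the agent picks those up; that convention, the metric names and the function names below are all illustrative assumptions on my part, so check the Hyperic documentation for the actual plugin requirements.

```python
#!/usr/bin/env python
"""Hypothetical metrics script: prints one name=value line per metric.

The name=value output format and the metric names are assumptions made
for illustration; they are not taken from the Hyperic documentation.
"""
import os


def collect_metrics():
    """Return a dict mapping made-up metric names to numeric values."""
    # os.getloadavg() is available on Unix-like systems only.
    load1, load5, load15 = os.getloadavg()
    return {"load_1min": load1, "load_5min": load5, "load_15min": load15}


def format_metrics(metrics):
    """Render the metrics as name=value lines, sorted by metric name."""
    return "\n".join("%s=%s" % (name, metrics[name]) for name in sorted(metrics))


if __name__ == "__main__":
    print(format_metrics(collect_metrics()))
```

Keeping the collection and the formatting in separate functions makes it easy to test the output format without touching the system being monitored.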

I'm very curious to see what tools other people use. If you care to leave a comment about your monitoring tool of choice, please do so!

I'll report back with more stuff about my experiences with Hyperic.

Friday, November 21, 2008

Issues with Ubuntu 8.10 on Lenovo T61p laptop

I got a new Lenovo ThinkPad T61p, and of course I promptly installed Ubuntu 8.10 (Intrepid Ibex) on it. The first day I used it, I had no issues, but this morning it froze no fewer than 3 times, and each time the Caps Lock light flashed. I googled around, and I found what I hope is the solution in this post on the Ubuntu forums. It seems that this is the core issue:

System lock-ups with Intel 4965 wireless

The version of the iwlagn wireless driver for Intel 4965 wireless chipsets included in Linux kernel version 2.6.27 causes kernel panics when used with 802.11n or 802.11g networks. Users affected by this issue can install the linux-backports-modules-intrepid package, to install a newer version of this driver that corrects the bug. (Because the known fix requires a new version of the driver, it is not expected to be possible to include this fix in the main kernel package.)

As recommended, I did 'apt-get install linux-backports-modules-intrepid' and I rebooted. That was around 1 hour ago, and I haven't seen any issues since. Hopefully that was it. BTW, when the Caps Lock light blinks, it means 'kernel panic'. Who knew.
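If you want to check whether a machine matches the conditions described in that note (a 2.6.27 kernel with the iwlagn module loaded), something like the following sketch would do it. The function names are mine, and it only checks the kernel version and the loaded module; it can't tell whether the backported driver is already installed.

```python
import platform


def kernel_is_affected(release):
    """True if a `uname -r` style string names a 2.6.27 kernel."""
    version = release.split("-")[0]          # "2.6.27-7-generic" -> "2.6.27"
    return version.split(".")[:3] == ["2", "6", "27"]


def module_loaded(proc_modules_text, name="iwlagn"):
    """True if `name` is listed as a module in /proc/modules-style text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The module name is the first whitespace-separated field.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            modules = f.read()
    except IOError:
        modules = ""  # not on Linux, or /proc is not readable
    if kernel_is_affected(platform.release()) and module_loaded(modules):
        print("2.6.27 kernel with iwlagn loaded: consider linux-backports-modules-intrepid")
    else:
        print("This machine does not match the conditions in the note")
```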

Thursday, November 13, 2008

Python and MS Azure

You've probably heard by now of Microsoft's entry in the cloud computing race, dubbed Azure. What I didn't know until I saw it this morning on InfoQ was that Microsoft encourages the use of languages and tools other than their official ones. Here's what they say on the 'What is the Azure Service Platform' page:

"Windows Azure is an open platform that will support both Microsoft and non-Microsoft languages and environments. Windows Azure welcomes third party tools and languages such as Eclipse, Ruby, PHP, and Python."

While you and I may think MS says this just for marketing/PR purposes, it turns out they are walking the walk a bit. I was glad to see in the InfoQ article that a Microsoft guy wrote a Python wrapper on top of the Azure Data Storage APIs. Note that this is classic CPython, not IronPython. I assume more interesting stuff can be done with IronPython.

Wednesday, November 12, 2008

"phrase from nearest book" meme

Via Elliot:

  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

Here's mine, from 'Kim' by Kipling:

"A little later a marriage procession would strike into the Grand Trunk with music and shoutings, and a smell of marigold and jasmine stronger even than the reek of the dust."

Not bad, I like it :-)
