- Ideally open source, or if not, affordable per-host, per-month pricing (we already signed up as a paying customer of Boundary, for example)
- Installation and configuration should be easily scriptable
- server installation, as well as the addition/modification of clients, should be easily automated so it can be done with Puppet/Chef
- API would be ideal
- Robust notifications/alerting rules
- escalations
- service dependencies
- event handler scripts
- alerts based on subsets of hosts/services
- for example, alert me only when 2+ servers of the same type are down
- Out-of-the-box plugins
- database-specific checks for example
- Scalability
- the monitoring server shouldn't become a bottleneck as more clients are added
- Nagios is OK with 100-200 clients (with passive checks)
- hierarchy of servers should be supported
- agent-based clients
- Reporting/dashboards
- Hosts/services status dashboards
- Downtime/outages dashboards
- Latency (for HTTP checks)
- Resource graphing would be great
- but in my experience very few tools do both alerting and resource/metrics graphing well
- in the past I used Nagios for alerting and Ganglia/Graphite for graphing
- Integration with other tools
- Send events to graphing tools (Graphite), alerting tools (PagerDuty), notification mechanisms (IRC, Campfire), and logging tools (Graylog2)
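To make the "alert me only when 2+ servers of the same type are down" requirement concrete, here is a minimal Python sketch of that kind of aggregate rule. It is not taken from any of the tools listed; the `HostStatus` record, the `role` field used to group servers, and the `min_down` threshold are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HostStatus:
    name: str
    role: str  # hypothetical grouping, e.g. "web" or "db"
    up: bool


def roles_to_alert(statuses, min_down=2):
    """Return {role: [down host names]} for roles with at least `min_down` hosts down."""
    down_by_role = defaultdict(list)
    for s in statuses:
        if not s.up:
            down_by_role[s.role].append(s.name)
    # Only roles that cross the threshold trigger an alert;
    # a single failed host in a role is tolerated.
    return {
        role: sorted(names)
        for role, names in down_by_role.items()
        if len(names) >= min_down
    }


if __name__ == "__main__":
    hosts = [
        HostStatus("web1", "web", up=False),
        HostStatus("web2", "web", up=False),
        HostStatus("web3", "web", up=True),
        HostStatus("db1", "db", up=False),
        HostStatus("db2", "db", up=True),
    ]
    for role, names in roles_to_alert(hosts).items():
        print(f"ALERT: {len(names)} {role} servers down: {', '.join(names)}")
```

Keeping the threshold as a parameter lets a role with many redundant servers tolerate more failures than one with only two. In a real setup the host states would come from the monitoring system rather than a hard-coded list.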
- Sensu
- OpenNMS
- Icinga
- Zenoss
- Riemann
- Ganglia
- Datadog
Update: more tools mentioned in comments or on Twitter after I posted a link to this blog post:
- New Relic (which I am actually in the process of evaluating, having paid for 1 host)
- Circonus
- Zabbix
- Server Density
- Librato
- Comostas
- OpsView
- Shinken
- PRTG
- NetXMS
- Tracelytics
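On the integration side (sending events to Graphite), Graphite's carbon daemon accepts a simple plaintext protocol: one `<metric path> <value> <timestamp>` line per datapoint, by default on TCP port 2003. Below is a minimal Python sketch, assuming a hypothetical carbon host `graphite.example.com` and a made-up metric path; actually sending data requires a reachable carbon listener, so the send call is left commented out.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Format one datapoint as a Graphite plaintext line, newline-terminated."""
    if " " in path:
        raise ValueError("metric path must not contain spaces")
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.com", port=2003, timeout=5.0):
    """Send pre-formatted plaintext lines to a carbon listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    line = format_graphite_line("servers.web1.http.latency_ms", 123.4, timestamp=1700000000)
    print(line, end="")
    # send_to_graphite([line])  # enable once a real carbon host is configured
```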
 
 
19 comments:
In the same hosted space as Datadog I'll also put forward our service: www.serverdensity.com (in fact Datadog's agent is a fork of our open source agent).
Thanks, added Server Density to list of tools in my post.
We're using check_mk at work, and quite like both its setup and its look. It builds on top of Nagios, so I'm not sure if it counts for your list.
Rollout is especially easy on the clients.
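Part of what makes check_mk client rollout easy is its "local checks": executable scripts placed in the agent's local-checks directory, whose output lines the server discovers as services. The sketch below assumes the classic one-line format `<status> <service_name> <perfdata> <text>` (status 0-3, `-` when there is no performance data); check the check_mk documentation for your version before relying on it. The `Root_FS_Usage` service name and the 80%/90% thresholds are hypothetical.

```python
import shutil


def local_check_line(status, service, metrics, text):
    """Build one local-check line: '<status> <service> <perfdata> <text>'.

    status: 0=OK, 1=WARN, 2=CRIT, 3=UNKNOWN. `service` must not contain spaces.
    `metrics` maps name -> value, rendered as 'name=value' joined by '|',
    or '-' when there is no performance data.
    """
    if " " in service:
        raise ValueError("service name must not contain spaces")
    perfdata = "|".join(f"{k}={v}" for k, v in metrics.items()) or "-"
    return f"{status} {service} {perfdata} {text}"


def root_fs_check(warn_pct=80.0, crit_pct=90.0):
    """Hypothetical local check: root filesystem usage against fixed thresholds."""
    usage = shutil.disk_usage("/")
    used_pct = round(100.0 * usage.used / usage.total, 1)
    if used_pct >= crit_pct:
        status = 2
    elif used_pct >= warn_pct:
        status = 1
    else:
        status = 0
    return local_check_line(
        status, "Root_FS_Usage", {"used_pct": used_pct}, f"root filesystem {used_pct}% used"
    )


if __name__ == "__main__":
    print(root_fs_check())
```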
Hello there, you can try Comostas (comostas.com).
There are 2 versions of it, but both support most of the things you mentioned in your post.
The tool is quite new and hasn't been advertised anywhere yet, though it is already robust and used for monitoring by most of Israel's banks ;-)
Have you looked at OpsView?
http://www.opsview.com/
There is an Open Source base version with pay-for extensions.
Have you looked at Tracelytics? They have more of a cloud-based pricing model to scale.
Have a look at NetXMS.
Since you use Python, have a look at Shinken. It's a rewrite of Nagios in Python, with modern concepts (distributed, integrated Graphite, web-UI agnostic, etc.)
Since you're using Python extensively, have a look at Shinken, a Nagios rewrite in Python with modern technologies!
Good luck at the new job!
I work at a Teleport for B2B Satellite Communication, and we use a mix of the commercial product WhatsUp Gold for Monitoring and the Open Source project Cacti (www.cacti.net) for Graphing.
Personally, if the budget allows, I'd recommend PRTG by Paessler (www.paessler.com) - it can do both graphing and monitoring and works very well. If you need to analyze traffic flows, Scrutinizer by Plixer (www.plixer.com) is an awesome tool.
Thanks everybody for the recommendations, I updated my original post with all the tools mentioned in the comments so far.
Stick with Nagios. I still use it and it's great.
For system monitoring I am using check_mk with pnp4nagios for graphing and NagVis for visualising data.
When it comes to application monitoring, New Relic is a good choice.
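Setups like check_mk + pnp4nagios rely on the Nagios plugin convention: the exit code gives the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and anything after `|` in the output is performance data that graphing add-ons store. Here is a minimal Python sketch of a plugin following that convention; the load-average check and its 4.0/8.0 thresholds are hypothetical choices for illustration.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(label, value, warn, crit, uom=""):
    """Compare `value` to thresholds and return (exit_code, output_line).

    The output has the form 'TEXT | label=value[UOM];warn;crit'; the part
    after '|' is the performance data that graphing add-ons pick up.
    """
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    text = f"{LABELS[code]} - {label} is {value}{uom}"
    perfdata = f"{label}={value}{uom};{warn};{crit}"
    return code, f"{text} | {perfdata}"


def main():
    """Check the 1-minute load average; return the plugin exit code."""
    try:
        load1 = os.getloadavg()[0]  # not available on Windows
    except (AttributeError, OSError):
        print("UNKNOWN - load average not available on this platform")
        return UNKNOWN
    code, line = evaluate("load1", round(load1, 2), warn=4.0, crit=8.0)
    print(line)
    return code
```

Saved as a standalone plugin, the script would end with `sys.exit(main())` so that Nagios receives the exit code; that line is left out here so the functions can also be reused from other code.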
Nice posting
We started with Nagios, tried the check_mk extension over Nagios, and then were delighted to discover the next level of Nagios + check_mk, OMD.
OMD (Open Monitoring Distribution) seems to be from the author of check_mk and provides a system that is easy to install, easy to administer (web UI for many operations), and configured with Python scripts.
It uses all the Nagios plugins, plus has an additional way of allowing a monitoring agent to discover services on a computer and inform the server of the new services.
Grig, my earlier posting was incorrect in the expansion of OMD. It is the Open Monitoring Distribution. omdistro.org is the web site for the distribution. It is open source and works well in the tests I've run (monitoring 100+ machines from a 5-year-old laptop).
I have used several monitoring solutions for more than three years; from among them, I recommend IPHost Network Monitor.
So, what solution did you choose and why?