Tuesday, February 22, 2011

HAProxy monitoring with Nagios and Munin

HAProxy is one of the most widely used (if not THE most widely used) software load balancing solutions out there. I definitely recommend it if you're looking for a very solid and very fast piece of software for your load balancing needs. I blogged about it before, but here I want to describe ways to monitor it with Nagios (for alerting purposes) and Munin (for resource graphing purposes).

HAProxy Nagios plugin

Near the top of Google searches for 'haproxy nagios plugin' is this message to the haproxy mailing list from Jean-Christophe Toussaint, which contains links to a Nagios plugin he wrote for checking HAProxy. This plugin is what I ended up using. It's a Perl script that needs the Nagios::Plugin CPAN module installed. Once you have installed the module, drop check_haproxy.pl into your Nagios libexec directory, then configure Nagios to check the HAProxy stats with a command line similar to this:

/usr/local/nagios/libexec/check_haproxy.pl -u 'http://your.haproxy.server.ip:8000/haproxy;csv' -U hauser -P hapasswd
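Here is what the plugin has to send over the wire. This is a minimal Python sketch, not the plugin's own code. It makes a GET request for the stats URL with the `;csv` suffix. When `stats auth` is configured, it also adds an HTTP Basic `Authorization` header built from the user and password. The helper name `build_stats_request` is hypothetical. The URL and credentials are the placeholder values from the command line above.

```python
import base64
import urllib.request

def build_stats_request(url, user=None, password=None):
    """Build a GET request for the HAProxy CSV stats page.

    Hypothetical helper: it only prepares the request; pass the result to
    urllib.request.urlopen() to actually fetch the CSV text.
    """
    request = urllib.request.Request(url, method="GET")
    if user is not None and password is not None:
        # HTTP Basic auth: base64("user:password"), matching 'stats auth user:pass'.
        token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request
```

Fetching the stats would then be a matter of `urllib.request.urlopen(req, timeout=10).read().decode()` on the returned request.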

This assumes that you have HAProxy configured to output its statistics on port 8000. I have these lines in /etc/haproxy/haproxy.cfg:
# status page.
listen stats 0.0.0.0:8000
    mode http
    stats enable
    stats uri /haproxy
    stats realm HAProxy
    stats auth hauser:hapasswd

Note that the Nagios plugin actually requests the stats in CSV format (hence the ';csv' suffix on the stats URL). The output of the plugin is something like:

HAPROXY OK -  cluster1 (Active: 60/60) cluster2 (Active: 169/169) | t=0.131051s;2;10;0; sess_cluster1=0sessions;;;0;20000 sess_cluster2=78sessions;;;0;20000

It shows the active clusters in your HAProxy configuration (e.g. cluster2), together with the number of backends that are UP out of the total number of backends for that cluster (e.g. 169/169), and the number of active sessions for each cluster. If any backend is DOWN, the check returns a CRITICAL status and you'll get a Nagios alert.
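The UP/DOWN part of that logic can be approximated in a few lines. The Python sketch below is an illustration, not a port of check_haproxy.pl. It reads HAProxy's CSV stats and treats every row whose `svname` is not `FRONTEND` or `BACKEND` as a server. A server counts as active when its `status` starts with `UP`. If any server is not active, the function returns Nagios exit code 2 (CRITICAL). These rules are assumptions. In particular, servers reporting `no check` or `MAINT` would count as inactive here. Session counts and performance data are left out.

```python
import csv

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2

def check_haproxy_csv(csv_text):
    """Return (exit_code, message) summarizing HAProxy CSV stats.

    Assumptions: rows whose svname is FRONTEND or BACKEND are aggregates,
    every other row is a server, and a server is active if its status
    starts with "UP" (e.g. "UP" or "UP 1/3").
    """
    lines = csv_text.strip().splitlines()
    # HAProxy prefixes the header line with "# "; drop it so DictReader
    # sees plain column names such as "pxname" and "svname".
    lines[0] = lines[0].lstrip("# ")
    active, total = {}, {}
    for row in csv.DictReader(lines):
        proxy, server = row["pxname"], row["svname"]
        if server in ("FRONTEND", "BACKEND"):
            continue
        total[proxy] = total.get(proxy, 0) + 1
        if row["status"].startswith("UP"):
            active[proxy] = active.get(proxy, 0) + 1
    all_up = all(active.get(p, 0) == n for p, n in total.items())
    label = "OK" if all_up else "CRITICAL"
    summary = " ".join(f"{p} (Active: {active.get(p, 0)}/{n})" for p, n in total.items())
    return (OK if all_up else CRITICAL), f"HAPROXY {label} - {summary}"
```

A real check would print the message and exit with the returned code. That exit code is what Nagios uses to decide whether to alert.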

HAProxy Munin plugins

Another Google search, this time for HAProxy and Munin, reveals another message to the haproxy mailing list with links to 4 Munin plugins written by Bart van der Schans:

- haproxy_check_duration: monitors the duration of the health checks per server
- haproxy_errors: monitors the rate of 5xx response headers per backend
- haproxy_sessions: monitors the rate of (tcp) sessions per backend
- haproxy_volume: monitors the bps in and out per backend
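All four plugins speak Munin's plain-text plugin protocol. When called with `config`, a plugin describes its graph. When called with no argument, it prints one `field.value N` line per data series. The minimal Python sketch below produces that output for a sessions-style graph. It is not Bart van der Schans's code. It assumes the per-backend session totals are cumulative counters (HAProxy's `stot` column), hence `type DERIVE` with `min 0`. The helper names are hypothetical.

```python
import re

def field_name(backend):
    """Turn a backend name into a valid Munin field name.

    Munin field names must start with a letter or underscore and contain
    only letters, digits and underscores.
    """
    name = re.sub(r"[^A-Za-z0-9_]", "_", backend)
    return name if re.match(r"[A-Za-z_]", name) else "_" + name

def munin_config(backends):
    """Output for 'plugin config': graph metadata plus one DERIVE field per backend."""
    lines = [
        "graph_title HAProxy sessions per backend",
        "graph_vlabel sessions per ${graph_period}",
        "graph_category haproxy",
    ]
    for backend in backends:
        field = field_name(backend)
        lines += [f"{field}.label {backend}", f"{field}.type DERIVE", f"{field}.min 0"]
    return "\n".join(lines) + "\n"

def munin_values(counters):
    """Output for a normal run: one 'field.value N' line per backend counter."""
    return "".join(f"{field_name(b)}.value {n}\n" for b, n in counters.items())
```

A plugin built on this would print `munin_config(...)` when its first argument is `config`, and `munin_values(...)` otherwise.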

I downloaded the plugins, dropped them into /usr/share/munin/plugins, symlinked them into /etc/munin/plugins, and added this stanza to /etc/munin/plugin-conf.d/munin-node:

[haproxy*]
user haproxy
env.socket /var/lib/haproxy/stats.socket

However, note that for the plugins to work properly you need 2 things:

1) Configure HAProxy to use a socket that can be queried for stats. I did this by adding these lines to the global section in my haproxy.cfg file:

chroot /var/lib/haproxy
user haproxy
group haproxy
stats socket /var/lib/haproxy/stats.socket uid 1002 gid 1002

(where in my case 1002 is the uid of the haproxy user, and 1002 the gid of the haproxy group)

After running 'service haproxy reload', you can check that the stats socket works as expected with something like this (assuming you have socat installed):

echo 'show stat' | socat unix-connect:/var/lib/haproxy/stats.socket stdio

This should output the HAProxy stats in CSV format.
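If socat is not available, the same query can be made from Python. This is a minimal standard-library sketch. `show_stat` is a hypothetical helper name, and the socket path is the one configured above. It relies on the stats socket's default non-interactive mode. In that mode HAProxy answers a single command and then closes the connection, so the client simply reads until EOF.

```python
import socket

def show_stat(socket_path, timeout=5.0):
    """Send 'show stat' to an HAProxy stats socket and return the CSV reply.

    Sketch of what `echo 'show stat' | socat unix-connect:... stdio` does:
    connect, send one command, then read until HAProxy closes the connection.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(b"show stat\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

For example, `print(show_stat("/var/lib/haproxy/stats.socket"))` should print the same CSV as the socat command. Run it as a user that is allowed to open the socket.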

2) Edit the 4 plugins and change the 'exit 1' statements to 'exit 0' in the argument-handling block at the top of each plugin, so that it looks like this:

if ( $ARGV[0] eq "autoconf" ) {
    print_autoconf();
    exit 0;
} elsif ( $ARGV[0] eq "config" ) {
    print_config();
    exit 0;
} elsif ( $ARGV[0] eq "dump" ) {
    dump_stats();
    exit 0;
} else {
    print_values();
    exit 0;
}

If you don't do this, the plugins will exit with code 1 even in the case of success, and this will be interpreted by munin-node as an error. Consequently, you will scratch your head wondering why no haproxy-related links and graphs are showing up on your munin stats page.
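This failure mode is easy to reproduce: a plugin can print perfectly valid values and still be rejected because of its exit status. The Python sketch below runs a plugin command and reports success only for exit status 0, which is the behaviour described above for munin-node. The helper name `plugin_ok` is hypothetical.

```python
import subprocess

def plugin_ok(argv, timeout=30):
    """Run a plugin command and return (succeeded, stdout).

    Success means exit status 0, regardless of what was printed, mirroring
    how munin-node is described as judging a plugin run.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0, result.stdout
```

Run outside munin-node, a plugin does not pick up the `user haproxy` and `env.socket` settings from plugin-conf.d. Munin exports `env.socket` as an environment variable named `socket`. So run the plugin as the haproxy user with that variable set.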

Once you've done all this, run 'service munin-node reload' on the node running the HAProxy Munin plugins. Then check that the plugins work as expected by cd-ing into the /etc/munin/plugins directory and running each plugin through the 'munin-run' utility. For example:

# munin-run haproxy_sessions 
cluster2.value 146761052
cluster1.value 0
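These numbers are running totals: cluster2 has handled about 146.8 million sessions since its counters were last reset. The graph therefore shows a rate rather than the raw value. Assuming the plugin declares its fields as DERIVE or COUNTER, Munin divides the difference between two samples by the time between them. The cumulative values suggest this, though it is not confirmed here. The sketch below does the same arithmetic. The helper names are hypothetical, and the second sample is invented for illustration. It is taken 300 seconds later, which is Munin's default 5-minute polling interval.

```python
def parse_values(output):
    """Parse munin-run style output ('field.value N' lines) into a dict."""
    values = {}
    for line in output.splitlines():
        name, _, number = line.strip().partition(" ")
        if name.endswith(".value"):
            values[name[: -len(".value")]] = int(number)
    return values

def per_second(old, new, seconds):
    """Per-second rate between two samples taken `seconds` apart.

    Fields whose counter went down (for example after an HAProxy restart)
    are skipped, which is roughly what a DERIVE field with 'min 0' does.
    """
    return {
        field: (value - old[field]) / seconds
        for field, value in new.items()
        if field in old and value >= old[field]
    }
```

With the sample above and a hypothetical later reading of 146791052 for cluster2, the rate works out to 30000 sessions / 300 s = 100 sessions per second.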

That's it. These plugins make it fairly easy for you to get more peace of mind and a better night's sleep. Although it's well known that in #devops we don't sleep that much anyway...

6 comments:

Anonymous said...

how did you get the nagios script to work with authentication? I get:
Unknown option: U
Unknown option: P

It seems like this script doesn't address web authentication

Anonymous said...

you have to add some new options to the .pl file:


diff -r fd7ee65c064b check_haproxy.pl
--- a/check_haproxy.pl Thu Mar 11 11:25:18 2010 +0100
+++ b/check_haproxy.pl Thu Mar 11 13:07:33 2010 +0100
@@ -55,13 +55,23 @@
my $np = Nagios::Plugin->new(
version => $VERSION,
blurb => _gt('Plugin to check HAProxy stats url'),
- usage => "Usage: %s [ -v|--verbose ] -u [-t ] [ -c|--critical= ] [ -w|--warning= ]",
+ usage => "Usage: %s [ -v|--verbose ] -u [-t ] [-U ] [-P ] [ -c|--critical= ] [ -w|--warning= ]",
timeout => $TIMEOUT+1
);
$np->add_arg (
spec => 'debug|d',
help => _gt('Debug level'),
default => 0,
+);
+$np->add_arg (
+ spec => 'username|U=s',
+ help => _gt('Username for HTTP Auth'),
+ required => 0,
+);
+$np->add_arg (
+ spec => 'password|P=s',
+ help => _gt('Password for HTTP Auth'),
+ required => 0,
);
$np->add_arg (
spec => 'w=f',
@@ -86,6 +96,8 @@

$DEBUG = $np->opts->get('debug');
my $verbose = $np->opts->get('verbose');
+my $username = $np->opts->get('username');
+my $password = $np->opts->get('password');

# Thresholds :
# time
@@ -114,6 +126,10 @@

# Build and submit an http request :
my $request = HTTP::Request->new('GET', $url);
+# Authenticate if username and password are supplied
+if ( defined($username) && defined($password) ) {
+ $request->authorization_basic($username, $password);
+}
my $timer = time();
my $http_response = $ua->request( $request );
$timer = time()-$timer;
@@ -181,7 +197,7 @@
if ( !defined($stats{$values[0]}{$values[1]}) ) {
$stats{$values[0]}{$values[1]} = {};
}
- for ( my $x = 2,; $x < $#values; $x++ ) {
+ for ( my $x = 2,; $x <= $#values; $x++ ) {
# $stats{pxname}{svname}{valuename}
$stats{$values[0]}{$values[1]}{$fields[$x]} = $values[$x];
}

urba said...

Use the last version (with -U and -P):

http://cvs.orion.education.fr/viewvc/viewvc.cgi/nagios-plugins-perl/trunk/plugins/check_haproxy.pl?view=markup

Anonymous said...

I am getting the following error:

# ./check_haproxy.pl
Bareword "LC_MESSAGES" not allowed while "strict subs" in use at ./check_haproxy.pl line 51.
Execution of ./check_haproxy.pl aborted due to compilation errors.

Any thoughts?

Gagan said...

check_http -a ':' worked for me

Anonymous said...

Hi

I am getting the error below when trying to execute the command:

"Can't locate Nagios/Plugin.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at ./check_haproxy.pl line 33.
BEGIN failed--compilation aborted at ./check_haproxy.pl line 33"

can anybody help me?
