Friday, April 23, 2010

Deploying and monitoring mysql-proxy with supervisor

I've known about supervisor ever since Chris McDonough's PyCon 2008 talk. I've wanted to use it before, especially in conjunction with mysql-proxy, but my attempts at running it were never successful. That is, until I realized the simple but important mistake I was making: I was running mysql-proxy in its own --daemon mode, while at the same time expecting supervisor to control it. Well, that is not how supervisor works.

Supervisor handles processes that run in the foreground, and it daemonizes them for you. It knows how to capture stdout and stderr for the process it controls and how to save those streams either combined or separately to their own log files.

Of course, what supervisor really excels at is monitoring that the process is actually running: it detects when the process dies or is killed, and it restarts it within seconds. This sounded like the perfect scenario for deploying mysql-proxy. Here's what I did to get it to work:

1) Installed supervisor via easy_install.
# easy_install supervisor
I also created a configuration file for supervisord by running:
# echo_supervisord_conf > /etc/supervisord.conf
2) Downloaded the binary package for mysql-proxy (version 0.8.0 at the time of this post). I untarred the package into /opt/mysql-proxy-0.8.0-linux-glibc2.3-x86-64bit and made a symlink to it called /opt/mysql-proxy.
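
In rough terms, the commands look like this (the exact tarball name and download location will vary with the version and platform you choose):
cd /opt
tar xzf mysql-proxy-0.8.0-linux-glibc2.3-x86-64bit.tar.gz
ln -snf /opt/mysql-proxy-0.8.0-linux-glibc2.3-x86-64bit /opt/mysql-proxy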

3) I wanted mysql-proxy to send traffic to 2 MySQL servers that I had configured in master-master mode. I also wanted one MySQL server to get all the traffic, with the 2nd server hit only if the first one went down. In short, I wanted active-passive failover for my 2 MySQL servers. I found this Lua script written by Sheeri Cabral for doing the failover, but I had to modify it slightly: at least in mysql-proxy 0.8, the proxy.backends variable is now called proxy.global.backends. The rest of the script was fine. I created a directory called /opt/mysql-proxy/scripts and saved my version of the script there as failover.lua.
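
If you start from the original script, the rename can be done with a quick one-liner along these lines (assuming, as was the case for me, that the rename is the only change needed):
sed -i 's/proxy\.backends/proxy.global.backends/g' /opt/mysql-proxy/scripts/failover.lua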

4) I verified that I could correctly run mysql-proxy in foreground mode by issuing this command (where 10.1.1.1 and 10.1.1.2 are the IP addresses of my active and passive MySQL servers respectively):
/opt/mysql-proxy/bin/mysql-proxy \
--admin-address 127.0.0.1:43306 \
--proxy-address 127.0.0.1:33306 \
--proxy-backend-addresses=10.1.1.1:3306 \
--proxy-backend-addresses=10.1.1.2:3306 \
--proxy-lua-script=/opt/mysql-proxy/scripts/failover.lua

Note that I am specifying the local ports where mysql-proxy should listen for admin traffic (port 43306) and for regular MySQL proxying traffic (port 33306).
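
One quick way to check that traffic really flows through the proxy is to connect to the proxy port with a regular mysql client and run a query; the user name below is a placeholder for whatever account exists on your backend servers:
mysql -h 127.0.0.1 -P 33306 -u myuser -p -e 'SELECT @@hostname'
The hostname returned should be that of the active backend (10.1.1.1 in my setup).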

5) I created a [program] section in the /etc/supervisord.conf configuration file that looks like this:

[program:mysql-proxy-3306]
command=/opt/mysql-proxy/bin/mysql-proxy \
--admin-address 127.0.0.1:43306 \
--proxy-address 127.0.0.1:33306 \
--proxy-backend-addresses=stgdb101:3306 \
--proxy-backend-addresses=stgdb03:3306 \
--proxy-lua-script=/opt/mysql-proxy/scripts/failover.lua
redirect_stderr=true
stdout_logfile=/var/log/%(program_name)s.log
user=mysql

Here's a short explanation of each line in this section:
i) The program name can be anything you want. It is what gets displayed in the program list when you run supervisorctl.
ii) The command is the exact command you would use to run the process in the foreground.
iii) I do want to redirect stderr to stdout.
iv) I do want to capture stdout to a file in /var/log named "program_name".log; in my case that is /var/log/mysql-proxy-3306.log.
v) I want the mysql-proxy process to run as user mysql (if no user is specified, the process runs as the same user as supervisord, typically root).

I made two other modifications to /etc/supervisord.conf in the [supervisord] section:
logfile=/var/log/supervisord.log
pidfile=/var/run/supervisord.pid

I also found an example init.d startup script for supervisord, which I modified a bit. My version is here. To start supervisord, I use '/etc/init.d/supervisord start' or 'service supervisord start'. To make supervisord re-read its configuration file, I use 'service supervisord reload'. I also ran 'chkconfig supervisord on' so that supervisord starts automatically at boot.

6) After starting up supervisord, I could see the mysql-proxy process running. Any output that mysql-proxy might have is captured in /var/log/mysql-proxy-3306.log.
Also, by running supervisorctl, I could see something like this:

mysql-proxy-3306 RUNNING pid 1860, uptime 23:10:45
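
supervisorctl can also be used to control the program directly; for example, commands along these lines work against the program name defined in the [program] section:
supervisorctl status mysql-proxy-3306
supervisorctl restart mysql-proxy-3306
supervisorctl tail mysql-proxy-3306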

To test the monitoring and restarting capabilities of supervisord, I did a hard kill -9 on the mysql-proxy process. Within about 2 seconds, the process was restarted by supervisord. Here are the relevant lines from /var/log/supervisord.log:

2010-04-22 02:04:59,161 INFO exited: mysql-proxy-3309 (terminated by SIGKILL; not expected)
2010-04-22 02:05:00,177 INFO spawned: 'mysql-proxy-3309' with pid 26137
2010-04-22 02:05:01,325 INFO success: mysql-proxy-3309 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
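
If you want to reproduce the test, it boils down to something like this (replace <pid> with the pid reported in the status output):
supervisorctl status mysql-proxy-3306
kill -9 <pid>
sleep 2
supervisorctl status mysql-proxy-3306
The second status call should show the program RUNNING again, with a new pid and a reset uptime.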

As Borat would say...I LIKE!!!

My intention is to apply this methodology to other processes that I have been daemonizing via grizzled.os, for example. The advantages of using supervisord are many, but of particular importance to me is that it will restart a process when it dies, so I don't need to worry about doing that on my own.
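
As a sketch, for a hypothetical long-running Python script that I would otherwise have to daemonize myself, the [program] section would look something like this (the program name, paths and user are made up for illustration):
[program:my-python-worker]
command=/usr/bin/python /opt/myapp/worker.py
redirect_stderr=true
stdout_logfile=/var/log/%(program_name)s.log
user=myapp
autorestart=true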

That's about it for now. I will post again about how mysql-proxy behaves in failover mode in production conditions. For now I have it in staging.

Thursday, April 22, 2010

Installing the memcached Munin plugin

Munin doesn't ship with a memcached plugin. Almost all links that point to that plugin on the Munin wiki are broken. I finally found it here. Here's what I did to get it to work:
  • Download the raw text of the plugin. I saved it as "memcached_". Replace the @@PERL@@ variable at the top of the file with the path to perl on your system, and make memcached_ executable.
On the munin node that you want to get memcached stats for (ideally you'd add these steps to your automated deployment scripts; I added them to a fabric file called fab_munin.py; the commands are also sketched out as a shell snippet after this list):
  • Copy memcached_ in /usr/share/munin/plugins (or equivalent for your system)
  • Make 3 symlinks in /etc/munin/plugins to the memcached_ file, but name each symlink according to the metric that will be displayed in the munin graphs:
    • ln -snf /usr/share/munin/plugins/memcached_ memcached_bytes
    • ln -snf /usr/share/munin/plugins/memcached_ memcached_counters
    • ln -snf /usr/share/munin/plugins/memcached_ memcached_rates
  • Install the Perl Cache::Memcached module (you can do it via the cpan cmdline utility)
  • Restart the munin-node service
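
Here is a rough sketch of the commands for the steps above; the perl path, plugin directories and service name may differ on your system:
sed -i 's|@@PERL@@|/usr/bin/perl|' memcached_
chmod +x memcached_
cp memcached_ /usr/share/munin/plugins/
for metric in bytes counters rates; do
    ln -snf /usr/share/munin/plugins/memcached_ /etc/munin/plugins/memcached_$metric
done
cpan Cache::Memcached
service munin-node restart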
At this point you should see 3 graphs for memcached on the munin dashboard:
  • "Network traffic" (from the 'bytes' symlink)
  • "Current values" such as bytes allocated/current connections/current items (from the 'counters' symlink)
  • "Commands" such as cache misses/cache hits/GETs/SETs (from the 'rates' symlink).

Wednesday, April 07, 2010

"Load Balancing in the Cloud" whitepaper by Rightscale

RightScale recently published a whitepaper titled "Load Balancing in the Cloud: Tools, Tips and Techniques" (you can get it from here; registration is required, however). I found out about it from this post on the RightScale blog, which is a good introduction to the paper. Here are some things I found interesting in the paper:
  • they tested HAProxy, Zeus Technologies Load Balancer, aiCache's Web Accelerator and Amazon's Elastic Load Balancer (ELB), all deployed in Amazon EC2
  • the connection rate (requests/second) was the metric under test, because it turns out it's usually the limiting factor when deploying an LB in the cloud
  • it was nice to see HAProxy as pretty much the accepted software-based Open Source solution for load balancing in the cloud (the other LBs tested are not free/open source)
  • the load testing methodology was very sound, and it's worth studying; Brian Adler, the author of the whitepaper, clearly knows what he's doing, and he made sure he eliminated potential bottlenecks along the paths from load-generating clients to the Web servers behind the LB
  • ab and httperf were used to generate load
  • all the non-ELB solutions handled roughly 5,000 requests/sec on average (Zeus did somewhat better than HAProxy, 6.5K vs. 5.2K req/sec)
  • ELB was shown to be practically 'infinitely' scalable (for some value of infinity); however, the elasticity of ELB is gradual, so if you experience a sudden traffic spike, ELB might catch up too slowly for your needs
  • for a slow and steady traffic increase, though, ELB seems to be ideal
  • it turns out that a total throughput of 100,000 packets/second (inbound + outbound) is a hard limit in EC2, due to virtualization
  • so, if your Web traffic exceeds this limit, you need to deploy multiple load balancers (or use ELB, of course); when I was at OpenX, we did just that by deploying multiple HAProxy instances and using DNS round-robin to distribute traffic across them
Overall, if you're interested in deploying high traffic Web sites in the cloud, I think it's worth your time to register and read the whitepaper.
