Monday, September 27, 2010

Getting detailed I/O stats with Munin

Ever since Vladimir Vuksan pointed me to his Ganglia script for getting detailed disk stats, I've been looking for something similar for Munin. The iostat and iostat_ios Munin plugins, which are enabled by default when you install Munin, do show disk stats across all devices detected on the system. I wanted more in-depth stats per device though. In my case, the devices I'm interested in are actually Amazon EBS volumes mounted on my database servers.

I finally figured out how to achieve this, using the diskstat_ Munin plugin which gets installed by default when you install munin-node.

If you run

/usr/share/munin/plugins/diskstat_ suggest

you will see the various symlinks you can create for the devices available on your server.

In my case, I have 2 EBS volumes on each of my database servers, mounted as /dev/sdm and /dev/sdn. I created the following symlinks for /dev/sdm (and similar for /dev/sdn):


ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_latency_sdm
ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_throughput_sdm
ln -snf /usr/share/munin/plugins/diskstat_ /etc/munin/plugins/diskstat_iops_sdm

Here's what metrics you get from these plugins:

  • from diskstat_iops: Read I/O Ops/sec, Write I/O Ops/sec, Avg. Request Size, Avg. Read Request Size, Avg. Write Request Size
  • from diskstat_latency: Device Utilization, Avg. Device I/O Time, Avg. I/O Wait Time, Avg. Read I/O Wait Time, Avg. Write I/O Wait Time
  • from diskstat_throughput: Read Bytes, Write Bytes
My next step is to follow the advice of Mark Seger (the author of collectl) and graph the output of collectl in real time, so that the stats are displayed in fine-grained intervals of 5-10 seconds instead of the 5-minute averages that RRD-based tools offer.

Tuesday, September 21, 2010

Quick note on installing and configuring Ganglia

I decided to give Ganglia a try to see if I like its metric visualizations and its plugins better than Munin's. I am still in the very early stages of evaluating it. However, I already banged my head against the wall trying to understand how to configure it properly. Here are some quick notes:

1) You can split your servers into clusters for ease of metric aggregation.

2) Each node in a cluster needs to run gmond. In Ubuntu, you can do 'apt-get install ganglia-monitoring' to install it. The config file is in /etc/ganglia/gmond.conf. More on the config file in a minute.

3) Each node in a cluster can send its metrics to a designated node via UDP.

4) One server in your infrastructure can be configured as both the overall metric collection server, and as the web front-end. This server needs to run gmetad, which in Ubuntu can be installed via 'apt-get install gmetad'. Its config file is /etc/gmetad.conf.

Note that you can have a tree of gmetad nodes, with the root of the tree configured to actually display the metric graphs. I wanted to keep it simple, so I am running both gmetad and the Web interface on the same node.

5) The gmetad server periodically polls one or more nodes in each cluster and retrieves the metrics for that cluster. It displays them via a PHP web interface which can be found in the source distribution.

That's about it in a nutshell in terms of the architecture of Ganglia. The nice thing is that it's scalable. You split nodes in clusters, you designate one or more nodes in a cluster to gather metrics from all the other nodes, and you have one ore more gmetad node(s) collecting the metrics from the designated nodes.

Now for the actual configuration. I have a cluster of DB servers, each running gmond. I also have another server called bak01 that I keep around for backup purposes. I configured each DB server to be part of a cluster called 'db'. I also configured each DB server to send the metrics collected by gmond to bak01 (via UDP on the non-default port of 8650). To do this, I have these entries in /etc/ganglia/gmond.conf on each DB server:


cluster {
  name = "db"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel { 
  host = bak01
  port = 8650

On host bak01, I also defined a udp_recv_channel and a tcp_accept_channel:

udp_recv_channel { 
  port = 8650

/* You can specify as many tcp_accept_channels as you like to share 
   an xml description of the state of the cluster */ 
tcp_accept_channel { 
  port = 8649 

The upd_recv_channel is necessary so bak01 can receive the metrics from the gmond nodes. The tcp_accept_channel is necessary so that bak01 can be contacted by the gmetad node.

That's it in terms of configuring gmond.

On the gmetad node, I made one modification to the default /etc/gmetad.conf file by specifying the cluster I want to collect metrics for, and the node where I want to collect the metrics from:

data_source "eosdb" 60 bak01

I then restarted gmetad via '/etc/init.d/gmetad restart'.

Ideally, these instructions would get you to a state where you would be able to see the graphs for all the nodes in the cluster. 

I automated the process of installing and configuring gmond on all the nodes via fabric. Maybe it all happened too fast for the collecting node (bak01), because it wasn't collecting metrics correctly for some of the nodes. I noticed that if I did 'telnet localhost 8649' on bak01, some of the nodes had no metrics associated with them. My solution was to stop and start gmond on those nodes, and that kicked things off. Strange though...

In any case, my next step is to install all kinds of Ganglia plugins, especially related to MySQL, but also for more in-depth disk I/O metrics.

Wednesday, September 15, 2010

Managing Rackspace CloudFiles with python-cloudfiles

I've started to use Rackspace CloudFiles as an alternate storage for database backups. I have the backups now on various EBS volumes in Amazon EC2, AND in CloudFiles, so that should be good enough for Disaster Recovery purposes, one would hope ;-)

I found the documentation for the python-cloudfiles package a bit lacking, so here's a quick post that walks through the common scenarios you encounter when managing CloudFiles containers and objects. I am not interested in the CDN aspect of CloudFiles for my purposes, so for that you'll need to dig on your own.

A CloudFiles container is similar to an Amazon S3 bucket, with one important difference: a container name cannot contain slashes, so you won't be able to mimic a file system hierarchy in CloudFiles the way you can do it in S3. A CloudFiles container, similar to an S3 bucket, contains objects -- which for CloudFiles have a max. size of 5 GB. So the CloudFiles storage landscape consists of 2 levels: a first level of containers (you can have an unlimited number of them), and a second level of objects embedded in containers. More details in the CloudFiles API Developer Guide (PDF).

Here's how you can use the python-cloudfiles package to perform CRUD operations on containers and objects.

Getting a connection to CloudFiles

First you need to obtain a connection to your CloudFiles account. You need a user name and an API key (the key can be generated via the Web interface at https://manage.rackspacecloud.com).

conn = cloudfiles.get_connection(username=USERNAME, api_key=API_KEY, serviceNet=True)

When specifying serviceNet=True, the docs say that you will use the Rackspace ServiceNet network to access Cloud Files, and not the public network.

Listing containers and objects

Once you get a connection, you can list existing containers, and objects within a container:

containers = conn.get_all_containers()
for c in containers:
    print "\nOBJECTS FOR CONTAINER: %s" % c.name
    objects = c.get_objects()
    for obj in objects:
        print obj.name

Creating containers

container = conn.create_container(container_name)

Creating objects in a container

Assuming you have a list of filenames you want to upload to a given container:

for f in files:
    print 'Uploading %s to container %s' % (f, container_name)
    basename = os.path.basename(f)
    o = container.create_object(basename)
    o.load_from_filename(f)

(note that the overview in the python-cloudfiles index.html doc has a typo -- it specifies 'load_from_file' instead of the correct 'load_from_filename')

Deleting containers and objects

You first need to delete all objects inside a container, then you can delete the container itself:

print 'Deleting container %s' % c.name
print 'Deleting all objects first'
objects = c.get_objects()
for obj in objects:
    c.delete_object(obj.name)
print 'Now deleting the container'
conn.delete_container(c.name)

Retrieving objects from a container

Remember that you don't have a backup process in place until you tested restores. So let's see how you retrieve objects that are stored in a CloudFiles container:

container_name = sys.argv[1]
containers = conn.get_all_containers()
c = None
for c in containers:
    if container_name == c.name:
        break
if not c:
    print "No countainer found with name %s" % container_name
    sys.exit(1)

target_dir = container_name
os.system('mkdir -p %s' % target_dir)
objects = c.get_objects()
for obj in objects:
    obj_name = obj.name
    print "Retrieving object %s" % obj_name
    target_file = "%s/%s" % (target_dir, obj_name)
    obj.save_to_filename(target_file)

Wednesday, September 01, 2010

MySQL InnoDB hot backups and restores with Percona XtraBackup

I blogged a while ago about MySQL fault-tolerance and disaster recovery techniques. At that time I was experimenting with the non-free InnoDB Hot Backup product. In the mean time I discovered Percona's XtraBackup (thanks Robin!). Here's how I tested XtraBackup for doing a hot backup and a restore of a MySQL database running Percona XtraDB (XtraBackup works with vanilla InnoDB too).

First of all, I use the following Percona .deb packages on a 64-bit Ubuntu Lucid EC2 instance:


# dpkg -l | grep percona
ii libpercona-xtradb-client-dev 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database development files
ii libpercona-xtradb-client16 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client library
ii percona-xtradb-client-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client binaries
ii percona-xtradb-common 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database common files (e.g. /etc
ii percona-xtradb-server-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database server binaries

I tried using the latest stable XtraBackup .deb package from the Percona downloads site but it didn't work for me. I started a hot backup with /usr/bin/innobackupex-1.5.1 and it ran for a while before dying with "InnoDB: Operating system error number 9 in a file operation." See this bug report for more details.

After unsuccessfully trying to compile XtraBackup from source, I tried XtraBackup-1.3-beta for Lucid from the Percona downloads. This worked fine.

Here's the scenario I tested against a MySQL Percona XtraDB instance running with DATADIR=/var/lib/mysql/m10 and a customized configuration file /etc/mysql10/my.cnf. I created and attached an EBS volume which I mounted as /xtrabackup on the instance running MySQL.

1) Take a hot backup of all databases under that instance:

/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --password=xxxxxx /xtrabackup

This will take a while and will create a timestamped directory under /xtrabackup, where it will store the database files from DATADIR. Note that the InnoDB log files are not created unless you apply step 2 below.

As the documentation says, make sure the output of innobackupex-1.5.1 ends with:

100901 05:33:12 innobackupex-1.5.1: completed OK!

2) Apply the transaction logs to the datafiles just created, so that the InnoDB logfiles are recreated in the target directory:

/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --password=xxxxxx --apply-log /xtrabackup/2010-09-01_05-21-36/

At this point, I tested a disaster recovery scenario by stopping MySQL and moving all files in DATADIR to a different location.

To bring the databases back to normal from the XtraBackup hot backup, I did the following:

1) Brought back up a functioning MySQL instance to be used by the XtraBackup restore operation:

i) Copied the contents of the default /var/lib/mysql/mysql database under /var/lib/mysql/m10/ (or you can recreate the mysql DB from scratch)

ii) Started mysqld_safe manually:

mysqld_safe --defaults-file=/etc/mysql10/my.cnf

This will create the data files and logs under DATADIR (/var/lib/mysql/m10) with the sizes specified in the configuration file. I had to wait until the messages in /var/log/syslog told me that the MySQL instance is ready and listening for connections.

2) Copied back the files from the hot backup directory into DATADIR

Note that the copy-back operation below initially errored out because it tried to copy the mysql directory too, and it found the directory already there under DATADIR. So the 2nd time I ran it, I moved /var/lib/mysql/m10/mysql to mysql.bak. The copy-back command is:

/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --copy-back /xtrabackup/2010-09-01_05-21-36/

You can also copy the files from /xtrabackup/2010-09-01_05-21-36/ into DATADIR using vanilla cp.

NOTE: verify the permissions on the restored files. In my case, some files in DATADIR were owned by root, so MySQL didn't start up properly because of that. Do a 'chown -R mysql:mysql DATADIR' to be sure.

3) If everything went well in step 2, restart the MySQL instance to make sure everything is OK.

At this point, your MySQL instance should have its databases restored to the point where you took the hot backup.

IMPORTANT: if the newly restored instance needs to be set up as a slave to an existing master server, you need to set the correct master_log_file and master_log_pos parameters via a 'CHANGE MASTER TO' command. These parameters are saved by innobackupex-1.5.1 in a file called xtrabackup_binlog_info in the target backup directory.

In my case, the xtrabackup_binlog_info file contained:

mysql-bin.000041 23657066

Here is an example of a CHANGE MASTER TO command I used:

STOP SLAVE;

CHANGE MASTER TO MASTER_HOST='masterhost', MASTER_PORT=3316, MASTER_USER='masteruser', MASTER_PASSWORD='masterpass', MASTER_LOG_FILE='mysql-bin.000041', MASTER_LOG_POS=23657066;

START SLAVE;

Note that XtraBackup can also run in a 'stream' mode useful for compressing the files generated by the backup operation. Details in the documentation.