Friday, December 10, 2010

A Fabric script for striping EBS volumes

Here's a short Fabric script that may be useful to anyone who needs to stripe EBS volumes in Amazon EC2. Striping is commonly recommended for improving the I/O throughput of EBS-backed storage. Keep in mind, though, that a RAID0 stripe has no redundancy: if one of the member EBS volumes goes AWOL or suffers performance issues, the whole array is affected. In any case, here's the Fabric script:

from fabric.api import *

# Globals

env.project = 'EBSSTRIPING'
env.user = 'myuser'

DEVICES = [
    "/dev/sdd",
    "/dev/sde",
    "/dev/sdf",
    "/dev/sdg",
]

VOL_SIZE = 400 # GB

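# Tasks
# Note: every command below needs root privileges; connect as root or
# replace run() with Fabric's sudo().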

def install():
    install_packages()
    create_raid0()
    create_lvm()
    mkfs_mount_lvm()

def install_packages():
    run('DEBIAN_FRONTEND=noninteractive apt-get -y install mdadm')
    run('apt-get -y install lvm2')
    run('modprobe dm-mod')
    
def create_raid0():
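    # Derive the member count from DEVICES so the two cannot drift apart
    cmd = 'mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=%d %s' % (
        len(DEVICES), ' '.join(DEVICES))
    run(cmd)
    # Read-ahead is given in 512-byte sectors: 65536 sectors = 32 MB
    run('blockdev --setra 65536 /dev/md0')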

def create_lvm():
    run('pvcreate /dev/md0')
    run('vgcreate vgm0 /dev/md0')
    run('lvcreate --name lvm0 --size %dG vgm0' % VOL_SIZE)

def mkfs_mount_lvm():
    run('mkfs.xfs /dev/vgm0/lvm0')
    run('mkdir -p /mnt/lvm0')
    run('echo "/dev/vgm0/lvm0 /mnt/lvm0 xfs defaults 0 0" >> /etc/fstab')
    run('mount /mnt/lvm0')

A few things to note:

  • I assume that you have already created and attached 4 EBS volumes to your instance with device names /dev/sdd through /dev/sdg; if your device names or volume count are different, modify the DEVICES list appropriately
  • VOL_SIZE is the size, in GB, of the logical volume created on top of the RAID0 array; set it to the total striped capacity (for example, 400 for four 100 GB volumes), not to the size of a single EBS volume
  • The helper functions are pretty self-explanatory: 
    1. we use mdadm to create a RAID0 device called /dev/md0 with a 256 KB chunk size; we also set its read-ahead to 65536 512-byte sectors (32 MB) via the blockdev call
    2. we create an LVM physical volume on /dev/md0
    3. we create a volume group called vgm0 containing /dev/md0
    4. we create an LVM logical volume called lvm0 of size VOL_SIZE, inside the vgm0 group
    5. we format the logical volume as XFS, then we add it to /etc/fstab and mount it
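
As a rough sketch of step 1, the mdadm command line can be derived entirely from the DEVICES list, so that the --raid-devices count never disagrees with the number of devices passed in. The helper names build_mdadm_create and readahead_bytes are illustrative and not part of the script above; the second one simply converts the blockdev --setra value, which is expressed in 512-byte sectors, into bytes.

```python
SECTOR_BYTES = 512  # blockdev --setra counts in 512-byte sectors


def build_mdadm_create(devices, md_device='/dev/md0', chunk_kb=256):
    """Return the mdadm command that stripes `devices` into a RAID0 array."""
    # Guard chosen for this sketch: a stripe over fewer than two devices
    # is almost certainly a configuration mistake.
    if len(devices) < 2:
        raise ValueError('striping needs at least two devices')
    return 'mdadm --create %s --level=0 --chunk=%d --raid-devices=%d %s' % (
        md_device, chunk_kb, len(devices), ' '.join(devices))


def readahead_bytes(sectors):
    """Convert a --setra value (in sectors) to bytes."""
    return sectors * SECTOR_BYTES


if __name__ == '__main__':
    devices = ['/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg']
    print(build_mdadm_create(devices))
    print('%d MB read-ahead' % (readahead_bytes(65536) // (1024 * 1024)))
```

In the Fabric script, the returned string would simply be passed to run().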
That's it. Hopefully it will be useful to somebody out there.

9 comments:

winhamwr said...

Thanks so much for posting this Grig. I've been really wanting to do this for our Hudson slaves (normally I/O bound loading/deleting MySQL fixture data) and I'm very interested to see what kind of speedup this could get us. It also makes me giggle a bit imagining going from 4 to 8 or 8 to 16 vols for fun. Gotta love ec2 :)

Grig Gheorghiu said...

@winhamwr -- glad to see it's useful to you. If you could post some performance numbers of striped vs. non-striped EBS I/O that would be really good!

Grig

Stephen said...

Interesting.

I'm going through a similar setup now, and I had assumed that a better way to do this is to combine devices into LVMs, then do MD on those LVMs. That way, if you add more EBS volumes, you can add them to the VGs/LVMs and use something like resize2fs (or xfs_growfs in your case) to add capacity "on the fly".

The way you have it, you can add volumes to the MD, but it's a bit more work, isn't it?

Also, does the blockdev --setra make a big difference? I guess it depends on your usage pattern. Maybe it's not useful for random access (e.g. MongoDB)?

CoreyS said...

Thanks for this post! I created 4 100G volumes and ran the script ... I ended up however with the mounted drive showing only 100G ... I was expecting it to have 400G... any idea what went wrong, or if I'm misunderstanding the process?

Grig Gheorghiu said...

Hi Corey -- thanks for your comment. You're right, it was a bug in my explanation of the script. VOL_SIZE needs to be the size of your target RAID0 volume, so in your case you need to set that to 400. I updated my post to reflect that.
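
To make the sizing rule concrete, here is a tiny sketch; stripe_size_gb is a hypothetical helper, not part of the script, and it assumes all member volumes are the same size. Because mdadm and LVM keep some metadata on the devices, the space actually available to lvcreate can come out slightly below this figure, so treat the result as an upper bound.

```python
def stripe_size_gb(volume_count, volume_size_gb):
    """Total raw capacity of a RAID0 stripe over equally sized volumes."""
    if volume_count < 1 or volume_size_gb <= 0:
        raise ValueError('need at least one volume with a positive size')
    return volume_count * volume_size_gb


if __name__ == '__main__':
    # Four 100 GB EBS volumes, as in Corey's setup
    print(stripe_size_gb(4, 100))
```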

Grig

CoreyS said...

Over the weekend, Amazon rebooted our instance, and I believe the drive was not re-mounted correctly, as we are missing entire directories that used to be on the striped drive.

The mounted striped drive exists at /mnt/lvm0 and shows the correct 400G:

[root@ lvm0]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 414G 199M 393G 1% /mnt

Some directories we had created are there:

/mnt/lvm0/lib/mysql/data

But the directory where our database files were located
/mnt/lvm0/lib/mysql/data/app

has completely vanished, resulting in complete failure of our MySQL instance.

I believe the directories are only there because MySQL created them after the reboot based on the my.cnf configuration.

I had set up a directory and symlink:
/cron_archive -> /mnt/lvm0/archive
... that directory no longer exists.

I believe our fstab contains the required mount entry:
[root@domU-12-31-39-05-18-52 lvm0]# more /etc/fstab
/dev/sda1 / ext3 defaults 0 0
/dev/sdb /mnt ext3 defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0

/dev/vgm0/lvm0 /mnt/lvm0 xfs defaults 0 0

Any help is greatly appreciated, and would be of benefit to others who experience an Amazon triggered reboot.

CoreyS said...

A follow-up thought: /mnt is a transient drive (i.e., its contents are not preserved after an instance restart). I am wondering whether, if Amazon formats /mnt on a restart, it also inadvertently formatted "lvm0", wiping out the striped drive. I am in contact with AWS support, and I'll let you know what they say...

CoreyS said...

Hi Grig,
I did not receive any help from Amazon support (who claim they are not allowed to actually log into an instance), but we did figure out the problem here, which you may want to share on your blog.

We needed to add the following to the instance's initialization (in /etc/rc.local):

mdadm --assemble /dev/md0 --chunk=256 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
lvchange -a y /dev/vgm0
mount /mnt/lvm0

That is, on a reboot you need to reassemble the stripe, activate the volume group, and remount.
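
For a different device list, the same three steps could be generated rather than typed by hand. This is only a sketch: rc_local_lines is a hypothetical helper, and it leaves out --chunk on the assumption that mdadm reads the chunk size from the array's own metadata at assembly time.

```python
def rc_local_lines(devices, md_device='/dev/md0', volume_group='vgm0',
                   mount_point='/mnt/lvm0'):
    """Return shell lines that bring the striped volume back after a reboot."""
    if not devices:
        raise ValueError('at least one member device is required')
    return [
        # 1. Reassemble the RAID0 array from its member devices
        'mdadm --assemble %s %s' % (md_device, ' '.join(devices)),
        # 2. Activate the logical volumes in the volume group
        'lvchange -a y /dev/%s' % volume_group,
        # 3. Mount using the existing /etc/fstab entry
        'mount %s' % mount_point,
    ]


if __name__ == '__main__':
    for line in rc_local_lines(['/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm']):
        print(line)
```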

Also, I moved the mount point from /mnt/lvm0 to /lvm0 as a precaution, since the contents of /mnt do not persist between instance restarts.

I rebooted the instance a few times, and the stripe was ready to go every time.

Cheers,
Corey

Grig Gheorghiu said...

Hey Corey -- sorry I was of no help, but glad you found a solution!

Grig