Tuesday, May 24, 2011

Setting up RAID 0 across ephemeral drives on EC2 instances (and surviving reboots!)

I've been experimenting with setting up RAID 0 across ephemeral drives on EC2 instances. The initial setup, be it with mdadm and lvm, or directly with lvm, is not that hard -- what has proven challenging is surviving reboots. Unless you perform certain tricks, your EC2 instance will be blissfully unaware of its new setup after a reboot. What's more, if you try to mount the new striped volume at boot time by adding it to /etc/fstab, chances are you won't even be able to ssh into the instance anymore. It happened to me many times while experimenting, hence this blog post.

Update: I realize I didn't go into details about the use case of this type of setup. This is useful if you don't want to incur EBS performance and reliability penalties, and yet you have a data set that is larger than the 400 GB offered by an individual ephemeral drive. Of course, if your instance dies, so do the ephemeral drives (after all they are named like this for a reason...) -- so make sure you have a good backup/disaster recovery strategy for the data you store there!

In the following, I will assume you want to set up RAID 0 across the four ephemeral drives that come with an EC2 m1.xlarge instance, and which are exposed as devices /dev/sdb through /dev/sde. By default, /dev/sdb is mounted as /mnt, while the other drives aren't mounted. 

I also assume you want to create 1 volume group encompassing the RAID 0 array, and within that volume group you want to create 2 logical volumes with associated XFS file systems, and also 1 logical volume for swap.

Step 1 - unmount /dev/sdb

# umount /dev/sdb

(also comment out the entry corresponding to /dev/sdb in /etc/fstab)
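
Commenting out the fstab entry can be scripted as well. The following is only a sketch: the function name comment_out_fstab_entry is made up for illustration, and it assumes the device appears as the first field of its fstab line. It keeps a .bak copy of the file before rewriting it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): comment out the
# fstab line for one device, keeping a backup copy next to the file.
comment_out_fstab_entry() {
    fstab=$1     # path to the fstab file, e.g. /etc/fstab
    device=$2    # device to disable, e.g. /dev/sdb
    cp "$fstab" "$fstab.bak" || return 1
    # Prefix the line with '#' only when the first field is exactly the
    # device; lines that are already commented out are left alone.
    sed "s|^[[:space:]]*$device[[:space:]]|#&|" "$fstab.bak" > "$fstab"
}

# Example (as root): comment_out_fstab_entry /etc/fstab /dev/sdb
```

Because the match requires whitespace right after the device name, an entry for /dev/sdb1 would not be touched when /dev/sdb is passed in.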

Step 2 - install lvm2 and mdadm

For an unattended install of these packages (slightly complicated by the fact that mdadm also needs postfix), I do:

# DEBIAN_FRONTEND=noninteractive apt-get -y install mdadm lvm2

Step 3 - manually load the dm-mod module

# modprobe dm-mod

(dm-mod is not loaded automatically, which seems to be a bug in devmapper in Ubuntu)

If you want to set up RAID 0 via lvm directly, you can skip steps 4 and 5. From what I've read, you get better performance if you do the RAID 0 setup with mdadm. Also, if you need any other RAID level, you need to use mdadm.

Step 4 - configure RAID 0 array via mdadm

# mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde

Verify:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Mon May 23 22:35:20 2011
     Raid Level : raid0
     Array Size : 1761463296 (1679.86 GiB 1803.74 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon May 23 22:35:20 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           UUID : 03f63ee3:607fb777:f9441841:42247c4d (local to host adb08lvm)
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde

Step 5 - increase read-ahead to 32 MB for better performance

# blockdev --setra 65536 /dev/md0
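
Note that blockdev --setra counts 512-byte sectors, not bytes or kilobytes. The two helpers below (hypothetical names, plain shell arithmetic) convert between the two units, which may help when picking a different read-ahead value.

```shell
#!/bin/sh
# blockdev --setra / --getra work in units of 512-byte sectors.
# Hypothetical helpers for converting between sectors and KiB.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

ra_kib_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

# ra_sectors_to_kib 65536   prints 32768 (KiB, i.e. 32 MiB)
# ra_kib_to_sectors 64      prints 128
# The current value can be read back with: blockdev --getra /dev/md0
```

So the value 65536 used above corresponds to 32 MiB of read-ahead, while a 64 KB read-ahead would be --setra 128.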

Step 6 - create physical volume from the RAID 0 array

# pvcreate /dev/md0

(if you didn't want to use mdadm, you would call pvcreate against each of the /dev/sdb through /dev/sde devices)

Step 7 - create volume group called vg0 spanning the RAID 0 array

# vgcreate vg0 /dev/md0

(if you didn't want to use mdadm, you would run vgcreate and specify the 4 devices /dev/sdb through /dev/sde)

Verify:

# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg0" using metadata type lvm2

# pvscan
  PV /dev/md0   VG vg0   lvm2 [1.64 TiB / 679.86 GiB free]
  Total: 1 [1.64 TiB] / in use: 1 [1.64 TiB] / in no VG: 0 [0   ]

Step 8 - create 3 logical volumes within the vg0 volume group

Each local drive is 400 GB, so the total size for the volume group is 1.6 TB. I'll create 2 logical volumes at 500 GB each, and a 10 GB logical volume for swap.

# lvcreate --name data1 --size 500G vg0
# lvcreate --name data2 --size 500G vg0
# lvcreate --name swap --size 10G vg0

Verify:

# lvscan
  ACTIVE            '/dev/vg0/data1' [500.00 GiB] inherit
  ACTIVE            '/dev/vg0/data2' [500.00 GiB] inherit
  ACTIVE            '/dev/vg0/swap' [10.00 GiB] inherit
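
For the LVM-only variant (no mdadm), putting the four devices into one volume group does not by itself stripe the data; striping is requested per logical volume through lvcreate's --stripes and --stripesize options. The sketch below only prints the commands by default. The RUN variable is an illustrative dry-run switch, and the 256 KB stripe size is an assumption that simply mirrors the chunk size used with mdadm in step 4.

```shell
#!/bin/sh
# RUN defaults to "echo", so the commands are only printed (dry run).
# Invoke with RUN= (set but empty) to actually execute them as root.
RUN=${RUN-echo}

# Hypothetical helper: create the three logical volumes striped across
# the 4 physical volumes of the given volume group.
create_striped_lvs() {
    vg=$1
    $RUN lvcreate --stripes 4 --stripesize 256 --name data1 --size 500G "$vg"
    $RUN lvcreate --stripes 4 --stripesize 256 --name data2 --size 500G "$vg"
    $RUN lvcreate --stripes 4 --stripesize 256 --name swap --size 10G "$vg"
}

# Example: create_striped_lvs vg0
```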

Step 9 - create XFS file systems and mount them

We'll create XFS file systems for the data1 and data2 logical volumes. The names of the devices used for mkfs are the ones displayed via the lvscan command above. Then we'll mount the 2 file systems as /data1 and /data2.

# mkfs.xfs /dev/vg0/data1
# mkfs.xfs /dev/vg0/data2
# mkdir /data1
# mkdir /data2
# mount -t xfs -o noatime /dev/vg0/data1 /data1
# mount -t xfs -o noatime /dev/vg0/data2 /data2

Step 10 - create and enable the swap volume

# mkswap /dev/vg0/swap
# swapon /dev/vg0/swap

At this point, you should have a fully functional setup. The slight problem is that if you add the newly created file systems to /etc/fstab and reboot, you may not be able to ssh back into your instance -- at least that's what happened to me. I was able to ping the IP of the instance, but ssh would fail.

I finally redid the whole thing on a new instance (I created the RAID 0 directly with lvm, bypassing the mdadm step), but didn't add the file systems to /etc/fstab. After rebooting and running lvscan, I noticed that the logical volumes I had created were all marked as 'inactive':

# lvscan
  inactive            '/dev/vg0/data1' [500.00 GiB] inherit
  inactive            '/dev/vg0/data2' [500.00 GiB] inherit
  inactive            '/dev/vg0/swap' [10.00 GiB] inherit

This was after I ran 'modprobe dm-mod' manually; otherwise, the lvscan command would complain:

  /proc/misc: No entry for device-mapper found
  Is device-mapper driver missing from kernel?
  Failure to communicate with kernel device-mapper driver.

A Google search revealed this thread, which offered a solution: run 'lvchange -ay' against each logical volume so that the volume becomes active. Only after doing this was I able to see the logical volumes and mount them.

So I added these lines to /etc/rc.local:

/sbin/modprobe dm-mod
/sbin/lvscan
/sbin/lvchange -ay /dev/vg0/data1
/sbin/lvchange -ay /dev/vg0/data2
/sbin/lvchange -ay /dev/vg0/swap
/bin/mount -t xfs -o noatime /dev/vg0/data1  /data1
/bin/mount -t xfs -o noatime /dev/vg0/data2  /data2
/sbin/swapon /dev/vg0/swap

After a reboot, everything was working as expected. Note that I am doing the mounting of the file systems and the enabling of the swap within the rc.local script, and not via /etc/fstab. If you try to do it in fstab, it is too early in the boot sequence, so the logical volumes will be inactive and the mount will fail, with the dire consequence that you won't be able to ssh back into your instance (at least in my case).
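
The rc.local lines shown earlier can also be written as a small loop, which is easier to adapt when the volume names change. This is a sketch under a few assumptions: the function name bring_up_volumes and the RUN dry-run switch are made up for illustration, each data volume is mounted at /<name>, and the swap volume is called swap.

```shell
#!/bin/sh
# RUN defaults to "echo", so the commands are only printed (dry run).
# Invoke with RUN= (set but empty) to actually execute them as root.
RUN=${RUN-echo}

# Hypothetical helper: $1 is the volume group, the remaining arguments
# are the data volumes, each mounted at /<volume name>.
bring_up_volumes() {
    vg=$1
    shift
    $RUN /sbin/modprobe dm-mod
    $RUN /sbin/lvscan
    # Activate every data volume plus the swap volume.
    for lv in "$@" swap; do
        $RUN /sbin/lvchange -ay "/dev/$vg/$lv"
    done
    # Mount the XFS file systems, then enable swap.
    for lv in "$@"; do
        $RUN /bin/mount -t xfs -o noatime "/dev/$vg/$lv" "/$lv"
    done
    $RUN /sbin/swapon "/dev/$vg/swap"
}

# Example: bring_up_volumes vg0 data1 data2
```

In dry-run mode the example call prints the same eight commands as the rc.local lines, which makes it easy to compare the two before rebooting.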

This was still not enough when creating the RAID 0 array with mdadm. When I used mdadm, even when adding the lines above to /etc/rc.local, the /dev/md0 device was not there after the reboot, so the mount would still fail. The thread I mentioned above does discuss this case at some point, and I also found a Server Fault thread on this topic. The solution in my case was to modify the mdadm configuration file /etc/mdadm/mdadm.conf and:

a) change the DEVICE variable to point to my 4 devices:

DEVICE /dev/sdb /dev/sdc /dev/sdd /dev/sde

b) add an ARRAY variable containing the UUID of /dev/md0 (which you can get via 'mdadm --detail /dev/md0'):

ARRAY /dev/md0 level=raid0 num-devices=4 UUID=03f63ee3:607fb777:f9441841:42247c4d
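
Rather than copying the UUID by hand, the ARRAY line can be derived from the 'mdadm --detail' output. The function below is a sketch with a made-up name (array_line_from_detail); it assumes the field layout shown in the step 4 output. 'mdadm --detail --scan' prints similar ARRAY lines on its own, though the exact fields may differ between mdadm versions.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail <device>` output on stdin
# and print a matching ARRAY line for /etc/mdadm/mdadm.conf.
array_line_from_detail() {
    awk -v dev="$1" '
        /Raid Level :/   { level = $NF }   # e.g. raid0
        /Raid Devices :/ { n = $NF }       # e.g. 4
        /UUID :/         { uuid = $3 }     # third field is the UUID itself
        END {
            printf "ARRAY %s level=%s num-devices=%s UUID=%s\n", dev, level, n, uuid
        }
    '
}

# Example (as root):
#   mdadm --detail /dev/md0 | array_line_from_detail /dev/md0
```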

This change, together with the custom lines in /etc/rc.local, finally enabled me to have a functional RAID 0 array and functional file systems and swap across the ephemeral drives in my EC2 instance.

I hope this will be useful to somebody out there and will avoid some head-against-the-wall moments that I had to go through....

Monday, May 09, 2011

Managing infrastructures in the cloud, with lessons learned the hard way

Here is a collection of blog posts I wrote over the last 3 years or so. Some of them are practical step-by-step tutorials on using various tools for managing cloud instances, while others talk about lessons learned the hard way, by deploying large-scale infrastructures in the cloud. I am aggregating them here for ease of future reference:

Lessons learned
Working with EC2-specific tools
Load balancing (ELB and HAProxy)

Friday, May 06, 2011

Upgrading the GD library in Ubuntu

We needed to use ImageFlow for some internal testing of image manipulations (esp. reflections). With a stock php5/libgd2 install in Ubuntu 10.04, some calls to the ImageFlow library would fail with:

"GD library is too old. Version 2.0.1 or later is required, and 2.0.28 is strongly recommended."

The libraries installed by Ubuntu were:

$ dpkg -l | grep libgd2
rc  libgd2-noxpm                               2.0.36~rc1~dfsg-3ubuntu1.9.04.1         GD Graphics Library version 2 (without XPM s
ii  libgd2-xpm                                 2.0.36~rc1~dfsg-3ubuntu1.9.04.1         GD Graphics Library version 2
$ dpkg -l | grep php5-gd
ii  php5-gd                                    5.2.6.dfsg.1-3ubuntu4.6                 GD module for php5

The issue here is that Ubuntu does not use the version of GD which is bundled with PHP. See this discussion for more details.

So...some googling around later, I stumbled on this great howtoforge post by patusovniak on "Recompiling PHP5 with bundled support for GD in Ubuntu". It also serves as a good overview of building Ubuntu packages from source. The only observation I have is that after I ran the step

dpkg-buildpackage -rfakeroot

I had to install all .deb packages in /usr/src. So I did:

cd /usr/src
dpkg -i *.deb

When running phpinfo(), the GD section now looks like this:
gd

GD Support            enabled
GD Version            bundled (2.0.34 compatible)
FreeType Support      enabled
FreeType Linkage      with freetype
FreeType Version      2.3.11
T1Lib Support         enabled
GIF Read Support      enabled
GIF Create Support    enabled
JPEG Support          enabled
libJPEG Version       6b
PNG Support           enabled
libPNG Version        1.2.42
WBMP Support          enabled
XPM Support           enabled
XBM Support           enabled
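
To check the reported GD version against the 2.0.28 recommended in the error message, something along these lines could be used. It is only a sketch: both function names are made up, and the version string is assumed to look like the "GD Version" value above.

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric version from a phpinfo()
# "GD Version" value, e.g. "bundled (2.0.34 compatible)" -> "2.0.34".
gd_version_number() {
    echo "$1" | sed 's/^[^0-9]*\([0-9][0-9.]*[0-9]\).*$/\1/'
}

# Hypothetical helper: succeed (exit status 0) when version $1 >= $2,
# comparing the dot-separated fields numerically.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example:
#   v=$(gd_version_number "bundled (2.0.34 compatible)")
#   version_ge "$v" 2.0.28 && echo "GD is recent enough"
```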

Hopefully this will be useful to someone out there desperately trying to use a newer version of GD with PHP in Ubuntu...