Monday, September 01, 2008

Experiences with Amazon EC2 and EBS

I decided to port some of the sites I've been running for the last few years on a dedicated server (running Red Hat 9) to an Amazon EC2 AMI (which stands for 'Amazon Machine Image'). I also wanted to use some of the more recent features offered by Amazon in conjunction with their EC2 platform -- such as the permanent block-based storage AKA the Elastic Block Store (EBS), and also the permanent external IP addresses AKA the Elastic IPs.

To get started, I used a great blog post on 'Persistent Django on Amazon EC2 and EBS' by Thomas Brox Røst. I will refer here to some of the steps that Thomas details in his post; if you want to follow along, you're advised to read his post.

1) Create an AWS account and sign up for the EC2 service.

2) Install the ElasticFox Firefox extension -- the greatest thing since sliced bread in terms of managing EC2 AMIs. To run the ElasticFox GUI, go to Tools->ElasticFox in Firefox; this will launch a new tabbed window showing the GUI. From now on, I will abbreviate ElasticFox as EF.

3) Add your AWS user name and access keys in EF (use the Credentials button).

4) Add an EC2 security group (click on the 'Security Groups' tab in EF); this can be thought of as a firewall rule that will replace the default one. In my case, I called my group 'gg' and I allowed ports 80 and 443 (http and https) and 22 (ssh).

5) Add a keypair to be used when you ssh into your AMI (click on the 'KeyPairs' tab in EF). I named mine gg-ec2-keypair and I saved the private key in my .ssh folder on my local machine (.ssh/gg-ec2-keypair.pem).

6) Get a fixed external IP (click on the 'Elastic IPs' tab in EF). You will be assigned an IP which is not yet associated with any AMI.

7) Get a block-based storage volume that you can format later into a file system (click on the 'Volumes and Snapshots' tab in EF). I got a 10 GB volume.

These 7 steps are the foundation of everything else you need to do when running an AMI. Choosing and launching the AMI itself is the next step, which you can run any time you want to launch an AMI.

I followed Thomas's example and chose a 32-bit Fedora Core 8 image for my AMI. In EF, you can search for Fedora 8 images by going to the 'AMIs and Instances' tab and typing fedora-8 in the search box. Right click on the desired image (mine was called ec2-public-images/fedora-8-i386-base-v1.07.manifest.xml) and choose 'Launch instance(s) of this AMI'. You will need to choose a keypair (I chose the one I created earlier, gg-ec2-keypair), an availability zone (I chose the 'us-east-1a') and a security group (I removed the default one and added the one I created earlier).

You should immediately see the instance in a 'pending' state in the Instances list. After a couple of minutes, if you click Refresh you'll see it in the 'running' state, which means it's ready for you to access and work with.

Once my AMI was running, I right-clicked it and chose 'copy instance ID to clipboard'. The instance ID is needed to associate the EBS volume and the Elastic IP to this instance.

To associate the fixed external IP, I went to the 'Elastic IPs' tab in EF, right clicked on the Elastic IP I was assigned and chose 'Associate this address', then I indicated the instance ID of my running AMI. As a side note, if you don't see anything in a given EF list (such as Elastic IPs or Volumes), click Refresh and you should see it.

To associate the EBS volume, I went to the 'Volumes and Snapshots' tab in EF, right clicked on the volume I had created, then chose 'Attach this volume'. In the next dialog box, I specified the instance ID of my AMI, then /dev/sdh as the volume name.

The next step is to ssh into your AMI and format the raw block storage into a file system. You can use the Elastic IP you were assigned (let's call it A.B.C.D), and run:

$ ssh -i .ssh/your-private-key.pem root@A.B.C.D

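At this point, you should be logged in to your AMI. A brand-new EBS volume has to be formatted once (with mkfs.ext3 /dev/sdh, as pointed out in the comments below); after that, you only need to create the mount point and mount the volume: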

# mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1

If you want the mount point to persist across reboots, also add this line to /etc/fstab:

# echo "/dev/sdh /ebs1 ext3 noatime 0 0" >> /etc/fstab
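
Running that echo more than once leaves duplicate lines in /etc/fstab. A minimal sketch of a guarded version follows; the function name add_fstab_entry and its optional second argument (an alternate fstab path, handy for trying it on a scratch file) are assumptions for illustration, not part of the original setup.

```shell
# Append an fstab entry only if its device is not already listed.
# $1 = full fstab line, $2 = fstab path (hypothetical override, defaults to /etc/fstab)
add_fstab_entry() {
    entry="$1"
    fstab="${2:-/etc/fstab}"
    device="${entry%% *}"          # first space-separated field, e.g. /dev/sdh
    if grep -q "^${device}[[:space:]]" "$fstab" 2>/dev/null; then
        return 0                   # device already present, nothing to do
    fi
    echo "$entry" >> "$fstab"
}

# Example (run as root on the instance):
#   add_fstab_entry "/dev/sdh /ebs1 ext3 noatime 0 0"
```

The check only looks at the device field, so an existing entry for /dev/sdh with different options is left untouched rather than rewritten.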

At this point, you have a bare-bones Fedora Core 8 instance accessible via HTTP, HTTPS and SSH at the IP address A.B.C.D. Not very useful in and of itself, unless you install your application.

In my case, the first Web site I wanted to port over was the SoCal Piggies wiki, at www.socal-piggies.org. I used to run it on MoinMoin 1.3.1 on my old server, but for this brand-new AMI experiment I installed MoinMoin 1.7.1. I also had to install httpd and python-devel via yum. And since we're talking about package installs, here's the main point you should take away from this post: you need to install all required packages every time you re-launch your AMI. I'm not talking about rebooting your AMI, which preserves your file systems; I'm talking about terminating your AMI for any reason, then re-launching a new AMI instance. This operation will start your AMI with a clean slate in terms of packages that are installed. You can obviously re-mount the EBS volume that you created, and all your files will still be there, but those are typically application or database files, and not the actual required packages themselves (such as httpd or python-devel).

So, very important point: as soon as you start porting applications over to your AMI, you'd better start designing the layout of your apps so that they take full advantage of the EBS volume(s) you created. You'll also have to script the installation of the required packages, so you can easily run the script every time you launch a new instance of your AMI. This can be seen as a curse, but to me it's a blessing in disguise, because it forces you to automate the installation of your applications. Automation entails faster deployment, fewer errors, better testability. In short, you win in the long run.

For the first application I ported, the SoCal Piggies wiki, I made the following design decisions:

a) I chose to install MoinMoin 1.7.1 from scratch every time I launch a new AMI instance; I also install httpd, httpd-devel and python-devel from scratch every time
b) I chose to point the specific instance of the Piggies wiki to /ebs1/wikis/socal-piggies, so all the actual content of the wiki is kept persistently in the EBS volume
c) I moved /etc/httpd to /ebs1/httpd, then I created a symlink at /etc/httpd pointing to /ebs1/httpd, so all the Apache configuration files are kept persistently in the EBS volume
d) I pointed the DocumentRoot of the Apache virtual host for the Piggies wiki to /ebs1/www/socal-piggies, so that all the static files that need to be accessed via the www.socal-piggies.org domain are kept persistently in the EBS volume
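
Decision c) is the one step that behaves differently on a fresh instance (where /etc/httpd is a real directory installed by yum) and on one where the link is already in place. Here is a small sketch of a re-runnable version of that step; the function name link_persistent_dir is made up for illustration, and the paths in the example are the ones used in this post.

```shell
# Replace a stock config directory with a symlink to its persistent copy.
# Safe to run more than once: an existing symlink is left alone.
# $1 = persistent directory on the EBS volume, $2 = path the system expects
link_persistent_dir() {
    persistent="$1"
    target="$2"
    if [ -L "$target" ]; then
        return 0                       # already linked on an earlier run
    fi
    if [ -e "$target" ]; then
        mv "$target" "$target.orig"    # keep the stock files for reference
    fi
    ln -s "$persistent" "$target"
}

# Example (run as root, after httpd has been installed):
#   link_persistent_dir /ebs1/httpd /etc/httpd
```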

So what do I have to do if I decide to terminate the current AMI instance, and launch a new one? Simple -- I first associate the Elastic IP and the EBS volume with the new instance via EF, then I ssh into the new AMI (which has the same external IP as the old one) and run this command line:

# mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1
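
Because this line gets typed on every new instance (and sometimes a second time by mistake), it can help to guard it so that it does nothing when the volume is already mounted. The following is only a sketch: the function names is_mounted and mount_ebs are assumptions, and the second argument of is_mounted exists purely so the check can be tried against a sample file instead of /proc/mounts.

```shell
# Return success if DIR is listed as a mount point in the mounts table.
# $1 = directory, $2 = mounts table (hypothetical override, defaults to /proc/mounts)
is_mounted() {
    dir="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}

# Create the mount point if needed and mount the volume unless it is
# already mounted there.
# $1 = device, $2 = mount point
mount_ebs() {
    device="$1"
    dir="$2"
    mkdir -p "$dir"
    is_mounted "$dir" || mount -t ext3 "$device" "$dir"
}

# Example (run as root on the instance):
#   mount_ebs /dev/sdh /ebs1
```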

Then I go to /ebs1/scripts and run this script:
# cat mysetup.sh
#!/bin/bash

# Install various packages via yum
yum -y install python-devel
yum -y install httpd httpd-devel

# Create symlinks
mv /etc/httpd /etc/httpd.orig
ln -s /ebs1/httpd /etc

# Download and install MoinMoin
cd /tmp
rm -rf moin*
wget http://static.moinmo.in/files/moin-1.7.1.tar.gz
tar xvfz moin-1.7.1.tar.gz
cd moin-1.7.1
python setup.py install

# Start apache
service httpd start

# Make sure /ebs1 is mounted across reboots
echo "/dev/sdh /ebs1 ext3 noatime 0 0" >> /etc/fstab

Even better, I can script all this on my local machine, so I don't even have to log in via ssh. This is the command I run on my local machine:
ssh -i ~/.ssh/gg-ec2-keypair.pem 75.101.140.75 'mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1; /ebs1/scripts/mysetup.sh'

That's it! At this point, I have the Piggies wiki running on a brand-new AMI.

Two caveats here:

1) the ssh fingerprint of the remote AMI that had been saved in .ssh/known_hosts on your local machine will no longer be valid, so you'll get a big security warning the first time you try ssh-ing into your new AMI. Just delete that line from known_hosts and ssh again.
2) it takes a while (for me it was up to 5 minutes) for the Elastic IP to be ready for you to ssh into after you associate it with a brand-new AMI; so in a disaster recovery situation, keep in mind that your site can potentially be down for 10-15 minutes, time in which you launch a new AMI, associate the Elastic IP and the EBS volume with it, and run your setup scripts.
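
Both caveats can be handled from the local machine. The sketch below assumes a generic helper, here called retry (a made-up name), that keeps running a command until it succeeds; the commented example uses it to wait for ssh to answer on the Elastic IP after removing the stale host key with ssh-keygen -R. The try count and delay are illustrative values chosen to cover roughly the 5 minutes mentioned above.

```shell
# Run a command until it succeeds, up to MAX attempts, sleeping DELAY
# seconds between attempts. Returns 1 if every attempt fails.
# usage: retry MAX DELAY command [args...]
retry() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (not run here): drop the old host key for the Elastic IP, then
# try ssh every 10 seconds, up to 30 times.
#   ssh-keygen -R A.B.C.D
#   retry 30 10 ssh -i ~/.ssh/gg-ec2-keypair.pem -o ConnectTimeout=5 root@A.B.C.D true
```

Note that the first successful connection will still ask you to accept the new instance's host key, since the old one was just removed.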

My experience so far with EC2 and EBS has been positive. As I already mentioned, the fact that it forces you to design your application to take advantage of the persistent EBS volume, and to script the installation of the pre-requisite packages, is a net positive in my opinion.

The next step for me will be to port other sites with a MySQL database backend. Fun fun fun! I will blog soon about my experiences. In the mean time, go ahead and browse the brand-new SoCal Piggies wiki :-)

7 comments:

Unknown said...

How does this work out cost-wise on a monthly basis? I started to look at this, but an AMI instance alone was $72 a month before bandwidth, storage, etc.

It does seem like a very cool idea to have things scripted out this way though. Thanks for the write up!

Grig Gheorghiu said...

Deuce -- you're right, running a bare-bones AMI runs you $72/mo. I just started to run my AMI a few days ago, but I'll post about a more exact cost at the end of this month.

Grig

Anonymous said...

Good post!

You could also bundle your own AMI (http://docs.amazonwebservices.com/AmazonEC2/dg/2006-06-26/bundling-an-ami.html), based off the one you started with, so you can save the hassle of reinstalling everything upon each reboot.

Anonymous said...

You forgot to include the step on formatting the device before mounting:

mkfs.ext3 /dev/sdh

Grig Gheorghiu said...

Adam -- thanks for your comment. I did say

"The next step is to ssh into your AMI and format the raw block storage into a file system"

...but it's true I didn't specify the exact command line.

You only need to do this once though. All other times you just need to mount the already formatted volume.

Grig

Arul said...

Hi Grig,
I have just started reading about Amazon EC2. One basic question I have: after selecting my AMI from the ElasticFox UI and creating an instance, which is now running, I want to install my application on the instance and get the endpoint for the application. How do I do that?
Thanks
Arul

Ken Thomas said...

As already noted, you can just save a personal AMI image of your current machine state, then boot from that. Or save as many images as you need to be sure you have the state you need.

You *REALLY SHOULD* update the article to reflect this.
