Thursday, October 16, 2008

The case of the missing profile photo

Earlier today I posted a blog entry, then went to view it on my blog, only to notice that my profile photo was conspicuously absent. I double-checked the URL for the source of the image -- it was http://agile.unisonis.com/gg.jpg. Then I remembered that I recently migrated agile.unisonis.com to my EC2 virtual machine. I quickly ssh-ed into my EC2 machine and saw that the persistent storage volume was not mounted. I ran uptime and noticed that it only showed 8 hours, so the machine had somehow been rebooted. In my experiments with setting up that machine, I had failed to add the line to /etc/fstab that mounts the persistent storage volume after a reboot. Easily rectified:

echo "/dev/sds /ebs1 ext3 defaults 0 0" >> /etc/fstab
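One caveat with the echo line above: running it twice leaves two identical entries in /etc/fstab. A minimal sketch of a guarded append follows, assuming a POSIX shell. The helper name add_fstab_entry and its optional second argument are hypothetical, added only so the function can be tried against a scratch file first.

```shell
#!/bin/sh
# Append an fstab entry only if its device is not already listed.
# add_fstab_entry is a hypothetical helper, not part of any standard tool.
# Usage: add_fstab_entry "ENTRY" [FSTAB_FILE]   (FSTAB_FILE defaults to /etc/fstab)
add_fstab_entry() {
    entry=$1
    fstab=${2:-/etc/fstab}
    device=${entry%% *}    # first space-separated field, e.g. /dev/sds

    # Look for the device at the start of a line, followed by whitespace.
    if grep -q "^${device}[[:space:]]" "$fstab" 2>/dev/null; then
        echo "entry for $device already present in $fstab"
    else
        echo "$entry" >> "$fstab"
        echo "added entry for $device to $fstab"
    fi
}
```

Run as root, add_fstab_entry "/dev/sds /ebs1 ext3 defaults 0 0" would update the real file, and a following 'mount -a' mounts everything listed in /etc/fstab, which doubles as a quick check that the new line parses.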

I connected to my EC2 environment with ElasticFox and saw that the EBS volume was still attached to my machine instance as /dev/sds, so I mounted it via 'mount /dev/sds /ebs1', then restarted httpd and mysqld, and all my sites were again up and running.

I tested my setup by rebooting. After the reboot, another surprise: httpd and mysqld were not chkconfig-ed on, so they didn't start automatically. I fixed that, rebooted again, and finally everything came back as expected.
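The exact fix is not spelled out above. On a Red Hat style system with SysV init scripts it would presumably look like the sketch below. The wrapper name enable_at_boot is hypothetical; the chkconfig invocations themselves are the standard ones.

```shell
#!/bin/sh
# Turn each named service on for its default runlevels, then print the
# resulting runlevel settings so the change can be eyeballed.
# enable_at_boot is a hypothetical wrapper; it needs to run as root.
enable_at_boot() {
    for svc in "$@"; do
        chkconfig "$svc" on || return 1
        chkconfig --list "$svc"
    done
}
```

Calling enable_at_boot httpd mysqld covers the two services that failed to come back here.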

A few lessons learned here in terms of hosting your web sites in 'the cloud':

1) you need to test your machine setup across reboots
2) you need automated tests for your machine setup -- things like 'is httpd chkconfig-ed on?'; 'is /dev/sds mounted as /ebs1 in /etc/fstab?'
3) you need to monitor your sites from a location outside the cloud which hosts your sites; I shouldn't have to eyeball a profile photo to realize that my EC2 instance is not functioning properly!
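
As a rough illustration of lessons 2 and 3, here is a minimal sketch of such checks in plain shell. All function names are hypothetical. It assumes awk and curl are installed, and that chkconfig, when called with only a service name, reports through its exit status whether the service is enabled in the current runlevel.

```shell
#!/bin/sh
# Hypothetical setup checks; each function returns 0 on success.

# Is DEVICE listed with MOUNTPOINT in the fstab file (default /etc/fstab)?
fstab_has_entry() {
    awk -v dev="$1" -v mnt="$2" '
        $1 == dev && $2 == mnt { found = 1 }
        END { exit !found }
    ' "${3:-/etc/fstab}"
}

# Is SERVICE configured to start in the current runlevel?
service_enabled() {
    chkconfig "$1" >/dev/null 2>&1
}

# Does URL answer with an HTTP success status within 10 seconds?
# Only meaningful when run from a machine outside EC2.
site_is_up() {
    curl --silent --fail --max-time 10 --output /dev/null "$1"
}

# Run one check command and print a PASS or FAIL line for it.
check() {
    description=$1
    shift
    if "$@"; then
        echo "PASS: $description"
    else
        echo "FAIL: $description"
        return 1
    fi
}

# Example calls (the first two on the instance, the last one from elsewhere):
#   check "fstab mounts /dev/sds on /ebs1" fstab_has_entry /dev/sds /ebs1
#   check "httpd starts at boot" service_enabled httpd
#   check "mysqld starts at boot" service_enabled mysqld
#   check "profile photo is reachable" site_is_up http://agile.unisonis.com/gg.jpg
```

The last check is the one that would have caught this outage without anyone looking at the blog, provided it runs on a schedule from a host that does not live on the same EC2 instance.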

I'll cover all these topics and more soon in some other posts, so stay tuned!

3 comments:

Jarrod said...

so, if you don't mind sharing, how much does EC2 cost you a month?

Kumar McMillan said...

Gomez is a pretty nice commercial service we use at work to test our sites. It's really just a fancy ping; they can ping your site from all over the world to tell you if your site is down in Brazil or wherever due to Internetz problems. Of course, the biggest win here is you are using something outside of your own network to test your network.

Grig Gheorghiu said...

Kumar -- thanks for the tip, I've seen Gomez in action in the past.

Jarrod -- I already posted on the EC2 cost topic, see http://agiletesting.blogspot.com/2008/10/update-on-ec2-and-ebs.html

Grig
