Tuesday, December 16, 2008

Some issues when restoring files using duplicity

I blogged a while back about how to do incremental encrypted backups to S3 using duplicity. I've been testing the restore procedure for some of my S3 backups, and I had a problem with the way duplicity deals with temporary directories and files it creates during the restore.

By default, duplicity will use the system default temporary directory, which on Unix is usually /tmp. If you have insufficient disk space in /tmp for the files you're trying to restore from S3, the restore operation will eventually fail with "IOError: [Errno 28] No space left on device".

One thing you can do is create another directory on a partition with lots of disk space, and point duplicity at it with the --tempdir command-line option. Something like: /usr/local/bin/duplicity --tempdir=/lotsofspace/temp

However, it turns out that this is not sufficient. There's still a call to os.tmpfile() buried in the patchdir.py module installed by duplicity. Consequently, duplicity will still try to create temporary files in /tmp, and the restore operation will still fail. As a workaround, I solved the issue in a brute-force kind of way by editing /usr/local/lib/python2.5/site-packages/duplicity/patchdir.py (the path is obviously dependent on your Python installation directory) and replacing the line:

tempfp = os.tmpfile()

with the line:

tempfp, filename = tempdir.default().mkstemp_file()

(I also needed to import tempdir at the top of patchdir.py; tempdir is a module which is part of duplicity and which deals with temporary file and directory management -- I guess the author of duplicity just forgot to replace the call to os.tmpfile() with the proper calls to the tempdir methods such as mkstemp_file).
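
For reference, here is roughly what the patched spots in patchdir.py end up looking like (a sketch only -- the exact import form may differ between duplicity versions):

# at the top of patchdir.py, next to the existing imports
import tempdir  # duplicity's own temporary file/directory management module

# ... and later, where os.tmpfile() used to be called:
tempfp, filename = tempdir.default().mkstemp_file()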

This solved the issue. I'll try to file a bug with the duplicity author.

Friday, December 12, 2008

Working with Amazon EC2 regions

Now that Amazon offers EC2 instances based in data centers in Europe, there is one more variable that you need to take into account when using the EC2 API: the concept of 'region'. Right now there are 2 regions to choose from: us-east-1 (based of course in the US on the East Coast), and the new region eu-west-1 based in Western Europe. Knowing Amazon, they will probably launch data centers in other regions across the globe -- Asia, South America, etc.

Each region has several availability zones. You can see the current ones in this nice article from the AWS Developer Zone. The default region is us-east-1, with 3 availability zones (us-east-1a, 1b and 1c). If you don't specify a region when you call an EC2 API tool, then the tool will query the default region. That's why I was baffled when I tried to launch a new AMI in Europe; I was calling 'ec2-describe-availability-zones' and it was returning only the US ones. After reading the article I mentioned, I realized I needed 2 versions of my scripts: the old ones deal with the default US-based region, and the new ones deal with the Europe region by adding '--region eu-west-1' to all EC2 API calls (you need the latest version of the EC2 API tools from here).

You can list the zones available in a given region by running:

# ec2-describe-availability-zones --region eu-west-1
AVAILABILITYZONE eu-west-1a available eu-west-1
AVAILABILITYZONE eu-west-1b available eu-west-1
Note that all AWS resources that you manage belong to a given region. So if you want to launch an AMI in Europe, you have to create a keypair in Europe, a security group in Europe, find available AMIs in Europe, and launch a given AMI in Europe. As I said, all this is accomplished by adding '--region eu-west-1' to all EC2 API calls in your scripts.
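
If you use the boto library instead of the Java-based command-line tools, the same idea applies: you connect to a specific regional endpoint. Here's a rough sketch (the RegionInfo class and the eu-west-1 endpoint shown are my assumptions about the boto version you have, so double-check against your installation):

from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

# us-east-1 is the default; pass a RegionInfo to talk to Europe instead
eu_west = RegionInfo(name='eu-west-1', endpoint='eu-west-1.ec2.amazonaws.com')
conn = EC2Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY, region=eu_west)

# roughly equivalent to: ec2-describe-availability-zones --region eu-west-1
for zone in conn.get_all_zones():
    print zone.name, zone.state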

Another thing to note is that the regions are separated in terms of internal DNS too. While you can reach instances in the same region via their internal DNS names, this doesn't work across regions. You need to use the external DNS name of an instance in Europe if you want to ssh into it from an instance in the US (and you also need to allow the external IP of the US instance to access port 22 in the security group for the European instance.)

All this introduces more headaches from a management/automation point of view, but the benefits obviously outweigh the cost. You get low latency for your European customers, and you get more disaster recovery options.

Thursday, December 11, 2008

Deploying EC2 instances from the command line

I've been doing a lot of work with EC2 instances lately, and I wrote some simple wrappers on top of the EC2 API tools provided by Amazon. These tools are Java-based, and I intend to rewrite my utility scripts in Python using the boto library, but for now I'm taking the easy way out by using what Amazon already provides.

After downloading and unpacking the EC2 API tools, you need to set the following environment variables in your .bash_profile file:
export EC2_HOME=/path/to/where/you/unpacked/the/tools/api
export EC2_PRIVATE_KEY=/path/to/pem/file/containing/your/ec2/private/key
export EC2_CERT=/path/to/pem/file/containing/your/ec2/cert
You also need to add $EC2_HOME/bin to your PATH, so the command-line tools can be found by your scripts.

At this point, you should be ready to run for example:
# ec2-describe-images -o amazon
which lists the AMIs available from Amazon.

If you manage more than a handful of EC2 instances, it quickly becomes hard to keep track of them. When you look at them, for example using the Firefox Elasticfox extension, it's very hard to tell which is which. One solution I found is to create a separate keypair for each instance, and to give the keypair a name that indicates the purpose of that instance (for example mysite-db01). This way, you can eyeball the list of instances in Elasticfox and make sense of them.

So the very first step for me in launching and deploying a new AMI is to create a new keypair, using the ec2-add-keypair API call. Here's what I have, in a script called create_keypair.sh:
# cat create_keypair.sh
#!/bin/bash

KEYNAME=$1

if [ -z "$KEYNAME" ]
then
echo "You must specify a key name"
exit 1

fi

ec2-add-keypair $KEYNAME.keypair > ~/.ssh/$KEYNAME.pem
chmod 600 ~/.ssh/$KEYNAME.pem

Now I have a pem file called $KEYNAME.pem containing my private key, and Amazon has my public key called $KEYNAME.keypair.

The next step for me is to launch an 'm1.small' instance (the smallest instance you can get from EC2) whose AMI ID I know in advance (it's a 32-bit Fedora Core 8 image from Amazon with an AMI ID of ami-5647a33f). I am also using the key I just created. My script calls the ec2-run-instances API.
# cat launch_ami_small.sh
#!/bin/bash

KEYNAME=$1

if [ -z "$KEYNAME" ]
then
echo "You must specify a key name"
exit 1

fi

# We launch a Fedora Core 8 32 bit AMI from Amazon
ec2-run-instances ami-5647a33f -k $KEYNAME.keypair --instance-type m1.small -z us-east-1a

Note that the script makes some assumptions -- such as the fact that I want my instance to reside in the us-east-1a availability zone. You can obviously add command-line parameters for the availability zone, and also for the instance type (which I intend to do when I rewrite this in Python -- see the sketch below).
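
As a preview of that Python rewrite, here's a rough boto-based sketch that takes the keypair name on the command line and exposes the zone and instance type as parameters; the AMI ID and keypair naming convention are the ones from the bash script above, and the access keys are placeholders:

#!/usr/bin/env python
import sys
from boto.ec2.connection import EC2Connection

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

def launch_ami(keypair_name, zone='us-east-1a', instance_type='m1.small',
               ami_id='ami-5647a33f'):
    """Launch one instance of the given AMI and return its instance ID."""
    conn = EC2Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY)
    reservation = conn.run_instances(ami_id, key_name=keypair_name,
                                     instance_type=instance_type,
                                     placement=zone)
    instance = reservation.instances[0]
    print 'Launched %s (%s) in %s' % (instance.id, instance_type, zone)
    return instance.id

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'You must specify a key name'
        sys.exit(1)
    launch_ami(sys.argv[1] + '.keypair')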

Next, I create an EBS volume which I will attach to the AMI I just launched. My create_volume.sh script takes an optional argument which specifies the size in GB of the volume (and otherwise sets it to 50 GB):
# cat create_volume.sh
#!/bin/bash

SIZE=$1
if [ -z "$SIZE" ]
then
SIZE=50

fi

ec2-create-volume -s $SIZE -z us-east-1a
The volume should be created in the same availability zone as the instance you intend to attach it to -- in my case, us-east-1a.

My next step is to attach the volume to the instance I just launched. For this, I need to specify the instance ID and the volume ID -- both values are returned in the output of the calls to ec2-run-instances and ec2-create-volume respectively.

Here is my script:

# cat attach_volume_to_ami.sh
#!/bin/bash

VOLUME_ID=$1
AMI_ID=$2

if [ -z "$VOLUME_ID" ] || [ -z "$AMI_ID" ]
then
echo "You must specify a volume ID followed by an AMI ID"
exit 1

fi

ec2-attach-volume $VOLUME_ID -i $AMI_ID -d /dev/sdh

This attaches the volume I just created to the AMI I launched and makes it available as /dev/sdh.
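
A boto-based equivalent of these last two steps (create the volume, then attach it) would look roughly like this -- just a sketch, with the same hard-coded zone and device name as in the bash scripts, and placeholder access keys:

from boto.ec2.connection import EC2Connection

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

def create_and_attach_volume(instance_id, size=50, zone='us-east-1a',
                             device='/dev/sdh'):
    """Create an EBS volume in the given zone and attach it to the instance."""
    conn = EC2Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY)
    volume = conn.create_volume(size, zone)
    print 'Created volume %s (%d GB) in %s' % (volume.id, size, zone)
    conn.attach_volume(volume.id, instance_id, device)
    print 'Attached %s to %s as %s' % (volume.id, instance_id, device)
    return volume.id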

The next script I use does a lot of stuff. It connects to the new AMI via ssh and performs a series of commands:
* format the EBS volume /dev/sdh as an ext3 file system
* mount /dev/sdh as /var2, and copy the contents of /var to /var2
* move /var to /var.orig, create new /var
* unmount /var2 and re-mount /dev/sdh as /var
* append an entry for mounting /dev/sdh as /var to /etc/fstab so that it happens automatically upon reboot

Before connecting via ssh to the new AMI, I need to know its internal DNS name or IP address. I use ec2-describe-instances to list all my running AMIs, then I copy and paste the internal DNS name of my newly launched instance (which I can isolate because I know the keypair name it runs with).
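
This lookup is another candidate for the Python rewrite; here's a rough boto sketch that prints the internal and external DNS names of all running instances launched with a given keypair (the keypair name and access keys are placeholders):

from boto.ec2.connection import EC2Connection

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

def show_instances_for_keypair(keypair_name):
    """Print the DNS names of running instances launched with the given keypair."""
    conn = EC2Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY)
    for reservation in conn.get_all_instances():
        for instance in reservation.instances:
            if instance.key_name == keypair_name and instance.state == 'running':
                print instance.id, instance.private_dns_name, instance.dns_name

show_instances_for_keypair('mysite-db01.keypair')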

Here is the script which formats and mounts the new EBS volume:

# cat format_mount_ebs_as_var_on_ami.sh
#!/bin/bash

AMI=$1
KEY=$2

if [ -z "$AMI" ] || [ -z "$KEY" ]
then
echo "You must specify an AMI DNS name or IP followed by a keypair name"
exit 1

fi

CMD='mkdir /var2; mkfs.ext3 /dev/sdh; mount -t ext3 /dev/sdh /var2; \
mv /var/* /var2/; mv /var /var.orig; mkdir /var; umount /var2; \
echo "/dev/sdh /var ext3 defaults 0 0" >>/etc/fstab; mount /var'

ssh -i ~/.ssh/$KEY.pem root@$AMI $CMD

The effect is that /var is now mapped to a persistent EBS volume. So if I install MySQL for example, the /var/lib/mysql directory (where the data resides by default in Fedora/CentOS) will be automatically persistent. All this is done without interactively logging in to the new instance, so it can be easily scripted as part of a larger deployment procedure.

That's about it for the bare-bones stuff you have to do. I purposely kept my scripts simple, since I use them more to remember what EC2 API tools I need to run than anything else. I don't do a lot of command-line option parsing or error checking, but the scripts do their job.

If you run scripts similar to what I have, you should have at this point a running AMI with a 50 GB EBS volume mounted as /var. Total running time of all these scripts -- 5 minutes at most.

As soon as I have a nicer Python script which will do all this and more, I'll post it here.

Thursday, December 04, 2008

New job at OpenX

I've been meaning to post this for a while, but haven't had the time, because...well, it's a new job, so I've been quite swamped. I started 2 weeks ago as a system engineer at OpenX, a company based in Pasadena, whose main product is an Open Source ad server. I am part of the 'black ops' team, and my main task for now is to help with deploying and scaling the OpenX Hosted service within Amazon EC2 -- which is just one of several cloud computing providers that OpenX uses (another one is AppNexus for example).

Lots of Python involved in this, lots of automation, lots of testing, so all this makes me really happy :-)

Here is some stuff I've been working on, which I intend to post on with more details as time permits:

* command-line provisioning of EC2 instances
* automating the deployment of the OpenX application and its pre-requisites
* load balancing in EC2 using HAProxy
* monitoring with Hyperic
* working with S3-backed file systems

I'll also start working soon with slack, a system developed at Google for automatic provisioning of files via the interesting concept of 'roles'. It's in the same family as cfengine or puppet, but simpler to use and with a powerful inheritance concept applied to roles.

All in all, it's been a fun and intense 2 weeks :-)

Sunday, November 30, 2008

The sad state of open source monitoring tools

I've been looking lately at open source network monitoring tools. I'm not impressed at all by what I've seen so far. Pretty much the least common denominator when it comes to this type of tool is Nagios, which is not a bad tool (I used it a few years ago), but did you see its Web interface? It's soooooo 1999 -- think 'Perl CGI scripts'!

A slew of other tools are based on the Nagios engine, and are trying hard to be more pleasing to the eye -- Opsview and GroundWork are some examples. Opsview seems to be just a wrapper around Nagios, without a lot of improvements in terms of either functionality or UI.

I looked at the GroundWork screencast and it seemed promising, but when I tried to install it I had a very unpleasant experience. First of all, the install script uses curses (did those guys hear about unattended installs?), and requires Java 1.5. Although I had both Java 1.5 and 1.6 on my CentOS server, and JAVA_HOME set correctly, it didn't stop the installer from complaining and exiting. Good riddance.

I should say that the first open source network monitoring tool that I tried was Zenoss, which is supposed to be the poster child for Python-based monitoring tools. Believe me, I tried hard to like it. I even went back and gave it a second chance, after noticing that other tools aren't any better. But to no avail -- I couldn't get past the sensation that it's a half-baked tool, with poor documentation and an obscure user interface. It could work fine if you just want to monitor some devices with SNMP, but as soon as you try to extend it with your own plugins (called Zen Packs), or if you try to use their agents (called Zen Plugins), you run into a wall. At least I did. I got tired of Python tracebacks, obscure references to 'restarting Zope' (I thought it was based on Twisted), fiddling with values for the so-called zProperties of a device, trying unsuccessfully to get ssh key authentication to work with the Zen Plugins, etc, etc. I'm not the only one who went through these frustrations either -- there are plenty of other users saying in the Zenoss forums that they've had it, and that they're going to look for something else. Which is what I did too.

I also tried OpenNMS, which was better than Zenoss, but it still had a CGI feel in terms of its Web interface.

So...for now I settled on Hyperic. It's a Java-based tool with a modern Web interface, very good documentation, and it's extensible via your own plugins (which you can write in any language you want, as long as you conform to some conventions which are not overly restrictive). Hyperic uses agents that you install on every server you need to monitor. I don't mind this; I find it better than configuring SNMP to death. It does have its quirks -- for example it calls devices that it monitors 'platforms' (instead of just 'devices' or 'servers'), and it calls the plugins that monitor specific services 'servers' (instead of services). Once you get used to it, it's not that bad. However, I wish there was a standard nomenclature for this stuff, as well as a standard way for these tools to inter-operate. As it is, you have to learn each tool and train your brain to ignore all the weirdness that it encounters. Not an optimal scenario by any means.

I'm very curious to see what tools other people use. If you care to leave a comment about your monitoring tool of choice, please do so!

I'll report back with more stuff about my experiences with Hyperic.

Friday, November 21, 2008

Issues with Ubuntu 8.10 on Lenovo T61p laptop

I got a new Lenovo ThinkPad T61p, and of course I promptly installed Ubuntu Ibex 8.10 on it. The first day I used it, I had no issues, but this morning it froze no less than 3 times, and each time the Caps Lock light flashed. I googled around, and I found what I hope is the solution in this post on the Ubuntu forums. It seems that this is the core issue:

System lock-ups with Intel 4965 wireless

The version of the iwlagn wireless driver for Intel 4965 wireless chipsets included in Linux kernel version 2.6.27 causes kernel panics when used with 802.11n or 802.11g networks. Users affected by this issue can install the linux-backports-modules-intrepid package, to install a newer version of this driver that corrects the bug. (Because the known fix requires a new version of the driver, it is not expected to be possible to include this fix in the main kernel package.)

As recommended, I did 'apt-get install linux-backports-modules-intrepid' and I rebooted. That was around 1 hour ago, and I haven't seen any issues since. Hopefully that was it. BTW, when the Caps Lock light blinks, it means 'kernel panic'. Who knew.

Thursday, November 13, 2008

Python and MS Azure

You've probably heard by now of Microsoft's entry in the cloud computing race, dubbed Azure. What I didn't know until I saw it this morning on InfoQ was that Microsoft encourages the use of languages and tools other than their official ones. Here's what they say on the 'What is the Azure Service Platform' page:

"Windows Azure is an open platform that will support both Microsoft and non-Microsoft languages and environments. Windows Azure welcomes third party tools and languages such as Eclipse, Ruby, PHP, and Python."

While you and I may think MS says this just for marketing/PR purposes, it turns out they are walking the walk a bit. I was glad to see in the InfoQ article that a Microsoft guy wrote a Python wrapper on top of the Azure Data Storage APIs. Note that this is classic CPython, not IronPython. I assume more interesting stuff can be done with IronPython.

Wednesday, November 12, 2008

"phrase from nearest book" meme

Via Elliot:

  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.
Here's mine, from 'Kim' by Kipling:

"A little later a marriage procession would strike into the Grand Trunk with music and shoutings, and a smell of marigold and jasmine stronger even than the reek of the dust."

Not bad, I like it :-)

Monday, October 27, 2008

This is depressing: Ken Thompson is also a googler

Just found out that Ken Thompson, one of the creators of Unix, works at Google. You can see his answers to various questions addressed to Google engineers by following the link with his name on this page.

So let's see, Google has hired:

* Ken Thompson == Unix
* Vint Cerf == TCP/IP
* Andrew Morton == #2 in Linux
* Guido van Rossum == Python
* Ben Collins-Sussman and Brian Fitzpatrick == subversion
* Bram Moolenaar == vim

...and I'm sure there are countless others that I missed.

If this isn't a march towards world domination, I don't know what is :-)

Thursday, October 16, 2008

The case of the missing profile photo

Earlier today I posted a blog entry, then I went to view it on my blog, only to notice that my profile photo was conspicuously absent. I double-checked the URL for the source of the image -- it was http://agile.unisonis.com/gg.jpg. Then I remembered that I recently migrated agile.unisonis.com to my EC2 virtual machine. I quickly ssh-ed into my EC2 machine and saw that the persistent storage volume was not mounted. I ran uptime and noticed that it only showed 8 hours, so the machine had somehow been rebooted. In my experiments with setting up that machine, I had failed to add a line to /etc/fstab that causes the persistent storage volume to be mounted after a reboot. Easily rectified:

echo "/dev/sds /ebs1 ext3 defaults 0 0" >> /etc/fstab

I connected to my EC2 environment with ElasticFox and saw that the EBS volume was still attached to my machine instance as /dev/sds, so I mounted it via 'mount /dev/sds /ebs1', then restarted httpd and mysqld, and all my sites were again up and running.

I tested my setup by rebooting. After the reboot, another surprise: httpd and mysqld were not chkconfig-ed on, so they didn't start automatically. I fixed that, I rebooted again, and finally everything came back as expected.

A few lessons learned here in terms of hosting your web sites in 'the cloud':

1) you need to test your machine setup across reboots
2) you need automated tests for your machine setup -- things like 'is httpd chkconfig-ed on?'; 'is /dev/sds mounted as /ebs1 in /etc/fstab?' (see the sketch after this list)
3) you need to monitor your sites from a location outside the cloud which hosts your sites; I shouldn't have to eyeball a profile photo to realize that my EC2 instance is not functioning properly!
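
Here's a minimal sketch of what such machine-setup checks could look like in Python; the service names, device and mount point are the ones from my setup above, so adapt them as needed:

#!/usr/bin/env python
import commands
import sys

def is_chkconfig_on(service):
    """Return True if the service is turned on for runlevel 3."""
    output = commands.getoutput('chkconfig --list %s' % service)
    return '3:on' in output

def is_in_fstab(device, mount_point):
    """Return True if /etc/fstab has an entry mounting device at mount_point."""
    for line in open('/etc/fstab'):
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device and fields[1] == mount_point:
            return True
    return False

failures = 0
for service in ['httpd', 'mysqld']:
    if not is_chkconfig_on(service):
        print 'FAIL: %s is not chkconfig-ed on' % service
        failures += 1
if not is_in_fstab('/dev/sds', '/ebs1'):
    print 'FAIL: /dev/sds is not set to mount as /ebs1 in /etc/fstab'
    failures += 1
sys.exit(1 if failures else 0)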

I'll cover all these topics and more soon in some other posts, so stay tuned!

Recommended book: "Scalable Internet Architectures"

One of my co-workers, Nathan, introduced me to this book -- "Scalable Internet Architectures" by Theo Schlossnagle. I read it in one sitting. Recommended reading for anybody who cares about scaling their web site in terms of both web/application servers and database servers. It's especially appropriate in our day and age, when cloud computing is all the rage (more on this topic in another series of posts). My preferred chapters were "Static Content Serving" (talks about wackamole and spread) and "Static Meets Dynamic" (talks about web proxy caches such as squid).

I wish the database chapter contained more in-depth architectural discussions; instead, the author spends a lot of time showing a Perl script that is supposed to illustrate some of the concepts in the chapter, but falls very short of that in my opinion.

Overall though, highly recommended.

Wednesday, October 08, 2008

Example Django app needed

Dear lazyweb, I need a good sample Django application (with a database backend) to run on Amazon EC2. If the application has Ajax elements, even better.

Comments with suggestions would be greatly appreciated!

Thursday, October 02, 2008

Update on EC2 and EBS

I promised I'd give an update on my "Experiences with Amazon EC2 and EBS" post from a month ago. Well, I just got an email from Amazon, telling me:

Greetings from Amazon Web Services,

This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:

Total: $73.74

So there you have it. That's how much it cost me to run the new SoCal Piggies wiki, as well as some other small sites, with very little traffic. Your mileage will definitely vary, especially if you run a high-traffic site.

I also said I'd give an update on running a MySQL database on EBS. It turns out it's really easy. On my Fedora Core 8 AMI, I did this:

* installed mysql packages via yum:

yum -y install mysql mysql-server mysql-devel

* moved the default data directory for mysql (/var/lib/mysql) to /ebs1/mysql (where /ebs1 is the mount point of my 10 GB EBS volume), then symlinked /ebs1/mysql back to /var/lib, so that everything continues to work as expected as far as MySQL is concerned:

service mysqld stop
mv /var/lib/mysql /ebs1/mysql
ln -s /ebs1/mysql /var/lib
service mysqld start

That's about it. I also used the handy snapshot functionality in the ElasticFox plugin and backed up the EBS volume to S3. In case you lose your existing EBS volume, you just create another volume from the snapshot, specify a size for it, and attach it to your instance. Then you mount it as usual.
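
For reference, creating a volume from a snapshot and attaching it can also be scripted with boto; here's a rough sketch, where the snapshot ID, size, zone, instance ID and access keys are all placeholders:

from boto.ec2.connection import EC2Connection

conn = EC2Connection('theaccesskeyid', 'thesecretaccesskey')

# create a 10 GB volume from an existing snapshot, in the same zone as the instance
volume = conn.create_volume(10, 'us-east-1a', snapshot='snap-12345678')
conn.attach_volume(volume.id, 'i-12345678', '/dev/sdh')
print 'Created %s from snapshot and attached it as /dev/sdh' % volume.id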

Update 10/03/08

In response to comments inquiring about a more precise breakdown of the monthly cost, here it is:

$0.10 per Small Instance (m1.small) instance-hour (or partial hour) x 721 hours = $72.10

$0.100 per GB Internet Data Transfer - all data transfer into Amazon EC2 x 0.607 GB = $0.06

$0.170 per GB Internet Data Transfer - first 10 TB / month data transfer out of Amazon EC2 x 2.719 GB = $0.46

$0.010 per GB Regional Data Transfer - in/out between Availability Zones or when using public IP or Elastic IP addresses x 0.002 GB = $0.01

$0.10 per GB-Month of EBS provisioned storage x 9.958 GB-Mo = $1.00

$0.10 per 1 million EBS I/O requests x 266,331 IOs = $0.03

$0.15 per GB-Month of EBS snapshot data stored x 0.104 GB-Mo = $0.02

$0.01 per 1,000 EBS PUT requests (when saving a snapshot) x 159 Requests = $0.01

EC2 TOTAL: $73.69
Other S3 costs (outside of EC2): $0.05

GRAND TOTAL: $73.74

Friday, September 19, 2008

Presubmit testing at Google

Here is an interesting blog post from Marc Kaplan, test engineering manager at Google, on their strategy of running what they call 'presubmit tests' -- tests that are run automatically before the code gets checked in. They include performance tests, and they compare the performance of the new code with baselines from the previous week, then report back nice graphs showing the delta. Very cool.

Monday, September 15, 2008

"Unmaintained Free Software" wiki

Thanks to Heikki Toivonen, who left a comment to my previous post and pointed me to this Unmaintained Free Software wiki. Python-related projects on that site are here. Hmmm...RPM is an unmaintained Python project? Don't think so. That site could use some love...maybe it is itself in need of a maintainer? This seems like a good Google App Engine project -- to put together a similar site with a database back-end, showing unmaintained Open Source projects....

Saturday, September 13, 2008

Know of any Open Source projects that need maintainers?

I got an email on a testing-related mailing list from somebody who would like to take over an Open Source project with no current maintainer. Here's a fragment of that email:

"Folks,
As I am interested in brushing up on my coding skills, so I would
appreciate your help in identifying an existing orphan/dormant
open-source tool/toolset project who needs an owner/maintainer.

I am especially interested in software process-oriented tools that
fill a hole in an agile development/test/management tool stack."

If anybody knows of such projects, especially with a testing or agile bent, please leave a comment here. Thanks!

Tuesday, September 02, 2008

Getting around the Firefox port-blocking annoyance

Firefox 3.x has introduced something I'm sure they call a 'feature', but which is a major annoyance for any sysadmin or developer -- it blocks HTTP access to a list of non-standard ports (the well-known ports of other services, such as 22 or 25). I thought IE was the only browser that was brain-dead that way, but Firefox has proved me wrong. Anyway, here's a simple recipe for getting around this:

1) go to about:config in the Firefox address bar
2) right click, choose new->string
3) enter the name network.security.ports.banned.override and the value 1-65535
4) there is no step 4

Monday, September 01, 2008

Experiences with Amazon EC2 and EBS

I decided to port some of the sites I've been running for the last few years on a dedicated server (running Red Hat 9) to an Amazon EC2 AMI (which stands for 'Amazon Machine Image'). I also wanted to use some more recent features offered by Amazon in conjunction with their EC2 platform -- such as the permanent block-based storage AKA the Elastic Block Store (EBS), and also the permanent external IP addresses AKA the Elastic IPs.

To get started, I used a great blog post on 'Persistent Django on Amazon EC2 and EBS' by Thomas Brox Røst. I will refer here to some of the steps that Thomas details in his post; if you want to follow along, you're advised to read his post.

1) Create an AWS account and sign up for the EC2 service.

2) Install the ElasticFox Firefox extension -- the greatest thing since sliced bread in terms of managing EC2 AMIs. To run the ElasticFox GUI, go to Tools->ElasticFox in Firefox; this will launch a new tabbed window showing the GUI. From now on, I will abbreviate ElasticFox as EF.

3) Add your AWS user name and access keys in EF (use the Credentials button).

4) Add an EC2 security group (click on the 'Security Groups' tab in EF); this can be thought of as a firewall rule that will replace the default one. In my case, I called my group 'gg' and I allowed ports 80 and 443 (http and https) and 22 (ssh).

5) Add a keypair to be used when you ssh into your AMI (click on the 'KeyPairs' tab in EF). I named mine gg-ec2-keypair and I saved the private key in my .ssh folder on my local machine (.ssh/gg-ec2-keypair.pem).

6) Get a fixed external IP (click on the 'Elastic IPs' tab in EF). You will be assigned an IP which is not yet associated with any AMI.

7) Get a block-based storage volume that you can format later into a file system (click on the 'Volumes and Snapshots' tab in EF). I got a 10 GB volume.

These 7 steps are the foundation for everything else you need to do when running an AMI. Choosing and launching the AMI itself is the next step, and it's one you can repeat any time you want to launch a new instance.

I followed Thomas's example and chose a 32-bit Fedora Core 8 image for my AMI. In EF, you can search for Fedora 8 images by going to the 'AMIs and Instances' tab and typing fedora-8 in the search box. Right click on the desired image (mine was called ec2-public-images/fedora-8-i386-base-v1.07.manifest.xml) and choose 'Launch instance(s) of this AMI'. You will need to choose a keypair (I chose the one I created earlier, gg-ec2-keypair), an availability zone (I chose the 'us-east-1a') and a security group (I removed the default one and added the one I created earlier).

You should immediately see the instance in a 'pending' state in the Instances list. After a couple of minutes, if you click Refresh you'll see it in the 'running' state, which means it's ready for you to access and work with.

Once my AMI was running, I right-clicked it and chose 'copy instance ID to clipboard'. The instance ID is needed to associate the EBS volume and the Elastic IP to this instance.

To associate the fixed external IP, I went to the 'Elastic IPs' tab in EF, right clicked on the Elastic IP I was assigned and chose 'Associate this address', then I indicated the instance ID of my running AMI. As a side note, if you don't see anything in a given EF list (such as Elastic IPs or Volumes), click Refresh and you should see it.

To associate the EBS volume, I went to the 'Volumes and Snapshots' tab in EF, right clicked on the volume I had created, then chose 'Attach this volume'. In the next dialog box, I specified the instance ID of my AMI, then /dev/sdh as the volume name.

The next step is to ssh into your AMI and format the raw block storage into a file system. You can use the Elastic IP you were assigned (let's call it A.B.C.D), and run:

$ ssh -i .ssh/your-private-key.pem root@A.B.C.D

At this point, you should be logged into your AMI. To format the EBS volume as ext3 and mount it, run:

# mkfs.ext3 /dev/sdh; mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1

If you want the mount point to persist across reboots, also add this line to /etc/fstab:

# echo "/dev/sdh /ebs1 ext3 noatime 0 0" >> /etc/fstab

At this point, you have a bare-bones Fedora Core 8 instance accessible via HTTP, HTTPS and SSH at the IP address A.B.C.D. Not very useful in and of itself, unless you install your application.

In my case, the first Web site I wanted to port over was the SoCal Piggies wiki, at www.socal-piggies.org. I used to run it on MoinMoin 1.3.1 on my old server, but for this brand-new AMI experiment I installed MoinMoin 1.7.1. I also had to install httpd and python-devel via yum. And since we're talking about package installs, here's the main point you should take away from this post: you need to install all required packages every time you re-launch your AMI. I'm not talking about rebooting your AMI, which preserves your file systems; I'm talking about terminating your AMI for any reason, then re-launching a new AMI instance. This operation will start your AMI with a clean slate in terms of packages that are installed. You can obviously re-mount the EBS volume that you created, and all your files will still be there, but those are typically application or database files, and not the actual required packages themselves (such as httpd or python-devel).

So, very important point: as soon as you start porting applications over to your AMI, you'd better start designing the layout of your apps so that they take full advantage of the EBS volume(s) you created. You'll also have to script the installation of the required packages, so you can easily run the script every time you launch a new instance of your AMI. This can be seen as a curse, but to me it's a blessing in disguise, because it forces you to automate the installation of your applications. Automation entails faster deployment, fewer errors, better testability. In short, you win in the long run.

For the first application I ported, the SoCal Piggies wiki, I made the following design decisions:

a) I chose to install MoinMoin 1.7.1 from scratch every time I launch a new AMI instance; I also install httpd, httpd-devel and python-devel from scratch every time
b) I chose to point the specific instance of the Piggies wiki to /ebs1/wikis/socal-piggies, so all the actual content of the wiki is kept persistently in the EBS volume
c) I moved /etc/httpd to /ebs1/httpd, then I created a symlink from /ebs1/httpd to /etc, so all the Apache configuration files are kept persistently in the EBS volume
d) I pointed the DocumentRoot of the Apache virtual host for the Piggies wiki to /ebs1/www/socal-piggies, so that all the static files that need to be accessed via the www.socal-piggies.org domain are kept persistently in the EBS volume

So what do I have to do if I decide to terminate the current AMI instance, and launch a new one? Simple -- I first associate the Elastic IP and the EBS volume with the new instance via EF, then I ssh into the new AMI (which has the same external IP as the old one) and run this command line:

# mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1

Then I go to /ebs1/scripts and run this script:
# cat mysetup.sh
#!/bin/bash

# Install various packages via yum
yum -y install python-devel
yum -y install httpd httpd-devel

# Create symlinks
mv /etc/httpd /etc/httpd.orig
ln -s /ebs1/httpd /etc

# Download and install MoinMoin
cd /tmp
rm -rf moin*
wget http://static.moinmo.in/files/moin-1.7.1.tar.gz
tar xvfz moin-1.7.1.tar.gz
cd moin-1.7.1
python setup.py install

# Start apache
service httpd start

# Make sure /ebs1 is mounted across reboots
echo "/dev/sdh /ebs1 ext3 noatime 0 0" >> /etc/fstab

Even better, I can script all this on my local machine, so I don't even have to log in via ssh. This is the command I run on my local machine:
ssh -i ~/.ssh/gg-ec2-keypair.pem root@75.101.140.75 'mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1; /ebs1/scripts/mysetup.sh'

That's it! At this point, I have the Piggies wiki running on a brand-new AMI.

Two caveats here:

1) the ssh fingerprint of the remote AMI that had been saved in .ssh/known_hosts on your local machine will no longer be valid, so you'll get a big security warning the first time you try ssh-ing into your new AMI. Just delete that line from known_hosts and ssh again.
2) it takes a while (for me it was up to 5 minutes) for the Elastic IP to be ready for you to ssh into after you associate it with a brand-new AMI; so in a disaster recovery situation, keep in mind that your site can potentially be down for 10-15 minutes, time in which you launch a new AMI, associate the Elastic IP and the EBS volume with it, and run your setup scripts (see the sketch after this list for a simple way to wait until the new instance is reachable).
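
Here's a bare-bones sketch of the kind of wait loop I have in mind for that last step -- it just polls port 22 on the Elastic IP until ssh is reachable (the IP address, timeout and polling interval are placeholders):

import socket
import time

def wait_for_ssh(host, timeout=900):
    """Poll port 22 on host until it accepts connections, or give up after timeout seconds."""
    start = time.time()
    while time.time() - start < timeout:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(5)
        try:
            try:
                s.connect((host, 22))
                print '%s is reachable on port 22' % host
                return True
            except (socket.error, socket.timeout):
                time.sleep(15)
        finally:
            s.close()
    print 'Gave up waiting for %s' % host
    return False

wait_for_ssh('75.101.140.75')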

My experience so far with EC2 and EBS has been positive. As I already mentioned, the fact that it forces you to design your application to take advantage of the persistent EBS volume, and to script the installation of the pre-requisite packages, is a net positive in my opinion.

The next step for me will be to port other sites with a MySQL database backend. Fun fun fun! I will blog soon about my experiences. In the meantime, go ahead and browse the brand-new SoCal Piggies wiki :-)

Thursday, August 28, 2008

Back up your Windows desktop to S3 with SecoBackup

I found SecoBackup to be a good tool for backing up a Windows machine to S3. They have a 'free community edition' version, but you will still pay more in terms of your S3 costs than what you would normally pay to Amazon. You basically sign up for the SecoBackup service 'powered by AWS' and you pay $0.20 per GB of storage and $0.20 per GB of bandwidth -- so double what you'd pay if you stored it directly on S3. You don't even need to have an Amazon S3 account, they take care of it transparently for you.

I think this is a good tool for backing up certain files on Windows-based desktops. For example I back up my Quicken files from within a Windows XP virtual image that I run inside VMWare workstation on top of my regular Ubuntu Hardy desktop.

Tuesday, August 26, 2008

Ruby refugees flocking to Python?

I just wanted to put it out there that I know at least one person who was very fired up about Ruby, only to find out that all the available Ruby jobs are for Ruby-on-Rails programmers. He doesn't like Web programming, so what was he to do? You guessed it -- he started to learn Python :-)

RTFL

No, this is not a misspelling for ROTFL, but rather a variant of RTFM. It stands for Read The F...riendly Log. It's a troubleshooting technique that is very basic, yet surprisingly overlooked. I use it all the time, and I just want to draw attention to it in case you find yourself stumped by a problem that seems mysterious.

Here are some recent examples from my work.

Apache wouldn't start properly

A 'ps -def | grep http' would show only the main httpd process, with no worker processes. The Apache error log showed these lines:

Digest: generating secret for digest authentication

A google search for this line revealed this article:

http://www.raptorized.com/2006/08/11/apache-hangs-on-digest-secret-generation/

It turns out the randomness/entropy on that box had been exhausted. I grabbed the rng-tools tar.gz from sourceforge, compiled and installed it, then ran

rngd -r /dev/urandom

...and apache started its worker processes instantly.

Cannot create InnoDB tables in MySQL

Here, all it took was to read the MySQL error log in /var/lib/mysql. It's very friendly indeed, and tells you exactly what to do!

InnoDB: Error: data file ./ibdata1 is of a different size
InnoDB: 2176 pages (rounded down to MB)
InnoDB: than specified in the .cnf file 128000 pages!
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!

Windows-based Web sites are displaying errors

Many times I've seen Windows/IIS based Web sites displaying cryptic errors such as:

Server Error in '/' Application.
Runtime Error

The IIS logs are much less friendly in terms of useful information than the Apache logs. However, the Event Viewer is a good source of information. In a recent case, inspecting the Event Viewer told us that the account used to connect from the Web server to the DB server had expired, so re-enabling it was all it took to fix the issue.

In conclusion -- RTFL and google it! You'll be surprised what a large percentage of issues you can solve this way.

Wednesday, July 16, 2008

This just in: Google releases Mox

...which is a YAPMOF (yet another Python mock object framework). Find it here, and read more about it here. I'll definitely check it out soon.

Monday, July 14, 2008

Zach and sugarbot going strong in Google SoC

Zach Riggle has made very strong progress with his Google SoC project, sugarbot. The goal of the project is to create a tool that runs and tests OLPC Sugar activities automatically. Zach needed a way to hook into the PyGTK code that Sugar is based on. After looking at various tools, he settled on kiwi. He managed to have sugarbot run as an activity inside Sugar, then launch any other activities that need to be tested. The test scripts are kept by an XML-RPC server that Zach wrote, and sugarbot-based clients get them from the server and run them. Just these last couple of days, Zach also managed to get the sugarbot activity to launch automatically when the Sugar environment starts up.

You can see a screencast that Zach put together, as well as a list of his accomplishments so far, in this blog post. In the screencast, Zach shows how he automates the launching and testing of two Sugar activities, the Calculator and the Terminal. Very cool stuff.

It's been a pleasure mentoring Zach on his SoC project. He has already proven himself to possess strong software engineering skills, not only in programming, but also in designing complex pieces of software. I only had to provide minimal guidance to Zach, and he has been very receptive with all the advice I have given him. I liked the fact that he implemented an automated test suite for sugarbot, and he included it in a buildbot continuous integration process, only days after I suggested that to him. It has also been very satisfying to me as a mentor to see his progress as exemplified by his almost-daily blog posts. I believe he is the most active blogger on Planet SoC. Good job, Zach!

Wednesday, June 18, 2008

Celtics use Ubuntu to beat Lakers

Excerpt from the Associated Press article about the Celtics-Lakers game last night:

"It was a group effort by this gang in green, which bonded behind Rivers, who borrowed an African word ubuntu (pronounced Ooh-BOON-too) and roughly means "I am, because we are" in English, as the Celtics' unifying team motto.

The Celtics gave the Lakers a 12-minute crash course of ubuntu in the second quarter.

Boston outscored Los Angeles 34-19, getting 11 field goals on 11 assists. The Celtics toyed with the Lakers, outworking the Western Conference's best inside and out and showing the same kind of heart that made Boston the center of pro basketball's universe in the '60s. "

It's not what you thought, but it's still nice to see that the ubuntu concept is used successfully in sports too. I wonder what parallel we can make between the Lakers' game last night and an operating system. The Windows Blue Screen of Death comes to mind.

Tuesday, June 17, 2008

Security testing for agile testers

I've been asked by Lisa Crispin to contribute a few paragraphs on security testing to an upcoming book on agile testing that she and Janet Gregory are co-authoring. Here's what I came up with:

Security testing is a broad topic that cannot possibly be covered in a few paragraphs. Whole books have been devoted to this subject. Here we will try to at least provide some guidelines and pointers to books and tools that might prove useful to agile teams interested in security testing.

Just like functional testing, security testing can be viewed and conducted from two perspectives: from the inside out (white-box testing) and from the outside in (black-box testing).

Inside-out security testing assumes that the source code for the application under test is available to the testers. The code can be analyzed statically with a variety of tools that try to discover common coding errors which can make the application vulnerable to attacks such as buffer overflows or format string attacks. (Resources:
http://en.wikipedia.org/wiki/Buffer_overflow and
http://en.wikipedia.org/wiki/Format_string_vulnerabilities)

A list of tools that can be used for static code analysis can be found here:
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis

The fact that the testers have access to the source code of the application also means that they can map what some books call "the attack surface" of the application, which is the list of all the inputs and resources used by the program under test. Armed with a knowledge of the attack surface, testers can then apply a variety of techniques that attempt to break the security of the application. A very effective class of such techniques is called fuzzing and is based on fault injection. Using this technique, the testers try to make the application fail by feeding it various types of inputs (hence the term fault injection). These inputs range from carefully crafted strings used in SQL Injection attacks, to random byte changes in given input files, to random strings fed as command line arguments. (Resources:
http://www.fuzzing.org/category/fuzzing-book/ and
http://www.fuzzing.org/fuzzing-software)

The outside-in approach is the one mostly used by attackers that try to penetrate into the servers or the network hosting your application. As a security tester, you need to have the same mindset that attackers do, which means that you have to use your creativity in discovering and exploiting vulnerabilities in your own application. You also need to stay up to date with the latest security news and updates related to the platform/operating system your application runs on. These tasks are by no means easy, they require extensive knowledge, and as such are mostly outsourced to third parties that specialize in security testing.

So what are agile testers to do when faced with the apparently insurmountable task of testing the security of their application? Here are some practical, pragmatic steps that anybody can follow:

1. Adopt a continuous integration (CI) process that periodically runs a suite of automated tests against your application.

2. Learn how to use one or more open source static code analysis tools. Add a step to your CI process which consists of running these tools against your application code. Mark the step as failed if the tools find any critical vulnerabilities.

3. Install an automated security vulnerability scanner such as Nessus
(http://www.nessus.org/nessus/). Nessus can be run in a command-line, non-GUI mode, which makes it suitable for inclusion in a CI tool. Add a step to your CI process which consists of running Nessus against your application. Capture the Nessus output in a file and parse that file for any high-importance security holes found by the scanner. Mark the step as FAIL when any such holes are found (see the sketch after this list for one way to do the parsing).

4. Learn how to use one or more open source fuzzing tools. Add a step to your CI process which consists of running these tools against your application code. Mark the step as failed if the tools find any critical vulnerabilities.
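
As an illustration of the parsing step in point 3, here's a minimal sketch that assumes the Nessus results were saved in the pipe-delimited .nbe format, where high-severity findings are flagged as 'Security Hole'; the results file name is a placeholder:

#!/usr/bin/env python
import sys

def count_security_holes(nbe_file):
    """Count high-severity findings ('Security Hole') in a Nessus .nbe results file."""
    holes = 0
    for line in open(nbe_file):
        fields = line.strip().split('|')
        if fields and fields[0] == 'results' and 'Security Hole' in line:
            holes += 1
    return holes

holes = count_security_holes('nessus_scan_results.nbe')
if holes:
    print 'FAIL: Nessus found %d high-severity security holes' % holes
    sys.exit(1)
print 'PASS: no high-severity security holes found'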

As with any automated testing effort, running these tools is no guarantee that your code and your application will be free of security defects. However, running these tools will go a long way towards improving the quality of your application in terms of security. As always, the 80/20 rule applies. These tools will probably find the 80% most common security bugs out there while requiring 20% of your security budget.

To find the remaining 20% security defects, you're well advised to spend the other 80% of your security budget on high quality security experts. They will be able to test your application security thoroughly by the use of techniques such as SQL injection, code injection, remote code inclusion and cross-site scripting. While there are some tools that try to automate some of these techniques, they are no match for a trained professional who takes the time to understand the inner workings of your application in order to craft the perfect attack against it.

Tools for troubleshooting Web app performance

I came across this blog post which talks about 15 tools that can make your life easier when you need to troubleshoot the performance of your Web application. I knew about most of them, but a new addition to my arsenal is definitely wbox -- think of it as an HTTP-based ping. Very simple, but extremely useful.

Thursday, June 12, 2008

What does your Wordle look like?

The meme du jour seems to be Wordle tag clouds. I couldn't resist generating one out of the text on the first page of my blog. Here it is, in all its splendor:



You would think I'm very self-centered, since my first and last names appear so prominently. But I think it's because every blog post ends with "posted by Grig Gheorghiu at ". The next biggest word is Python, so there's some redemption for me right there :-)

Friday, May 23, 2008

Incremental backups to Amazon S3

Based on this great blog post by Tim McCormack, I managed to write some scripts that back up files to Amazon S3. The files are encrypted with GnuPG and rsync-ed to S3 using a Python-based tool called duplicity.

Here's what I did in order to get all this going on a CentOS 5.1 server running Python 2.5.

1) Signed up for Amazon S3 and got the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY.

2) Downloaded and installed the following packages: boto, GnuPGInterface, librsync, duplicity. All of them except librsync are Python-based, so they can be installed via 'python setup.py install'. For librsync you need to use './configure; make; make install'.

3) Generated a GPG key pair using "gpg --gen-key". Made a note of the hex fingerprint of the key (you can list the fingerprints of your keys via "gpg --fingerprint").

4) Wrote a simple boto-based Python script to create and list S3 buckets (the equivalent of directories in S3 parlance). Note that boto uses SSL, so your Python installation needs to have SSL enabled.

Here's how the script looks:

#!/usr/bin/env python

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

from boto.s3.connection import S3Connection

conn = S3Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY)

buckets = [
    'mybuckets_myserver_mysqldump',
    'mybuckets_myserver_full',
]

for bucket in buckets:
    conn.create_bucket(bucket)

rs = conn.get_all_buckets()

print 'Bucket listing:'
for b in rs:
    print b.name

5) Wrote a bash script (heavily influenced by Tim McCormack's post) that runs duplicity and backs up the root partition of my Linux server (minus some directories) to S3. The nice thing about duplicity is that it uses rsync, so it only transfers the diffs over the wire. Here's what my script looks like:

export myEncryptionKeyFingerprint=somehexnumber
export mySigningKeyFingerprint=somehexnumber
export AWS_ACCESS_KEY_ID=accesskeyid
export AWS_SECRET_ACCESS_KEY=secretaccesskey
export PASSPHRASE=mypassphrase

/usr/local/bin/duplicity --encrypt-key=$myEncryptionKeyFingerprint \
  --sign-key=$mySigningKeyFingerprint --exclude=/sys --exclude=/dev \
  --exclude=/proc --exclude=/tmp --exclude=/mnt --exclude=/media \
  / s3+http://mybuckets_myserver_full

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
NOTE: duplicity will interactively prompt you for your GPG key's passphrase, unless you have a variable called PASSPHRASE that contains the passphrase. Since I wanted to run this script as a cron job, I chose the less secure way of specifying the passphrase in clear inside the script. YMMV.

That's about it. Running the script produces an output such as this:

--------------[ Backup Statistics ]--------------
StartTime 1211482825.55 (Thu May 22 12:00:25 2008)
EndTime 1211488426.17 (Thu May 22 13:33:46 2008)
ElapsedTime 5600.62 (1 hour 33 minutes 20.62 seconds)
SourceFiles 174531
SourceFileSize 5080402735 (4.73 GB)
NewFiles 174531
NewFileSize 5080402735 (4.73 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 174531
RawDeltaSize 1200920038 (1.12 GB)
TotalDestinationSizeChange 2702953170 (2.52 GB)
Errors 0
-------------------------------------------------
The first time you run the script it will take a while, but subsequent runs will only back up the files that were changed since the last run. For example, my second run transferred only 19.3 MB:

--------------[ Backup Statistics ]--------------
StartTime 1211529638.99 (Fri May 23 01:00:38 2008)
EndTime 1211529784.18 (Fri May 23 01:03:04 2008)
ElapsedTime 145.19 (2 minutes 25.19 seconds)
SourceFiles 174522
SourceFileSize 5084478500 (4.74 GB)
NewFiles 64
NewFileSize 2280357 (2.17 MB)
DeletedFiles 28
ChangedFiles 418
ChangedFileSize 217974696 (208 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 510
RawDeltaSize 2465010 (2.35 MB)
TotalDestinationSizeChange 20211663 (19.3 MB)
Errors 0

-------------------------------------------------
To restore files from S3, you use duplicity and specify the source as s3+http://mybuckets_myserver_full and the destination as a local directory.
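
Since I run these things unattended anyway, here's a rough sketch of what a restore wrapper could look like, in the same spirit as the backup script above; the bucket name, keys and passphrase are the placeholders used earlier, and the restore target directory is made up:

#!/usr/bin/env python
import os
import subprocess

# same credentials and passphrase as in the backup script
os.environ['AWS_ACCESS_KEY_ID'] = 'accesskeyid'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'secretaccesskey'
os.environ['PASSPHRASE'] = 'mypassphrase'

# restore: the source is the S3 bucket URL, the destination is a local directory
ret = subprocess.call(['/usr/local/bin/duplicity',
                       's3+http://mybuckets_myserver_full',
                       '/lotsofspace/restored'])
print 'duplicity exited with status %d' % ret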

Thanks to Tim McCormack for his detailed blog post; it made things so much easier than digging up all this info via Google-fu.

Monday, May 19, 2008

Compiling Python 2.5 with SSL support

If you compile Python 2.5.x from source, you need to jump through some hoops so that SSL support is enabled. Googling around, I found Patrick Altman's excellent blog post talking about this very issue.

In my case, I needed to enable SSL support for Python 2.5.2 on CentOS 5.1. I already had the openssl development libraries installed:

# yum list installed | grep ssl
mod_ssl.i386 1:2.2.3-11.el5_1.cento installed
openssl.i686 0.9.8b-8.3.el5_0.2 installed
openssl-devel.i386 0.9.8b-8.3.el5_0.2 installed

Here's what I did next, following Patrick's post:

1) edited Modules/Setup.dist from the Python 2.5.2 source distribution and made sure the correct lines were put back in (they were commented out by default):

_socket socketmodule.c

# Socket module helper for SSL support; you must comment out the other
# socket line above, and possibly edit the SSL variable:
#SSL=/usr/local/ssl
_ssl _ssl.c \
-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
-L$(SSL)/lib -lssl -lcrypto

2) ran ./configure; make; make install

3) verified that I can access socket.ssl:

# python2.5
Python 2.5.2 (r252:60911, May 19 2008, 14:23:27)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.ssl
<function ssl at 0xb7ef410c>

That's it. Not sure why it's so non-intuitive though.

Thursday, May 15, 2008

Encrypting a Linux root partition with LUKS and DM-CRYPT

One of our customers needed to have his Linux laptop's root partition encrypted. We found a HOWTO on achieving this with RHEL5, and we adapted it for CentOS 5. The technique is based on LUKS and DM-CRYPT. Kudos to my colleague Chris Evans for going through the exercise of getting this to work on CentOS 5 and for producing the documentation that follows, which I'm posting here hoping that it will benefit somebody at some point.

* Boot off of a Live CD, I used Fedora Core 9 Preview
* Find out which disk is which; for me /dev/sda was the external usb, and /dev/sdb was the internal
sfdisk -d /dev/sdb | sfdisk /dev/sda
pvcreate --verbose /dev/sda2
vgextend --verbose VolGroup00 /dev/sda2
pvmove --verbose /dev/sdb2 /dev/sda2 # This takes ages
vgreduce --verbose VolGroup00 /dev/sdb2
pvremove --verbose /dev/sdb2
fdisk /dev/sdb
* Change the partition type to 83 for /dev/sdb2
* Here is when you get to choose the password that will protect your partition:
cryptsetup --verify-passphrase --key-size 256 luksFormat /dev/sdb2

cryptsetup luksOpen /dev/sdb2 cryptroot
pvcreate --verbose /dev/mapper/cryptroot
vgextend --verbose VolGroup00 /dev/mapper/cryptroot
pvmove --verbose /dev/sda2 /dev/mapper/cryptroot # This takes ages
vgreduce --verbose VolGroup00 /dev/sda2
pvremove --verbose /dev/sda2
mkdir /mnt/tmp
mount /dev/VolGroup00/LogVol00 /mnt/tmp
cp -ax /dev/* /mnt/tmp/dev # I said no to overwriting any files
chroot /mnt/tmp/
(chroot) # mount -t proc proc /proc
(chroot) # mount -t sysfs sysfs /sys
(chroot) # mount /boot
(chroot) # swapon -a
(chroot) # vgcfgbackup

For the initrd, the blog mentions /etc/sysconfig/mkinitrd as a file. CentOS has a directory there instead; I tried adding their suggested content as a file inside that directory, and I also tried moving the directory out of the way and creating the file as they suggested. Both failed. So I ran the following command:

(chroot) # mkinitrd -v /boot/initrd-2.6.18-53.el5.crypt.img --with=aes --with=sha256 --with=dm-crypt 2.6.18-53.el5

Now we need to modify the initrd so that it will decrypt the partition at boot time

(chroot) # cd /boot
(chroot) # mkdir /boot/initrd-2.6.18-53.el5.crypt.dir
(chroot) # cd /boot/initrd-2.6.18-53.el5.crypt.dir
(chroot) # gunzip < ../initrd-2.6.18-53.el5.crypt.img | cpio -ivd

Now, we need to modify init by adding the following lines after the line which reads “mkblkdevs” and before “echo Scanning and configuring dmraid supported devices.”:

echo Decrypting root device
cryptsetup luksOpen /dev/sda2 cryptroot
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
echo Activating logical volumes
lvm vgchange -ay --ignorelockingfailure VolGroup00

Copy cryptsetup and lvm into the initrd so they are available at boot time; the blog doesn't mention this step, but I'm sure it's needed:

cp /sbin/cryptsetup bin/
cp /sbin/lvm bin/

Compress the new initrd

find ./ | cpio -H newc -o | gzip -9 > /boot/initrd-2.6.18-53.el5.crypt.img

Modify the grub.conf. Copy the grub entry for the current kernel, and change as follows

title Centos Encrypted Server (2.6.18-53.1.4.el5)
initrd /initrd-2.6.18-53.el5.crypt.img

Unmount the fs's in the chroot, and exit

cd /
umount /boot
umount /proc
umount /sys
exit

NOTE: Don't upgrade the kernel without upgrading the initrd and grub.conf.

Reboot and test :)

At this point you have an encrypted root partition. You should be prompted for a password during the boot process (the boot partition is not encrypted). If somebody steals your laptop, they won't be able to mount the root partition without knowing the password.

After you have the encrypted setup in place, you can display information about it (such as the cipher used) with this command:

# cryptsetup luksDump /dev/sda2
LUKS header information for /dev/sda2

Version: 1
Cipher name: aes
Cipher mode: cbc-essiv:sha256
Hash spec: sha1
Payload offset: 2056
MK bits: 256
MK digest: af 2e e6 39 3e 79 60 bb 4a 2b 33 05 1c 86 3a 83 bc a0 ef c1
MK salt: 79 b2 13 53 6f 52 72 a1 b5 3d dc d3 72 cd d6 f4
e3 25 3c 6e 08 00 f3 1d 44 1e 90 47 bc 43 e7 07
MK iterations: 10
UUID: 721abe52-5122-447b-8ed0-5ca3b2b32366

Key Slot 0: ENABLED
Iterations: 247223
Salt: 86 c7 53 6a 13 a9 77 81 89 ec 90 b3 e5 6a ea 8d
da 0c 6f ad ec 3e 3c 47 2d 6e 5f 59 28 4e 7c 63
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED

Thursday, May 08, 2008

Notes from the latest SoCal Piggies meeting

...have been posted to the "Happenings in Python User groups" blog.

Update: Ben Bangert sent me the slides he used. You can download or view the PDF from here.

Monday, May 05, 2008

Guido open sources Code Review app running on GAPE

Not sure why this wasn't publicized more, but Guido van Rossum announced today that he open sourced the code for Code Review, a Google AppEngine app he released last week. Code Review is based on Mondrian, the internal code review tool that Guido wrote for Google. The relationship between the two apps in terms of features is: Code Review < Mondrian.

The code for Code Review is part of a Google code project called Rietveld. I haven't looked at it yet, but I'll certainly do so soon, just to see the master's view on how to write a GAPE application.

Ruby to Python bytecode compiler

Kumar beat me to it, but I'll mention it here too: Why the Lucky Stiff published a Ruby-to-Python-bytecode compiler, as well as tools to decompile the byte code into source code. According to the README file, he based his work on blog posts by Ned Batchelder related to dissecting Python bytecode. I wholeheartedly agree with Why's comment at the end of the README file:

  You know, it's crazy that Python
and Ruby fans find themselves
battling so much. While syntax
is different, this exercise
proves how close they are to
each other! And, yes, I like
Ruby's syntax and can think much
better in it, but it would be
nice to share libs with Python
folk and not have to wait forever
for a mythical VM that runs all
possible languages.

Tuesday, April 29, 2008

Special guest for next SoCal Piggies meeting

We'll have the SoCal Piggies meeting this Thursday May 1st at the Gorilla Nation office in Culver City. Our special guest will be Ben Bangert, the creator of Pylons, who will give us an introduction to his framework. We'll also have a presentation from Pablo Noego from Gorilla Nation on a chat application he wrote using Google App Engine. We'll probably also have an informal discussion on Python mock testing tools and techniques.

BTW, I am putting together a Google code project for mock testing techniques in Python, in preparation for a presentation I would like to give to the group at some point. I called the project moctep, in honor of that ancient Egyptian deity, the protector of testers (or mockers, or maybe both). It doesn't have much so far, but there's some sample code you can browse through in the svn repository if you're curious. I'll be adding more meat to it soon.

Anyway, if you're a Pythonista who happens to be in the L.A. area on Thursday, please consider attending our meeting. It will be lots of fun, guaranteed.

Tuesday, April 22, 2008

"OLPC Automated Testing" project accepted for SoC

I'm happy to say that Zach Riggle's application for this year's Google Summer of Code, "OLPC Project Automated Testing", was accepted. I'm looking forward to mentoring Zach, and having Titus as a backup mentor. There's some very cool stuff that can be done in this area, and I hope that at the end of the summer we'll have some solid automated testing techniques and tools that can be applied to any Python project, not only to the OLPC Sugar environment. Stay tuned for more info on this project. BTW, here is the list of PSF-sponsored applications accepted for this year's SoC.

Thursday, April 17, 2008

Come work for RIS Technology

We just posted this on craigslist, but it never hurts to blog about it too. If you're interested, send an email to techjobs at ristech.net. You and I might get to work together on the same team!

Open Source Tech Top Guns Wanted

Are you a passionate Linux user? Are you running the latest Ubuntu alpha release on your laptop just because you can? Are you wired to the latest technologies -- things like Amazon EC2/S3 and Google AppEngine? Are you a virtuoso when it comes to virtualization (Xen/VMWare)?

Do you program in Python? Do you take hard problems as personal challenges and don't give up until you solve them?

RIS Technology Inc. is a rapidly growing Los Angeles-based premium managed hosting provider that hosts and manages internet applications for medium to large size organizations nationwide. We have grown consistently at 100% each of the past four years and are currently hiring for additional growth at our corporate operations center near LAX, in Los Angeles, CA. We have immediate openings for dedicated and knowledgeable technology engineers. If the answer to the questions above is YES, then we'd like to extend an invitation to interview with us.

We are an equal opportunity employer and have excellent benefits. We realize that one of the main things that makes us excellent are the people we choose to work with. We look for the best and brightest and our goal is to make work less "work" and more fun.

Wednesday, April 16, 2008

Google App Engine feels constrictive

I've been toying a bit with Google App Engine. I was lucky enough to score one of the 10,000 developer accounts. I first went through their tutorial, which was fine. Then I tried to port a simple application that I used to run from the command line, which queried a range of IP addresses for their reverse DNS names. No luck. I was using the dnspython module, which in turn uses the Python socket module -- and socket is not available within the Google App Engine sandbox environment.
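
For reference, here's a minimal sketch of the kind of reverse-lookup code that runs fine locally but gets rejected in the sandbox (the IP range is just a placeholder):

import socket

# loop over a placeholder range of IPs and do reverse DNS lookups;
# this relies on the socket module, which the GAE sandbox doesn't provide
for ip in ['192.0.2.%d' % i for i in range(1, 5)]:
    try:
        name = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        name = 'no reverse DNS'
    print ip, name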

Also, I was talking to Michał about rewriting the Cheesecake service to run on Google App Engine, but he pointed out that cron jobs are not allowed, so that won't work either... So far, everything I've tried with GAE has run into a wall. I know it's a 'paradigm change' for Web development, but still, I can't help wishing I had my favorite Python modules to play with.

What has your experience been with GAE so far? I know Kumar wrote a cool PyPI mirror in GAE, but I haven't seen many other 'real life' applications mentioned on Planet Python.

Friday, April 11, 2008

Ubuntu Gutsy woes with Intel 810 graphics card

I just upgraded my Dell Inspiron 6000 laptop to Ubuntu Gutsy last night. My graphics card is based on the Intel 810 chipset. After the upgrade, everything graphics-related was dog-slow. Scrolling in Firefox was choppy, IM-ing was choppy, even typing at the console was choppy. Surprisingly, I didn't find a lot of solutions to this problem. But many people on Ubuntu forums suggested disabling compiz/xgl, so that's what I ended up doing. In fact, I uninstalled all compiz and xgl-related packages, rebooted, and graphics became snappy again. Now back to trying to write an application to run on THE GOOGLE.

Thursday, April 10, 2008

Meme du jour: shell history

Here's mine from my Ubuntu laptop:

$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}' |sort -rn|head
121 cd
91 ssh
82 ls
46 vi
28 python
26 scp
16 dig
12 more
7 twistd
6 rm

Thursday, April 03, 2008

Steve Loughran on 'Farms, Fabrics and Clouds'

Yesterday my colleagues at RIS Technology and I had the pleasure of attending a remote presentation given to us by Steve Loughran, who works as a researcher at HP Labs and is also a committer on the Ant project. I had seen Steve's slides from a presentation he gave at the University of Bristol on 'Farms, Fabrics and Clouds' back in December 2007, and I had been pestering him via email ever since, hoping to have him release a screencast. After much back and forth, Steve offered to simply present directly to us via Skype for now. He did it out of the goodness of his heart, but both he and I realized that there's a nice little business opportunity in this type of presentation: you release the slides with no audio, then you get hired to present to interested parties remotely, via Skype and a shared set of slides, with a Q&A session at the end. Everybody wins in this scenario. Filing it in the 'ideas worth trying' category.

To come back to Steve's presentation -- here are the slides from a previous version. I hope he will soon post the updated version we saw yesterday, but the differences are not major. The co-author of the talk is Julio Guijarro. Their area of interest within HP Labs is the deployment of large applications across distributed resources and the management of these apps/resources with an eye to maximizing their output and minimizing their cost. A familiar (and hard) problem for everybody who works in the hosting industry.

Steve talked about how infrastructure architectures have changed over the years from a single web server talking to a single database server, to clustering, and finally to server farms and computing-on-demand. The challenge for us 'server farmers' is to figure out a way to manage thousands of servers, heaps of storage, a myriad of network infrastructure devices, and large distributed applications on top of that -- all while keeping everything purring and happy, running at maximum potential. Sounds impossible, but Amazon seems to be doing a decent job at it. In fact, Steve spent quite some time talking about how Amazon changed the game with their S3 and EC2 offerings. Even though they're not quite ready for prime time in terms of production deployments, Amazon will soon get there. As proof, see their recent introduction of static IP addresses in EC2, and of the possibility of running your application in different data centers.

In my opinion, the best of Steve's slides are the 'Assumptions that are now invalid' ones. They really turn the 'established facts and best practices' of infrastructure and application design on their heads. Here are some examples of assumptions that don't hold anymore in our day and time:
  • it is expensive to create, deploy and duplicate a new system, running a Linux image of your choice (see Instalinux as a counter-example)
  • system failure is unusual and 100% availability can be achieved
  • databases are the best form of storage
  • you need physical access to the data center
  • a single server farm needs to scale to infinity
My other favorite part, which is not in the online slides yet, is the concept of 'agile infrastructure'. I haven't seen this concept applied to server hosting before, but Steve has a great point here. If you look at something like Amazon EC2, where you pay as you go, can test your application in a smaller environment and then scale it up, and can move your application between data centers -- this is indeed an agile environment that also imposes some new demands on your application.

I really recommend that you check out Steve's slides. There's a lot to chew on, but you can't afford not to chew on it, if you have anything to do with the IT industry these days.

Here are a couple more links that might prove useful:
  • Anubis: a tuple-space implementation that uses multicast to share information between hosts within a site
  • SmartFrog: a technology from HP used to distribute and manage applications (think puppet but geared towards application deployment); see also Google video
Thanks again to Steve for presenting to us. Now, as a server farmer, I need to go back to my plow and try to improve it (maybe buy a tractor?).

Update: Steve has some more thoughts on the Agile Infrastructure concept. Intriguing. This is something I'll definitely keep a very close eye on and tinker with.

Wednesday, April 02, 2008

For you students interested in GSoC

If you're a student and you want to apply for a Python-related project for Google Summer of Code 2008, Matt Harrison has just the project for you. The project has to do with branch coverage analysis and reporting. Matt is willing to mentor too. It's a really good opportunity, so don't hesitate to apply. Hurry up though, the deadline is April 8th.

Tuesday, April 01, 2008

TurboGears and Pylons finally merging

This has been a long time coming, and fans of both projects have been eagerly waiting for it, but it's finally happened. Not sure if you've seen the announcements from Kevin Dangoor, Mark Ramm and Ben Bangert on their projects' mailing lists, but basically they boil down to "we feel like after the sprints at PyCon we made enough progress so that we can pull the trigger on merging the source code from the 2 projects in one common trunk." They make it sound like it was purely a technological problem, but I have my doubts about that. I think it was driven in part by the increasing popularity of Django. Unifying TurboGears and Pylons is a somewhat desperate measure to chip away at the Django market share. We'll see if it works or not. Check out the brand new page of the TurboPylons project.

Monday, March 31, 2008

ReviewBoard: open source code review tool

Via Marc Hedlund's post on O'Reilly Radar, here's an open source code review tool from VMWare: ReviewBoard. For all of us non-googlers out there, it's probably the next best thing to Guido's Mondrian (question: why has that tool not been released as open source?). Check out the sweet screenshots. The kicker though is that it uses Python and Django. Way to go, VMWare!

Python code complexity metrics and tools

There's a buzz in the air around code complexity, metrics, code coverage, etc. It started with Matt Harrison's PyCon presentation, then Ned Batchelder jumped in with a nice McCabe cyclomatic complexity computation/visualization tool, and now David Stanek posted about his pygenie tool -- which also measures the McCabe cyclomatic complexity of Python code. Now it's time to unify all these ideas in one powerful tool that computes not only complexity but also path or at least branch coverage. This would make a nice Google Summer of Code project. Too bad the deadline for 2008 GSoC applications is in 7 hours...Maybe for next year.
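
To make the metric concrete, here's a small made-up function; McCabe cyclomatic complexity is the number of decision points plus one, so this function scores 4:

def summarize(values):
    # decision points: the for loop and the two ifs -> complexity 3 + 1 = 4
    total = 0
    for v in values:
        if v is None:
            continue
        if v < 0:
            v = -v
        total += v
    return total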

Update: David Goodger left a comment pointing me to Martin Blais's snakefood package, which computes and shows dependencies for your Python code. It's a good complement to the tools I mentioned above.

Friday, March 28, 2008

Recommended testing conference: CAST 2008

If you're a tester and are serious about learning and advancing in your trade, I warmly recommend the CAST 2008 conference which will be held in Toronto, July 14-16. The theme of the conference is "Beyond the Boundaries: Interdisciplinary Approaches to Software Testing" and the keynote speaker is none other than Jerry Weinberg. And it's REALLY hard to get Jerry Weinberg to speak at a conference, so you might as well take advantage of this opportunity. For more details on CAST 2008, download the PDF brochure.

It's a good time to be a Python programmer

We had the SoCal Piggies meeting at the Disney Animation Studios last night. It was a great meeting -- great presentations from Disney engineers on how they use Python at Disney (and they use it A LOT!), great food, great turnout, and great atmosphere. Let me tell you -- the Disney Animation Studios are *lush*. Thanks to Paul Hildebrandt for organizing the meeting.

I'll probably blog separately about the technical content of the presentations, but for now I just wanted to comment on the fact that everybody seems to be hiring Python programmers -- Gorilla Nation and Virgin Charter are just two companies in the L.A. area that are aggressively looking to hire Python talent. Another thing: we used to have difficulties in finding venues for our meetings. We used to meet at either USC or Caltech, and around 10-12 people max. would show up. Now companies are clamoring for organizing the meetings at their offices, and we have 20-30 people in the audience, with many new faces at every meeting. Even more: Ruby on Rails programmers are showing up at our meetings, looking for an opportunity to be more involved with Python!

I take that as a sign that Python has arrived. It's a good time to be a Python programmer (or tester, for that matter.)

Tuesday, March 25, 2008

Easy parsing with pyparsing

If you haven't used Paul McGuire's pyparsing module yet, you've been missing out on a great tool. Whenever you hit a wall trying to parse text with regular expressions or string operations, 'think pyparsing'.

I had the need to parse a load balancer configuration file and save certain values in a database. Most of the stuff I needed was fairly easily obtainable with regular expressions or Python string operations. However, I was stumped when I encountered a line such as:

bind http "Customer Server 1" http "Customer Server 2" http

This line 'binds' a 'virtual server' port to one or more 'real servers' and their ports (I'm using here this particular load balancer's jargon, but the concepts are the same for all load balancers.)

The syntax is 'bind' followed by a word denoting the virtual server port, followed by one or more pairs of real server names and ports. The kicker is that the real server names can be either a single word containing no whitespace, or multiple words enclosed in double quotes.

Splitting the line by spaces or double quotes is not the solution in this case. I started out by rolling my own little algorithm, keeping track of where I was inside the string, then realized that I was actually writing my own parser at that point. Time to reach for pyparsing.

I won't go into the details of how to use pyparsing, since there is great documentation available (see Paul's PyCon06 presentation, the examples on the pyparsing site, and also Paul's O'Reilly Shortcut book). Basically you need to define your grammar for the expression you need to parse, then translate it into pyparsing-specific constructs. Because pyparsing's API is so intuitive and powerful, the translation process is straightforward.

Here's how I ended up implementing my pyparsing grammar:

from pyparsing import *

def parse_bind_line(line):
    quoted_real_server = dblQuotedString.setParseAction(removeQuotes)
    real_server = Word(alphas, printables) | quoted_real_server
    port = Word(alphanums)
    real_server_port = Group(real_server + port)
    bind_expr = Suppress(Literal("bind")) + \
                port + \
                OneOrMore(real_server_port)
    return bind_expr.parseString(line)

That's all there is to it. You need to read it from the bottom up to see how the expression gets decomposed into elements, and elements get decomposed into sub-elements.

I'll explain each line, starting with the last one before the return:

bind_expr = Suppress(Literal("bind")) + \
            port + \
            OneOrMore(real_server_port)

A bind expression starts with the literal "bind", followed by a port, followed by one or more real server/port pairs. That's pretty much what the code above says, isn't it? The Suppress construct tells pyparsing that we're not interested in returning the literal "bind" in the final token list.


real_server_port = Group(real_server + port)

A real server/port pair is simply a real server name followed by a port. The Group construct tells pyparsing that we want to group these 2 tokens in a list inside the final token list.


port = Word(alphanums)

A port is a word composed of alphanumeric characters. In general, a word means 'a sequence of characters containing no whitespace'. The 'alphanums' constant is predefined by pyparsing and contains all the alphanumeric characters.


real_server = Word(alphas, printables) | quoted_real_server

A real server is either a single word, or an expression in quotes. Note that we can declare a pyparsing Word with 2 arguments; the 1st argument specifies the allowed characters for the initial character of the word, whereas the 2nd argument specifies the allowed characters for the body of the word. In this case, we're saying that we want a real server name to start with an alphabetical character, but other than that it can contain any printable character.


quoted_real_server = dblQuotedString.setParseAction(removeQuotes)

Here is where you can glimpse the power of pyparsing. With this single statement we're parsing a sequence of words enclosed in double quotes, and we're saying that we're not interested in the quotes. There's also a sglQuotedString class for words enclosed in single quotes. Thanks to Paul for bringing this to my attention. My clumsy attempt at manually declaring a sequence of words enclosed in double quotes ran something like this:


no_quote_word = Word(alphanums+"-.")
quoted_real_server = Suppress(Literal("\"")) + \
                     OneOrMore(no_quote_word) + \
                     Suppress(Literal("\""))
quoted_real_server.setParseAction(lambda tokens: " ".join(tokens))

The only useful thing you can take away from this mumbo-jumbo is that you can associate an action with each token. When pyparsing encounters that token, it applies the action (function or class) you specified to it. This is useful for validating your tokens, for example dates. Very powerful stuff.
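
As an illustration (this is not part of the load balancer grammar above, just a hypothetical example), here's a parse action that validates a two-digit month token and rejects anything outside 01-12:

from pyparsing import Word, nums

def validate_month(tokens):
    # hypothetical validation: only accept months 01-12
    month = int(tokens[0])
    if not 1 <= month <= 12:
        raise ValueError("invalid month: %s" % tokens[0])
    return tokens

month = Word(nums, exact=2).setParseAction(validate_month)
print month.parseString("07")   # -> ['07']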

Now it's time to test my function on a few strings:

if __name__ == "__main__":
    tests = """\
bind http "Customer Server 1" http "Customer Server 2" http
bind http "Customer Server - 11" 81 "Customer Server 12" 82
bind http www.mywebsite.com-server1 http www.mywebsite.com-server2 http
bind ssl www.mywebsite.com-server1 ssl www.mywebsite.com-server2 ssl
bind http TEST-server http
bind http MY-cluster-web11 83 MY-cluster-web-12 83
bind http cust1-server1.site.com http cust1-server2.site.com http
""".splitlines()

    for t in tests:
        print parse_bind_line(t)


Running the code above produces this output:


$ ./parse_bind.py
['http', ['Customer Server 1', 'http'], ['Customer Server 2', 'http']]
['http', ['Customer Server - 11', '81'], ['Customer Server 12', '82']]
['http', ['www.mywebsite.com-server1', 'http'], ['www.mywebsite.com-server2', 'http']]
['ssl', ['www.mywebsite.com-server1', 'ssl'], ['www.mywebsite.com-server2', 'ssl']]
['http', ['TEST-server', 'http']]
['http', ['MY-cluster-web11', '83'], ['MY-cluster-web-12', '83']]
['http', ['cust1-server1.site.com', 'http'], ['cust1-server2.site.com', 'http']]

From here, I was quickly able to identify everything I needed for a given virtual server: its port, and all the real server/port pairs associated with it. Inserting all this into a database was just another step. The hard work had already been done by pyparsing.
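
For what it's worth, here's a rough sketch of that last step; the actual database calls are omitted, and the printed values would simply become the INSERT parameters:

line = 'bind http "Customer Server 1" http "Customer Server 2" http'
tokens = parse_bind_line(line).asList()
vip_port = tokens[0]
for real_server, real_port in tokens[1:]:
    # each pair goes into the database together with vip_port
    print vip_port, real_server, real_port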

Once more, kudos to Paul McGuire for creating such a useful and fun tool.

Sunday, March 23, 2008

PyCon08 gets great coverage

Reports on the death of the PyCon conference as a community experience have been greatly exaggerated. I personally have never seen any PyCon edition as well covered in the blogs aggregated in Planet Python as the 2008 PyCon. If you don't believe me, maybe you'll believe Google Blog Search. I think the Python community is alive and well, and ready to rock at PyCon conferences for the foreseeable future. I'm looking forward to PyCon09 in Chicago, and then probably in San Francisco for 2010/11.

Wednesday, March 19, 2008

PyCon presenters, unite!

If you gave a talk at PyCon and haven't uploaded your slides to the official PyCon website yet, but you have posted them online somewhere else, please leave a comment to this post with the location of your slides. I'm helping Doug Napoleone upload the slides, since some authors have experienced issues when trying to upload the slides using their PyCon account. Thanks!

Tuesday, March 18, 2008

Links to resources from PyCon talks

I took some notes at the PyCon talks I've been to, and I'm gathering links to resources referenced in these talks. Hopefully they'll be useful to somebody (I know they will be to me at least.)

"MPI Cluster Programming with Python and Amazon EC2" by Pete Skomoroch

* slides in PDF format
* Message Passing Interface (MPI) modules for Python: mpi4py, pympi
* ElasticWulf project (Beowulf-like setup on Amazon EC2)
* IPython1: parallel computing in Python
* EC2 gotchas

"Like Switching on the Light: Managing an Elastic Compute Cluster with Python" by George Belotsky

* S3FS: mount S3 as a local file system using Fuse (unstable)
* EC2UI: Firefox extension for managing EC2 clusters
* S3 Organizer: Firefox extension for managing S3 storage
* bundling an EC2 AMI and storing it to S3
* the boto library, which allows programmatic manipulation of Amazon Web services such as EC2, S3, SimpleDB etc. (a python-boto package is available for most Linux distributions too; for example 'yum install python-boto'); see the quick sketch below
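
As a quick sketch of what boto makes possible (the credentials below are placeholders, and the API may have evolved since):

import boto

# placeholder credentials; substitute your own AWS keys
conn = boto.connect_ec2('MY_ACCESS_KEY_ID', 'MY_SECRET_ACCESS_KEY')
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        print instance.id, instance.state, instance.dns_name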

"PyTriton: building a petabyte storage system" by Jonathan Ellis

* All this was done at Mozy (online remote backup, now owned by EMC, just like Avamar, the company I used to work for)
* They maxed out Foundry load balancers, so they ended up using LVS + ipvsadm
* They used erasure coding for data integrity -- rolled their own algorithm but Jonathan recommended that people use zfec developed by AllMyData
* An alternative to erasure coding would be to use RAID6, which is used by Carbonite

"Use Google Spreadsheets API to create a database in the cloud" by Jeffrey Scudder

* slides online
* APIs and documentation on google code

"Supervisor as a platform" by Chris McDonough and Mike Naberezny

* slides online
* supervisord home page

"Managing complexity (and testing)" by Matt Harrison

* slides online
* PyMetrics module for measuring the McCabe complexity of your code
* coverage module and figleaf module for measuring your code coverage

Resources from lightning talks

* bug.gd -- online repository of solutions to bugs, backtraces, exceptions etc (you can easy_install bug.gd, then call error_help() after you get a traceback to try to get a solution)
* geopy -- geocode package
* pvote.org -- Ka-Ping Yee's electronic voting software in 460 lines of Python (see also Ping's PhD dissertation on the topic of Building Reliable Voting Machine Software)
* bitsyblog -- a minimalist approach to blog software
