Friday, May 23, 2008

Incremental backups to Amazon S3

Based on this great blog post by Tim McCormack, I managed to write some scripts that back up files to Amazon S3. The files are encrypted with GnuPG and rsync-ed to S3 using a Python-based tool called duplicity.

Here's what I did in order to get all this going on a CentOS 5.1 server running Python 2.5.

1) Signed up for Amazon S3 and got the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY.

2) Downloaded and installed the following packages: boto, GnuPGInterface, librsync, duplicity. All of them except librsync are Python-based, so they can be installed via 'python setup.py install'. For librsync you need to use './configure; make; make install'.

3) Generated a GPG key pair using "gpg --gen-key". Made a note of the hex fingerprint of the key (you can list the fingerprints of your keys via "gpg --fingerprint").

4) Wrote a simple boto-based Python script to create and list S3 buckets (the equivalent of directories in S3 parlance). Note that boto uses SSL, so your Python installation needs to have SSL enabled.

Here's how the script looks:

#!/usr/bin/env python

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

from boto.s3.connection import S3Connection
conn = S3Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY)
buckets = [
'mybuckets_myserver_mysqldump',
'mybuckets_myserver_full',
]
for bucket in buckets:
conn.create_bucket(bucket)
rs = conn.get_all_buckets()
print 'Bucket listing:'
for b in rs:
print b.name

5) Wrote a bash script (heavily influenced by Tim McCormack's post) that runs duplicity and backs up the root partition of my Linux server (minus some directories) to S3. The nice thing about duplicity is that it uses rsync, so it only transfers the diffs over the wire. Here's how my script looks like:

export myEncryptionKeyFingerprint=somehexnumber
export mySigningKeyFingerprint=somehexnumber
export AWS_ACCESS_KEY_ID=accesskeyid
export AWS_SECRET_ACCESS_KEY=secretaccesskey
export PASSPHRASE=mypassphrase

/usr/local/bin/duplicity --encrypt-key=$myEncryptionKeyFingerprint
--sign-key=$mySigningKeyFingerprint --exclude=/sys --exclude=/dev
--exclude=/proc --exclude=/tmp --exclude=/mnt --exclude=/media /
s3+http://mybuckets_myserver_full

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
NOTE: duplicity will interactively prompt you for your GPG key's passphrase, unless you have a variable called PASSPHRASE that contains the passphrase. Since I wanted to run this script as a cron job, I chose the less secure way of specifying the passphrase in clear inside the script. YMMV.

That's about it. Running the script produces an output such as this:

--------------[ Backup Statistics ]--------------
StartTime 1211482825.55 (Thu May 22 12:00:25 2008)
EndTime 1211488426.17 (Thu May 22 13:33:46 2008)
ElapsedTime 5600.62 (1 hour 33 minutes 20.62 seconds)
SourceFiles 174531
SourceFileSize 5080402735 (4.73 GB)
NewFiles 174531
NewFileSize 5080402735 (4.73 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 174531
RawDeltaSize 1200920038 (1.12 GB)
TotalDestinationSizeChange 2702953170 (2.52 GB)
Errors 0
-------------------------------------------------
The first time you run the script it will take a while, but subsequent runs will only back up the files that were changed since the last run. For example, my second run transferred only 19.3 MB:

--------------[ Backup Statistics ]--------------
StartTime 1211529638.99 (Fri May 23 01:00:38 2008)
EndTime 1211529784.18 (Fri May 23 01:03:04 2008)
ElapsedTime 145.19 (2 minutes 25.19 seconds)
SourceFiles 174522
SourceFileSize 5084478500 (4.74 GB)
NewFiles 64
NewFileSize 2280357 (2.17 MB)
DeletedFiles 28
ChangedFiles 418
ChangedFileSize 217974696 (208 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 510
RawDeltaSize 2465010 (2.35 MB)
TotalDestinationSizeChange 20211663 (19.3 MB)
Errors 0

ASas
-------------------------------------------------
To restore files from S3, you use duplicity and specify the source as s3+http://mybuckets_myserver_full and the destination as a local directory.

Thanks to Tim McCormack for his detailed blog post, it made things so much easier than digging all this info by Google Fu.

Monday, May 19, 2008

Compiling Python 2.5 with SSL support

If you compile Python 2.5.x from source, you need to jump through some hoops so that SSL support is enabled. Googling around, I found Patrick Altman's excellent blog post talking about this very issue.

In my case, I needed to enable SSL support for Python 2.5.2 on CentOS 5.1. I already had the openssl development libraries installed:

# yum list installed | grep ssl
mod_ssl.i386 1:2.2.3-11.el5_1.cento installed
openssl.i686 0.9.8b-8.3.el5_0.2 installed
openssl-devel.i386 0.9.8b-8.3.el5_0.2 installed

Here's what I did next, following Patrick's post:

1) edited Modules/Setup.dist from the Python 2.5.2 source distribution and made sure the correct lines were put back in (they were commented out by default):

_socket socketmodule.c

# Socket module helper for SSL support; you must comment out the other
# socket line above, and possibly edit the SSL variable:
#SSL=/usr/local/ssl
_ssl _ssl.c \
-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
-L$(SSL)/lib -lssl -lcrypto

2) ran ./configure; make; make install

3) verified that I can access socket.ssl:

# python2.5
Python 2.5.2 (r252:60911, May 19 2008, 14:23:27)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.ssl
function ssl at 0xb7ef410c>

That's it. Not sure why it's so non-intuitive though.

Thursday, May 15, 2008

Encrypting a Linux root partition with LUKS and DM-CRYPT

One of our customers needed to have his Linux laptop's root partition encrypted. We found a HOWTO on achieving this with RHEL5, and we adapted it for CentOS 5. The technique is based on LUKS and DM-CRYPT. Kudos to my colleague Chris Evans for going through the exercise of getting this to work on CentOS 5 and for producing the documentation that follows, which I'm posting here hoping that it will benefit somebody at some point.

* Boot off of a Live CD, I used Fedora Core 9 Preview
* Find out which disk is which; for me /dev/sda was the external usb, and /dev/sdb was the internal
sfdisk -d /dev/sdb | sfdisk /dev/sda
pvcreate --verbose /dev/sda2
vgextend --verbose VolGroup00 /dev/sda2
pvmove --verbose /dev/sdb2 /dev/sda2 # This takes ages
vgreduce --verbose VolGroup00 /dev/sdb2
pvremove --verbose /dev/sdb2
fdisk /dev/sdb
* Change the partition type to 83 for /dev/sdb2
* Here is when you get to choose the password that will protect your partition:
cryptsetup --verify-passphrase --key-size 256 luksFormat /dev/sdb2

cryptsetup luksOpen /dev/sdb2 cryptroot
pvcreate --verbose /dev/mapper/cryptroot
vgextend --verbose VolGroup00 /dev/mapper/cryptroot
pvmove --verbose /dev/sda2 /dev/mapper/cryptroot # This takes ages
vgreduce --verbose VolGroup00 /dev/sda2
pvremove --verbose /dev/sda2
mkdir /mnt/tmp
mount /dev/VolGroup00/LogVol00 /mnt/tmp
cp -ax /dev/* /mnt/tmp/dev # I said no to overwriting any files
chroot /mnt/tmp/
(chroot) # mount -t proc proc /proc
(chroot) # mount -t sysfs sysfs /sys
(chroot) # mount /boot
(chroot) # swapon -a
(chroot) # vgcfgbackup

For the initrd, the blog mentions /etc/sysconfig/mkinitrd as a file. CentOS had a directory, I tried doing their suggestion as a file in there, moving the directory out, and making the file as they suggested. Both failed. So I ran the following command:

(chroot) # mkinitrd -v /boot/initrd-2.6.18-53.el5.crypt.img --with=aes --with=sha256 --with=dm-crypt 2.6.18-53.el5

Now we need to modify the initrd so that it will decrypt the partition at boot time

(chroot) # cd /boot
(chroot) # mkdir /boot/initrd-2.6.18-53.el5.crypt.dir
(chroot) # cd /boot/initrd-2.6.18-53.el5.crypt.dir
(chroot) # gunzip < ../initrd-2.6.18-53.el5.crypt.img | cpio -ivd

Now, we need to modify init by adding the following lines after the line which reads “mkblkdevs” and before “echo Scanning and configuring dmraid supported devices.”:

echo Decrypting root device
cryptsetup luksOpen /dev/sda2 cryptroot
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
echo Activating logical volumes
lvm vgchange -ay --ignorelockingfailure vg00

Copy cryptsetup and lvm to be put into the initrd, the blog doesn't mention it, but I'm sure it needs it.

cp /sbin/cryptsetup bin/
cp /sbin/lvm bin/

Compress the new initrd

find ./ | cpio -H newc -o | gzip -9 > /boot/initrd-2.6.18-53.el5.crypt.img

Modify the grub.conf. Copy the grub entry for the current kernel, and change as follows

title Centos Encrypted Server (2.6.18-53.1.4.el5)
initrd /initrd-2.6.18-53.el5.crypt.img

Unmount the fs's in the chroot, and exit

cd /
umount /boot
umount /proc
umount /sys
exit

NOTE: Don't upgrade the kernel without upgrading the initrd and grub.conf.

Reboot and test :)

At this point you have an encrypted root partition. You should be prompted for a password during the boot process (the boot partition is not encrypted). If somebody steals your laptop, they won't be able to mount the root partition without knowing the password.

After you have crypto setup, you can find out information about it (such as the crypto algorithm used) via this command:

# cryptsetup luksDump /dev/sda2
LUKS header information for /dev/sda2

Version: 1
Cipher name: aes
Cipher mode: cbc-essiv:sha256
Hash spec: sha1
Payload offset: 2056
MK bits: 256
MK digest: af 2e e6 39 3e 79 60 bb 4a 2b 33 05 1c 86 3a 83 bc a0 ef c1
MK salt: 79 b2 13 53 6f 52 72 a1 b5 3d dc d3 72 cd d6 f4
e3 25 3c 6e 08 00 f3 1d 44 1e 90 47 bc 43 e7 07
MK iterations: 10
UUID: 721abe52-5122-447b-8ed0-5ca3b2b32366

Key Slot 0: ENABLED
Iterations: 247223
Salt: 86 c7 53 6a 13 a9 77 81 89 ec 90 b3 e5 6a ea 8d
da 0c 6f ad ec 3e 3c 47 2d 6e 5f 59 28 4e 7c 63
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED

Thursday, May 08, 2008

Notes from the latest SoCal Piggies meeting

...have been posted to the "Happenings in Python User groups" blog.

Update: Ben Bangert sent me the slides he used. You can download or view the PDF from here.

Monday, May 05, 2008

Guido open sources Code Review app running on GAPE

Not sure why this wasn't publicized more, but Guido van Rossum announced today that he open sourced the code for Code Review, a Google AppEngine app he released last week. Code Review is based on Mondrian, the internal code review tool that Guido wrote for Google. The relationship between the two apps in terms of features is: Code Review < Mondrian.

The code for Code Review is part of a Google code project called Rietveld. I haven't looked at it yet, but I'll certainly do so soon, just to see the master's view on how to write a GAPE application.

Ruby to Python bytecode compiler

Kumar beat me to it, but I'll mention it here too: Why the Lucky Stiff published a Ruby-to-Python-bytecode compiler, as well as tools to decompile the byte code into source code. According to the README file, he based his work on blog posts by Ned Batchelder related to dissecting Python bytecode. I wholeheartedly agree with Why's comment at the end of the README file:

  You know, it's crazy that Python
and Ruby fans find themselves
battling so much. While syntax
is different, this exercise
proves how close they are to
each other! And, yes, I like
Ruby's syntax and can think much
better in it, but it would be
nice to share libs with Python
folk and not have to wait forever
for a mythical VM that runs all
possible languages.

Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...