Here's what I did in order to get all this going on a CentOS 5.1 server running Python 2.5.
1) Signed up for Amazon S3 and got the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY.
2) Downloaded and installed the following packages: boto, GnuPGInterface, librsync, duplicity. All of them except librsync are Python-based, so they can be installed via 'python setup.py install'. For librsync you need to use './configure; make; make install'.
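For reference, the installs boil down to something like this, run inside each unpacked source directory (versions and paths will vary depending on what you downloaded):

# boto, GnuPGInterface and duplicity are all Python packages
python setup.py install

# librsync is a C library, so it gets the usual autotools treatment
./configure
make
make install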
3) Generated a GPG key pair using "gpg --gen-key". Made a note of the hex fingerprint of the key (you can list the fingerprints of your keys via "gpg --fingerprint").
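If you'd rather grab the fingerprint programmatically (say, to paste into the backup script further down), gpg's machine-readable output works too; the 'Backup Key' user ID below is just a placeholder for whatever you named your key:

gpg --with-colons --fingerprint 'Backup Key' | awk -F: '/^fpr:/ {print $10}'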
4) Wrote a simple boto-based Python script to create and list S3 buckets (the equivalent of directories in S3 parlance). Note that boto uses SSL, so your Python installation needs to have SSL enabled.
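A quick way to verify that (on Python 2.5, SSL support lives in the socket module rather than a separate ssl module) is something along these lines:

python -c "import socket; print hasattr(socket, 'ssl')"

If that prints True, boto should be able to talk to S3 over HTTPS.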
Here's how the script looks:
#!/usr/bin/env python
from boto.s3.connection import S3Connection

ACCESS_KEY_ID = 'theaccesskeyid'
SECRET_ACCESS_KEY = 'thesecretaccesskey'

conn = S3Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY)

buckets = [
    'mybuckets_myserver_mysqldump',
    'mybuckets_myserver_full',
]

for bucket in buckets:
    conn.create_bucket(bucket)

rs = conn.get_all_buckets()
print 'Bucket listing:'
for b in rs:
    print b.name
5) Wrote a bash script (heavily influenced by Tim McCormack's post) that runs duplicity and backs up the root partition of my Linux server (minus some directories) to S3. The nice thing about duplicity is that it uses the rsync algorithm (via librsync), so it only transfers the diffs over the wire. Here's what my script looks like:
NOTE: duplicity will prompt you interactively for your GPG key's passphrase, unless you have an environment variable called PASSPHRASE that contains the passphrase. Since I wanted to run this script as a cron job (a sample crontab entry is shown after the script), I chose the less secure route of putting the passphrase in the clear inside the script. YMMV.
export myEncryptionKeyFingerprint=somehexnumber
export mySigningKeyFingerprint=somehexnumber
export AWS_ACCESS_KEY_ID=accesskeyid
export AWS_SECRET_ACCESS_KEY=secretaccesskey
export PASSPHRASE=mypassphrase
/usr/local/bin/duplicity --encrypt-key=$myEncryptionKeyFingerprint \
    --sign-key=$mySigningKeyFingerprint --exclude=/sys --exclude=/dev \
    --exclude=/proc --exclude=/tmp --exclude=/mnt --exclude=/media \
    / s3+http://mybuckets_myserver_full
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
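To run it from cron, an entry along these lines in root's crontab does the trick (the script path and the schedule are made-up examples; adjust to taste):

0 4 * * * /root/bin/s3_backup.sh >> /var/log/s3_backup.log 2>&1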
That's about it. Running the script produces output like the following. The first time you run it, it will take a while, since duplicity has to upload a full backup; these are the statistics from my initial run:
--------------[ Backup Statistics ]--------------
StartTime 1211482825.55 (Thu May 22 12:00:25 2008)
EndTime 1211488426.17 (Thu May 22 13:33:46 2008)
ElapsedTime 5600.62 (1 hour 33 minutes 20.62 seconds)
SourceFiles 174531
SourceFileSize 5080402735 (4.73 GB)
NewFiles 174531
NewFileSize 5080402735 (4.73 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 174531
RawDeltaSize 1200920038 (1.12 GB)
TotalDestinationSizeChange 2702953170 (2.52 GB)
Errors 0
-------------------------------------------------
Subsequent runs only back up the files that changed since the last run. For example, my second run transferred only 19.3 MB:

--------------[ Backup Statistics ]--------------
StartTime 1211529638.99 (Fri May 23 01:00:38 2008)
EndTime 1211529784.18 (Fri May 23 01:03:04 2008)
ElapsedTime 145.19 (2 minutes 25.19 seconds)
SourceFiles 174522
SourceFileSize 5084478500 (4.74 GB)
NewFiles 64
NewFileSize 2280357 (2.17 MB)
DeletedFiles 28
ChangedFiles 418
ChangedFileSize 217974696 (208 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 510
RawDeltaSize 2465010 (2.35 MB)
TotalDestinationSizeChange 20211663 (19.3 MB)
Errors 0
-------------------------------------------------

To restore files from S3, you use duplicity again, this time specifying s3+http://mybuckets_myserver_full as the source and a local directory as the destination.
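Here is a minimal sketch of what a restore run might look like; the /tmp/restore target and the etc/my.cnf path are made-up examples, and you need the same AWS credentials and GPG passphrase in the environment as during backups:

export AWS_ACCESS_KEY_ID=accesskeyid
export AWS_SECRET_ACCESS_KEY=secretaccesskey
export PASSPHRASE=mypassphrase

# restore the entire latest backup into /tmp/restore
/usr/local/bin/duplicity s3+http://mybuckets_myserver_full /tmp/restore

# or restore a single file (paths are relative to the backup root, /)
/usr/local/bin/duplicity --file-to-restore etc/my.cnf \
    s3+http://mybuckets_myserver_full /tmp/my.cnf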
Thanks to Tim McCormack for his detailed blog post; it made things so much easier than digging up all this information by Google Fu.