Wednesday, September 15, 2010

Managing Rackspace CloudFiles with python-cloudfiles

I've started to use Rackspace CloudFiles as an alternate storage for database backups. I have the backups now on various EBS volumes in Amazon EC2, AND in CloudFiles, so that should be good enough for Disaster Recovery purposes, one would hope ;-)

I found the documentation for the python-cloudfiles package a bit lacking, so here's a quick post that walks through the common scenarios you encounter when managing CloudFiles containers and objects. I am not interested in the CDN aspect of CloudFiles for my purposes, so for that you'll need to dig on your own.

A CloudFiles container is similar to an Amazon S3 bucket, with one important difference: a container name cannot contain slashes, so you won't be able to mimic a file system hierarchy in CloudFiles the way you can in S3. A CloudFiles container, like an S3 bucket, contains objects -- which for CloudFiles have a maximum size of 5 GB. So the CloudFiles storage landscape consists of two levels: a first level of containers (you can have an unlimited number of them), and a second level of objects stored in containers. More details in the CloudFiles API Developer Guide (PDF).

Here's how you can use the python-cloudfiles package to perform CRUD operations on containers and objects.

Getting a connection to CloudFiles

First you need to obtain a connection to your CloudFiles account. You need a user name and an API key (the key can be generated via the Web interface at

import cloudfiles

conn = cloudfiles.get_connection(username=USERNAME, api_key=API_KEY, serviceNet=True)

When specifying serviceNet=True, the docs say that you will use the Rackspace ServiceNet network to access Cloud Files, and not the public network.
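
The USERNAME and API_KEY values above have to come from somewhere. Here's a minimal sketch that reads them from environment variables instead of hardcoding them in a backup script. The variable names CLOUDFILES_USERNAME and CLOUDFILES_API_KEY and both helper functions are my own assumptions for illustration; python-cloudfiles does not look at the environment itself.

```python
import os


def load_credentials(environ=os.environ):
    """Return (username, api_key) from the environment, or raise if missing."""
    # CLOUDFILES_USERNAME and CLOUDFILES_API_KEY are hypothetical names
    # chosen for this sketch, not something the package defines.
    try:
        return environ['CLOUDFILES_USERNAME'], environ['CLOUDFILES_API_KEY']
    except KeyError as e:
        raise RuntimeError('Missing environment variable %s' % e)


def connect(service_net=False):
    """Open a CloudFiles connection using credentials from the environment."""
    # Imported here so load_credentials works even without the package installed
    import cloudfiles
    username, api_key = load_credentials()
    return cloudfiles.get_connection(username=username, api_key=api_key,
                                     serviceNet=service_net)
```

Calling connect() then gives you the same kind of conn object used in the rest of this post.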

Listing containers and objects

Once you get a connection, you can list existing containers, and objects within a container:

containers = conn.get_all_containers()
for c in containers:
    print "\nOBJECTS FOR CONTAINER: %s" % c.name
    objects = c.get_objects()
    for obj in objects:
        print obj.name
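
If you'd rather work with the listing than print it, the same two calls can be wrapped in a small helper. This is a sketch, not part of the package: list_all_objects is a name I made up, and it relies only on get_all_containers(), get_objects() and the name attribute shown above. It also ignores any paging the service may apply to very large containers.

```python
def list_all_objects(conn):
    """Return a dict mapping each container name to a sorted list of object names."""
    listing = {}
    for container in conn.get_all_containers():
        # Collect just the object names; sorting makes the output stable
        listing[container.name] = sorted(
            obj.name for obj in container.get_objects())
    return listing
```

A dictionary like this is handy for a quick sanity check, for example comparing the object names in a container against the backup files you expect to find there.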

Creating containers

container = conn.create_container(container_name)

Creating objects in a container

Assuming you have a list of filenames you want to upload to a given container:

import os

for f in files:
    print 'Uploading %s to container %s' % (f, container_name)
    # Use the file's base name as the object name
    basename = os.path.basename(f)
    o = container.create_object(basename)
    o.load_from_filename(f)

(note that the overview in the python-cloudfiles index.html doc has a typo -- it specifies 'load_from_file' instead of the correct 'load_from_filename')
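
One thing to watch for when the object name is the file's base name: two files from different directories with the same base name would end up as the same object. Here's a rough sketch of an upload helper that checks for this first. The function names find_basename_collisions and upload_files are mine; the only package calls used are create_object and load_from_filename from the loop above.

```python
import os
from collections import defaultdict


def find_basename_collisions(paths):
    """Return {basename: [paths]} for base names shared by more than one path."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


def upload_files(container, paths):
    """Upload each path to the container, using its base name as the object name."""
    collisions = find_basename_collisions(paths)
    if collisions:
        # Refuse to upload rather than silently overwrite one file with another
        raise ValueError('Basename collisions: %r' % sorted(collisions))
    for path in paths:
        obj = container.create_object(os.path.basename(path))
        obj.load_from_filename(path)
```

Failing before the first upload is a design choice for this sketch; you could just as well rename the colliding files or skip them.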

Deleting containers and objects

You first need to delete all objects inside a container, then you can delete the container itself:

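# c is the Container object you want to delete
print 'Deleting container %s' % c.name
print 'Deleting all objects first'
objects = c.get_objects()
for obj in objects:
    c.delete_object(obj.name)
print 'Now deleting the container'
conn.delete_container(c.name)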
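
The two-step delete can be packaged as a function. This is only a sketch under the assumption that get_objects() returns a snapshot of the listing, so deleting while looping over it is safe; purge_container is a made-up name, and it uses nothing beyond get_objects, delete_object and delete_container.

```python
def purge_container(conn, container):
    """Delete every object in the container, then the container itself.

    Returns the number of objects that were deleted.
    """
    deleted = 0
    for obj in container.get_objects():
        container.delete_object(obj.name)
        deleted += 1
    # The container has to be empty before it can be removed
    conn.delete_container(container.name)
    return deleted
```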

Retrieving objects from a container

Remember that you don't have a backup process in place until you have tested restores. So let's see how you retrieve objects that are stored in a CloudFiles container:

import os
import sys

container_name = sys.argv[1]
containers = conn.get_all_containers()
# Use a separate loop variable so that c stays None when nothing matches
c = None
for container in containers:
    if container_name == container.name:
        c = container
        break
if c is None:
    print "No container found with name %s" % container_name
    sys.exit(1)

target_dir = container_name
os.system('mkdir -p %s' % target_dir)
objects = c.get_objects()
for obj in objects:
    obj_name = obj.name
    print "Retrieving object %s" % obj_name
    target_file = "%s/%s" % (target_dir, obj_name)
    obj.save_to_filename(target_file)
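
Here's the same restore written as a reusable function, as a sketch rather than a finished tool. It assumes conn.get_container(name) is available to look up the container directly, and it creates directories with os.makedirs instead of shelling out to mkdir. Since object names can contain slashes, it also creates any parent directories an object name implies. restore_container is a name I picked for illustration.

```python
import os


def restore_container(conn, container_name, target_dir):
    """Download every object in the named container into target_dir.

    Returns the list of local file paths that were written.
    """
    container = conn.get_container(container_name)
    restored = []
    for obj in container.get_objects():
        target_file = os.path.join(target_dir, obj.name)
        # Object names may contain slashes, so create parent directories
        parent = os.path.dirname(target_file)
        if parent and not os.path.isdir(parent):
            os.makedirs(parent)
        obj.save_to_filename(target_file)
        restored.append(target_file)
    return restored
```

Returning the list of written paths makes it easy to follow up with a check of your own, such as comparing file sizes or checksums against the originals.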


Chuck said...

Hey Grig,

Just a small note. While you can not have slashes in the container name, you can have slashes in the object names. When combined with prefix queries this can be quite useful for backups. One of my colleagues recently wrote a blog post about this here:

Thanks for the post, and thanks for using cloudfiles! :)

Grig Gheorghiu said...

Chuck -- thanks for the comment, it's good to know! I'll give it a try.

Lars Nordin said...

Thanks for useful example!

I would add that if you are not using RackSpace's cloud but your own or someone else's then change the call to connection instantiation to:

conn = cloudfiles.get_connection(username=USERNAME, api_key=API_KEY, authurl=provider_url)
# removed serviceNet parameter since Rackspace specific