I found the documentation for the python-cloudfiles package a bit lacking, so here's a quick post that walks through the common scenarios you encounter when managing CloudFiles containers and objects. I am not interested in the CDN aspect of CloudFiles for my purposes, so for that you'll need to dig on your own.
A CloudFiles container is similar to an Amazon S3 bucket, with one important difference: a container name cannot contain slashes, so you can't mimic a file system hierarchy in CloudFiles the way you can in S3. A CloudFiles container, like an S3 bucket, holds objects -- which in CloudFiles have a maximum size of 5 GB each. So the CloudFiles storage landscape consists of two levels: a first level of containers (you can have an unlimited number of them) and a second level of objects within containers. More details are in the CloudFiles API Developer Guide (PDF).
Here's how you can use the python-cloudfiles package to perform CRUD operations on containers and objects.
Getting a connection to CloudFiles
First you need to obtain a connection to your CloudFiles account. You need a user name and an API key (the key can be generated via the Web interface at https://manage.rackspacecloud.com).
conn = cloudfiles.get_connection(username=USERNAME, api_key=API_KEY, serviceNet=True)
According to the docs, specifying serviceNet=True tells the library to access Cloud Files over Rackspace's internal ServiceNet network instead of the public network.
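If you connect from several scripts, it can be convenient to wrap this in a small helper that reads the credentials from the environment. Here's a minimal sketch -- the CLOUDFILES_USERNAME and CLOUDFILES_API_KEY variable names are my own convention, not something the package defines:

import os
import cloudfiles

def get_cloudfiles_connection(use_servicenet=False):
    # CLOUDFILES_USERNAME / CLOUDFILES_API_KEY are hypothetical
    # environment variable names chosen for this example
    return cloudfiles.get_connection(
        username=os.environ['CLOUDFILES_USERNAME'],
        api_key=os.environ['CLOUDFILES_API_KEY'],
        serviceNet=use_servicenet)

conn = get_cloudfiles_connection(use_servicenet=True)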
Listing containers and objects
Once you get a connection, you can list existing containers, and objects within a container:
containers = conn.get_all_containers()
for c in containers:
    print "\nOBJECTS FOR CONTAINER: %s" % c.name
    objects = c.get_objects()
    for obj in objects:
        print obj.name
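If a container holds a very large number of objects, you can page through the listing instead of fetching everything at once. Here's a sketch using list_objects(), which returns plain object names; the limit and marker parameters follow the Cloud Files API docs, so double-check them against your python-cloudfiles version:

c = conn.get_container('mycontainer')
marker = None
while True:
    # fetch up to 1,000 names at a time, resuming after the last one seen
    names = c.list_objects(limit=1000, marker=marker)
    if not names:
        break
    for name in names:
        print name
    marker = names[-1]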
Creating containers
container = conn.create_container(container_name)
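If you want to reuse a container when it already exists, a get-or-create pattern works well. A sketch, assuming the NoSuchContainer exception lives in cloudfiles.errors (verify against your package version):

from cloudfiles.errors import NoSuchContainer

def get_or_create_container(conn, name):
    # return the existing container, or create it if it's missing
    try:
        return conn.get_container(name)
    except NoSuchContainer:
        return conn.create_container(name)

container = get_or_create_container(conn, 'mybackups')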
Creating objects in a container
Assuming you have a list of filenames you want to upload to a given container:
for f in files:
    print 'Uploading %s to container %s' % (f, container_name)
    basename = os.path.basename(f)
    o = container.create_object(basename)
    o.load_from_filename(f)
(note that the overview in the python-cloudfiles index.html doc has a typo -- it specifies 'load_from_file' instead of the correct 'load_from_filename')
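Besides load_from_filename, the Object API can also upload from an in-memory string, and you can set the content type explicitly. A quick sketch -- the write() method and content_type attribute are what I gathered from the package docs, so verify them against your version:

o = container.create_object('status.txt')
o.content_type = 'text/plain'
# write() uploads the given string (or file-like object) as the object's data
o.write('nightly backup completed OK\n')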
Deleting containers and objects
You first need to delete all objects inside a container, then you can delete the container itself:
print 'Deleting container %s' % c.name
print 'Deleting all objects first'
objects = c.get_objects()
for obj in objects:
    c.delete_object(obj.name)
print 'Now deleting the container'
conn.delete_container(c.name)
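One possible refinement: get_objects() builds a full Object instance for every entry, while list_objects() returns just the names, which is all delete_object() needs. A sketch of a reusable helper along those lines:

def delete_container_recursively(conn, c):
    # delete every object in the container, then the container itself
    for name in c.list_objects():
        c.delete_object(name)
    conn.delete_container(c.name)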
Retrieving objects from a container
Remember that you don't have a backup process in place until you have tested restores. So let's see how to retrieve objects stored in a CloudFiles container:
container_name = sys.argv[1]

# find the container by name
container = None
for c in conn.get_all_containers():
    if c.name == container_name:
        container = c
        break
if container is None:
    print "No container found with name %s" % container_name
    sys.exit(1)

target_dir = container_name
if not os.path.isdir(target_dir):
    os.makedirs(target_dir)

objects = container.get_objects()
for obj in objects:
    print "Retrieving object %s" % obj.name
    target_file = "%s/%s" % (target_dir, obj.name)
    obj.save_to_filename(target_file)
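If you need more control over a download (progress reporting, writing somewhere other than a local file), the Object API also exposes a stream() generator that yields the data in chunks -- at least in the versions I've looked at, so verify before relying on it:

obj = container.get_object('bigfile.tgz')
fobj = open('/tmp/bigfile.tgz', 'wb')
for chunk in obj.stream(chunksize=65536):
    fobj.write(chunk)
fobj.close()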
3 comments:
Hey Grig,
Just a small note. While you can not have slashes in the container name, you can have slashes in the object names. When combined with prefix queries this can be quite useful for backups. One of my colleagues recently wrote a blog post about this here:
http://programmerthoughts.com/programming/nested-folders-in-cloud-files/
Thanks for the post, and thanks for using cloudfiles! :)
Chuck -- thanks for the comment, it's good to know! I'll give it a try.
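For reference, here's a minimal sketch of the trick Chuck describes: put slashes in the object names, then filter listings with a prefix query (the file paths below are made up, and the prefix parameter to get_objects() is per the API docs, so verify against your version):

# slashes in object names mimic a directory layout
for day in ('01', '02', '03'):
    o = container.create_object('backups/2010/%s/dump.sql.gz' % day)
    o.load_from_filename('/var/backups/2010-%s-dump.sql.gz' % day)

# list only the objects under the 'backups/2010/' pseudo-folder
for obj in container.get_objects(prefix='backups/2010/'):
    print obj.name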
Thanks for useful example!
I would add that if you are not using Rackspace's cloud, but your own or someone else's, then change the connection instantiation call to:
conn = cloudfiles.get_connection(username=USERNAME, api_key=API_KEY, authurl=provider_url)
# serviceNet parameter removed, since it is Rackspace-specific