Showing posts from April, 2012

Using DynamoDB BatchWriteItem with boto

This is just a quick note about the advantage of using DynamoDB's newly introduced BatchWriteItem functionality, which allows you to write multiple items to a table in a single request, with the write operation parallelized behind the scenes by DynamoDB. Currently a single batch request is limited to 25 items, whether they are being written or deleted.

I was glad to see that the boto library already supports this new feature -- the fact that Mitch Garnaat is now an employee of Amazon probably helps too ;-) You do have to git pull the latest boto code from GitHub, since BatchWriteItem is not available in the latest boto release 2.3.0.

I tested this feature inside a script which was parsing mail logs and uploading the lines that matched certain regular expressions as items to a DynamoDB table. When I used the standard item-at-a-time method, it took 7 hours to write 2 million items into the table. When using BatchWriteItem, it only took 26 minutes, which is roughly a 16x improvement.
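Because of the 25-item limit, a script like this has to cut its stream of items into chunks before handing them over. Here is a minimal sketch of that chunking step, not the actual script: `write_batch` is a hypothetical callback standing in for whatever issues the real BatchWriteItem request (for example a thin wrapper around boto), and the helper names are made up for illustration.

```python
from itertools import islice

MAX_BATCH_SIZE = 25  # per-request item limit quoted in the post


def chunked(items, size=MAX_BATCH_SIZE):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_in_batches(items, write_batch, size=MAX_BATCH_SIZE):
    """Call write_batch(batch) once per chunk; return the number of calls.

    write_batch is a hypothetical callback (an assumption of this sketch)
    that would send one BatchWriteItem request for the given items.
    """
    calls = 0
    for batch in chunked(items, size):
        write_batch(batch)
        calls += 1
    return calls
```

With chunks of 25, the 2 million items above turn into 80,000 requests instead of 2 million. A real `write_batch` would also need to retry any items that DynamoDB reports back as unprocessed, which this sketch leaves out.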


Initial experiences with Amazon DynamoDB

I've been experimenting a bit with Amazon DynamoDB (the "fully managed NoSQL database service that provides fast and predictable performance with seamless scalability", according to Amazon) in order to see how easy it is to use and what kind of performance you can get out of it. My initial impressions are favorable, with some caveats.

Defining tables

To get started with DynamoDB, you can use the AWS Console web interface. You need to define a table by giving it a name. Then you need to define a hash key, which enables DynamoDB to build an unordered hash index for partitioning and querying purposes. You can also define a range key, in which case DynamoDB will build an unordered hash index on the hash key and a sorted range index on the range key. For most intents and purposes, the range key will be some sort of timestamp-related attribute of your data. You can find out more details in the DynamoDB Data Model documentation.
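To make the hash key / range key distinction more concrete, here is a toy in-memory model. It is only a sketch of the idea, not how DynamoDB is implemented: the `ToyTable` class, its method names and the sample keys are all made up for illustration. The hash key picks an unordered bucket, and within each bucket the rows are kept sorted by range key so that a range of timestamps can be read back in order.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a hash key and a range key."""

    def __init__(self):
        # hash key -> list of (range_key, item), kept sorted by range_key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)  # same primary key: overwrite
        else:
            rows.insert(pos, (range_key, item))

    def query(self, hash_key, start, end):
        """Return items for hash_key with start <= range_key <= end, in order."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [item for _, item in rows[lo:hi]]


table = ToyTable()
table.put("mail01", 1335000300, "third line")
table.put("mail01", 1335000100, "first line")
table.put("mail01", 1335000200, "second line")
table.put("mail02", 1335000150, "other host")

# Items for one hash key come back ordered by the (timestamp) range key.
recent = table.query("mail01", 1335000100, 1335000200)
```

In this model a lookup always starts from a hash key, and the range key only narrows and orders the results within it.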

The most confusing part when defining a tabl…