This is just a quick note about the advantage of using DynamoDB's newly introduced BatchWriteItem functionality, which allows you to write multiple items at the same time to a table, with the write operations parallelized behind the scenes by DynamoDB. Currently a single BatchWriteItem call is limited to 25 put or delete requests.
I was glad to see that the boto library already supports this new feature -- the fact that Mitch Garnaat is now an employee of Amazon probably helps too ;-) You do have to git pull the latest boto code from GitHub, since BatchWriteItem is not available in the latest boto release, 2.3.0.
I tested this feature inside a script which was parsing mail logs and uploading lines matching certain regular expressions as items to a DynamoDB table. With the standard item-at-a-time method, it took 7 hours to write 2 million items into the table. With BatchWriteItem, it took only 26 minutes -- roughly a 16x improvement.
Here's how I used this new functionality with boto:
1) I created a DynamoDB connection object and a table object:
dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID, aws_secret_access_key=MY_SECRET_ACCESS_KEY)
mytable = dynamodb_conn.get_table('mytable')
2) I created a batch_list object:
batch_list = dynamodb_conn.new_batch_write_list()
3) I populated this object with the list of DynamoDB items to be written; add_batch takes the target table and a list of puts:
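batch_list.add_batch(mytable, puts=items)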
where items is a Python list containing item objects obtained via the table's new_item method, along these lines (the key and attribute names below are placeholders for whatever your parsing code produces):
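item = mytable.new_item(hash_key=log_line_id, attrs=log_line_attrs)  # placeholder names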
4) I used the batch_write_item method of the layer2 module in boto to write the batch list:
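response = dynamodb_conn.batch_write_item(batch_list)

Note that DynamoDB can throttle part of a batch; anything that was not written shows up under the UnprocessedItems entry of the response, and a robust script should inspect that entry and retry those items.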
That was about it; a sketch putting all the pieces together follows below. I definitely recommend using BatchWriteItem whenever you can, for the speedup it provides.
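For reference, here is a minimal end-to-end sketch of the approach, under a few assumptions of mine: a table whose schema has only a hash key, a hypothetical parse_line() helper that turns a mail log line into a (key, attributes) pair or None, and a placeholder log file name. The chunking into groups of 25 is there because of the batch size limit mentioned above:

import boto

MY_ACCESS_KEY_ID = '...'
MY_SECRET_ACCESS_KEY = '...'

dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID,
                                      aws_secret_access_key=MY_SECRET_ACCESS_KEY)
mytable = dynamodb_conn.get_table('mytable')

# build the item objects from the mail log (parse_line is a hypothetical helper)
items = []
for line in open('maillog'):
    parsed = parse_line(line)
    if parsed is None:
        continue  # line did not match any of the regular expressions
    key, attrs = parsed
    items.append(mytable.new_item(hash_key=key, attrs=attrs))

# BatchWriteItem accepts at most 25 put/delete requests per call,
# so send the items in chunks of 25
for i in range(0, len(items), 25):
    batch_list = dynamodb_conn.new_batch_write_list()
    batch_list.add_batch(mytable, puts=items[i:i+25])
    response = dynamodb_conn.batch_write_item(batch_list)
    unprocessed = response.get('UnprocessedItems')
    if unprocessed:
        print 'items to retry:', unprocessed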