Wednesday, April 25, 2012

Using DynamoDB BatchWriteItem with boto

This is just a quick note about the advantage of using DynamoDB's newly introduced BatchWriteItem functionality, which lets you write multiple items to a table in a single request, with the write operation parallelized behind the scenes by DynamoDB. Currently a single BatchWriteItem call is limited to 25 put or delete operations.

I was glad to see that the boto library already supports this new feature -- the fact that Mitch Garnaat is now an employee of Amazon probably helps too ;-) You do have to git pull the latest boto code from GitHub, since BatchWriteItem is not available in the latest boto release 2.3.0.

I tested this feature inside a script that parsed mail logs and uploaded the lines matching certain regular expressions as items to a DynamoDB table. When I used the standard item-at-a-time method, it took 7 hours to write 2 million items into the table. When using BatchWriteItem, it took only 26 minutes -- roughly a 16x improvement (420 minutes vs. 26).

Here's how I used this new functionality with boto:

1) I created a DynamoDB connection object and a table object:

import boto

dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID, aws_secret_access_key=MY_SECRET_ACCESS_KEY)

mytable = dynamodb_conn.get_table('mytable')

2) I created a batch_list object:

batch_list = dynamodb_conn.new_batch_write_list()

3) I populated this object with a list of DynamoDB items:

batch_list.add_batch(mytable, puts=items)

where items is a Python list of item objects, each obtained via

mytable.new_item(attrs=item_attributes)

4) I used the batch_write_item method of boto's layer2 connection object to write the batch list:

dynamodb_conn.batch_write_item(batch_list)

That was about it. I definitely recommend using BatchWriteItem whenever you can, for the speedup it provides.
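Because of the 25-item limit mentioned above, a list longer than 25 items has to be split across several BatchWriteItem calls. Here is a minimal sketch of one way to do that, using only the boto calls shown in the steps above. The helper names chunks and write_in_batches are hypothetical (they are not part of boto), and conn and table are assumed to be the connection and table objects from step 1.

```python
def chunks(seq, size=25):
    """Yield successive slices of seq holding at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def write_in_batches(conn, table, items, batch_size=25):
    """Write `items` to `table`, one BatchWriteItem call per chunk.

    Hypothetical helper: `conn` and `table` are assumed to be the boto
    connection and table objects created in step 1. Returns the list of
    responses so the caller can inspect them.
    """
    responses = []
    for chunk in chunks(items, batch_size):
        # A fresh batch list per call keeps each request within the limit.
        batch_list = conn.new_batch_write_list()
        batch_list.add_batch(table, puts=chunk)
        responses.append(conn.batch_write_item(batch_list))
    return responses
```

With the objects from the steps above, this would be called as write_in_batches(dynamodb_conn, mytable, items). Note that the sketch does not retry: the BatchWriteItem API can report some items as unprocessed (for example under throttling), so a production script would probably want to check each response and resubmit those.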

2 comments:

Anonymous said...

Hi, firstly, nice job putting this out there.

However, when I tried to batch write 30 items, I ran into this error

boto.dynamodb.exceptions.DynamoDBValidationError: DynamoDBValidationError: 400 Bad Request
{'message': 'Too many items requested for the BatchWriteItem call', '__type': 'com.amazon.coral.validate#ValidationException'}

From the documentation, I don't see any limit imposed:
http://docs.pythonboto.org/en/latest/ref/dynamodb.html#boto.dynamodb.batch.BatchWriteList

Any thoughts on what I might be doing wrong?

Anonymous said...

@Anonymous

The underlying AWS API only allows 25 tasks (puts or deletes) per batch write.