Using DynamoDB BatchWriteItem with boto

This is a quick note on the advantage of using DynamoDB's newly introduced BatchWriteItem functionality, which lets you write multiple items to a table in a single call, with the write operations parallelized behind the scenes by DynamoDB. Currently a single BatchWriteItem call is limited to 25 put or delete requests.

I was glad to see that the boto library already supports this new feature -- the fact that Mitch Garnaat is now an employee of Amazon probably helps too ;-) You do have to git pull the latest boto code from GitHub, since BatchWriteItem is not available in the latest boto release 2.3.0.

I tested this feature inside a script that parsed mail logs and uploaded the lines matching certain regular expressions as items to a DynamoDB table. With the standard item-at-a-time method, it took 7 hours to write 2 million items into the table. With BatchWriteItem, it took only 26 minutes -- roughly a 16x improvement.
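
As a rough sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. The request count assumes every batch carried the full 25 items, which is an assumption made here for illustration and not something measured in the test above.

```python
import math

TOTAL_ITEMS = 2000000   # items written in the test described above
MAX_BATCH_SIZE = 25     # per-call limit for BatchWriteItem

single_minutes = 7 * 60  # item-at-a-time run: 7 hours
batch_minutes = 26       # BatchWriteItem run

# Ratio of the two wall-clock times.
speedup = single_minutes / float(batch_minutes)

# Minimum number of batch calls, ASSUMING every batch is full.
batch_requests = int(math.ceil(TOTAL_ITEMS / float(MAX_BATCH_SIZE)))

print("speedup: %.1fx" % speedup)
print("batch requests: %d" % batch_requests)
```

In other words, under that assumption the batched run needs 80,000 API calls where the item-at-a-time run needs 2 million.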

Here's how I used this new functionality with boto:

1) I created a DynamoDB connection object and a table object:

import boto

dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID, aws_secret_access_key=MY_SECRET_ACCESS_KEY)

mytable = dynamodb_conn.get_table('mytable')

2) I created a batch_list object:

batch_list = dynamodb_conn.new_batch_write_list()

3) I populated this object with a list of DynamoDB items:

batch_list.add_batch(mytable, puts=items)

where items is a Python list of item objects, each obtained via:

mytable.new_item(attrs=item_attributes)

4) I used the batch_write_item method of the connection object (boto's layer2 module) to write the batch list:

dynamodb_conn.batch_write_item(batch_list)
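
Because of the 25-item limit, a list longer than that has to be split into several batches. The steps above can be wrapped in a small helper along these lines. This is a minimal sketch, not the code from my script: chunks and batch_write_all are hypothetical names introduced here for illustration, and the only boto calls used are the three shown in the steps above.

```python
MAX_BATCH_SIZE = 25  # per-call limit for BatchWriteItem


def chunks(seq, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of seq, each holding at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def batch_write_all(conn, table, items):
    """Write items in groups of at most MAX_BATCH_SIZE.

    conn and table are expected to be the connection and table objects
    from step 1. Returns the number of batch_write_item calls made.
    (Hypothetical helper: it does not inspect the responses.)
    """
    calls = 0
    for chunk in chunks(items):
        batch_list = conn.new_batch_write_list()   # step 2
        batch_list.add_batch(table, puts=chunk)    # step 3
        conn.batch_write_item(batch_list)          # step 4
        calls += 1
    return calls
```

It would be called as batch_write_all(dynamodb_conn, mytable, items). Note that this sketch ignores the response from batch_write_item; the BatchWriteItem API can report items it did not process (for example when provisioned throughput is exceeded), so a production script would probably want to check the response and retry those.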

That was about it. I definitely recommend using BatchWriteItem whenever you can, for the speedup it provides.

Comments

Anonymous said…
Hi, firstly, nice job putting this out there.

However, when I tried to batch write 30 items, I ran into this error:

boto.dynamodb.exceptions.DynamoDBValidationError: DynamoDBValidationError: 400 Bad Request
{'message': 'Too many items requested for the BatchWriteItem call', '__type': 'com.amazon.coral.validate#ValidationException'}

From the documentation, I don't see any limit imposed:
http://docs.pythonboto.org/en/latest/ref/dynamodb.html#boto.dynamodb.batch.BatchWriteList

Any thoughts on what I might be doing wrong?

Anonymous said…
@Anonymous

The underlying AWS API only allows 25 tasks (puts or deletes) per batch write.
