This is a quick note about the advantage of using DynamoDB's newly introduced BatchWriteItem functionality, which lets you write multiple items to a table in a single request, with the write operation parallelized behind the scenes by DynamoDB. Currently there is a limit of 25 put or delete operations per batch request.
I was glad to see that the boto library already supports this new feature -- the fact that Mitch Garnaat is now an employee of Amazon probably helps too ;-) You do have to git pull the latest boto code from GitHub, since BatchWriteItem is not available in the latest boto release 2.3.0.
I tested this feature inside a script that parses mail logs and uploads the lines matching certain regular expressions as items to a DynamoDB table. With the standard item-at-a-time method, it took 7 hours to write 2 million items into the table. With BatchWriteItem, it took only 26 minutes -- roughly a 16x improvement.
Here's how I used this new functionality with boto:
1) I created a DynamoDB connection object and a table object:
import boto
dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID, aws_secret_access_key=MY_SECRET_ACCESS_KEY)
mytable = dynamodb_conn.get_table('mytable')
2) I created a batch_list object:
batch_list = dynamodb_conn.new_batch_write_list()
3) I populated this object with a list of DynamoDB items:
batch_list.add_batch(mytable, puts=items)
where items is a Python list containing item objects obtained via
mytable.new_item(attrs=item_attributes)
4) I used the batch_write_item method of the layer2 module in boto to write the batch list:
dynamodb_conn.batch_write_item(batch_list)
That was about it. I definitely recommend using BatchWriteItem whenever you can, for the speedup it provides.
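Since a single batch is capped at 25 puts or deletes, a longer list of items has to be split into groups before it is written. Here is a minimal sketch of how that could look, reusing only the boto calls from the steps above. The names chunks and batch_write_all are hypothetical helpers of mine, not part of boto, and the sketch makes no attempt to retry anything DynamoDB reports back as unprocessed.

```python
MAX_BATCH_SIZE = 25  # per-request limit on puts/deletes mentioned above


def chunks(items, size=MAX_BATCH_SIZE):
    """Yield successive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_write_all(conn, table, items):
    """Write `items` to `table` in groups of at most 25.

    `conn` is assumed to be the connection object from step 1 and
    `table` the table object; one batch_write_item call is issued per
    group. Returns the number of batch calls made.
    """
    calls = 0
    for chunk in chunks(items):
        batch_list = conn.new_batch_write_list()
        batch_list.add_batch(table, puts=chunk)
        conn.batch_write_item(batch_list)
        calls += 1
    return calls
```

With the objects from the steps above, the call would be batch_write_all(dynamodb_conn, mytable, items). For 60 items that means three requests of 25, 25 and 10 items.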
2 comments:
Hi, firstly, nice job putting this out there.
However, when I tried to batch write 30 items, I ran into this error
boto.dynamodb.exceptions.DynamoDBValidationError: DynamoDBValidationError: 400 Bad Request
{'message': 'Too many items requested for the BatchWriteItem call', '__type': 'com.amazon.coral.validate#ValidationException'}
From the documentation, I don't see any limit imposed
http://docs.pythonboto.org/en/latest/ref/dynamodb.html#boto.dynamodb.batch.BatchWriteList
Any thoughts on what I might be doing wrong?
@Anonymous
The underlying AWS API only allows 25 tasks (puts or deletes) per batch write.