This is just a quick note about the advantage of using DynamoDB's newly introduced BatchWriteItem functionality, which lets you write multiple items to a table in a single request, with the write operation parallelized behind the scenes by DynamoDB. Currently a single BatchWriteItem call is limited to 25 put or delete requests.
I was glad to see that the boto library already supports this new feature -- the fact that Mitch Garnaat is now an employee of Amazon probably helps too ;-) You do have to git pull the latest boto code from GitHub, since BatchWriteItem is not available in the latest boto release 2.3.0.
I tested this feature inside a script which was parsing mail logs and uploading lines corresponding to certain regular expressions as items to a DynamoDB table. When I used the standard item-at-a-time method, it took 7 hours to write 2 million items into the table. When using BatchWriteItem, it only took 26 minutes -- so a 16x improvement.
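To put those numbers in perspective, here is the arithmetic as a small sketch. The variable names are mine; the only inputs are the figures quoted above (2 million items, 7 hours, 26 minutes).

```python
# Figures quoted in the post; everything else is derived from them.
ITEMS = 2000000
single_minutes = 7 * 60      # item-at-a-time run: 7 hours
batch_minutes = 26           # BatchWriteItem run: 26 minutes

# Average write rates in items per second.
single_rate = ITEMS / float(single_minutes * 60)
batch_rate = ITEMS / float(batch_minutes * 60)

# Ratio of the two run times.
speedup = single_minutes / float(batch_minutes)

print("item-at-a-time: %.0f items/s" % single_rate)
print("batch writes:   %.0f items/s" % batch_rate)
print("speedup:        %.1fx" % speedup)
```

These are averages over the whole run, so they include the log parsing time as well as the writes.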
Here's how I used this new functionality with boto:
1) I created a DynamoDB connection object and a table object:
import boto
dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID, aws_secret_access_key=MY_SECRET_ACCESS_KEY)
mytable = dynamodb_conn.get_table('mytable')
2) I created a batch_list object:
batch_list = dynamodb_conn.new_batch_write_list()
3) I populated this object with a list of DynamoDB items:
batch_list.add_batch(mytable, puts=items)
where items is a Python list containing item objects obtained via
mytable.new_item(attrs=item_attributes)
4) I used the batch_write_item method of the layer2 module in boto to write the batch list:
dynamodb_conn.batch_write_item(batch_list)
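Since a single call accepts at most 25 puts or deletes, a longer list of items has to be split up first. Below is a minimal sketch of one way to do that, not the exact code from my script. The helper names chunked and write_in_batches are hypothetical; conn and table stand for the dynamodb_conn and mytable objects from step 1, and the only boto calls used are the ones shown in the steps above.

```python
def chunked(seq, size):
    """Yield successive slices of seq, each with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def write_in_batches(conn, table, attr_dicts, batch_size=25):
    """Write a list of attribute dicts to `table`, batch_size items per call.

    conn and table are assumed to be the connection and table objects
    from step 1. Returns the number of batch write calls issued.
    """
    calls = 0
    for chunk in chunked(attr_dicts, batch_size):
        # Step 3: turn each attribute dict into an item object.
        items = [table.new_item(attrs=attrs) for attrs in chunk]
        # Steps 2 and 3: a fresh batch list for every chunk.
        batch_list = conn.new_batch_write_list()
        batch_list.add_batch(table, puts=items)
        # Step 4: send the batch.
        conn.batch_write_item(batch_list)
        calls += 1
    return calls
```

It would be called as write_in_batches(dynamodb_conn, mytable, list_of_attribute_dicts). Note that the sketch ignores the response of each call; DynamoDB can report some items as unprocessed, and a real script would probably want to check for that and resend them.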
That was about it. I definitely recommend using BatchWriteItem whenever you can, for the speedup it provides.
2 comments:
Hi, firstly, nice job putting this out there.
However, when I tried to batch write 30 items, I ran into this error:
boto.dynamodb.exceptions.DynamoDBValidationError: DynamoDBValidationError: 400 Bad Request
{'message': 'Too many items requested for the BatchWriteItem call', '__type': 'com.amazon.coral.validate#ValidationException'}
From the documentation, I don't see any limit imposed:
http://docs.pythonboto.org/en/latest/ref/dynamodb.html#boto.dynamodb.batch.BatchWriteList
Any thoughts on what I might be doing wrong?
@Anonymous
The underlying AWS API only allows 25 tasks (puts or deletes) per batch write.