Agile Testing: May 2012

Tuesday, May 29, 2012

A sweep through my Instapaper for May 2012

Here are some of the presentations/blog posts/articles I read this month, as saved in my Instapaper account. Maybe you'll find something useful in there too.

"Blameless PostMortems and a Just Culture" -- John Allspaw describes the postmortem process at Etsy, in particular how it encourages engineers to own up to their mistakes and to help others avoid them, all in a safe and blame-free environment
"Sprint.ly's Continuous Integration" -- good overview of what should be by now industry-standard methods and tools for rapid deployments and continuous integration
"Big List of 20 Common Bottlenecks" -- via the super useful High Scalability blog, a list of bottlenecks that you will hit one way or another if you do any serious high-volume systems work
"BASE: An ACID alternative" -- Dan Pritchett from eBay coined the term BASE (basically available, soft state, eventually consistent) to compare and contrast NoSQL systems with traditional ACID-based relational databases; very good overview of these types of systems
"Calvin: Fast Distributed Transactions for Partitioned Database Systems" (PDF) -- Daniel Abadi and collaborators write about a distributed transaction mechanism that can sit on top of non-transactional storage systems, transforming them into scalable, highly-available ACID databases
"Creating a BOSH from scratch on AWS" -- great tutorial from Dr Nic from Engine Yard on installing CloudFoundry's BOSH tool on AWS
"Amazon's Journey to the Cloud" -- very good presentation by John Rauser on the long and winding road taken by Amazon from their beginnings running on a single machine to the launch of AWS technologies
"Engineering Change" -- short but very insightful presentation by Etsy's CTO Kellan Elliott-McCrea on continuous deployment strategies and metrics-driven development
"People Make Poor Monitors for Computers" -- eye-opening article on the dangers of relying on highly automated and sophisticated monitoring systems; when they fail, human operators are expected to jump in and fix the issues, but those rare issues are extremely hard to diagnose and fix for humans who have lost their edge by relying on the automated systems in the first place
"vbench - benchmarking performance through time" -- from Wes McKinney's pandas project, vbench is a lightweight Python library for measuring code performance and catching performance regressions
"Big Data -- a little analysis" -- switching gears to Big Data, here's a good overview/taxonomy of types of problems in this space, based on data volume and algorithm complexity, courtesy of Chris Swan
"The unsexy side of big data: 5 tools to manage your Hadoop cluster" -- some tools I had never heard of for managing Hadoop clusters, including Apache Ambari and Apache Mesos
"Online resources for handling big data and parallel computing in R" -- from the R-bloggers blog aggregator, a useful collection of links, mostly on parallel computing with R
"Much to like about HBaseCon" -- quick overview of some talks from last week's HBaseCon 2012

I also want to give a shout-out here to Gareth Rushgrove, who publishes an email newsletter called 'Devops Weekly'. If you are working in this field, I highly recommend you subscribe to it, as it is always full of interesting links to and summaries of articles and tools.

Monday, May 14, 2012

The correct way of using DynamoDB BatchWriteItem with boto

In my previous post I wrote about the advantages of using the BatchWriteItem functionality in DynamoDB. As it turns out, I was overly optimistic when I wrote my initial code: I only called the batch_write_item method of the layer2 module in boto once.

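The problem with this approach is that some of the batched inserts can fail, and in practice this happens quite frequently, probably because of transient network errors or because the table's provisioned write throughput is exceeded (DynamoDB hands throttled writes back as unprocessed items rather than failing the whole call). The correct approach is to inspect the response object returned by batch_write_item -- here is an example of such an object: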

{'Responses': {'mytable': {'ConsumedCapacityUnits': 5.0}},
'UnprocessedItems': {'mytable': [
{'PutRequest': {'Item': {'mykey': 'key1', 'myvalue': 'value1'}}},
{'PutRequest': {'Item': {'mykey': 'key2', 'myvalue': 'value2'}}},
{'PutRequest': {'Item': {'mykey': 'key3', 'myvalue': 'value3'}}}]}}

You need to look for the value corresponding to the 'UnprocessedItems' key. This value is a dictionary keyed by the name of the table you're inserting items in. The value corresponding to that key gives you a list of other dictionaries with keys corresponding to the operations you applied to the table ('PutRequest' in my case). Going one level deeper allows you to finally obtain the attributes (keys + values) of the items that failed, which you can then try to re-insert.
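
The traversal described above can be sketched in a few lines of plain Python. This is only an illustration working on the response dictionary shown earlier; the helper name unprocessed_put_items is my own invention, not part of boto, and it assumes every failed operation is a 'PutRequest'.

```python
def unprocessed_put_items(response, table_name):
    """Return the attribute dicts of the puts that DynamoDB did not process."""
    # 'UnprocessedItems' may be missing or empty when every write succeeded.
    requests = response.get('UnprocessedItems', {}).get(table_name, [])
    # Each entry is keyed by its operation; only puts carry an 'Item'.
    return [req['PutRequest']['Item'] for req in requests if 'PutRequest' in req]


# Same shape as the example response above.
response = {
    'Responses': {'mytable': {'ConsumedCapacityUnits': 5.0}},
    'UnprocessedItems': {'mytable': [
        {'PutRequest': {'Item': {'mykey': 'key1', 'myvalue': 'value1'}}},
        {'PutRequest': {'Item': {'mykey': 'key2', 'myvalue': 'value2'}}},
        {'PutRequest': {'Item': {'mykey': 'key3', 'myvalue': 'value3'}}}]},
}

failed_items = unprocessed_put_items(response, 'mytable')
for item in failed_items:
    print(item['mykey'], item['myvalue'])
```

The list that comes back holds exactly the key/value attributes you would feed into the next write attempt.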

So basically you need to stay in a loop and keep calling batch_write_item until UnprocessedItems comes back empty. Here is a gist containing code that reads a log file in lzop format, looks for lines containing a key + white space + a value, then inserts items based on those key/value pairs into a DynamoDB table. I've been pretty happy with this approach.
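
Independent of the gist, the shape of that retry loop looks roughly like the sketch below. It is not the gist's code: send_batch is a hypothetical callable standing in for whatever wraps your batch_write_item call and returns the response dictionary, and the attempt cap and exponential back-off are my own assumptions, added so the loop cannot spin forever. It also assumes the caller passes no more than the 25 items a single BatchWriteItem request accepts.

```python
import time


def batch_write_with_retry(items, send_batch, table_name,
                           max_attempts=8, base_delay=0.1, sleep=time.sleep):
    """Resend unprocessed items until none are left; return the number of calls."""
    pending = list(items)
    attempt = 0
    while pending:
        if attempt >= max_attempts:
            raise RuntimeError('%d items still unprocessed after %d attempts'
                               % (len(pending), attempt))
        response = send_batch(pending)
        # Collect whatever DynamoDB handed back for this table.
        requests = response.get('UnprocessedItems', {}).get(table_name, [])
        pending = [req['PutRequest']['Item'] for req in requests]
        if pending:
            # Back off before retrying so a throttled table can recover.
            sleep(base_delay * 2 ** attempt)
        attempt += 1
    return attempt


# Stand-in for DynamoDB: rejects the last two items on the first call only.
calls = []

def fake_send_batch(batch):
    calls.append(list(batch))
    if len(calls) == 1:
        leftover = [{'PutRequest': {'Item': item}} for item in batch[-2:]]
        return {'UnprocessedItems': {'mytable': leftover}}
    return {'Responses': {'mytable': {'ConsumedCapacityUnits': 2.0}},
            'UnprocessedItems': {}}


items = [{'mykey': 'key%d' % i, 'myvalue': 'value%d' % i} for i in range(1, 4)]
attempts = batch_write_with_retry(items, fake_send_batch, 'mytable',
                                  sleep=lambda seconds: None)
print(attempts, [len(batch) for batch in calls])
```

With real boto code, send_batch would build the batch list from the pending items and return the result of batch_write_item; only that wrapper touches the network, which keeps the loop itself easy to test.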

Before I finish, I'd like to reiterate the gripe I have about the static nature of determining your Read and Write Throughput when dealing with DynamoDB. I understand that it makes life easier for AWS in terms of the capacity planning they have to do on their end to scale the table across multiple instances, but it's a black art when it comes to capacity planning you need to do as a user. You almost always end up overcommitting as a DynamoDB user, and it's hard to make sense sometimes of the capacity units you're consuming, especially when doing inserts of large volumes of data.
