Werner Vogels is the CTO of Amazon. You can watch a talk he gave at the QCon conference on the topics of Availability and Consistency. The bottom line is that, as systems scale (and for amazon.com that means hundreds of thousands of systems), you have to pick 2 of the following 3: Consistency, Availability, Partitioning (actually the full name of the third one is "Tolerance to network partitioning.) This is called the CAP theorem, and Eric Brewer from Inktomi first came up with it.
Vogels pretty much equated partitioning with failure. Failure is inevitable, so you have to choose it out of those 3 properties. You're left with a choice between consistency and availability, or between ACID and BASE. According to Vogels, it turns out there's also a middle-of-the-road approach, where you choose a specific approach based on the needs of a particular service. He gave the example of the checkout process on amazon.com. When customers want to add items to their shopping cart, you ALWAYS want to honor that request (obviously because that's $$$ in the bank for you). So you choose high availability, and you hide errors from the customers in the hope that the system will sort out the errors at a later stage. When the customer hits the 'Submit order' button, you want high consistency for the next phase, because several sub-systems access that data at the same time (credit card processing, shipping and handling, reporting, etc.).
I also liked the approach Amazon takes when splitting people into teams. They have the 2-pizza rule: if it takes more than 2 pizzas to feed a team, it means the team is too large and needs to be split up. This equates to about 8 people per team. They actually make architectural decisions based on team size. If a feature is deemed to large to be comprehended by a team of 8 people, they split the feature into smaller pieces that can be digested more easily. Very agile approach :-)
Anyway, good presentation, highly recommended.
Wednesday, August 08, 2007
Werner Vogels talk at QCon
Subscribe to: Post Comments (Atom)
Modifying EC2 security groups via AWS Lambda functions
One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...
This post is a continuation of my previous one on " Running Gatling tests in Docker containers via Jenkins ". As I continued to se...
For the last month or so I've been experimenting with Rancher as the orchestration layer for Docker-based deployments. I've been pr...
Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms inte...
i found this blog entry on Google and a saw this video of Werner Vogels - great!
Do you have some more details about the agile process at Amazon?
Post a Comment