- "Blameless PostMortems and a Just Culture" -- John Allspaw describes the postmortem process at Etsy, in particular how it encourages engineers to own up to their mistakes and to help others avoid them, all in a safe and blame-free environment
- "Sprint.ly's Continuous Integration" -- good overview of what should by now be industry-standard methods and tools for rapid deployments and continuous integration
- "Big List of 20 Common Bottlenecks" -- via the super useful High Scalability blog, a list of bottlenecks that you will hit one way or another if you do any serious high-volume systems work
- "BASE: An ACID alternative" -- Dan Pritchett from eBay coined the term BASE (basically available, soft state, eventually consistent) to compare and contrast NoSQL systems with traditional ACID-based relational databases; very good overview of these types of systems
- "Calvin: Fast Distributed Transactions for Partitioned Database Systems" (PDF) -- Daniel Abadi and collaborators write about a distributed transaction mechanism that can sit on top of non-transactional storage systems, transforming them into scalable, highly-available ACID databases
- "Creating a BOSH from scratch on AWS" -- great tutorial from Dr Nic from Engine Yard on installing CloudFoundry's BOSH tool on AWS
- "Amazon's Journey to the Cloud" -- very good presentation by John Rauser on the long and winding road taken by Amazon from their beginnings running on one machine to the launch of AWS technologies
- "Engineering Change" -- short but very insightful presentation by Etsy's CTO Kellan Elliott-McCrea on continuous deployment strategies and metrics-driven development
- "People Make Poor Monitors for Computers" -- eye-opening article on the dangers of relying on highly automated and sophisticated monitoring systems; when they fail, human operators are expected to jump in and fix the issues, but those rare issues are extremely hard to diagnose and fix for humans who have lost their edge by relying on the automated systems in the first place!
- "vbench - benchmarking performance through time" -- from Wes McKinney's pandas project, vbench is a lightweight Python library for measuring code performance and catching performance regressions
- "Big Data -- a little analysis" -- switching gears to Big Data, here's a good overview/taxonomy of types of problems in this space, based on data volume and algorithm complexity, courtesy of Chris Swan
- "The unsexy side of big data: 5 tools to manage your Hadoop cluster" -- some tools I had never heard of for managing Hadoop clusters, including Apache Ambari and Apache Mesos
- "Online resources for handling big data and parallel computing in R" -- from the R-bloggers blog aggregator, a useful collection of links, mostly on parallel computing with R
- "Much to like about HBaseCon" -- quick overview of some talks from last week's HBaseCon 2012
I also want to give a shout-out here to Gareth Rushgrove, who publishes an email newsletter called 'Devops Weekly'. If you are working in this field, I highly recommend you subscribe to it, as it is always full of interesting links and summaries of articles and tools.