Tuesday, July 21, 2009

Automated testing of production deployments

When you work as a systems engineer at a company that has a large-scale system infrastructure, sooner or later you realize that you need to automate pretty much everything you do. You can't afford not to if you want to keep up with the ever-present demands of scaling the infrastructure up and down.

The main promise of cloud computing -- infinite elastic scaling based on demand -- is real, but you can only achieve it if you automate your deployments. It's fairly safe to say that most teams that are involved in such infrastructures have achieved high levels of automation. Some fearless teams practice continuous deployment, others do frequent dark launches. All these practices are great, but my thesis is that in order to achieve fearlessness you need automated tests of your production deployments.

Note the word 'production' -- I believe it is necessary to go one step beyond running automated tests in an isolated staging environment (although that is a very good thing to do, especially if staging mirrors production at a smaller scale). That next step is to run your test harness in production, every time you deploy. And deployment, at a fast-moving Web company these days, can happen multiple times a day. Trust me, with no automated tests in place, you'll never get rid of that nagging feeling in the pit of your stomach that you might have broken things horribly, in production.

So how do you go about writing automated tests for your deployments? I wrote a while ago about automating and testing your system setup checklists. Even testing small things such as 'is httpd/mysqld/postfix set up to run at boot time' will go a long way toward achieving peace of mind.

Assuming you have a list of things to test (it can be just a couple of critical things for starters), how and when do you run the tests? Again, you can do the simplest thing that works -- a bash shell script that iterates through your production servers and runs the test scripts remotely on the servers via ssh. Some things I test this way these days are:

* do local MySQL databases on servers in a particular cluster contain the same data in certain tables? (this shows me that things are in sync across servers)
* is MySQL replication working as expected across the cluster of read-only slaves?
* are periodic operations happening as expected? (here I can do a simple tail of a log file to figure it out)
* are certain PHP modules correctly installed?
* is Apache serving a number of requests per second that is not too high, but not too low either? (where high and low are obviously highly dependent on your traffic and application)
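
As a rough sketch of the ssh loop described above, here is the same idea in Python rather than bash. The host names, check names, and remote commands are placeholders made up for illustration, and the `runner` parameter is only there so the loop can be exercised without touching real servers.

```python
import subprocess

# Hypothetical hosts and checks; substitute your own.
HOSTS = ["web01.example.com", "web02.example.com"]
CHECKS = {
    "php_memcache_module": "php -m | grep -qix memcache",
    "httpd_running": "pgrep httpd > /dev/null",
}


def build_ssh_command(host, remote_command):
    """Return the argv list that runs remote_command on host over ssh."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            host, remote_command]


def run_checks(hosts, checks, runner=subprocess.run):
    """Run every check on every host; return (host, check, output) failures."""
    failures = []
    for host in hosts:
        for name, remote_command in sorted(checks.items()):
            result = runner(build_ssh_command(host, remote_command),
                            capture_output=True, text=True)
            # A non-zero exit status from the remote command marks a failure.
            if result.returncode != 0:
                output = (result.stdout or result.stderr or "").strip()
                failures.append((host, name, output))
    return failures
```

Calling `run_checks(HOSTS, CHECKS)` after a push and printing whatever comes back is enough to get started; an empty list means every check passed on every server.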

I run these tests (and many others) each time I push a change to production. No matter how small a change may seem, it can have unanticipated side effects. I found that tests that probe the system from as many angles as possible are the most effective -- the angles in my case being Apache, MySQL, PHP and memcached, for example. I also found that this type of testing (push-based, if you want) is very good at showing discrepancies between servers. If you see a server being out of whack this way, then you know you need to attempt to fix it, or even terminate it and deploy a new one.
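
One way to make those discrepancies jump out is to collect the same value from every server and flag whichever servers disagree with the majority. The sketch below assumes you have already gathered one value per host (a table checksum, say, or a module version); the function name and the sample data are made up for illustration.

```python
from collections import Counter


def find_outliers(value_by_host):
    """Return hosts whose value differs from the most common one, sorted by name."""
    if not value_by_host:
        return []
    majority_value, _count = Counter(value_by_host.values()).most_common(1)[0]
    return sorted(host for host, value in value_by_host.items()
                  if value != majority_value)


# Hypothetical checksums collected from three database servers.
checksums = {"db01": "3f2a", "db02": "3f2a", "db03": "91bc"}
suspects = find_outliers(checksums)
```

With only two servers, or an even split, there is no meaningful majority, so in that case it is safer to treat any disagreement as a failure and look at all the servers involved.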

Another approach in your automated testing strategy is to run your test harness periodically (via cron for example) and also to write the harness in a proper language (Python comes to mind), integrated into a test framework. You can have the results of the tests emailed to you in case of failure. The advantage of this approach is that you can have things run automatically without your intervention (in the first approach, you still have to remember to run the test suite!).

The ultimate in terms of automated testing is to integrate it with your monitoring infrastructure. If you use Nagios for example, you can easily write plugins that essentially probe for the same things that your tests probe for. The advantage of this approach is that the tests will run every time Nagios runs, and you can set up alerts easily. One disadvantage is that it can slow down your monitoring, depending on the number of tests you need to run on each server. Monitoring typically happens very often (every 5 minutes is a common practice), so it may be overkill to run all the tests every 5 minutes. Of course, this should be configurable in your monitoring tool, so you can have a separate class of checks that only happen every N hours for example.
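
To give an idea of what such a plugin might look like, here is a sketch of the 'not too high, not too low' Apache check. A Nagios plugin reports its status through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and a line of text on standard output. The function names, the `APACHE_RPS` label and the default thresholds are assumptions of mine, and how you obtain the reading is left out.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_request_rate(requests_per_second, low, high):
    """Map a requests-per-second reading to a Nagios (exit_code, message) pair."""
    if requests_per_second is None:
        return UNKNOWN, "APACHE_RPS UNKNOWN - no reading available"
    if requests_per_second < low:
        return CRITICAL, ("APACHE_RPS CRITICAL - %.1f req/s is below %.1f"
                          % (requests_per_second, low))
    if requests_per_second > high:
        return CRITICAL, ("APACHE_RPS CRITICAL - %.1f req/s is above %.1f"
                          % (requests_per_second, high))
    return OK, "APACHE_RPS OK - %.1f req/s" % requests_per_second


def main(reading, low=5.0, high=500.0):
    """Print the status line and exit with the matching Nagios code."""
    code, message = classify_request_rate(reading, low, high)
    print(message)
    sys.exit(code)
```

Keeping the classification separate from `main` means the same function can be reused by the push-based test scripts, so the deployment tests and the monitoring checks agree on what 'too high' and 'too low' mean.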

In any case, let me assure you that even if you take the first approach I mentioned (ssh into all servers and run commands remotely that way), you'll reap the rewards very fast. In fact, you'll like it so much that you'll want to keep adding more tests, so you can achieve more inner peace. It's a sure way to become test infected, but also to achieve deployment nirvana.


Jacob Karma said...

Thanks for the nice post, gave me something to think about. We're doing a really poor job of automated testing at the moment. We use Fitnesse and in some way Selenium is a part of it, but I feel it's the weakest part of our development process.

Marius Gedminas said...

Do you have any thoughts to share about read-only versus read-write tests against production systems?

It's one thing to check whether the HTTP server is up and returns the right front page; it's a bit different when you want to have a test that logs in as a real user, updates some data, maybe sends out email notifications.

Grig Gheorghiu said...


I would say that the more in-depth application workflow testing should be done in the staging environment. Tests like the one you mentioned exercise the GUI and can be fairly easily automated and run in staging.

In production, I think it makes more sense to have tests that probe various areas of the application at the business logic layer ('below the GUI') as opposed to the GUI layer. So for example you could expose the login functionality as an API that can be called from the command line -- and then test that in production, bypassing the GUI.

You can have similar tests for other areas of the application, especially those areas that involve backends such as the database or the mail system. These tests 'smell' more like unit tests in that they are independent of each other and are not necessarily part of an end-to-end test.

But having this test harness in production allows you to quickly pinpoint issues in specific areas -- for example the tests will tell you that the application doesn't talk correctly to the database, or that the application doesn't send mail correctly, etc.



andy said...

Definitely gives you more peace of mind and is a worthwhile thing to do. But there are some tests that don't lend themselves well to production environments, such as scalability and performance testing.

One other note on the automation piece -- if you are afraid of forgetting to run the tests, make it a part of your acceptance criteria checklist. It's important to explicitly define what is a release candidate before throwing it out there with haphazard testing.

Arjan Kranenburg said...

Great article!
If you have also automated your deployment, wouldn't it be wise to start the automated tests from the deployment scripts?
Kind of like continuous integration with an automatic or manual fallback mechanism if not all tests pass.
Does anyone have experience with that?

Grig Gheorghiu said...

Arjan -- you're right, you could run your suite of automated deployment tests right after you deploy a new server and consider the deployment FAILED if they don't pass. I tend to run my automated tests after the fact, not quite immediately after the deployment, but your suggestion is really good.