Tuesday, July 28, 2009
noSQL databases? map-reduce? Erlang? it's all in this cartoon
Hilarious cartoon (not sure why it's titled 'Fault Tolerance' though) seen on the High Scalability blog. Captures very well the spirit and hype of our times in the IT world.
Monday, July 27, 2009
Python well represented in NASA's Nebula cloud
I found out today from the cloud-computing mailing list about NASA's Nebula project. Here's what the 'About' page of the project's web site says:
"NEBULA is a Cloud Computing environment developed at NASA Ames Research Center, integrating a set of open-source components into a seamless, self-service platform. It provides high-capacity computing, storage and network connectivity, and uses a virtualized, scalable approach to achieve cost and energy efficiencies."
The Services page has some nice architectural diagrams. I wasn't surprised to see that their VM environment is managed via Eucalyptus. I also shouldn't have been surprised by the large number of Python modules and applications they're using, especially on the client side. Pretty much all the frontend applications are Python bindings for the various backend technologies they're using (such as LUSTRE, RabbitMQ, Subversion). Of course Trac is there too.
But the most interesting thing for Python fans will undoubtedly be their choice of Web application framework. Perhaps unsurprisingly, they chose... Django:
"After an extensive trade study, the NEBULA team selected Django, a python-based web application framework, as the first and primary application environment for the Cloud. NEBULA users have access to an extensive collection of open-source django "apps", providing features ranging from simple blogs, wikis, and discussion forums, to more advanced collaboration suites, image processing, and more."
Other interesting tidbits from those diagrams:
"NEBULA is a Cloud Computing environment developed at NASA Ames Research Center, integrating a set of open-source components into a seamless, self-service platform. It provides high-capacity computing, storage and network connectivity, and uses a virtualized, scalable approach to achieve cost and energy efficiencies."
The Services page has some nice architectural diagrams. I wasn't surprised to see that their VM enviroment is managed via Eucalyptus. I also shouldn't have been surprised by the large number of Python modules and applications they're using, especially on the client side. Pretty much all the frontend applications are Python bindings for the various backend technologies they're using (such as LUSTRE, RabbitMQ, Subversion). Of course Trac is there too.
But the most interesting thing for Python fans will be undoubtedly their selection for the Web application framework. Maybe again unsurprisingly, they chose...Django:
"After an extensive trade study, the NEBULA team selected Django, a python-based web application framework, as the first and primary application environment for the Cloud. NEBULA users have access to an extensive collection of open-source django "apps", providing features ranging from simple blogs, wikis, and discussion forums, to more advanced collaboration suites, image processing, and more."
Other interesting tidbits from those diagrams:
- deployments are automated with Fabric
- distributed automated testing is done with Selenium Grid
- continuous integration is done with CruiseControl
- for the database backend, they use a MySQL cluster with DRBD
- for the file system they use LUSTRE
- queuing is done with RabbitMQ (which is written in Erlang)
- search and indexing is done with SOLR
Sunday, July 26, 2009
How to roll your own Amazon EC2 image
Jeff Roberts, the vim-fu guru, does it again with a great post on "Bundling versioned AMIs rapidly in Amazon's EC2". It's a step-by-step guide on how to roll your own AMI, bundle it and upload it to S3, while keeping it versioned at the same time. Highly recommended.
Tuesday, July 21, 2009
Automated testing of production deployments
When you work as a systems engineer at a company that has a large scale system infrastructure, sooner or later you realize that you need to automate pretty much everything you do. You can't afford not to, if you want to keep up with the ever-present demands of scaling up and down the infrastructure.
The main promise of cloud computing -- infinite elastic scaling based on demand -- is real, but you can only achieve it if you automate your deployments. It's fairly safe to say that most teams that are involved in such infrastructures have achieved high levels of automation. Some fearless teams practice continuous deployment, others do frequent dark launches. All these practices are great, but my thesis is that in order to achieve fearlessness you need automated tests of your production deployments.
Note the word 'production' -- I believe it is necessary to go one step beyond running automated tests in an isolated staging environment (although that is a very good thing to do, especially if staging mirrors production at a smaller scale). That next step is to run your test harness in production, every time you deploy. And deployment, at a fast moving Web company these days, can happen multiple times a day. Trust me, with no automated tests in place, you'll never get rid of that nagging feeling in the pit of your stomach that you might have broken things horribly, in production.
So how do you go about writing automated tests for your deployments? I wrote a while ago about automating and testing your system setup checklists. Even testing small things such as 'is httpd/mysqld/postfix set up to run at boot time?' will go a long way toward achieving peace of mind.
Assuming you have a list of things to test (it can be just a couple of critical things for starters), how and when do you run the tests? Again, you can do the simplest thing that works -- a bash script that iterates through your production servers and runs the test scripts remotely on them via ssh (a minimal sketch follows the list below). Some things I test this way these days are:
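For example, the 'starts at boot' check can be a one-liner per service. Here's a hedged sketch for RedHat-style systems using chkconfig (adjust the service list to whatever your stack actually runs):

for svc in httpd mysqld postfix; do
  if chkconfig --list $svc | grep -q '3:on'; then
    echo "OK: $svc is set to start at boot"
  else
    echo "FAIL: $svc is NOT set to start at boot"
  fi
done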
* do local MySQL databases on servers in a particular cluster contain the same data in certain tables? (this shows me that things are in sync across servers)
* is MySQL replication working as expected across the cluster of read-only slaves?
* are periodic operations happening as expected? (here I can do a simple tail of a log file to figure it out)
* are certain PHP modules correctly installed?
* is Apache serving a number of requests per second that is neither too high nor too low? (where 'high' and 'low' obviously depend on your traffic and application)
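Here is a minimal sketch of that ssh loop -- the hostnames, log paths and individual checks below are placeholders, and a real script would compare the output against expected values instead of just printing it:

#!/bin/bash
# run a few remote checks on each production server via ssh
# (assumes key-based ssh access and a ~/.my.cnf with MySQL credentials on each box)
SERVERS="web01 web02 db01 db02"
for server in $SERVERS; do
  echo "=== $server ==="
  # replication threads up? (should print 2)
  ssh $server "mysql -e 'SHOW SLAVE STATUS\G' | grep -c 'Running: Yes'"
  # required PHP modules installed?
  ssh $server "php -m | grep -c -E 'memcache|mysql'"
  # did the periodic job run recently?
  ssh $server "tail -1 /var/log/myapp/periodic.log"
done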
I run these tests (and many others) each time I push a change to production. No matter how small the change seems, it can have unanticipated side effects. I found that tests which probe the system from as many angles as possible -- Apache, MySQL, PHP, memcached in my case -- are the most effective. I also found that this type of testing (push-based, if you want) is very good at showing discrepancies between servers. If you see a server that's out of whack this way, you know you need to fix it, or even terminate it and deploy a new one.
Another approach in your automated testing strategy is to run your test harness periodically (via cron for example) and also to write the harness in a proper language (Python comes to mind), integrated into a test framework. You can have the results of the tests emailed to you in case of failure. The advantage of this approach is that you can have things run automatically without your intervention (in the first approach, you still have to remember to run the test suite!).
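For instance, a crontab entry along these lines (the paths and the email address are made up) runs the harness every hour and only emails you when it fails:

# m h dom mon dow command
0 * * * * /usr/local/bin/prod_test_harness > /tmp/prod_tests.log 2>&1 || mail -s "production tests FAILED" oncall@example.com < /tmp/prod_tests.log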
The ultimate in terms of automated testing is to integrate it with your monitoring infrastructure. If you use Nagios, for example, you can easily write plugins that essentially probe for the same things your tests probe for. The advantage of this approach is that the tests run every time Nagios runs, and you can set up alerts easily. One disadvantage is that it can slow down your monitoring, depending on the number of tests you need to run on each server. Monitoring typically happens very often (every 5 minutes is common practice), so it may be overkill to run all the tests every 5 minutes. Of course, this should be configurable in your monitoring tool, so you can have a separate class of checks that only run every N hours, for example.
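To make the Nagios idea concrete: a plugin is just a script that prints a one-line status and exits with the standard codes (0 for OK, 1 for WARNING, 2 for CRITICAL). Here's a hedged sketch of a replication check, assuming the mysql client picks up its credentials from a config file:

#!/bin/bash
# check_mysql_replication -- rough sketch of a Nagios-style plugin
RUNNING=$(mysql -e 'SHOW SLAVE STATUS\G' 2>/dev/null | grep -c 'Running: Yes')
if [ "$RUNNING" -eq 2 ]; then
  echo "OK - slave IO and SQL threads are running"
  exit 0
else
  echo "CRITICAL - replication threads are not running"
  exit 2
fi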
In any case, let me assure you that even if you take the first approach I mentioned (ssh into all servers and run commands remotely that way), you'll reap the rewards very fast. In fact, you'll like it so much that you'll want to keep adding more tests, so you can achieve more inner peace. It's a sure way to become test-infected, but also to achieve deployment nirvana.
Friday, July 17, 2009
Managing multiple MySQL instances with MySQL Sandbox
MySQL doesn't support multi-source replication, i.e. you can't have one MySQL instance acting as a replication slave to more than one master. There are times when you need this functionality, for example for disaster recovery purposes, where you have a machine with tons of CPU, RAM and disk running several MySQL instances, each being a replication slave to a different MySQL master.
One tool I've used for easy management of multiple MySQL instances on the same box is MySQL Sandbox. It's nothing fancy -- a Perl module which offers a collection of scripts -- but it does make your life much easier.
To install MySQL Sandbox, download it from its Launchpad page, then run 'perl Makefile.PL; make; make install'. You also need to download a MySQL binary tarball which will serve as a common base used by your MySQL instances.
Here's an example of a script I wrote which creates a new MySQL Sandbox instance under a common directory (/var/mysql_slaves in my case). The script takes 2 arguments: a database name, and the name of the MySQL master from which that database is replicated. The script automatically increments the port number that the new sandbox instance will listen on, then creates the instance via a call like this:
/usr/bin/make_sandbox /usr/local/src/mysql-5.1.32-linux-x86_64-glibc23.tar.gz \
    --upper_directory=/var/mysql_slaves \
    --sandbox_directory=$SLAVEDB_NAME --sandbox_port=$LAST_PORT_NUMBER \
    --db_user=$SLAVEDB_NAME --db_password=PASSWORD \
    --no_confirm
As a result, there will be a new directory called $SLAVEDB_NAME under /var/mysql_slaves, which serves as the sandbox for the newly created MySQL instance. The script also adds some lines related to replication to the new MySQL instance configuration file (which is /var/mysql_slaves/my.sandbox.cnf).
To start the instance, run
/var/mysql_slaves/$SLAVEDB_NAME/start
To stop the instance, run
/var/mysql_slaves/$SLAVEDB_NAME/stop
To go to a MySQL prompt for this instance, run
/var/mysql_slaves/$SLAVEDB_NAME/use
At this point, you still don't have a functioning slave. You need to load the data from the master. One way to do this is to run mysqldump on the master with options such as '--single-transaction --master-data=1'. This will include the master information (binlog name and position) in the DB dump.
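The dump command might look something like this (host, user and database name are placeholders for your own setup):

mysqldump -h master-db.example.com -u dump_user -p \
    --single-transaction --master-data=1 mydb > mydb.sql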
The next step is to transfer the DB dump over to the box running MySQL Sandbox, and load it into the MySQL instance. I use a script similar to this.
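The script itself isn't reproduced here, but the load step boils down to something like the following sketch. It assumes the dump file has been copied over, that the sandbox's 'use' wrapper passes its arguments through to the mysql client (as it normally does), and that the master host and credentials were already added to my.sandbox.cnf as described above; the binlog coordinates come from the dump itself thanks to --master-data=1:

/var/mysql_slaves/$SLAVEDB_NAME/use < mydb.sql
/var/mysql_slaves/$SLAVEDB_NAME/use -e "START SLAVE"
/var/mysql_slaves/$SLAVEDB_NAME/use -e "SHOW SLAVE STATUS\G"

If the last command shows both Slave_IO_Running and Slave_SQL_Running as Yes, the sandboxed slave is replicating.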
You should now have a MySQL instance that acts as a replication slave to a specific master server. Repeat this process to set up other sandboxed MySQL instances that are slaves to other masters.
Note that MySQL Sandbox already includes some replication-related utilities (which I haven't used) and also an admin-type tool called sbtool. The documentation is pretty good.
Tuesday, July 14, 2009
Kent Langley's '10 rules for launching a web site'
The advice in this blog post by Kent Langley resonates with my experiences launching Web infrastructures of all types, large and small. Deploy early and often, automate your deployments, use version control, create checklists, have a rollback plan -- these are all very sensible things to do.
I would add one more very important thing that seems to be missing from the list: have an extensive suite of automated tests to check that your deployment steps did the right thing. Many people just stop at the automation step, and don't go beyond that to the testing step. It will come back to haunt them in the long run. But this is fodder for another blog post, which will be coming real soon now ;-)
Monday, July 13, 2009
Greatest invention since sliced bread: vimdiff
If you work at the ssh command prompt all day long (like I do), and if you need to compare text files and merge differences (like I do), then make sure you check out vimdiff (thanks to Jeff Roberts for bringing it to my attention).
If you run 'vimdiff file1 file2', the tool will split your screen vertically, with file1 displayed in a vim session on the left and file2 on the right. The differences between the 2 files will be highlighted. To jump from difference to difference, use ]c (forward) and [c (backward). When the cursor is on a difference block, use :diffget or do to merge the difference from the other file into the file where the cursor is; use :diffput or dp to merge the other way. To jump from one file's window to the other, use Ctrl-w-w. Google vimdiff for other tips and tricks. Definitely a good tool to have in your arsenal.
If you have the luxury of a graphical environment, I also recommend meld (thanks to Chris Nutting for the tip).
Recommended blog: Elastician
Elastician is the blog of Mitch Garnaat, the author of the amazingly useful boto Python library -- a collection of modules for managing AWS resources (EC2, S3, SQS, SimpleDB and more recently CloudWatch).
Mitch has a great picture on what he calls the 'Cloud Computing Hierarchy of Needs' (in a reference to Maslow's self-actualization hierarchy). Very insightful.
Friday, July 10, 2009
Python mock testing techniques and tools
This is an article I wrote for Python Magazine as part of the 'Pragmatic Testers' column. Titus and I have taken turns writing the column, although we haven't produced as many articles as we would have liked.
Here is the content of my article, which appeared in the February 2009 issue of PyMag:
Mock testing is a controversial topic in the area of unit testing. Some people swear by it, others swear at it. As always, the truth is somewhere in the middle.
Let's get some terminology clarified: when people say they use mock objects in their testing, in most cases they actually mean stubs, not mocks. The difference is expanded upon with his usual brilliance by Martin Fowler in his article "Mocks aren't stubs".
In his revised version of the article, Fowler uses the terminology from Gerard Meszaros's 'xUnit Test Patterns' book. In this nomenclature, both stubs and mocks are special cases of 'test doubles', which are 'pretend' objects used in place of real objects during testing. Here is Meszaros's definition of a test double:
Sometimes it is just plain hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren't available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT.
When we are writing a test in which we cannot (or chose not to) use a real depended-on component (DOC), we can replace it with a Test Double. The Test Double doesn't have to behave exactly like the real DOC; it merely has to provide the same API as the real one so that the SUT thinks it is the real one!
These 'other components' that cannot be used in a test environment, or can only be used with a high setup cost, are usually external resources such as database servers, Web servers, XML-RPC servers. Many of these resources may not be under your control, or may return data that often contains some randomness which makes it hard or impossible for your unit tests to assert things about it.
So what is the difference between stubs and mocks? Stubs are used to return canned data to your SUT, so that you can make some assertions on how your code reacts to that data. This eliminates randomness from the equation, at least in the test environment. Mocks, on the other hand, are used to specify expectations on the behavior of the object called by your SUT. You indicate your expectations by specifying that certain methods of the mock object need to be called by the SUT in a certain order and with certain arguments.
Fowler draws a further distinction between stubs and mocks by saying that stubs are used for “state verification”, while mocks are used for “behavior verification”. When we use state verification, we assert things about the state of the SUT after the stub returned the canned data back to the SUT. We don't care how the stub obtained that data, we just care about the final result (the data itself) and about how our SUT processed that data. When we use behavior verification, not only do we care about the data, but we also make sure that the SUT made the correct calls, in the correct order, and with the correct parameters, to the object representing the external resource.
If readers are still following along after all this theory, I'm fairly sure they have at least two questions:
1) when exactly do I use mock testing in my overall testing strategy?; and
2) if I do use mock testing, should I use mocks or stubs?
I already mentioned one scenario when you might want to use mock testing: when your SUT needs to interact with external resources which are either not under your control, or which return data with enough randomness to make it hard for your SUT to assert anything meaningful about it (for example external weather servers, or data that is timestamped). Another area where mock testing helps is in simulating error conditions which are not always under your control, and which are usually hard to reproduce. In this case, you can mock the external resource, simulate any errors or exceptions you want, and see how your program reacts to them in your unit tests (for example, you can simulate various HTTP error codes, or database connection errors).
Now for the second question, should you use mocks or stubs? In my experience, stubs that return canned data are sufficient for simulating the external resources and error conditions I mentioned. However, if you want to make sure that your application interacts correctly with these resources, for example that all the correct connection/disconnection calls are made to a database, then I recommend using mocks. One caveat of using mocks: by specifying expectations on the behavior of the object you're mocking and on the interaction of your SUT with that object, you couple your unit tests fairly tightly to the implementation of that object. With stubs, you only care about the external interface of the object you're mocking, not about the internal implementation of that object.
Enough theory, let's see some practical examples. I will discuss some unit tests I wrote for an application that interacts with an external resource, in my case a SnapLogic server. I don't have the space to go into detail about SnapLogic, but it is a Python-based Open Source data integration framework. It allows you to unify the access to the data needed by your application through a single API. Behind the scenes, SnapLogic talks to database servers, CSV files, and other data sources, then presents the data to your application via a simple unified API. The main advantage is that your application doesn't need to know the particular APIs for accessing the various external data sources.
In my case, SnapLogic talks to a MySQL database and presents the data returned by a SELECT SQL query to my application as a list of rows, where each row is itself a list. My application doesn't know that the data comes from MySQL, it just retrieves the data from the SnapLogic server via the SnapLogic API. I encapsulated the code that interacts with the SnapLogic server in its own class, which I called SnapLogicManager. My main SUT is passed a SnapLogicManager object in its __init__ method, then calls its methods to retrieve the data from the SnapLogic server.
I think you know where this is going – SnapLogic is an external resource as far as my SUT is concerned. It is expensive to set up and tear down, and it could return data with enough randomness so I wouldn't be able to make meaningful assertions about it. It would also be hard to simulate errors using the real SnapLogic server. All this indicates that the SnapLogicManager object is ripe for mocking.
My application code makes just one call to the SnapLogicManager object, to retrieve the dataset it needs to process:
rows = self.snaplogic_manager.get_attrib_values(self.appname, self.hostname)
Then the application processes the rows (list of lists) and instantiates various data structures based on the values in the rows. For the purpose of this article, I'll keep it simple and say that each row has an attribute name (element #0), an attribute value (element #1) and an attribute target (element #2). For example, an attribute could have the name “DocumentRoot”, the value “/var/www/mydocroot” and the target “apache”. The application expects that certain attributes are there with the correct target. If they're not, it raises an exception.
How do we test that the application correctly instantiates the data structure, and correctly reacts to the presence or absence of certain attributes? You guessed it, we use a mock SnapLogicManager object, and we return canned data to our application.
I will show here how to achieve this using two different Python mock testing frameworks: Mox, written by Google engineers, and Mock, written by Michael Foord.
Mox is based on the Java EasyMock framework, and it does have a Java-esque feel to it, down to the CamelCase naming convention. Mock feels more 'pythonic' – more intuitive and with cleaner APIs. The two frameworks also differ in the way they set up and verify the mock objects: Mox uses a record/replay/verify pattern, whereas Mock uses an action/assert pattern. I will go into these differences by showing actual code below.
Here is a unit test that uses Mox:
def test_get_attrib_value_with_expected_target(self):
    # We return a SnapLogic dataset which contains attributes with correct targets
    canned_snaplogic_rows = [
        [u'DocumentRoot', u'/var/www/mydocroot', u'apache'],
        [u'dbname', u'some_dbname', u'database'],
        [u'dbuser', u'SOME_DBUSER', u'database'],
    ]
    # Create a mock SnapLogicManager
    mock_snaplogic_manager = mox.MockObject(SnapLogicManager)
    # Return the canned list of rows when get_attrib_values is called
    mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndReturn(canned_snaplogic_rows)
    # Put all mocks created by mox into replay mode
    mox.Replay(mock_snaplogic_manager)
    # Run the test
    myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
    myapp.get_attr_values_from_snaplogic()
    # Verify all mocks were used as expected
    mox.Verify(mock_snaplogic_manager)
    # We test that attributes with correct targets are retrieved correctly
    assert '/var/www/mydocroot' == myapp.get_attrib_value_with_expected_target("DocumentRoot", "apache")
    assert 'some_dbname' == myapp.get_attrib_value_with_expected_target("dbname", "database")
    assert 'SOME_DBUSER' == myapp.get_attrib_value_with_expected_target("dbuser", "database")
Some explanations are in order. With the Mox framework, when you instantiate a MockObject, it is in 'record' mode, which means it's waiting for you to specify expectations on its behavior. You specify these expectations by telling the mock object what to return when called with a certain method. In my example, I tell the mock object that I want the list of canned rows to be returned when I call its 'get_attrib_values' method:
mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndReturn(canned_snaplogic_rows)
I only have one method that I am recording the expectations for in my example, but you could have several. When you are done recording, you need to put the mock object in 'replay' mode by calling
mox.Replay(mock_snaplogic_manager)
This means the mock object is now ready to be called by your application, and to verify that the expectations are being met. Then you call your application code, in my example by passing the mock object in the constructor of MyApp:
myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
My test then calls myapp.get_attr_values_from_snaplogic(), which in turn interacts with the mock object by calling its get_attrib_values() method.
At this point, you need to verify that the expectations you set happened correctly. You do this by calling the Verify method of the mock object:
mox.Verify(mock_snaplogic_manager)
If any of the methods you recorded were not called, or were called in the wrong order, or with the wrong parameters, you would get an exception at this point and your unit tests would fail.
Finally, you also assert various things about your application, just as you would in any regular unit test. In my case, I assert that the get_attrib_value_with_expected_target method of MyApp correctly retrieves the value of an attribute.
This seems like a lot of work if all you need to do is to return canned data to your application. Enter the other framework I mentioned, Mock, which lets you specify canned return values very easily, and also allows you to assert certain things about the way the mock objects were called without the rigorous record/replay/verify pattern.
Here's how I rewrote my unit test using Mock:
def test_get_attrib_value_with_expected_target(self):
    # We return a SnapLogic dataset which contains attributes with correct targets
    canned_snaplogic_rows = [
        [u'DocumentRoot', u'/var/www/mydocroot', u'apache'],
        [u'dbname', u'some_dbname', u'database'],
        [u'dbuser', u'SOME_DBUSER', u'database'],
    ]
    # Create a mock SnapLogicManager
    mock_snaplogic_manager = Mock()
    # Return the canned list of rows when get_attrib_values is called
    mock_snaplogic_manager.get_attrib_values.return_value = canned_snaplogic_rows
    # Run the test
    myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
    myapp.get_attr_values_from_snaplogic()
    # Verify that mocks were used as expected
    mock_snaplogic_manager.get_attrib_values.assert_called_with(self.appname, self.hostname)
    # We test that attributes with correct targets are retrieved correctly
    assert '/var/www/mydocroot' == myapp.get_attrib_value_with_expected_target("DocumentRoot", "apache")
    assert 'some_dbname' == myapp.get_attrib_value_with_expected_target("dbname", "database")
    assert 'SOME_DBUSER' == myapp.get_attrib_value_with_expected_target("dbuser", "database")
As you can see, Mock allows you to specify the return value for a given method of the mock object, in my case for the 'get_attrib_values' method. Mock also allows you to verify that the method has been called with the correct arguments. I do that by calling assert_called_with on the mock object. If you just want to verify that the method has been called at all, with no regard to the arguments, you can use assert_called.
There are many other things you can do with both Mox and Mock. Space doesn't permit me to go into many more details here, but I strongly encourage you to read the documentation and try things out on your own.
Another technique I want to show is how to simulate exceptions using the Mox framework. In my unit tests, I wanted to verify that my application reacts correctly to exceptions thrown by the SnapLogicManager class. Those exceptions are thrown, for example, when the SnapLogic server is not running. Here is the unit test I wrote:
def test_get_attr_values_from_snaplogic_when_errors(self):
    # We simulate a SnapLogicManagerError and verify that it is caught properly
    # Create a mock SnapLogicManager
    mock_snaplogic_manager = mox.MockObject(SnapLogicManager)
    # Simulate a SnapLogicManagerError when get_attrib_values is called
    mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndRaise(SnapLogicManagerError('Boom!'))
    # Put all mocks created by mox into replay mode
    mox.Replay(mock_snaplogic_manager)
    # Run the test
    myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
    myapp.get_attr_values_from_snaplogic()
    # Verify all mocks were used as expected
    mox.Verify(mock_snaplogic_manager)
    # Verify that MyApp caught and logged the exception
    line = get_last_line_from_log(self.logfile)
    assert re.search('myapp - CRITICAL - get_attr_values_from_snaplogic --> SnapLogicManagerError: \'Boom!\'', line)
I used the following Mox API for simulating an exception:
mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndRaise(SnapLogicManagerError('Boom!'))
To verify that my application reacted correctly to the exception, I checked the application log file, and I made sure that the last line logged contained the correct exception type and value.
Space does not permit me to show a Python-specific mock testing technique which, for lack of a better name, I call 'namespace overriding' (actually this is a bit like monkey patching, but for testing purposes; so maybe we can call it monkey testing?). I refer the reader to my blog post on 'Mock testing examples and resources', and will just quickly describe the technique here. Imagine that one of the methods of your application calls urllib.urlretrieve in order to download a file from an external Web server. Did I say external Web server, as in 'external resource not under your control'? I did, so you know that mock testing will help. My blog post shows how you can write a mocked_urlretrieve function, and override the name urllib.urlretrieve in your unit tests with your mocked version mocked_urlretrieve. Simple and elegant. The blog post also shows how you can return various canned values from the mocked version of urlretrieve, based on different input values.
I started this article by saying that mock testing is a controversial topic in the area of unit testing. Many people feel that you should not use mock testing because you are not testing your application in the presence of the real objects on which it depends, so if the code for these objects changes, you run the risk of having your unit tests pass even though the application will break when it interacts with the real objects. This is a valid objection, and I don't recommend you go overboard with mocking every single interaction in your application. Instead, limit your mock testing, as I said in this article, to resources whose behavior and returned data are hard to control.
Another important note: whatever your unit testing strategy is, whether you use mock testing techniques or not, do not forget that you also need functional tests and integration tests for your application. Integration tests especially need to exercise all the resources that your application interacts with. For more information on the different types of testing you need to consider, please see my blog posts 'Should acceptance tests be included in the continuous build process?' and 'On the importance of functional testing'.
Thursday, July 09, 2009
Setting system-wide environment variables on RedHat-based machines
I keep forgetting this, so I'm committing it to long-term memory. If you have a RedHat-based operating system (RH, CentOS etc) and you need to set certain environment variables so they're available to every user, one good place to do it is to drop a script ending in .sh in /etc/profile.d. Then export your desired environment variables there.
Here's an example from a CentOS machine I have:
# cd /etc/profile.d
# cat java.sh
export JAVA_HOME=/usr/java/default
Note that you can do whatever else you need in these scripts -- for example you can set up aliases etc. Every script in /etc/profile.d which ends in .sh gets sourced in /etc/profile.
Monday, July 06, 2009
Resource monitoring and graphing with Cacti in EC2
My colleague Jeff Roberts just posted a blog entry on 'Using Cacti to monitor a large scale infrastructure in Amazon’s EC2'. Highly recommended for people interested in monitoring and graphing their system resources across hundreds of nodes in Amazon EC2.
Thursday, July 02, 2009
Dark launching and other lessons from Facebook on massive deployments
I came across this note from the Engineering team at Facebook which talks about how they managed to smoothly launch their recent 'pick a username' feature. The title of the note is, appropriately enough, 'Hammering usernames' -- this is of course because they were expecting their infrastructure to be hammered.
In the note I saw for the first time a name for a strategy that teams I've been involved with have applied before: 'dark launching'. Essentially, dark launching is releasing a new feature to a subset of your users, mostly with no UI changes, but otherwise exercising all the parts of your infrastructure involved in serving that feature. A good strategy to apply when you're dealing with massive, large-scale deployments, and when you want to see how your infrastructure behaves in conditions that are as close to production as possible. Because remember, there's nothing like production! Your careful load/stress testing exercises in a lab environment ain't gonna cut it.
The note from Facebook has all sorts of other nuggets of wisdom related to massive infrastructure deployments. I recommend subscribing to the RSS feed for 'Engineering @ Facebook's Notes'.
While googling for 'dark launching', I also came across this very good post by Dare Obasanjo. Recommended reading.