Friday, July 10, 2009

Python mock testing techniques and tools

This is an article I wrote for Python Magazine as part of the 'Pragmatic Testers' column. Titus and I have taken turns writing the column, although we haven't produced as many articles as we would have liked.

Here is the content of my article, which appeared in the February 2009 issue of PyMag:

Mock testing is a controversial topic in the area of unit testing. Some people swear by it, others swear at it. As always, the truth is somewhere in the middle.

Let's get some terminology clarified: when people say they use mock objects in their testing, in most cases they actually mean stubs, not mocks. The difference is expanded upon with his usual brilliance by Martin Fowler in his article "Mocks aren't stubs".

In his revised version of the article, Fowler uses the terminology from Gerard Meszaros's 'xUnit Test Patterns' book. In this nomenclature, both stubs and mocks are special cases of 'test doubles', which are 'pretend' objects used in place of real objects during testing.  Here is Meszaros's definition of a test double:


Sometimes it is just plain hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren't available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT.

When we are writing a test in which we cannot (or chose not to) use a real depended-on component (DOC), we can replace it with a Test Double. The Test Double doesn't have to behave exactly like the real DOC; it merely has to provide the same API as the real one so that the SUT thinks it is the real one! 

These 'other components' that cannot be used in a test environment, or can only be used with a high setup cost, are usually external resources such as database servers, Web servers, XML-RPC servers. Many of these resources may not be under your control, or may return data that often contains some randomness which makes it hard or impossible for your unit tests to assert things about it.

So what is the difference between stubs and mocks? Stubs are used to return canned data to your SUT, so that you can make some assertions on how your code reacts to that data. This eliminates randomness from the equation, at least in the test environment. Mocks, on the other hand, are used to specify expectations on the behavior of the object called by your SUT. You indicate your expectations by specifying that certain methods of the mock object need to be called by the SUT in a certain order and with certain arguments.

Fowler draws a further distinction between stubs and mocks by saying that stubs are used for “state verification”, while mocks are used for “behavior verification”. When we use state verification, we assert things about the state of the SUT after the stub returned the canned data back to the SUT. We don't care how the stub obtained that data, we just care about the final result (the data itself) and about how our SUT processed that data. When we use behavior verification, not only do we care about the data, but we also make sure that the SUT made the correct calls, in the correct order, and with the correct parameters, to the object representing the external resource.

If readers are still following along after all this theory, I'm fairly sure they have at least two questions:

1) when exactly do I use mock testing in my overall testing strategy?; and
2) if I do use mock testing, should I use mocks or stubs?

I already mentioned one scenario when you might want to use mock testing: when your SUT needs to interact with external resources which are either not under your control, or which return data with enough randomness to make it hard for your SUT to assert anything meaningful about it (for example external weather servers, or data that is timestamped). Another area where mock testing helps is in simulating error conditions which are not always under your control, and which are usually hard to reproduce. In this case, you can mock the external resource, simulate any errors or exceptions you want, and see how your program reacts to them in your unit tests (for example, you can simulate various HTTP error codes, or database connection errors).

Now for the second question, should you use mocks or stubs? In my experience, stubs that return canned data are sufficient for simulating the external resources and error conditions I mentioned. However, if you want to make sure that your application interacts correctly with these resources, for example that all the correct connection/disconnection calls are made to a database, then I recommend using mocks. One caveat of using mocks: by specifying expectations on the behavior of the object you're mocking and on the interaction of your SUT with that object, you couple your unit tests fairly tightly to the implementation of that object. With stubs, you only care about the external interface of the object you're mocking, not about the internal implementation of that object.

Enough theory, let's see some practical examples. I will discuss some unit tests I wrote for an application that interacts with an external resource, in my case a SnapLogic server. I don't have the space to go into detail about SnapLogic, but it is a Python-based Open Source data integration framework. It allows you to unify the access to the data needed by your application through a single API. Behind the scenes, SnapLogic talks to database servers, CSV files, and other data sources, then presents the data to your application via a simple unified API. The main advantage is that your application doesn't need to know the particular APIs for accessing the various external data sources.

In my case, SnapLogic talks to a MySQL database and presents the data returned by a SELECT SQL query to my application as a list of rows, where each row is itself a list. My application doesn't know that the data comes from MySQL, it just retrieves the data from the SnapLogic server via the SnapLogic API. I encapsulated the code that interacts with the SnapLogic server in its own class, which I called SnapLogicManager. My main SUT is passed a SnapLogicManager object in its __init__ method, then calls its methods to retrieve the data from the SnapLogic server.

I think you know where this is going – SnapLogic is an external resource as far as my SUT is concerned. It is expensive to set up and tear down, and it could return data with enough randomness so I wouldn't be able to make meaningful assertions about it. It would also be hard to simulate errors using the real SnapLogic server. All this indicates that the SnapLogicManager object is ripe for mocking.

My application code makes just one call to the SnapLogicManager object, to retrieve the dataset it needs to process:


rows = self.snaplogic_manager.get_attrib_values()


Then the application processes the rows (list of lists) and instantiates various data structures based on the values in the rows. For the purpose of this article, I'll keep it simple and say that each row has an attribute name (element #0), and attribute value (element #1) and an attribute target (element #2). For example, an attribute could have the name “DocumentRoot”, the value “/var/www/mydocroot” and the target “apache”. The application expects that certain attributes are there with the correct target. If they're not, it raises an exception.

How do we test that the application correctly instantiates the data structure, and correctly reacts to the presence or absence of certain attributes? You guessed it, we use a mock SnapLogicManager object, and we return canned data to our application.

I will show here how to achieve this using two different Python mock testing frameworks: Mox, written by Google engineers, and Mock, written by Michael Foord.

Mox is based on the Java EasyMock framework, and it does have a Java-esque feel to it, down to the CamelCase naming convention. Mock feels more 'pythonic' – more intuitive and with cleaner APIs. The two frameworks also differ in the way they set up and verify the mock objects: Mox uses a record/replay/verify pattern, whereas Mock uses an action/assert pattern. I will go into these differences by showing actual code below.

Here is a unit test that uses Mox:


    def test_get_attrib_value_with_expected_target(self):

        # We return a SnapLogic dataset which contains attributes with correct targets
        canned_snaplogic_rows = [
        [u'DocumentRoot', u'/var/www/mydocroot', u'apache'],
        [u'dbname', u'some_dbname', u'database'],
        [u'dbuser', u'SOME_DBUSER', u'database'],
        ]

        # Create a mock SnapLogicManager
        mock_snaplogic_manager = mox.MockObject(SnapLogicManager)

        # Return the canned list of rows when get_attrib_values is called
        mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndReturn(canned_snaplogic_rows)

        # Put all mocks created by mox into replay mode
        mox.Replay(mock_snaplogic_manager)

        # Run the test
        myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
        myapp.get_attr_values_from_snaplogic()

        # Verify all mocks were used as expected
        mox.Verify(mock_snaplogic_manager)

        # We test that attributes with correct targets are retrieved correctly
        assert '/var/www/mydocroot' == myapp.get_attrib_value_with_expected_target("DocumentRoot", "apache")
        assert 'some_dbname' == myapp.get_attrib_value_with_expected_target("db_name", "database")
        assert 'SOME_DBUSER' == myapp.get_attrib_value_with_expected_target("db_user", "database")


Some explanations are in order. With the Mox framework, when you instantiate a MockObject, it is in 'record' mode, which means it's waiting for you to specify expectations on its behavior. You specify these expectations by telling the mock object what to return when called with a certain method. In my example, I tell the mock object that I want the list of canned rows to be returned when I call its 'get_attrib_values' method: mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndReturn(canned_snaplogic_rows)

I only have one method that I am recording the expectations for in my example, but you could have several. When you are done recording, you need to put the mock object in 'replay' mode by calling mox.Replay(mock_snaplogic_manager). This means the mock object is now ready to be called by your application, and to verify that the expectations are being met.

Then you call your application code, in my example by passing the mock object in the constructor of MyApp: myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager). My test then calls myapp.get_attr_values_from_snaplogic(), which in turn interacts with the mock object by calling its get_attrib_values() method.

At this point, you need to verify that the expectations you set happened correctly. You do this by calling the Verify method of the mock object: mox.Verify(mock_snaplogic_manager).

If any of the methods you recorded were not called, or where called in the wrong order, or with the wrong parameters, you would get an exception at this point and your unit tests would fail.

Finally, you also assert various things about your application, just as you would in any regular unit test. In my case, I assert that the get_attrib_value_with_expected_target method of MyApp correctly retrieves the value of an attribute.

This seems like a lot of work if all you need to do is to return canned data to your application. Enter the other framework I mentioned, Mock, which lets you specify canned return values very easily, and also allows you to assert certain things about the way the mock objects were called without the rigorous record/replay/verify pattern.

Here's how I rewrote my unit test using Mock:


    def test_get_attrib_value_with_expected_target(self):
        # We return a SnapLogic dataset which contains attributes with correct targets
        canned_snaplogic_rows = [
        [u'DocumentRoot', u'/var/www/mydocroot', u'apache'],
        [u'dbname', u'some_dbname', u'database'],
        [u'dbuser', u'SOME_DBUSER', u'database'],
        ]

        # Create a mock SnapLogicManager
        mock_snaplogic_manager = Mock()

        # Return the canned list of rows when get_attrib_values is called
        mock_snaplogic_manager.get_attrib_values.return_value = canned_snaplogic_rows

        # Run the test
        myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
        myapp.get_attr_values_from_snaplogic()

        # Verify that mocks were used as expected
        mock_snaplogic_manager.get_attrib_values.assert_called_with(self.appname, self.hostname)

        # We test that attributes with correct targets are retrieved correctly
        assert '/var/www/mydocroot' == myapp.get_attrib_value_with_expected_target("DocumentRoot", "apache")
        assert 'some_dbname' == myapp.get_attrib_value_with_expected_target("db_name", "database")
        assert 'SOME_DBUSER' == myapp.get_attrib_value_with_expected_target("db_user", "database")


As you can see, Mock allows you to specify the return value for a given method of the mock object, in my case for the 'get_attrib_values' method. Mock also allows you to verify that the method has been called with the correct arguments. I do that by calling assert_called_with on the mock object. If you just want to verify that the method has been called at all, with no regard to the arguments, you can use assert_called.

There are many other things you can do with both Mox and Mock. Space doesn't permit me to go into many more details here, but I strongly encourage you to read the documentation and try things out on your own.

Another technique I want to show is how to simulate exceptions using the Mox framework. In my unit tests, I wanted to verify that my application reacts correctly to exceptions thrown by the SnapLogicManager class. Those exception are thrown for example when the SnapLogic server is not running. Here is the unit test I wrote:


    def test_get_attr_values_from_snaplogic_when_errors(self):
        # We simulate a SnapLogicManagerError and verify that it is caught properly

        # Create a mock SnapLogicManager
        mock_snaplogic_manager = mox.MockObject(SnapLogicManager)

        # Simulate a SnapLogicManagerError when get_attrib_values is called
        mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndRaise(SnapLogicManagerError('Boom!'))

        # Put all mocks created by mox into replay mode
        mox.Replay(mock_snaplogic_manager)

        # Run the test
        myapp = MyApp(self.appname, self.hostname, mock_snaplogic_manager)
        myapp.get_attr_values_from_snaplogic()


        # Verify all mocks were used as expected
        mox.Verify(mock_snaplogic_manager)

        # Verify that MyApp caught and logged the exception
        line = get_last_line_from_log(self.logfile)
        assert re.search('myapp - CRITICAL - get_attr_values_from_snaplogic --> SnapLogicManagerError: \'Boom!\'', line)


I used the following Mox API for simulating an exception: mock_snaplogic_manager.get_attrib_values(self.appname, self.hostname).AndRaise(SnapLogicManagerError('Boom!')).


To verify that my application reacted correctly to the exception, I checked the application log file, and I made sure that the last line logged contained the correct exception type and value.


Space does not permit me to show a Python-specific mock testing technique which for lack of a better name I call 'namespace overriding' (actually this is a bit like monkey patching, but for testing purposes; so maybe we can call it monkey testing?). I refer the reader to my blog post on 'Mock testing examples and resources' and I just quickly describe here the technique. Imagine that one of the methods of your application calls urllib.urlretrieve in order to download a file from an external Web server. Did I say external Web server, as in 'external resource not under your control'? I did, so you know that mock testing will help. My blog post shows how you can write a mocked_urlretrieve function, and override the name urllib.urlretrieve in your unit tests with your mocked version mocked_urlretrieve. Simple and elegant. The blog post also shows how you can return various canned valued from the mocked version of urlretrieve, based on different input values.

I started this article by saying that mock testing is a controversial topic in the area of unit testing. Many people feel that you should not use mock testing because you are not testing your application in the presence of the real objects on which it depends, so if the code for these objects changes, you run the risk of having your unit tests pass even though the application will break when it interacts with the real objects. This is a valid objection, and I don't recommend you go overboard with mocking every single interaction in your application. Instead, limit your mock testing, as I said in this article, to resources whose behavior and returned data are hard to control.

Another important note: whatever your unit testing strategy is, whether you use mock testing techniques or not, do not forget that you also need to have functional test and integration tests for your application. Integration tests especially do need to exercise all the resources that your application needs to interact with. For more information on different types of testing that you need to consider, please see my blog posts 'Should acceptance tests be included in the continuous build process?' and 'On the importance of functional testing'.

3 comments:

Gheorghe Gheorghiu said...

BRAVO! Ne bucuram tare mult ca esti pasionat de cea ce faci cu adevarat ! Nu stiu cat putem sa apreciem articolul tare pare tare "profesional" , nu pentru amatori :-)

Lukáš Čenovský said...

Nice article.

Carlos Ble said...

Hi,
There you go another framework for Python:
www.pyDoubles.org

Thanks :-)