Friday, December 15, 2006

Mock testing examples and resources

Mock testing is a very controversial topic in the area of unit testing. Some people swear by it, others swear at it. As always, the truth is somewhere in the middle. But first of all, let's ask Wikipedia about mock objects. Here's what it says:

"Mock objects are simulated objects that mimic the behavior of real objects in controlled ways. A computer programmer typically creates a mock object to test the behavior of some other object, in much the same way that an automobile designer uses a crash test dummy to test the behavior of an automobile during an accident."

This is interesting, because it talks about accidents, which in software development speak would be errors and exceptions. And indeed, I think one of the main uses of mock objects is to simulate errors and exceptions that would otherwise be very hard to reproduce.

Let's get some terminology clarified: when people say they use mock objects in their testing, in most cases they actually mean stubs, not mocks. Martin Fowler expands on the difference with his usual brilliance in his article "Mocks Aren't Stubs". I'll let you read that article and draw your own conclusions. Here are some of mine: stubs are used to return canned data to your methods or functions under test, so that you can make some assertions on how your program reacts to that data. (Here, I use "program" as shorthand for "method or function under test", not for an executable or binary.) Mocks, on the other hand, are used to specify certain expectations about how the methods of the mocked object are called by your program: how many times, with how many arguments, etc.
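To make the stub/mock distinction concrete, here is a minimal sketch. It uses the unittest.mock module from today's Python standard library rather than python-mock, and total_price and the catalog object are hypothetical names invented for the illustration.

```python
from unittest.mock import Mock


def total_price(catalog, item_ids):
    """Hypothetical function under test: sums prices looked up in a catalog."""
    return sum(catalog.price_of(item_id) for item_id in item_ids)


# Stub usage: feed canned data in, then assert on the result that comes out.
stub_catalog = Mock()
stub_catalog.price_of.return_value = 5
stub_total = total_price(stub_catalog, ["a", "b", "c"])
assert stub_total == 15

# Mock usage: assert on how the collaborator was called by the code under test.
mock_catalog = Mock()
mock_catalog.price_of.return_value = 5
total_price(mock_catalog, ["a", "b"])
assert mock_catalog.price_of.call_count == 2
mock_catalog.price_of.assert_any_call("a")
mock_catalog.price_of.assert_any_call("b")
```

The same Mock class serves both purposes; what differs is whether the test asserts on the returned data or on the recorded calls.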

In my experience, stubs are more useful than mocks when it comes to unit testing. You should still use a mock library or framework even when you want to use stubs, because these libraries make it very easy to instantiate and work with stubs -- as we'll see in some of the examples I'll present.

I said that mock testing is a controversial topic. If you care to follow the exchange of comments I had with Bruce Leggett on this topic, you'll see that his objections to mocking are very valid. His main point is that if you mock an object and the interface or behavior of that object changes, your unit tests which use the mock will pass happily, when in fact your application will fail.
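Bruce's objection is easy to reproduce. The sketch below assumes a hypothetical Mailer class whose send method was renamed to deliver, and it uses the standard library's unittest.mock rather than python-mock. The spec argument shown at the end is one possible mitigation in unittest.mock, not something from the exchange with Bruce.

```python
from unittest.mock import Mock


class Mailer:
    """Hypothetical collaborator whose method was renamed from send() to deliver()."""

    def deliver(self, message):
        return "delivered"


def notify(mailer, message):
    """Hypothetical code under test, still calling the old method name."""
    return mailer.send(message)


# A plain Mock accepts any method name, so the stale call goes unnoticed.
loose_mailer = Mock()
notify(loose_mailer, "hi")
stale_call_passed = loose_mailer.send.called

# Against the real object, the same call fails.
try:
    notify(Mailer(), "hi")
    real_call_failed = False
except AttributeError:
    real_call_failed = True

# Restricting the mock to the real interface surfaces the drift in the unit test.
strict_mailer = Mock(spec=Mailer)
try:
    notify(strict_mailer, "hi")
    spec_caught_it = False
except AttributeError:
    spec_caught_it = True
```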

I thought some more about Bruce's objections, and I think I can come up with a better rule of thumb now than I could when I replied to him. Here it is: use mocking at the I/O boundaries of your application and mock the interactions of your application with external resources that are not always under your control.

When I say "I/O boundaries", I mean mostly databases and network resources such as Web servers, XML-RPC servers, etc. The data that these resources produce is consumed by your application, and it often contains some randomness that makes it very hard for your unit tests to assert things about it. In this case, you can use a stub instead of the real external resource and you can return canned data from the stub. This gives you some control over the data that is consumed by your program and allows you to make more meaningful assertions about how your program reacts to that data.

These external resources are also often unreachable due to various error conditions which again are not always under your control, and which are usually hard to reproduce. In this case, you can mock the external resource and simulate any errors or exceptions you want, and see how your program reacts to them in your unit tests. This relates to the "crash test dummy" concept from the Wikipedia article.
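As a rough sketch of the "crash test dummy" idea, a stand-in can be told to raise an exception instead of returning data. This uses the standard library's unittest.mock (not python-mock); fetch_status and the opener callable are hypothetical.

```python
from unittest.mock import Mock


def fetch_status(opener, url):
    """Hypothetical function under test: reports whether a URL could be fetched."""
    try:
        opener(url)
    except IOError as exc:
        return "unreachable: %s" % exc
    return "ok"


# side_effect makes the stand-in raise instead of returning a value.
refusing_opener = Mock(side_effect=IOError("Connection refused"))
error_status = fetch_status(refusing_opener, "http://example.invalid/")

# The happy path, for comparison, returns canned bytes.
working_opener = Mock(return_value=b"payload")
ok_status = fetch_status(working_opener, "http://example.invalid/")
```

No server has to be taken down to exercise the error branch, and the failure is reproduced identically on every run.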

In most cases, the external resources that your application needs are accessed via stable 3rd party libraries or APIs whose interfaces change rarely. For example, in Python you can use standard library modules such as urllib or xmlrpclib to interact with Web servers or XML-RPC servers, or 3rd party modules such as cx_Oracle or MySQLdb to interact with various databases. These modules, either part of the Python stdlib or 3rd party, have well defined interfaces that rarely if ever change. So you have a fairly high degree of confidence that their behavior won't change under you at short notice, and this makes them good candidates for mocking.

I agree with Bruce that you shouldn't go overboard with mocking objects that you create in your own application. There's a good chance the behavior/interface of those objects will change, and you'll have the situation where the unit tests which use mock versions of these objects will pass, when in fact the application as a whole will fail. This is also a good example of why unit tests are not sufficient; you need to exercise your application as a whole via functional/integration/system testing (here's a good concrete example why). In fact, even the most enthusiastic proponents of mock testing do not fail to mention the need for testing at higher levels than unit testing.

Enough theory, let's see some examples. All of them use Dave Kirby's python-mock module. There are many other mock libraries and modules for Python, with the newest addition being Ian Bicking's minimock module, which you should definitely check out if you use doctest in your unit tests.

The first example is courtesy of Michał, who recently added some mock testing to the Cheesecake unit tests. This is how cheesecake_index.py uses urllib.urlretrieve to retrieve a package in order to investigate it:

try:
    downloaded_filename, headers = urlretrieve(self.url, self.sandbox_pkg_file)
except IOError, e:
    self.log.error("Error downloading package %s from URL %s" % (self.package, self.url))
    self.raise_exception(str(e))
if headers.gettype() in ["text/html"]:
    f = open(downloaded_filename)
    if re.search("404 Not Found", "".join(f.readlines())):
        f.close()
        self.raise_exception("Got '404 Not Found' error while trying to download package ... exiting")
    f.close()

To test this functionality, we used to have a unit test that actually grabbed a tar.gz file from a Web server. This was obviously sub-optimal, because it required the Web server to be up and running, and it couldn't reproduce certain errors/exceptions to see if we handle them correctly in our code. Michał wrote a mocked version of urlretrieve:

def mocked_urlretrieve(url, filename):
    if url in VALID_URLS:
        # Valid URL: copy a local tarball instead of downloading one.
        shutil.copy(os.path.join(DATA_PATH, "nose-0.8.3.tar.gz"), filename)
        headers = Mock({'gettype': 'application/x-gzip'})
    elif url == 'connection_refused':
        # Unreachable server: simulate the socket error.
        raise IOError("[Errno socket error] (111, 'Connection refused')")
    else:
        # Any other URL: write out an HTML page containing a 404 error.
        response_content = '''
HTML_INCLUDING_404_NOT_FOUND_ERROR
'''
        dump_str_to_file(response_content, filename)
        headers = Mock({'gettype': 'text/html'})

    return filename, headers

(See the _helper_cheesecake.py module for the exact HTML string returned, since Blogger refuses to include it because of its tags.)

The Mock class from python-mock is used here to instantiate and mock the headers object returned by urlretrieve. When you do:
headers = Mock({'gettype': 'text/html'})
you get an object which has all its methods stubbed out and returning None, with the exception of the one method you specified, gettype, which in this case will return the string 'text/html'.

This is the big advantage of using a library such as python-mock: you don't have to manually stub out all the methods of the object you want to mock; instead, you simply instantiate that object via the Mock class, and let the library handle everything for you. If you don't specify anything in the Mock constructor, all the methods of the mocked object will return None. In our case, since cheesecake_index.py only calls headers.gettype(), we were only interested in this method, so we specified it in the dictionary passed to the Mock class, along with its return value.
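For readers on a current Python, roughly the same stub can be written with the standard library's unittest.mock. This is a different library from python-mock, and it differs in one way worth noting: methods you did not configure return another Mock object rather than None. The getencoding call below is a hypothetical unconfigured method used only to show that difference.

```python
from unittest.mock import Mock

# Roughly what Mock({'gettype': 'text/html'}) expresses, in unittest.mock terms.
headers = Mock(**{"gettype.return_value": "text/html"})
content_type = headers.gettype()

# Unlike python-mock as described above, an unconfigured method
# returns a new Mock object instead of None.
other = headers.getencoding()
```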

The mocked_urlretrieve function inspects its first argument, url, and, based on its value, either copies a tar.gz file into a target location (indicated by filename) for further inspection, or raises an IOError exception, or returns an HTML document with a '404 Not Found' error. This illustrates the usefulness of mocking: it avoids going to an external resource (a Web server in this case) to retrieve a file, and instead it copies it from the file system to another location on the file system; it simulates an exception that would otherwise be hard to reproduce consistently; and it returns an error which also would be hard to reproduce. Now all that remains is to exercise this mocking functionality in some unit tests, and this is exactly what test_index_url_download.py does, by exercising 3 test cases: valid URL, invalid URL (404 error) and unreachable server. Just to exemplify, here's how the "Connection refused" exception is tested:

try:
    self.cheesecake = Cheesecake(url='connection_refused',
                                 sandbox=default_temp_directory, logfile=logfile)
    assert False, "Should throw a CheesecakeError."
except CheesecakeError, e:
    print str(e)
    msg = "Error: [Errno socket error] (111, 'Connection refused')\n"
    msg += "Detailed info available in log file %s" % logfile
    assert str(e) == msg

You might have a question at this point: how did we make our application aware of the mocked version of urlretrieve? In Java, where the mock object techniques originated, this is usually done by what is called "dependency injection". This simply means that the mocked object is passed to the object under test (OUT) either via the OUT's constructor or via one of its setter methods. In Python, this is largely unnecessary, because of one honking great idea called namespaces. Here's how Michał did it:

import cheesecake.cheesecake_index as cheesecake_index
from _helper_cheesecake import mocked_urlretrieve
cheesecake_index.urlretrieve = mocked_urlretrieve

What happens here is that the urlretrieve name used inside the cheesecake_index module is simply reassigned and pointed to the mocked_urlretrieve function. Very simple and elegant. This way, the OUT, in our case the cheesecake_index module, is completely unchanged and blissfully unaware of any mocked version of urlretrieve. It is only in the unit tests that we reassign urlretrieve to its mocked version. Further proof, if you needed one, of Python's vast superiority over Java :-)
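One thing the plain reassignment does not do is put the original function back once the test is over. Here is a sketch of the same namespace trick using patch.object from the standard library's unittest.mock, which restores the original name when the with block exits. The app module below is a hypothetical stand-in for cheesecake_index, built in memory so the example is self-contained.

```python
import types
from unittest import mock

# Hypothetical stand-in for cheesecake_index: a module whose fetch() looks up
# the name urlretrieve in the module's own namespace at call time.
app = types.ModuleType("app")
exec(
    "def urlretrieve(url, filename):\n"
    "    raise RuntimeError('real network access attempted')\n"
    "\n"
    "def fetch(url, filename):\n"
    "    return urlretrieve(url, filename)\n",
    app.__dict__,
)


def fake_urlretrieve(url, filename):
    """Test double: pretends the download succeeded."""
    return filename, {"Content-Type": "application/x-gzip"}


# Inside the with block, app.urlretrieve points at the fake.
with mock.patch.object(app, "urlretrieve", fake_urlretrieve):
    patched_result = app.fetch("http://example.invalid/pkg.tar.gz", "pkg.tar.gz")

# After the block, the original function is back in place.
try:
    app.fetch("http://example.invalid/pkg.tar.gz", "pkg.tar.gz")
    restored = False
except RuntimeError:
    restored = True
```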

The second example is courtesy of Karen Mishler from ARINC. She used the python-mock module to mock an interaction with an external XML-RPC server that produces avionics data. In this case, the module that gets mocked is xmlrpclib (I changed around some names of servers and methods and I got rid of some information which is not important for this example):

fakeResults = {
    "Request": ('|returncode|0|/returncode|',
                '|machineid|fakeServer:81:4080|/machineid|'),
    "Results": ('|returncode|0|/returncode|',
                '|origin|ABC|/origin|\n|destination|DEF|/destination|\n'),
}
mockServer = Mock(fakeResults)
xmlrpclib = Mock({"Server": mockServer})

(I replaced the XML tag brackets with | because Blogger had issues with the tags... Beta software indeed.)

Karen mocked the Server object used by xmlrpclib to return a handle to the XML-RPC server. When the application calls xmlrpclib.Server, it will get back the mockServer object. When the application then calls the Request or Results methods on this object, it will get back the canned data specified in the fakeResults dictionary. This completely avoids the network traffic to and from the real XML-RPC server, and allows the application to consume specific data about which the unit tests can make more meaningful assertions.
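Here is a hedged sketch of how a test might consume such a stubbed server, written with the standard library's unittest.mock instead of python-mock. The route_summary function, the simplified payload strings and the server URL are all hypothetical; Karen's actual application code is not shown in this post.

```python
from unittest.mock import Mock


def route_summary(rpc_module, server_url):
    """Hypothetical function under test: asks an XML-RPC server for a route."""
    server = rpc_module.Server(server_url)
    _status, body = server.Results()
    return body.strip().splitlines()


# The fake server returns canned data from its Results method.
fake_server = Mock()
fake_server.Results.return_value = ("returncode 0", "origin ABC\ndestination DEF\n")

# The fake module hands back the fake server when Server(...) is called.
fake_rpc = Mock()
fake_rpc.Server.return_value = fake_server

summary = route_summary(fake_rpc, "http://fakeServer:81")
```

Because the data is canned, the assertions can be exact instead of guessing at what a live server happens to return that day.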

The third example doesn't use mocking per se, but instead illustrates a pattern sometimes called "Fake Object"; that is, replacing an object that your application depends on with a more lightweight and faster version to be used during testing. A good example is using an in-memory database instead of a file system-based database. This is usually done to speed up the unit tests and thus have more frequent continuous integration runs.
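As a small illustration of the Fake Object idea, here is a sketch that uses sqlite3 from the standard library, chosen only because it ships with Python and supports both file-backed and in-memory databases. The MessageIndex class is hypothetical, and this sketch passes the connection in through the constructor for brevity.

```python
import sqlite3


class MessageIndex:
    """Hypothetical consumer of a database connection."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(id INTEGER PRIMARY KEY, subject TEXT)"
        )

    def add(self, subject):
        cur = self.conn.execute(
            "INSERT INTO messages (subject) VALUES (?)", (subject,)
        )
        self.conn.commit()
        return cur.lastrowid

    def subjects(self):
        rows = self.conn.execute("SELECT subject FROM messages ORDER BY id")
        return [row[0] for row in rows]


# In production the connection would point at a file on disk;
# in tests, ":memory:" gives a faster, throwaway database with the same API.
test_index = MessageIndex(sqlite3.connect(":memory:"))
test_index.add("hello")
test_index.add("world")
```

Nothing is stubbed out here: the fake is a real, working implementation that simply trades durability for speed.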

The MailOnnaStick application that Titus and I presented at our PyCon06 tutorial uses Durus as the back-end for storing mail message indexes. In the normal functionality of the application, we store the data on the file system using the FileStorage functionality in Durus (see the db.py module). However, Durus also provides MemoryStorage, which we decided to use for our unit tests via the mockdb.py module. In this case, mockdb is actually a misnomer, since we're not actually mocking or stubbing out methods of the FileStorage version, but instead we're reimplementing that functionality using the faster MemoryStorage. You can see how we use mockdb in our unit tests by looking at the test_index.py unit test module. Python namespaces come to the rescue again, since we don't have to make index.py, the consumer of the database functionality, aware of any mocking-related changes, except inside the unit test. In the test_index.py unit test, we reassign the index.db name to mockdb:
from mos import index, mockdb
index.db = mockdb

Speaking of patterns, I found very thorough explanations of unit testing patterns at the xUnit Patterns Web site. Sometimes the explanations are too thorough, if I may say so -- too much hair splitting going on -- but overall it's a good resource if you're interested in the more subtle nuances of Test Stubs, Test Doubles, Mock Objects, Test Spies, etc.

Mock testing is being used pretty heavily in Behavior-Driven Development (BDD), which I keep hearing about lately. I haven't looked too much into BDD so far, but from the little I've read about it, it seems to me that it's "just" syntactic sugar on top of the normal Test-Driven Development process. Its proponents do emphasize good naming for the unit tests, which if done to the letter turns the list of unit tests into a specification for the behavior of the application under test (hence the B in BDD). I think this can be achieved by properly naming your unit tests, without necessarily resorting to tools such as RSpec. But I may be wrong, and maybe BDD is a pretty radical departure from TDD -- I don't know yet. It's worth checking it out in any case.

I'll finish by listing some Web sites and articles related to mock testing. Enjoy!

5 comments:

Zach Dennis said...

"His main point is that if you mock an object and the interface or behavior of that object changes, your unit tests which use the mock will pass happily, when in fact your application will fail."

This is a very real problem. I've recently seen this in an application and it just blew my mind. Why do some people apply mocks and stubs to absolutely all aspects of testing? I've heard that some people like fixture-less testing which will increase the speed of their tests, but if speed is what they're worried about they don't seem to be worried about the right thing.

"I agree with Bruce that you shouldn't go overboard with mocking objects that you create in your own application. There's a good chance the behavior/interface of those objects will change, and you'll have the situation where the unit tests which use mock versions of these objects will pass, when in fact the application as a whole will fail. This is also a good example of why unit tests are not sufficient;"

Exactly!

Thanks for the article. This was a great read.

Anonymous said...

In reference to the argument about when to use mocks you said... 'There's a good chance the behavior/interface of those objects will change, and you'll have the situation where the unit tests which use mock versions of these objects will pass, when in fact the application as a whole will fail.' I don't believe this to be a valid argument. Mocked versions of any object used for testing a class are, presumably, collaborating objects with the actual class/object under test. These objects should have their own unit tests. When those fail due to changing functionality, you can update the tests accordingly. The only problem I see is in inaccurately mocked objects as collaborators in other test classes.

Anonymous said...

I found some useful examples at http://www.mocksamples.org/

Anonymous said...

I've read all of these arguments. Many good ideas were exchanged, but it seems as if everyone was missing the obvious. Finally I read the comment by jyaunches. Exactly what was on my mind.

Unknown said...

One of the things I find very frustrating about having a lot of mocks in unit tests is how brittle these tests become when collaborators are changed. Unit tests should take more of a black box approach and having loads of mocks couples them heavily to the implementation of the software under test.
