Thursday, January 27, 2005

Python unit testing part 2: the doctest module

This is part 2 of a 3-part discussion on Python unit test frameworks. You can find part 1 here. In this second part, I'll discuss the doctest module.

doctest

Availability

The doctest module has been part of the Python standard library since version 2.1.

Ease of use

It's hard to beat doctest in this category. There is no need to write separate test functions or methods. Instead, you simply run the function or method under test in a Python shell, then copy the expected results and paste them into that function's docstring.

In my example, I simply added the following docstrings to the post_new_entry and delete_all_entries methods of the Blogger class:

def post_new_entry(self, title, content):
    """
    >>> blog = get_blog()
    >>> title = "Test title"
    >>> content = "Test content"
    >>> init_num_entries = blog.get_num_entries()
    >>> rc = blog.post_new_entry(title, content)
    >>> print rc
    True
    >>> num_entries = blog.get_num_entries()
    >>> num_entries == init_num_entries + 1
    True
    """

def delete_all_entries(self):
    """
    >>> blog = get_blog()
    >>> blog.delete_all_entries()
    >>> print blog.get_num_entries()
    0
    """
I then added the following lines to the __main__ section of the Blogger module:

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Now the Blogger module is "doctest-ready". All you need to do at this point is run Blogger.py:
# python Blogger.py

In this case, the fact that we have no output is a good thing. Doctest-based tests do not print anything by default when the tests pass.
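To make the "silence is success" behavior concrete, here is a minimal self-contained doctest-ready module (the add function is made up for illustration and is not part of Blogger):

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b

if __name__ == "__main__":
    # Prints nothing when every example passes; testmod(verbose=True)
    # or the -v command-line flag prints a line per example instead.
    doctest.testmod()
```

Running this file directly produces no output when all examples pass, which is exactly the behavior described above.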

API complexity

There is no API! Note that doctest-enabled docstrings can contain any other text that is needed for documentation purposes.

However, there are some caveats associated with the way doctest interprets the docstrings (most of the following bullet points are lifted verbatim from the doctest documentation):
  • any expected output must immediately follow the final '>>> ' or '... ' line containing the code, and the expected output (if any) extends to the next '>>> ' or all-whitespace line
  • expected output cannot contain an all-whitespace line, since such a line is taken to signal the end of expected output
  • output to stdout is captured, but not output to stderr (exception tracebacks are captured via a different means)
  • doctest is serious about requiring exact matches in expected output. If even a single character doesn't match, the test fails (so pay attention to those white spaces at the end of your copied-and-pasted lines!)
  • the exact match requirement means that the output must be the same on every run, so in the output that you capture you should try not to:
    • print a dictionary, because the order of the items can vary from one run to the other
    • operate with floating-point numbers, because the precision can vary across platforms
    • print objects whose default repr embeds a memory address, such as <__main__.C instance at 0x...>
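The dictionary and floating-point caveats can often be worked around by normalizing the output inside the example itself. A small sketch (the inventory function is made up for illustration):

```python
import doctest

def inventory():
    """Return a mapping of item name to count.

    Printing the dict directly would tie the test to iteration order,
    so the example normalizes the output first:

    >>> sorted(inventory().items())
    [('apples', 3), ('pears', 7)]

    Pinning floating-point output to a fixed precision avoids
    platform-dependent trailing digits:

    >>> round(3.0 / 7, 3)
    0.429
    """
    return {"pears": 7, "apples": 3}

if __name__ == "__main__":
    doctest.testmod()
```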
Test execution customization

The output of a doctest run can be made more verbose by means of the -v flag. Here is an example:
# python Blogger.py -v

Trying:
    blog = get_blog()
Expecting nothing
ok
Trying:
    title = "Test title"
Expecting nothing
ok
Trying:
    content = "Test content"
Expecting nothing
ok
Trying:
    init_num_entries = blog.get_num_entries()
Expecting nothing
ok
Trying:
    rc = blog.post_new_entry(title, content)
Expecting nothing
ok
Trying:
    print rc
Expecting:
    True
ok
Trying:
    num_entries = blog.get_num_entries()
Expecting nothing
ok
Trying:
    num_entries == init_num_entries + 1
Expecting:
    True
ok
Trying:
    blog = get_blog()
Expecting nothing
ok
Trying:
    blog.delete_all_entries()
Expecting nothing
ok
Trying:
    print blog.get_num_entries()
Expecting:
    0
ok
25 items had no tests:
    __main__
    __main__.BlogParams
    __main__.BlogParams.__init__
    __main__.Blogger
    __main__.Blogger.__init__
    __main__.Blogger.delete_entry_by_url
    __main__.Blogger.delete_nth_entry
    __main__.Blogger.get_feed_posting_host
    __main__.Blogger.get_feed_posting_url
    __main__.Blogger.get_nonce
    __main__.Blogger.get_nth_entry
    __main__.Blogger.get_nth_entry_content
    __main__.Blogger.get_nth_entry_content_strip_html
    __main__.Blogger.get_nth_entry_title
    __main__.Blogger.get_nth_entry_url
    __main__.Blogger.get_num_entries
    __main__.Blogger.get_post_headers
    __main__.Blogger.get_tagline
    __main__.Blogger.get_title
    __main__.Blogger.refresh_feed
    __main__.Blogger.snooze
    __main__.Entry
    __main__.Entry.__cmp__
    __main__.Entry.__init__
    __main__.get_blog
2 items passed all tests:
    3 tests in __main__.Blogger.delete_all_entries
    8 tests in __main__.Blogger.post_new_entry
11 tests in 27 items.
11 passed and 0 failed.
Test passed.
The amount of output seems a bit too verbose to me, but it does give a feel for how doctest actually runs the tests.

One other important customization is keeping the doctests in a separate text file. This can be beneficial when the docstrings become so large that they detract from the clarity of the code under test instead of adding to it. Having separate doctest files scales better in my opinion (and this seems to be the direction the Zope project is heading with its test strategy, according again to Jim Fulton's PyCon 2004 presentation).

As an example, consider the following text file, which I saved as testfile_blogger:

Test for post_new_entry():

>>> from Blogger import get_blog
>>> blog = get_blog()
>>> title = "Test title"
>>> content = "Test content"
>>> init_num_entries = blog.get_num_entries()
>>> rc = blog.post_new_entry(title, content)
>>> print rc
True
>>> num_entries = blog.get_num_entries()
>>> num_entries == init_num_entries + 1
True

Test for delete_all_entries():

>>> blog = get_blog()
>>> blog.delete_all_entries()
>>> print blog.get_num_entries()
0
Note that free-flowing text can coexist with the doctest examples, and there is no need for docstring quotes. This is especially advantageous for interspersing the examples with descriptions of test scenarios, special boundary cases, etc.

To have doctest run the tests in this file, put the following two lines either in their own module or in place of the two lines at the end of the Blogger module:

import doctest
doctest.testfile("testfile_blogger")

I chose to save these lines in a separate file called doctest_testfile.py. Here is the result of the test run in this case:

# python doctest_testfile.py -v
Trying:
    from Blogger import get_blog
Expecting nothing
ok
Trying:
    blog = get_blog()
Expecting nothing
ok
Trying:
    title = "Test title"
Expecting nothing
ok
Trying:
    content = "Test content"
Expecting nothing
ok
Trying:
    init_num_entries = blog.get_num_entries()
Expecting nothing
ok
Trying:
    rc = blog.post_new_entry(title, content)
Expecting nothing
ok
Trying:
    print rc
Expecting:
    True
ok
Trying:
    num_entries = blog.get_num_entries()
Expecting nothing
ok
Trying:
    num_entries == init_num_entries + 1
Expecting:
    True
ok
Trying:
    blog = get_blog()
Expecting nothing
ok
Trying:
    blog.delete_all_entries()
Expecting nothing
ok
Trying:
    print blog.get_num_entries()
Expecting:
    0
ok
1 items passed all tests:
    12 tests in testfile_blogger
12 tests in 1 items.
12 passed and 0 failed.
Test passed.

Test fixture management

doctest does not provide any set-up/tear-down hooks for managing test fixture state (although this feature is being considered for inclusion in a future release). This can sometimes be an advantage: when your test methods do not need to be independent of one another, you avoid the overhead of setting up and tearing down state for each test run.
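One partial workaround, in recent versions of doctest, is the extraglobs argument to testmod(): it injects pre-built objects into the namespace of every docstring, so an expensive object can be built once and shared rather than re-created per test. A sketch with a hypothetical FakeBlog class standing in for the real Blogger connection:

```python
import doctest

class FakeBlog:
    """A stand-in for the real Blogger connection.

    >>> blog.get_num_entries()   # 'blog' is injected via extraglobs below
    0
    >>> blog.post_new_entry("Test title", "Test content")
    True
    >>> blog.get_num_entries()
    1
    """
    def __init__(self):
        self.entries = []

    def get_num_entries(self):
        return len(self.entries)

    def post_new_entry(self, title, content):
        self.entries.append((title, content))
        return True

if __name__ == "__main__":
    # Every docstring in the module sees the same pre-built 'blog'
    # object: built once, shared by all tests, never torn down.
    doctest.testmod(extraglobs={"blog": FakeBlog()})
```

Note that because the object is shared by reference, state mutated in one docstring leaks into the next, which is precisely the trade-off described above.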

Test organization and reuse

A new feature of doctest in Python 2.4 is the ability to piggyback on unittest's suite management capabilities. To quote from the doctest documentation:

As your collection of doctest'ed modules grows, you'll want a way to run all their doctests systematically. Prior to Python 2.4, doctest had a barely documented Tester class that supplied a rudimentary way to combine doctests from multiple modules. Tester was feeble, and in practice most serious Python testing frameworks build on the unittest module, which supplies many flexible ways to combine tests from multiple sources. So, in Python 2.4, doctest's Tester class is deprecated, and doctest provides two functions that can be used to create unittest test suites from modules and text files containing doctests. These test suites can then be run using unittest test runners.

The two doctest functions that can create unittest test suites are DocFileSuite, which takes a path to a file as a parameter, and DocTestSuite, which takes a module containing test cases as a parameter. I'll show an example of using DocFileSuite. I saved the following lines in a file called doctest2unittest_blogger.py:
import unittest
import doctest

suite = unittest.TestSuite()
suite.addTest(doctest.DocFileSuite("testfile_blogger"))
unittest.TextTestRunner().run(suite)
I passed to DocFileSuite the path to the testfile_blogger file that contains the docstrings for 2 of the Blogger methods. Running doctest2unittest_blogger produces:

# python doctest2unittest_blogger.py
.
----------------------------------------------------------------------
Ran 1 test in 24.768s

OK
Note that this is unittest-specific output. As far as unittest is concerned, it executed only 1 test: the one we added via the suite.addTest() call. We can increase the verbosity by calling unittest.TextTestRunner(verbosity=2).run(suite):
# python doctest2unittest_blogger.py

Doctest: testfile_blogger ... ok

----------------------------------------------------------------------
Ran 1 test in 33.693s

OK
To convince myself that the doctest tests are really being run, I edited testfile_blogger and changed the expected return codes from True to False, and the last expected value from 0 to 1. Now both doctest tests should fail:
# python doctest2unittest_blogger.py

Doctest: testfile_blogger ... FAIL

======================================================================
FAIL: Doctest: testfile_blogger
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/doctest.py", line 2152, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for testfile_blogger
  File "testfile_blogger", line 0

----------------------------------------------------------------------
File "testfile_blogger", line 9, in testfile_blogger
Failed example:
    print rc
Expected:
    False
Got:
    True
----------------------------------------------------------------------
File "testfile_blogger", line 19, in testfile_blogger
Failed example:
    print blog.get_num_entries()
Expected:
    1
Got:
    0


----------------------------------------------------------------------
Ran 1 test in 25.691s

FAILED (failures=1)
As you can see, it's pretty easy to use the strong unittest test aggregation/organization mechanism and combine it with doctest-specific tests.

As a matter of personal taste, I prefer to write my unit tests with the unittest framework, since for me it is more conducive to test-driven development. I write a test, watch it fail, then write the code that makes it pass. It is an organic process that sort of grows on you and changes your whole outlook on code design and development. I find that I don't get the same results with doctest. Maybe I'm just not used to playing with the Python shell that much while I'm developing. For me, the biggest plus of using doctest is having top-notch documentation for my code. This style of testing is rightly called "literate testing" or "executable documentation" by the doctest folks. However, I only copy and paste the expected output into the docstrings AFTER I know my code works (because it has already been unit-tested with unittest).

Assertion syntax

There is no special syntax for assertions in doctest. Most of the time, assertions are not even necessary. For example, to verify that posting a new entry increments the number of entries by 1, I simply included these two lines in the corresponding docstring:

>>> num_entries == init_num_entries + 1
True
Dealing with exceptions

I'll use an example similar to the one I used for the unittest discussion. Here are some simple doctest tests for the sort() list method:

def test_ascending_sort():
    """
    >>> a = [5, 2, 3, 1, 4]
    >>> a.sort()
    >>> a
    [1, 2, 3, 4, 5]
    """

def test_custom_sort():
    """
    >>> def int_compare(x, y):
    ...     x = int(x)
    ...     y = int(y)
    ...     return x - y
    ...
    >>> a = [5, 2, 3, 1, 4]
    >>> a.sort(int_compare)
    >>> print a
    [1, 2, 3, 4, 5]
    >>> b = ["1", "2", "10", "20", "100"]
    >>> b.sort()
    >>> b
    ['1', '10', '100', '2', '20']
    >>> b.sort(int_compare)
    >>> b
    ['1', '2', '10', '20', '100']
    """

def test_sort_reverse():
    """
    >>> a = [5, 2, 3, 1, 4]
    >>> a.sort()
    >>> a.reverse()
    >>> a
    [5, 4, 3, 2, 1]
    """

def test_sort_exception():
    """
    >>> a = [5, 2, 3, 1, 4]
    >>> a.sort(int_compare)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    NameError: name 'int_compare' is not defined
    """

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Note that for testing exceptions, I simply copied and pasted the traceback output. doctest looks for a line starting with Traceback, ignores the intervening lines containing details that are likely to change (such as file names and line numbers), and then matches the lines starting with the exception type. Both the exception type (NameError in this case) and the exception detail (which can span multiple lines) are compared against the actual output.

The doctest documentation recommends omitting the traceback stack details and replacing them with an ellipsis (...), since they are ignored by the matching mechanism anyway.
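Putting those conventions together, here is a self-contained sketch with the stack detail elided (the divide function is made up for illustration; the exception message shown is what modern Pythons print, and older versions word it differently):

```python
import doctest

def divide(a, b):
    """Divide a by b, letting ZeroDivisionError propagate.

    >>> divide(10.0, 2)
    5.0

    Only the Traceback header, the exception type, and the detail
    message are matched; the elided stack lines are ignored:

    >>> divide(1, 0)
    Traceback (most recent call last):
        ...
    ZeroDivisionError: division by zero
    """
    return a / b

if __name__ == "__main__":
    doctest.testmod()
```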

To summarize, here are some Pros and Cons of using the doctest framework.

doctest Pros
  • available in the Python standard library
  • no API to remember, just copy and paste output from shell session
  • flexibility in test execution via command-line arguments
  • perfect way to keep documentation in sync with code
  • tests can be kept in separate files which can also contain free-flowing descriptions of test scenarios, special boundary cases, etc.
doctest Cons
  • output matching mechanism mandates that output must be the same on every run
  • no provisions for test fixture/state management
  • provides test organization only if used in conjunction with the unittest framework
  • does not seem very conducive to test-driven development

7 comments:

Ian Bicking said...

Note that doctest in Python 2.4 has a number of hooks to control how output is compared -- including ellipses (which, like a wildcard, match anything, e.g., an object's address) and a token for matching blank lines.

The original idea of doctest was to test documentation, i.e., make sure your examples are up to date. While it can be used for more than that, the documentation testing alone is a good reason to know and use it. I also find it is great for small functions, where a unittest setup seems too heavy; e.g., a five-line function can easily require twenty or more lines to test with unittest. doctest is generally nice when you have code with few dependencies and little internal state, and when you are doing input/output kind of testing (e.g., call x(foo) and you get y, call x(foo+1) and you get z, x(-1) gives you an exception, etc). I find this to be one of unittest's weakest areas, simply because of the tedium involved in the class declarations.

One last doctest detail that I like is __test__, a magic variable which can be a dictionary of doctest strings. This is somewhere between putting tests in docstrings, and putting tests in a separate text file.

As for TDD, I think it's just fine -- there are certainly places where the interactive prompt doesn't feel entirely predictable, so you'll copy and paste, but often it is predictable (as predictable as in a unittest at least). Or you'll put in a failure condition, but won't bother with identifying the specific exception until you run the test, which is still reasonable when you are only making sure that corner cases fail, not that they fail in a particular way.

PJE said...

I actually prefer it for TDD, because I never know what to start with in unittest. In a doctest, when I don't know what test to write, I start writing documentation, and it lets me ease into what I want the test to say. It seems easier to start with small tests if I think about trying to explain to someone what the finished code will be doing. And, I get some documentation written. :)

Anonymous said...

I have found that writing doctests requires a different way of thinking about tests, but once you grok it (examples from Zope 3 and PJE's blog help a lot here), the resulting tests are much more readable.

Switching from unittest to doctest is like switching to a different programming language -- you need to change the way you think about tests. As they say, you can write FORTRAN programs in any language.

--
Marius Gedminas (http://mg.pov.lt/)

chris smith said...

> As a matter of personal taste, I prefer to write my unit tests using the unittest framework, since for me it is more conducive to test-driven development. I write a test, watch it fail, then write the code for making the test pass. It is an organic process that sort of grows on you and changes your whole outlook on code design and development. I find that I don't get the same results with doctest.

I like unittest, but I've lately been doing some things with nested classes, and I'm wondering if one could do something interesting with doctest for inner classes (in more of a white-box sense), and unittest for more of a black-box feel. It's nice to be able to look at the shape of the code, and have it suggest its usage at a glance...

def ZA said...

Thanks for the nice info!

ps. It seems the CODE html tag on your site uses double line spacing, it really makes it hard to read! (Opera 9.1)


arnebab said...

You can represent a blank line in expected output by writing

<BLANKLINE>

instead of an actual blank line.

Best wishes,
Arne
--
http://infinite-hands.draketo.de - A part of the history of free software in a song.