Friday, January 28, 2005

Python unit testing part 3: the py.test tool and library

This is the last part of a 3-part discussion on Python unit test frameworks. You can find part 1 here and part 2 here. In this post I'll discuss the py.test tool and library.

py.test

Availability

As Python unit test frameworks go, py.test is the new kid on the block. It already has an impressive set of features, with more to come, since the tool is under very active development.

py.test is part of the py library, a collection of modules with bold goals. For example, the py.path module aims to "allow you to seamlessly work with different backends, currently a local filesystem, subversion working copies and subversion remote URLs. Moreover, there is an experimental extpy path to address a Python object on the (possibly remote) filesystem." (quoted from the Why, who, what and how do you do the py lib page).

Much of the motivation for writing the py library came from issues that arose in the PyPy project, whose goal is nothing less than producing a simple and fast runtime system for the Python language, written in Python itself. Note that the PyPy project received funding from the European Union, which is very encouraging for open-source projects in general and for Python projects in particular. As you can see, the guys in charge of these projects set their sights high and, judging by the intense activity on the py-dev mailing list, they'll waste no time reaching their goals. Of course, PyPy sprints in winter-sport-friendly Switzerland can't be all that bad either :-)

The main py.test developers are Holger Krekel and Armin Rigo. People interested in delving into all the juicy details of how to use py.test are urged to attend Holger's and Armin's talk at PyCon 2005. In this post, I'll just cover the basic usage, since I'm still very much a beginner at using this tool.

I'll start with a quick overview of the installation. More details can be found in Getting started with py.lib.

1. I didn't have subversion installed on my machine, so I had to jump through some hoops in order to install it (I won't go into the gory details here.)
2. I cd-ed into the directory where the py distribution would live (/usr/local in my case)
3. I checked out the latest py distribution by running:
svn co http://codespeak.net/svn/py/dist dist-py
4. At this point, I had the py directory tree under /usr/local/dist-py
5. I added the following line to my .bash_profile:
eval `python /usr/local/dist-py/py/env.py`
(this line basically sets up the PATH and PYTHONPATH environment variables so that you can run py.test as a command-line utility and you can "import py" in your Python code)
6. I sourced .bash_profile in my current shell session:
. ~/.bash_profile

That's about it. Now you can just run "py.test -h" at a command prompt to see the various command-line options accepted by the tool.

Ease of use / API complexity

Two words: no API. It's a scary thought, but you can really go wild in writing your unit tests. Just two things you need to remember:

1. Prefix the names of your test functions/methods with test_ and the names of your test classes with Test
2. Save your test code in files that start with test_

That's about it in terms of API complexity. If you just run py.test in the directory that contains your tests, the tool will search the current directory and its subdirectories for files that start with test_, then it will automagically invoke all the test functions/methods it finds in those files. There is no need to inherit your test class from a framework-specific class, as is the case with unittest.

As with everything, there is one exception to the "no API" rule. The one place where py.test does have an API is in providing hooks for managing test fixture state. I'll provide more details as you read on.

Here's a quick example of testing the sort() list method. I saved the following in a file called test_sort.py:

class TestSort:
    def setup_method(self, method):
        self.alist = [5, 2, 3, 1, 4]

    def test_ascending_sort(self):
        self.alist.sort()
        assert self.alist == [1, 2, 3, 4, 5]

    def test_custom_sort(self):
        def int_compare(x, y):
            x = int(x)
            y = int(y)
            return x - y
        self.alist.sort(int_compare)
        assert self.alist == [1, 2, 3, 4, 5]

        b = ["1", "10", "2", "20", "100"]
        b.sort()
        assert b == ['1', '10', '100', '2', '20']
        b.sort(int_compare)
        assert b == ['1', '2', '10', '20', '100']

    def test_sort_reverse(self):
        self.alist.sort()
        self.alist.reverse()
        assert self.alist == [5, 4, 3, 2, 1]

    def test_sort_exception(self):
        import py.test
        py.test.raises(NameError, "self.alist.sort(int_compare)")
        py.test.raises(ValueError, self.alist.remove, 6)
Note the use of the special setup_method. It provides the same functionality as the setUp hook of the unittest module. I'll revisit py.test's state setup/teardown mechanism in the "Test fixture management" discussion below.

To run the tests in test_sort.py, simply invoke:
# py.test test_sort.py

inserting into sys.path: /usr/local/dist-py
============================= test process starts =============================
testing-mode: inprocess
executable : /usr/local/bin/python (2.4.0-final-0)
using py lib: /usr/local/dist-py/py
initial testconfig 0: /usr/local/dist-py/py/test/defaultconfig.py/.
===============================================================================
....
================== tests finished: 4 passed in 0.01 seconds ==================
Test execution customization

If you ran "py.test -h", you already saw that py.test has an impressive array of command-line options. The simplest one to try out is the verbose (-v) option:
# py.test -v test_sort.py

inserting into sys.path: /usr/local/dist-py
============================= test process starts =============================
testing-mode: inprocess
executable : /usr/local/bin/python (2.4.0-final-0)
using py lib: /usr/local/dist-py/py
initial testconfig 0: /usr/local/dist-py/py/test/defaultconfig.py/.
===============================================================================
0.000 ok test_sort.py:5 TestSort.test_ascending_sort()
0.000 ok test_sort.py:9 TestSort.test_custom_sort()
0.000 ok test_sort.py:23 TestSort.test_sort_reverse()
0.007 ok test_sort.py:28 TestSort.test_sort_exception()


================== tests finished: 4 passed in 0.02 seconds ==================
(a nice touch here is the printing of the execution time for each method.)

By the way, if you run py.test with no command-line options whatsoever, it will dutifully collect and run all the test code it can find in the current directory and below. Not bad for 7 characters worth of typing...

Another useful option is -S or --nocapture, which suppresses py.test's catching of sys.stdout/stderr output. By default, all such output is intercepted by py.test. I had some problems with this when a test I was running was itself redirecting stderr to stdout and not releasing it properly. When I ran my test code through py.test, all I got was a pretty mysterious traceback. I notified the developers on the py-dev list and the issue was promptly remedied. However, there may still be cases when you want your print statements to actually show up in your test output -- that's where the -S flag comes in handy (by default, print output from your test code will only show up for tests that fail.)
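To see the difference, here is a small hypothetical example of my own (the file name, function name and print text are not from the py.test docs). Save it as test_capture.py:

def test_with_debug_output():
    # this print statement is captured by py.test by default
    print "computing the answer..."
    assert 6 * 7 == 42

Running py.test test_capture.py shows only the usual dot for the passing test; running py.test -S test_capture.py also shows the "computing the answer..." line as the test runs.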

I might as well air a gripe at this point: py.test does a lot of "magic" behind the curtains, which may or may not be what you want. This is the price you pay for the "no API" feature. There's a lot of hidden stuff going on, and sometimes py.test handles errors/exceptions less than gracefully -- so you can find yourself staring at stack traces that are not very revealing. However, the people on the py-dev mailing list are extremely responsive and supportive, so all you need to do is send an email to py-dev at codespeak.net and you can be assured your issue will be responded to in a matter of hours.

Back to command-line options: I haven't played with all of them yet, but I'll just mention another useful one: --collectonly, which shows you all the tests found by py.test in the current directory and below, without actually running them. Here's the output I get:
# py.test --collectonly

inserting into sys.path: /usr/local/dist-py
============================= test process starts =============================
testing-mode: inprocess
executable : /usr/local/bin/python (2.4.0-final-0)
using py lib: /usr/local/dist-py/py
initial testconfig 0: /usr/local/dist-py/py/test/defaultconfig.py/.
===============================================================================
Directory('')
Module('/root/scripts/tests/test_blogger.py/.')
Class('/root/scripts/tests/test_blogger.py/.TestBlogger')
Module('/root/scripts/tests/test_blogger2.py/.')
Class('/root/scripts/tests/test_blogger2.py/.TestBlogger')
Module('/root/scripts/tests/test_doctest_sort.py/.')
Module('/root/scripts/tests/test_sort.py/.')
Class('/root/scripts/tests/test_sort.py/.TestSort')

====================== tests finished: in 0.16 seconds ======================
Test fixture management

py.test really shines in this category. It vastly surpasses unittest in providing setup and teardown hooks for managing test fixture/state in your test environments. You can have state maintained across test modules, classes and methods via hooks called setup_module/teardown_module, setup_class/teardown_class and setup_method/teardown_method respectively.

Let's first see an example of setup_method. As I mentioned before, this is the equivalent of unittest's setUp hook. Here's a test class I wrote for the Blogger module. I saved the following lines in a file called test_blogger.py:
import Blogger


class TestBlogger:

    def setup_method(self, method):
        print "in setup_method"
        self.blogger = Blogger.get_blog()

    def test_get_feed_title(self):
        title = "fitnessetesting"
        assert self.blogger.get_title() == title

    def test_get_feed_posting_url(self):
        posting_url = "http://www.blogger.com/atom/9276918"
        assert self.blogger.get_feed_posting_url() == posting_url

    def test_get_feed_posting_host(self):
        posting_host = "www.blogger.com"
        assert self.blogger.get_feed_posting_host() == posting_host

    def test_post_new_entry(self):
        init_num_entries = self.blogger.get_num_entries()
        title = "testPostNewEntry"
        content = "testPostNewEntry"
        assert self.blogger.post_new_entry(title, content) == True
        assert self.blogger.get_num_entries() == init_num_entries + 1
        # Entries are ordered most-recent first
        # Newest entry should be first
        assert title == self.blogger.get_nth_entry_title(1)
        assert content == self.blogger.get_nth_entry_content_strip_html(1)

    def test_delete_all_entries(self):
        self.blogger.delete_all_entries()
        assert self.blogger.get_num_entries() == 0
Let's run this through py.test with -v and -S, so that we can see the print output from setup_method:
# py.test -v -S test_blogger.py

inserting into sys.path: /usr/local/dist-py
============================= test process starts =============================
testing-mode: inprocess
executable : /usr/local/bin/python (2.4.0-final-0)
using py lib: /usr/local/dist-py/py
initial testconfig 0: /usr/local/dist-py/py/test/defaultconfig.py/.
===============================================================================
in setup_method
0.050 ok test_blogger.py:9 TestBlogger.test_get_feed_title()
in setup_method
0.000 ok test_blogger.py:13 TestBlogger.test_get_feed_posting_url()
in setup_method
0.001 ok test_blogger.py:17 TestBlogger.test_get_feed_posting_host()
in setup_method
10.173 ok test_blogger.py:21 TestBlogger.test_post_new_entry()
in setup_method
7.106 ok test_blogger.py:32 TestBlogger.test_delete_all_entries()


================== tests finished: 5 passed in 17.47 seconds ==================
Note that setup_method was called before each of the test_ methods. This is exactly what you need in those cases where you want your test methods to be independent of each other, each with its own state: you create that state in setup_method and you destroy it if needed in teardown_method.
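Here is a minimal sketch of a matching teardown_method (the temporary-file fixture and the file path are hypothetical, chosen just to illustrate the pairing):

import os

class TestWithTempFile:
    def setup_method(self, method):
        # create the per-test state before each test method
        self.path = "/tmp/py_test_fixture.txt"
        self.f = open(self.path, "w")

    def teardown_method(self, method):
        # destroy the per-test state after each test method
        self.f.close()
        os.remove(self.path)

    def test_file_is_open(self):
        assert not self.f.closed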

However, you may not need the overhead of setting up/tearing down state on each and every test method call. In this case, you can use module-level or class-level setup/teardown hooks.

Here's an example of using a module-level hook. In my specific case, it doesn't make that much difference, since the call to Blogger.get_blog() returns the same object every time. But one can easily imagine cases where some fixture state (such as a database connection or query result, or a file to read from) needs to be set up once per module, so that all test classes/methods/functions in that module can then use it. I saved the following lines in a file called test_blogger2.py:
import Blogger


def setup_module(module):
    print "in setup_module"
    module.TestBlogger.blogger = Blogger.get_blog()

class TestBlogger:
    """the rest of the code is the same as in test_blogger.py"""
Running this code under py.test with -v and -S produces:
# py.test -v -S test_blogger2.py

inserting into sys.path: /usr/local/dist-py
============================= test process starts =============================
testing-mode: inprocess
executable : /usr/local/bin/python (2.4.0-final-0)
using py lib: /usr/local/dist-py/py
initial testconfig 0: /usr/local/dist-py/py/test/defaultconfig.py/.
===============================================================================
in setup_module
0.058 ok test_blogger2.py:9 TestBlogger.test_get_feed_title()
0.000 ok test_blogger2.py:13 TestBlogger.test_get_feed_posting_url()
0.001 ok test_blogger2.py:17 TestBlogger.test_get_feed_posting_host()
10.173 ok test_blogger2.py:21 TestBlogger.test_post_new_entry()
21.329 ok test_blogger2.py:32 TestBlogger.test_delete_all_entries()


================== tests finished: 5 passed in 31.71 seconds ==================
Note that setup_module was called only once, at the very beginning of the test run. For more examples of setup/teardown hooks in action, see the py.test online documentation.
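I haven't shown a class-level example, so here is a minimal sketch of setup_class/teardown_class as I understand the hooks (the class name and the shared state are hypothetical); these hooks receive the class object and run once per test class:

class TestWithSharedState:
    def setup_class(cls):
        # runs once, before the first test method in this class
        cls.shared = {"connected": True}

    def teardown_class(cls):
        # runs once, after the last test method in this class
        cls.shared["connected"] = False

    def test_shared_state_is_available(self):
        assert self.shared["connected"]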

Test organization

This is another strong point of py.test. Because the only requirement for a test file to be recognized as such by py.test is for the filename to start with test_ (and even this can be customized), it is very easy to organize your tests in hierarchies and test suites by creating a directory tree and placing/grouping your test files in the appropriate directories. Then you can just run py.test with no arguments and let it find and execute all the test files for you. A carefully chosen naming scheme would certainly help you in this scenario.
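For instance, a purely hypothetical layout (the directory and file names are mine) could look like this; running py.test from the top-level tests directory picks up everything below it:

tests/
    unit/
        test_sort.py
        test_blogger.py
    functional/
        test_posting_workflow.py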

A feature of py.test which is a pleasant change from unittest is that the test execution order is guaranteed to be the same for each test run, and it is simply the order in which the test function/methods appear in a given test file. No alphanumerical sorting order to worry about.

I should probably also mention YAPTF (yet another py.test feature): testing starts as soon as the first test item is collected. The collection process is iterative and does not need to complete before your first test items are executed. But wait...the nifty things you can do never seem to stop! You can disable the execution of test classes by setting the special class-level attribute disabled. An example from the documentation: to avoid running Unix-specific tests under Windows, you can say

import sys

class TestEgSomePosixStuff:
    disabled = sys.platform == 'win32'

    def test_xxx(self):
        ...
Note that the py.test collection process can be used not only for unit tests, but for other types of testing, for example functional or system testing. In the past, I used a homegrown framework for collecting and running functional and system test suites, but I intend to replace that with the more elegant and customizable py.test mechanism. See the py.test documentation for more details, particularly The three components of py.test and Customizing the py.test process. One caveat here is that this is a work in progress, so some details related to the customization of the process might change. Consult the py-dev mailing list if in doubt.

Another py.test feature worth mentioning in this category is the ability to define and run so-called "generative tests". I haven't used them yet, but here's what the py.test documentation has to say about them:

"Generative tests are test methods that are generator functions which yield callables and their arguments. This is most useful for running a test function multiple times against different parameters. Example:


def test_generative():
    for x in (42, 17, 49):
        yield check, x

def check(arg):
    assert arg % 7 == 0  # second generated test fails!

Note that test_generative() will cause three tests to get run, notably check(42), check(17) and check(49) of which the middle one will obviously fail."

Assertion syntax

There is no special assertion syntax in py.test. You can use the standard Python assert statements, and they will (again, magically) be interpreted by py.test so that more helpful error messages can be printed out. This is in marked contrast with unittest's custom and somewhat clunky assertEqual/assertTrue/etc. mechanism.
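For comparison, here is the same trivial check written for unittest and for py.test (a quick sketch of my own):

import unittest

# unittest style: a TestCase subclass and a custom assertion method
class TestSortUnittest(unittest.TestCase):
    def test_ascending_sort(self):
        alist = [5, 2, 3, 1, 4]
        alist.sort()
        self.assertEqual(alist, [1, 2, 3, 4, 5])

# py.test style: a plain class and a plain assert
class TestSortPytest:
    def test_ascending_sort(self):
        alist = [5, 2, 3, 1, 4]
        alist.sort()
        assert alist == [1, 2, 3, 4, 5]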

I haven't shown an example of a failing test yet. Let's modify the assertion in the test_delete_all_entries method from:
assert self.blogger.get_num_entries() == 0

to:
assert self.blogger.get_num_entries() == 1

We now get this output:
# py.test test_blogger2.py

inserting into sys.path: /usr/local/dist-py
============================= test process starts =============================
testing-mode: inprocess
executable : /usr/local/bin/python (2.4.0-final-0)
using py lib: /usr/local/dist-py/py
initial testconfig 0: /usr/local/dist-py/py/test/defaultconfig.py/.
===============================================================================
....F
_______________________________________________________________________________

    def test_delete_all_entries(self):
        self.blogger.delete_all_entries()
E assert self.blogger.get_num_entries() == 1
~ assert 0 == 1
+ where 0 = </root/scripts/tests/test_blogger2.py.TestBlogger instance at 0x40801f0c>.blogger.get_num_entries()

[/root/scripts/tests/test_blogger2.py:34]
_______________________________________________________________________________
============= tests finished: 4 passed, 1 failed in 36.78 seconds =============

Note the output:

    def test_delete_all_entries(self):
        self.blogger.delete_all_entries()
E assert self.blogger.get_num_entries() == 1
~ assert 0 == 1
+ where 0 = </root/scripts/tests/test_blogger2.py.TestBlogger instance at 0x40801f0c>.blogger.get_num_entries()

When it encounters a failed assertion, py.test prints the lines in the method containing the assertion, up to and including the failure. It also prints the actual and the expected values involved in the failed assertion. This default behavior can be changed by giving the --nomagic option at the command line, in which case the assert statement behaves in the standard way, generating an output such as:
E       assert self.blogger.get_num_entries() == 1
~ AssertionError
Also, by default, when it encounters a failure py.test only shows the relevant portions of the tracebacks in order to make debugging easier. If you want to see the full traceback leading to the failure in all its gory details, you can run py.test with the --fulltrace option (I will spare you the details of the output.)

Dealing with exceptions

The test_sort.py module I showed above contains an example of how exceptions can be handled with py.test:

    def test_sort_exception(self):
        import py.test
        py.test.raises(NameError, "self.alist.sort(int_compare)")
        py.test.raises(ValueError, self.alist.remove, 6)
Here I needed to import py.test in my test code, in order to be able to use the raises() function it provides. This function takes the expected exception type as the first parameter. The other parameters are either
  • a string specifying the function or method call that is supposed to raise the exception, or
  • the actual callable, followed by its arguments
The more general form for the raises() function is:
py.test.raises(Exception, "func(*args, **kwargs)")

py.test.raises(Exception, func, *args, **kwargs)
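As a concrete illustration of both forms, here is a small sketch that uses built-in exceptions rather than the Blogger module, so it stands on its own:

import py.test

def test_raises_forms():
    # string form: the expression is evaluated and must raise the exception
    py.test.raises(ZeroDivisionError, "1 / 0")
    # callable form: the callable is invoked with the given arguments
    py.test.raises(ValueError, int, "not a number")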
Summary

I hope I've convinced you that the py.test tool and the py library are worthy of your consideration, although I've probably only scratched the surface of their capabilities.

Here are some Pros and Cons of using py.test, in the interest of what I hope is a fair comparison between unittest, doctest and py.test.

py.test Pros
  • no API!
  • great flexibility in test execution via command-line arguments
  • strong support for test fixture/state management via setup/teardown hooks
  • strong support for test organization via collection mechanism
  • strong debugging support via customized traceback and assertion output
  • very active and responsive development team
py.test Cons
  • available only in "raw" form via subversion; this makes its inclusion in other modules/frameworks a bit risky
  • many details, especially the ones related to customizing the collection process, are subject to refactorings and thus may change in the future
  • a lot of magic goes on behind the scenes, which can sometimes obscure the tool's intent (it sure obscures its output sometimes)
An interesting question is how to best combine the strengths of the 3 tools I discussed (unittest, doctest and py.test). It seems that many people are already using unittest in conjunction with doctest, with the former being used in situations that demand fixture setup and teardown, and the latter in situations where small functions need to be tested without the overhead of creating test case classes. Regardless of the style of testing, doctest seems to be a great way of keeping documentation in sync with the code. At the same time, py.test can either coexist with or replace unittest in those cases where test fixture management and test organization are important.

I think that small teams will appreciate py.test's flexibility and utter lack of rules, whereas larger teams might appreciate unittest's structure and the fact that it standardizes the testing code, thus making it more maintainable. That is not to say that py.test cannot be adopted by large teams -- I just think that at some point they will have to create their own frameworks on top of py.test, in order to impose structure and standardization on their test code. The good news is that py.test's versatility and malleability make it easy to add structure on top of it.

9 comments:

Anonymous said...

py.test might feel odd at first, but it has some really convenient options. For example, --session allows you to run only those tests that failed last time.

--exitfirst (or just -x) exits instantly on the first error, making output cleaner and easier to spot problems.

I had nothing against the unittest module, but I think I'll go for py.test from now on. Great work.

Anonymous said...

I'd like to plug TestOOB: http://testoob.sourceforge.net

It provides extensions for Python's unittest module, including cool features like:
* color output
* select tests w/regexps
* run pdb on failing tests
* XML/HTML reports

Plus it works without changes to your existing unittest test suites!

Anonymous said...

That is great.
Also there is a way to execute python code in parallel on SMP: Parallel Python

Anonymous said...

For functional testing of web applications you can try InCisif.net. The release 1.3 (available at the end of March 2007) will support IronPython.

InCisif.net is an automation tool designed to implement client-side functional testing of web applications under Internet Explorer 6.x or 7.x, using the C# or VB.NET language with Visual Studio 2003, 2005 or express editions.

Anonymous said...

Hi, it was interesting to read your stuff. Do you think py.test also runs on Windows? If so, could you please blog the command-line option for testing scripts in another directory using py.test? For example, py\bin\py.test pypy/lib/ -- I tried this and it doesn't work on Windows.
Thanks!

Norbert Klamann said...

@Anonymous:

add D:\Python25\Lib\site-packages\py\bin\win32\

to your path,

rename py.test.cmd to something like py_test.cmd

use that

Anonymous said...

Concerning the mock objects question: I think there is no need for a framework in Python for that. You just write an object. It's not Java, which always gets in the way with its type system. You just write:

class MockObject(object):
    def whatever(self, a):
        if a == "b":
            raise Exception("simulated failure")

You need no API for such simple things.

Ted Lilley said...

The mock library which most closely matches the feel of py.test, in my opinion, is Michael Foord's mock library. It's available at:

http://www.voidspace.org.uk/python/mock/

It makes what I would term "blobs"...little objects that automatically morph into whatever you ask of them. When that's useful, it's great. When it's not useful, you can tell them to mimic and enforce behavior of existing objects, which they can "clone".

For behavior you need to verify, you instill them with the values and behavior you want to see. For things you want to be able to skip in your code, they become passive recorders of what methods were called, allowing you to inspect how they were treated by your code.

Very simple to get started with and very powerful. I find it especially easy and useful to automate nasty bits like avoiding gui prompts for user input and such. It's worth a look.

Jabba said...

Thank you, excellent introduction to pytest. With this I could write my first tests in no time :)
