Monday, December 27, 2004

Performance testing with pyUnitPerf

Summary: If you are familiar with Mike Clark's JUnitPerf framework, then you may be interested in knowing that I have just released a Python port: pyUnitPerf. You can browse the source code here and you can get a gzipped tarball from here.


pyUnitPerf tests are meant to transparently add performance testing capabilities to existing pyUnit test suites. The pyUnitPerf framework introduces 2 new types of tests:
  • TimedTest: runs an existing pyUnit test case by imposing a limit to the time it takes to run the test
  • LoadTest: runs an existing pyUnit test case by simulating concurrent users and iterations
Let's look at a simple example adapted from the samples provided with JUnitPerf.

Assume you have the following pyUnit test case in a file called ExampleTestCase.py:
from unittest import TestCase, TestSuite, TextTestRunner, makeSuite

import time

class ExampleTestCase(TestCase):

    def __init__(self, name):
        TestCase.__init__(self, name)

    def testOneSecondResponse(self):
        # Simulate an operation that takes one second to complete
        time.sleep(1)

    def suite(self):
        return makeSuite(self.__class__)

if __name__ == "__main__":
    example = ExampleTestCase("testOneSecondResponse")
    runner = TextTestRunner()
    runner.run(example)

Admittedly this is a contrived example, since the testOneSecondResponse method simply sleeps for 1 second and does not actually test anything, but it serves to illustrate the pyUnitPerf functionality.

Assume you want to create a timed test that waits for the completion of the ExampleTestCase.testOneSecondResponse method and then fails if the elapsed time exceeded 1 second. With pyUnitPerf, all you need to do is write the following code in a file called
from unittest import TestSuite, TextTestRunner

from ExampleTestCase import ExampleTestCase
from LoadTest import LoadTest
from TimedTest import TimedTest

class ExampleTimedTest:

    def __init__(self):
        self.toleranceInSec = 0.05

    def suite(self):
        s = TestSuite()
        s.addTest(self.make1SecondResponseTimedTest())
        return s

    def make1SecondResponseTimedTest(self):
        """
        Decorates a one second response time test as a
        timed test with a maximum elapsed time of 1 second
        """
        maxElapsedTimeInSec = 1 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        timedTest = TimedTest(testCase, maxElapsedTimeInSec)
        return timedTest

if __name__ == "__main__":
    # Pass the suite to the pyUnit text runner
    TextTestRunner(verbosity=2).run(ExampleTimedTest().suite())

The suite() method constructs a TestSuite object and adds to it the test object returned by the make1SecondResponseTimedTest method. This method instantiates an ExampleTestCase object, passing it the method name to be tested: testOneSecondResponse. We then pass the testCase object to a TimedTest object, together with the desired maximum time to wait for the completion of the test (to which we add a 50 msec. tolerance to account for time potentially spent setting up and tearing down the test case). In the __main__ section of the module, we simply call the pyUnit TextTestRunner, passing it the suite.

If you run: python at a command prompt, you will get the following output:
testOneSecondResponse (ExampleTestCase.ExampleTestCase) ... ok

TimedTest (WAITING): testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0 sec.

Ran 1 test in 1.000s


Now let's make the test fail by requiring the timed test to finish in 0.9 seconds. To do this, simply change
    maxElapsedTimeInSec = 1 + self.toleranceInSec
to
    maxElapsedTimeInSec = 0.9 + self.toleranceInSec
Running python now results in the following output:
testOneSecondResponse (ExampleTestCase.ExampleTestCase) ... ok

TimedTest (WAITING): testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0 sec.

FAIL: testOneSecondResponse (ExampleTestCase.ExampleTestCase)
AssertionFailedError: Maximum elapsed time exceeded! Expected 0.95 sec., but was 1.0 sec.

Ran 1 test in 1.000s

FAILED (failures=1)

Note that the test result for the pyUnit test case (ExampleTestCase.testOneSecondResponse) is still marked as OK, but the test result for the TimedTest is marked as FAILED, since the time it took was longer than the specified maximum time of 0.95 sec.
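To make the decorator idea more concrete, here is a minimal sketch that uses only the standard unittest module. It is not pyUnitPerf's actual code: the names OneSecondish, SimpleTimedTest and run_timed are hypothetical, and the wrapped test sleeps for only 0.05 seconds so that the example runs quickly.

```python
import time
import unittest


class OneSecondish(unittest.TestCase):
    """Stand-in for ExampleTestCase; sleeps briefly instead of a full second."""

    def testShortResponse(self):
        time.sleep(0.05)


class SimpleTimedTest(unittest.TestCase):
    """Hypothetical decorator: runs a wrapped test, then checks elapsed time."""

    def __init__(self, test, max_elapsed_sec):
        unittest.TestCase.__init__(self, "runTimed")
        self.test = test
        self.max_elapsed_sec = max_elapsed_sec

    def runTimed(self):
        inner_result = unittest.TestResult()
        start = time.time()
        self.test.run(inner_result)  # wait for the wrapped test to complete
        elapsed = time.time() - start
        # Fail if the wrapped test itself failed, then enforce the time limit.
        self.assertTrue(inner_result.wasSuccessful(), "wrapped test failed")
        self.assertTrue(
            elapsed <= self.max_elapsed_sec,
            "Maximum elapsed time exceeded! Expected %s sec., but was %s sec."
            % (self.max_elapsed_sec, elapsed),
        )


def run_timed(max_elapsed_sec):
    """Run the wrapped test under the given limit; return True if it passed."""
    timed = SimpleTimedTest(OneSecondish("testShortResponse"), max_elapsed_sec)
    result = unittest.TestResult()
    timed.run(result)
    return result.wasSuccessful()
```

Calling run_timed with a generous limit should pass, while a limit shorter than the sleep should fail, which mirrors the 1.05 sec. versus 0.95 sec. runs above.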

Let's look at an example of a LoadTest. The following code can be saved in a file called
from unittest import TestSuite, TextTestRunner

from ExampleTestCase import ExampleTestCase
from LoadTest import LoadTest
from TimedTest import TimedTest

class ExampleLoadTest:

    def __init__(self):
        self.toleranceInSec = 0.05

    def suite(self):
        s = TestSuite()
        s.addTest(self.make1SecondResponseSingleUserLoadTest())
        s.addTest(self.make1SecondResponseMultipleUserLoadTest())
        s.addTest(self.make1SecondResponse1UserLoadIterationTest())
        return s

    def make1SecondResponseSingleUserLoadTest(self):
        """
        Decorates a one second response time test as a single user
        load test with a maximum elapsed time of 1 second
        and a 0 second delay between users.
        """
        users = 1
        maxElapsedTimeInSec = 1 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

    def make1SecondResponseMultipleUserLoadTest(self):
        """
        Decorates a one second response time test as a multiple-user
        load test with a maximum elapsed time of 1.5
        seconds and a 0 second delay between users.
        """
        users = 10
        maxElapsedTimeInSec = 1.5 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

    def make1SecondResponse1UserLoadIterationTest(self):
        """
        Decorates a one second response time test as a single user
        load test with 10 iterations per user, a maximum
        elapsed time of 10 seconds, and a 0 second delay
        between users.
        """
        users = 1
        iterations = 10

        maxElapsedTimeInSec = 10 + self.toleranceInSec
        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users, iterations)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

if __name__ == "__main__":
    # Pass the suite to the pyUnit text runner
    TextTestRunner().run(ExampleLoadTest().suite())

The 3 methods defined in ExampleLoadTest cover some of the most commonly used load test scenarios. See the doc strings at the beginning of each method for more details. Running python generates this output:
.TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.03099989891 sec.
..........TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0150001049 sec.
..........TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase)(repeated): 10.0 sec.

Ran 21 tests in 12.046s


This time all the tests passed. Note that the multiple user load test (make1SecondResponseMultipleUserLoadTest) runs the individual test cases in parallel, each test case in its own thread, and thus the overall time is only slightly longer than 1 second. The multiple iteration test (make1SecondResponse1UserLoadIterationTest) runs the 10 iterations of the test case sequentially, and thus the overall time is 10 seconds.
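The difference between parallel users and sequential iterations can be reproduced with plain threads. The following is a rough sketch, not pyUnitPerf's LoadTest implementation; simulated_request and run_load are hypothetical names, and the delay is shortened to 0.1 seconds.

```python
import threading
import time


def simulated_request(delay_sec):
    """Stand-in for one run of the test case; it just sleeps."""
    time.sleep(delay_sec)


def run_load(users, iterations, delay_sec):
    """Run `iterations` sequential calls in each of `users` parallel threads.

    Returns the wall-clock time taken by the whole load.
    """
    def one_user():
        for _ in range(iterations):
            simulated_request(delay_sec)

    threads = [threading.Thread(target=one_user) for _ in range(users)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    parallel = run_load(users=10, iterations=1, delay_sec=0.1)
    sequential = run_load(users=1, iterations=10, delay_sec=0.1)
    print("10 users x 1 iteration:  %.2f sec." % parallel)
    print("1 user x 10 iterations: %.2f sec." % sequential)
```

With these settings the parallel run should take roughly one delay period, while the sequential run should take roughly ten.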

We can make some of the tests fail by decreasing the value of maxElapsedTimeInSec, similar to what we did for the TimedTest.

Why should you use pyUnitPerf? Mike Clark makes a great case for using JUnitPerf here. To summarize, you use pyUnitPerf when you have an existing suite of pyUnit tests that verify the correctness of your code, and you want to isolate potential performance issues with your code.

The fact that the pyUnitPerf test suites are completely independent from the pyUnit tests helps you in scheduling different run times for the 2 types of tests:
  • you want to run the pyUnit tests very often, since they (should) run fast
  • you want to run the pyUnitPerf tests less frequently, when trying to verify that an identified bottleneck has been eliminated (potential bottlenecks can be pinpointed via profiling for example); performance tests tend to take a longer time to run, so they could be scheduled for example during a nightly smoke test run

Monday, December 20, 2004

PyFIT Tutorial Part 2

In the conclusion of part 1 of the PyFIT tutorial, I said I would experiment with RowFixture tables. It turned out that they're really easy to use. I'll show here a simple example that will extend the FitNesse acceptance test suite for the Blog Management application.

In the FitNesse tests I put together in part 1 I used only ColumnFixture tables. One example is BloggerFixtures.GetEntryTitleContent, which takes entry_index as an argument and returns the title and the content of that entry. To use a database analogy, that particular ColumnFixture behaves like a SQL query of the form:

SELECT title, content FROM entries WHERE entry_index=N

In the FitNesse tests I wrote, I used GetEntryTitleContent in a table like this:

|1|Entry #3 Title|Entry #3 Content|
|2|Entry #2 Title|Entry #2 Content|
|3|Entry #1 Title|Entry #1 Content|

We can look at each row in this table as being the result of running the SQL query above, with entry_index set to 1, 2, and 3.

However, we really are interested in verifying ALL entries in the blog at the same time. It would be useful to have a fixture similar to a SQL query such as:

SELECT * FROM entries

This is exactly what a RowFixture achieves: it can be thought of as returning all the "rows" in the "database". In our case, we need to return all the entries in the blog, specifically their index, title, and content, so we need something like:

SELECT entry_index, title, content FROM entries

To achieve this, we need to write a class derived from RowFixture. Our class needs to define two methods:
  • getTargetClass
  • query
In PyFIT, the query method needs to return a list of objects, each object being an instance of the class returned by getTargetClass. Each object is a representation of a row in the RowFixture table. In our example, the target class needs to expose 3 class variables: entry_index, title and content.

Here is the ListAllEntries fixture I wrote:

from fit.RowFixture import RowFixture
import sys
blogger_path = "C:\\eclipse\\workspace\\blogger"
# Make the Blogger module importable
sys.path.append(blogger_path)
import Blogger

class BlogEntry:
    _typeDict={
        "entry_index": "Int",
        "title": "String",
        "content": "String",
    }
    entry_index = 0
    title = ""
    content = ""

class ListAllEntries(RowFixture):

    def getTargetClass(self):
        return BlogEntry

    def query(self):
        blogger = Blogger.get_blog()
        num_entries = blogger.get_num_entries()
        entry_list = []
        for i in range(num_entries):
            blog_entry = BlogEntry()
            blog_entry.entry_index = i+1
            blog_entry.title = blogger.get_nth_entry_title(i+1)
            blog_entry.content = blogger.get_nth_entry_content_strip_html(i+1)
            entry_list.append(blog_entry)
        return entry_list

The class derived from RowFixture is ListAllEntries, which defines the 2 methods I mentioned:
  • getTargetClass() returns the class BlogEntry
  • query() builds a list of BlogEntry objects by retrieving entry_index, title and content for each entry in the blog and assigning their values to the corresponding variables of BlogEntry
The BlogEntry class will be inspected by the FitNesse framework, so it needs to define the _typeDict TypeAdapter with the 3 variables listed in the FitNesse table that uses the ListAllEntries fixture. Speaking of this table, here is an example of how I used it:

|1|Entry #4 Title|Entry #4 Content|
|2|Entry #3 Title|Entry #3 Content|
|3|Entry #2 Title|Entry #2 Content|
|4|Entry #1 Title|Entry #1 Content|

The above fragment is from a new test page I created in the BlogMgmtSuite acceptance test suite. You can see the full contents of this page here: PostDelete4EntriesRowFixture.

Using a RowFixture has another big advantage over using a ColumnFixture: a RowFixture will retrieve all the entries in the table and will let you know if you have any extra rows or if you are missing any rows. Here is an example:

Assume the blog has 4 entries. Assume we use a ColumnFixture like this:

|1|Entry #4 Title|Entry #4 Content|
|2|Entry #3 Title|Entry #3 Content|
|3|Entry #2 Title|Entry #2 Content|

In this case, all the rows will be colored green and the test will pass, since the ColumnFixture will retrieve one by one the three entries we specified.

Now assume we use a RowFixture like this:

|1|Entry #4 Title|Entry #4 Content|
|2|Entry #3 Title|Entry #3 Content|
|3|Entry #2 Title|Entry #2 Content|

In this case, the test will fail, since the RowFixture will retrieve the fourth entry too and will let us know that we missed it in our table. FitNesse will show a fourth row colored in red, with the following text:

|4 surplus|Entry #1 Title|Entry #1 Content|
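The missing/surplus bookkeeping can be pictured with a few lines of plain Python. This is only a sketch of the idea, assuming rows are matched on their first column; it is not how PyFIT implements RowFixture, and compare_rows is a hypothetical helper.

```python
def compare_rows(expected, actual):
    """Compare expected table rows with the rows returned by a query.

    Rows are tuples keyed by their first element (the entry index here).
    Returns (right, wrong, missing, surplus) lists.
    """
    actual_by_key = {row[0]: row for row in actual}
    right, wrong, missing = [], [], []
    for row in expected:
        found = actual_by_key.pop(row[0], None)
        if found is None:
            missing.append(row)           # in the table, not in the system
        elif found == row:
            right.append(row)
        else:
            wrong.append((row, found))
    surplus = sorted(actual_by_key.values())  # in the system, not in the table
    return right, wrong, missing, surplus


# The blog has 4 entries, but the table lists only the first 3.
blog = [
    (1, "Entry #4 Title", "Entry #4 Content"),
    (2, "Entry #3 Title", "Entry #3 Content"),
    (3, "Entry #2 Title", "Entry #2 Content"),
    (4, "Entry #1 Title", "Entry #1 Content"),
]
table = blog[:3]
right, wrong, missing, surplus = compare_rows(table, blog)
print(len(right), "right;", "surplus:", surplus)
```

The fourth entry ends up in the surplus list, which corresponds to the red "surplus" row FitNesse displays.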

  • a RowFixture is the ideal vehicle to use when you need to verify that all the data you entered into the system under test so far is there -- no more and no less
  • using RowFixtures proved to be easy, once I visualized their similarity to SQL "select * from" queries.

Friday, December 17, 2004


Some people whose blogs I'm reading (Tim Bacon, Laurent Bossavit, Alan Francis) decided to "gridblog" today about fatherhood and how it may relate to software development. More specifically, the topic is: "what do you wish someone had told you".

Here's a comment I made to Laurent Bossavit, who wrote about the strong emotions he experienced in his role as a father:

Laurent, in my experience as a father of 2, the strong emotions you mention were one of the best things that ever happened to me. I am one of the intellectual types you describe, and I always felt it hard to express my feelings. Having kids made me relive many moments from my past and opened many wounds and old feelings I had repressed. Instead of making me a stranger to myself, being a father made me know my true forgotten/suppressed self much better. I am still struggling with how to be a good father -- it will probably be a life-long struggle, but it's a fight worth fighting. Many people tend to avoid struggles like this (and of course trying to be a good husband is another struggle), they tend to live in a bubble that isolates them from painful frictions but also from discovering who they really are. If you avoid pain, you can't know joy.

This doesn't really qualify as advice to give to a prospective father, and has nothing really to do with software development. But I do have this advice to give to software developers who sometimes tend to live in the bubble I described above, and who dedicate their life to their work (I know, I was a workaholic too): become a father! The unavoidable struggles that will ensue: the sleepless nights; the diaper-changing; the interferences from your own parents; walking the fine line between wanting to help your kids and letting them struggle on their own; walking the fine line between being too authoritarian or too permissive; realizing that your kids are their own persons and not simply extensions of yourself; realizing that kids need to be allowed to express their feelings in all situations, even if this contradicts everything you were taught and shown in your own childhood; realizing that maybe you need to confront your own parents with painful memories and emotions --- all these things will polish your soul and your heart and will help you unveil your TRUE you.

As far as software development is concerned, I think the royal way to getting in touch with your creativity is to become child-like --- and what better way to achieve that than to become a father. It is the child within us who is really the source of our energy and enthusiasm.

Thursday, December 16, 2004

STAF/STAX tutorial

Automated test distribution, execution and reporting with STAF/STAX

Assume you are part of a test team whose goal is to automate the distribution of tests to a large set of clients running on various platforms. You want to run an automated 'smoke test' in the following scenario:

  • A nightly build process sends out email notification that a new version of the software is ready to be tested.
  • The notification email triggers a 'Start Smoke Test' request sent to a dedicated machine (I will call it the "test management" machine), which coordinates all clients to be tested
  • The test management machine somehow tells all clients that version x.y.z of the software is available, then tells all clients to run a test harness and report back the results
  • After getting back the test results from all the clients, the test management machine sends out a test summary email containing the overall, failed, and successful test case count

You could try to implement this functionality yourself by writing for example a simple XML-RPC agent that runs on every client and accepts commands from the test management machine, but you soon realize that you need something more robust, something that has already been proven in large test environments.

I will show you how to use the STAF/STAX framework from IBM, which offers all the features listed in the smoke-test scenario just described.

The idea behind STAF is to run a very simple agent on all the machines that participate in the STAF testbed. Every machine can then run services on any other machine, subject to a so-called trust level. In practice, one machine will act as what I called the 'test management' machine, and will coordinate the test runs by sending jobs to the test clients. STAX is one of the services offered on top of the low-level STAF plumbing. It greatly facilitates the distribution of jobs to the test clients and the collection and logging of test results. STAX jobs are XML files spiced up with special <script> tags that contain Python code (actually Jython, but there are no differences for the purpose of this tutorial). This in itself was for us a major reason for choosing STAF over other solutions.

Here is the test environment that I will use in my example:

  • 3 clients that will run the test harness: one called win1 running some flavor of Windows, one called linux1 running some flavor of Linux, and one called sol1 running some flavor of Solaris
  • 1 test management machine, called mgmt1
  • 1 desktop PC, called desktop1

What follows is a step-by-step guide to configuring STAF and STAX on the machines in the example testbed:

Step 1: Install and configure STAF on the test clients

Install STAF on all 5 machines (I refer the readers to the STAF User Guide for details on installing STAF). Here is an example of a STAF configuration file (on Unix, it's usually in /usr/local/staf/bin/STAF.cfg) for one of the 3 client machines:

# Enable TCP/IP connections
interface tcpip

# Turn on tracing of internal errors and deprecated options
trace on error deprecated

serviceloader library STAFDSLS



Note that the 3 client machines need to increase the trust level (default is 3) for the test management machine, so that the latter can initiate jobs on the clients.

Step 2: Install and configure STAX on the management host

Install the STAX service on the test management machine. In STAF parlance, this machine is called the STAX Service machine (readers are referred to the STAX User's Guide for details on STAX). There are a few things to remember in terms of requirements for this machine:
    • Java 1.2 or later needs to be installed
    • The following 2 variables need to be set (for example in .bash_profile):
export CLASSPATH=$CLASSPATH:/usr/local/staf/lib/JSTAF.jar
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/staf/lib
The STAF.cfg configuration file needs to have the STAX service added to it (note the increase to trust level 4 for the desktop1 machine, which will act as the monitoring machine and needs special rights to connect to mgmt1):

# Enable TCP/IP connections
interface tcpip

# Turn on tracing of internal errors and deprecated options
trace on error deprecated

serviceloader library STAFDSLS




Step 3: Start the STAF agent

Run STAFProc on all 5 machines. STAFProc is the STAF agent that listens on a specific port (default is 6500) for STAF-specific commands.

Step 4: Create STAX job files

Create the STAX XML job files that will be interpreted by the STAX service on mgmt1. Here is an example of a job file, called client_test_harness.xml, that will run a test harness on our 3 clients:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "C:\QA\STAF\stax.dtd">
<stax>
  <!--
    The following <script> element is overridden if the SCRIPTFILE is used.
    A SCRIPTFILE can be specified either in the STAX Monitor, or directly
    when submitting a job to STAX.
  -->
  <script>
    VERSION = '1.0.1'

    clients_os = { 'win1':'win',
                   'linux1':'unix',
                   'sol1':'unix' }
    harness_path = {
      'unix': '/qa/harness',
      'win' : 'C:/qa/harness'
    }
    tests_unix = [
      [ 'unix_perms', '' ],
      [ 'long_names', '' ]]
    tests_win = [
      [ 'unicode_names', '' ]]
  </script>

  <defaultcall function="Main"/>

  <function name="Main">
    <sequence>
      <import machine="'mgmt1'" file="'/QA/STAF/stax_jobs/log_result.xml'"/>
      <call function="'ClientTestHarness'">
        [clients_os, harness_path, tests_unix, tests_win]
      </call>
    </sequence>
  </function>

  <function name="ClientTestHarness">
    <function-list-args>
      <function-required-arg name='clients_os'/>
      <function-required-arg name='harness_path'/>
      <function-required-arg name='tests_unix'/>
      <function-required-arg name='tests_win'/>
      <function-other-args name='args'/>
    </function-list-args>
    <paralleliterate var="machine" in="clients_os.keys()">
      <sequence>
        <script>
          os_type = clients_os[machine]
          tests = {}
          if os_type == 'unix':
            tests = tests_unix
          if os_type == 'win':
            tests = tests_win
        </script>
        <iterate var="test" in="tests">
          <sequence>
            <script>test_name = machine + "_" + test[0]</script>
            <testcase name="test_name">
              <sequence>
                <script>cmdline = harness_path[os_type] + "/" + test[1] </script>
                <timer duration = "HARNESS_TIMER_DURATION">
                  <process>
                    <location>machine</location>
                    <command>cmdline</command>
                    <stderr mode="'stdout'" />
                    <returnstdout />
                  </process>
                </timer>
                <call function="'LogResult'">machine</call>
              </sequence>
            </testcase>
          </sequence>
        </iterate>
      </sequence>
    </paralleliterate>
  </function>
</stax>

The syntax may seem overwhelming at first, but it turns out to be quite manageable once you get the hang of it. Here are the salient points in the above file:
  • The first <script> element sets a number of Python variables which are then used in the body of the XML document; think of them as global constants
  • There is one function called in the <defaultcall> element; this function is called Main and is defined in the first <function> element
  • The Main function imports another XML file (log_result.xml) in order for this job to be able to call a function (LogResult) defined in the imported file
  • The Main function then calls a function called ClientTestHarness, passing it as arguments four Python variables defined at the top
  • Almost all the action in this job happens in the ClientTestHarness function, which starts by declaring its required arguments, then proceeds by running a series of tests in parallel on each of our 3 client machines; the parallelism is achieved by means of the <paralleliterate> element
  • The <script> element that follows is simple Python code that retrieves the test suite to be run from the global dictionaries, via the machine name
  • On each machine, the tests in the test suite are executed sequentially, via the <iterate> element
  • A <testcase> element is defined for each test, so that we can easily retrieve the test statistics at the end of the run, via the LogResult function
  • For each test, the ClientTestHarness function executes a <process> element, which runs a command on the target machine; the <process> element is surrounded by a <timer> element which will mark the test as failed if the specified time interval reaches its limit
  • The <process> element also specifies that the command to be executed redirect stderr to stdout, and return stdout
  • Finally, the ClientTestHarness function calls LogResult, passing it the machine name as the only argument
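For readers more at home in Python than in STAX XML, the control flow described in the points above (clients in parallel, tests on each client in sequence) can be mimicked with threads. This is only a simulation of the flow, not STAX itself: run_test and run_harness are hypothetical, no process is launched, and the 'unix' entries for linux1 and sol1 are my assumption based on the testbed description.

```python
import threading

# Assumed mapping of machine name to OS type (see the testbed description).
clients_os = {"win1": "win", "linux1": "unix", "sol1": "unix"}
tests_by_os = {
    "unix": ["unix_perms", "long_names"],
    "win": ["unicode_names"],
}


def run_test(machine, test):
    """Placeholder for launching one harness test on `machine`."""
    return "%s_%s" % (machine, test)


def run_harness(clients, tests):
    """Run each client's tests sequentially, with all clients in parallel."""
    results = {}
    lock = threading.Lock()

    def run_client(machine):
        # Sequential part: one test after another on the same machine.
        names = [run_test(machine, t) for t in tests[clients[machine]]]
        with lock:
            results[machine] = names

    # Parallel part: one thread per client machine.
    threads = [threading.Thread(target=run_client, args=(m,)) for m in clients]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_harness(clients_os, tests_by_os))
```

The names built by run_test follow the same machine + "_" + test pattern the job uses for its test case names.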

The LogResult function is defined in the log_result.xml file. Its tasks are to:
  • interpret the return code (which is a STAF-specific variable called RC) and the output (which is a STAX-specific variable called STAXResult) for each test case
  • set the result of the test run to PASS or FAIL
  • log it accordingly

Here is the log_result.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "C:\QA\STAF\stax.dtd">
<stax>
<function name="LogResult">
  <function-list-args>
    <function-required-arg name='machine'/>
    <function-other-args name='args'/>
  </function-list-args>
  <if expr="RC != 0">
    <sequence>
      <tcstatus result="'fail'">'Failed with RC=%s' % RC</tcstatus>
      <log level="'error'">'Process failed with RC=%s, Result=%s' % (RC, STAFResult)</log>
    </sequence>
  <elseif expr="STAXResult != None">
    <iterate var="file_info" in="STAXResult" indexvar="i">
      <if expr="file_info[0] == 0">
        <sequence>
          <script>
            import re
            fail = re.search('FAIL', file_info[1])
            log_msg = 'HOST:%s\n\n%s' % (machine,file_info[1])
          </script>
          <if expr = "fail">
            <sequence>
              <tcstatus result="'fail'">'Test output contains FAIL'</tcstatus>
              <log level="'error'">log_msg</log>
            </sequence>
          <else><sequence>
              <tcstatus result="'pass'"></tcstatus>
              <log level="'info'">log_msg</log>
          </sequence></else>
          </if>
        </sequence>
      <else><log level="'error'">'Retrieval of file %s contents failed with RC=%s' % (i, file_info[0])</log></else>
      </if>
    </iterate>
  </elseif>
  <else><log level="'info'">'STAXResult is None'</log></else>
  </if>
</function>
</stax>
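The decision logic of LogResult is easier to follow when written as an ordinary Python function. The sketch below is a paraphrase under assumptions, not code that STAX runs: log_result is a hypothetical name, and I assume the result is either None or a list of [return code, file contents] pairs, as the file_info[0] and file_info[1] references suggest.

```python
import re


def log_result(rc, stax_result):
    """Return a list of (status, message) pairs mirroring the LogResult logic.

    `rc` is the process return code; `stax_result` is None or a list of
    [file_rc, file_contents] pairs for the returned output.
    """
    if rc != 0:
        return [("fail", "Failed with RC=%s" % rc)]
    if stax_result is None:
        return [("info", "STAXResult is None")]
    outcomes = []
    for i, (file_rc, contents) in enumerate(stax_result):
        if file_rc != 0:
            # The output could not be retrieved at all.
            outcomes.append(
                ("error",
                 "Retrieval of file %s contents failed with RC=%s" % (i, file_rc)))
        elif re.search("FAIL", contents):
            outcomes.append(("fail", "Test output contains FAIL"))
        else:
            outcomes.append(("pass", contents))
    return outcomes
```

A test is therefore marked as passed only when the process returned 0, its output was retrieved, and the output does not contain the string FAIL.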

Step 5: Run STAX jobs on the test clients

From the desktop1 machine, which in STAX is called the monitoring machine, send a carefully crafted STAF command to the test management machine, telling it to run the client_test_harness.xml job:


The above incantation runs a STAF command by specifying a service (STAX) and a request (EXECUTE), then passing various arguments to the request, the most common ones being a FILE (the path to the job XML file), a MACHINE to run the job file on (mgmt1), and a JOBNAME (which can be any string value).

Two other arguments, entirely optional, are Python-specific:

  • SCRIPTFILE -- points to a Python file whose code will be interpreted after the code in the top-level <script> element of the job file; in my example, the file contains definitions of Python variables that will override the variables defined in the job's <script> element
  • SCRIPT -- can contain any inline Python code, which will be interpreted after any code in the job's top-level <script> element, and after any code in the SCRIPTFILE; in my example, the VERSION variable is set to 1.0.2 on the command line via the SCRIPT argument, because it is retrieved from the nightly build email notification, and thus is not known in advance. The value 1.0.2 will override whatever values are given in the <script> element and in the SCRIPTFILE

To summarize, a SCRIPTFILE file is commonly used as a "static" repository for Python variables that are used across several job files, whereas the SCRIPT inline code is used to pass "dynamic" values for Python variables on the command line.
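Since a script typically submits the job, it can help to assemble the request from its parts. The sketch below only builds and prints the argument list; it does not run STAF. The helper name build_stax_execute, the job file location, the job name and the script file name are all hypothetical, and the exact quoting rules for option values should be checked against the STAF User Guide.

```python
def build_stax_execute(service_machine, job_file, job_name,
                       script_file=None, script=None):
    """Assemble the argument list for submitting a job to the STAX service."""
    argv = ["STAF", service_machine, "STAX", "EXECUTE",
            "FILE", job_file,
            "MACHINE", service_machine,
            "JOBNAME", job_name]
    if script_file is not None:
        argv += ["SCRIPTFILE", script_file]   # "static" shared variables
    if script is not None:
        argv += ["SCRIPT", script]            # "dynamic" per-run values
    return argv


argv = build_stax_execute(
    "mgmt1",
    "/QA/STAF/stax_jobs/client_test_harness.xml",     # hypothetical location
    "client_test_harness",                             # hypothetical job name
    script_file="/QA/STAF/stax_jobs/global_vars.py",   # hypothetical file
    script="VERSION='1.0.2'",
)
print(" ".join(argv))
```

Keeping SCRIPTFILE and SCRIPT optional reflects the fact that both arguments can be left out of the request.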

The above STAF command, if successful, returns an integer that represents the job ID. Based on this ID, we can query the log service on the STAX machine (mgmt1) by running this command:


STAX also offers a GUI monitoring tool called the STAX Job Monitor that is usually run on the monitoring machine (desktop1 in our example). The tool is a Java application that is started via the command line (java -jar STAXMon.jar) in the directory which contains the STAX service jar files. The Job Monitor displays the processes that are run within the job, as well as the test case information (test name, pass/fail status, duration) for each test in the test suite.


I will now show how all these steps fit together and give us the capability to run the automated smoke-test scenario I described in the beginning of this section.

  • A build completion message is sent to several distribution lists with a subject that contains the new version of the software
  • The build message is forwarded via a mail alias to an account on the test management machine
  • A .procmailrc file on the test management machine triggers a Python script that runs the "STAF mgmt1 STAX EXECUTE" command shown in the "Run STAX jobs on the test clients" step. The script then sits in a loop and periodically queries the log file (via the LOG QUERY command) for the new job identified by jobID. When it sees a line containing "Stop|JobID: jobID", the script sends an email message with the job log in its body and the test count (overall, pass and fail) in its subject
  • The PARALLELITERATE and ITERATE constructs available in STAX allow us to achieve both parallel and sequential operations for the test run: we run the test harness in parallel on all clients, then on each client we run the individual tests comprising the harness in a sequential order. Another very useful STAX construct is TIMER, which makes it very easy to time out the failed tests so that the whole test run is not held up
  • Since all the individual tests are written using our framework, all the test results are also saved in the Firebird database and can be easily inspected via a Web interface
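The polling loop mentioned in the list above could look roughly like the following. It is a sketch under assumptions, not the script I actually run: wait_for_job_stop is a hypothetical name, the log is obtained through a caller-supplied query_log function (which in practice would wrap the LOG QUERY request), and the exact layout of the log lines is assumed.

```python
import time


def wait_for_job_stop(query_log, job_id, poll_interval_sec=30, max_polls=120,
                      sleep=time.sleep):
    """Poll the job log until the stop marker for `job_id` shows up.

    `query_log` is a function returning the log text collected so far.
    Returns the log text, or None if the marker never appeared within
    `max_polls` attempts.
    """
    marker = "Stop|JobID: %s" % job_id
    for _ in range(max_polls):
        log_text = query_log()
        if any(marker in line for line in log_text.splitlines()):
            return log_text
        sleep(poll_interval_sec)
    return None
```

Capping the number of polls keeps the script from waiting forever if the job never logs its stop line.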
Two more things are worth mentioning:
  • Support for STAF/STAX is top-notch and comes via the staf-users mailing list from the IBM developers working on this project. I had two questions answered within an hour of posting each of them.
  • STAF/STAX is used as the test distribution platform for the Linux Test Project. The January 2005 issue of "Linux Journal" has an article on the Linux Test Project that mentions STAF/STAX.

Wednesday, December 15, 2004

PyFIT tutorial on

I turned the PyFIT tutorial blog entry into a standalone HTML page that you can find here. I also updated the PytonFit page over at to point to the same HTML page. I need to figure out how to add raw HTML to the wiki at

Wednesday, November 24, 2004

Writing FitNesse tests in Python

Last week I had the chance to participate, together with other xpsocal members, in a seminar on FitNesse given by Micah Martin from Object Mentor. Micah is one of the creators of FitNesse and also the primary maintainer of the distribution. Instead of a slide-show, he actually fired up FitNesse on his Mac OS X laptop and we worked together on writing some acceptance tests and fixtures. It was a great presentation and it clarified many aspects of FitNesse that can be hard to understand by simply looking at the documentation. Micah's examples used Java, so as homework I thought I would try to use Python instead. I knew there was a Python port of the FIT framework (appropriately called PyFIT, written by John Roth) that tries to stay as close as possible to the newest releases of FitNesse. In what follows I will show how I used PyFIT to write FitNesse acceptance tests and test fixtures. I installed FitNesse locally, on a Windows box, but the examples should work just as well in any environment supported by FitNesse.

Install FitNesse

- Download the full distribution of FitNesse from its site's download page and unzip it. In the following discussion, I will assume it was unzipped in C:\fitnesse

- Start FitNesse by running the run.bat script in C:\fitnesse. By default, FitNesse runs a Web server on port 80, but you can specify a different port, say 8080, like this:

run.bat -p 8080

- Go to http://localhost in your browser and look around to get a feel of what FitNesse looks like (it's a Wiki) -- the User Guide is especially helpful
Install PyFIT

- Download the latest PyFIT distribution (currently 0.6a1) from the Files section of the FitNesse Yahoo group. You need to become a member of the group in order to be able to download the file

- Unzip the PyFIT distribution (I unzipped it in C:\Python23\PyFIT-0.6a1), then install it by running:

python setup.py install

- Go to the FitNesse FrontPage and edit it by clicking on the blue Edit button on the left-hand side of the page. Add a WikiWord somewhere in the page (for example PythonTest) and click Save. The FrontPage should now show PythonTest followed by a ? character, which means it does not yet have a page called PythonTest

- To create content for the PythonTest page, click on the ? link, then enter the following lines and click Save:

!define COMMAND_PATTERN {python "%m" %p}
!define TEST_RUNNER {C:\Python23\PyFIT-0.6a1\fit\}

- The COMMAND_PATTERN variable tells FitNesse to start python instead of java when running test fixtures (NOTE: there has been a recent post on the fitnesse mailing list about the quotes surrounding %m causing problems on Linux. It looks like Linux users should use %m instead of "%m")
- The TEST_RUNNER variable tells FitNesse to replace the default fit.FitServer Java test runner with its Python equivalent. If you unzipped PyFIT somewhere else, you need to set TEST_RUNNER to the full path to the Python test runner

- The | | notation is a special Wiki convention for describing tables:
  • the cell in the first row tells FitNesse to run a fixture called eg.Division
  • the cells in the second row are the names of variables to be set or retrieved by FitNesse
  • a ? mark next to a variable name means that FitNesse will retrieve the value for that variable and will compare it against the values entered by the user
  • the next 2 rows are examples of input (10 and 5 in the second row, 10 and 2 in the third row) and expected output (2 in the second row and 5.001 in the third row)

- Make the PythonTest page a test page by clicking on the blue Properties button, then clicking the Test checkbox, then clicking on the Save button on the Properties page

- Run the test by clicking on the Test button, which should have appeared above the Edit button. If everything goes well, the row containing 10, 5 and 2 should be colored green, and the next row should be colored red, with 2 cells for quotient: one containing "5.001 expected" and the other containing "5.0 actual". Of course, the correct answer is 5, but I put 5.001 just so we can see what a failed test looks like. Also, on top of the table, FitNesse prints a summary of the test run. In this case, it is:

Assertions: 1 right, 1 wrong, 0 ignored, 0 exceptions
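To make the green/red outcome concrete, here is a minimal sketch of the comparison a column-style table implies. It is not PyFIT code: the run_rows helper and the exact equality check are assumptions made for illustration (the real fixture also declares a quotient.charBounds setting that this sketch ignores).

```python
class Division:
    """Plain-Python stand-in for the eg.Division fixture (no PyFIT needed)."""
    numerator = 0.0
    denominator = 0.0

    def quotient(self):
        return self.numerator / self.denominator


# The two data rows described above: numerator, denominator, expected quotient
rows = [(10.0, 5.0, 2.0), (10.0, 2.0, 5.001)]


def run_rows(rows):
    """Hypothetical helper: set the inputs, call the method, compare."""
    right = wrong = 0
    for numerator, denominator, expected in rows:
        fixture = Division()
        fixture.numerator = numerator
        fixture.denominator = denominator
        actual = fixture.quotient()
        if actual == expected:
            right += 1
        else:
            wrong += 1
            print(f"{expected} expected, {actual} actual")
    return right, wrong


right, wrong = run_rows(rows)
print(f"Assertions: {right} right, {wrong} wrong")
```

The first row passes and the second fails, which matches the 1 right, 1 wrong summary reported by FitNesse.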
- One question you may have at this point is how FitNesse found the eg.Division fixture. If you installed everything like I did, you will see a directory called eg under C:\Python23\PyFIT-0.6a1\fit, containing the file that defines the Division fixture. That directory also contains an important empty file which tells Python to treat the directory as a package, thus making it possible to invoke the test runner with the argument eg.Division
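The lookup can be pictured with a short sketch that resolves a dotted name such as package.Module to a class. This is only an illustration using the standard importlib module; PyFIT's real lookup may work differently. The package name demo_eg and the load_fixture helper are hypothetical, and the sketch writes the conventional empty __init__.py marker file into a temporary directory so that it runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_fixture(dotted_name):
    """Hypothetical helper: import 'pkg.Module' and return the class named Module."""
    module = importlib.import_module(dotted_name)
    class_name = dotted_name.rsplit(".", 1)[-1]
    return getattr(module, class_name)


# Build a throwaway package on disk so the example is self-contained
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_eg"                  # hypothetical package name
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")    # empty package marker file
(package_dir / "Division.py").write_text(
    "class Division:\n"
    "    numerator = 0.0\n"
    "    denominator = 0.0\n"
    "    def quotient(self):\n"
    "        return self.numerator / self.denominator\n"
)

# Make the parent directory importable, then resolve the dotted name
sys.path.insert(0, str(root))
importlib.invalidate_caches()

Division = load_fixture("demo_eg.Division")
fixture = Division()
fixture.numerator, fixture.denominator = 10.0, 5.0
print(fixture.quotient())
```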

- It is instructive at this point to look at the file. Here it is:

from fit.ColumnFixture import ColumnFixture

class Division(ColumnFixture):
    # Type metadata used to convert table cells to and from Python values
    _typeDict = {
        "numerator": "Float",
        "denominator": "Float",
        "quotient": "Float",
        "quotient.charBounds": "99",
    }
    numerator = 0.0
    denominator = 0.0

    def quotient(self):
        return self.numerator / self.denominator

- The Division class is derived from the fit.ColumnFixture class. A ColumnFixture is the most common type of fixture and is most useful when you need a way to specify inputs for the acceptance test, then verify outputs.

- A PyFIT-specific caveat is that you need to have the _typeDict dictionary in any fixture you write (actually anywhere you need to use a TypeAdapter). Here is what the PyFIT documentation (PyFIT-0.6a1/fit/Doc/FIT_TypeAdapters.htm) has to say about Type Adapters:

"FIT, as distributed, requires the use of a Type Adapter to convert the text format used in the tables to and from the actual data type needed by the various fields, methods and properties in the fixture. This practice came from the Java version, where manifest typing makes it easy to find the expected data type by reflection. Since Python does not have manifest typing, there is no way that the reflection capability can determine the proper type. Type information must be provided another way. Since types need to be declared separately, TypeAdapter contains a more general metadata mechanism. This consists of a dictionary named _typeDict that must be located in the class whose fields, methods or properties are to be referenced. It's also possible to pass a metadata dictionary to the type adapter factory function; this is useful for unusual requirements."

In the Division fixture, the _typeDict dictionary specifies that the numerator and denominator variables, as well as the return value of the quotient method, are of type Float.
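As a rough illustration of why the type names are needed, the sketch below converts cell text into Python values by looking up the declared type. The CONVERTERS table and the convert_cell helper are assumptions made for this example; they are not part of PyFIT, whose TypeAdapter machinery is more general.

```python
# Hypothetical mapping from the type names seen in _typeDict to converters
CONVERTERS = {
    "Float": float,
    "Int": int,
    "String": str,
    "Boolean": lambda text: text.strip().lower() == "true",
}


class Division:
    """Stand-in fixture carrying only the type metadata."""
    _typeDict = {
        "numerator": "Float",
        "denominator": "Float",
        "quotient": "Float",
    }


def convert_cell(fixture_class, name, text):
    """Turn the text of a table cell into a value of the declared type."""
    type_name = fixture_class._typeDict[name]
    return CONVERTERS[type_name](text)


value = convert_cell(Division, "numerator", "10")
print(type(value).__name__, value)
```

Without the dictionary, the cell text "10" could just as well be an integer or a string; the declared type is what removes the ambiguity.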

Write the Python application that you want to test with FitNesse

- For this tutorial, I wrote a simple Blog Management application, based on the Universal Feed Parser Python module written by Mark Pilgrim of Dive Into Python fame

- To follow along with the example, you need to register with Blogger and create a test blog (I called mine fitnessetesting). The most important parameters to remember are your user name, your password and your blog's Atom Feed URL (which in my case is

- You need to install the feedparser Python module. You can download it from the project's SourceForge page. Unzip it, then run the following:

python install

- I implemented the blog management functionality in a class called Blogger. You can get the source code from here:

- If you created your own blog at Blogger, you need to assign the appropriate values to the BlogParams class variables FEED_URL, USER and PASSWORD

- The main functionality of the Blogger class is exported via the following methods:
  • post_new_entry: posts a new entry to the blog, takes a title and a content as parameters
  • get_nth_entry_title: returns the title of the nth entry (note that the entries are ordered most-recent first, so the entry entered last will appear first in the blog)
  • get_nth_entry_content: returns the HTML content of the nth entry
  • get_nth_entry_content_strip_html: strips HTML tags from the content (partially implemented so far)
  • delete_nth_entry: deletes the nth entry from the blog
  • delete_all_entries: self-explanatory
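To make the ordering rule concrete, here is a small in-memory stand-in that mimics a few of the method names listed above. It is not the author's Blogger class: the InMemoryBlog name, the list storage and the 1-based entry index are assumptions made for illustration (the get_num_entries method is the one the fixtures call later on).

```python
class InMemoryBlog:
    """Hypothetical stand-in that keeps entries in a list, newest first."""

    def __init__(self):
        self._entries = []  # list of (title, content) tuples

    def post_new_entry(self, title, content):
        # Newest entry goes to the front, like on the blog page
        self._entries.insert(0, (title, content))
        return True

    def get_num_entries(self):
        return len(self._entries)

    def get_nth_entry_title(self, n):
        # Assumes a 1-based index: n == 1 is the most recent entry
        return self._entries[n - 1][0]

    def delete_nth_entry(self, n):
        if not 1 <= n <= len(self._entries):
            return False
        del self._entries[n - 1]
        return True

    def delete_all_entries(self):
        self._entries.clear()


blog = InMemoryBlog()
blog.post_new_entry("First", "posted first")
blog.post_new_entry("Second", "posted second")
print(blog.get_nth_entry_title(1))  # the entry posted last comes first
```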
- The methods for posting and deleting entries use the Blogger API; in order to craft the appropriate XML-RPC parameters, I used code from the Python Atom API examples available at

- Consumers of the Blogger class get an instance of a Blogger object via a get_blog() function that returns a global variable defined at the Blogger module level

- I also wrote a unit test suite for my Blogger class. You can get the source code from here:

- After making sure that all the unit tests pass, we're ready to tackle the FitNesse integration

Write a FitNesse acceptance test suite for the Blogger application

- Now let's go back to the FitNesse FrontPage (http://localhost). We will create a FitNesse test suite page called BlogMgmtSuite. You need to edit the FrontPage and add the BlogMgmtSuite WikiWord. Save, then click on the ? link next to BlogMgmtSuite in order to edit that page. For now, enter the following content and save it:

!define COMMAND_PATTERN {python "%m" %p}
!define TEST_RUNNER {C:\Python23\PyFIT-0.6a1\fit\}
!path C:\eclipse\workspace\blogger
!2 ''Blog Management acceptance test suite''
|^DeleteAllEntries|''Delete all blog entries''|

- The first 2 lines are similar to the ones in our PythonTest example. The third line tells FitNesse to look for our fixtures in the C:\eclipse\workspace\blogger directory. The line starting with !2 will be interpreted as a header. The ^ character before DeleteAllEntries is a special FitNesse convention signifying that the DeleteAllEntries page is a "child" page of the current BlogMgmtSuite page

- We need to tell FitNesse that this page is a suite; to do this, click on the Properties button of the BlogMgmtSuite page, then click on the Suite checkbox and save

- You will see a question mark next to DeleteAllEntries. This is of course because that page does not exist. Let's create it: click on the ? link, then enter the following content and save it:

!3 We test deleting all entries from the blog

We delete all blog entries and we verify that we have 0 entries.


- If you look at the URL of this page, you will note that it appears as http://localhost/BlogMgmtSuite.DeleteAllEntries -- this is because FitNesse made it a "child" page of the BlogMgmtSuite page

- We also need to make the DeleteAllEntries page a test page, by going to its Properties and clicking on the Test checkbox

- Now we are ready to run the DeleteAllEntries acceptance test by clicking on the Test button of that page. The first 2 rows of the table should be colored yellow and you should see the following text:

Fixture 'BloggerFixtures.DeleteAllEntries' not found

- Also, if you click on the Output Captured link in the upper right corner, you should see this (on a single line):

template: 'Fixture '%s' not found' args: '('FixtureNotFound', u'BloggerFixtures.DeleteAllEntries')'

- Of course, we did not write the DeleteAllEntries fixture yet. Let's first create a directory called BloggerFixtures under C:\eclipse\workspace\blogger (if you create this directory somewhere else, you need to replace the !path variable in the BlogMgmtSuite page with the parent directory of BloggerFixtures). We also need to create the same kind of empty file in the BloggerFixtures directory that we saw in the eg directory; otherwise the directory will not be treated as a package and BloggerFixtures.DeleteAllEntries cannot be resolved

- Now let's write the actual DeleteAllEntries fixture. It resembles the Division fixture discussed previously as an example. Here it is, in its entirety:

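from fit.ColumnFixture import ColumnFixture
import sys
blogger_path = "C:\\eclipse\\workspace\\blogger"
# Make the Blogger module importable (replace blogger_path with your own path)
sys.path.append(blogger_path)
import Blogger

class DeleteAllEntries(ColumnFixture):
    _typeDict = {
        "num_entries": "Int",
    }
    blogger = Blogger.get_blog()

    def num_entries(self):
        return self.blogger.get_num_entries()

    def execute(self):
        # Called once per table row, before the row's cells are checked
        self.blogger.delete_all_entries()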

- The DeleteAllEntries class is derived from fit.ColumnFixture. It gets a Blogger object via the get_blog() function call and assigns it to a class variable called blogger. Note that you need to replace blogger_path at the top of the file with the actual path to the directory that contains your Blogger module

- The table we entered in the DeleteAllEntries page contains only one column, called num_entries, followed by a question mark, which means it is a method, as opposed to a variable. We need to implement a method called num_entries in the fixture. In our case, this method will simply call the get_num_entries() method of the blogger object. Note also that we need to define the _typeDict dictionary and declare the type of the value returned by the num_entries method. In our case, that type is an integer

- The real action happens in a special method called execute(), which is inherited from fit.ColumnFixture. This method is called by the FitNesse framework once for every row of the table, before the other elements of that row are processed. In our case, the execute() method simply calls the delete_all_entries() method of the blogger object. Generally speaking, all the FitNesse fixtures you will write will serve as simple 'wiring' to the actual application objects. All they need to do is define the variables and the methods named in the FitNesse table, then return the appropriate values by calling the application object's methods
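The per-row ordering described above can be sketched without PyFIT. In this illustration the FakeBlog class and the process_row helper are hypothetical; they only mimic the sequence "call execute(), then evaluate the methods named in the row", which is the behavior the paragraph describes rather than a copy of the framework's code.

```python
class FakeBlog:
    """Hypothetical in-memory blog with two existing entries."""

    def __init__(self):
        self.entries = ["older entry", "newer entry"]

    def delete_all_entries(self):
        self.entries = []

    def get_num_entries(self):
        return len(self.entries)


class DeleteAllEntries:
    """Plain-Python stand-in for the fixture: thin wiring to the blog object."""
    blogger = FakeBlog()

    def execute(self):
        self.blogger.delete_all_entries()

    def num_entries(self):
        return self.blogger.get_num_entries()


def process_row(fixture, expected_by_method):
    """Run execute() first, then compare each method's result with the cell."""
    fixture.execute()
    results = {}
    for method_name, expected in expected_by_method.items():
        actual = getattr(fixture, method_name)()
        results[method_name] = (actual == expected)
    return results


# One table row: the num_entries? column expects 0
outcome = process_row(DeleteAllEntries(), {"num_entries": 0})
print(outcome)
```

Because execute() runs before num_entries() is evaluated, the row passes even though the blog started with two entries.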

- Let's run the DeleteAllEntries test again by clicking the Test button. This time, the row containing the number 0 should turn green, and the test summary at the top of the page should read:

Assertions: 1 right, 0 wrong, 0 ignored, 0 exceptions

- We wrote our first FitNesse fixture. Time to celebrate by going back to the BlogMgmtSuite page and then clicking on the Suite button. This will run all the tests found on the page -- in our case only DeleteAllEntries. You should see the following summary colored green at the top of the page:

Test Pages: 1 right, 0 wrong, 0 ignored, 0 exceptions Assertions: 1 right, 0 wrong, 0 ignored, 0 exceptions

- The summary is followed by individual test results with links to the tests. In our case, we have:

1 right, 0 wrong, 0 ignored, 0 exceptions DeleteAllEntries

- We made BlogMgmtSuite a test suite (as opposed to a test page) so that we can easily add more test pages to it. Let's add another acceptance test which will test posting and then deleting a blog entry. Edit the BlogMgmtSuite page, add the following line and save:

|^PostDelete1Entry|''Post single blog entry''|

- Now click on the ? link next to PostDelete1Entry, enter the following text, save the page and don't forget to make it a Test page by clicking the Test checkbox in its Properties:

!3 We test posting a single new entry to the blog

First we delete all entries from the blog and we verify that we have 0 entries.


Then we post the new entry and we verify that we have 1 entry.

|BloggerFixtures.PostSingleEntry Title|BloggerFixtures.PostSingleEntry Content|true|1|

We verify that the entry has the title and the content we indicated.

|1|BloggerFixtures.PostSingleEntry Title|BloggerFixtures.PostSingleEntry Content|

We delete the entry and we verify that we have no entries left.


- Note how convenient it is to write down an acceptance test in FitNesse. We simply explain what we want to do, then we put together tables with inputs and desired outputs. FitNesse will ignore anything that is not part of the table, and will invoke the fixtures defined in the tables

- The page contains many fixtures that we haven't defined yet. After calling DeleteAllEntries, we call PostNewEntry, which is another ColumnFixture with 2 member variables (title and content) and 2 methods (valid and num_entries). We need to create another Python module for it in the BloggerFixtures directory. Here is my version of it:

from fit.ColumnFixture import ColumnFixture
import sys
blogger_path = "C:\\eclipse\\workspace\\blogger"
# Make the Blogger module importable (replace blogger_path with your own path)
sys.path.append(blogger_path)
import Blogger

class PostNewEntry(ColumnFixture):
    _typeDict = {
        "title": "String",
        "content": "String",
        "num_entries": "Int",
        "valid": "Boolean"
    }

    # Input columns: FitNesse assigns these from the table cells
    title = ""
    content = ""
    blogger = Blogger.get_blog()

    def num_entries(self):
        return self.blogger.get_num_entries()

    def valid(self):
        return self.blogger.post_new_entry(self.title, self.content)

- This fixture defines the all-important _typeDict dictionary, then defines 2 class variables, title and content, whose values will be assigned by the FitNesse framework. The valid() method does all the work here, by invoking the Blogger object's post_new_entry method and returning true or false, depending on the success or failure of this operation. The num_entries() method again serves as only a wiring to the Blogger object's get_num_entries method

- The other 2 fixtures invoked on the PostDelete1Entry page are similar. For your reference, here is (all the fixtures are also available at

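from fit.ColumnFixture import ColumnFixture
import sys
blogger_path = "C:\\eclipse\\workspace\\blogger"
# Make the Blogger module importable (replace blogger_path with your own path)
sys.path.append(blogger_path)
import Blogger

class DeleteEntry(ColumnFixture):
    _typeDict = {
        "num_entries": "Int",
        "valid": "Boolean"
    }
    blogger = Blogger.get_blog()

    def num_entries(self):
        return self.blogger.get_num_entries()

    def valid(self):
        # The entry index comes from the table cell next to the fixture name
        entry_index = int(self.getArgs()[0])
        return self.blogger.delete_nth_entry(entry_index)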

- An interesting thing to note in the DeleteEntry fixture is that it gets a parameter passed via the FitNesse table cell next to the fixture name: !|BloggerFixtures.DeleteEntry|1|; the parameter is available in the class via the self.getArgs() list, which in this case contains only 1 element
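As a rough picture of where those arguments come from, the sketch below splits the first row of a table into the fixture name and the remaining cells. The parse_fixture_row helper is hypothetical and only mimics the idea; it is not how PyFIT implements getArgs().

```python
def parse_fixture_row(row):
    """Split a wiki table row into (fixture name, list of argument cells)."""
    # Drop the leading "!" and the outer "|" characters, then split on "|"
    cells = [cell.strip() for cell in row.strip().lstrip("!").strip("|").split("|")]
    return cells[0], cells[1:]


name, args = parse_fixture_row("!|BloggerFixtures.DeleteEntry|1|")
entry_index = int(args[0])  # arguments arrive as text, hence the int() call
print(name, args, entry_index)
```

This also shows why the fixture wraps the argument in int(): cells are plain text until they are converted.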

- Here is the GetEntryTitleContent fixture:

from fit.ColumnFixture import ColumnFixture
import sys
blogger_path = "C:\\eclipse\\workspace\\blogger"
# Make the Blogger module importable (replace blogger_path with your own path)
sys.path.append(blogger_path)
import Blogger

class GetEntryTitleContent(ColumnFixture):
    _typeDict = {
        "entry_index": "Int",
        "title": "String",
        "content": "String",
    }

    # Input column: FitNesse assigns this from the table cell
    entry_index = 0
    blogger = Blogger.get_blog()

    def title(self):
        return self.blogger.get_nth_entry_title(self.entry_index)

    def content(self):
        return self.blogger.get_nth_entry_content_strip_html(self.entry_index)

- Now we can go back to the BlogMgmtSuite page and click the Suite button. FitNesse will run both acceptance tests that we have defined so far: DeleteAllEntries and PostDelete1Entry. If for some reason you notice that only DeleteAllEntries has been executed, this usually means that you forgot to make PostDelete1Entry a test page

- If everything ran fine, you will see the following summary:

Test Pages: 2 right, 0 wrong, 0 ignored, 0 exceptions Assertions: 9 right, 0 wrong, 0 ignored, 0 exceptions
1 right, 0 wrong, 0 ignored, 0 exceptions DeleteAllEntries
8 right, 0 wrong, 0 ignored, 0 exceptions PostDelete1Entry

I wrote a few more acceptance tests for the Blogger application. They expand on PostDelete1Entry by posting several entries, deleting them one by one and verifying at each step that the expected entries are kept. They are available at

Lessons learned

1. From a developer's perspective, I realized that FitNesse acceptance tests exercise the application in ways that unit tests do not.

- In a typical unit test scenario, a single application object (for example a Blogger object) is instantiated in the setUp method and then used throughout the test case class methods.

- In a typical FitNesse acceptance test, there are several fixture objects created during the execution of a test page, each object using potentially different instances of the application object. This can result in inconsistencies between the states of these objects, and thus in seemingly mysterious failures. Case in point:

I started getting failures after posting three entries and deleting them one by one via the PostDelete3Entries test page. This page is using 4 different fixtures: DeleteAllEntries, PostNewEntry, GetEntryTitleContent and DeleteEntry. Each of these fixtures appears in several tables, but the PyFIT framework only instantiates one object per fixture, so for example the same GetEntryTitleContent object is used in every row of every table corresponding to the GetEntryTitleContent fixture.

When I first wrote the fixtures, each fixture object had its own copy of a Blogger object, so there was a disconnect between the number and order of entries reported by each fixture. For example, when an entry was deleted via DeleteEntry, the fact was not reflected in the GetEntryTitleContent object. Here is an example of how DeleteEntry looked initially:

class DeleteEntry(ColumnFixture):
    _typeDict = {
        "num_entries": "Int",
        "valid": "Boolean"
    }
    # Each fixture class built its own private Blogger object
    blog_params = Blogger.BlogParams()
    blogger = Blogger.Blogger(blog_params)

    def num_entries(self):
        return self.blogger.get_num_entries()

I rewrote the fixture so that now it looks like this:

class DeleteEntry(ColumnFixture):
    _typeDict = {
        "num_entries": "Int",
        "valid": "Boolean"
    }
    # All fixtures now share the module-level Blogger object
    blogger = Blogger.get_blog()

    def num_entries(self):
        return self.blogger.get_num_entries()

Note the call to Blogger.get_blog(), which returns a global variable at the module level, thus guaranteeing that all fixture objects will share a single instance of a Blogger object.
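The difference between the two versions can be reproduced in a few lines. The Blog class below is a hypothetical stand-in that keeps its state in memory, which is an assumption about why separate instances drift apart; only the get_blog() name is taken from the text above.

```python
class Blog:
    """Hypothetical stand-in whose state lives inside each instance."""

    def __init__(self):
        self.entries = ["third", "second", "first"]

    def delete_nth_entry(self, n):
        del self.entries[n - 1]
        return True

    def get_num_entries(self):
        return len(self.entries)


# Module-level instance handed out to every caller
_blog = Blog()


def get_blog():
    return _blog


# Separate instances: the deletion is invisible to the other object
deleter, reader = Blog(), Blog()
deleter.delete_nth_entry(1)
separate_counts = (deleter.get_num_entries(), reader.get_num_entries())

# Shared instance: both names refer to the same object
shared_deleter, shared_reader = get_blog(), get_blog()
shared_deleter.delete_nth_entry(1)
shared_counts = (shared_deleter.get_num_entries(), shared_reader.get_num_entries())

print(separate_counts, shared_counts)
```

With separate instances the two counts disagree after the deletion, while the shared instance reports the same count to both callers.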

The FitNesse documentation recommends using static instances of Singleton objects for sharing an object among tables on the same page. Since it is not trivial to properly implement the Singleton pattern in Python (although I found several examples on the Web), I resorted to calling the get_blog() function which returns a Blogger object as a global variable. I will experiment with implementing the Singleton pattern following some of the examples I found.
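For comparison, here is one common way to approximate a Singleton in Python by overriding __new__. This is a sketch of a well-known idiom, not the approach used in the Blogger module; the SingletonBlog name is hypothetical and the code is not thread-safe.

```python
class SingletonBlog:
    """Hypothetical class that always hands back the same instance."""
    _instance = None

    def __new__(cls):
        # Create the object only on the first call, then reuse it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.entries = []
        return cls._instance


first = SingletonBlog()
second = SingletonBlog()
first.entries.append("new entry")
print(first is second, second.entries)
```

The state is initialized inside __new__ rather than __init__, because __init__ would run again on every SingletonBlog() call and reset it.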

2. I had problems with the HTTP connection to Blogger, so both the unit tests and the FitNesse acceptance tests failed many times. I intend to isolate this type of failure by using Mock Objects (for example the pMock library) in order to simulate posting and deleting blog entries without actually using the Blogger API.
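A minimal sketch of that idea follows. It uses unittest.mock from the Python standard library rather than pMock, and the post_and_count helper is hypothetical; the point is only that the code under test talks to a stand-in object instead of the network.

```python
from unittest import mock


def post_and_count(blog, title, content):
    """Hypothetical code under test: post an entry, then report the count."""
    succeeded = blog.post_new_entry(title, content)
    return succeeded, blog.get_num_entries()


# The mock replaces the real Blogger object, so no HTTP request is made
fake_blog = mock.Mock()
fake_blog.post_new_entry.return_value = True
fake_blog.get_num_entries.return_value = 1

result = post_and_count(fake_blog, "Title", "Content")

# Verify that the expected call was made with the expected arguments
fake_blog.post_new_entry.assert_called_once_with("Title", "Content")
print(result)
```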

3. FitNesse advocates the use of SetUp and TearDown pages that are automatically invoked at the beginning and the end of a test suite run. I have not used them so far, but I intend to use them, for example for instantiating a Singleton object in the SetUp page.

I realize that all this is just scratching the surface of what FitNesse can do, and I intend to further explore its functionality. My next goal is to experiment with RowFixture and ActionFixture tables. The GetEntryTitleContent fixture in particular is a query, so it's especially suitable for being expressed as a RowFixture. I expressed it as a ColumnFixture mainly because it seemed pretty hard to write RowFixture-derived classes in PyFIT, but I intend to remedy this as soon as I can.
I will report my findings in a future post, so keep your RSS/Atom Reader tuned to this blog :-)

Friday, November 19, 2004

IBM's STAF/STAX test automation framework

I've been playing with IBM's open-sourced STAF/STAX test automation framework and I'm really impressed. Support is top-notch too via the project's SourceForge mailing lists. I asked 2 questions one day: both were answered in 30 minutes, and both answers solved my problems.

STAF shines when you need to distribute your tests on various platforms, run the tests, and get all the results in one place together with the FAIL/PASS count for the test run. I had been trying to do this in the past with a home-grown XML-RPC agent written in Python, but it was not industrial-strength and it had very few reporting features compared to STAF.

You can run a regression/smoke test based on STAF every night, and have STAF send you an email with the test result. Perfect for continuous integration!

Hello World

I decided to bite the bullet and start a blog. I find it hard to jot down my thoughts in a regular notebook, so maybe an online medium will help. I intend to write about some of my experiences in the field of agile testing. I'm also a huge Python fan, so expect to read about that too.
