Monday, February 28, 2005

Performance vs. load vs. stress testing

Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms interchangeably, but they have in fact quite different meanings. This post is a quick review of these concepts, based on my own experience, but also using definitions from testing literature -- in particular: "Testing computer software" by Kaner et al, "Software testing techniques" by Loveland et al, and "Testing applications on the Web" by Nguyen et al.

Update July 7th, 2005

From the referrer logs I see that this post comes up fairly often in Google searches. I'm updating it with a link to a later post I wrote called 'More on performance vs. load testing'.

Performance testing

The goal of performance testing is not to find bugs, but to eliminate bottlenecks and establish a baseline for future regression testing. To conduct performance testing is to engage in a carefully controlled process of measurement and analysis. Ideally, the software under test is already stable enough so that this process can proceed smoothly.

A clearly defined set of expectations is essential for meaningful performance testing. If you don't know where you want to go in terms of the performance of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?). For example, for a Web application, you need to know at least two things:
  • expected load in terms of concurrent users or HTTP connections
  • acceptable response time
Once you know where you want to be, you can start on your way there by constantly increasing the load on the system while looking for bottlenecks. To take again the example of a Web application, these bottlenecks can exist at multiple levels, and to pinpoint them you can use a variety of tools:
  • at the application level, developers can use profilers to spot inefficiencies in their code (for example poor search algorithms)
  • at the database level, developers and DBAs can use database-specific profilers and query optimizers
  • at the operating system level, system engineers can use utilities such as top, vmstat, iostat (on Unix-type systems) and PerfMon (on Windows) to monitor hardware resources such as CPU, memory, swap, disk I/O; specialized kernel monitoring software can also be used
  • at the network level, network engineers can use packet sniffers such as tcpdump, network protocol analyzers such as ethereal, and various utilities such as netstat, MRTG, ntop, mii-tool
From a testing point of view, the activities described above all take a white-box approach, where the system is inspected and monitored "from the inside out" and from a variety of angles. Measurements are taken and analyzed, and as a result, tuning is done.

However, testers also take a black-box approach in running the load tests against the system under test. For a Web application, testers will use tools that simulate concurrent users/HTTP connections and measure response times. Some lightweight open source tools I've used in the past for this purpose are ab, siege, httperf. A more heavyweight tool I haven't used yet is OpenSTA. I also haven't used The Grinder yet, but it is high on my TODO list.
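At their core, all these tools do the same thing: fire a fixed number of concurrent workers at the system and record response times. Here is a minimal, self-contained Python sketch of the idea; the target is an arbitrary callable so the example stays runnable, but in a real load test it would be an HTTP GET against the application under test (the function and parameter names are my own, not any particular tool's API):

```python
import threading
import time

def run_load(target, concurrent_users, requests_per_user):
    """Run target() from several threads; return all observed latencies."""
    latencies = []
    lock = threading.Lock()

    def worker():
        for _ in range(requests_per_user):
            start = time.time()
            target()           # in practice: an HTTP GET to the app under test
            elapsed = time.time() - start
            with lock:
                latencies.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(concurrent_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies

if __name__ == "__main__":
    # Simulated request taking ~10 ms; replace with a real HTTP call.
    results = run_load(lambda: time.sleep(0.01),
                       concurrent_users=5, requests_per_user=10)
    print("requests: %d" % len(results))
    print("max response time: %.3f s" % max(results))
```

From the collected latencies you can then compute the average, maximum and percentile response times and compare them against your acceptable-response-time goal.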

When the results of the load test indicate that the performance of the system does not meet its expected goals, it is time for tuning, starting with the application and the database. You want to make sure your code runs as efficiently as possible and your database is optimized for the given OS/hardware configuration. In this context, TDD practitioners will find a framework such as Mike Clark's jUnitPerf very useful: it enhances existing unit test code with load test and timed test functionality. Once a particular function or method has been profiled and tuned, developers can wrap its unit tests in jUnitPerf and ensure that it keeps meeting its performance requirements of load and timing. Mike Clark calls this "continuous performance testing". I should also mention that I've done an initial port of jUnitPerf to Python -- I called it pyUnitPerf.
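To illustrate the jUnitPerf idea without depending on the actual jUnitPerf/pyUnitPerf APIs, here is a hypothetical Python sketch: two wrappers, one imposing a timing constraint and one imposing a load constraint, applied to an existing unit test (the helper names are made up for illustration):

```python
import threading
import time

def timed_test(test, max_seconds):
    """Fail if a single run of test() exceeds max_seconds."""
    def run():
        start = time.time()
        test()
        elapsed = time.time() - start
        assert elapsed <= max_seconds, \
            "took %.3f s, limit is %.3f s" % (elapsed, max_seconds)
    return run

def load_test(test, concurrent_users):
    """Run test() once per simulated user, all users in parallel.

    A real framework would also collect failures raised in the
    worker threads; this sketch just runs them.
    """
    def run():
        threads = [threading.Thread(target=test)
                   for _ in range(concurrent_users)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return run

def test_lookup():
    """Stand-in for an existing unit test of a tuned function."""
    assert sorted([3, 1, 2]) == [1, 2, 3]

if __name__ == "__main__":
    # The wrapped test fails if any single run gets slower than 50 ms,
    # even with 10 concurrent simulated users.
    load_test(timed_test(test_lookup, 0.05), 10)()
    print("performance requirements met")
```

Because the wrappers compose, the same unit test can be reused unchanged for functional testing, timed testing and load testing, which is exactly the "double duty" that makes the jUnitPerf approach attractive.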

If, after tuning the application and the database, the system still doesn't meet its expected performance goals, a wide array of tuning procedures is available at all the levels discussed before. Here are some examples of things you can do to enhance the performance of a Web application outside of the application code per se:
  • Use Web cache mechanisms, such as the one provided by Squid
  • Publish highly-requested Web pages statically, so that they don't hit the database
  • Scale the Web server farm horizontally via load balancing
  • Scale the database servers horizontally and split them into read/write servers and read-only servers, then load balance the read-only servers
  • Scale the Web and database servers vertically, by adding more hardware resources (CPU, RAM, disks)
  • Increase the available network bandwidth
Performance tuning can sometimes be more art than science, due to the sheer complexity of the systems involved in a modern Web application. Care must be taken to modify one variable at a time and redo the measurements, otherwise multiple changes can have subtle interactions that are hard to qualify and repeat.

In a standard test environment such as a test lab, it will not always be possible to replicate the production server configuration. In such cases, a staging environment is used which is a subset of the production environment. The expected performance of the system needs to be scaled down accordingly.

The cycle "run load test->measure performance->tune system" is repeated until the system under test achieves the expected levels of performance. At this point, testers have a baseline for how the system behaves under normal conditions. This baseline can then be used in regression tests to gauge how well a new version of the software performs.

Another common goal of performance testing is to establish benchmark numbers for the system under test. There are many industry-standard benchmarks, such as the ones published by TPC, and many hardware/software vendors will fine-tune their systems in such ways as to obtain a high ranking in the TPC top tens. It is common knowledge that one needs to be wary of any performance claims that do not include a detailed specification of all the hardware and software configurations used in that particular test.

Load testing

We have already seen load testing as part of the process of performance testing and tuning. In that context, it meant constantly increasing the load on the system via automated tools. For a Web application, the load is defined in terms of concurrent users or HTTP connections.

In the testing literature, the term "load testing" is usually defined as the process of exercising the system under test by feeding it the largest tasks it can operate with. Load testing is sometimes called volume testing, or longevity/endurance testing.

Examples of volume testing:
  • testing a word processor by editing a very large document
  • testing a printer by sending it a very large job
  • testing a mail server with thousands of user mailboxes
  • a specific case of volume testing is zero-volume testing, where the system is fed empty tasks
Examples of longevity/endurance testing:
  • testing a client-server application by running the client in a loop against the server over an extended period of time
Goals of load testing:
  • expose bugs that do not surface in cursory testing, such as memory management bugs, memory leaks, buffer overflows, etc.
  • ensure that the application meets the performance baseline established during performance testing. This is done by running regression tests against the application at a specified maximum load.
Although performance testing and load testing can seem similar, their goals are different. On one hand, performance testing uses load testing techniques and tools for measurement and benchmarking purposes and uses various load levels. On the other hand, load testing operates at a predefined load level, usually the highest load that the system can accept while still functioning properly. Note that load testing does not aim to break the system by overwhelming it, but instead tries to keep the system constantly humming like a well-oiled machine.

In the context of load testing, I want to emphasize the extreme importance of having large datasets available for testing. In my experience, many important bugs simply do not surface unless you deal with very large entities such as thousands of users in repositories such as LDAP/NIS/Active Directory, thousands of mail server mailboxes, multi-gigabyte tables in databases, deep file/directory hierarchies on file systems, etc. Testers obviously need automated tools to generate these large datasets, but fortunately any scripting language worth its salt will do the job.
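As an illustration, here is the kind of throwaway Python script I have in mind for generating such a dataset -- in this case LDIF entries for thousands of test users to load into an LDAP directory (the base DN and attribute values are made up for the example):

```python
def make_ldif(num_users, base_dn="ou=people,dc=example,dc=com"):
    """Return LDIF text defining num_users test accounts."""
    entries = []
    for i in range(num_users):
        uid = "testuser%05d" % i
        entries.append(
            "dn: uid=%s,%s\n"
            "objectClass: inetOrgPerson\n"
            "uid: %s\n"
            "cn: Test User %d\n"
            "sn: User\n"
            "mail: %s@example.com\n" % (uid, base_dn, uid, i, uid))
    return "\n".join(entries)

if __name__ == "__main__":
    # Generate 10,000 users; pipe the output to ldapadd or save to a file.
    ldif = make_ldif(10000)
    print("generated %d entries" % ldif.count("dn: "))
```

Ten lines of code, and you have as many test users as you care to ask for; the same pattern works for generating mailboxes, database rows or directory trees.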

Stress testing

Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose behind this madness is to make sure that the system fails and recovers gracefully -- this quality is known as recoverability.

Where performance testing demands a controlled environment and repeatable measurements, stress testing joyfully induces chaos and unpredictability. To take again the example of a Web application, here are some ways in which stress can be applied to the system:
  • double the baseline number for concurrent users/HTTP connections
  • randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example)
  • take the database offline, then restart it
  • rebuild a RAID array while the system is running
  • run processes that consume resources (CPU, memory, disk, network) on the Web and database servers
I'm sure devious testers can enhance this list with their favorite ways of breaking systems. However, stress testing does not break the system purely for the pleasure of breaking it, but instead it allows testers to observe how the system reacts to failure. Does it save its state or does it crash suddenly? Does it just hang and freeze or does it fail gracefully? On restart, is it able to recover from the last good state? Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes? Is the security of the system compromised because of unexpected failures? And the list goes on.
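As a sketch of the last item in the list above, here is a crude Python resource hog that pins a CPU core and holds a chunk of memory for a fixed interval, then releases everything so you can observe how the system recovers. Real stress setups would typically run several such processes (or dedicated tools) on each server, but the principle is the same:

```python
import time

def hog_resources(seconds, mem_mb=100):
    """Consume CPU and roughly mem_mb of memory for the given duration."""
    ballast = bytearray(mem_mb * 1024 * 1024)  # hold memory for the duration
    deadline = time.time() + seconds
    n = 0
    while time.time() < deadline:
        n += 1  # busy loop keeps one CPU core pinned
    del ballast  # release the memory so recovery can be observed
    return n

if __name__ == "__main__":
    iterations = hog_resources(seconds=1, mem_mb=50)
    print("stress applied; busy-loop iterations: %d" % iterations)
```

While such a hog runs on the Web or database server, you keep the regular load test going and watch how (and whether) the application degrades and recovers.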

Conclusion

I am aware that I only scratched the surface in terms of issues, tools and techniques that deserve to be mentioned in the context of performance, load and stress testing. I personally find the topic of performance testing and tuning particularly rich and interesting, and I intend to post more articles on this subject in the future.

Friday, February 25, 2005

Web app testing with Python part 1: MaxQ

I intend to write a series of posts on various Web app testing tools that use Python/Jython. I'll start by covering MaxQ, then I'll talk about mechanize, Pamie, Selenium and possibly other tools.

First of all, I'll borrow Bret Pettichord's terminology by saying that there are two main classes of Web app testing tools:
  1. Tools that simulate browsers (Bret calls them "Web protocol drivers") by implementing the HTTP request/response protocol and by parsing the resulting HTML
  2. Tools that automate browsers ("Web browser drivers") by driving them for example via COM calls in the case of Internet Explorer
Examples of browser simulators:
  • MaxQ (Jython): uses a proxy to record and play back HTTP requests (covered below)
  • mechanize and webunit (Python)
  • HttpUnit, HtmlUnit and jWebUnit (Java)
Examples of browser drivers:
  • Pamie (Python), which is based on Samie (Perl): IE automation via COM
  • Watir (Ruby): IE automation via COM
  • JSSh (Mozilla C++ extension module): Mozilla automation via JavaScript shell connections
One tool that doesn't fit neatly in these categories is Selenium from ThoughtWorks. I haven't experimented with it yet, so all I can say is that it has several components:
  • Server-side instrumentation adapted to the particular flavor of the Web server under test
  • JavaScript automation engine -- the "Browser Bot" -- that runs directly in the browser and parses tests written as HTML tables (a la FIT)
The main advantage that Selenium has over the other tools I mentioned is that it's cross-platform, cross-browser. The main disadvantage is that it requires server-side instrumentation. The syntax used by the Selenium JavaScript engine inside the browser is called "Selenese". In Bret Pettichord's words from his blog:

"You can also express Selenium tests in a programming language, taking advantage of language-specific drivers that communicate in Selenese to the browser bot. Java and Ruby drivers have been released, with dot-Net and Python drivers under development. These drivers allow you to write tests in Java, Ruby, Python, C#, or VB.net."

Jason Huggins, the main Selenium developer, is at the same time a Plone developer. He pointed me to the Python code already written for Selenium. Right now it's only available via subversion from svn://beaver.codehaus.org/selenium/scm/trunk. I checked it out, but I haven't had a chance to try it yet. It's high on my TODO list though, so stay tuned...

One issue that almost all browser simulator tools struggle with is dealing with JavaScript. In my experience, their HTML parsing capabilities tend to break down when faced with rich JavaScript elements. This is one reason why Wilkes Joiner, one of the creators of jWebUnit, said that jWebUnit ended up being used for simple "smoke test"-type testing that automates basic site navigation, rather than for more complicated acceptance/regression testing. No browser simulator tool I know of supports all of the JavaScript constructs yet. But if the Web application you need to test does not make heavy use of JavaScript, then these tools might prove enough for the job.

Browser driver tools such as Watir, Samie and Pamie do not have the JavaScript shortcoming, but of course they are limited to IE and Windows. This may prove too restrictive, especially in view of the recent Firefox resurgence. I haven't used the Mozilla-based JSSh tool yet.

The tool I want to talk about in this post is MaxQ. I found out about it from Titus Brown's blog. MaxQ belongs to the browser simulator category, but it is different from the other tools I mentioned in that it uses a proxy to capture HTTP requests and replies. One of its main capabilities is record/playback of scripts that are automatically written for you in Jython while you are browsing the Web site under test. The tests can then be run either using the GUI version of the tool (which also does the capture), or from the command line.

MaxQ is written in Java, but the test scripts it generates are written in Jython. This is a typical approach taken by other tools such as The Grinder and TestMaker. It combines the availability of test libraries for Java with the agility of a scripting language such as Jython. It is a trend that I see gaining more traction in the testing world as Jython breaks more into the mainstream.

MaxQ's forte is in automating HTTP requests (both GET and POST) and capturing the response codes, as well as the raw HTML output. It does not attempt to parse HTML into a DOM object, as other tools do, but it does offer the capability of verifying that a given text or URI exists in the HTTP response. There is talk on the developer's mailing list about extending MaxQ with HttpUnit, so that it can offer more finely-grained control over HTML elements such as frames and tables. MaxQ does not support HTTPS at this time.

One question you might have (I know I had it) is why should you use MaxQ when other tools offer more capabilities, at least in terms of HTML parsing. Here are some reasons:
  • The record/playback feature is very helpful; the fact that the tool generates Jython code makes it very easy to modify it by hand later and maintain it
  • MaxQ retrieves all the elements referenced on a given Web page (images, CSS), making it easy to test that all links to these objects are valid
  • Form posting is easy to automate and verify
  • The fact that MaxQ does not do HTML parsing is sometimes an advantage, since HTML parsing is brittle (especially when dealing with JavaScript), and relying on HTML parsing makes your tests fragile and prone to break whenever the HTML elements are modified
In short, I would say that you should use MaxQ whenever you are more interested in testing the HTTP side of your Web application, and not so much the HTML composition of your pages.

Short MaxQ tutorial

As an example of the application under test, I will use a fresh installation of Bugzilla and I will use MaxQ to test a simple feature: running a Bugzilla query with a non-existent summary results in an empty results page.
Install MaxQ

I downloaded and installed MaxQ on a Windows XP box. I already had the Java SDK installed. To run MaxQ, go to a command prompt, cd to the bin sub-directory and type maxq.bat. This will launch the proxy process, which by default listens on port 8090. It will also launch the MaxQ Java GUI tool.

In the GUI tool, go to File->New to start either a "standard" script or a "compact" script. The difference is that the standard script will include HTTP requests for all the elements referenced on every Web page you visit (such as images or CSS), whereas the compact script will only include one HTTP request per page, to the page URL. The compact script also lives up to its name by aggregating the execution of the HTTP request and the validation of the response in one line of code.

To start a recording session, go to Test->Start Recording.

Now configure your browser to use a proxy on localhost:8090.

Record the test script

For my first test, I created a new standard script and MaxQ generated this code:

# Generated by MaxQ [com.bitmechanic.maxq.generator.JythonCodeGenerator]
from PyHttpTestCase import PyHttpTestCase
from com.bitmechanic.maxq import Config
from org.python.modules import re
global validatorPkg
if __name__ == 'main':
    validatorPkg = Config.getValidatorPkgName()
    # Determine the validator for this testcase.
    exec 'from '+validatorPkg+' import Validator'


# definition of test class
class test_bugzilla_empty_search(PyHttpTestCase):
    def runTest(self):
        self.msg('Test started')

        # ^^^ Insert new recordings here. (Do not remove this line.)


# Code to load and run the test
if __name__ == 'main':
    test = MaxQTest("MaxQTest")
    test.Run()

Note that the test class is derived from PyHttpTestCase, a Jython class that is itself derived from a Java class: HttpTestCase. HttpTestCase encapsulates the HTTP request/response functionality. Its two main methods are get() and post(), but it also offers helper methods such as responseContains(text) and responseContainsURI(uri), which verify that a given text or URI is present in the HTTP response.

I started recording, then I went to http://example.com/bugs in my browser (real URL omitted) and got to the main Bugzilla page. I then clicked on the "Query existing bug reports" link to go to the Search page. I entered "nonexistentbug!!" in the Summary field, then clicked Search. I got back a page containing the text "Zarro Boogs found."

While I was busily navigating the Bugzilla pages and posting the Search query, this is what MaxQ automatically recorded for me:

# Generated by MaxQ [com.bitmechanic.maxq.generator.JythonCodeGenerator]
from PyHttpTestCase import PyHttpTestCase
from com.bitmechanic.maxq import Config
from org.python.modules import re
global validatorPkg
if __name__ == 'main':
    validatorPkg = Config.getValidatorPkgName()
    # Determine the validator for this testcase.
    exec 'from '+validatorPkg+' import Validator'


# definition of test class
class test_bugzilla_empty_search(PyHttpTestCase):
    def runTest(self):
        self.msg('Test started')
        self.msg("Testing URL: %s" % self.replaceURL('''http://example.com/bugs'''))
        url = "http://example.com/bugs"
        params = None
        Validator.validateRequest(self, self.getMethod(), "get", url, params)
        self.get(url, params)
        self.msg("Response code: %s" % self.getResponseCode())
        self.assertEquals("Assert number 1 failed", 301, self.getResponseCode())
        Validator.validateResponse(self, self.getMethod(), url, params)

        self.msg("Testing URL: %s" % self.replaceURL('''http://example.com/bugs/query.cgi'''))
        url = "http://example.com/bugs/query.cgi"
        params = None
        Validator.validateRequest(self, self.getMethod(), "get", url, params)
        self.get(url, params)
        self.msg("Response code: %s" % self.getResponseCode())
        self.assertEquals("Assert number 2 failed", 200, self.getResponseCode())
        Validator.validateResponse(self, self.getMethod(), url, params)

        params = [
            ('''short_desc_type''', '''allwordssubstr'''),
            ('''short_desc''', '''nonexistentbug!!!'''),
            ('''long_desc_type''', '''allwordssubstr'''),
            ('''long_desc''', ''''''),
            ('''bug_file_loc_type''', '''allwordssubstr'''),
            ('''bug_file_loc''', ''''''),
            ('''bug_status''', '''NEW'''),
            ('''bug_status''', '''ASSIGNED'''),
            ('''bug_status''', '''REOPENED'''),
            ('''emailassigned_to1''', '''1'''),
            ('''emailtype1''', '''substring'''),
            ('''email1''', ''''''),
            ('''emailassigned_to2''', '''1'''),
            ('''emailreporter2''', '''1'''),
            ('''emailcc2''', '''1'''),
            ('''emailtype2''', '''substring'''),
            ('''email2''', ''''''),
            ('''bugidtype''', '''include'''),
            ('''bug_id''', ''''''),
            ('''votes''', ''''''),
            ('''changedin''', ''''''),
            ('''chfieldfrom''', ''''''),
            ('''chfieldto''', '''Now'''),
            ('''chfieldvalue''', ''''''),
            ('''cmdtype''', '''doit'''),
            ('''order''', '''Reuse same sort as last time'''),
            ('''field0-0-0''', '''noop'''),
            ('''type0-0-0''', '''noop'''),
            ('''value0-0-0''', ''''''),
        ]
        self.msg("Testing URL: %s" % self.replaceURL('''http://example.com/bugs/buglist.cgi?short_desc_type=allwordssubstr&short_desc=nonexistentbug!!!&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&votes=&changedin=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse same sort as last time&field0-0-0=noop&type0-0-0=noop&value0-0-0='''))
        url = "http://example.com/bugs/buglist.cgi"
        Validator.validateRequest(self, self.getMethod(), "get", url, params)
        self.get(url, params)
        self.msg("Response code: %s" % self.getResponseCode())
        self.assertEquals("Assert number 3 failed", 200, self.getResponseCode())
        Validator.validateResponse(self, self.getMethod(), url, params)

        # ^^^ Insert new recordings here. (Do not remove this line.)


# Code to load and run the test
if __name__ == 'main':
    test = test_bugzilla_empty_search("test_bugzilla_empty_search")
    test.Run()

The generated script is a bit on the verbose side. Note that getting and verifying the HTTP request for http://example.com/bugs takes 8 lines:

self.msg("Testing URL: %s" % self.replaceURL('''http://example.com/bugs'''))
url = "http://example.com/bugs"
params = None
Validator.validateRequest(self, self.getMethod(), "get", url, params)
self.get(url, params)
self.msg("Response code: %s" % self.getResponseCode())
self.assertEquals("Assert number 1 failed", 301, self.getResponseCode())
Validator.validateResponse(self, self.getMethod(), url, params)

This is where the compact script form comes in handy. The equivalent compact expression is:

self.get('http://example.com/bugs', None, 301)

MaxQ shines at retrieving form fields (even hidden ones), filling them with the values given by the user and submitting the form via an HTTP POST operation. This is what the second part of the generated Jython script does.

I manually added this line before the "Insert new recordings" line:

assert self.responseContains("Zarro Boogs found")
This shows how to use the responseContains helper method from the HttpTestCase class in order to verify that the returned page contains a given string.

You can also do an ad-hoc validation on the returned HTML by using a regular expression applied to the raw HTML (which can be retrieved via the getResponse() method). So you can do something like this:

assert re.search(r'Zarro', self.getResponse())
Caveat: a simple "import re" will not work; you need to import the re module like this:
from org.python.modules import re
Run the test script

When you are done browsing the target Web site for the functionality you want to test, go to Test->Stop Recording. You will be prompted for a file name. I chose test_bugzilla_empty_search.py. At this point, you can run the Jython test script inside the MaxQ GUI by going to Test->Run. The output is something like:

Test started
Testing URL: http://example.com/bugs
Response code: 301
Testing URL: http://example.com/bugs/query.cgi
Response code: 200
Testing URL: http://example.com/bugs/buglist.cgi?short_desc_type=allwordssubstr&short_desc=nonexistentbug!!!&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&votes=&changedin=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse same sort as last time&field0-0-0=noop&type0-0-0=noop&value0-0-0=
Response code: 200
Test Ran Successfully
You can also run the script at the command line by invoking:
maxq -r test_bugzilla_empty_search.py
Conclusion

I think MaxQ is a useful tool for regression-testing simple Web site navigation and form processing. Its record/playback feature relieves much of the tedium of writing test scripts by hand (as an aside, TestMaker uses the MaxQ capture/playback engine for its own functionality). The fact that the script language is Jython is a big plus, since testers can enhance the generated scripts with custom Python logic. The source code is clean and easy to grasp, and development is active at maxq.tigris.org.

Another nifty feature I haven't mentioned is that it is easy to add your own script generator plugins. All you need to do is write a Java class derived from AbstractCodeGenerator, put it in java/com/bitmechanic/maxq/generator under the main maxq directory, recompile maxq.jar via ant, then add the class to conf/maxq.properties in the generator.classnames section. The MaxQ GUI tool will then automatically pick up your generator at run time and offer it in the File->New menu. For an example of a custom generator, see Titus Brown's PBP script generator.

On the minus side, MaxQ is not the best tool to use if you need fine-grained control over HTML elements such as links, tables and frames. If you need this functionality, you are better off using a tool such as HttpUnit or HtmlUnit and drive it from Jython. If instead of Jython you want to use pure Python, you can use mechanize or webunit, which I'll discuss in a future post.

Sunday, February 20, 2005

Articles and tutorials

My posts are starting to be archived and hard to find, so I thought I'd put up this page with links to the various articles and tutorials I have posted so far.

Unit testing
Unit testing in Python part 1: unittest
Unit testing in Python part 2: doctest
Unit testing in Python part 3: py.test
Michael Feathers on unit testing rules

Acceptance testing with FitNesse
PyFIT/FitNesse Tutorial Part 1
PyFIT/FitNesse Tutorial Part 2
PyFIT/FitNesse Tutorial Part 3

Web application testing
Web app testing with Jython and HttpUnit
Web app testing with Python part 1: MaxQ
Web app testing with Python part 3: twill
Acceptance tests for Web apps: GUI vs. business logic

Selenium-specific articles
Web app testing with Python part 2: Selenium and Twisted
Quick update on Selenium in TestRunner mode
Quick update on Selenium in Twisted Server mode
Using Selenium to test a Plone site (part 1)
Using Selenium to test a Plone site (part 2)
New features in Selenium 0.3
Article on Selenium in Oct. 2005 issue of "Better Software"

Performance/load/stress testing
pyUnitPerf Tutorial
Performance vs. load vs. stress testing
HTTP performance testing with httperf, autobench and openload
More on performance vs. load testing

Automated test distribution, execution and reporting
STAF/STAX Tutorial

General testing topics
Quick black-box testing example
White-box vs. black-box testing

Agile Documentation
Agile Documentation with doctest and epydoc
Agile documentation in the Django project

Databases
Installing and using cx_Oracle on Unix
Installing Python 2.4.1 and cx_Oracle on AIX
Installing the Firebird database on a 64-bit RHEL Linux server

The py library
Keyword-based logging with the py library
py lib gems: greenlets and py.xml
'py library overview' slides

Python on Windows
Handling the 'Path' Windows registry value correctly
Running a Python script as a Windows service

System Administration HOWTOS
Telecommuting via ssh tunneling
Managing DNS zone files with dnspython
Configuring OpenLDAP as a replacement for NIS
Chroot-ed FTP with wu-ftpd
System monitoring via SNMP
Compiling and installing a custom Linux kernel
Configuring Apache 2 and Tomcat 5.5 with mod_jk

Data visualization
sparkplot: creating sparklines with matplotlib

Usability
Jakob Nielsen on Usability Testing
Jakob Nielsen on Blog Usability

Other articles

Python as an agile language
Oblique Strategies and testing

Wednesday, February 16, 2005

Agile Documentation with doctest and epydoc

This post was inspired by an article I read in the Feb. 2005 issue of Better Software: "Double Duty" by Brian Button. The title refers to having unit tests serve the double role of testing and documentation. Brian calls this Agile Documentation. For Python developers, this is old news, since the doctest module already provides what is called "literate testing" or "executable documentation". However, Brian also introduces some concepts that I think are worth exploring: Test Lists and Tests Maps.

Test Lists

A Test List tells a story about the behavior expected from the module/class under test. It is composed of one-liners, each line describing what a specific unit test tries to achieve. For example, in the case of a Blog management application, you could have the following (incomplete) Test List:

  • Deleting all entries results in no entries in the blog.
  • Posting single entry results in single valid entry.
  • Deleting a single entry by index results in no entries in the blog.
  • Posting new entry results in valid entry and increases the number of entries by 1.
  • Etc.

I find it very valuable to have such a Test List for every Python module that I write, especially if the list is easy to generate from the unit tests that I write. I will show later in this post how the combination of doctest and epydoc makes it trivial to achieve this goal.

Test Maps

A Test Map is a list of unit tests associated with a specific function/method under test. It helps you see how that specific function/method is being exercised via unit tests. A Test Map could look like this:

Testmap for method delete_all_entries:

  • test_delete_all_entries
  • test_delete_single_entry
  • test_post_single_entry
  • test_post_two_entries
  • test_delete_first_of_two_entries
  • test_delete_second_of_two_entries
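
Assuming each unit test names the methods it exercises in its docstring with epydoc-style L{...} markers, as the doctest examples later in this post do, a Test Map like the one above can be generated mechanically. Here is a hypothetical sketch (the helper and the sample tests are mine, for illustration only):

```python
import re

def build_test_map(test_functions):
    """Map each tested method name to the unit tests that exercise it."""
    test_map = {}
    for func in test_functions:
        doc = func.__doc__ or ""
        # collect every epydoc-style L{some.dotted.name} marker
        for method in re.findall(r"L\{([\w.]+)\}", doc):
            test_map.setdefault(method, []).append(func.__name__)
    return test_map

def test_delete_all_entries():
    """Deleting all entries results in no entries in the blog.

    Method(s) tested:
      - L{blogger.Blogger.delete_all_entries}
      - L{blogger.Blogger.get_num_entries}
    """

def test_post_new_entry():
    """Posting new entry results in valid entry.

    Method(s) tested:
      - L{blogger.Blogger.post_new_entry}
      - L{blogger.Blogger.get_num_entries}
    """

if __name__ == "__main__":
    tmap = build_test_map([test_delete_all_entries, test_post_new_entry])
    for method in sorted(tmap):
        print("%s: %s" % (method, ", ".join(tmap[method])))
```

The point is that the Test Map need not be maintained by hand: as long as the docstring convention is followed, it can be regenerated from the test code at any time.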

Generating Test Lists

As an example of a module under test, I will use the Blog management application that I discussed in several previous posts. The source code can be found here. I have a directory called blogmgmt which contains a module called blogger.py. The blogger module contains several classes, the main one being Blogger, and a top-level function called get_blog. I also created an empty __init__.py file, so that blogmgmt can be treated as a package. I wrote a series of doctest-based tests for the blogger module in a file I called testlist_blogger.py. Here is part of that file:



"""
Doctest unit tests for module L{blogger}
"""

def test_get_blog():
"""
get_blog() mimics a singleton by always returning the same object.

Function(s) tested:
- L{blogger.get_blog}


>>> from blogger import get_blog
>>> blog1 = get_blog()
>>> blog2 = get_blog()
>>> id(blog1) == id(blog2)
True

"""

def test_get_feed_title():
"""
Can retrieve the feed title.

Method(s) tested:
- L{blogger.Blogger.get_title}

>>> from blogger import get_blog
>>> blog = get_blog()
>>> print blog.get_title()
fitnessetesting

"""

def test_delete_all_entries():
"""
Deleting all entries results in no entries in the blog.

Method(s) tested:
- L{blogger.Blogger.delete_all_entries}
- L{blogger.Blogger.get_num_entries}

>>> from blogger import get_blog
>>> blog = get_blog()
>>> blog.delete_all_entries()
>>> print blog.get_num_entries()
0

"""

def test_post_new_entry():
"""
Posting new entry results in valid entry and increases the number of entries by 1.

Method(s) tested:
- L{blogger.Blogger.post_new_entry}
- L{blogger.Blogger.get_nth_entry_title}
- L{blogger.Blogger.get_nth_entry_content_strip_html}
- L{blogger.Blogger.get_num_entries}

>>> from blogger import get_blog
>>> blog = get_blog()
>>> init_num_entries = blog.get_num_entries()
>>> rc = blog.post_new_entry("Test title", "Test content")
>>> print rc
True
>>> print blog.get_nth_entry_title(1)
Test title
>>> print blog.get_nth_entry_content_strip_html(1)
Test content
>>> num_entries = blog.get_num_entries()
>>> num_entries == init_num_entries + 1
True

"""

Each unit test function is composed of a docstring and nothing else. The docstring starts with a one-line description of what the unit test tries to achieve. The docstring continues with a list of methods/functions tested by that unit test. Finally, the interactive shell session output is copied and pasted into the docstring so that it can be processed by doctest.

For the purpose of generating a Test List, only the first line in each docstring is important. If you simply run

epydoc -o blogmgmt testlist_blogger.py

you will get a directory called blogmgmt that contains the epydoc-generated documentation. I usually then move this directory somewhere under the DocumentRoot of one of my Apache Virtual Servers. When viewed in a browser, this is what the epydoc page for the summary of the testlist_blogger module looks like (also available here):

Module blogmgmt.testlist_blogger

Doctest unit tests for module blogger


Function Summary


test_delete_all_entries()
Deleting all entries results in no entries in the blog.


test_delete_first_of_two_entries()
Posting two entries and deleting entry with index 1 leaves oldest entry in place.


test_delete_second_of_two_entries()
Posting two entries and deleting entry with index 2 leaves newest entry in place.


test_delete_single_entry()
Deleting a single entry by index results in no entries in the blog.


test_get_blog()
get_blog() mimics a singleton by always returning the same object.


test_get_feed_posting_host()
Can retrieve the feed posting host.


test_get_feed_posting_url()
Can retrieve the feed posting URL.


test_get_feed_title()
Can retrieve the feed title.


test_post_new_entry()
Posting new entry results in valid entry and increases the number of entries by 1.


test_post_single_entry()
Posting single entry results in single valid entry.


test_post_two_entries()
Posting two entries results in 2 valid entries ordered most recent first.


This is exactly the Test List we wanted. Note that epydoc dutifully generated it for us, since in the Function Summary section it shows the name of every function it finds, plus the first line of that function's docstring. The main value of this Test List for me is that anybody can see at a glance what the methods of the Blogger class are expected to do. It's a nice summary of expected class behavior that enhances the documentation.

So all you need to do to get a nicely formatted Test List is to make sure that you have the test description as the first line of the unit test's docstring; epydoc will then do the grungy work for you.

If you click on the link with the function name on it, you will go to the Function Detail section and witness the power of doctest/epydoc. Since all the tests are copied and pasted from an interactive session and included in the docstring, epydoc will format the docstring very nicely and it will even color-code the blocks of code. Here is an example of the detail for test_delete_all_entries.

Generating Test Maps

Each docstring in the testlist_blogger module contains lines such as these:


Method(s) tested:
- L{blogger.Blogger.post_new_entry}
- L{blogger.Blogger.get_nth_entry_title}
- L{blogger.Blogger.get_nth_entry_content_strip_html}
- L{blogger.Blogger.get_num_entries}

(the L{...} notation is epydoc-specific and represents a link to another object in the epydoc-generated documentation)

The way I wrote the unit tests, each of them actually exercises several functions/methods from the blogger module. Some unit test purists might think these are not "real" unit tests, but in practice I found it is easier to work this way. For example, the get_blog function is called by each and every unit test in order to retrieve the same "blog" object. However, I am not specifically testing get_blog in every unit test, only calling it as a helper function. The way I see it, a method is tested when there is an assertion made about its behavior. All the other methods are merely called as helpers.

So whenever I write a unit test, I manually specify the list of methods/functions under test. This makes it easy to then parse the testlist file and build a mapping from each function/method under test to a list of unit tests that test it, i.e. what we called the Test Map.

For example, in the testlist_blogger module, the Blogger.delete_all_entries method is listed in the docstrings of 6 unit tests: test_delete_all_entries, test_delete_single_entry, test_post_single_entry, test_post_two_entries, test_delete_first_of_two_entries and test_delete_second_of_two_entries. These 6 unit tests represent the Test Map for Blogger.delete_all_entries. It's easy to build the Test Map programmatically by parsing the testlist_blogger.py file and creating a Python dictionary with the methods under test as keys and the corresponding lists of unit tests as values.
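A sketch of such a parser in plain Python (a simplified stand-in for a real build script, not the script used in this post): it walks the testlist source line by line, remembers which test function it is inside, and collects every L{...} name listed under it:

```python
import re
from collections import defaultdict

def build_test_map(testlist_source):
    """Map each L{...} method/function name mentioned inside a
    test function's docstring to the unit tests that list it."""
    test_map = defaultdict(list)
    current_test = None
    for line in testlist_source.splitlines():
        match = re.match(r"def (test_\w+)\(", line)
        if match:
            current_test = match.group(1)
            continue
        match = re.search(r"L\{([\w.]+)\}", line)
        if match and current_test:
            test_map[match.group(1)].append(current_test)
    return dict(test_map)

# A tiny made-up slice of a testlist file, in the same format.
sample = '''
def test_delete_all_entries():
    """
    Deleting all entries results in no entries in the blog.

    Method(s) tested:
    - L{blogger.Blogger.delete_all_entries}
    """

def test_post_single_entry():
    """
    Posting single entry results in single valid entry.

    Method(s) tested:
    - L{blogger.Blogger.post_new_entry}
    - L{blogger.Blogger.delete_all_entries}
    """
'''

for method, tests in sorted(build_test_map(sample).items()):
    print(method, "->", ", ".join(tests))
```

In this simplified format every L{...} link inside a testlist docstring names a method under test, so a flat scan is enough; a real parser would want to restrict itself to the "Method(s) tested:" section.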

An issue I had while putting this together was how to link a method in the Blogger class (for example Blogger.delete_all_entries) to its Test Map. One way would have been to programmatically insert the Test Map into the docstring for that method. But this would mean that every time a new unit test is added that tests that method, the Test Map changes, and thus the module containing the Blogger class gets changed. This is unfortunate, especially when the files are under source control. I think a better solution, and the one I ended up implementing, is to have a third module, called for example testmap_blogger, that is automatically generated from testlist_blogger. A method M in the Blogger class will then link to a single function in testmap_blogger. That function will contain in its docstring the Test Map for the Blogger method M.

Again, an example to make all this clearer. Here is the docstring of the Blogger.delete_all_entries method in the blogger module:

"""
Delete all entries in the blog

Test map (set of unit tests that exercise this method):
- L{testmap_blogger.testmap_Blogger_delete_all_entries}
"""


Here is the epydoc-generated documentation for the Blogger.delete_all_entries method (in the Method Details section):

delete_all_entries(self)

Delete all entries in the blog

Test map (set of unit tests that exercise this method):


I manually inserted in the docstring an epydoc link to a function called testmap_Blogger_delete_all_entries in a module called testmap_blogger. Assuming that the testmap_blogger module was already generated and epydoc-documented, clicking on the link will bring up the epydoc detail for that particular function, which contains the 6 unit tests for the delete_all_entries method:

testmap_Blogger_delete_all_entries()

Testmap for blogger.Blogger.delete_all_entries:

Here is the programmatically generated testmap_blogger.py file.

To make all of this machinery work, I use some naming conventions:

  • The module containing the Test Maps for module blogger is called testmap_blogger
  • In testmap_blogger, the function containing the Test Map for method Blogger.M from the blogger module is called testmap_Blogger_M
  • In testmap_blogger, the function containing the Test Map for function F from the blogger module is called testmap_F
  • In the docstring of the testmap function itself there is a link which points back to the method Blogger.M; the name of the link needs to be blogger.Blogger.M, otherwise epydoc will not find it

Here's an end-to-end procedure for using the doctest/epydoc combination to write Agile Documentation:

1. We'll unit test a Python module we'll call P which contains a class C.

2. We start by writing a unit test for the method C.M1 from the P module. We write the unit test by copying and pasting a Python shell session output in another Python module called testlist_P. We call the unit test function test_M1. It looks something like this:

def test_M1():
    """
    Short description of the behavior we're testing for M1.

    Method(s) tested:
    - L{P.C.M1}

    >>> from P import C
    >>> c = C()
    >>> rc = c.M1()
    >>> print rc
    True

    """

The testlist_P module has a "main" section of the form:

if __name__ == "__main__":
import doctest
doctest.testmod()

This is the typical doctest way of running unit tests. To actually execute the tests, we need to run "python testlist_P.py" at a command line (for more details on doctest, see a previous blog post).

3. At this point, we flesh out an initial implementation of method M1 in module P. In its docstring, we add a link to the test map:

def M1(self):
    """
    Short description of M1

    Test map (set of unit tests that exercise this method):
    - L{testmap_P.testmap_C_M1}

    """

Note that I followed the naming convention I described earlier.

4. We programmatically generate the Test Map for module P by running a script such as build_testmap.py. It will create a file called testmap_P.py with the following content:

def testmap_C_M1():
    """
    Testmap for L{P.C.M1}:

    - L{testlist_P.test_M1}
    """


5. We run epydoc:

epydoc -o P_docs P.py testlist_P.py testmap_P.py

A directory called P_docs will be generated; we can move this directory to a public area of our Web server and thus make the documentation available online. When we click on the testlist_P module link, we will see the Test List for module P. It will show something like:

Module P_docs.testlist_P

Doctest unit tests for module P


Function Summary


test_M1()
Short description of the behavior we're testing for M1.


When we click on the test map link inside the docstring of method C.M1, we see:

testmap_C_M1()

Testmap for P.C.M1:


Now repeat steps 2-5 for method M2:

6. Let's assume we now unit test method M2, but in the process we also test method M1. The function test_M2 will look something like this:

def test_M2():
    """
    Short description of the behavior we're testing for M2.

    Method(s) tested:
    - L{P.C.M1}
    - L{P.C.M2}

    >>> from P import C
    >>> c = C()
    >>> rc = c.M1()
    >>> print rc
    True
    >>> rc = c.M2()
    >>> print rc
    True

    """

We listed both methods in the "Method(s) tested" section.

7. We add a link to the testmap in method M2's docstring (in module P):

def M2(self):
    """
    Short description of M2

    Test map (set of unit tests that exercise this method):
    - L{testmap_P.testmap_C_M2}
    """

8. We regenerate the testmap_P file by running build_testmap.py. The testmap for M1 will now list two unit tests, test_M1 and test_M2, while the testmap for M2 will list test_M2:

def testmap_C_M1():
    """
    Testmap for L{P.C.M1}:

    - L{testlist_P.test_M1}
    - L{testlist_P.test_M2}
    """

def testmap_C_M2():
    """
    Testmap for L{P.C.M2}:

    - L{testlist_P.test_M2}
    """


9. We run epydoc again:

epydoc -o P_docs P.py testlist_P.py testmap_P.py

Now clicking on testlist_P will show:

Module P_docs.testlist_P

Doctest unit tests for module P


Function Summary


test_M1()
Short description of the behavior we're testing for M1.


test_M2()
Short description of the behavior we're testing for M2.


Clicking on the test map link inside the docstring of method C.M1 shows:

testmap_C_M1()

Testmap for P.C.M1:

10. Repeat steps 2-5 for each unit test that you add to the testlist_P module.

Conclusion

I find the doctest/epydoc combination very powerful and easy to use for generating Agile Documentation, or "literate testing", or "executable documentation", or whatever you want to call it. The name is not important, but what you can achieve with it is: a way of documenting your APIs by means of unit tests that live in your code as docstrings. It doesn't get much more "agile" than this. Kudos to the doctest developers and to Edward Loper, the author of epydoc. Also, kudos to Brian Button for his insightful article, which inspired my post. Brian's examples used .NET, but hopefully he'll switch to Python soon :-)

If you want to see the full documentation I generated for my blogmgmt package, you can find it here.

Saturday, February 12, 2005

L.A. Piggies? Trying to organize Python Interest Group in L.A./O.C.

If anybody is interested in putting together a Python Interest Group in the Los Angeles/Orange County area, please let me know. Add a comment to this post or send me email at grig at gheorghiu.net.

Thursday, February 10, 2005

Python as an agile language

Here are some ideas on why I think Python is an agile language. I use the term agile as in "agile software development practices", best exemplified by Extreme Programming. I find this definition by Ron Jeffries, from his article "What is Extreme Programming", particularly illuminating:

Extreme Programming is a discipline of software development based on values of simplicity, communication, feedback, and courage. It works by bringing the whole team together in the presence of simple practices, with enough feedback to enable the team to see where they are and to tune the practices to their unique situation.

Let's see how Python fares in light of the 4 core XP values: simplicity, communication, feedback and courage.

1. Python fosters simplicity
  • Clean and simple syntax reads like pseudo-code
  • Built-in high level data types (strings, lists, tuples, dictionaries) make it possible to pack a lot of functionality in very few lines of code, without sacrificing readability
  • As an exercise, try to port Java code to Jython: you will see a significant reduction in line count (as much as 40% in my experience)
2. Python fosters communication
  • Powerful yet simple idioms enable developers to clearly communicate their intentions through code
  • Python just lets you code and doesn't get in your way -- see ESR's classic "Why Python?" article
  • Standard coding style enforced by significant whitespace enables people to better read code, maintain it, and communicate about it
  • Modules such as doctest provide "executable documentation"
3. Python fosters feedback
  • Dynamic, interpreted nature of the language shortens development cycle and closes feedback loop more quickly
  • Interactive shell session provides instantaneous feedback
  • Various unit test frameworks (unittest, doctest, py.test) are available for feedback via frequent unit testing
4. Python fosters courage
  • This really stems from the other 3 values: if you can write code that is simple, provides quick feedback and can be easily understood by your peers, then you have the courage to "go confidently in the direction of your dreams", to quote Thoreau
  • Courage in the XP sense also means having the guts to throw away code that doesn't work and start afresh; since the simple act of coding in Python produces pure pleasure, it follows that throwing code away and starting to code anew will be felt not as a chore, but as a chance to improve and, why not, attain enlightenment
These are just a few ideas and I'm sure people can come up with many more. How did Python improve your life as an agile software developer? Jump in with a comment or send me email at grig at gheorghiu dot net.

Wednesday, February 09, 2005

New Google group: extreme-python

Troy Frever from Aviarc Corporation posted a message to the fitnesse mailing list announcing that he created a Google group for topics related to both Python and Agile methodologies (Extreme Programming and others).

Appropriately enough, the name of the group is extreme-python. Visit http://groups-beta.google.com/group/extreme-python/ , take a look, post a message.

Wednesday, February 02, 2005

Web app testing with Jython and HttpUnit

There's been a lot of talk recently about "dynamic Java", by which people generally mean driving the JVM by means of a scripting language (see Tim Bray's post and Sean McGrath's post on this topic). One of the languages leading the pack in this area is Jython (the other one is Groovy). In fact, a Java Republic poll asking "What is your scripting language for Java for 2004?" has Jython as the winner with 59% of the votes.

Update: As a coincidence, while writing this post, I came across this blog entry: Gosling on JVM scripting

Jython is also steadily making inroads into the world of test frameworks. It is perhaps no coincidence that in a talk given at Stanford, Guido van Rossum lists "Testing (popular area for Jython)" on the slide that talks about Python Sample Use Areas. Because Jython combines the agility of Python with easy access to the Java libraries, it is the scripting language of choice for test tools such as The Grinder v3, TestMaker, Marathon and STAF/STAX.

I want to show here how to use Jython for interactively driving a Java test tool (HttpUnit) in order to verify the functionality of a Web application.

HttpUnit is a browser simulator written in Java by Russell Gold. It is used in the Java world for functional, black-box testing of Web applications. Although its name contains "Unit", it is not a unit test tool, but it is often used in conjunction with the JUnit framework. The canonical way of using HttpUnit is to write JUnit tests that call various HttpUnit components in order to mimic the actions of a browser. These individual tests can then be aggregated into test suites that will be run by the JUnit framework. Building all this scaffolding takes some time, and compiling the Java code after each change adds further delays.

In what follows, I want to contrast the Java-specific HttpUnit usage with the instantaneous feedback provided by working in the Jython shell and with the near-zero overhead that comes with writing Python doctest tests. The functionality I will test is a search for Python books on amazon.com.

Step 1: Install Jython

- The machine I ran my tests on is a Linux server running Red Hat 9 which already had the Java 1.4.2_04 SDK installed in /usr/java/j2sdk1.4.2_04
- I downloaded Jython 2.1 from its download site and I put the file jython_21.class in /usr/local
- I cd-ed into /usr/local and ran the command-line installer, specifying Jython-2.1 as the target directory:
[root@concord root]# cd /usr/local

[root@concord local]# which java
/usr/java/j2sdk1.4.2_04/bin/java
[root@concord local]# java jython_21 -o Jython-2.1 demo lib source
try path /usr/local/
Done
[root@concord local]# ls Jython-2.1/
ACKNOWLEDGMENTS Doc jython.jar org Uninstall.class
cachedir installer Lib README.txt
com jython LICENSE.txt registry
Demo jythonc NEWS Tools
- I added /usr/local/Jython-2.1 to the PATH environment variable in .bash_profile and I sourced that file:
[root@concord root]# . ~/.bash_profile

[root@concord root]# which jython
/usr/local/Jython-2.1/jython
- I verified that I can run the interactive Jython shell (the first time you run it, it will process its own jython.jar file, plus all jar files that it finds in $JAVA_HOME/jre/lib):
[root@concord local]# jython

*sys-package-mgr*: processing new jar, '/usr/local/Jython-2.1/jython.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/rt.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/sunrsasign.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/jsse.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/jce.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/charsets.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/dnsns.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/ldapsec.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/localedata.jar'
*sys-package-mgr*: processing new jar, '/usr/java/j2sdk1.4.2_04/jre/lib/ext/sunjce_provider.jar'
Jython 2.1 on java1.4.2_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>>
Step 2: Install HttpUnit

- I downloaded HttpUnit 1.6 from its download site and I unzipped the file httpunit-1.6.zip under /root
- The main HttpUnit functionality is contained in the httpunit.jar file in /root/httpunit-1.6/lib and other optional jar files are in /root/httpunit-1.6/jars, so I added all the jar files in these two directories to the CLASSPATH environment variable in .bash_profile. Here is the relevant portion from .bash_profile:
# Set up CLASSPATH for HttpUnit

CLASSPATH=$CLASSPATH:/root/httpunit-1.6/lib/httpunit.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/js.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/servlet.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/Tidy.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/xercesImpl.jar
CLASSPATH=$CLASSPATH:/root/httpunit-1.6/jars/xmlParserAPIs.jar
export CLASSPATH
- I sourced .bash_profile, then I went to the jython shell and verified that the new jar files are seen by Jython:
[root@concord root]# . ~/.bash_profile

[root@concord root]# jython
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/lib/httpunit.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/js.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/servlet.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/Tidy.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/xercesImpl.jar'
*sys-package-mgr*: processing new jar, '/root/httpunit-1.6/jars/xmlParserAPIs.jar'
Jython 2.1 on java1.4.2_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>>
- I verified that I can import the httpunit Java package from within a Jython shell session:
>>> from com.meterware.httpunit import *

>>>
- Nothing was printed to the console, which means that the import succeeded. If CLASSPATH had not been set right and Jython had not been able to process the httpunit.jar file, I would have seen an error similar to this:
Traceback (innermost last):
  File "<console>", line 1, in ?
ImportError: No module named meterware
Step 3: Use HttpUnit inside a Jython shell session to test a Web application

This is not a full-fledged HttpUnit tutorial. For people who want to learn more about HttpUnit, I recommend the HttpUnit cookbook and this article by Giora Katz-Lichtenstein on O'Reilly's ONJava.com site.

I will however show you some basic HttpUnit usage patterns. The first thing you do in HttpUnit is open a WebConversation, then send an HTTP request to your Web application and get back the response. Let's do this for www.amazon.com inside a Jython shell:
[root@concord root]# jython

Jython 2.1 on java1.4.2_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from com.meterware.httpunit import *
>>> web_conversation = WebConversation()
>>> request = GetMethodWebRequest("http://www.amazon.com")
>>> response = web_conversation.getResponse(request)
>>> response != None
1
We're already seeing some advantages of using Jython over writing Java code: no type declarations necessary! We're also testing that we get a valid response back by expecting to see 1 when we type response != None.

If we were to print the response variable, we would see the HTTP headers:
>>> print response

HttpWebResponse [url=http://www.amazon.com/exec/obidos/subst/home/home.html/002-1556899-2409632; headers=
CONTENT-TYPE: text/html
CNEONCTION: close
TRANSFER-ENCODING: chunked
SERVER: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
DATE: Thu, 03 Feb 2005 21:34:29 GMT
SET-COOKIE: obidos_path_continue-shopping=continue-shopping-url=/subst/home/home.html/002-1556899-2409632&continue-shopping-post-data=&continue-shopping-description=generic.gateway.default; path=/; domain=.amazon.com
SET-COOKIE: ubid-main=077-3170816-5986942; path=/; domain=.amazon.com; expires=Tuesday, 01-Jan-2036 08:00:01 GMT ]
We could also look at the raw HTML output via response.getText() (I will omit the output, since it takes a lot of space).

At this point, I want to say that testing a Web application via its GUI is a very error-prone endeavor. Any time the name or the position of an HTML element under test changes, the test will break. Generally speaking, testing at the GUI level is notoriously brittle and should only be done when there is a strong chance that the GUI layout and element names will not change. It's almost always better to test the business logic underneath the GUI (assuming the application was designed to clearly separate the GUI logic from the business logic) via a tool such as FitNesse, which can simulate GUI actions without actually going through the GUI.

However, there certainly are cases when one simply cannot skip testing the GUI, and HttpUnit is a decent tool for achieving this goal in the case of a Web application. Let's continue our example and test the search functionality of the main amazon.com Web page. If we were part of a QA team at amazon.com, we would probably expect the HTML design team to hand us a document detailing the layout of the main HTML pages comprising the site and the names of their main elements (forms, frames, etc.) As it is, we need to hunt for this information ourselves by playing with the live site itself and carefully poring through the HTML source of the pages we want to test.

I said before that in HttpUnit we can also get the raw HTML output via response.getText(). The response variable is an instance of the HttpUnit WebResponse class, which offers many useful methods for dealing with HTML elements. We can obtain collections of forms, tables, links, images and other HTML elements, then iterate over them until we find the element we need to test. We can alternatively get a specific element directly from the response by calling methods such as getLinkWithID() or getTableWithID().

If we search for the word "form" inside the HTML page source on the main amazon.com Web page, we see that the search form is called "searchform". We can retrieve this form from the response variable via the getFormWithName() method:
>>> search_form = response.getFormWithName("searchform")

>>> search_form != None
1
We can also see from the HTML page source that the form has two input fields: a drop-down list of values called "url" and an entry field called "field-keywords". We will use the form's setParameter() method to fill both fields with our information: "Books" (which actually corresponds to the value "index=stripbooks:relevance-above") for the drop-down list and "Python" for the entry field:
>>> search_form.setParameter("url", "index=stripbooks:relevance-above")

>>> search_form.setParameter("field-keywords", "Python")
Now we can simulate submitting our information via the form's submit() method:
>>> search_response = search_form.submit()

>>> search_response != None
1
At this point, search_response represents the HTML page containing the 3 most popular search results for "Python", followed by the first 10 of the total number of relevant results (370 results when I tried it).

The HTML source for this page looks confusing to say the least. It's composed of a myriad of tables, which can be eyeballed by this code:
>>> tables = search_response.getTables()

>>> print tables
Let's pretend we're only interested in the 3 most popular search results. If we look carefully through the output returned by print tables, we see that the first cell in the table containing the 3 most popular results is "1.". We can use this piece of information in retrieving the whole table via the getTableStartingWith() method:
>>> most_popular_table = search_response.getTableStartingWith("1.")

>>> most_popular_table != None
1
We can quickly inspect the contents of the table by simply printing it:
>>> print most_popular_table

WebTable:
[0]: [0]=1. [1]=Learning Python, Second Edition -- by Mark Lutz, David Ascher; Paperback
Buy new: $23.07 -- Used & new from: $15.42
[1]: [0]=2. [1]=Python Cookbook -- by Alex Martelli, David Ascher; Paperback
Buy new: $26.37 -- Used & new from: $22.30
[2]: [0]=3. [1]=Python Programming for the Absolute Beginner (Absolute Beginner) -- by Michael Dawson; Paperback
Buy new: $19.79 -- Used & new from: $19.78
We see that the table has 3 rows and 2 columns. We can make this into a test by using the getRowCount() and getColumnCount() methods of the table object:
>>> rows = most_popular_table.getRowCount()

>>> rows == 3
1
>>> columns = most_popular_table.getColumnCount()
>>> columns == 2
1
From the output of print most_popular_table we also see that the second column in each row contains information about the book: title, authors, new price and used price. If we look at the live page on amazon.com, we notice that each title is actually a link. Let's say we want to test the link for each of the 3 top titles. We expect that by clicking on the link we will get back a page with details corresponding to the selected title.

For starters, let's test the first title, the one at row 0. We can retrieve its link by calling the getLinkWith() method of the search_response object, and passing to it the title of the book (which we need to retrieve from the contents of the cell in column 2 via a regular expression):
>>> book_info = most_popular_table.getCellAsText(0, 1)

>>> import re
>>> title = ""
>>> s = re.search("(.*) --", book_info)
>>> if s:
...     title = s.group(1)
...
>>> title.find("Python") > -1
1
>>> link = search_response.getLinkWith(title)
>>> link != None
1
Note that we also tested that the title contains "Python". Although this test may fail, it's nevertheless a pretty sure bet that each of the 3 top selling books on Python will have the word "Python" somewhere in their title.
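The title-extraction regex carries a lot of weight here, so it is worth exercising on its own in plain Python (the sample cell text below is made up to mimic the table output shown earlier):

```python
import re

def extract_title(cell_text):
    """Pull the book title out of a 'Title -- by Author; Binding' cell;
    return an empty string if the cell does not match the pattern."""
    match = re.search(r"(.*) --", cell_text)
    return match.group(1) if match else ""

cell = "Python Cookbook -- by Alex Martelli, David Ascher; Paperback"
title = extract_title(cell)
print(title)              # Python Cookbook
print("Python" in title)  # True
```

Note that `(.*)` is greedy, so if a title itself ever contained " --" the match would extend to the last occurrence; for this page layout that is acceptable, but it is the kind of brittleness GUI-level tests inherit from the markup.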

We can now simulate clicking on the link via the link object's click() method. We verify that we get back a non-empty page and also that the HTML title of the book detail page contains the title of the book:
>>> book_details = link.click()

>>> book_details != None
1
>>> page_title = book_details.getTitle()
>>> page_title.find(title) > -1
1
We can test the links for all of the top 3 titles by looping through the rows of most_popular_table:
>>> import re

>>> for i in range(rows):
...     book_info = most_popular_table.getCellAsText(i, 1)
...     title = ""
...     s = re.search("(.*) --", book_info)
...     if s:
...         title = s.group(1)
...     title.find("Python") > -1
...     link = search_response.getLinkWith(title)
...     link != None
...     book_details = link.click()
...     book_details != None
...     page_title = book_details.getTitle()
...     page_title.find(title) > -1
...
1
1
1
1
1
1
1
1
1
1
1
1
>>>

We have 4 test statements which expect 1 as a result in the body of the loop. Since there are 3 rows to inspect, we should expect 12 1's to be printed.

I'll stop here with my example. In a real-life situation, you would want to test much more functionality, but this example should be sufficient to get you going with both HttpUnit and Jython.

Step 4: Use the doctest module to write functional tests

Using the Python doctest module, we can save the Jython interactive session conducted so far into a docstring inside a function that we can call for example test_amazon_search. We can put this function (with an empty body) inside a module called test_amazon.py:
def test_amazon_search():
    """
    >>> from com.meterware.httpunit import *
    >>> web_conversation = WebConversation()
    >>> request = GetMethodWebRequest("http://www.amazon.com")
    >>> response = web_conversation.getResponse(request)
    >>> response != None
    1
    >>> search_form = response.getFormWithName("searchform")
    >>> search_form != None
    1
    >>> search_form.setParameter("url", "index=stripbooks:relevance-above")
    >>> search_form.setParameter("field-keywords", "Python")
    >>> search_response = search_form.submit()
    >>> search_response != None
    1
    >>> tables = search_response.getTables()
    >>> tables != None
    1
    >>> most_popular_table = search_response.getTableStartingWith("1.")
    >>> most_popular_table != None
    1
    >>> rows = most_popular_table.getRowCount()
    >>> rows == 3
    1
    >>> columns = most_popular_table.getColumnCount()
    >>> columns == 2
    1
    >>> for i in range(rows):
    ...     book_info = most_popular_table.getCellAsText(i, 1)
    ...     import re
    ...     title = ""
    ...     s = re.search("(.*) --", book_info)
    ...     if s:
    ...         title = s.group(1)
    ...     title.find("Python") > -1
    ...     link = search_response.getLinkWith(title)
    ...     link != None
    ...     book_details = link.click()
    ...     book_details != None
    ...     page_title = book_details.getTitle()
    ...     page_title.find(title) > -1
    ...
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    """

if __name__ == "__main__":
    import doctest, test_amazon
    doctest.testmod(test_amazon)
Note that the docstring should keep only those portions of the Jython interactive session that do not change from one test run to another. We can't include things such as print statements that reveal book or title specifics, since those specifics are almost guaranteed to change in the future. We want our test to serve as a functional regression test for the bare-bones search functionality of amazon.com.

An interesting note is that the doctest module is used here to conduct a black-box type of test, whereas traditionally it is used for unit testing.

To fully take advantage of the interactive Jython session in order to later include it in a doctest string, I used the "script" trick. On a Unix system, if you type script at a shell prompt, a file called typescript is generated which will contain everything you type afterwards. When you are done with your "script" session, type exit to go back to the normal shell operation. You can then copy and paste the lines saved in the file typescript. This is especially useful for large outputs which can sometimes make other lines scroll past the current window of the shell.

Running the test_amazon module through Jython with the -v (verbose) flag produces this output:
[root@concord jython]# jython test_amazon.py -v

Running test_amazon.__doc__
0 of 0 examples failed in test_amazon.__doc__
Running test_amazon.test_amazon_search.__doc__
Trying: from com.meterware.httpunit import *
Expecting: nothing
ok
Trying: web_conversation = WebConversation()
Expecting: nothing
ok
Trying: request = GetMethodWebRequest("http://www.amazon.com")
Expecting: nothing
ok
Trying: response = web_conversation.getResponse(request)
Expecting: nothing
ok
Trying: response != None
Expecting: 1
ok
Trying: search_form = response.getFormWithName("searchform")
Expecting: nothing
ok
Trying: search_form != None
Expecting: 1
ok
Trying: search_form.setParameter("url", "index=stripbooks:relevance-above")
Expecting: nothing
ok
Trying: search_form.setParameter("field-keywords", "Python")
Expecting: nothing
ok
Trying: search_response = search_form.submit()
Expecting: nothing
ok
Trying: search_response != None
Expecting: 1
ok
Trying: tables = search_response.getTables()
Expecting: nothing
ok
Trying: tables != None
Expecting: 1
ok
Trying: most_popular_table = search_response.getTableStartingWith("1.")
Expecting: nothing
ok
Trying: most_popular_table != None
Expecting: 1
ok
Trying: rows = most_popular_table.getRowCount()
Expecting: nothing
ok
Trying: rows == 3
Expecting: 1
ok
Trying: columns = most_popular_table.getColumnCount()
Expecting: nothing
ok
Trying: columns == 2
Expecting: 1
ok
Trying:
for i in range(rows):
    book_info = most_popular_table.getCellAsText(i, 1)
    import re
    title = ""
    s = re.search("(.*) --", book_info)
    if s:
        title = s.group(1)
    title.find("Python") > -1
    link = search_response.getLinkWith(title)
    link != None
    book_details = link.click()
    book_details != None
    page_title = book_details.getTitle()
    page_title.find(title) > -1
Expecting:
1
1
1
1
1
1
1
1
1
1
1
1
ok
0 of 20 examples failed in test_amazon.test_amazon_search.__doc__
1 items had no tests:
test_amazon
1 items passed all tests:
20 tests in test_amazon.test_amazon_search
20 tests in 2 items.
20 passed and 0 failed.
Test passed.
Some parting thoughts:

1. Porting Java code to Jython is a remarkably smooth and painless process. I ported the OnJava.com example to Jython and in the process got a 40% reduction in line count (you can find the original Java code here and the Jython code here). While doing this, I gleefully got rid of ugly Java idioms such as:
for (int i = 0; i < resultLinks.length; i++) {
    String url = resultLinks[i].getURLString();
}
and replaced them with the more elegant:
for link in result_links:
    url = link.getURLString()
2. My one-to-one porting from Java to Jython used unittest, which naturally corresponds to the original jUnit code. However, when I started using Jython interactively in a shell session, I realized that doctest is the proper test framework to use in this case.

3. I wish Jython could keep up with CPython. For example, the doctest version shipped with Jython 2.1 does not have the testfile functionality, which allows you to save the docstrings in separate text files interspersed with free-flowing text.
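For reference, here is what the testfile API looks like in CPython (2.4 and later). This is a sketch of the feature Jython 2.1 lacks, not something it can run; also note that newer Pythons print boolean results as True rather than 1:

```python
import doctest

# A doctest text file can mix free-flowing narrative with examples
content = """\
Smoke test for the title-extraction logic (narrative text is allowed here).

    >>> title = "Learning Python -- by Mark Lutz"
    >>> title.find("Python") > -1
    True
"""

with open("test_search.txt", "w") as f:
    f.write(content)

# testfile() parses the text file and runs every interactive example
# in it; module_relative=False means the path is relative to the cwd
results = doctest.testfile("test_search.txt", module_relative=False)
print(results.failed)   # 0
```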

4. HttpUnit offers limited Javascript support. This can be a problem in practice, since a large number of sites are heavy on Javascript. While trying to find a good example for this post, I tried a number of sites and had HttpUnit bomb when trying to either retrieve the main page or post via a search form (such sites include monster.com, hotjobs.com, freshmeat.net, sourceforge.net).

In conclusion, I think there is a real advantage in using Jython over Java in order to quickly prototype tests that use third-party Java libraries. The combination of Jython and doctest proves to be extremely "agile", since it simplifies the test code, it enhances its clarity, and it provides instantaneous feedback -- all eminently agile qualities.