Friday, April 29, 2005

New version of sparkplot on sparkplot.org

Due to positive feedback to my sparkplot post, I registered sparkplot.org and made it into a home for the sparkplot module.

I also added another option to sparkplot: -t or --transparency. Setting this option produces a transparent background for the generated PNG image (thanks to usagi for the suggestion).

I was glad to see somebody actually using the sparkplot module at the T&J blog. It looks like visualizing trading data is a natural application of sparklines. I also liked the term used by T to describe the act of embedding sparklines into blog entries: datablogging.

Saturday, April 23, 2005

sparkplot: creating sparklines with matplotlib

Edward Tufte introduced sparklines in a sample chapter of his upcoming book "Beautiful Evidence". In his words, sparklines are "small, high-resolution graphics embedded in a context of words, numbers, images. Sparklines are data-intense, design-simple, word-sized graphics." Various implementations of sparklines that use different graphics packages have already been published, and probably the best known one is a PHP implementation that uses the GD backend. In what follows, I'll show some examples of sparkline graphics created via matplotlib by the sparkplot Python module I wrote.

Example 1

Since a picture is worth a thousand words, here is the Los Angeles Lakers' road to their NBA title in 2002. Wins are pictured with blue bars and losses with red bars. Note how easy it is to see the streaks for wins and losses.

The Lakers' 2004 season was their last with Shaq, when they reached the NBA finals and lost to Detroit (note the last 3 losses which sealed their fate in the finals).

Compare those days of glory with their abysmal 2005 performance, with only 2 wins in the last 21 games. Also note how the width of the last graphic is less than the previous 2, a consequence of the Lakers not making the playoffs this year.

Example 2

The southern oscillation is defined as the barometric pressure difference between Tahiti and the Darwin Islands at sea level. The southern oscillation is a predictor of El Nino, which in turn is thought to be a driver of world-wide weather. Specifically, repeated southern oscillation values less than -1 typically define an El Nino.

Here is a sparkline for the southern oscillation from 1955 to 1992 (456 sample data points obtained from NIST). The sparkline is plotted with a horizontal span drawn along the x axis covering data values between -1 and 0, so that values less than -1 can be more clearly seen.
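To make the "repeated values less than -1" rule concrete, here is a minimal sketch that finds runs of consecutive values below a threshold. The function name below_threshold_runs, the minimum run length of 2 and the sample values are assumptions made for illustration; none of them come from sparkplot or from the NIST data set.

```python
def below_threshold_runs(values, threshold=-1.0, min_length=2):
    """Return (start, end) index pairs for runs of consecutive values below threshold.

    The end index is exclusive. Runs shorter than min_length are ignored.
    """
    runs = []
    start = None
    for i, value in enumerate(values):
        if value < threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values)))
    return runs


# Made-up monthly values, not real southern oscillation data.
sample = [0.3, -1.2, -1.5, -0.4, -1.1, 0.2, -1.3, -1.8, -2.0]
print(below_threshold_runs(sample))
```

The isolated value at index 4 is skipped because a single month below -1 is not a "repeated" occurrence under the assumed minimum run length.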

Example 3
Here is the per capita income in California from 1959 to 2003.
And here is the "real" per capita income (adjusted for inflation) in California, from 1959 to 2003.

Example 4

Here is the monthly distribution of messages sent to comp.lang.py from 1994 to 2004, plotted per year. Minimum and maximum values are shown with blue dots and labeled in the graphics.

Year   Sparkline    Total
1994   clpy 1994    3,018
1995   clpy 1995    4,026
1996   clpy 1996    8,378
1997   clpy 1997   12,910
1998   clpy 1998   19,533
1999   clpy 1999   24,725
2000   clpy 2000   42,961
2001   clpy 2001   55,271
2002   clpy 2002   56,750
2003   clpy 2003   64,548
2004   clpy 2004   56,184


The number of messages per year increased almost constantly from 1994 to 2004. The only exception was 2004, when there were fewer messages than in 2002 and 2003.

Details on using sparkplot

1) Install the Numeric Python module (required by matplotlib)
2) Install matplotlib
3) Prepare data files: sparkplot simplistically assumes that its input data file contains just 1 column of numbers
4) Run sparkplot.py. Here are some command-line examples to get you going:

- given only the input file and no other option, sparkplot.py will generate a gray sparkline with the first and last data points plotted in red:

sparkplot.py -i CA_real_percapita_income.txt

produces:

The name of the output file is by default .png. It can be changed with the -o option.

The plotting of the first and last data points can be disabled with the --noplot_first and --noplot_last options.

- given the input file and the label_first, label_last, format=currency options, sparkplot.py will generate a gray sparkline with the first and last data points plotted in red and with the first and last data values displayed in a currency format:

sparkplot.py -i CA_real_percapita_income.txt --label_first --label_last --format=currency

produces:

The currency symbol is $ by default, but it can be changed with the --currency option.
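As a rough illustration of what a currency label or a 'comma' label (e.g. 23,456,789) might look like, here is a small sketch. The function format_label and its rounding to whole units are assumptions of mine for this example; they are not taken from the sparkplot source, which may format values differently.

```python
def format_label(value, fmt="plain", currency="$"):
    """Format a data value for display next to a sparkline (illustrative only)."""
    if fmt == "currency":
        # Thousands separators, no decimals, prefixed by the currency symbol.
        return f"{currency}{value:,.0f}"
    if fmt == "comma":
        # Thousands separators only.
        return f"{value:,.0f}"
    return str(value)


print(format_label(23456789, "comma"))
print(format_label(35219, "currency"))
print(format_label(35219, "currency", currency="EUR "))
```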

- given the input file and the plot_min, plot_max, label_min, label_max, format=comma options, sparkplot.py will generate a gray sparkline with the first and last data points plotted in red, with the min. and max. data points plotted in blue, and with the min. and max. data values displayed in a 'comma' format (e.g. 23,456,789):

sparkplot.py -i clpy_1997.txt --plot_min --plot_max --label_min --label_max --format=comma

produces:

- given the input file and the type=bars option, sparkplot.py will draw blue bars for the positive data values and red bars for the negative data values:

sparkplot.py -i lakers2005.txt --type=bars

produces:

As a side note, I think bar plots look better when the data file contains a relatively large number of data points, and the variation of the data is relatively small. This type of plot works especially well for sports-related graphics, where wins are represented as +1 and losses as -1.
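Here is a hedged sketch of how a win/loss record could be turned into the +1/-1 single-column data file described above, and how the longest streaks could be computed from it. The helper names and the sample record are hypothetical; sparkplot itself only needs the resulting one-column file.

```python
from itertools import groupby


def encode_results(results):
    """Map a string such as 'WWLW' to [1, 1, -1, 1]."""
    mapping = {"W": 1, "L": -1}
    return [mapping[ch] for ch in results.upper()]


def longest_streaks(values):
    """Return (longest winning streak, longest losing streak)."""
    best = {1: 0, -1: 0}
    for sign, group in groupby(values):
        best[sign] = max(best[sign], len(list(group)))
    return best[1], best[-1]


def write_data_file(values, path):
    """Write one value per line, i.e. a file with a single column of numbers."""
    with open(path, "w") as handle:
        for value in values:
            handle.write(f"{value}\n")


games = encode_results("WWLWWWLLLW")  # made-up record, not real Lakers data
print(longest_streaks(games))
# write_data_file(games, "myteam.txt") would produce a file usable with --type=bars
```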

- for other options, run sparkplot.py -h

I hope the sparkplot module will prove to be useful when you need to include sparkline graphics in your Web pages. All the caveats associated with alpha-level software apply :-) Let me know if you find it useful. I'm very much a beginner at using matplotlib, and as I become more acquainted with it I'll add more functionality to sparkplot.

Finally, kudos to John Hunter, the creator of matplotlib. I found this module extremely powerful and versatile. For a nice introduction to matplotlib, see also John's talk at PyCon05.

Note: the Blogger template system might have something to do with the fact that the graphics are shown with a border; when included in a "normal", white-background HTML page, there is no border and they integrate more seamlessly into the text.

Update 5/2/05: Thanks to Kragen Sitaker for pointing out a really simple solution to the "borders around images" problem -- just comment out the CSS definition for .post img in the Blogger template.

Thursday, April 14, 2005

More on performance vs. load testing

I recently got some comments/questions related to my previous blog entry on performance vs. load vs. stress testing. Many people are still confused as to exactly what the difference is between performance and load testing. I've been thinking more about it and I'd like to propose the following question as a litmus test to distinguish between these two types of testing: are you actively profiling your application code and/or monitoring the server(s) running your application? If the answer is yes, then you're engaged in performance testing. If the answer is no, then what you're doing is load testing.

Another way to look at it is to see whether you're doing more of a white-box type testing as opposed to black-box testing. In the white-box approach, testers, developers, system administrators and DBAs work together in order to instrument the application code and the database queries (via specialized profilers for example), and the hardware/operating system of the server(s) running the application and the database (via monitoring tools such as vmstat, iostat, top or Windows PerfMon). All these activities belong to performance testing.

The black box approach is to run client load tools against the application in order to measure its responsiveness. Such tools range from lightweight, command-line driven tools such as httperf, openload, siege, Apache Flood, to more heavy duty tools such as OpenSTA, The Grinder, JMeter. This type of testing doesn't look at the internal behavior of the application, nor does it monitor the hardware/OS resources on the server(s) hosting the application. If this sounds like the type of testing you're doing, then I call it load testing.

In practice though the 2 terms are often used interchangeably, and I am as guilty as anyone else of doing this, since I called one of my recent blog entries "HTTP performance testing with httperf, autobench and openload" instead of calling it more precisely "HTTP load testing". I didn't have access to the application code or the servers hosting the applications I tested, so I wasn't really doing performance testing, only load testing.

I think part of the confusion is that no matter how you look at these two types of testing, they have one common element: the load testing part. Even when you're profiling the application and monitoring the servers (hence doing performance testing), you still need to run load tools against the application, so from that perspective you're doing load testing.

As far as I'm concerned, these definitions don't have much value in and of themselves. What matters most is to have a well-established procedure for tuning the application and the servers so that you can meet your users' or your business customers' requirements. This procedure will use elements of all the types of testing mentioned here and in my previous entry: load, performance and stress testing.

Here's one example of such a procedure. Let's say you're developing a Web application with a database back-end that needs to support 100 concurrent users, with a response time of less than 3 seconds. How would you go about testing your application in order to make sure these requirements are met?

1. Start with 1 Web/Application server connected to 1 Database server. If you can, put both servers behind a firewall, and if you're thinking about doing load balancing down the road, put the Web server behind the load balancer. This way you'll have one each of different devices that you'll use in a real production environment.

2. Run a load test against the Web server, starting with 10 concurrent users, each user sending a total of 1000 requests to the server. Step up the number of users in increments of 10, until you reach 100 users.
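The stepping scheme in this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: run_step, run_load_test and fake_request are names I made up, threads stand in for concurrent users, and fake_request is a placeholder that you would replace with a real HTTP call. For serious work you would use one of the dedicated load tools instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(send_request, users, requests_per_user):
    """Run one load step and return all response times in seconds."""
    def one_user(user_id):
        timings = []
        for _ in range(requests_per_user):
            started = time.perf_counter()
            send_request()
            timings.append(time.perf_counter() - started)
        return timings

    # One thread per simulated user.
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [t for timings in per_user for t in timings]


def run_load_test(send_request, start=10, stop=100, step=10, requests_per_user=1000):
    """Step the user count from start to stop; map each count to its slowest response."""
    report = {}
    for users in range(start, stop + 1, step):
        report[users] = max(run_step(send_request, users, requests_per_user))
    return report


def fake_request():
    """Placeholder for a real HTTP request (an assumption for this sketch)."""
    time.sleep(0.001)


if __name__ == "__main__":
    # Small numbers so the sketch finishes quickly.
    for users, slowest in run_load_test(fake_request, 2, 6, 2, requests_per_user=5).items():
        print(f"{users} users: slowest response {slowest:.3f} s, under 3 s: {slowest < 3.0}")
```

The 3-second check mirrors the response time requirement from the example scenario; the default arguments mirror the 10-to-100 users and 1000 requests per user mentioned above.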

3. While you're blasting the Web server, profile your application and your database to see if there are any hot spots in your code/SQL queries/stored procedures that you need to optimize. I realize I'm glossing over important details here, but this step is obviously highly dependent on your particular application.

Also monitor both servers (Web/App and Database) via the command-line utilities mentioned before (top, vmstat, iostat, netstat, Windows PerfMon). These utilities will let you know what's going on with the servers in terms of hardware resources. Also monitor the firewall and the load balancer (many times you can do this via SNMP) -- but these devices are not likely to be a bottleneck at this level, since they usually can deal with thousands of connections before they hit a limit, assuming they're hardware-based and not software-based.

This is one of the most important steps in the whole procedure. It's not easy to make sense of the output of these monitoring tools; you need somebody who has a lot of experience in system/network architecture and administration. On Sun/Solaris platforms, there is a tool called the SE Performance Toolkit that tries to alleviate this task via built-in heuristics that kick in when certain thresholds are reached and tell you exactly what resource is being taxed.

4. Let's say your Web server's reply rate starts to level off around 50 users. Now you have a repeatable condition that you know causes problems. All the profiling and monitoring you've done in step 3 should have already given you a good idea about hot spots in your application, about SQL queries that are not optimized properly, and about resource status at the hardware/OS level.

At this point, the developers need to take back the profiling measurements and tune the code and the database queries. The system administrators can also increase server performance simply by throwing more hardware at the servers -- especially more RAM at the Web/App server in my experience, the more so if it's Java-based.

5. Let's say the application/database code, as well as the hardware/OS environment have been tuned to the best of everybody's abilities. You re-run the load test from step 2 and now you're at 75 concurrent users before performance starts to degrade.

At this point, there's not much you can do with the existing setup. It's time to think about scaling the system horizontally, by adding other Web servers in a load-balanced Web server farm, or adding other database servers. You could also do content caching, for example with Apache mod_cache, or add an external caching server such as Squid.

One very important product of this whole procedure is that you now have a baseline number for your application for this given "staging" hardware environment. You can use the staging setup for nightly performance testing runs that will tell you whether changes in your application/database code caused an increase or a decrease in performance.
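A nightly comparison against the baseline could look roughly like the sketch below. The function compare_to_baseline, the 10% tolerance, and the metric names and numbers are all assumptions chosen for illustration; pick whatever thresholds make sense for your application.

```python
def compare_to_baseline(baseline, current, tolerance=0.10):
    """Classify each metric as 'regression', 'improvement' or 'unchanged'.

    Both arguments map a metric name to a response time in seconds,
    so lower values are better.
    """
    verdicts = {}
    for name, base in baseline.items():
        change = (current[name] - base) / base  # relative change vs. baseline
        if change > tolerance:
            verdicts[name] = "regression"
        elif change < -tolerance:
            verdicts[name] = "improvement"
        else:
            verdicts[name] = "unchanged"
    return verdicts


# Made-up response times (seconds) for three hypothetical pages.
baseline = {"login": 0.80, "search": 1.50, "checkout": 2.00}
tonight = {"login": 0.82, "search": 1.90, "checkout": 1.50}
print(compare_to_baseline(baseline, tonight))
```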

6. Repeat above steps in a "real" production environment before you actually launch your application.

All this discussion assumed you want to get performance/benchmarking numbers for your application. If you want to actually discover bugs and to see if your application fails and recovers gracefully, you need to do stress testing. Blast your Web server with double the number of users for example. Unplug network cables randomly (or shut down/restart switch ports via SNMP). Take out a disk from a RAID array. That kind of thing.

The conclusion? At the end of the day, it doesn't really matter what you call your testing, as long as you help your team deliver what it promised in terms of application functionality and performance. Performance testing in particular is more art than science, and many times the only way to make progress in optimizing and tuning the application and its environment is by trial-and-error and perseverance. Having lots of excellent open source tools also helps a lot.

Friday, April 08, 2005

HTTP performance testing with httperf, autobench and openload

Update 02/26/07
--------
The link to the old httperf page wasn't working anymore. I updated it and pointed it to the new page at HP. Here's a link to a PDF version of a paper on httperf written by David Mosberger and Tai Jin: "httperf -- a tool for measuring Web server performance".

Also, openload is now OpenWebLoad, and I updated the link to its new home page.
--------

In this post, I'll show how I conducted a series of performance tests against a Web site, with the goal of estimating how many concurrent users it can support and what the response time is. I used a variety of tools that measure several variables related to HTTP performance.

  • httperf is a benchmarking tool that measures the HTTP request throughput of a web server. The way it achieves this is by sending requests to the server at a fixed rate and measuring the rate at which replies arrive. Running the test several times and with monotonically increasing request rates, one can see the reply rate level off when the server becomes saturated, i.e., when it is operating at its full capacity.
  • autobench is a Perl wrapper around httperf. It runs httperf a number of times against a Web server, increasing the number of requested connections per second on each iteration, and extracts the significant data from the httperf output, delivering a CSV format file which can be imported directly into a spreadsheet for analysis/graphing.
  • openload is a load testing tool for Web applications. It simulates a number of concurrent users and it measures transactions per second (a transaction is a completed request to the Web server) and response time.
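The "reply rate levels off" idea can be expressed as a small calculation: walk through the (demanded rate, reply rate) pairs and stop at the first rate where replies fall noticeably short of what was demanded. The function name saturation_point, the 5% tolerance and the sample numbers below are assumptions for this sketch; they are not output from httperf or autobench.

```python
def saturation_point(samples, tolerance=0.05):
    """Return the highest demanded rate the server kept up with, or None.

    samples is a list of (demanded_rate, reply_rate) pairs sorted by
    demanded rate. The server "keeps up" while the reply rate stays
    within the given tolerance of the demanded rate.
    """
    sustained = None
    for demanded, replies in samples:
        if replies >= demanded * (1 - tolerance):
            sustained = demanded
        else:
            break  # reply rate has leveled off
    return sustained


# Made-up measurements for illustration.
samples = [(10, 10.0), (20, 19.9), (30, 29.8), (40, 38.9), (50, 39.1), (60, 38.7)]
print(saturation_point(samples))
```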

I ran a series of autobench/httperf and openload tests against a Web site I'll call site2 in the following discussion (site2 is a beta version of a site I'll call site1). For comparison purposes, I also ran similar tests against site1 and against www.example.com. The machine I ran the tests from is a Red Hat 9 Linux server co-located in downtown Los Angeles.

I won't go into details about installing httperf, autobench and openload, since the installation process is standard (configure/make/make install or rpm -i).

Here is an example of running httperf against www.example.com:

# httperf --server=www.example.com --rate=10 --num-conns=500

httperf --client=0/1 --server=www.example.com --port=80 --uri=/ --rate=10 --send-buffer=4096 --recv-buffer=16384 --num-conns=500 --num-calls=1
Maximum connect burst length: 1

Total: connections 500 requests 500 replies 500 test-duration 50.354 s

Connection rate: 9.9 conn/s (100.7 ms/conn, <=8 concurrent connections)
Connection time [ms]: min 449.7 avg 465.1 max 2856.6 median 451.5 stddev 132.1
Connection time [ms]: connect 74.1
Connection length [replies/conn]: 1.000

Request rate: 9.9 req/s (100.7 ms/req)
Request size [B]: 65.0

Reply rate [replies/s]: min 9.2 avg 9.9 max 10.0 stddev 0.3 (10 samples)
Reply time [ms]: response 88.1 transfer 302.9
Reply size [B]: header 274.0 content 54744.0 footer 2.0 (total 55020.0)
Reply status: 1xx=0 2xx=500 3xx=0 4xx=0 5xx=0

CPU time [s]: user 15.65 system 34.65 (user 31.1% system 68.8% total 99.9%)
Net I/O: 534.1 KB/s (4.4*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

The 3 arguments I specified on the command line are:
  • server: the name or IP address of your Web site (you can also specify a particular URL via the --uri argument)
  • rate: specifies the number of HTTP requests/second sent to the Web server -- indicates the number of concurrent clients accessing the server
  • num-conns: specifies how many total HTTP connections will be made during the test run -- this is a cumulative number, so the higher the number of connections, the longer the test run
Here is a detailed interpretation of an httperf test run. In short, the main numbers to look for are the connection rate, the request rate and the reply rate. Ideally, you would like to see that all these numbers are very close to the request rate specified on the command line. If the actual request rate and the reply rate start to decline, that's a sign your server became saturated and can't handle any new connections. That could also be a sign that your client became saturated, so that's why it's better to test your client against a fast Web site in order to gauge how many outgoing HTTP requests can be sustained by your client.
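If you want to compare these numbers across runs without reading each report by hand, a few regular expressions are enough. The sketch below is my own illustration, not part of httperf: parse_httperf is a hypothetical helper, and the patterns are written against the report lines shown above, so other httperf versions may need adjustments.

```python
import re


def parse_httperf(output):
    """Extract a few headline numbers from an httperf text report."""
    patterns = {
        "connection_rate": r"^Connection rate: ([\d.]+) conn/s",
        "request_rate": r"^Request rate: ([\d.]+) req/s",
        "avg_reply_rate": r"^Reply rate \[replies/s\]: min [\d.]+ avg ([\d.]+)",
        "response_time_ms": r"^Reply time \[ms\]: response ([\d.]+)",
        "total_errors": r"^Errors: total (\d+)",
    }
    results = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, output, re.MULTILINE)
        if match:
            results[name] = float(match.group(1))
    return results


# Lines copied from the report shown earlier in this post.
sample = """\
Connection rate: 9.9 conn/s (100.7 ms/conn, <=8 concurrent connections)
Request rate: 9.9 req/s (100.7 ms/req)
Reply rate [replies/s]: min 9.2 avg 9.9 max 10.0 stddev 0.3 (10 samples)
Reply time [ms]: response 88.1 transfer 302.9
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
"""
print(parse_httperf(sample))
```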

Autobench is a simple Perl script that facilitates multiple runs of httperf and automatically increases the HTTP request rate. Autobench can be configured, for example, by means of the ~/.autobench.conf file. Here is what my file looks like:

# Autobench Configuration File

# host1, host2
# The hostnames of the servers under test
# Eg. host1 = iis.test.com
# host2 = apache.test.com

host1 = testhost1
host2 = testhost2

# uri1, uri2
# The URI to test (relative to the document root). For a fair comparison
# the files should be identical (although the paths to them may differ on the
# different hosts)

uri1 = /
uri2 = /

# port1, port2
# The port number on which the servers are listening

port1 = 80
port2 = 80

# low_rate, high_rate, rate_step
# The 'rate' is the number of number of connections to open per second.
# A series of tests will be conducted, starting at low rate,
# increasing by rate step, and finishing at high_rate.
# The default settings test at rates of 20,30,40,50...180,190,200

low_rate = 10
high_rate = 50
rate_step = 10

# num_conn, num_call
# num_conn is the total number of connections to make during a test
# num_call is the number of requests per connection
# The product of num_call and rate is the the approximate number of
# requests per second that will be attempted.

num_conn = 200
#num_call = 10
num_call = 1

# timeout sets the maximimum time (in seconds) that httperf will wait
# for replies from the web server. If the timeout is exceeded, the
# reply concerned is counted as an error.

timeout = 60

# output_fmt
# sets the output type - may be either "csv", or "tsv";

output_fmt = csv

## Config for distributed autobench (autobench_admin)
# clients
# comma separated list of the hostnames and portnumbers for the
# autobench clients. No whitespace can appear before or after the commas.
# clients = bench1.foo.com:4600,bench2.foo.com:4600,bench3.foo.com:4600

clients = localhost:4600

The only variable I usually tweak from one test run to another is num_conn, which I set to the desired number of total HTTP connections to the server for that test run. In the example file above it is set to 200.

I changed the default num_call value from 10 to 1 (num_call specifies the number of HTTP requests per connection; I like to set it to 1 to keep things simple). I started my test runs with low_rate set to 10, high_rate set to 50 and rate_step set to 10. What this means is that autobench will run httperf 5 times, starting with 10 requests/sec and going up to 50 requests/sec in increments of 10.
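The arithmetic behind these settings is simple enough to sketch. The two helper functions below are hypothetical names of mine, not part of autobench; the duration estimate assumes num_call is 1 and that the client actually sustains the demanded rate.

```python
def rate_schedule(low_rate, high_rate, rate_step):
    """List the request rates autobench will try, one httperf run per rate."""
    return list(range(low_rate, high_rate + 1, rate_step))


def expected_duration(num_conn, rate):
    """Rough duration of one run in seconds: total connections / connections per second."""
    return num_conn / rate


rates = rate_schedule(10, 50, 10)
print(rates, "->", len(rates), "runs")
for rate in rates:
    print(f"rate {rate}/s with 200 connections: about {expected_duration(200, rate):.1f} s")
```

As a sanity check, 500 connections at 10 per second gives about 50 seconds, which agrees with the test-duration of 50.354 s in the httperf report shown earlier.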

When running the following command line...

# autobench --single_host --host1=www.example.com --file=example.com.csv

...I got this output and this CSV file.

Here is a graph generated via Excel from the CSV file obtained when running autobench against www.example.com for a different test run, with 500 total HTTP connections (the CSV file is here):



A few things to note about this typical autobench run:
  • I chose example.com as an example of how an "ideal" Web site should behave
  • the demanded request rate (in requests/second) starts at 10 and goes up to 50 in increments of 5 (x-axis)
  • for each given request rate, the client machine makes 500 connections to the Web site
  • the achieved request rate and the connection rate correspond to the demanded request rate
  • the average and maximum reply rates are roughly equal to the demanded request rate
  • the response time is almost constant, around 100 msec
  • there are no HTTP errors

What this all means is that the example.com Web site is able to easily handle up to 50 req/sec. The fact that the achieved request rate and the connection rate increase linearly from 10 to 50 also means that the client machine running the test is not the bottleneck. If the demanded request rate were increased to hundreds of req/sec, the client would not be able to keep up with the demanded requests and would become the bottleneck itself. In these types of situations, one would need to use several clients in parallel in order to bombard the server with as many HTTP requests as it can handle. However, the client machine I am using is sufficient for request rates lower than 50 req/sec.

Here is an autobench report for site1 (the CSV file is here):



Some things to note about this autobench run:
  • I specified only 200 connections per run, so that the server would not be over-taxed
  • the achieved request rate and the connection rate increase linearly with the demanded request rate, but then level off around 40
  • there is a drop at 45 req/sec which is probably due to the server being temporarily overloaded
  • the average and maximum reply rates also increase linearly, then level off around 39 replies/sec
  • the response time is not plotted, but it also increases linearly from 93 ms to around 660 ms

To verify that 39 is indeed the maximum reply rate that can be achieved by the Web server, I ran another autobench test starting at 10 req/sec and going up to 100 req/sec in increments of 10 (the CSV file is here):



Observations:
  • the reply rate does level off around 39 replies/sec and actually drops to around 34 replies/sec when the request rate is 100
  • the response time (not plotted) increases linearly from 97 ms to around 1.7 sec
We can conclude that the current site1 Web site can sustain up to around 40 requests/second before it becomes saturated. At higher request rates, the response time increases and users will start experiencing time-outs.

Here is an autobench report for site2 (the CSV file is here):



Some things to note about this autobench run:
  • the achieved request rate and the connection rate do not increase with the demanded request rate; instead, they are both almost constant, hovering around 6 req/sec
  • the average reply rate also stays relatively constant at around 6 replies/sec, while the maximum reply rate varies between 5 and 17
  • there is a dramatic increase in response time (not plotted) from 6 seconds to more than 18 seconds
From this initial run, we can see that the average reply rate does not exceed 6-7 replies/second, so this seems to be the limit for the site2 Web site. In order to further verify this hypothesis, I ran another autobench test, this time going from 1 to 10 requests/second, in increments of 1. Here is the report (the CSV file is here):



Some things to note about this autobench run:
  • the achieved request rate and the connection rate increase linearly with the demanded request rate from 1 to 6, then level off around 6
  • the average reply rate is almost identical to the connection rate and also levels off around 6
  • the maximum reply rate levels off around 8
  • the response time (not plotted) increases from 226 ms to 4.8 seconds
We can conclude that the site2 Web site can sustain up to 7 requests/second before it becomes saturated. At higher request rates, the response time increases and users will start experiencing time-outs.

Finally, here are the results of a test run that uses the openload tool in order to measure transactions per second (equivalent to httperf's reply rate) and response time (the CSV file is here):



Some notes:
  • the transaction rate levels off, as expected, around 6 transactions/sec
  • the average response time levels off around 7 seconds, but the maximum response time varies considerably from 3 to around 20 seconds, reaching up to 30 seconds
These results are consistent with the ones obtained by running httperf via autobench. From all these results, we can safely conclude that in its present state, the site2 Web site is not ready for production, unless no more than 6-7 concurrent users are ever expected to visit the site at the same time. The response time is very high and the overall user experience is not a pleasant one at this time. Also, whenever I increased the load on the site (for example by running autobench with 200 through 500 connections per run), the site became almost instantly unresponsive and ended up sending HTTP errors back to the client.

Conclusion

The tools I described are easy to install and run. The httperf request/reply throughput measurements in particular prove to be very helpful in pinpointing HTTP bottlenecks. When they are corroborated with measurements from openload, an overall picture emerges that is very useful in assessing HTTP performance numbers such as concurrent users and response time.

Update

I got 2 very un-civil comments from the same Anonymous Coward-type poster. This poster called my blog entry "amateurish" and "recklessly insane" among other things. One slightly more constructive point made by AC is a question: why did I use these "outdated" tools and not other tools such as The Grinder, OpenSTA and JMeter? The answer is simple: I wanted to use command-line-driven, lightweight tools that can be deployed on any server, with no need for GUIs and distributed installations. If I were to test a large-scale Web application, I would certainly look into the heavy-duty tools mentioned by the AC. But the purpose of my post was to show how to conduct a very simple experiment that can still furnish important results and offer a good overall picture about a Web site's behavior under moderate load.

Wednesday, April 06, 2005

Using Selenium to test a Plone site (part 2)

In this post I'll talk about some Plone-specific features available in Selenium, such as setup, tearDown and postResults methods. See part 1 for more general Selenium features that can be used to test any Web site.

Jason Huggins recently released a new version (selenium-0.3rc2-plone.zip) of the Plone product implementation of Selenium. If you already have an old version of the Selenium Plone product installed, you need to uninstall it and install the new version. Alternatively, you can check out the latest Selenium source code via subversion, then follow the instructions in Installing Selenium as a Plone product in my previous post.

The most important addition in the Selenium Plone product is a "Plone tool" called FunctionalTestTool.py (on my test system, it is in /var/lib/plone2/main/Products/Selenium). A Plone tool is Python code that adds functionality to a Plone-based Web site. You can use a tool for example wherever you would use a CGI script. You can find a good intro on Plone tools here.

Every Plone tool has an id which is used as part of the URL that actually invokes the tool. The id for FunctionalTestTool is selenium_ft_tool. The 3 methods that are "public" (i.e. callable from other pages) in FunctionalTestTool.py are setup, tearDown and postResults. If you wanted to invoke the setup method for example, you would need to open the URL http://www.example.com:8080/Plone/selenium_ft_tool/setup followed by optional arguments.

The purpose of the setup method is to facilitate the creation of Plone users at the beginning of each Selenium test. In my previous post on Selenium and Plone, I used the example of testing the "New user" functionality in Plone. I used a Selenium test table to fill in the necessary values for user name, password and email in the Plone registration form, then submitted that form so that the user can be created. If you want to test specific functionality for a new user, having to go through the process of filling the registration form at the beginning of each of your tests can quickly become tedious. Enter the setup method, which allows you to create a new user in one line of your test table.

In its current incarnation, the setup method accepts a user_type argument which can be either Member or Manager. The method uses the Plone API (via utils.getToolByName(self, 'portal_membership').addMember) to create a new user of the specified type, with a random user name (prefixed by FTuser or FTmanageruser, depending on the user type), with a password set to the user name, and with an email set to username@example.org. I modified the FunctionalTestTool.py code and added a second parameter called user, which, if specified, sets both the user name and the password to the specified value. This is so that I can create my own random user name at the very beginning of my test table, then use that value as the user name throughout my test. You can find my modified version of FunctionalTestTool.py here.

The purpose of the tearDown method is to delete the user created via setup at the end of your test, so that the Plone user database doesn't get cluttered with a new user for each test you run. The tearDown method takes a user name in the user parameter and uses the Plone API (via utils.getToolByName(self, 'portal_membership').pruneMemberDataContents()) to delete the specified user.
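For illustration, here is a small sketch that builds the setup and tearDown URLs described above. The helper tool_url is a hypothetical function of mine, not part of Selenium or Plone; the user_type and user parameter names come from the description above, and the user parameter on setup exists only in my modified version of the tool.

```python
from urllib.parse import urlencode


def tool_url(base_url, method, **params):
    """Build a URL that invokes a method of the selenium_ft_tool Plone tool."""
    url = f"{base_url.rstrip('/')}/selenium_ft_tool/{method}"
    if params:
        url += "?" + urlencode(params)  # optional arguments as a query string
    return url


base = "http://www.example.com:8080/Plone"
print(tool_url(base, "setup", user_type="Member", user="user123"))
print(tool_url(base, "tearDown", user="user123"))
```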

The purpose of the postResults method is to save the HTML output (the test table(s) with rows colored green or red) produced by the current test run into its own file, so that you can keep a record of all your test runs. The method also saves the Pass/Fail/Total test count for the current test run in a plain text file which can be archived for future inspection. The 2 files generated by postResults can be found in /var/lib/plone2/main/Products/Selenium/skins/selenium_test_results. The HTML file is called selenium-results-USER_AGENT.html, where USER_AGENT is set to the name of the browser you're using. In my case, the file is called selenium-results-Firefox.html. The test count file is called selenium-results-metadata-USER_AGENT.txt.dtml. In my case, the file is called selenium-results-metadata-Firefox.txt.dtml.

Currently, you can't call postResults directly. It gets called implicitly if you run TestRunner.html with auto=true. In my case, if I wanted to post the results at the end of the test run for the test suite shipped with Selenium, I would open this URL: http://www.example.com:8080/Plone/TestRunner.html?test=PloneTestSuite.html&auto=true
(replace example with your domain name)
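If you run the suite against several Plone sites, the URL can be assembled programmatically. This is only a convenience sketch: build_runner_url is a hypothetical helper, and the only query parameters it knows about are the test and auto parameters shown above.

```python
from urllib.parse import urlencode


def build_runner_url(base_url, suite="PloneTestSuite.html", auto=True):
    """Build a TestRunner.html URL for the given Plone site and test suite."""
    query = {"test": suite}
    if auto:
        # auto=true runs the suite immediately and triggers postResults.
        query["auto"] = "true"
    return base_url.rstrip("/") + "/TestRunner.html?" + urlencode(query)


url = build_runner_url("http://www.example.com:8080/Plone")
```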

Here is an example of a test table that uses the methods described so far. It tests that a brand new Plone user can edit their home page:

testEditHomePage
setVariable username 'user'+(new Date()).getTime()
open ./selenium_ft_tool/setup?user=${username}
setVariable base_url 'http://www.example.com:8080/Plone'
open ${base_url}
type __ac_name ${username}
type __ac_password ${username}
click submit
verifyTextPresent Welcome! You are now logged in
setVariable myfolder_url '${base_url}/Members/${username}/folder_contents'
click //a[@href='${myfolder_url}']
verifyTextPresent Home page area that contains the items created and collected by ${username}
setVariable homepage_url '${base_url}/Members/${username}/index_html/document_view'
click //a[@href='${homepage_url}']
verifyTextPresent This item does not have any body text, click the edit tab to change it.
setVariable edit_homepage_url '${base_url}/Members/${username}/index_html/document_edit_form'
click //a[@href='${edit_homepage_url}']
verifyTextPresent Fill in the details of this document.
type text Hello World!
click form.button.Save
verifyTextPresent Document changes saved
verifyTextPresent Hello World!
setVariable member_url '${base_url}/Members/${username}'
open ${member_url}
verifyTextPresent Home page for ${username}
verifyTextPresent Hello World!
open ./selenium_ft_tool/tearDown?user=${username}

The test table starts by assigning a random value to the username, then it calls the setup method with the user argument to create that user. It continues by logging in to Plone via the "log in" form on the main Plone page. It then opens the "my folder" page, goes to the user's home page and fills in "Hello World!" as the text of the page. Finally, it saves the home page, then opens the Members/username Plone page to verify that the user's home page is there. Along the way, the test uses the verifyTextPresent assertion to check that the expected text is indeed present on the various pages that it opens. The last line in the test table uses the tearDown method to delete the newly created user.
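The setVariable rows in the table are evaluated as JavaScript by Selenium. For readers who find the variable substitutions hard to follow, here is a rough Python rendering of the same derivations. The function names unique_username and member_urls are made up for this sketch; the URL patterns are copied from the table.

```python
import time


def unique_username(prefix="user"):
    """Mimic 'user'+(new Date()).getTime(): prefix plus milliseconds since the epoch."""
    return prefix + str(int(time.time() * 1000))


def member_urls(base_url, username):
    """Return the URLs the test table stores via setVariable."""
    member = "%s/Members/%s" % (base_url, username)
    return {
        "myfolder_url": member + "/folder_contents",
        "homepage_url": member + "/index_html/document_view",
        "edit_homepage_url": member + "/index_html/document_edit_form",
        "member_url": member,
    }


username = unique_username()
urls = member_urls("http://www.example.com:8080/Plone", username)
```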

The Plone product version of Selenium ships with a test suite that contains one test table called testJoinPortal. The test suite file (called PloneTestSuite.html.dtml) and the testJoinPortal file (called testJoinPortal.html.dtml) are both in /var/lib/plone2/main/Products/Selenium/skins/ftests_browser_driven. I called my custom test table testEditHomePage.html.dtml and I edited PloneTestSuite.html.dtml to add a new row corresponding to testEditHomePage.html. I then restarted Plone and I went to http://www.example.com:8080/Plone/TestRunner.html?test=PloneTestSuite.html&auto=true.
The test suite got executed automatically (because auto was set to true), and postResults got called automatically at the end of the test suite run (also because auto was set to true).

The frame where the embedded browser runs shows this at the end of the test suite run:

Results have been successfully posted to the server here:
/var/lib/plone2/main/Products/Selenium/skins/selenium_test_results

selenium-results-metadata-Firefox.txt
selenium-results-Firefox

Clicking on the selenium-results-Firefox link shows the saved output of the test suite run:


Test Suite
testJoinPortal
testEditHomePage


testJoinPortal
open ./join_form
verifyTextPresent Registration Form
type fullname Jason Huggins
type username jrhuggins12345
type email jrhuggins@example.com
type password 12345
type confirm 12345
clickAndWait form.button.Register
verifyTextPresent You have been registered as a member.
clickAndWait document.forms[1].elements[3]
verifyTextPresent Welcome! You are now logged in.
open ./selenium_ft_tool/tearDown?user=jrhuggins12345


testEditHomePage
setVariable username 'user'+(new Date()).getTime()
open ./selenium_ft_tool/setup?user=${username}
setVariable base_url 'http://www.example.com:8080/Plone'
open ${base_url}
type __ac_name ${username}
type __ac_password ${username}
click submit
verifyTextPresent Welcome! You are now logged in
setVariable myfolder_url '${base_url}/Members/${username}/folder_contents'
click //a[@href='${myfolder_url}']
verifyTextPresent Home page area that contains the items created and collected by ${username}
setVariable homepage_url '${base_url}/Members/${username}/index_html/document_view'
click //a[@href='${homepage_url}']
verifyTextPresent This item does not have any body text, click the edit tab to change it.
setVariable edit_homepage_url '${base_url}/Members/${username}/index_html/document_edit_form'
click //a[@href='${edit_homepage_url}']
verifyTextPresent Fill in the details of this document.
type text Hello World!
click form.button.Save
verifyTextPresent Document changes saved
verifyTextPresent Hello World!
setVariable member_url '${base_url}/Members/${username}'
open ${member_url}
verifyTextPresent Home page for ${username}
verifyTextPresent Hello World!
open ./selenium_ft_tool/tearDown?user=${username}

Clicking on the selenium-results-metadata-Firefox.txt link shows the test total/pass/fail count for the current test suite run:

totalTime: 7
numTestFailures: 0
numCommandPasses: 11
numCommandFailures: 0
numCommandErrors: 0
result: passed
numTestPasses: 2
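Since the metadata file is plain "key: value" text, it is easy to post-process when archiving results. The following is a minimal sketch, assuming the file contains exactly the kind of lines shown above; parse_results_metadata is a hypothetical helper, not part of Selenium or the Plone tool.

```python
def parse_results_metadata(text):
    """Parse 'key: value' lines into a dict, converting numeric values to int."""
    results = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        value = value.strip()
        results[key.strip()] = int(value) if value.isdigit() else value
    return results


sample = """totalTime: 7
numTestFailures: 0
numCommandPasses: 11
numCommandFailures: 0
numCommandErrors: 0
result: passed
numTestPasses: 2"""

summary = parse_results_metadata(sample)
```

With the sample above, summary["result"] is the string passed and the counters are integers, so a nightly job could, for example, fail the build whenever numTestFailures is greater than zero.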

Conclusion

The new Plone test tool added in the latest version of the Plone product implementation of Selenium provides valuable functionality for test management via the setup/tearDown/postResults methods. Even though these methods are Plone-specific, the ideas behind them should be easily adaptable to other Web applications. The only requirement is for the application to provide API hooks for user management functionality. This requirement is easily met by Plone, and I have to say I found it very easy to extend the Plone tool, even though I had no prior experience in writing Plone-specific code.

As a future Selenium enhancement, I'd like to see a way to specify 'SetUp' and 'TearDown' test tables at the test suite level that would automatically run at the beginning and at the end of each test suite run. In my example above, the login process could be encapsulated in a SetUp table, and all other tables in the suite would then implicitly use the functionality in the SetUp table.

The alternative of course would be to extend the FunctionalTestTool and create a log_in method that would use the Plone API to log in the newly-created user. The current setup method in FunctionalTestTool could then get an extra parameter (such as login=true) that would call the log_in method if specified.