Monday, December 27, 2004

Performance testing with pyUnitPerf

Summary: If you are familiar with Mike Clark's JUnitPerf framework, then you may be interested in knowing that I have just released a Python port: pyUnitPerf. You can browse the source code here and you can get a gzipped tarball from here.

Details

pyUnitPerf tests are meant to transparently add performance testing capabilities to existing pyUnit test suites. The pyUnitPerf framework introduces 2 new types of tests:
  • TimedTest: runs an existing pyUnit test case, imposing a limit on the time the test is allowed to take
  • LoadTest: runs an existing pyUnit test case, simulating concurrent users and iterations
Let's look at a simple example adapted from the samples provided with JUnitPerf.

Assume you have the following pyUnit test case in a file called ExampleTestCase.py:
from unittest import TestCase, TestSuite, TextTestRunner, makeSuite

import time

class ExampleTestCase(TestCase):

    def __init__(self, name):
        TestCase.__init__(self, name)

    def testOneSecondResponse(self):
        time.sleep(1)

    def suite(self):
        return makeSuite(self.__class__)

if __name__ == "__main__":
    example = ExampleTestCase("testOneSecondResponse")
    runner = TextTestRunner()
    runner.run(example)

Admittedly, this is a contrived example, since the testOneSecondResponse method simply sleeps for 1 second and does not actually test anything, but it serves to illustrate the pyUnitPerf functionality.

Assume you want to create a timed test that waits for the completion of the ExampleTestCase.testOneSecondResponse method, then fails if the elapsed time exceeds 1 second. With pyUnitPerf, all you need to do is write the following code in a file called ExampleTimedTest.py:
from unittest import TestSuite, TextTestRunner

from ExampleTestCase import ExampleTestCase
from TimedTest import TimedTest

class ExampleTimedTest:

    def __init__(self):
        self.toleranceInSec = 0.05

    def suite(self):
        s = TestSuite()
        s.addTest(self.make1SecondResponseTimedTest())
        return s

    def make1SecondResponseTimedTest(self):
        """
        Decorates a one second response time test as a
        timed test with a maximum elapsed time of 1 second
        """
        maxElapsedTimeInSec = 1 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        timedTest = TimedTest(testCase, maxElapsedTimeInSec)
        return timedTest

if __name__ == "__main__":
    TextTestRunner(verbosity=2).run(ExampleTimedTest().suite())

The suite() method constructs a TestSuite object and adds to it the test object returned by the make1SecondResponseTimedTest method. This method instantiates an ExampleTestCase object, passing it the method name to be tested: testOneSecondResponse. We then pass the testCase object to a TimedTest object, together with the desired maximum time to wait for the completion of the test (to which we add a 50 msec. tolerance to account for time potentially spent setting up and tearing down the test case). In the __main__ section of the module, we simply call the pyUnit TextTestRunner, passing it the suite.
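Under the hood, the idea is simple: run the decorated test, measure the wall-clock time, and fail if the limit is exceeded. Here is a minimal sketch of that concept in plain pyUnit terms (an illustration only, not the actual pyUnitPerf implementation, which integrates with the pyUnit result machinery):

```python
import time
import unittest

class SleepTestCase(unittest.TestCase):
    """Stand-in for ExampleTestCase: sleeps for 0.1 second."""
    def runTest(self):
        time.sleep(0.1)

def run_timed(test_case, max_elapsed_in_sec):
    """Run a pyUnit test case; report whether it passed within the limit."""
    result = unittest.TestResult()
    start = time.time()
    test_case.run(result)
    elapsed = time.time() - start
    return (result.wasSuccessful() and elapsed <= max_elapsed_in_sec, elapsed)

ok, elapsed = run_timed(SleepTestCase(), 0.5)   # generous limit -> passes
too_slow, _ = run_timed(SleepTestCase(), 0.01)  # tight limit -> fails
```

The real TimedTest wraps this measurement behind the standard pyUnit test interface, which is what lets it slot into an ordinary TestSuite.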

If you run: python ExampleTimedTest.py at a command prompt, you will get the following output:
testOneSecondResponse (ExampleTestCase.ExampleTestCase) ... ok

TimedTest (WAITING): testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0 sec.

----------------------------------------------------------------------
Ran 1 test in 1.000s

OK

Now let's make the test fail by requiring the timed test to finish in 0.9 seconds. To do this, simply change
maxElapsedTimeInSec = 1 + self.toleranceInSec
to
maxElapsedTimeInSec = 0.9 + self.toleranceInSec
Running python ExampleTimedTest.py now results in the following output:
testOneSecondResponse (ExampleTestCase.ExampleTestCase) ... ok

TimedTest (WAITING): testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0 sec.
FAIL

======================================================================
FAIL: testOneSecondResponse (ExampleTestCase.ExampleTestCase)
----------------------------------------------------------------------
AssertionFailedError: Maximum elapsed time exceeded! Expected 0.95 sec., but was
1.0 sec.

----------------------------------------------------------------------
Ran 1 test in 1.000s

FAILED (failures=1)

Note that the test result for the pyUnit test case (ExampleTestCase.testOneSecondResponse) is still marked as OK, but the test result for the TimedTest is marked as FAILED, since the elapsed time exceeded the specified maximum of 0.95 sec.

Let's look at an example of a LoadTest. The following code can be saved in a file called ExampleLoadTest.py:
from unittest import TestSuite, TextTestRunner

from ExampleTestCase import ExampleTestCase
from LoadTest import LoadTest
from TimedTest import TimedTest

class ExampleLoadTest:

    def __init__(self):
        self.toleranceInSec = 0.05

    def suite(self):
        s = TestSuite()
        s.addTest(self.make1SecondResponseSingleUserLoadTest())
        s.addTest(self.make1SecondResponseMultipleUserLoadTest())
        s.addTest(self.make1SecondResponse1UserLoadIterationTest())
        return s

    def make1SecondResponseSingleUserLoadTest(self):
        """
        Decorates a one second response time test as a single user
        load test with a maximum elapsed time of 1 second
        and a 0 second delay between users.
        """
        users = 1
        maxElapsedTimeInSec = 1 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

    def make1SecondResponseMultipleUserLoadTest(self):
        """
        Decorates a one second response time test as a multiple-user
        load test with a maximum elapsed time of 1.5
        seconds and a 0 second delay between users.
        """
        users = 10
        maxElapsedTimeInSec = 1.5 + self.toleranceInSec

        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

    def make1SecondResponse1UserLoadIterationTest(self):
        """
        Decorates a one second response time test as a single user
        load test with 10 iterations per user, a maximum
        elapsed time of 10 seconds, and a 0 second delay
        between users.
        """
        users = 1
        iterations = 10

        maxElapsedTimeInSec = 10 + self.toleranceInSec
        testCase = ExampleTestCase("testOneSecondResponse")
        loadTest = LoadTest(testCase, users, iterations)
        timedTest = TimedTest(loadTest, maxElapsedTimeInSec)
        return timedTest

if __name__ == "__main__":
    TextTestRunner(verbosity=1).run(ExampleLoadTest().suite())

The 3 methods defined in ExampleLoadTest cover some of the most commonly used load test scenarios. See the doc strings at the beginning of each method for more details. Running python ExampleLoadTest.py generates this output:
.TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.03099989891 sec.
..........TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase): 1.0150001049 sec.
..........TimedTest (WAITING): LoadTest (NON-ATOMIC): ThreadedTest: testOneSecondResponse (ExampleTestCase.ExampleTestCase)(repeated): 10.0 sec.

----------------------------------------------------------------------
Ran 21 tests in 12.046s

OK

This time all the tests passed. Note that the multiple user load test (make1SecondResponseMultipleUserLoadTest) runs the individual test cases in parallel, each test case in its own thread, and thus the overall time is only slightly longer than 1 second. The multiple iteration test (make1SecondResponse1UserLoadIterationTest) runs the 10 iterations of the test case sequentially, and thus the overall time is 10 seconds.
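This threading behavior is easy to demonstrate in isolation: running the same 1-second-style task once per simulated user, each in its own thread, takes roughly one task's duration overall rather than N times that. A small sketch of the concept (not pyUnitPerf's actual LoadTest code):

```python
import threading
import time

def run_concurrently(task, users):
    """Run task once per simulated user, each user in its own thread,
    and return the overall elapsed time."""
    threads = [threading.Thread(target=task) for _ in range(users)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

elapsed = run_concurrently(lambda: time.sleep(0.2), users=5)
# With 5 concurrent users the overall time stays close to a single
# task's duration (0.2 sec.), not 5 x 0.2 = 1.0 sec.
```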

We can make some of the tests fail by decreasing the value of maxElapsedTimeInSec, similar to what we did for the TimedTest.

Why should you use pyUnitPerf? Mike Clark makes a great case for using JUnitPerf here. To summarize, you use pyUnitPerf when you have an existing suite of pyUnit tests that verify the correctness of your code, and you want to isolate potential performance issues with your code.

The fact that the pyUnitPerf test suites are completely independent from the pyUnit tests helps you in scheduling different run times for the 2 types of tests:
  • you want to run the pyUnit tests very often, since they (should) run fast
  • you want to run the pyUnitPerf tests less frequently, when trying to verify that an identified bottleneck has been eliminated (potential bottlenecks can be pinpointed via profiling for example); performance tests tend to take a longer time to run, so they could be scheduled for example during a nightly smoke test run
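One simple way to get this scheduling split is to select between the two kinds of suites with a command-line flag, so developers run the fast correctness tests by default and the nightly job adds the slow performance tests. A hedged sketch (the suites here are toy stand-ins, not real project code):

```python
import sys
import time
import unittest

def fast_suite():
    """Correctness tests: cheap, run on every check-in."""
    class QuickTest(unittest.TestCase):
        def testAddition(self):
            self.assertEqual(1 + 1, 2)
    return unittest.TestSuite([QuickTest("testAddition")])

def perf_suite():
    """Performance tests: slow by nature, run nightly."""
    class SlowTest(unittest.TestCase):
        def testResponse(self):
            time.sleep(0.1)  # stands in for a timed/load test
    return unittest.TestSuite([SlowTest("testResponse")])

# e.g. the nightly cron job passes --perf, developers run without it
suite = perf_suite() if "--perf" in sys.argv else fast_suite()
result = unittest.TestResult()
suite.run(result)
```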

Monday, December 20, 2004

PyFIT Tutorial Part 2

In the conclusion of part 1 of the PyFIT tutorial, I said I would experiment with RowFixture tables. It turned out that they're really easy to use. I'll show here a simple example that extends the FitNesse acceptance test suite for the Blog Management application.

In the FitNesse tests I put together in part 1 I used only ColumnFixture tables. One example is BloggerFixtures.GetEntryTitleContent, which takes entry_index as an argument and returns the title and the content of that entry. To use a database analogy, that particular ColumnFixture behaves like a SQL query of the form:

SELECT title, content FROM entries WHERE entry_index=N

In the FitNesse tests I wrote, I used GetEntryTitleContent in a table like this:

!|BloggerFixtures.GetEntryTitleContent|
|entry_index|title?|content?|
|1|Entry #3 Title|Entry #3 Content|
|2|Entry #2 Title|Entry #2 Content|
|3|Entry #1 Title|Entry #1 Content|

We can look at each row in this table as being the result of running the SQL query above, with entry_index set to 1, 2, and 3.

However, we really are interested in verifying ALL entries in the blog at the same time. It would be useful to have a fixture similar to a SQL query such as:

SELECT * FROM entries

This is exactly what a RowFixture achieves: it can be thought of as returning all the "rows" in the "database". In our case, we need to return all the entries in the blog, specifically their index, title, and content, so we need something like:

SELECT entry_index, title, content FROM entries

To achieve this, we need to write a class derived from RowFixture. Our class needs to define two methods:
  • getTargetClass
  • query
In PyFIT, the query method needs to return a list of objects, each object being an instance of the class returned by getTargetClass. Each object is a representation of a row in the RowFixture table. In our example, the target class needs to expose 3 class variables: entry_index, title and content.

Here is the ListAllEntries fixture I wrote:

from fit.RowFixture import RowFixture
import sys
blogger_path = "C:\\eclipse\\workspace\\blogger"
sys.path.append(blogger_path)
import Blogger

class BlogEntry:
    _typeDict = {
        "entry_index": "Int",
        "title": "String",
        "content": "String",
    }
    entry_index = 0
    title = ""
    content = ""

class ListAllEntries(RowFixture):

    def getTargetClass(self):
        return BlogEntry

    def query(self):
        blogger = Blogger.get_blog()
        num_entries = blogger.get_num_entries()
        entry_list = []
        for i in range(num_entries):
            blog_entry = BlogEntry()
            blog_entry.entry_index = i + 1
            blog_entry.title = blogger.get_nth_entry_title(i + 1)
            blog_entry.content = blogger.get_nth_entry_content_strip_html(i + 1)
            entry_list.append(blog_entry)
        return entry_list

The class derived from RowFixture is ListAllEntries, which defines the 2 methods I mentioned:
  • getTargetClass() returns the class BlogEntry
  • query() builds a list of BlogEntry objects by retrieving entry_index, title and content for each entry in the blog and assigning their values to the corresponding variables of BlogEntry
The BlogEntry class will be inspected by the FitNesse framework, so it needs to define the _typeDict TypeAdapter with the 3 variables listed in the FitNesse table that uses the ListAllEntries fixture. Speaking of this table, here is an example of how I used it:

!|BloggerFixtures.ListAllEntries|
|entry_index|title|content|
|1|Entry #4 Title|Entry #4 Content|
|2|Entry #3 Title|Entry #3 Content|
|3|Entry #2 Title|Entry #2 Content|
|4|Entry #1 Title|Entry #1 Content|

The above fragment is from a new test page I created in the BlogMgmtSuite acceptance test suite. You can see the full contents of this page here: PostDelete4EntriesRowFixture.

Using a RowFixture has another big advantage over using a ColumnFixture: a RowFixture will retrieve all the entries in the table and will let you know if you have any extra rows or if you are missing any rows. Here is an example:

Assume the blog has 4 entries. Assume we use a ColumnFixture like this:

!|BloggerFixtures.GetEntryTitleContent|
|entry_index|title?|content?|
|1|Entry #4 Title|Entry #4 Content|
|2|Entry #3 Title|Entry #3 Content|
|3|Entry #2 Title|Entry #2 Content|

In this case, all the rows will be colored green and the test will pass, since the ColumnFixture will retrieve one by one the three entries we specified.

Now assume we use a RowFixture like this:

!|BloggerFixtures.ListAllEntries|
|entry_index|title|content|
|1|Entry #4 Title|Entry #4 Content|
|2|Entry #3 Title|Entry #3 Content|
|3|Entry #2 Title|Entry #2 Content|

In this case, the test will fail, since the RowFixture will retrieve the fourth entry too and will let us know that we missed it in our table. FitNesse will show a fourth row colored in red, with the following text:

|4 surplus|Entry #1 Title|Entry #1 Content|

Conclusions
  • a RowFixture is the ideal vehicle to use when you need to verify that all the data you entered into the system under test so far is there -- no more and no less
  • using RowFixtures proved to be easy, once I visualized their similarity to SQL "select * from" queries.
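The surplus/missing bookkeeping a RowFixture performs can be sketched in a few lines of plain Python: compare the expected rows from the table against the rows the query returned, and report the leftovers on either side. This is an illustration of the matching idea only, not PyFIT's actual algorithm:

```python
def match_rows(expected, actual):
    """Compare expected table rows against actual query rows.
    Returns (matched, missing, surplus) lists of row tuples."""
    remaining = list(actual)
    matched, missing = [], []
    for row in expected:
        if row in remaining:
            remaining.remove(row)   # consumed: each actual row matches once
            matched.append(row)
        else:
            missing.append(row)     # in the table, but not in the query result
    return matched, missing, remaining  # leftovers in actual are surplus

expected = [(1, "Entry #4 Title"), (2, "Entry #3 Title"), (3, "Entry #2 Title")]
actual = expected + [(4, "Entry #1 Title")]  # the blog really has 4 entries
matched, missing, surplus = match_rows(expected, actual)
# surplus holds (4, "Entry #1 Title") -- the row FitNesse colors red
```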

Friday, December 17, 2004

[grid::fatherhood]

Some people whose blogs I'm reading (Tim Bacon, Laurent Bossavit, Alan Francis) decided to "gridblog" today about fatherhood and how it may relate to software development. More specifically, the topic is: "what do you wish someone had told you".

Here's a comment I made to Laurent Bossavit, who wrote about the strong emotions he experienced in his role as a father:

Laurent, in my experience as a father of 2, the strong emotions you mention were one of the best things that ever happened to me. I am one of the intellectual types you describe, and I always felt it hard to express my feelings. Having kids made me relive many moments from my past and opened many wounds and old feelings I had repressed. Instead of making me a stranger to myself, being a father made me know my true forgotten/suppressed self much better. I am still struggling with how to be a good father -- it will probably be a life-long struggle, but it's a fight worth fighting. Many people tend to avoid struggles like this (and of course trying to be a good husband is another struggle), they tend to live in a bubble that isolates them from painful frictions but also from discovering who they really are. If you avoid pain, you can't know joy.


This doesn't really qualify as advice to give to a prospective father, and has nothing really to do with software development. But I do have this advice to give to software developers who sometimes tend to live in the bubble I described above, and who dedicate their life to their work (I know, I was a workaholic too): become a father! The unavoidable struggles that will ensue: the sleepless nights; the diaper-changing; the interferences from your own parents; walking the fine line between wanting to help your kids and letting them struggle on their own; walking the fine line between being too authoritarian or too permissive; realizing that your kids are their own persons and not simply extensions of yourself; realizing that kids need to be allowed to express their feelings in all situations, even if this contradicts everything you were taught and shown in your own childhood; realizing that maybe you need to confront your own parents with painful memories and emotions --- all these things will polish your soul and your heart and will help you unveil your TRUE you.

As far as software development is concerned, I think the royal way to get in touch with your creativity is to become child-like --- and what better way to achieve that than to become a father. It is the child within us who is really the source of our energy and enthusiasm.

Thursday, December 16, 2004

STAF/STAX tutorial

Automated test distribution, execution and reporting with STAF/STAX

Assume you are part of a test team whose goal is to automate the distribution of tests to a large set of clients running on various platforms. You want to run an automated 'smoke test' in the following scenario:

  • A nightly build process sends out email notification that a new version of the software is ready to be tested.
  • The notification email triggers a 'Start Smoke Test' request sent to a dedicated machine (I will call it the "test management" machine), which coordinates all clients to be tested
  • The test management machine somehow tells all clients that version x.y.z of the software is available, then tells all clients to run a test harness and report back the results
  • After getting back the test results from all the clients, the test management machine sends out a test summary email containing the overall, failed, and successful test case count

You could try to implement this functionality yourself by writing, for example, a simple XML-RPC agent that runs on every client and accepts commands from the test management machine, but you would soon realize that you need something more robust, something that has already been proven in large test environments.

I will show you how to use the STAF/STAX framework from IBM, which offers all the features listed in the smoke-test scenario just described.

The idea behind STAF is to run a very simple agent on all the machines that participate in the STAF testbed. Every machine can then run services on any other machine, subject to a so-called trust level. In practice, one machine will act as what I called the 'test management' machine, and will coordinate the test runs by sending jobs to the test clients. STAX is one of the services offered on top of the low-level STAF plumbing. It greatly facilitates the distribution of jobs to the test clients and the collection and logging of test results. STAX jobs are XML files spiced up with special <script> tags that contain Python code (actually Jython, but there are no differences for the purpose of this tutorial). This in itself was for us a major reason for choosing STAF over other solutions.

Here is the test environment that I will use in my example:

  • 3 clients that will run the test harness: one called win1 running some flavor of Windows, one called linux1 running some flavor of Linux, and one called sol1 running some flavor of Solaris
  • 1 test management machine, called mgmt1
  • 1 desktop PC, called desktop1

What follows is a step-by-step guide to configuring STAF and STAX on the machines in the example testbed:

Step 1: Install and configure STAF on the test clients

Install STAF on all 5 machines (I refer the readers to the STAF User Guide for details on installing STAF). Here is an example of a STAF configuration file (on Unix, it's usually in /usr/local/staf/bin/STAF.cfg) for one of the 3 client machines:


# Enable TCP/IP connections
interface tcpip

# Turn on tracing of internal errors and deprecated options
trace on error deprecated

serviceloader library STAFDSLS

SET CONNECTTIMEOUT 15000
SET MAXQUEUESIZE 10000

TRUST LEVEL 5 MACHINE mgmt1

Note that the 3 client machines need to increase the trust level (default is 3) for the test management machine, so that the latter can initiate jobs on the clients.

Step 2: Install and configure STAX on the management host


Install the STAX service on the test management machine. In STAF parlance, this machine is called the STAX Service machine (readers are referred to the STAX User's Guide for details on STAX). There are a few things to remember in terms of requirements for this machine:
    • Java 1.2 or later needs to be installed
    • The following 2 variables need to be set (for example in .bash_profile):
export CLASSPATH=$CLASSPATH:/usr/local/staf/lib/JSTAF.jar	

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/staf/lib
The STAF.cfg configuration file needs to have the STAX service added to it (note the increase to trust level 4 for the desktop1 machine, which will act as the monitoring machine and needs special rights to connect to mgmt1):

# Enable TCP/IP connections
interface tcpip

# Turn on tracing of internal errors and deprecated options
trace on error deprecated

serviceloader library STAFDSLS

SERVICE STAX LIBRARY JSTAF EXECUTE /usr/local/staf/services/STAX/STAX.jar

SET MAXQUEUESIZE 10000

TRUST LEVEL 4 MACHINE desktop1


Step 3: Start the STAF agent

Run STAFProc on all 5 machines. STAFProc is the STAF agent that listens on a specific port (default is 6500) for STAF-specific commands.

Step 4: Create STAX job files


Create the STAX XML job files that will be interpreted by the STAX service on mgmt1. Here is an example of a job file, called client_test_harness.xml, that will run a test harness on our 3 clients:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "C:\QA\STAF\stax.dtd">
<stax>

  <!--
    The following <script> element is overridden if the global_vars.py SCRIPTFILE is used.
    A SCRIPTFILE can be specified either in the STAX Monitor, or directly when submitting a job to STAX.
  -->

  <script>
    VERSION = '1.0.1'
    HARNESS_TIMER_DURATION = '60m'

    clients_os = { 'win1': 'win',
                   'sol1': 'unix',
                   'linux1': 'unix'
                 }
    harness_path = {
        'unix': '/qa/harness',
        'win' : 'C:/qa/harness'
    }
    tests_unix = [
        [ 'unix_perms', 'brv_unix_perms.py' ],
        [ 'long_names', 'brv_long_names.py' ]]
    tests_win = [
        [ 'unicode_names', 'brv_unicode_names.py' ]]
  </script>

  <defaultcall function="Main"/>

  <function name="Main">
    <sequence>
      <import machine="'mgmt1'" file="'/QA/STAF/stax_jobs/log_result.xml'"/>
      <call function="'ClientTestHarness'">
        [clients_os, harness_path, tests_unix, tests_win]
      </call>
    </sequence>
  </function>

  <function name="ClientTestHarness">
    <function-list-args>
      <function-required-arg name='clients_os'/>
      <function-required-arg name='harness_path'/>
      <function-required-arg name='tests_unix'/>
      <function-required-arg name='tests_win'/>
      <function-other-args name='args'/>
    </function-list-args>
    <paralleliterate var="machine" in="clients_os.keys()">
      <sequence>
        <script>
          os_type = clients_os[machine]
          tests = {}
          if os_type == 'unix':
              tests = tests_unix
          if os_type == 'win':
              tests = tests_win
        </script>
        <iterate var="test" in="tests">
          <sequence>
            <script>
              test_name = machine + "_" + test[0]
            </script>
            <testcase name="test_name">
              <sequence>
                <script>
                  cmdline = harness_path[os_type] + "/" + test[1]
                </script>
                <timer duration="HARNESS_TIMER_DURATION">
                  <process>
                    <location>machine</location>
                    <command>'python'</command>
                    <parms>cmdline</parms>
                    <stderr mode="'stdout'"/>
                    <returnstdout/>
                  </process>
                </timer>
                <call function="'LogResult'">machine</call>
              </sequence>
            </testcase>
          </sequence>
        </iterate>
      </sequence>
    </paralleliterate>
  </function>
</stax>

The syntax may seem overwhelming at first, but it turns out to be quite manageable once you get the hang of it. Here are the salient points in the above file:
  • The first <script> element sets a number of Python variables which are then used in the body of the XML document; think of them as global constants
  • There is one function called in the <defaultcall> element; this function is called Main and is defined in the first <function> element
  • The Main function imports another XML file (log_result.xml) via the <import> element, so that this job can call a function (LogResult) defined in the imported file
  • The Main function then calls a function called ClientTestHarness, passing it as arguments four Python variables defined at the top
  • Almost all the action in this job happens in the ClientTestHarness function, which starts by declaring its required arguments, then proceeds by running a series of tests in parallel on each of our 3 client machines; the parallelism is achieved by means of the <paralleliterate> element
  • The <script> element that follows is simple Python code that retrieves the test suite to be run from the global dictionaries, via the machine name
  • On each machine, the tests in the test suite are executed sequentially, via the <iterate> element
  • A <testcase> element is defined for each test, so that we can easily retrieve the test statistics at the end of the run, via the LogResult function
  • For each test, the ClientTestHarness function executes a <process> element, which runs a command (for example brv_unix_perms.py) on the target machine; the <process> element is surrounded by a <timer> element which will mark the test as failed if the specified time interval reaches its limit
  • The <process> element also specifies that the command to be executed redirect stderr to stdout, and return stdout
  • Finally, the ClientTestHarness function calls LogResult, passing it the machine name as the only argument
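The control flow the job expresses (machines in parallel, tests on each machine in sequence, each test under a timer) maps onto familiar threading constructs. Here is a rough sketch of the same structure in plain Python; the machine names and test lists echo the job file, while the harness invocation is faked with a short sleep:

```python
import threading
import time

clients_os = {"win1": "win", "sol1": "unix", "linux1": "unix"}
tests_unix = ["brv_unix_perms.py", "brv_long_names.py"]
tests_win = ["brv_unicode_names.py"]
results = {}

def run_test(machine, test, timeout_in_sec=0.5):
    """Stand-in for the <process>/<timer> pair: run with a time limit."""
    worker = threading.Thread(target=lambda: time.sleep(0.05))  # fake harness
    worker.start()
    worker.join(timeout_in_sec)
    results[(machine, test)] = "fail (timed out)" if worker.is_alive() else "pass"

def run_machine(machine):
    """Like <iterate>: the tests for one machine run sequentially."""
    tests = tests_unix if clients_os[machine] == "unix" else tests_win
    for test in tests:
        run_test(machine, test)

# Like <paralleliterate>: one thread per client machine
threads = [threading.Thread(target=run_machine, args=(m,)) for m in clients_os]
for t in threads:
    t.start()
for t in threads:
    t.join()
```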

The LogResult function is defined in the log_result.xml file. Its tasks are to:
  • interpret the return code (which is a STAF-specific variable called RC) and the output (which is a STAX-specific variable called STAXResult) for each test case
  • set the result of the test run to PASS or FAIL
  • log it accordingly

Here is the log_result.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "C:\QA\STAF\stax.dtd">
<stax>
  <function name="LogResult">
    <function-list-args>
      <function-required-arg name='machine'/>
      <function-other-args name='args'/>
    </function-list-args>
    <if expr="RC != 0">
      <sequence>
        <tcstatus result="'fail'">'Failed with RC=%s' % RC</tcstatus>
        <log level="'error'">'Process failed with RC=%s, Result=%s' % (RC, STAXResult)</log>
      </sequence>
      <elseif expr="STAXResult != None">
        <iterate var="file_info" in="STAXResult" indexvar="i">
          <if expr="file_info[0] == 0">
            <sequence>
              <script>
                import re
                fail = re.search('FAIL', file_info[1])
                log_msg = 'HOST:%s\n\n%s' % (machine, file_info[1])
              </script>
              <if expr="fail">
                <sequence>
                  <tcstatus result="'fail'">'Test output contains FAIL'</tcstatus>
                  <log level="'error'">log_msg</log>
                </sequence>
                <else>
                  <sequence>
                    <tcstatus result="'pass'"></tcstatus>
                    <log level="'info'">log_msg</log>
                  </sequence>
                </else>
              </if>
            </sequence>
            <else>
              <log level="'error'">'Retrieval of file %s contents failed with RC=%s' % (i, file_info[0])</log>
            </else>
          </if>
        </iterate>
      </elseif>
      <else>
        <log level="'info'">'STAXResult is None'</log>
      </else>
    </if>
  </function>
</stax>

Step 5: Run STAX jobs on the test clients

From the desktop1 machine, which in STAX is called the monitoring machine, send a carefully crafted STAF command to the test management machine, telling it to run the client_test_harness.xml job:

STAF mgmt1 STAX EXECUTE FILE /QA/STAF/stax_jobs/client_test_harness.xml MACHINE mgmt1 SCRIPTFILE /QA/STAF/stax_jobs/global_vars.py JOBNAME "CLIENT_TEST_HARNESS" SCRIPT "VERSION='1.0.2'" CLEARLOGS Enabled

The above incantation runs a STAF command by specifying a service (STAX) and a request (EXECUTE), then passing various arguments to the request, the most common ones being a FILE (the path to the job XML file), a MACHINE to run the job file on (mgmt1), and a JOBNAME (which can be any string value).

Two other arguments, entirely optional, are Python-specific:

  • SCRIPTFILE -- points to a Python file whose code will be interpreted after the code in the top-level <script> element of the job file; in my example, the global_vars.py file contains definitions of Python variables that will override the variables defined in the job's <script> element
  • SCRIPT -- can contain any inline Python code, which will be interpreted after any code in the job's top-level <script> element, and after any code in the SCRIPTFILE; in my example, the VERSION variable is set to 1.0.2 on the command line via the SCRIPT argument, because it is retrieved from the nightly build email notification, and thus is not known in advance. The value 1.0.2 will override whatever values are given in the <script> element and in global_vars.py

To summarize, a SCRIPTFILE file is commonly used as a "static" repository for Python variables that are used across several job files, whereas the SCRIPT inline code is used to pass "dynamic" values for Python variables on the command line.
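The override order (the job's top-level <script>, then SCRIPTFILE, then SCRIPT) behaves like successive executions into a single namespace, where later code wins. A small sketch of that layering; the variable names mirror the example above, but the exec-based mechanism is only an illustration, not STAX internals:

```python
job_script = "VERSION = '1.0.1'"          # top-level <script> in the job file
scriptfile = "VERSION = '1.0.0-static'"   # global_vars.py contents (illustrative)
inline_script = "VERSION = '1.0.2'"       # SCRIPT argument on the command line

namespace = {}
for layer in (job_script, scriptfile, inline_script):
    exec(layer, namespace)  # each later layer overrides the earlier ones

# namespace['VERSION'] is now '1.0.2', the value passed via SCRIPT
```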

The above STAF command, if successful, returns an integer that represents the job ID. Based on this ID, we can query the log service on the STAX machine (mgmt1) by running this command:

STAF mgmt1 LOG QUERY MACHINE mgmt1 LOGNAME STAX_Job_jobID

STAX also offers a GUI monitoring tool called the STAX Job Monitor that is usually run on the monitoring machine (desktop1 in our example). The tool is a Java application that is started via the command line (java -jar STAXMon.jar) in the directory which contains the STAX service jar files. The Job Monitor displays the processes that are run within the job, as well as the test case information (test name, pass/fail status, duration) for each test in the test suite.

Conclusion

I will now show how all these steps fit together and give us the capability to run the automated smoke-test scenario I described in the beginning of this section.

  • A build completion message is sent to several distribution lists with a subject that contains the new version of the software
  • The build message is forwarded via a mail alias to an account on the test management machine
  • A .procmailrc file on the test management machine triggers a Python script that runs the "STAF mgmt1 STAX EXECUTE ..." command shown earlier. The script then sits in a loop and periodically queries the log file (via the LOG QUERY command) for the new job identified by jobID. When it sees a line containing "Stop|JobID: jobID", the script sends a message with the job log in its body and the test count (overall, pass and fail) in its subject
  • The PARALLELITERATE and ITERATE constructs available in STAX allow us to achieve both parallel and sequential operations for the test run: we run the test harness in parallel on all clients, then on each client we run the individual tests comprising the harness in a sequential order. Another very useful STAX construct is TIMER, which makes it very easy to time out the failed tests so that the whole test run is not held up
  • Since all the individual tests are written using our framework, all the test results are also saved in the Firebird database and can be easily inspected via a Web interface
Two more things are worth mentioning:
  • Support for STAF/STAX is top-notch and comes via the staf-users mailing list from the IBM developers working on this project. I had two questions answered within an hour of each posting.
  • STAF/STAX is used as the test distribution platform for the Linux Test Project. The January 2005 issue of "Linux Journal" has an article on the Linux Test Project that mentions STAF/STAX.

Wednesday, December 15, 2004

PyFIT tutorial on fitnesse.org

I turned the PyFIT tutorial blog entry into a standalone HTML page that you can find here. I also updated the PythonFit page over at fitnesse.org to point to the same HTML page. I need to figure out how to add raw HTML to the wiki at fitnesse.org.