Wednesday, February 01, 2006

Continuous integration with buildbot

From the buildbot manual:

"The BuildBot is a system to automate the compile/test cycle required by most software projects to validate code changes. By automatically rebuilding and testing the tree each time something has changed, build problems are pinpointed quickly, before other developers are inconvenienced by the failure. The guilty developer can be identified and harassed without human intervention. By running the builds on a variety of platforms, developers who do not have the facilities to test their changes everywhere before checkin will at least know shortly afterwards whether they have broken the build or not. Warning counts, lint checks, image size, compile time, and other build parameters can be tracked over time, are more visible, and are therefore easier to improve."

All this sounded very promising, so I embarked on the journey of installing and configuring buildbot for the application that Titus and I will be presenting at our PyCon tutorial later this month. I have to say it wasn't trivial to get buildbot to work, and I was hoping to find a simple HOWTO somewhere on the Web, but since I haven't found it, I'm jotting down these notes for future reference. I used the latest version of buildbot, 0.7.1, on a Red Hat 9 Linux box. In the following discussion, I will refer to the application built and tested via buildbot as APP.

Installing buildbot

This step is easy. Just get the package from its SourceForge download page and run "python setup.py install" to install it. A special utility called buildbot will be installed in /usr/local/bin.

Update 2/21/06

I didn't mention in my initial post that you also need to install a number of pre-requisite packages before you can install and run buildbot (thanks to Titus for pointing this out):

a) install ZopeInterface; one way of quickly doing it is running the following command as root:

# easy_install http://www.zope.org/Products/ZopeInterface/3.1.0c1/ZopeInterface-3.1.0c1.tgz

b) install CVSToys; the quick way:

# easy_install http://twistedmatrix.com/users/acapnotic/wares/code/CVSToys/CVSToys-1.0.9.tar.bz2

c) install Twisted; there is no quick way, so I just downloaded the latest version of TwistedSumo (although technically you just need Twisted and TwistedWeb):

# wget http://tmrc.mit.edu/mirror/twisted/Twisted/2.2/TwistedSumo-2006-02-12.tar.bz2
# tar jxvf TwistedSumo-2006-02-12.tar.bz2
# cd TwistedSumo-2006-02-12
# python setup.py install


Creating the buildmaster

The buildmaster is the machine that triggers the build-and-test process by sending commands to other machines known as the buildslaves. The buildmaster itself does not run the build-and-test commands; the slaves do that and then send the results back to the master, which displays them in a nice HTML format.

The build-and-test process can be scheduled periodically, or can be triggered by source code changes. I took the easy way of just triggering it periodically, every 6 hours.

On my Linux box, I created a user account called buildmaster, logged in as the buildmaster user and created a directory called APP. Then I ran this command:

buildbot master /home/buildmaster/APP

This created some files in the APP directory, the most important of them being a sample configuration file called master.cfg.sample. I copied that file to master.cfg.

All this was easy. Now comes the hard part.

Configuring the buildmaster

The configuration file master.cfg is really just Python code, and as such it is easy to modify and extend -- if you know where to modify and what to extend :-).

Here are the most important sections of this file, with my modifications:

Defining the project name and URL

Search for c['projectName'] in the configuration file. The default lines are:

c['projectName'] = "Buildbot"
c['projectURL'] = "http://buildbot.sourceforge.net/"


I replaced them with:

c['projectName'] = "App"
c['projectURL'] = "http://www.app.org/"

where App and www.app.org are the application's name and URL, respectively. These values are displayed by buildbot in its HTML status page.

Defining the URL for the buildbot status page

Search for c['buildbotURL'] in the configuration file. The default line is:

c['buildbotURL'] = "http://localhost:8010/"

I changed it to:

c['buildbotURL'] = "http://www.app.org:9000/"

You need to make sure that whatever port you choose here is actually available on the host machine, and is externally reachable if you want to see the HTML status page from another machine.

If you replace the default port 8010 with another value (9000 in my case), you also need to specify that value in this line:

c['status'].append(html.Waterfall(http_port=9000))
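
One way to keep the two port values from drifting apart is to derive the Waterfall port from the URL. Here is a minimal sketch in modern Python (the Python 2.4 I used at the time has the same function in the urlparse module). The helper name port_from_url is my own invention and not part of buildbot:

```python
from urllib.parse import urlparse

def port_from_url(url, default=8010):
    """Return the TCP port named in url, or default if the URL has none."""
    port = urlparse(url).port
    return port if port is not None else default

buildbot_url = "http://www.app.org:9000/"
http_port = port_from_url(buildbot_url)
print(http_port)  # 9000
```

In master.cfg, the result could then be passed along as html.Waterfall(http_port=http_port) instead of repeating the number by hand.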

Defining the buildslaves

Search for c['bots'] in the configuration file. The default line is:

c['bots'] = [("bot1name", "bot1passwd")]

I modified the line to look like this:

c['bots'] = [("x86_rh9", "slavepassword")]

Here I defined a buildslave called x86_rh9 with the given password. If you have more slave machines, just add more tuples to the above list. Make a note of these values, because you will need to use the exact same ones when configuring the buildslaves. More on this when we get there.
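
As an illustration of adding more tuples, here is a sketch with a second slave. The name x86_fc4 and its password are hypothetical, and the plain dictionary c stands in for the one that master.cfg fills in. The uniqueness check at the end is just a sanity check I would add, not something buildbot requires you to write:

```python
c = {}  # stand-in for the configuration dictionary in master.cfg

c['bots'] = [
    ("x86_rh9", "slavepassword"),
    ("x86_fc4", "anotherpassword"),  # hypothetical second slave
]

# Catch copy-and-paste mistakes: each slave should have its own name.
names = [name for name, _password in c['bots']]
assert len(names) == len(set(names)), "duplicate slave name"
print(names)
```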

Configuring the schedulers

Search for c['schedulers'] in the configuration file. I commented out all the lines in that section and I added these lines:

# We build and test every 6 hours
periodic = Periodic("every_6_hours", ["x86_rh9_trunk"], 6*60*60)
c['schedulers'] = [periodic]

Here I defined a schedule of type Periodic with a name of every_6_hours, which will run a builder called x86_rh9_trunk with a periodicity of 6*60*60 seconds (i.e. 6 hours). The builder name needs to correspond to an actual builder, which we will define in the next section.
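
To see what the 6*60*60 interval means in practice, here is a small standalone sketch that lists when builds would fall. It assumes the first build happens when the master starts, which is what I observed; the start time below is made up, and none of this code talks to buildbot:

```python
from datetime import datetime, timedelta

PERIOD_SECONDS = 6 * 60 * 60  # same value passed to Periodic above

def upcoming_builds(start, count, period=PERIOD_SECONDS):
    """List the times of the first `count` builds, beginning at `start`."""
    return [start + timedelta(seconds=i * period) for i in range(count)]

start = datetime(2006, 2, 1, 12, 0, 0)  # hypothetical master start time
for when in upcoming_builds(start, 4):
    print(when.strftime("%Y-%m-%d %H:%M"))
```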

I also modified the import line at the top of the config file from:

from buildbot.scheduler import Scheduler

to:

from buildbot.scheduler import Scheduler, Periodic


Configuring the build steps

This is the core of the config file, because this is where you define all the steps that your build-and-test process will consist of.

Search for c['builders'] in the configuration file. I commented out all the lines from:

cvsroot = ":pserver:anonymous@cvs.sourceforge.net:/cvsroot/buildbot"


to:

c['builders'] = [b1]


I added instead these lines:

source = s(step.SVN, mode='update',
           baseURL='http://svn.app.org/repos/app/',
           defaultBranch='trunk')

unit_tests = s(UnitTests, command="/usr/local/bin/python setup.py test")
text_tests = s(TextTests, command="/usr/local/texttest/texttest.py")
build_egg = s(BuildEgg, command="%s/build_egg.py" % BUILDBOT_SCRIPT_DIR)
install_egg = s(InstallEgg, command="%s/install_egg.py" % BUILDBOT_SCRIPT_DIR)

f = factory.BuildFactory([source,
                          unit_tests,
                          text_tests,
                          build_egg,
                          install_egg,
                          ])

c['builders'] = [
    {'name': 'x86_rh9_trunk',
     'slavename': 'x86_rh9',
     'builddir': 'test-APP-linux',
     'factory': f},
]

First off, here's what the buildbot manual has to say about build steps:

BuildSteps are usually specified in the buildmaster's configuration file, in a list of “step specifications” that is used to create the BuildFactory. These “step specifications” are not actual steps, but rather a tuple of the BuildStep subclass to be created and a dictionary of arguments. There is a convenience function named “s” in the buildbot.process.factory module for creating these specification tuples.
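
To make the "tuple of class and dictionary" idea concrete, here is a stand-in for s written purely from the manual's description above. It is not copied from buildbot's source, and FakeStep is a hypothetical placeholder for a real BuildStep subclass:

```python
def s(steptype, **kwargs):
    """Bundle a step class with its keyword arguments, without creating the step."""
    return (steptype, kwargs)

class FakeStep:
    """Hypothetical placeholder for a BuildStep subclass."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

spec = s(FakeStep, command="python setup.py test")
step_class, step_args = spec

# The step itself is only created later, from the stored class and arguments.
step = step_class(**step_args)
print(step.kwargs["command"])
```

The point of the indirection is that the configuration file only describes the steps; the actual step objects can be created fresh for each build.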

In my example above, I have the following build steps: source, unit_tests, text_tests, build_egg and install_egg.

source is a build step of type SVN which does an SVN update of the source code by going to the specified SVN URL; the default branch is trunk, which was fine with me. If you need to check out a different branch, see the buildbot documentation on SVN operations.

For different types (i.e. classes) of steps, it's a good idea to look at the file step.py in the buildbot/process directory (which got installed in my case in /usr/local/lib/python2.4/site-packages/buildbot/process/step.py).

The step.py file already contains pre-canned steps for configuring, compiling and testing your freshly-updated source code. They are called respectively Configure, Compile and Test, and are subclasses of the ShellCommand class, which basically executes a given command, captures its stdout and stderr and returns the exit code for that command.
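
The contract described here (run a command, capture stdout and stderr, return the exit code) can be sketched with the standard library alone. This is not how buildbot implements ShellCommand, which runs the command on the slave; run_command is a hypothetical helper that only illustrates the idea, in modern Python:

```python
import subprocess
import sys

def run_command(argv):
    """Run argv and return (exit_code, stdout_text, stderr_text)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

# Use the current interpreter as a harmless command to run.
code, out, err = run_command([sys.executable, "-c", "print('tests passed')"])
print(code, out.strip())
```

A non-zero exit code is what marks a step as failed, which is why commands such as "python setup.py test" need to return a proper exit status.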

However, I wanted at least some control over the text that appears in the buildbot HTML status page next to my steps. For example, I wanted my unit test step to say "unit tests" instead of the default "test". For this, I derived a class from step.ShellCommand and called it UnitTests. I created a file called extensions.py in the same directory as master.cfg and added my own classes, which basically just redefine 3 variables. Here is my entire extensions.py file:

# ShellCommand lives in buildbot/process/step.py, so import it by its full path
from buildbot.process.step import ShellCommand

class UnitTests(ShellCommand):
    name = "unit tests"
    description = ["running unit tests"]
    descriptionDone = [name]

class TextTests(ShellCommand):
    name = "texttest regression tests"
    description = ["running texttest regression tests"]
    descriptionDone = [name]

class BuildEgg(ShellCommand):
    name = "egg creation"
    description = ["building egg"]
    descriptionDone = [name]

class InstallEgg(ShellCommand):
    name = "egg installation"
    description = ["installing egg"]
    descriptionDone = [name]

The two variables that I wanted to customize are description, which appears in the buildbot HTML status page while that particular step is being executed, and descriptionDone, which appears in the status page once the step is finished.

To make master.cfg aware of my custom classes, I added this line to the top of the config file:

from extensions import UnitTests, TextTests, BuildEgg, InstallEgg

Let's look at the custom build steps I added. For the unit_tests step, I'm telling buildbot to run the command python setup.py test on the buildslaves and report back the results. For the text_tests step, the command is /usr/local/texttest/texttest.py, which is where I installed the TextTest acceptance/regression test package. For build_egg and install_egg, I'm running my own custom scripts build_egg.py and install_egg.py on the buildslave, using the BUILDBOT_SCRIPT_DIR variable which I defined at the top of the configuration file as:

BUILDBOT_SCRIPT_DIR = "/home/buildbot/APP/bot_scripts"

As you add more build steps, you need to also add them to the factory object:

f = factory.BuildFactory([source,
                          unit_tests,
                          text_tests,
                          build_egg,
                          install_egg,
                          ])

The final step in dealing with build steps is defining the builders, which correspond to the buildslaves. In my case, I only have one buildslave machine, so I'm only defining one builder called x86_rh9_trunk which is running on the slave called x86_rh9. The slave will use a builddir named test-APP-linux; this is the directory where the source code will get checked out and where all the build steps will be performed.

Note: the name of the builder x86_rh9_trunk needs to correspond with the name you indicated when defining the scheduler.
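
The naming rule above can be checked mechanically before starting the master. Here is a minimal sketch that works on plain Python data shaped like the c['bots'] and c['builders'] values from this post; check_names is a hypothetical helper of my own, not a buildbot function:

```python
def check_names(bots, builders, scheduled_builder_names):
    """Return a list of problems found; an empty list means the names agree."""
    problems = []
    slave_names = {name for name, _password in bots}
    builder_names = {b["name"] for b in builders}
    for b in builders:
        if b["slavename"] not in slave_names:
            problems.append("builder %s uses unknown slave %s"
                            % (b["name"], b["slavename"]))
    for name in scheduled_builder_names:
        if name not in builder_names:
            problems.append("scheduler refers to unknown builder %s" % name)
    return problems

bots = [("x86_rh9", "slavepassword")]
builders = [{"name": "x86_rh9_trunk", "slavename": "x86_rh9",
             "builddir": "test-APP-linux", "factory": None}]
print(check_names(bots, builders, ["x86_rh9_trunk"]))  # []
```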

Here is again the code fragment which defines the builder:

c['builders'] = [
    {'name': 'x86_rh9_trunk',
     'slavename': 'x86_rh9',
     'builddir': 'test-APP-linux',
     'factory': f},
]

We're pretty much done with configuring the buildmaster. Now it's time to create and configure a buildslave.

Creating and configuring a buildslave

On my Linux box, I created a user account called buildbot, logged in as the buildbot user and created a directory called APP. Then I ran this command:

buildbot slave /home/buildbot/APP localhost:9989 x86_rh9 slavepassword

Note that most of these values have already been defined in the buildmaster's master.cfg file:
  • localhost is the host where the buildmaster is running (if you're running the master on a different machine from the one running the slave, you need to indicate here a name or an IP address which is reachable from the slave machine)
  • 9989 is the default port that the buildmaster listens on (it is assigned to c['slavePortnum'] in master.cfg)
  • x86_rh9 is the name of this slave, and slavepassword is the password for this slave (both values are assigned in master.cfg to c['bots'])
This command creates a file called buildbot.tac in the APP directory. You can edit the file and change the values of all the elements indicated above.

I also created my custom scripts for building and installing a Python egg. I created a sub-directory of APP called bot_scripts, and in there I put build_egg.py and install_egg.py, the 2 scripts that are referenced in the "Build steps" section of the buildmaster's configuration file.

Starting and stopping the buildmaster and the buildslave

To start the buildmaster, I ran this command as user buildmaster:

buildbot start /home/buildmaster/APP

To stop the buildmaster, I used this command:

buildbot stop /home/buildmaster/APP

When I needed the buildmaster to re-read its configuration file, I used this command:

buildbot sighup /home/buildmaster/APP

I used similar commands to start and stop the buildslave, the only difference being that I was logged in as user buildbot and I indicated /home/buildbot/APP as the BASEDIR directory for the buildbot start/stop/sighup commands.

If everything went well, you should be able at this point to see the buildbot HTML status page at the URL that you defined in the buildmaster master.cfg file (in my case this was http://www.app.org:9000/).

If you can't reach the status page, something might have gone wrong during the startup of the buildmaster. Inspect the file /home/buildmaster/APP/twistd.log for details. I had some configuration file errors initially which prevented the buildmaster from starting.
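
When the log is long, the most recent lines are usually the interesting ones. Here is a small sketch for pulling them out; tail is a hypothetical helper, it makes no assumptions about the log format, and the demonstration runs on a throwaway file rather than a real twistd.log:

```python
import os
import tempfile
from collections import deque

def tail(path, count=20):
    """Return the last `count` lines of the text file at `path`."""
    with open(path, errors="replace") as log:
        return [line.rstrip("\n") for line in deque(log, maxlen=count)]

# Demonstration on a throwaway file; in practice pass the path to twistd.log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as demo:
    demo.write("line 1\nline 2\nline 3\n")
demo_path = demo.name
print(tail(demo_path, count=2))
os.remove(demo_path)
```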

Whenever the buildmaster is started, it will initiate a build-and-test process. If it can't contact the buildslave, you will see a red cell on the status page with a message such as

18:50:52
ping
no slave

In this case, you need to look at the slave's log file, which in my case is in /home/buildbot/APP/twistd.log. Make sure the host name and port numbers, as well as the slave name and password are the same in the slave's buildbot.tac file and in the master's master.cfg file.
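
The comparison can be written down as a short checklist in code. This sketch uses plain dictionaries holding the values from this post; the slave-side key names are my own labels for illustration and are not the variable names used in buildbot.tac:

```python
def slave_mismatches(master, slave):
    """Return the names of the settings on which master and slave disagree."""
    mismatches = []
    if slave["port"] != master["slavePortnum"]:
        mismatches.append("port")
    if (slave["name"], slave["password"]) not in master["bots"]:
        mismatches.append("name/password")
    return mismatches

master = {"slavePortnum": 9989, "bots": [("x86_rh9", "slavepassword")]}
slave = {"host": "localhost", "port": 9989,
         "name": "x86_rh9", "password": "slavepassword"}
print(slave_mismatches(master, slave))  # []
```

The host name is not compared here because it only needs to be reachable from the slave machine; that part has to be tested on the network itself.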

If the slave is reachable from the master, then the build-and-test process should unfold, and you should end up with something like this on the status page:

APP: last build successful; current activity: idle

time (PDT)   changes   x86_rh9_trunk
13:06:19               egg installation (log)
                       egg creation (log)
13:05:53               texttest regression tests (log)
                       unit tests (log)
                       update (log)
                       Build 29

That's about it. I'm sure there are many more intricacies that I have yet to discover, but I hope that this HOWTO will be useful to people who are trying to give buildbot a chance, only to be discouraged by its somewhat steep learning curve.

And I can't finish this post without pointing you to the live buildbot status page for the Python code base.

14 comments:

cowmix said...

Can you go into more detail on how you got BuildBot working on RH9? Doesn't RH9 come with an older version of Python that the newer versions of Twisted will not work on?

Anyway.. this was a great post.

Grig Gheorghiu said...

More details about my setup on the RH9 box:

- I compiled and installed Python 2.4.2 from the tarball on python.org.
- I used the exact versions of Twisted and other required packages that are mentioned in the post.

cowmix said...

Cool.. did you replace the default Python installation or did you install 2.4.2 in parallel to the stock version?

Grig Gheorghiu said...

I installed Python 2.4.2 in /usr/local/bin, then I renamed /usr/bin/python to something else, so I didn't have to worry about /usr/bin being in front of /usr/local/bin in PATH.

Anonymous said...

Thank you very much for this invaluable guide. It really saved me a headache ;)

Slava Imeshev said...

You can avoid all mentioned installation hassles by switching to our Parabuild - it takes three minutes to install.

Anonymous said...

If only I came across this guide sooner, I would have saved myself a lot of headache about buildbot. :P Now, I just need to figure out how to get the build master to kick off a build when a change in a repository has been detected

Anonymous said...

EXCELLENT explanation of the create-slave option to buildbot (version 0.7.5). I never did get a clear explanation from the sourceforge buildbot documentation of just which port number was needed. Thank you so much for clarifying that tidbit of knowledge. They should mimic your presentation on that, at sourceforge, to help novices like me get going quicker.

Regards,
Carl E.

Raj said...

BuildFactory.addStep is not releasing spawned Process!!!!

To put things in a gist:

In the master.cfg:

1) I start a checkout
2) I add a step to invoke a service (python.exe --serviceName--)
3) start a build
4) Close a service

The problem arises when I invoke the service in step 2. BuildBot (actually ShellCommand) keeps hanging on to the started process --serviceName--, EVEN THOUGH the process is an independent Python process. I.e., it executes (2) successfully, remains there, and doesn't proceed to (3). (2) is a standalone process... BB won't proceed to the next step until it is killed.

Looking at the code, I see that ShellCommand maintains a stdio that won't close until the process exits.

I've even tried the want_stdout=0 attribute of Factory.addStep() to prevent output from being piped for THIS step, to no avail.

f.addStep(ShellCommand,
          command=['C:/Python23/python.exe', 'administer_position_service.py', 'start'],
          description='Start the position service script',
          want_stdout=0)

Please help

Jim said...

The command

buildbot master /home/buildmaster/APP

should be

buildbot create-master /home/buildmaster/APP

Andriy Drozdyuk said...

Great stuff. Don't be shy to write more like this ;-)

mattack said...

In 0.7.6 and later, c['slaves'] is used instead of c['bots']

This is to help people who find this in the future. IMHO it's much easier to follow than the buildbot manual.

Anonymous said...

You should think about updating this. But even as it is, this was very useful. Thank you! :)