Quick note to self, hopefully useful to others too:
If you compile Python 2.6 (or 2.5) from source, and you want to enable sqlite3 support (which is included in the stdlib for 2.5 and above), then you need to pass a special USE flag to the configuration command line, like this:
./configure USE="sqlite"
(note "sqlite" and not "sqlite3")
Monday, November 23, 2009
Thursday, November 19, 2009
5 years of blogging
Today marks the 5th anniversary of my blog. It's been a fun and rewarding experience, and I hope to never run out of interesting topics to post about ;-)
As a sort of retrospective, I was curious to see which of my blog posts have been getting the most traffic. Here's the top 10 over the last 9 months, according to Google Analytics:
1. Performance vs. load vs. stress testing (as an aside, I think this has been wildly popular because I inadvertently hit on a lot of keywords in the title)
2. Experiences deploying a large-scale infrastructure in Amazon EC2
3. Ajax testing with Selenium using waitForCondition
4. Useful tools for writing Selenium tests
5. Load balancing in EC2 with HAProxy
6. Python unit testing part 1: the unittest module
7. HTTP performance testing with httperf, autobench and openload
8. Running a Python script as a Windows service
9. Apache virtual hosting with Tomcat and mod_jk
10. Configuring Apache 2 and Tomcat 5.5 with mod_jk
It's interesting that 2 of the top 5 posts are Selenium-related. I think Selenium documentation is not where it needs to be generally speaking, hence people find my old posts on this topic. Adam, you really need to write a Selenium RC book!
As a sort of retrospective, I was curious to see which of my blog posts have been getting the most traffic. Here's the top 10 over the last 9 months, according to Google Analytics:
1. Performance vs. load vs. stress testing (as an aside, I think this has been wildly popular because I inadvertently hit on a lot of keywords in the title)
2. Experiences deploying a large-scale infrastructure in Amazon EC2
3. Ajax testing with Selenium using waitForCondition
4. Useful tools for writing Selenium tests
5. Load balancing in EC2 with HAProxy
6. Python unit testing part 1: the unittest module
7. HTTP performance testing with httperf, autobench and openload
8. Running a Python script as a Windows service
9. Apache virtual hosting with Tomcat and mod_jk
10. Configuring Apache 2 and Tomcat 5.5 with mod_jk
It's interesting that 2 of the top 5 posts are Selenium-related. I think Selenium documentation is not where it needs to be generally speaking, hence people find my old posts on this topic. Adam, you really need to write a Selenium RC book!
Tuesday, November 17, 2009
Monitoring multiple MySQL instances with Munin
I've been using Munin for its resource graphing capabilities. I especially like the fact that you can group servers together and watch a common metric (let's say system load) across all servers in a group -- something that is hard to achieve with other similar tools such as Cacti and Ganglia.
I did have the need to monitor multiple MySQL instances running on the same server. I am using mysql-sandbox to launch and manage these instances. I haven't found any pointers on how to use Munin to monitor several MySQL instances, so I rolled my own solution.
First of all, here is my scenario:
Locate mysql_* plugins already installed by the munin-node package in /usr/share/munin/plugins. I have 5 such plugins: mysql_bytes, mysql_isam_space_, mysql_queries, mysql_slowqueries and mysql_threads. I don't use ISAM, so I am ignoring mysql_isam_space_.
Step 2
Make a copy of each plugin for each MySQL instance you want to monitor. I know this contradicts the DRY principle, but I just wanted something quick that worked. The alternative is to modify the plugins and add extra parameters so they refer to specific MySQL instances.
For example, I made N + 1 copies of mysql_bytes and called them mysql_m0_bytes, mysql_m1_bytes,..., mysql_mN_bytes. In each copy, I modified the line "echo 'graph_title MySQL throughput'" to say "echo 'graph_title MySQL throughput for mN'". I did the same for mysql_threads, mysql_queries and mysql_slowqueries. So at the end of this step I have 4 x (N+1) new plugins in /usr/share/munin/plugins.
As I said, the alternative is to modify for example mysql_bytes and add new parameters, e.g. a parameter for the title of the graph. However, I don't know exactly how the plugin is called from within Munin, and I don't want to fiddle with the number and order of parameters it's called with -- which is why I chose the easy way out.
Step 3
Create symlinks in /etc/munin/plugins to the newly created plugins. Example:
(and similar for all the other plugins).
Step 4
Specify the path to msyqladmin for the newly defined plugins. You do this by editing the plugin configuration file /etc/munin/plugin-conf.d/munin-node.
Here's what I have in this file related to MySQL:
What the above lines say is that for each class of plugins starting with mysql_mN, I want to use the mysqladmin utility for that particular MySQL instance. The way mysql-sandbox works, mysqladmin is actually available per instance as "/mysql/mN/my sqladmin".
Note that the naming convention is important. The syntax of the munin-node plugin configuration file says that the plugin name "May include one wildcard ('*') at the start or end of the plugin-name, but not both, and not in the middle." Trust me, I haven't read this fine print initially, and I named my new plugins something like mysql_bytes_mN, then tried to configure the plugins as mysql_*mN. Pulling hair time ensued.
Step 5
Restart munin-node via 'service munin-node restart'. At this point you're supposed to see the new graphs under the Mysql link corresponding to the munin node where you set all this up. You should see N+1 graphs for each type of plugin (mysql_bytes, mysql_threads, mysql_queries and mysql_slowqueries). The graphs can be easily differentiated by their titles, e.g. 'MySQL throughput for m0' or 'MySQL queries for m1', etc.
One other quick tip: if you want to easily group nodes together, come up with some domain name which doesn't need to correspond to a real DNS domain name. For example, I called my MySQL servers something like mysqlN.myproject.mydomain.com in /etc/munin/munin.conf on the Munin server side. This allows me to see the myproject.mydomain.com group at a glance, with all the metrics for the nodes in that group shown side by side.
Here's how I defined each node in munin.conf:
(where N is 1, 2, etc)
I did have the need to monitor multiple MySQL instances running on the same server. I am using mysql-sandbox to launch and manage these instances. I haven't found any pointers on how to use Munin to monitor several MySQL instances, so I rolled my own solution.
First of all, here is my scenario:
- server running Ubuntu 9.04 64-bit
- N+1 MySQL instances installed as sandboxes rooted in /mysql/m0, /mysql/m1,..., /mysql/mN
- munin-node package version 1.2.6-8ubuntu3 (installed via 'apt-get install munin-node')
Locate mysql_* plugins already installed by the munin-node package in /usr/share/munin/plugins. I have 5 such plugins: mysql_bytes, mysql_isam_space_, mysql_queries, mysql_slowqueries and mysql_threads. I don't use ISAM, so I am ignoring mysql_isam_space_.
Step 2
Make a copy of each plugin for each MySQL instance you want to monitor. I know this contradicts the DRY principle, but I just wanted something quick that worked. The alternative is to modify the plugins and add extra parameters so they refer to specific MySQL instances.
For example, I made N + 1 copies of mysql_bytes and called them mysql_m0_bytes, mysql_m1_bytes,..., mysql_mN_bytes. In each copy, I modified the line "echo 'graph_title MySQL throughput'" to say "echo 'graph_title MySQL throughput for mN'". I did the same for mysql_threads, mysql_queries and mysql_slowqueries. So at the end of this step I have 4 x (N+1) new plugins in /usr/share/munin/plugins.
As I said, the alternative is to modify for example mysql_bytes and add new parameters, e.g. a parameter for the title of the graph. However, I don't know exactly how the plugin is called from within Munin, and I don't want to fiddle with the number and order of parameters it's called with -- which is why I chose the easy way out.
Step 3
Create symlinks in /etc/munin/plugins to the newly created plugins. Example:
ln -s /usr/share/munin/plugins/mysql_m0_bytes /etc/munin/plugins/mysql_m0_bytes
(and similar for all the other plugins).
Step 4
Specify the path to msyqladmin for the newly defined plugins. You do this by editing the plugin configuration file /etc/munin/plugin-conf.d/munin-node.
Here's what I have in this file related to MySQL:
[mysql_m0*] user m0 env.mysqladmin /mysql/m0/my sqladmin [mysql_m1*] user m1 env.mysqladmin /mysql/m1/my sqladmin [mysql_m2*] user m2 env.mysqladmin /mysql/m2/my sqladmin [mysql_m3*] user m3 env.mysqladmin /mysql/m3/my sqladmin
What the above lines say is that for each class of plugins starting with mysql_mN, I want to use the mysqladmin utility for that particular MySQL instance. The way mysql-sandbox works, mysqladmin is actually available per instance as "/mysql/mN/my sqladmin".
Note that the naming convention is important. The syntax of the munin-node plugin configuration file says that the plugin name "May include one wildcard ('*') at the start or end of the plugin-name, but not both, and not in the middle." Trust me, I haven't read this fine print initially, and I named my new plugins something like mysql_bytes_mN, then tried to configure the plugins as mysql_*mN. Pulling hair time ensued.
Step 5
Restart munin-node via 'service munin-node restart'. At this point you're supposed to see the new graphs under the Mysql link corresponding to the munin node where you set all this up. You should see N+1 graphs for each type of plugin (mysql_bytes, mysql_threads, mysql_queries and mysql_slowqueries). The graphs can be easily differentiated by their titles, e.g. 'MySQL throughput for m0' or 'MySQL queries for m1', etc.
One other quick tip: if you want to easily group nodes together, come up with some domain name which doesn't need to correspond to a real DNS domain name. For example, I called my MySQL servers something like mysqlN.myproject.mydomain.com in /etc/munin/munin.conf on the Munin server side. This allows me to see the myproject.mydomain.com group at a glance, with all the metrics for the nodes in that group shown side by side.
Here's how I defined each node in munin.conf:
[mysqlN.myproject.mydomain.com] address 192.168.0.N use_node_name yes
(where N is 1, 2, etc)
Behaviour Driven Infrastructure
I just read a post by Matthew Flanagan on Behaviour Driven Infrastructure or BDI, a concept that apparently originates with Martin Englund's post on this topic. The idea is that you describe what you need your system to do in natural language, using for example a tool such as Cucumber. What's more, you can then use the cucumber-nagios plugin to express the desired behaviour of the new system as a series of Nagios checks. The checks will initiall fail (just like in a TDD or BDD development cycle), but you will make them pass by deploying the appropriate packages and applications to the system.
I also expressed the need for automated testing of production deployments in one of my blog posts. However, BDI goes one step further, by describing a test plan for production deployments in natural language. Pretty cool, and again I can only wish that the Python testing tools kept up with Ruby-based tools such as Cucumber and friends....
I also expressed the need for automated testing of production deployments in one of my blog posts. However, BDI goes one step further, by describing a test plan for production deployments in natural language. Pretty cool, and again I can only wish that the Python testing tools kept up with Ruby-based tools such as Cucumber and friends....
Friday, November 13, 2009
Great series of posts on Tokyo Tyrant
Matt Yonkovit has started a series of posts on Tokyo Tyrant at Percona's MySQL Performance Blog. Great in-depth analysis of the reliability and performance of TT.
Part 1: Tokyo Tyrant -- is it durable?
Part 2: Tokyo Tyrant -- the performance wall
Part 3: Tokyo Tyrant -- write bottleneck
(parts 4 and 5, about replication and scaling, are hopefully coming soon)
Part 1: Tokyo Tyrant -- is it durable?
Part 2: Tokyo Tyrant -- the performance wall
Part 3: Tokyo Tyrant -- write bottleneck
(parts 4 and 5, about replication and scaling, are hopefully coming soon)
Tuesday, November 10, 2009
Google using buildbot for Chromium continuous integration
Via Ben Bangert, this gem of a page showing the continuous integration status for the Chromium project at Google. It's cool to see that they're using buildbot. But just like Ben says -- I wish they open sourced the look and feel of that buildbot status page ;-)
NFS troubleshooting with iostat and lsof
Scenario: you mount a volume exported from a NetApp on several Linux clients via NFS
Problem: you see constant high CPU usage on the NetApp, and some of the Linux clients become sluggish, primarily in terms of I/O
Troubleshooting steps:
1) If iostat is not already on the clients, install the sysstat utilities.
2) On each client mounting from the filer, or on a representative sample of the clients, run iostat with -n so that it shows NFS-related statistics. The following
command will run iostat every 5 seconds and show NFS stats in a nicely tabulated output:
3) Notice which client exhibits the most NFS operations per second, and correlate it with the NFS volume on that client which shows the most NFS reads and/or writes per second.
At this point you found the most likely culprit in terms of sending NFS traffic to the filer (there could be several client machines in this position, for example if they are part of a cluster).
5) If not already installed, download and install lsof.
6) Run lsof on the client(s) discovered in step 4, and grep for the directory representing the mount point of the NFS volume with the most reads and/or writes. For example:
This will show you, among other things, which processes are accessing which files under that directory. Usually something will jump out at you in terms of things that are going on outside of the ordinary. In my case, it was logrotate kicking off from a daily cron and compressing a huge log file -- since the log file was on a volume NFS-mounted from the filer, this caused the filer to do extra work, hence its increased CPU usage.
That's about it. Of courser these steps can be refined/modified/added to -- but even in this simple form, they can help you pinpoint NFS issues fairly quickly.
Problem: you see constant high CPU usage on the NetApp, and some of the Linux clients become sluggish, primarily in terms of I/O
Troubleshooting steps:
1) If iostat is not already on the clients, install the sysstat utilities.
2) On each client mounting from the filer, or on a representative sample of the clients, run iostat with -n so that it shows NFS-related statistics. The following
command will run iostat every 5 seconds and show NFS stats in a nicely tabulated output:
# iostat -nh 5
3) Notice which client exhibits the most NFS operations per second, and correlate it with the NFS volume on that client which shows the most NFS reads and/or writes per second.
At this point you found the most likely culprit in terms of sending NFS traffic to the filer (there could be several client machines in this position, for example if they are part of a cluster).
5) If not already installed, download and install lsof.
6) Run lsof on the client(s) discovered in step 4, and grep for the directory representing the mount point of the NFS volume with the most reads and/or writes. For example:
# lsof | grep /var/log
This will show you, among other things, which processes are accessing which files under that directory. Usually something will jump out at you in terms of things that are going on outside of the ordinary. In my case, it was logrotate kicking off from a daily cron and compressing a huge log file -- since the log file was on a volume NFS-mounted from the filer, this caused the filer to do extra work, hence its increased CPU usage.
That's about it. Of courser these steps can be refined/modified/added to -- but even in this simple form, they can help you pinpoint NFS issues fairly quickly.
Thursday, November 05, 2009
Automated deployments with Puppet and Fabric
I've been looking into various configuration management/automated deployment tools lately. At OpenX we used slack, but I wanted something with a bit more functionality than that (although I'm not badmouthing slack by any means -- it can definitely be bent to your will to do pretty much whatever you need in terms of automating your deployments).
From what I see, there are 2 types of configuration management tools:
I also like the categorization of these tools done by the people at ControlTier. Check out their blog post on Achieving Fully Automated Provisioning (which also links to a white paper PDF) for a nice diagram of hierarchy of deployment tools:
Before I go on, let me just say that in my evaluation of different deployment tools, I quickly eliminated the ones that use XML as their configuration language. In my experience, many tools that aim to be language-neutral end up using XML as their configuration language, and then they try to bend XML into a 'real' programming language, thus ending up reinventing the wheel badly. I'd rather use a language I like (Python in my case) as the glue around the various tools in my toolchain. Your mileage may vary of course.
OK, enough theory, let's see some practical examples of Puppet and Fabric in action. While Fabric is very easy to install and has a minimal learning curve, I can't say the same about Puppet. It takes a while to get your brain wrapped around it, and there isn't a lot of great documentation online, so for this reason I warmly recommend that you go buy the book.
Puppet examples
The way I organize things in Puppet is by creating a module for each major package I need to configure. On my puppetmaster server, under /etc/puppet/modules, I have directories such as apache2, mysqlserver, nginx, scribe, tomcat, tornado. Under each such directory I have 2 directories, one called files and one called manifests. I keep files and directories that I need downloaded to the puppet clients under files, and I create manifests (series of actions to be taken on the puppet clients) under manifests. I usually have a single manifest file called init.pp.
Here's an example of the init.pp manifest file for my tornado module:
I'll go through this file from the top down. At the very top I declare some variables that are referenced throughout the file. In particular, $url points to the location where I keep large files that I need every puppet client to download. I could have kept the files inside the tornado module's files directory, and they would have been served by the puppetmaster process, but I prefered to use Apache for better performance and scalability. Note that I do this only for relatively large files such as tar.gz archives.
The Exec stanza (note upper case E) defines certain parameters that will be common to all 'exec' actions that follow. In my case, I specify that I only want to log failures, and I also specify the path for the binaries called in the various 'exec' actions -- this is so I don't have to specify that path each and every time I call 'exec' (alternatively, you can specify the full path to each binary that you call).
The next 2 stanzas define files and directories that I want created on the puppet client nodes. Both 'exec' and 'file' are what is called 'types' in Puppet lingo. I first specify that I wanted the directory /opt/tornado created on each node, and by setting 'recurse=>true' I'm saying that the contents of that directory should be taken from a source which in my case is "puppet:///tornado/bin". This translates to a directory called bin which I created under /etc/puppet/modules/tornado/files. The contents of that directory will be copied over via the puppet internal communication protocol to the destination /opt/tornado by each Puppet client node.
The 'package' type that follows specifies the list of packages I want installed on the client nodes. Note that I don't need to specify how I want those packages installed, only what I want installed. Puppet's language is mostly declarative -- you tell Puppet what you want done, and it does it for you, using OS-specific commands that can vary from one client node to another. It so happens in my case that I know my client nodes all run Ubuntu, so I did specify Ubuntu/Debian-specific package names.
Next in my manifest file is a function definition. You can have these definitions inline, or in a separate manifest file. In my case, I declare a function called 'install_pkg' which takes 3 arguments: the package name, any extra arguments to be passed to the installer, and a module name to test the installation with. The function runs the easy_install command via the 'exec' type, but only if the specified module wasn't already installed on the system.
A paranthesis: the Puppet docs don't recommend the overuse of the 'exec' type, because it strays away from the declarative nature of the Puppet language. With exec, you specifically tell the remote node how to run a specific command, not merely what to do. I find myself using exec very heavily though. I means that I don't grokk Puppet fully yet, but it also means that Puppet doesn't have enough native types yet that can hide OS-specific commands.
One important thing to keep in mind is that for every exec action that you write, you need to specify a condition which becomes true after the successful completion of the action. Otherwise exec will be called each and every time the manifest will be inspected by the puppet nodes. Examples of such conditions:
After defining the function 'install_pkg', I call it 3 times, with various parameters, thus installing 3 Python packages -- virtualenv, boto and grizzled. Note that the syntax for calling a function is funky; it's one of the many things I don't necessarily like about Puppet, but it's an evil you learn to deal with.
Next up in my manifest file is a case statement based on the $architecture variable. Puppet makes several such variables available to your manifests, based on facts gathered from the remote nodes via Facter (which comes with Puppet).
Moving along, we have a package definition, a file definition -- both should be familiar by now -- followed by 3 exec actions:
One more thing here: once you have a module with manifests and files defined properly, you need to define the set of nodes that this module will apply to. The way I do it is to have the following files on the puppet master, in /etc/puppet/manifests:
1) A file called modules.pp which imports the modules I have defined, for example:
2) A file called nodetemplates.pp which contains definitions for 'node templates', i.e. classes of nodes that have the same composition in terms of modules they import and actions they perform. For example:
Here I defined 3 types of nodes: basenode (which includes the 'common' module), default (which applies to any machine not associated with a specific node definition) and webserver (which includes modules such as apache2, tomcat, tornado, and also requires that certain apache modules be enabled).
3) A file called nodes.pp which maps actual machine names of the Puppet clients to node template definitions. For example:
Much more documentation on node definition and node inheritance can be found on the Puppet wiki, especially in the Language Tutorial.
Fabric examples
In comparison with Puppet, Fabric is a breeze. I wanted to live on the cutting edge, so I installed the latest version (alpha, pre-1.0) from github via:
git clone git://github.com/bitprophet/fabric.git
I also easy_install'ed paramiko, which at this time brings down paramiko-1.7.6 (the Fabric documentation warns against using 1.7.5, but I assume 1.7.6 is OK).
Then I proceeded to create a so-called 'fabfile', which is a Python module containing fabric-specific functions. Here is a fragment of a file I called fab_nginx.py:
Note that in its 0.9 and later versions, Fabric uses the 'env' environment dictionary for configuration purposes (it used to be called 'config' pre-0.9).
My file starts by defining or assigning global env configuration variables, for example env.user and env.password (which are special pre-defined variables that I assign to, and which are used by Fabric when connecting to remote hosts via the ssh functionality provided by paramiko). I also define my own variables, for example env.nginx_conf_dir and env.nginx_conf_file. This makes it easy to pass the env dictionary as a whole when I need to format a string. Here's an example from another fab file:
I then have 2 function definitions in my fab file: one called prod, which sets env.hosts to a list of production nginx servers, and one called test, which does the same but sets env.hosts to test nginx servers.
Next I have the actions or tasks that I want performed on the remote hosts. Note the require function (similar in a way to the parameter used in Puppet manifests), which says that the function will only be executed if the given variable in the env dictionary has been assigned to (in my case, the variable is hosts, and I require that the value need to have been provided by either the prod or the test function). This is a useful mechanism to ensure that certain things have been defined before attempting to run commands on the remote servers.
The first task is called disable_server_in_lb. It takes a host name as a parameter, which is the server that I want disabled in the nginx configuration file. I use the handy 'comment' function available in fabric.contrib.files to comment out the lines that contain 'server HOSTNAME' in the nginx configuration. The comment function can be invoked with sudo rights on the remote host by passing use_sudo=True.
The task also calls another function defined in my fab file, restart_nginx. This taks simply calls '/etc/init.d/nginx restart' on the remote host, then verifies that nginx is running by calling is_nginx_running.
By default, when running a command on the remote host, if the command returns a non-zero code, it is considered to have failed by Fabric, and execution stops. In most cases, this is exactly what you want. In case you just want to run a command to get the output, and you don't care if it fails, you can set warn_only=True before running the command. I show an example if this in the is_nginx_running function.
The other main task in my fabfile is enable_server_in_lb. Here I use another handy function offered by Fabric -- the sed function. I substitute '#server HOSTNAME' with 'server HOSTNAME' in the nginx configuration file, then I restart nginx.
So now that we have the fabfile, how do we actually perform the tasks we defined? Let's assume we have a server called 'web1.mydomain.com' that we want disabled in nginx. We want to test our task first in a test environment, so we would call:
By specifying test on the command line before specifying the task, I ensure that Fabric first calls the function named 'test' in the fabfile, which sets the hosts to the test nginx servers.
Once I'm satisfied that this works well in the test environment, I call:
For a real deployment procedure, let's say for deploying tornado-based servers that are behind one or more nginx load balancer, I would do something like this:
This will deploy my new application code to web1.mydomain.com. Of course I can script this and call the above sequence for all my production servers. I assume here that I have another fabfile called fab_tornado.py and a task defined in in which does the actual deployment of the application code (most likely by downloading and easy_install'ing an egg).
That's it for today. It's been more like a whirlwind through two types of automated deployment tools -- Puppet/pull and Fabric/push. I didn't do justice to either of these tools in terms of their full capabilities, but I hope this will still be useful for some people as a starting point into their own explorations.
From what I see, there are 2 types of configuration management tools:
- The first type I call 'pull', which means that the servers pull their configurations and their marching orders in terms of applying those configurations from a centralized location -- both slack and Puppet are in this category. I think this is great for initial configuration of a server. As I described in another post, you can have a server bootstrap itself by installing Puppet (or slack) and then 'call home' to the central Puppet master (or slack repository) and get all the information it needs to configure itself
- The second type I call 'push', which means that you send configurations and commands to a list of servers from a centralized location -- Fabric is in this category. I think this is a more appropriate mode for application-specific deployments, where you might want to deploy first to a subset of servers, then push it to all servers.
I also like the categorization of these tools done by the people at ControlTier. Check out their blog post on Achieving Fully Automated Provisioning (which also links to a white paper PDF) for a nice diagram of hierarchy of deployment tools:
- at the bottom you have tools that install or launch the initial OS on physical servers (via Kickstart/Jumpstart/Cobbler) or on virtual machines/cloud instances (via various vendor tools, or by rolling your own)
- in the middle you have what they call 'system configuration' tools, such as Puppet/Chef/SmartFrog/cfengine/bcfg2
- at the top you have what they call 'application service deployment' tools, such as Fabric/Capistrano/Func -- and of course their own ControlTier tool
Before I go on, let me just say that in my evaluation of different deployment tools, I quickly eliminated the ones that use XML as their configuration language. In my experience, many tools that aim to be language-neutral end up using XML as their configuration language, and then they try to bend XML into a 'real' programming language, thus ending up reinventing the wheel badly. I'd rather use a language I like (Python in my case) as the glue around the various tools in my toolchain. Your mileage may vary of course.
OK, enough theory, let's see some practical examples of Puppet and Fabric in action. While Fabric is very easy to install and has a minimal learning curve, I can't say the same about Puppet. It takes a while to get your brain wrapped around it, and there isn't a lot of great documentation online, so for this reason I warmly recommend that you go buy the book.
Puppet examples
The way I organize things in Puppet is by creating a module for each major package I need to configure. On my puppetmaster server, under /etc/puppet/modules, I have directories such as apache2, mysqlserver, nginx, scribe, tomcat, tornado. Under each such directory I have 2 directories, one called files and one called manifests. I keep files and directories that I need downloaded to the puppet clients under files, and I create manifests (series of actions to be taken on the puppet clients) under manifests. I usually have a single manifest file called init.pp.
Here's an example of the init.pp manifest file for my tornado module:
class tornado { $tornado = "tornado-0.2" $url = "http://mydomain.com/download" $tornado_root_dir = "/opt/tornado" $tornado_log_dir = "/opt/tornado/logs" $tornado_src_dir = "/opt/tornado/$tornado" Exec { logoutput => on_failure, path => ["/bin", "/sbin", "/usr/bin", "/usr/sbin", "/usr/local/bin", "/usr/local/sbin"] } file { "$tornado_root_dir": ensure => directory, recurse => true, source => "puppet:///tornado/bin"; } file { "$tornado_log_dir": ensure => directory, } package { ["curl", "libcurl3", "libcurl3-gnutls", "python-setuptools", "python-pycurl", "python-simplejson", "python-memcache", "python-mysqldb", "python-imaging"]: ensure => installed; } define install_pkg ($pkgname, $extra_easy_install_args = "", $module_to_test_import) { exec { "InstallPkg_$pkgname": command => "easy_install-2.6 $extra_easy_install_args $pkgname", unless => "python2.6 -c 'import $module_to_test_import'", require => Package["python-setuptools"]; } } install_pkg { "virtualenv": pkgname => "virtualenv", module_to_test_import => "virtualenv"; "boto": pkgname => "boto", module_to_test_import => "boto"; "grizzled": pkgname => "grizzled", module_to_test_import => "grizzled.os"; } $oracle_root_dir = "/opt/oracle" case $architecture { i386, i686: { $oracle_instant_client_pkg = "instantclient_11_2-linux-i386" $oracle_instant_client_dir = "instantclient_11_2" } x86_64: { $oracle_instant_client_pkg = "instantclient_11_1-linux-x86_64" $oracle_instant_client_dir = "instantclient_11_1" } } package { ["libaio-dev", "gcc"]: ensure => installed; } file { "$oracle_root_dir": ensure => directory; } exec { "InstallOracleInstantclient": command => "(cd $oracle_root_dir; wget $url/$oracle_instant_client_pkg.tar.gz; tar xvfz $oracle_instant_client_pkg.tar.gz; rm $oracle_instant_client_pkg.tar.gz; cd $oracle_instant_client_dir; ln -s libclntsh.so.11.1 libclntsh.so); echo $oracle_root_dir/$oracle_instant_client_dir > /etc/ld.so.conf.d/oracleinstantclient.conf; ldconfig", creates => "$oracle_root_dir/$oracle_instant_client_dir", require => File[$oracle_root_dir]; } $cx_oracle = "cx_Oracle-5.0.2" exec { "InstallCxOracle": command => "(cd $oracle_root_dir; wget $url/$cx_oracle.tar.gz; tar xvfz $cx_oracle.tar.gz; rm $cx_oracle.tar.gz; cd $oracle_root_dir/$cx_oracle; export ORACLE_HO ME=$oracle_root_dir/$oracle_instant_client_dir; python2.6 setup.py install)", unless => "python2.6 -c 'import cx_Oracle'", require => [Package["libaio-dev"], Package["gcc"], Exec["InstallOracleInstantclient"]]; } exec { "InstallTornado": command => "(cd $tornado_root_dir; wget $url/$tornado.tar.gz; tar xvfz $tornado.tar.gz; rm $tornado.tar.gz; cd $tornado; python2.6 setup.py install)", creates => $tornado_src_dir, unless => "python2.6 -c 'import tornado.web'", require => [File[$tornado_root_dir], Package["python-pycurl"], Package["python-simplejson"], Package["python-memcache"], Package["python-mysqldb"]]; } }
I'll go through this file from the top down. At the very top I declare some variables that are referenced throughout the file. In particular, $url points to the location where I keep large files that I need every puppet client to download. I could have kept the files inside the tornado module's files directory, and they would have been served by the puppetmaster process, but I prefered to use Apache for better performance and scalability. Note that I do this only for relatively large files such as tar.gz archives.
The Exec stanza (note upper case E) defines certain parameters that will be common to all 'exec' actions that follow. In my case, I specify that I only want to log failures, and I also specify the path for the binaries called in the various 'exec' actions -- this is so I don't have to specify that path each and every time I call 'exec' (alternatively, you can specify the full path to each binary that you call).
The next 2 stanzas define files and directories that I want created on the puppet client nodes. Both 'exec' and 'file' are what is called 'types' in Puppet lingo. I first specify that I wanted the directory /opt/tornado created on each node, and by setting 'recurse=>true' I'm saying that the contents of that directory should be taken from a source which in my case is "puppet:///tornado/bin". This translates to a directory called bin which I created under /etc/puppet/modules/tornado/files. The contents of that directory will be copied over via the puppet internal communication protocol to the destination /opt/tornado by each Puppet client node.
The 'package' type that follows specifies the list of packages I want installed on the client nodes. Note that I don't need to specify how I want those packages installed, only what I want installed. Puppet's language is mostly declarative -- you tell Puppet what you want done, and it does it for you, using OS-specific commands that can vary from one client node to another. It so happens in my case that I know my client nodes all run Ubuntu, so I did specify Ubuntu/Debian-specific package names.
Next in my manifest file is a function definition. You can have these definitions inline, or in a separate manifest file. In my case, I declare a function called 'install_pkg' which takes 3 arguments: the package name, any extra arguments to be passed to the installer, and a module name to test the installation with. The function runs the easy_install command via the 'exec' type, but only if the specified module wasn't already installed on the system.
A paranthesis: the Puppet docs don't recommend the overuse of the 'exec' type, because it strays away from the declarative nature of the Puppet language. With exec, you specifically tell the remote node how to run a specific command, not merely what to do. I find myself using exec very heavily though. I means that I don't grokk Puppet fully yet, but it also means that Puppet doesn't have enough native types yet that can hide OS-specific commands.
One important thing to keep in mind is that for every exec action that you write, you need to specify a condition which becomes true after the successful completion of the action. Otherwise exec will be called each and every time the manifest will be inspected by the puppet nodes. Examples of such conditions:
- 'creates' -- specifies a file or directory that gets created by the exec action; if the file or directory is already there, exec won't be called
- 'unless' -- specifies a condition that, if true, results in exec not being called. In my case, this condition is the import of a given Python module, but it can be any shell command that returns 0
After defining the function 'install_pkg', I call it 3 times, with various parameters, thus installing 3 Python packages -- virtualenv, boto and grizzled. Note that the syntax for calling a function is funky; it's one of the many things I don't necessarily like about Puppet, but it's an evil you learn to deal with.
Next up in my manifest file is a case statement based on the $architecture variable. Puppet makes several such variables available to your manifests, based on facts gathered from the remote nodes via Facter (which comes with Puppet).
Moving along, we have a package definition, a file definition -- both should be familiar by now -- followed by 3 exec actions:
- InstallOracleInstantclient performs the download and unpacking of this package, followed by some ldconfig incantations to actually make it work
- InstallCxOracle downloads and installs the cx_Oracle Python package (not a trivial feat at all in and of itself); note that for this action, the require parameter contains Package["libaio-dev"], Package["gcc"], Exec["InstallOracleInstantclient"] -- so we're saying that these 2 packages, and the Instantclient Oracle libraries need to be installed before attempting to even install cx_Oracle
- InstallTornado -- pretty self-explanatory, with the observation that the require parameter again points to a directory and several packages that need to be on the remote node before the installation of Tornado is attempted
One more thing here: once you have a module with manifests and files defined properly, you need to define the set of nodes that this module will apply to. The way I do it is to have the following files on the puppet master, in /etc/puppet/manifests:
1) A file called modules.pp which imports the modules I have defined, for example:
import "common" import "tornado"('common' can be a module where you specify actions that are common across all types of nodes)
2) A file called nodetemplates.pp which contains definitions for 'node templates', i.e. classes of nodes that have the same composition in terms of modules they import and actions they perform. For example:
node basenode { include common } node default inherits basenode { } node webserver inherits basenode { include scribe include apache2 $required_apache2_modules = ["rewrite", "proxy", "proxy_http", "proxy_balancer", "deflate", "headers", "expires"] apache2::module { $required_apache2_modules: ensure => 'present', } include tomcat include tornado }
Here I defined 3 types of nodes: basenode (which includes the 'common' module), default (which applies to any machine not associated with a specific node definition) and webserver (which includes modules such as apache2, tomcat, tornado, and also requires that certain apache modules be enabled).
3) A file called nodes.pp which maps actual machine names of the Puppet clients to node template definitions. For example:
node "web1.mydomain.com" inherits webserver {}4) A file called site.pp which ties together all these other files. It contains:
import "modules" import "nodetemplates" import "nodes"
Much more documentation on node definition and node inheritance can be found on the Puppet wiki, especially in the Language Tutorial.
Fabric examples
In comparison with Puppet, Fabric is a breeze. I wanted to live on the cutting edge, so I installed the latest version (alpha, pre-1.0) from github via:
git clone git://github.com/bitprophet/fabric.git
I also easy_install'ed paramiko, which at this time brings down paramiko-1.7.6 (the Fabric documentation warns against using 1.7.5, but I assume 1.7.6 is OK).
Then I proceeded to create a so-called 'fabfile', which is a Python module containing fabric-specific functions. Here is a fragment of a file I called fab_nginx.py:
from __future__ import with_statement import os from fabric.api import * from fabric.contrib.files import comment, sed # Globals env.user = 'myuser' env.password = 'mypass' env.nginx_conf_dir = '/usr/local/nginx/conf' env.nginx_conf_file = '%(nginx_conf_dir)s/nginx.conf' % env # Environments def prod(): """Nginx production environment.""" env.hosts = ['nginx1', 'nginx2'] def test(): """Nginx test environment.""" env.hosts = ['nginx3'] # Tasks def disable_server_in_lb(hostname): require('hosts', provided_by=[nginx,nginxtest]) comment(env.nginx_conf_file, "server %s" % hostname, use_sudo=True) restart_nginx() def enable_server_in_lb(hostname): require('hosts', provided_by=[nginx,nginxtest]) sed(env.nginx_conf_file, "#server %s" % hostname, "server %s" % hostname, use_sudo=True) restart_nginx() def restart_nginx(): require('hosts', provided_by=[nginx,nginxtest]) sudo('/etc/init.d/nginx restart') is_nginx_running() def is_nginx_running(warn_only=False): with settings(warn_only=warn_only): output = run('ps -def|grep nginx|grep -v grep') if warn_only: print 'output:', output print 'failed:', output.failed print 'return_code:', output.return_code
Note that in its 0.9 and later versions, Fabric uses the 'env' environment dictionary for configuration purposes (it used to be called 'config' pre-0.9).
My file starts by defining or assigning global env configuration variables, for example env.user and env.password (which are special pre-defined variables that I assign to, and which are used by Fabric when connecting to remote hosts via the ssh functionality provided by paramiko). I also define my own variables, for example env.nginx_conf_dir and env.nginx_conf_file. This makes it easy to pass the env dictionary as a whole when I need to format a string. Here's an example from another fab file:
cmd = 'mv -f %(crt_egg)s %(backup_dir)s' % env
I then have 2 function definitions in my fab file: one called prod, which sets env.hosts to a list of production nginx servers, and one called test, which does the same but sets env.hosts to test nginx servers.
Next I have the actions or tasks that I want performed on the remote hosts. Note the require function (similar in a way to the parameter used in Puppet manifests), which says that the function will only be executed if the given variable in the env dictionary has been assigned to (in my case, the variable is hosts, and I require that the value need to have been provided by either the prod or the test function). This is a useful mechanism to ensure that certain things have been defined before attempting to run commands on the remote servers.
The first task is called disable_server_in_lb. It takes a host name as a parameter, which is the server that I want disabled in the nginx configuration file. I use the handy 'comment' function available in fabric.contrib.files to comment out the lines that contain 'server HOSTNAME' in the nginx configuration. The comment function can be invoked with sudo rights on the remote host by passing use_sudo=True.
The task also calls another function defined in my fab file, restart_nginx. This taks simply calls '/etc/init.d/nginx restart' on the remote host, then verifies that nginx is running by calling is_nginx_running.
By default, when running a command on the remote host, if the command returns a non-zero code, it is considered to have failed by Fabric, and execution stops. In most cases, this is exactly what you want. In case you just want to run a command to get the output, and you don't care if it fails, you can set warn_only=True before running the command. I show an example if this in the is_nginx_running function.
The other main task in my fabfile is enable_server_in_lb. Here I use another handy function offered by Fabric -- the sed function. I substitute '#server HOSTNAME' with 'server HOSTNAME' in the nginx configuration file, then I restart nginx.
So now that we have the fabfile, how do we actually perform the tasks we defined? Let's assume we have a server called 'web1.mydomain.com' that we want disabled in nginx. We want to test our task first in a test environment, so we would call:
fab -f fab_nginx.py test disable_server_in_lb:web1.mydomain.com(note the syntax for passing parameters to a function/task)
By specifying test on the command line before specifying the task, I ensure that Fabric first calls the function named 'test' in the fabfile, which sets the hosts to the test nginx servers.
Once I'm satisfied that this works well in the test environment, I call:
fab -f fab_nginx.py prod disable_server_in_lb:web1.mydomain.com
For a real deployment procedure, let's say for deploying tornado-based servers that are behind one or more nginx load balancer, I would do something like this:
fab -f fab_nginx.py prod disable_server_in_lb:web1.mydomain.com fab -f fab_tornado.py prod deploy fab -f fab_nginx.py prod enable_server_in_lb:web1.mydomain.com
This will deploy my new application code to web1.mydomain.com. Of course I can script this and call the above sequence for all my production servers. I assume here that I have another fabfile called fab_tornado.py and a task defined in in which does the actual deployment of the application code (most likely by downloading and easy_install'ing an egg).
That's it for today. It's been more like a whirlwind through two types of automated deployment tools -- Puppet/pull and Fabric/push. I didn't do justice to either of these tools in terms of their full capabilities, but I hope this will still be useful for some people as a starting point into their own explorations.
Subscribe to:
Posts (Atom)
Modifying EC2 security groups via AWS Lambda functions
One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...
-
A short but sweet PM Boulevard interview with Jerry Weinberg on Agile management/methods. Of course, he says we need to drop the A and actu...
-
Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms inte...
-
Update 02/26/07 -------- The link to the old httperf page wasn't working anymore. I updated it and pointed it to the new page at HP. Her...