Thursday, November 05, 2009

Automated deployments with Puppet and Fabric

I've been looking into various configuration management/automated deployment tools lately. At OpenX we used slack, but I wanted something with a bit more functionality than that (although I'm not badmouthing slack by any means -- it can definitely be bent to your will to do pretty much whatever you need in terms of automating your deployments).

From what I see, there are 2 types of configuration management tools:
  1. The first type I call 'pull', which means that the servers pull their configurations and their marching orders in terms of applying those configurations from a centralized location -- both slack and Puppet are in this category. I think this is great for initial configuration of a server. As I described in another post, you can have a server bootstrap itself by installing Puppet (or slack) and then 'call home' to the central Puppet master (or slack repository) and get all the information it needs to configure itself.
  2. The second type I call 'push', which means that you send configurations and commands to a list of servers from a centralized location -- Fabric is in this category. I think this is a more appropriate mode for application-specific deployments, where you might want to deploy first to a subset of servers, then push it to all servers.
So, as a rule of thumb, I think it makes sense to use a tool like Puppet for the initial configuration of the OS and of the packages required by your application (things like MySQL, Apache, Tomcat, Tornado, Nginx, or whatever your application relies on). When it comes time to deploy your application, I think a tool like Fabric is more appropriate, since it gives you more immediate and finer-grained control over what you want to do.

I also like the categorization of these tools done by the people at ControlTier. Check out their blog post on Achieving Fully Automated Provisioning (which also links to a white paper PDF) for a nice diagram of the hierarchy of deployment tools:
  • at the bottom you have tools that install or launch the initial OS on physical servers (via Kickstart/Jumpstart/Cobbler) or on virtual machines/cloud instances (via various vendor tools, or by rolling your own)
  • in the middle you have what they call 'system configuration' tools, such as Puppet/Chef/SmartFrog/cfengine/bcfg2
  • at the top you have what they call 'application service deployment' tools, such as Fabric/Capistrano/Func -- and of course their own ControlTier tool
In a comment on one of my posts,  Damon Edwards from ControlTier calls Fabric a "command dispatching tool", as opposed to Puppet, which he calls a "configuration management tool". I think this relates to the 2 types of tools I described above, where you 'push' or 'dispatch' commands with Fabric, and you 'pull' configurations and actions with Puppet.

Before I go on, let me just say that in my evaluation of different deployment tools, I quickly eliminated the ones that use XML as their configuration language. In my experience, many tools that aim to be language-neutral end up using XML as their configuration language, and then they try to bend XML into a 'real' programming language, thus ending up reinventing the wheel badly. I'd rather use a language I like (Python in my case) as the glue around the various tools in my toolchain. Your mileage may vary of course.

OK, enough theory, let's see some practical examples of Puppet and Fabric in action. While Fabric is very easy to install and has a minimal learning curve, I can't say the same about Puppet. It takes a while to get your brain wrapped around it, and there isn't a lot of great documentation online, so I warmly recommend that you go buy the book.

Puppet examples

The way I organize things in Puppet is by creating a module for each major package I need to configure. On my puppetmaster server, under /etc/puppet/modules, I have directories such as apache2, mysqlserver, nginx, scribe, tomcat, tornado. Under each such directory I have 2 directories, one called files and one called manifests. I keep files and directories that I need downloaded to the puppet clients under files, and I create manifests (series of actions to be taken on the puppet clients) under manifests. I usually have a single manifest file called init.pp.
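For reference, the resulting layout of the tornado module on the puppetmaster looks roughly like this (the bin directory under files is the one I'll reference later via puppet:///tornado/bin):

/etc/puppet/modules/tornado/
    files/
        bin/
    manifests/
        init.pp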

Here's an example of the init.pp manifest file for my tornado module:

class tornado {
 $tornado = "tornado-0.2"
 $url = "http://mydomain.com/download"

 $tornado_root_dir = "/opt/tornado"
 $tornado_log_dir = "/opt/tornado/logs"
 $tornado_src_dir = "/opt/tornado/$tornado"

 Exec {
  logoutput => on_failure,
  path => ["/bin", "/sbin", "/usr/bin", "/usr/sbin", "/usr/local/bin",  "/usr/local/sbin"]
 }

 file { 
  "$tornado_root_dir":
  ensure => directory,
  recurse => true,
  source =>  "puppet:///tornado/bin";
 }

 file { 
  "$tornado_log_dir":
  ensure => directory,
 }

 package {
  ["curl", "libcurl3", "libcurl3-gnutls", "python-setuptools", "python-pycurl", "python-simplejson", "python-memcache", "python-mysqldb", "python-imaging"]:
  ensure => installed;
 }

 define install_pkg ($pkgname, $extra_easy_install_args = "", $module_to_test_import) {
  exec {
   "InstallPkg_$pkgname":
   command => "easy_install-2.6 $extra_easy_install_args $pkgname",
   unless => "python2.6 -c 'import $module_to_test_import'",
   require => Package["python-setuptools"];
  }
 }

 install_pkg {
  "virtualenv":
  pkgname => "virtualenv",
  module_to_test_import => "virtualenv";

  "boto":
  pkgname => "boto",
  module_to_test_import => "boto";

  "grizzled":
  pkgname => "grizzled",
  module_to_test_import => "grizzled.os";
 }

 $oracle_root_dir = "/opt/oracle"
 
 case $architecture {
  i386, i686: { 
   $oracle_instant_client_pkg = "instantclient_11_2-linux-i386"
   $oracle_instant_client_dir = "instantclient_11_2"
  }
  x86_64: { 
   $oracle_instant_client_pkg = "instantclient_11_1-linux-x86_64"
   $oracle_instant_client_dir = "instantclient_11_1"
  }
 }

 package {
  ["libaio-dev", "gcc"]:
  ensure => installed;
 }

 file { 
  "$oracle_root_dir":
  ensure => directory;
 }

 exec {
  "InstallOracleInstantclient":
  command => "(cd $oracle_root_dir; wget $url/$oracle_instant_client_pkg.tar.gz; tar xvfz $oracle_instant_client_pkg.tar.gz; rm $oracle_instant_client_pkg.tar.gz; 
cd $oracle_instant_client_dir; ln -s libclntsh.so.11.1 libclntsh.so); echo $oracle_root_dir/$oracle_instant_client_dir > /etc/ld.so.conf.d/oracleinstantclient.conf; ldconfig",
  creates => "$oracle_root_dir/$oracle_instant_client_dir",
  require => File[$oracle_root_dir];
 }

 $cx_oracle = "cx_Oracle-5.0.2"
 exec {
  "InstallCxOracle":
  command => "(cd $oracle_root_dir; wget $url/$cx_oracle.tar.gz; tar xvfz $cx_oracle.tar.gz; rm $cx_oracle.tar.gz; cd $oracle_root_dir/$cx_oracle; export ORACLE_HO
ME=$oracle_root_dir/$oracle_instant_client_dir; python2.6 setup.py install)",
  unless => "python2.6 -c 'import cx_Oracle'",
  require => [Package["libaio-dev"], Package["gcc"], Exec["InstallOracleInstantclient"]];
 }

 exec {
  "InstallTornado":
  command => "(cd $tornado_root_dir; wget $url/$tornado.tar.gz; tar xvfz $tornado.tar.gz; rm $tornado.tar.gz; cd $tornado; python2.6 setup.py install)",
  creates => $tornado_src_dir,
  unless => "python2.6 -c 'import tornado.web'",
  require => [File[$tornado_root_dir], Package["python-pycurl"], Package["python-simplejson"], Package["python-memcache"], Package["python-mysqldb"]];
 }
}

I'll go through this file from the top down. At the very top I declare some variables that are referenced throughout the file. In particular, $url points to the location where I keep large files that I need every puppet client to download. I could have kept the files inside the tornado module's files directory, and they would have been served by the puppetmaster process, but I preferred to use Apache for better performance and scalability. Note that I do this only for relatively large files such as tar.gz archives.

The Exec stanza (note upper case E) defines certain parameters that will be common to all 'exec' actions that follow. In my case, I specify that I only want to log failures, and I also specify the path for the binaries called in the various 'exec' actions -- this is so I don't have to specify that path each and every time I call 'exec' (alternatively, you can specify the full path to each binary that you call).

The next 2 stanzas define files and directories that I want created on the puppet client nodes. Both 'exec' and 'file' are what are called 'types' in Puppet lingo. I first specify that I want the directory /opt/tornado created on each node, and by setting 'recurse=>true' I'm saying that the contents of that directory should be taken from a source, which in my case is "puppet:///tornado/bin". This translates to a directory called bin which I created under /etc/puppet/modules/tornado/files. The contents of that directory will be copied over via the puppet internal communication protocol to the destination /opt/tornado by each Puppet client node.

The 'package' type that follows specifies the list of packages I want installed on the client nodes. Note that I don't need to specify how I want those packages installed, only what I want installed. Puppet's language is mostly declarative -- you tell Puppet what you want done, and it does it for you, using OS-specific commands that can vary from one client node to another. It so happens in my case that I know my client nodes all run Ubuntu, so I did specify Ubuntu/Debian-specific package names.

Next in my manifest file is a function definition. You can have these definitions inline, or in a separate manifest file. In my case, I declare a function called 'install_pkg' which takes 3 arguments: the package name, any extra arguments to be passed to the installer, and a module name to test the installation with. The function runs the easy_install command via the 'exec' type, but only if the specified module wasn't already installed on the system.

A parenthesis: the Puppet docs don't recommend overusing the 'exec' type, because it strays from the declarative nature of the Puppet language. With exec, you specifically tell the remote node how to run a specific command, not merely what to do. I find myself using exec very heavily though. It probably means that I don't fully grok Puppet yet, but it also means that Puppet doesn't yet have enough native types to hide OS-specific commands.

One important thing to keep in mind is that for every exec action you write, you need to specify a condition which becomes true after the successful completion of the action. Otherwise exec will be called each and every time the manifest is applied on the puppet nodes. Examples of such conditions:
  • 'creates' -- specifies a file or directory that gets created by the exec action; if the file or directory is already there, exec won't be called
  • 'unless' -- specifies a condition that, if true, results in exec not being called. In my case, this condition is the import of a given Python module, but it can be any shell command that returns 0
Another thing to note in the exec action is the 'require' parameter. You'll find yourself using 'require' over and over again. It is a critical component of Puppet manifests because it allows you to order the actions in a manifest. Without it, actions would be executed in an unspecified order, which is most likely not what you want. In my function definition, I require the existence of the package python-setuptools, because I need the easy_install command to be present on the remote node.

After defining the function 'install_pkg', I call it 3 times, with various parameters, thus installing 3 Python packages -- virtualenv, boto and grizzled. Note that the syntax for calling a function is funky; it's one of the many things I don't necessarily like about Puppet, but it's an evil you learn to deal with.

Next up in my manifest file is a case statement based on the $architecture variable. Puppet makes several such variables available to your manifests, based on facts gathered from the remote nodes via Facter (which comes with Puppet).
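If you're curious which facts are available on a given node, you can run facter directly on that node; for example, to see the fact used above (the value will obviously differ from machine to machine):

facter architecture
x86_64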

Moving along, we have a package definition, a file definition -- both should be familiar by now -- followed by 3 exec actions:
  • InstallOracleInstantclient performs the download and unpacking of this package, followed by some ldconfig incantations to actually make it work
  • InstallCxOracle downloads and installs the cx_Oracle Python package (not a trivial feat at all in and of itself); note that for this action, the require parameter contains Package["libaio-dev"], Package["gcc"], Exec["InstallOracleInstantclient"] -- so we're saying that these 2 packages, and the Instantclient Oracle libraries need to be installed before attempting to even install cx_Oracle
  • InstallTornado -- pretty self-explanatory, with the observation that the require parameter again points to a directory and several packages that need to be on the remote node before the installation of Tornado is attempted
Whew. Nobody said Puppet is easy. But let me tell you, when you get everything working smoothly (after much pulling of hair), it's a great feeling to let a node 'phone home' to the puppetmaster server and configure itself unattended in a matter of minutes. It's worth the effort and the pain.

One more thing here: once you have a module with manifests and files defined properly, you need to define the set of nodes that this module will apply to. The way I do it is to have the following files on the puppet master, in /etc/puppet/manifests:

1) A file called modules.pp which imports the modules I have defined, for example:
import "common" 
import "tornado"
('common' can be a module where you specify actions that are common across all types of nodes)
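I don't show my 'common' module here, but as a minimal sketch (the package and service names are just an illustration), it could look something like this:

class common {
 package {
  ["ntp", "curl", "vim"]:
  ensure => installed;
 }

 service {
  "ntp":
  ensure => running,
  enable => true,
  require => Package["ntp"];
 }
}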

2) A file called nodetemplates.pp which contains definitions for 'node templates', i.e. classes of nodes that have the same composition in terms of modules they import and actions they perform. For example:
node basenode {
    include common
}

node default inherits basenode {
}

node webserver inherits basenode {
    include scribe
    include apache2
    $required_apache2_modules = ["rewrite", "proxy", "proxy_http", "proxy_balancer", "deflate", "headers", "expires"]
    apache2::module {
        $required_apache2_modules:
        ensure => 'present',
    }
    include tomcat
    include tornado
}

Here I defined 3 types of nodes: basenode (which includes the 'common' module), default (which applies to any machine not associated with a specific node definition) and webserver (which includes modules such as apache2, tomcat, tornado, and also requires that certain apache modules be enabled).
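I'm not showing the apache2 module itself, but the apache2::module define used above could be implemented roughly like this (a sketch which assumes the apache2 module declares Package["apache2"] and Service["apache2"], and that the nodes are Debian/Ubuntu so a2enmod/a2dismod are available):

define apache2::module ($ensure = 'present') {
 case $ensure {
  'present': {
   exec {
    "a2enmod $name":
    unless => "test -e /etc/apache2/mods-enabled/$name.load",
    require => Package["apache2"],
    notify => Service["apache2"];
   }
  }
  'absent': {
   exec {
    "a2dismod $name":
    onlyif => "test -e /etc/apache2/mods-enabled/$name.load",
    require => Package["apache2"],
    notify => Service["apache2"];
   }
  }
 }
}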

3) A file called nodes.pp which maps actual machine names of the Puppet clients to node template definitions. For example:
node "web1.mydomain.com" inherits webserver {}
4) A file called site.pp which ties together all these other files. It contains:
import "modules"
import "nodetemplates"
import "nodes" 

Much more documentation on node definition and node inheritance can be found on the Puppet wiki, especially in the Language Tutorial.

Fabric examples

In comparison with Puppet, Fabric is a breeze. I wanted to live on the cutting edge, so I installed the latest version (alpha, pre-1.0) from github via:

git clone git://github.com/bitprophet/fabric.git

I also easy_install'ed paramiko, which at this time brings down paramiko-1.7.6 (the Fabric documentation warns against using 1.7.5, but I assume 1.7.6 is OK).

Then I proceeded to create a so-called 'fabfile', which is a Python module containing fabric-specific functions. Here is a fragment of a file I called fab_nginx.py:

from __future__ import with_statement
import os
from fabric.api import *
from fabric.contrib.files import comment, sed

# Globals

env.user = 'myuser'
env.password = 'mypass'
env.nginx_conf_dir = '/usr/local/nginx/conf'
env.nginx_conf_file = '%(nginx_conf_dir)s/nginx.conf' % env

# Environments


def prod():
    """Nginx production environment."""
    env.hosts = ['nginx1', 'nginx2']

def test():
    """Nginx test environment."""
    env.hosts = ['nginx3']

# Tasks

def disable_server_in_lb(hostname):
    require('hosts', provided_by=[prod, test])
    comment(env.nginx_conf_file, "server %s" % hostname, use_sudo=True)
    restart_nginx()

def enable_server_in_lb(hostname):
    require('hosts', provided_by=[prod, test])
    sed(env.nginx_conf_file, "#server %s" % hostname, "server %s" % hostname, use_sudo=True)
    restart_nginx()

def restart_nginx():
    require('hosts', provided_by=[prod, test])
    sudo('/etc/init.d/nginx restart')
    is_nginx_running()

def is_nginx_running(warn_only=False):
    with settings(warn_only=warn_only):
        output = run('ps -def|grep nginx|grep -v grep')
        if warn_only:
            print 'output:', output
            print 'failed:', output.failed
            print 'return_code:', output.return_code

Note that in its 0.9 and later versions, Fabric uses the 'env' environment dictionary for configuration purposes (it used to be called 'config' pre-0.9).

My file starts by defining or assigning global env configuration variables, for example env.user and env.password (which are special pre-defined variables that I assign to, and which are used by Fabric when connecting to remote hosts via the ssh functionality provided by paramiko). I also define my own variables, for example env.nginx_conf_dir and env.nginx_conf_file. This makes it easy to pass the env dictionary as a whole when I need to format a string. Here's an example from another fab file:

cmd = 'mv -f %(crt_egg)s %(backup_dir)s' % env

I then have 2 function definitions in my fab file: one called prod, which sets env.hosts to a list of production nginx servers, and one called test, which sets env.hosts to the test nginx servers.

Next I have the actions or tasks that I want performed on the remote hosts. Note the require function (similar in a way to the 'require' parameter used in Puppet manifests), which says that the function will only be executed if the given variable in the env dictionary has been assigned (in my case, the variable is hosts, and I require that its value be provided by either the prod or the test function). This is a useful mechanism to ensure that certain things have been defined before attempting to run commands on the remote servers.

The first task is called disable_server_in_lb. It takes a host name as a parameter, which is the server that I want disabled in the nginx configuration file. I use the handy 'comment' function available in fabric.contrib.files to comment out the lines that contain 'server HOSTNAME' in the nginx configuration. The comment function can be invoked with sudo rights on the remote host by passing use_sudo=True.

The task also calls another function defined in my fab file, restart_nginx. This task simply calls '/etc/init.d/nginx restart' on the remote host, then verifies that nginx is running by calling is_nginx_running.

By default, when running a command on the remote host, if the command returns a non-zero code, it is considered to have failed by Fabric, and execution stops. In most cases, this is exactly what you want. In case you just want to run a command to get its output, and you don't care if it fails, you can set warn_only=True before running the command. I show an example of this in the is_nginx_running function.

The other main task in my fabfile is enable_server_in_lb. Here I use another handy function offered by Fabric -- the sed function. I substitute '#server HOSTNAME' with 'server HOSTNAME' in the nginx configuration file, then I restart nginx.

So now that we have the fabfile, how do we actually perform the tasks we defined? Let's assume we have a server called 'web1.mydomain.com' that we want disabled in nginx. We want to test our task first in a test environment, so we would call:
fab -f fab_nginx.py test disable_server_in_lb:web1.mydomain.com
(note the syntax for passing parameters to a function/task)

By specifying test on the command line before specifying the task, I ensure that Fabric first calls the function named 'test' in the fabfile, which sets the hosts to the test nginx servers.

Once I'm satisfied that this works well in the test environment, I call:

fab -f fab_nginx.py prod disable_server_in_lb:web1.mydomain.com

For a real deployment procedure, let's say for deploying tornado-based servers that are behind one or more nginx load balancers, I would do something like this:

fab -f fab_nginx.py prod disable_server_in_lb:web1.mydomain.com
fab -f fab_tornado.py prod deploy
fab -f fab_nginx.py prod enable_server_in_lb:web1.mydomain.com

This will deploy my new application code to web1.mydomain.com. Of course I can script this and call the above sequence for all my production servers. I assume here that I have another fabfile called fab_tornado.py, with a task defined in it which does the actual deployment of the application code (most likely by downloading and easy_install'ing an egg).
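I'm not reproducing my actual fab_tornado.py here, but a minimal sketch of what its deploy task could look like is below (the host names, download URL, egg name and init script are made-up placeholders):

from fabric.api import env, run, sudo, require

env.user = 'myuser'
env.password = 'mypass'
env.download_url = 'http://mydomain.com/download'
env.egg = 'myapp-1.0-py2.6.egg'

def prod():
    """Tornado production environment."""
    env.hosts = ['web1.mydomain.com', 'web2.mydomain.com']

def test():
    """Tornado test environment."""
    env.hosts = ['web3.mydomain.com']

def deploy():
    require('hosts', provided_by=[prod, test])
    # download the new egg and install it with easy_install
    run('wget -N %(download_url)s/%(egg)s' % env)
    sudo('easy_install-2.6 %(egg)s' % env)
    # restart the tornado-based application so the new code is picked up
    sudo('/etc/init.d/tornado restart')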

That's it for today. It's been more of a whirlwind tour through two types of automated deployment tools -- Puppet/pull and Fabric/push. I didn't do justice to either of these tools in terms of their full capabilities, but I hope this will still be useful for some people as a starting point for their own explorations.

8 comments:

Gheorghe Gheorghiu said...

Hard work, no joke! I'm glad you still found the time to write on the blog, and I'm sure it will be useful to someone! Congratulations!

Chris Petrilli said...

This is a great introduction. I'm wondering if you use Kickstart/Jumpstart/etc to build? In the past I've done a lot of that, but never integrated the whole life-cycle chain.

James Turnbull said...

My book on Puppet might help some people out -
Very cool to see Puppet in a layered model like that.

Lucia said...

Let's see...

jay said...

Instead of:

output = run('ps -def|grep nginx|grep -v grep')

you probably want:

output = run('ps -def|grep -v grep|grep nginx')

Reason being that it is the exit code of the last command in the pipeline that matters, and it's nginx that you care about seeing, not grep.

Dan said...

An old trick:
output = run('ps -def|grep [n]ginx')

Philipp Keller said...

It would be nice to get the fabric host list from puppet, did you look into that one?

Greets Philipp

J Whitehouse said...

I generally use "pgrep -f" to find processes. It returns a list of process ids and doesn't find itself like "ps ax | grep proc". You can also split its string to count processes. This is probably more elegant to do in python with string.split(), but here's a way to do that in BASH:

PROCESSES=$(pgrep -f $PROGRAM | wc -w)

if [[ $PROCESSES -lt 1 ]]; then
echo "No running $PROGRAM processes"
exit 1
elif [[ $PROCESSES -gt 1 ]]; then
echo "More than one running $PROGRAM process"
exit 1
else
echo "One running $PROGRAM process"
exit 0
fi
