Why Chef this time, you ask? Although I am a Python guy, I prefer learning a smattering of Ruby rather than a proprietary DSL for configuration management. Also, when I upgraded my EC2 instances to the latest Ubuntu Lucid AMIs, puppet stopped working, so I was almost forced to look into Chef -- and I've liked what I've seen so far. I don't want to bad-mouth puppet though, I recommend you look into both if you need a good configuration management/deployment tool.
Here is a high-level view of the bootstrapping procedure I'm using:
1) You create Chef roles and tie them to cookbooks and recipes that you want executed on machines which will be associated with these roles.
2) You launch an EC2 Ubuntu AMI using any method you want (the EC2 Java-based command-line API, or scripts based on boto, etc.). The main thing here is that you pass a custom shell script to the instance via a user-data file.
3) When the EC2 instance boots up, it runs your custom user-data shell script. The script installs chef-client and its prerequisites, downloads the files necessary for running chef-client, runs chef-client once to register with the chef master and to run the recipes associated with its role, and finally runs chef-client in the background so that it wakes up and executed every N minutes.
Here are the 3 steps in more detail.
1) Create Chef roles, cookbooks and recipes
I already described how to do this in my previous post.
For the purposes of this example, let's assume we have a role called 'base' associated with a cookbook called 'base' and another role called 'myapp' associated with a cookbook called 'myapp'.
The 'base' cookbook contains recipes that can do things like installing packages that are required across all your applications, creating users and groups that you need across all server types, etc.
The 'myapp' cookbook contains recipes that can do things specific to one of your particular applications -- in my case, things like installing and configuring tornado/nginx/haproxy.
As a quick example, here's how to add a user and a group both called "myoctopus". This can be part of the default recipe in the cookbook 'base' (in the file cookbooks/base/recipes/default.rb).
The home directory is /home/myoctopus, and we make sure that directory exists and is owned by the user and group myoctopus.
group "myoctopus" do
action :create
end
user "myoctopus" do
gid "myoctopus"
home "/home/myoctopus"
shell "/bin/bash"
end
%w{/home/myoctopus}.each do |dir|
directory dir do
owner "myoctopus"
group "myoctopus"
mode "0755"
action :create
not_if "test -d #{dir}"
end
end
The role 'base' looks something like this, in a file called roles/base.rb:
name "base"
description "Base role (installs common packages)"
run_list("recipe[base]")
The role 'myapp' looks something like this, in a file called roles/myapp.rb:
name "myapp"
description "Installs required packages and applications for an app server"
run_list "recipe[memcached]", "recipe[myapp::tornado]"
Note that the role myapp specifies 2 recipes to be run: one is the default recipe of the 'memcached' cookbook (which is part of the Opscode cookbooks), and one is a reciped called tornado which is part of the myapp cookbook (the file for that recipe is cookbooks/myapp/recipes/tornado.rb). Basically, to denote a recipe, you either specify its cookbook (if the recipe is the default recipe of that cookbook), or you specify cookbook::recipe_name (if the recipe is non-default).
So far, we haven't associated any clients with these roles. We're going to do that on the client EC2 instance. This way the Chef server doesn't have to do any configuration operations during the bootstrap of the EC2 instance.
2) Launching an Ubuntu EC2 AMI with custom user-data
I wrote a Python wrapper around the EC2 command-line API tools. To launch an EC2 instance, I use the ec2-run-instances command-line tool. My Python script also takes a command line option called chef_role, which specifies the Chef role I want to associate with the instance I am launching. The main ingredient in the launching of the instance is the user-data file (passed to ec2-run-instances via the -f flag).
I use this template for the user-data file. My Python wrapper replaces HOSTNAME with an actual host name that I pass via a cmdline option. The Python wrapper also replaces CHEF_ROLE with the value of the chef_role cmdline option (which defaults to 'base').
The shell script which makes up the user-data file does the following:
a) Overwrites /etc/hosts with a version that has hardcoded values for chef.mycloud and mysite.com. The chef.mycloud.com box is where I run Chef server, and mysite.com is a machine serving as a download repository for utility scripts.
b) Downloads Eric Hammond's runurl script, which it uses to run other utility scripts.
c) Executes via runurl the script mysite.com/customize/hostname and passes it the real hostname of the machine being launched. The hostname script simply sets the hostname on the machine:
#!/bin/bash
hostname $1
echo $1 > /etc/hostname
d) Executes via runurl the script mysite.com/customize/hosts and passes it 2 arguments: add and self. Here's the hosts script:
#!/bin/bash
if [[ "$1" == "add" ]]; then
IPADDR=`ifconfig | grep 'inet addr:'| grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1}'`
HOSTNAME=`hostname`
sed -i "s/127.0.0.1 localhost.localdomain localhost/127.0.0.1 localhost.localdomain localhost\n$IPADDR $HOSTNAME.mycloud.com $HOSTNAME\n/g" /etc/hosts
fi
What this does is it adds the internal IP of the machine being launched to /etc/hosts and associates it with both the FQDN and the short hostname. The FQDN bit is important for chef configuration purposes. It needs to come before the short form in /etc/hosts. I could have obviously also used DNS, but at bootstrap time I prefer to deal with hardcoded host names for now.
Update 07/22/10
Patrick Lightbody sent me a note saying that it's easier to get the local IP address of the machine by using one of the handy EC2 internal HTTP queries.
If you run "curl -s http://169.254.169.254/latest/meta-data" on any EC2 instance, you'll see a list of variables that you can inspect that way. For the local IP, I modified my script above to use:
IPADDR=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
e) Finally, and most importantly for this discussion, executes via runurl the script mysite.com/install/chef-client and passes it the actual value of the cmdline argument chef_role. The chef-client script does the heavy lifting in terms of installing and configuring chef-client on the instance being launched. As such, I will describe it in the next step.
3) Installing and configuring chef-client on the newly launched instance
Here is the chef-client script I'm using. The comments are fairly self-explanatory. Because I am passing CHEF_ROLE as its first argument, the script knows which role to associate with the client. It does it by downloading the appropriate chef.${CHEF_ROLE}.json. To follow the example, I have 2 files corresponding to the 2 roles I created on the Chef server.
Here is chef.base.json:
{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "init",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.mycloud.com"
}
},
"run_list": [ "role[base]" ]
}
The only difference in chef.myapp.json is the run_list, which in this case contains both roles (base and myapp):
{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "init",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.mycloud.com"
}
},
"run_list": [ "role[base]", "role[myapp]" ]
}
The chef-client script also downloads the client.rb file which contains information about the Chef server:
log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "http://chef.mycloud.com:4000"
validation_client_name "chef-validator"
validation_key "/etc/chef/validation.pem"
client_key "/etc/chef/client.pem"
file_cache_path "/srv/chef/cache"
pid_file "/var/run/chef/chef-client.pid"
Mixlib::Log::Formatter.show_time = false
Note that the client knows the IP address of chef.mycloud.com because we hardcoded it in /etc/hosts.
The chef-client script also downloads validation.pem, which is an RSA key file used by the Chef server to validate the client upon the initial connection from the client.
The last file downloaded is the init script for launching chef-client automatically upon reboots. I took the liberty of butchering this sample init script and I made it much simpler (see the gist here but beware that it contains paths specific to my environment).
At this point, the client is ready to run this chef-client command which will contact the Chef server (via client.rb), validate itself (via validation.pem), download the recipes associated with the roles specified in chef.json, and run these recipes:
chef-client -j /etc/chef/chef.json -L /var/log/chef.log -l debug
I run the command in debug mode and I specify a log file location (the default output is stdout) so I can tell what's going on if something goes wrong.
That's about it. At this point, the newly launched instance is busy configuring itself via the Chef recipes. Time to sit back and enjoy your automated bootstrap process!
The last lines in chef-client remove the validation.pem file, which is only needed during the client registration, and run chef-client again, this time in the background, via the init script. The process running in the background looks something like this in my case:
/usr/bin/ruby1.8 /usr/bin/chef-client -L /var/log/chef.log -d -j /etc/chef/chef.json -c /etc/chef/client.rb -i 600 -s 30
The -i 600 option means chef-client will contact the Chef server every 600 seconds (plus a random interval given by -s 30) and it will inquire about additions or modifications to the roles it belongs to. If there are new recipes associated with any of the roles, the client will download and run them.
If you want to associate the client to new roles, you can just edit the local file /etc/chef/chef.json and add the new roles to the run_list.