But this is not a post about Puppet -- although I promise I'll blog about that too. This is a post on how to get to the point of using Puppet in an EC2 environment, by automatically configuring EC2 instances as Puppet clients once they're launched.
While the mechanism I'll describe can be achieved by other means, I chose to use the Ubuntu EC2 AMIs provided by alestic. As an aside, if you're thinking about using Ubuntu in EC2, do yourself a favor and read Eric Hammond's blog (which can be found at alestic.com). He has a huge number of amazingly detailed posts related to this topic, and they're all well worth reading.
Unsurprisingly, I chose a mechanism provided by the alestic AMIs to bootstrap my EC2 instances -- specifically, passing user-data scripts that will be automatically run on the first boot of the instance. You can obviously also bake this into your own custom AMI, but the alestic AMIs already have this hook baked in, which I LIKE (picture Borat's voice). What's more, Eric kindly provides another way to easily run custom scripts within the main user-data script -- I'm referring to his runurl script, detailed in this blog post. Basically, you point runurl at the URL of another script that you wrote, and runurl will download and run that script. You can also pass parameters to runurl, which will in turn be passed to your script.
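To make the download-and-run idea concrete, here is a minimal sketch of the kind of thing runurl does. This is NOT Eric's actual script, just a simplified stand-in: the function name run_from_url and the FETCH_CMD override are names I made up for illustration, and I'm assuming the remote script is a bash script.

```shell
#!/bin/bash
# Simplified stand-in for runurl (not the real alestic script):
# fetch a script from a URL, run it with any extra arguments,
# and return the script's exit status.

# Hypothetical override hook: the command used to fetch a URL to stdout.
FETCH_CMD="${FETCH_CMD:-wget -qO-}"

run_from_url() {
    local url="$1"
    shift
    local tmp
    tmp="$(mktemp)" || return 1
    # Download the script body; give up if the fetch fails.
    if ! $FETCH_CMD "$url" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Assumption: the remote script is bash. Remaining args are passed through.
    bash "$tmp" "$@"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

You would call it as run_from_url followed by the script URL and any parameters, which is the same calling convention described above for runurl.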
Enough verbiage, let's see some examples.
Here is my user-data file, whose file name I am passing along as a parameter when launching my EC2 instances:
cat <<EOL > /etc/hosts
127.0.0.1 localhost.localdomain localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
EOL
wget -qO/usr/bin/runurl run.alestic.com/runurl
chmod 755 /usr/bin/runurl
The first thing I do in this script is to add an entry to the /etc/hosts file pointing at the IP address of my puppetmaster server. You can obviously do this with an internal DNS server too, but I've chosen not to maintain my own internal DNS servers in EC2 for now.
My script then retrieves the runurl utility from alestic.com, puts it in /usr/bin and chmod's it to 755. Then the script uses runurl and points it at various other scripts I wrote, all hosted on an internal web server.
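If you launch instances with different puppetmaster addresses or script locations, one option is to generate the user-data file instead of hand-editing it. The sketch below is only an illustration of the steps just described: the function name render_user_data is made up, and the IP address and base URL you pass in are placeholders you'd replace with your own values. The script paths (upgrade/apt, customize/vim, install/puppet) are the ones discussed in this post.

```shell
#!/bin/bash
# Sketch: print a user-data script for a new instance.
# Arguments (both placeholders, not values from this post):
#   $1 = IP address of the puppetmaster
#   $2 = base URL of the web server hosting the runurl scripts
render_user_data() {
    local puppetmaster_ip="$1"
    local base_url="$2"
    # Only the two variables above are expanded in this outer here-document;
    # the inner "cat <<EOL ... EOL" is emitted as literal text.
    cat <<EOF
#!/bin/bash -ex
cat <<EOL > /etc/hosts
127.0.0.1 localhost.localdomain localhost
${puppetmaster_ip} puppetmaster
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
EOL
wget -qO/usr/bin/runurl run.alestic.com/runurl
chmod 755 /usr/bin/runurl
runurl ${base_url}/upgrade/apt
runurl ${base_url}/customize/vim
runurl ${base_url}/install/puppet
EOF
}
```

Redirect the output to a file and pass that file name along when launching the instance, the same way you would with a hand-written user-data file.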
For example, the contents of upgrade/apt are:
apt-get -y upgrade
apt-get -y autoremove
For ssh customizations, my script downloads a specific .ssh/authorized_keys file, so I can ssh to the new instance using certain ssh keys.
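I'm not showing my actual ssh script here, but a minimal sketch of the idea could look like the following. The function name install_authorized_keys is made up for illustration; it reads the key file from stdin so you can pipe a download into it, and it only replaces the existing file if it actually received something.

```shell
#!/bin/bash
# Sketch: install an authorized_keys file read from stdin into <home>/.ssh
# with the permissions sshd expects. Defaults to root's home directory.
install_authorized_keys() {
    local home_dir="${1:-/root}"
    local ssh_dir="$home_dir/.ssh"
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"
    # Write to a temp file first so a failed or empty download never
    # clobbers an existing authorized_keys file.
    local tmp
    tmp="$(mktemp "$ssh_dir/authorized_keys.XXXXXX")" || return 1
    if cat > "$tmp" && [ -s "$tmp" ]; then
        chmod 600 "$tmp"
        mv "$tmp" "$ssh_dir/authorized_keys"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

In a bootstrap script you would feed it from the web server, for example with wget -qO- "$KEYS_URL" | install_authorized_keys /root, where KEYS_URL is a placeholder for wherever you host the key file.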
To install and customize vim, I have customize/vim:
apt-get -y install vim
wget -qO/root/.vimrc http://ec2web.mycompany.com/configs/os/.vimrc
echo 'alias vi=vim' >> /root/.bashrc
...where .vimrc is a customized file that I keep under the document root of the same web server where I keep my scripts.
Finally, install/puppet looks like this:
apt-get -y install puppet
wget -qO/etc/puppet/puppetd.conf http://ec2web.mycompany.com/configs/puppet/puppetd.conf
/etc/init.d/puppet restart
Here I am installing puppet via apt-get, then I'm downloading a custom puppetd.conf configuration, which points at puppetmaster as its server name (instead of the default, which is puppet). Finally, I restart puppet so that the new configuration takes effect.
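For reference, the part of puppetd.conf that matters here is tiny. The sketch below writes a bare-bones version; it assumes the older per-daemon config layout with a [puppetd] section and a server setting, and it is not a copy of my real file, which has other settings in it. The function name write_puppetd_conf is made up for illustration.

```shell
#!/bin/bash
# Sketch: write a minimal puppetd.conf that points the client at the host
# named "puppetmaster" instead of the default server name "puppet".
# Assumes the older [puppetd] section layout; a real file may hold more.
write_puppetd_conf() {
    local dest="${1:-/etc/puppet/puppetd.conf}"
    cat > "$dest" <<'EOF'
[puppetd]
server = puppetmaster
EOF
}
```

The name puppetmaster resolves on the client because of the /etc/hosts entry added by the user-data script.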
Note that I want to keep these scripts to the bare minimum that allows me to:
1) ssh into the instance in case anything goes wrong
2) install and configure puppet so the instance can talk to the puppetmaster
The actual package and application installations and customizations on my newly launched instance will be done through puppet, by associating the instance hostname with a node that is defined on the puppetmaster. I am also adding more entries to /etc/hosts as needed using puppet-specific mechanisms such as the 'host' type (as promised, a blog post on this is forthcoming...).
Note that you need to make sure you have good security for the web server instance that serves your scripts to runurl. Eric Hammond talks about using S3 for that, but it's too complicated IMO (you need to sign URLs and expire them, etc.). In my case, I preferred to use an internal Apache instance with basic HTTP authentication, and to only allow traffic on port 80 from certain security groups within EC2 (my Apache server doubles as the puppetmaster BTW).