The first thing we did was look for an existing Chef cookbook for keepalived -- luckily, @jtimberman already wrote it. It's a pretty involved cookbook, probably one of the most complex I've seen. The usage instructions are pretty good though. In any case, we ended up writing our own wrapper cookbook on top of keepalived -- let's call it frontend-keepalived.
The usage documentation for the Opscode keepalived cookbook contains a role-based example and a recipe-based example. We took inspiration from both. In our frontend-keepalived/recipes/default.rb file we have:
include_recipe 'keepalived'
node[:keepalived][:check_scripts][:chk_haproxy] = {
  :script => 'killall -0 haproxy',
  :interval => 2,
  :weight => 2
}
node[:keepalived][:instances][:vi_1] = {
  :ip_addresses => '172.30.10.10',
  :interface => 'frontend_if',
  :track_script => 'chk_haproxy',
  :nopreempt => false,
  :advert_int => 1,
  :auth_type => :pass, # :pass or :ah
  :auth_pass => 'mypass'
}
This code overrides the default values for several of the attributes defined in the Opscode keepalived cookbook. It specifies the floating IP address shared by the 2 servers that each run HAProxy (:ip_addresses), the network interface on which the multicast-based keepalived protocol (VRRP) runs (:interface), and the 'check script' which tests whether HAProxy is still running on each server. The check itself, 'killall -0 haproxy', sends signal 0, which kills nothing and only reports whether a haproxy process exists.
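To make the attributes above more concrete, here is a rough plain-Ruby sketch of the kind of keepalived.conf stanzas they describe. It is not the cookbook's actual template, just a hand-written approximation. The method name keepalived_conf_sketch and the virtual_router_id value are made up for illustration, and the state and priority arguments are placeholders for values that get set separately.

```ruby
# Sketch only: approximates the keepalived.conf stanzas that the recipe
# attributes describe. This is NOT the template shipped with the cookbook.

CHECK_SCRIPT = { script: 'killall -0 haproxy', interval: 2, weight: 2 }.freeze

INSTANCE = {
  ip_addresses: '172.30.10.10',
  interface: 'frontend_if',
  track_script: 'chk_haproxy',
  advert_int: 1,
  auth_type: :pass,
  auth_pass: 'mypass'
}.freeze

# state, priority and virtual_router_id are passed in because they are not
# part of the two hashes above; virtual_router_id is a hypothetical value.
def keepalived_conf_sketch(state:, priority:, virtual_router_id:)
  <<~CONF
    vrrp_script chk_haproxy {
      script "#{CHECK_SCRIPT[:script]}"
      interval #{CHECK_SCRIPT[:interval]}
      weight #{CHECK_SCRIPT[:weight]}
    }

    vrrp_instance VI_1 {
      state #{state}
      interface #{INSTANCE[:interface]}
      virtual_router_id #{virtual_router_id}
      priority #{priority}
      advert_int #{INSTANCE[:advert_int]}
      authentication {
        auth_type #{INSTANCE[:auth_type].to_s.upcase}
        auth_pass #{INSTANCE[:auth_pass]}
      }
      virtual_ipaddress {
        #{INSTANCE[:ip_addresses]}
      }
      track_script {
        #{INSTANCE[:track_script]}
      }
    }
  CONF
end

puts keepalived_conf_sketch(state: 'MASTER', priority: 101, virtual_router_id: 10)
```

Comparing output like this with the /etc/keepalived/keepalived.conf that chef-client actually writes is a quick way to check that the attributes landed where we expected.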
However, we still needed a way to specify which of the 2 servers is the master and which is the backup (in keepalived parlance), as well as a priority for each server. The usage document in the keepalived cookbook shows this as an example of using a single role to define the master and the backup:
override_attributes( :keepalived => { :global => { :router_ids => { 'node1' => 'MASTER_NODE', 'node2' => 'BACKUP_NODE' } } } )
We couldn't get this to work (if somebody who did reads this, please leave a comment and tell me how you did it!). Instead, we defined 2 roles, one for the master and one for the backup. Here's the master role:
$ cat frontend-keepalived-master.rb
name "frontend-keepalived-master"
description "install keepalived and set state to MASTER"
override_attributes(
  "keepalived" => {
    "instance_defaults" => {
      "state" => "MASTER",
      "priority" => "101"
    }
  }
)
run_list(
  "recipe[frontend-keepalived]"
)
Here's the backup role:
$ cat frontend-keepalived-backup.rb
name "frontend-keepalived-backup"
description "install keepalived and set state to BACKUP"
override_attributes(
  "keepalived" => {
    "instance_defaults" => {
      "state" => "BACKUP",
      "priority" => "100"
    }
  }
)
run_list(
  "recipe[frontend-keepalived]"
)
Notice that we override 2 attributes, the state and the priority. The defaults for these are in the Opscode keepalived cookbook, under attributes/default.rb:
default['keepalived']['instance_defaults']['state'] = 'MASTER'
default['keepalived']['instance_defaults']['priority'] = 100
This was useful in determining how to specify the stanza overriding them in our roles -- it made us see that we needed to specify the instance_defaults key under keepalived in the role files.
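As a rough illustration of why the nesting has to match, here is a plain-Ruby sketch that merges the role overrides on top of the cookbook defaults. It is a simplified model, not Chef's real attribute code, which has more precedence levels than the two shown here. The deep_merge helper and the constant names are ours.

```ruby
# Simplified model of role override_attributes winning over cookbook
# defaults. Chef's real attribute merging is more elaborate than this.

COOKBOOK_DEFAULTS = {
  'keepalived' => {
    'instance_defaults' => { 'state' => 'MASTER', 'priority' => 100 }
  }
}.freeze

MASTER_ROLE = {
  'keepalived' => {
    'instance_defaults' => { 'state' => 'MASTER', 'priority' => '101' }
  }
}.freeze

BACKUP_ROLE = {
  'keepalived' => {
    'instance_defaults' => { 'state' => 'BACKUP', 'priority' => '100' }
  }
}.freeze

# Recursively merge two hashes; on conflicting leaf values the override wins.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

master = deep_merge(COOKBOOK_DEFAULTS, MASTER_ROLE)
backup = deep_merge(COOKBOOK_DEFAULTS, BACKUP_ROLE)

p master['keepalived']['instance_defaults']
p backup['keepalived']['instance_defaults']
```

If the role placed "state" directly under "keepalived" instead of under "instance_defaults", the merge would add a new key next to the default rather than replace it, and the node would keep the default state.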
At this point, we added the master role to the Chef run_list of server #1 and the backup role to the Chef run_list of server #2. We had to do one more thing on each server (which we'll add to the default recipe of our frontend-keepalived cookbook): per this very helpful blog post on setting up HAProxy and keepalived, we edited /etc/sysctl.conf and added:
net.ipv4.ip_nonlocal_bind=1
then applied it via 'sysctl -p'. This was needed so that HAProxy can listen on the keepalived-created 'floating IP' common to the 2 servers, which is not a real IP tied to an existing local network interface.
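Since we did this edit by hand, here is a minimal plain-Ruby sketch of making it repeatable, so that running it twice does not append the line twice. It is not a Chef recipe (in a cookbook you would use Chef resources instead), and the helper name ensure_sysctl_setting is made up for this example. The demo runs against a temporary file rather than the real /etc/sysctl.conf.

```ruby
require 'tempfile'

# Append "key=value" to a sysctl-style file unless an equivalent line is
# already present. Returns true if the file was changed, false otherwise.
# Limitation: a line setting the same key to a different value is left alone.
def ensure_sysctl_setting(path, key, value)
  wanted = "#{key}=#{value}"
  content = File.exist?(path) ? File.read(path) : ''

  # Ignore whitespace so "key = value" counts as already present.
  return false if content.each_line.any? { |line| line.gsub(/\s+/, '') == wanted }

  # Make sure we start on a fresh line before appending.
  content += "\n" unless content.empty? || content.end_with?("\n")
  File.write(path, content + wanted + "\n")
  true
end

# Demo against a throwaway file: the first call edits, the second is a no-op.
Tempfile.create('sysctl.conf') do |file|
  file.close
  puts ensure_sysctl_setting(file.path, 'net.ipv4.ip_nonlocal_bind', 1)
  puts ensure_sysctl_setting(file.path, 'net.ipv4.ip_nonlocal_bind', 1)
  puts File.read(file.path)
end
```

After a real edit you would still need to run 'sysctl -p' (as root) for the kernel to pick up the change.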
Once we ran chef-client on each of the 2 servers, we were able to verify that keepalived does its job by pinging the common floating IP from a 3rd server, then shutting down the network interface 'frontend_if' on each server, with no interruption in the ICMP responses sent from the floating IP. Our next step is to do some heavy-duty testing involving HTTP requests handled by HAProxy, and see that there is no interruption in service when we fail over from one HAProxy server to the other.
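Before doing that HAProxy testing, it helps to work through the arithmetic behind the check script. As we understand keepalived's behavior, a check script with a positive weight adds that weight to a node's priority while the script succeeds, and the node with the highest effective priority holds the floating IP. The sketch below models only that rule, using the priorities and weight from our roles and recipe. It assumes both nodes can see each other and that preemption is enabled (:nopreempt => false). The function names are ours.

```ruby
# Toy model of how the chk_haproxy weight shifts VRRP priorities.
# Assumption: a positive weight is added while the check script succeeds.

WEIGHT = 2
BASE_PRIORITY = { master: 101, backup: 100 }.freeze

# Priority a node advertises, given whether its HAProxy check passes.
def effective_priority(base, haproxy_up, weight = WEIGHT)
  haproxy_up ? base + weight : base
end

# Which node should hold the floating IP. Ties do not occur with these values.
def vip_holder(master_haproxy_up:, backup_haproxy_up:)
  master = effective_priority(BASE_PRIORITY[:master], master_haproxy_up)
  backup = effective_priority(BASE_PRIORITY[:backup], backup_haproxy_up)
  master > backup ? :master : :backup
end

[[true, true], [false, true], [true, false], [false, false]].each do |m_up, b_up|
  holder = vip_holder(master_haproxy_up: m_up, backup_haproxy_up: b_up)
  puts "master haproxy up=#{m_up}, backup haproxy up=#{b_up} -> #{holder}"
end
```

With both HAProxy processes healthy the priorities are 103 and 102, so the master holds the IP. If HAProxy dies on the master, it drops to 101 against the backup's 102 and the backup takes over. This only works because the weight (2) is larger than the gap between the two base priorities (1).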
UPDATE
My colleague Zmer Andranigian discovered an attribute in the Opscode keepalived cookbook that deals with the sysctl setup. The default value for this attribute is:
default['keepalived']['shared_address'] = false
If this attribute is set to 'true' (for example in one of the 2 roles we defined above), then the keepalived cookbook will create a file called /etc/sysctl.d/60-ip-nonlocal-bind.conf containing:
net.ipv4.ip_nonlocal_bind=1
and will also set it in the running configuration of sysctl.
For reference, the role frontend-keepalived-master would contain the following attributes:
override_attributes(
  "keepalived" => {
    "instance_defaults" => {
      "state" => "MASTER",
      "priority" => "101"
    },
    "shared_address" => true
  }
)