Wednesday, February 25, 2015

Sending Windows logs to Papertrail with nxlog

I am revisiting Papertrail as a log aggregation tool. It's really easy to send Linux logs to Papertrail via syslog or rsyslog or syslog-ng (see this article on how to configure syslog with TLS) but to send Windows logs you need to jump through some hoops.

Papertrail recommends nxlog as their Windows log management tool of choice, so that's what I used. This Papertrail article explains how to install and configure nxlog on Windows (I recommend enabling TLS).  The nxlog.conf template file provided by Papertrail will send Windows Event logs over. I also wanted to send application-specific logs, so here's what I did:

1) Add an Input section to nxlog.conf for each directory containing the files you want to send to Papertrail. For example, if one of your applications logs to C:\MyApp1\logs and your log files end with .log, you could have this input section:

# Monitor MyApp1 log files
<Input MyApp1>
 Module im_file
 File 'C:\\MyApp1\\logs\\*.log'
 Exec $Message = $raw_event;
 Exec if $Message =~ /GET \/ping/ drop();
 Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
 SavePos TRUE
 Recursive TRUE
</Input>

Some observations:

  • The name MyApp1 is the name of this Input section
  • The File statement points to the location and name of the log files
  • The first Exec statement saves the log line under consideration as the variable $Message
  • The second Exec statement drops messages that contain a specific regular expression, in my case just 'GET /ping' -- which happens to be health checks from the load balancer that pollute the logs; you can replace this with any regular expression that will filter out log lines you don't want sent to Papertrail
  • The next few statements were in the sample Input stanza from the template nxlog.conf file so I just left them there
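
To make the two regular expressions above easier to experiment with before touching nxlog.conf, here is a minimal Python sketch that mimics them. It is only an approximation of what nxlog does: the function names and the sample log lines are my own assumptions, and nxlog itself uses PCRE rather than Python's re module.

```python
import re

# Same pattern as the second Exec statement: load balancer health checks.
DROP_PATTERN = re.compile(r"GET /ping")

# Same idea as the third Exec statement: capture what follows the last backslash.
FILE_NAME_PATTERN = re.compile(r".*\\(.*)")


def should_drop(line):
    """Return True if the log line matches the health check pattern."""
    return DROP_PATTERN.search(line) is not None


def source_name(path):
    """Return the file name part of a Windows path, or None without a backslash."""
    match = FILE_NAME_PATTERN.match(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical log lines, only for illustration.
    for line in ['"GET /ping HTTP/1.1" 200', '"GET /orders/17 HTTP/1.1" 200']:
        print(line, "-> dropped" if should_drop(line) else "-> kept")
    print(source_name(r"C:\MyApp1\logs\app.log"))
```

Swapping DROP_PATTERN for your own expression is a quick way to check which lines would be filtered out.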
2) Add more Input sections, one for each log location (i.e. multiple log files under a given directory) that you want to send to Papertrail. You need to give each Input section a unique name (e.g. MyApp1 above).

3) Add a Route section for the Input sections defined previously. If you defined 2 Input sections MyApp1 and MyApp2, your Route section would look something like:

Path MyApp1, MyApp2 => filewatcher_transformer => syslogout

The filewatcher_transformer section was already included in the sample nxlog.conf file from Papertrail. The Route section above says that the files processed by the 2 Input paths MyApp1 and MyApp2 will be processed through the statements defined in the filewatcher_transformer section, then will be sent to Papertrail by virtue of being processed through the statements defined in the syslogout section.

At this point, if you restart the nxlog service on your Windows box, you should start seeing log entries from your application(s) flowing into the Papertrail console.

Saturday, January 31, 2015

Setting up an OpenVPN server inside an AWS VPC

My goal in this post is to show how to set up an OpenVPN server within an AWS VPC so that clients can connect to EC2 instances via a VPN connection. It's fairly well documented how to configure a site-to-site VPN with an AWS VPC gateway, but articles talking about client-based VPN connections into a VPC are harder to find.

== OpenVPN server setup ==

Let's start with setting up the OpenVPN server. I launched a new Ubuntu 14.04 instance in our VPC and I downloaded the latest openvpn source code via:


In order for the 'configure' step to succeed, I also had to install the following Ubuntu packages:

apt-get install build-essential openssl libssl-dev lzop liblzo2-dev libpam-dev

I then ran the usual commands:

./configure; make; sudo make install

At this point I proceeded to set up my own Certificate Authority (CA), per the OpenVPN HOWTO guide.  As it turned out, I needed the easy-rsa helper scripts on the server running openvpn. I got them from github:

git clone

To generate the master CA certificate & key, I did the following:

cd ~/easy-rsa/easyrsa3
cp vars.example vars

- edited vars file and set these variables with the proper values for my organization:

./easyrsa build-ca
(this will use the info specified in the vars file above)

To generate the OpenVPN server certificate and key, I ran:

./easyrsa build-server-full server
(I was prompted for a password for the server key)

To generate an OpenVPN client certificate and key for user myuser, I ran:

./easyrsa  build-client-full myuser
(I was prompted for a password for the client key)

The next step was to generate the Diffie-Hellman (DH) parameters for the server by running:

./easyrsa gen-dh

I was ready at this point to configure the OpenVPN server.

I created a directory called /etc/openvpn and copied the pki directory under ~/easy-rsa/easyrsa3 to /etc/openvpn. I also copied the sample server configuration file ~/openvpn-2.3.6/sample/sample-config-files/server.conf to /etc/openvpn.

I edited /etc/openvpn/server.conf and specified the following:

ca /etc/openvpn/pki/ca.crt
cert /etc/openvpn/pki/issued/server.crt
key /etc/openvpn/pki/private/server.key  # This file should be kept secret
dh /etc/openvpn/pki/dh.pem

ifconfig-pool-persist /etc/openvpn/ipp.txt

push "route"

The first block specifies the location of the CA certificate, the server key and certificate, and the DH parameters file.

The 'server' parameter specifies a new subnet from which both the OpenVPN server and the OpenVPN clients connecting to the server will get their IP addresses. I set it to The client IP allocations will be saved in the ipp.txt file, as specified in the ifconfig-pool-persist parameter.

One of the most important options, which I missed when I initially configured the server, is the 'push route' one. This makes the specified subnet (i.e. the instances in the VPC that you want to get to via the OpenVPN server) available to the clients connecting to the OpenVPN server without the need to create static routes on the clients. In my case, all the EC2 instances in the VPC are on the subnet, so that's what I specified above.
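
OpenVPN expects the pushed route as a network address followed by a netmask, not in CIDR notation. As a small illustration, this Python sketch converts a CIDR block into the corresponding push line. The subnet in the example is hypothetical and is not the one from my VPC; the function name is also just an assumption.

```python
import ipaddress


def push_route_directive(cidr):
    """Build an OpenVPN 'push "route <network> <netmask>"' line from a CIDR block."""
    network = ipaddress.ip_network(cidr, strict=True)
    return 'push "route {} {}"'.format(network.network_address, network.netmask)


if __name__ == "__main__":
    # Hypothetical VPC subnet, only for illustration.
    print(push_route_directive("10.0.0.0/24"))
```

Passing strict=True makes the sketch fail loudly if the address has host bits set, which is a common typo when copying a subnet from the AWS console.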

Two more very important steps are needed on the OpenVPN server. It took me quite a while to find them so I hope you will be spared the pain.

The first step was to turn on IP forwarding on the server:

- uncomment the following line in /etc/sysctl.conf:
net.ipv4.ip_forward=1

- run
sysctl -p
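
If you want to double check the file before running sysctl -p, something like the following Python sketch could help. It assumes the line in question is the standard Linux net.ipv4.ip_forward=1 setting, and it only parses text; it does not touch the running kernel. The function name is my own.

```python
def ip_forwarding_enabled(sysctl_conf_text):
    """Return True if the last uncommented net.ipv4.ip_forward line sets it to 1."""
    enabled = False
    for raw_line in sysctl_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (sysctl.conf accepts '#' and ';').
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "net.ipv4.ip_forward":
            enabled = value.strip() == "1"
    return enabled


if __name__ == "__main__":
    with open("/etc/sysctl.conf") as handle:
        print(ip_forwarding_enabled(handle.read()))
```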

The final step in the configuration of the OpenVPN server was to make it do NAT via iptables masquerading (thanks to rbgeek's blog post for these last two critical steps):

- run
iptables -t nat -A POSTROUTING -s -o eth0 -j MASQUERADE

- also add the above line to /etc/rc.local so it gets run on reboot

Now all that's needed on the server is to actually run openvpn. You can run it in the foreground for troubleshooting purposes via:

openvpn /etc/openvpn/server.conf

Once everything works, run it in daemon mode via:

openvpn --daemon --config /etc/openvpn/server.conf

You will be prompted for the server key password when you start up openvpn. I haven't yet looked into how to run the server in a fully automated way.

Almost forgot to specify that you need to allow incoming traffic to UDP port 1194 in the AWS security group where your OpenVPN server belongs. Also allow traffic from that security group to the security groups of the EC2 instances that you actually want to reach over the OpenVPN tunnel.

== OpenVPN client setup ==

This is on a Mac OSX Mavericks client, but I'm sure it's similar for other clients.

Install tuntap
- download tuntap_20150118.tar.gz from
- untar and install tuntap_20150118.pkg

Install lzo
tar xvfz lzo-2.06.tar.gz
cd lzo-2.06
./configure; make; sudo make install

Install openvpn
- download openvpn-2.3.6.tar.gz from
tar xvf openvpn-2.3.6.tar
cd openvpn-2.3.6
./configure; make; sudo make install

At this point ‘openvpn --help’ should work.

The next step for the client setup is to copy the CA certificate ca.crt, and the client key and certificate (myuser.key and myuser.crt) from the OpenVPN server to the local client. I created an openvpn directory under my home directory on my Mac and dropped ca.crt in ~/openvpn/pki, myuser.key in ~/openvpn/pki/private and myuser.crt in ~/openvpn/pki/issued. I also copied the sample file ~/openvpn-2.3.6/sample/sample-config-files/client.conf to ~/openvpn and specified the following parameters in that file:


ca /Users/myuser/openvpn/pki/ca.crt
cert /Users/myuser/openvpn/pki/issued/myuser.crt
key /Users/myuser/openvpn/pki/private/myuser.key

Then I started up the OpenVPN client via:

sudo openvpn ~/openvpn/client.conf
(at this point I was prompted for the password for myuser.key)

To verify that the OpenVPN tunnel is up and running, I ping-ed the internal IP address of the OpenVPN server (in my case it was on the internal subnet I specified in server.conf), as well as the internal IPs of various EC2 instances behind the OpenVPN server. Finally, I ssh-ed into those internal IP addresses and declared victory.

That's about it. Hope this helps!

UPDATE: I discovered in the meantime a very good Mac OSX GUI tool for managing client OpenVPN connections: Tunnelblick. All it took was importing the client.conf file mentioned above.

Wednesday, December 17, 2014

Dynamic DNS updates with nsupdate (new and improved!)

I blogged about this topic before. This post shows a slightly different way of using nsupdate remotely against a DNS server running BIND 9 in order to programmatically update DNS records. The scenario I am describing here involves an Ubuntu 12.04 DNS server running BIND 9 and an Ubuntu 12.04 client running nsupdate against the DNS server.

1) Run ddns-confgen and specify /dev/urandom as the source of randomness and the name of the zone file you want to dynamically update via nsupdate:

$ ddns-confgen -r /dev/urandom -z

# To activate this key, place the following in named.conf, and
# in a separate keyfile on the system or systems from which nsupdate
# will be run:
key "" {
        algorithm hmac-sha256;
        secret "1D1niZqRvT8pNDgyrJcuCiykOQCHUL33k8ZYzmQYe/0=";
};

# Then, in the "zone" definition statement for "",
# place an "update-policy" statement like this one, adjusted as
# needed for your preferred permissions:
update-policy {
 grant zonesub ANY;
};

# After the keyfile has been placed, the following command will
# execute nsupdate using this key:
nsupdate -k <keyfile>

2) Follow the instructions in the output of ddns-confgen (above). I actually named the key just ddns-key, since I was going to use it for all the zones on my DNS server. So I added this stanza to /etc/bind/named.conf on the DNS server:

key "ddns-key" {
        algorithm hmac-sha256;
        secret "1D1niZqRvT8pNDgyrJcuCiykOQCHUL33k8ZYzmQYe/0=";
};

3) Allow updates when the key ddns-key is used. In my case, I added the allow-update line below to all zones that I wanted to dynamically update, not only to

zone "" {
        type master;
        file "/etc/bind/zones/";
        allow-update { key "ddns-key"; };
};

At this point I also restarted the bind9 service on my DNS server.

4) On the client box, create a text file containing nsupdate commands to be sent to the DNS server. In the example below, I want to dynamically add both an A record and a reverse DNS PTR record:

$ cat update_dns1.txt
debug yes
update add 3600 A
update add 3600 PTR
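
If you need to produce such command files for many hosts, a small script can build them. Below is a minimal Python sketch under a few assumptions: the host name and IP address are hypothetical, the function name is my own, and each update is followed by its own send because the A record and the PTR record normally live in different zones.

```python
import ipaddress


def nsupdate_commands(hostname, address, ttl=3600):
    """Return nsupdate commands adding an A record and the matching PTR record."""
    # reverse_pointer yields e.g. 10.2.0.192.in-addr.arpa for 192.0.2.10.
    ptr_name = ipaddress.ip_address(address).reverse_pointer
    lines = [
        "update add {}. {} A {}".format(hostname, ttl, address),
        "send",
        "update add {}. {} PTR {}.".format(ptr_name, ttl, hostname),
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host and address, only for illustration.
    print(nsupdate_commands("host1.example.com", "192.0.2.10"), end="")
```

The output can be redirected to a file and fed to nsupdate together with the key file.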

Still on the client box, create a file containing the stanza with the DDNS key generated in step 1:

$ cat ddns-key.txt
key "ddns-key" {
        algorithm hmac-sha256;
        secret "Wxp1uJv3SHT+R9rx96o6342KKNnjW8hjJTyxK2HYufg=";
};

5) Run nsupdate and feed it both the update_dns1.txt file containing the commands, and the ddns-key.txt file:

$ nsupdate -k ddns-key.txt -v update_dns1.txt

You should see some fairly verbose output, since the command file specifies 'debug yes'. At the same time, tail /var/log/syslog on the DNS server and make sure there are no errors.

In my case, there were some hurdles I had to overcome on the DNS server. The first one was that apparmor was installed and it wasn't allowing the creation of the journal files used to keep track of DDNS records. I saw lines like these in /var/log/syslog:

Dec 16 11:22:59 dns1 kernel: [49671335.189689] type=1400 audit(1418757779.712:12): apparmor="DENIED" operation="mknod" parent=1 profile="/usr/sbin/named" name="/etc/bind/zones/" pid=31154 comm="named" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Dec 16 11:22:59 dns1 kernel: [49671335.306304] type=1400 audit(1418757779.828:13): apparmor="DENIED" operation="mknod" parent=1 profile="/usr/sbin/named" name="/etc/bind/zones/" pid=31153 comm="named" requested_mask="c" denied_mask="c" fsuid=107 ouid=107

To get past this issue, I disabled apparmor for named:

# ln -s /etc/apparmor.d/usr.sbin.named /etc/apparmor.d/disable/
# service apparmor restart

The next issue was an OS permission denied (nothing to do with apparmor) when trying to create the journal files in /etc/bind/zones:

Dec 16 11:30:54 dns1 named[32640]: /etc/bind/zones/ create: permission denied
Dec 16 11:30:54 dns named[32640]: /etc/bind/zones/ create: permission denied

I got past this issue by running

# chown -R bind:bind /etc/bind/zones

At this point everything worked as expected.

Monday, November 17, 2014

Service discovery with consul and consul-template

I talked in the past about an "Ops Design Pattern: local haproxy talking to service layer". I described how we used a local haproxy on pretty much all nodes at a given layer of our infrastructure (webapp, API, e-commerce) to talk to services offered by the layer below it. So each webapp server has a local haproxy that talks to all API nodes it sends requests to. Similarly, each API node has a local haproxy that talks to all e-commerce nodes it needs info from.

This seemed like a good idea at the time, but it turns out it has a couple of annoying drawbacks:
  • each local haproxy runs health checks against N nodes, so if you have M nodes running haproxy, each of the N nodes will receive M health checks; if M and N are large, then you have a health check storm on your hands
  • to take a node out of a cluster at any given layer, we tag it as 'inactive' in Chef, then run chef-client on all nodes that run haproxy and talk to the inactive node at layers above it; this gets old pretty fast, especially when you're doing anything that might conflict with Chef and that the chef-client run might overwrite (I know, I know, you're not supposed to do anything of that nature, but we are all human :-)
For the second point, we are experimenting with haproxyctl so that we don't have to run chef-client on every node running haproxy. But it still feels like a heavy-handed approach.

If I were to do this again (which I might), I would still have an haproxy instance in front of our webapp servers, but for communicating from one layer of services to another I would use a proper service discovery tool such as grampa Apache ZooKeeper or the newer kids on the block, etcd from CoreOS and consul from HashiCorp.

I settled on consul for now, so in this post I am going to show how you can use consul in conjunction with the recently released consul-template to discover services and to automate configuration changes. At the same time, I wanted to experiment a bit with Ansible as a configuration management tool. So the steps I'll describe were actually automated with Ansible, but I'll leave that for another blog post.

The scenario I am going to describe involves 2 haproxy instances, each pointing to 2 Wordpress servers running Apache, PHP and MySQL, with Varnish fronting the Wordpress application. One of the 2 Wordpress servers is considered primary as far as haproxy is concerned, and the other one is a backup server, which will only get requests if the primary server is down. All servers are running Ubuntu 12.04.

Install and run the consul agent on all nodes

The agent will start in server mode on the 2 haproxy nodes, and in agent mode on the 2 Wordpress nodes.

I first deployed consul to the 2 haproxy nodes. I used a modified version of the ansible-consul role from jivesoftware. The configuration file /etc/consul.cfg for the first server (lb1) is:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "lb1",
  "server": true,
  "bind_addr": "",
  "datacenter": "us-west-1b",
  "bootstrap": true,
  "rejoin_after_leave": true
}

(and similar for lb2, with only node_name and bind_addr changed to lb2 and respectively)
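
Since the per-node files differ only in a couple of fields, they are easy to generate. Here is a minimal Python sketch of that idea (I actually used Ansible for this, so treat it as an illustration only). The function name and the bind addresses in the example are hypothetical.

```python
import json


def consul_config(node_name, bind_addr, server, bootstrap=False,
                  datacenter="us-west-1b"):
    """Return a consul agent configuration as a JSON string."""
    config = {
        "domain": "consul.",
        "data_dir": "/opt/consul/data",
        "log_level": "INFO",
        "node_name": node_name,
        "server": server,
        "bind_addr": bind_addr,
        "datacenter": datacenter,
        "rejoin_after_leave": True,
    }
    # Only server nodes carry the bootstrap flag.
    if server:
        config["bootstrap"] = bootstrap
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Hypothetical addresses, only for illustration.
    print(consul_config("lb1", "192.0.2.11", server=True, bootstrap=True))
    print(consul_config("wordpress1", "192.0.2.21", server=False))
```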

The ansible-consul role also creates a consul user and group, and an upstart configuration file like this:

# cat /etc/init/consul.conf

# Consul Agent (Upstart unit)
description "Consul Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec sudo -u consul -g consul /opt/consul/bin/consul agent -config-dir /etc/consul.d -config-file=/etc/consul.conf >> /var/log/consul 2>&1
respawn limit 10 10
kill timeout 10

To start/stop consul, I use:

# start consul
# stop consul

Note that "server" is set to true and "bootstrap" is also set to true, which means that each consul server will be the leader of a cluster with 1 member, itself. To join the 2 servers into a consul cluster, I did the following:
  • join lb1 to lb2: on lb1 run consul join
  • tail /var/log/consul on lb1, note messages complaining about both consul servers (lb1 and lb2) running in bootstrap mode
  • stop consul on lb1: stop consul
  • edit /etc/consul.conf on lb1 and set  "bootstrap": false
  • start consul on lb1: start consul
  • tail /var/log/consul on both lb1 and lb2; it should show no more errors
  • run consul info on both lb1 and lb2; the output should show server=true on both nodes, but leader=true only on lb2
Next I ran the consul agent in regular non-server mode on the 2 Wordpress nodes. The configuration file /etc/consul.cfg on node wordpress1 was:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "wordpress1",
  "server": false,
  "bind_addr": "",
  "datacenter": "us-west-1b",
  "rejoin_after_leave": true
}

(and similar for wordpress2, with the node_name set to wordpress2 and bind_addr set to

After starting up the agents via upstart, I joined them to lb2 (although they could be joined to any of the existing members of the cluster). I ran this on both wordpress1 and wordpress2:

# consul join

At this point, running consul members on any of the 4 nodes should show all 4 members of the cluster:

Node          Address         Status  Type    Build  Protocol
lb1    alive   server  0.4.0  2
wordpress2   alive   client  0.4.0  2
lb2    alive   server  0.4.0  2
wordpress1   alive   client  0.4.0  2

Install and run dnsmasq on all nodes

The ansible-consul role does this for you. Consul piggybacks on DNS resolution for service naming, and by default the domain names internal to Consul end in consul. In my case the domain is configured in consul.cfg via "domain": "consul."

The dnsmasq configuration file for consul is:

# cat /etc/dnsmasq.d/10-consul


This causes dnsmasq to provide DNS resolution for domain names ending in .consul by querying a DNS server running on port 8600 (which is the port the local consul agent listens on to provide DNS resolution).

To start/stop dnsmasq, use: service dnsmasq start | stop.

Now that dnsmasq is running, you can look up names that end in .node.consul from any member node of the consul cluster (there are 4 member nodes in my cluster, 2 servers and 2 agents). For example, I ran this on lb2:

$ dig wordpress1.node.consul

; <<>> DiG 9.8.1-P1 <<>> wordpress1.node.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2511
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;wordpress1.node.consul. IN A

wordpress1.node.consul. 0 IN A

;; Query time: 1 msec
;; WHEN: Fri Nov 14 00:09:16 2014
;; MSG SIZE  rcvd: 76

Configure services and checks on consul agent nodes

Internal DNS resolution within the .consul domain becomes even more useful when nodes define services and checks. For example, the 2 Wordpress nodes run varnish and apache (on port 80 and port 443) so we can define 3 services as JSON files in /etc/consul.d. On wordpress1, which is our active/primary node in haproxy, I defined these services:

$ cat http_service.json
{
    "service": {
        "name": "http",
        "tags": ["primary"],
        "port": 80,
        "check": {
            "id": "http_check",
            "name": "HTTP Health Check",
            "script": "curl -H '' http://localhost",
            "interval": "5s"
        }
    }
}

$ cat ssl_service.json
{
    "service": {
        "name": "ssl",
        "tags": ["primary"],
        "port": 443,
        "check": {
            "id": "ssl_check",
            "name": "SSL Health Check",
            "script": "curl -k -H '' https://localhost:443",
            "interval": "5s"
        }
    }
}

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["primary"],
        "port": 6081,
        "check": {
            "id": "varnish_check",
            "name": "Varnish Health Check",
            "script": "curl http://localhost:6081",
            "interval": "5s"
        }
    }
}

Each service we defined has a name, a port and a check with its own ID, name, script that runs whenever the check is executed, and an interval that specifies how often the check is run. In the examples above I specified simple curl commands against the ports that these services are running on. Note also that each service has a list of tags associated with it. In my case, the services on wordpress1 have the tag "primary". The services defined on wordpress2 are identical to the ones on wordpress1 with the only difference being the tag, which on wordpress2 is "backup".
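
Because the definitions on the two nodes differ only by tag, they lend themselves to being generated. The following Python sketch shows one way this could be done; the function name is my own, and the field layout simply follows the service files described above.

```python
import json


def service_definition(name, port, tag, script, interval="5s"):
    """Return a consul service definition (with one script check) as JSON."""
    definition = {
        "service": {
            "name": name,
            "tags": [tag],
            "port": port,
            "check": {
                "id": name + "_check",
                "name": name + " health check",
                "script": script,
                "interval": interval,
            },
        }
    }
    return json.dumps(definition, indent=4)


if __name__ == "__main__":
    # Same varnish check command on both nodes; only the tag changes.
    for tag in ("primary", "backup"):
        print(service_definition("varnish", 6081, tag, "curl http://localhost:6081"))
```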

After restarting consul on wordpress1 and wordpress2, the following service-related DNS names are available for resolution on all nodes in the consul cluster (I am going to include only relevant portions of the dig output):

$ dig varnish.service.consul

varnish.service.consul. 0 IN A
varnish.service.consul. 0 IN A

This name resolves in DNS round-robin fashion to the IP addresses of all nodes that are running the varnish service, regardless of their tags and regardless of the data centers that their nodes run in. In our case, it resolves to the IP addresses of wordpress1 and wordpress2.

Note that the IP address of a given node only appears in the DNS result set if the service running on that node has a healthy check. If the check fails, then consul's DNS service will not include the IP of the node in the result set. This is very important for the dynamic discovery of healthy services.

$ dig


If we include the data center (in our case us-west-1b) in the DNS name we query, then only the services running on nodes in that data center will be returned in the result set. In our case though, all nodes run in the us-west-1b data center, so this query returns, like the previous one, the IP addresses of wordpress1 and wordpress2. Note that the IPs can be returned in any order, because of DNS round-robin. In this case the IP of wordpress2 was first.

$ dig SRV varnish.service.consul

varnish.service.consul. 0 IN SRV 1 1 6081
varnish.service.consul. 0 IN SRV 1 1 6081


A useful feature of the consul DNS service is that it returns the port number that a given service runs on when queried for an SRV record. So this query returns the names and IPs of the nodes that the varnish service runs on, as well as the port number, which in this case is 6081. The application querying for the SRV record needs to interpret this extra piece of information, but this is very useful for the discovery of internal services that might run on non-standard port numbers.

$ dig primary.varnish.service.consul

primary.varnish.service.consul. 0 IN A

$ dig backup.varnish.service.consul

backup.varnish.service.consul. 0 IN A

The 2 DNS queries above show that it's possible to query a service by its tag, in our case 'primary' vs. 'backup'. The result set will contain the IP addresses of the nodes tagged with the specific tag and running the specific service we asked for. This feature will prove useful when dealing with consul-template in haproxy, as I'll show later in this post.
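
The names queried so far all follow the same layout: an optional tag, the service name, the literal label service, an optional data center and the consul domain. Here is a minimal Python sketch that assembles such names; the function name is my own, and the default domain assumes the "consul." setting from my configuration.

```python
def service_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Build a consul service DNS name: [tag.]service.service[.datacenter].domain"""
    labels = []
    if tag:
        labels.append(tag)
    labels.extend([service, "service"])
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(service_dns_name("varnish"))
    print(service_dns_name("varnish", tag="backup"))
    print(service_dns_name("varnish", datacenter="us-west-1b"))
```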

Load balance across services

It's easy now to see how an application can take advantage of the internal DNS service provided by consul and load balance across services. For example, an application that needs to load balance across the 2 varnish services on wordpress1 and wordpress2 would use varnish.service.consul as the DNS name it talks to when it needs to hit varnish. Every time this DNS name is resolved, a random node from wordpress1 and wordpress2 is returned via the DNS round-robin mechanism. If varnish were to run on a non-standard port number, the application would need to issue a DNS request for the SRV record in order to obtain the port number as well as the IP address to hit.

Note that this method of load balancing has health checks built in. If the varnish health check fails on one of the nodes providing the varnish service, that node's IP address will not be included in the DNS result set returned by the DNS query for that service.

Also note that the DNS query can be customized for the needs of the application, which can query for a specific data center, or a specific tag, as I showed in the examples above.

Force a node out of service

I am still looking for the best way to take nodes in and out of service for maintenance or other purposes. One way I found so far is to deregister a given service via the Consul HTTP API. Here is an example of a curl command that accomplishes that, executed on node wordpress1:

$ curl -v http://localhost:8500/v1/agent/service/deregister/varnish
* About to connect() to localhost port 8500 (#0)
*   Trying connected
> GET /v1/agent/service/deregister/varnish HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/ libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
< HTTP/1.1 200 OK
< Date: Mon, 17 Nov 2014 19:01:06 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
* Connection #0 to host localhost left intact
* Closing connection #0

The effect of this command is that the varnish service on node wordpress1 is 'deregistered', which for my purposes means 'marked as down'. DNS queries for varnish.service.consul will only return the IP address of wordpress2:

$ dig varnish.service.consul

varnish.service.consul. 0 IN A

We can also use the Consul HTTP API to verify that the varnish service does not appear in the list of active services on node wordpress1. We'll use the /agent/services API call and we'll save the output to a file called services.out, then we'll use the jq tool to pretty-print the output:

$ curl -v http://localhost:8500/v1/agent/services -o services.out

$ jq . <<< `cat services.out`
{
  "http": {
    "ID": "http",
    "Service": "http",
    "Tags": [
      "primary"
    ],
    "Port": 80
  },
  "ssl": {
    "ID": "ssl",
    "Service": "ssl",
    "Tags": [
      "primary"
    ],
    "Port": 443
  }
}

Note that only the http and ssl services are shown.

Force a node back in service

Again, I am still looking for the best way to mark a service as 'up' once it was marked as 'down'. One way would be to register the service via the Consul HTTP API, and that requires issuing a POST request with the payload being the JSON configuration file for that service. Another way is to just restart the consul agent on the node in question. This will register the service that had been deregistered previously.

Install and configure consul-template

For the next few steps, I am going to show how to use consul-template in conjunction with consul for discovering services and configuring haproxy based on the discovered services.

I automated the installation and configuration of consul-template via an Ansible role that I put on Github, but I am going to discuss the main steps here. See also the instructions on the consul-template Github page.

In my Ansible role, I copy the consul-template binary to the target node (in my case the 2 haproxy nodes lb1 and lb2), then create a directory structure /opt/consul-template/{bin,config,templates}. The consul-template configuration file is /opt/consul-template/config/consul-template.cfg and it looks like this in my case:

$ cat config/consul-template.cfg
consul = ""

template {
  source = "/opt/consul-template/templates/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

Note that consul-template needs to be able to talk to a consul agent, which in my case is the local agent listening on port 8500. The template that consul-template maintains is defined in another file, /opt/consul-template/templates/haproxy.ctmpl. consul-template watches the services and keys referenced in that file. Whenever any of them changes, it generates a new target file from the template and writes it to the destination, which in my case is the haproxy config file /etc/haproxy/haproxy.cfg. Finally, consul-template executes a command, which in my case restarts the haproxy service.

Here is the actual template file for my haproxy config, which is written in the Go template format:

$ cat /opt/consul-template/templates/haproxy.ctmpl

global
  log   local0
  maxconn 4096
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  option  dontlognull
  retries 3
  option redispatch
  timeout connect 5s
  timeout client 50s
  timeout server 50s
  balance  roundrobin

# Set up application listeners here.

frontend http
  maxconn {{key "service/haproxy/maxconn"}}
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
{{range service "primary.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

frontend https
  maxconn            {{key "service/haproxy/maxconn"}}
  mode               tcp
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
{{range service "primary.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

To the trained eye, this looks like a regular haproxy configuration file, with the exception of the portions enclosed in double curly braces. These are Go template snippets which rely on a couple of template functions exposed by consul-template above and beyond what the Go templating language offers. Specifically, the key function queries a key stored in the Consul key/value store and outputs the value associated with that key (or an empty string if the value doesn't exist). The service function queries a consul service by its DNS name and returns a result set used inside the range statement. The variables inside the result set can be inspected for properties such as Node, Address and Port, which correspond to the Consul service node name, IP address and port number for that particular service.
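
To make the range logic more concrete, here is a rough Python imitation of what the two range blocks in a backend produce. This is not how consul-template is implemented; it is only a sketch of the expansion, with a hypothetical function name and hypothetical addresses. The input lists stand in for the result sets of the service function.

```python
SERVER_LINE = "    server {Node} {Address}:{Port} weight 1 check port {Port}"
BACKUP_LINE = "    server {Node} {Address}:{Port} backup weight 1 check port {Port}"


def render_backend(primary, backup):
    """Expand primary and backup service entries into haproxy server lines."""
    lines = [SERVER_LINE.format(**entry) for entry in primary]
    lines += [BACKUP_LINE.format(**entry) for entry in backup]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical addresses, only for illustration.
    primary = [{"Node": "wordpress1", "Address": "192.0.2.21", "Port": 6081}]
    backup = [{"Node": "wordpress2", "Address": "192.0.2.22", "Port": 6081}]
    print(render_backend(primary, backup))
```

If a node drops out of a result set, for example because its check fails, the corresponding server line simply disappears from the output.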

In my example above, I use the value of the key service/haproxy/maxconn as the value of maxconn. In the http-varnish backend, I used 2 sets of service names, primary.varnish and backup.varnish, because I wanted to differentiate in haproxy.cfg between the primary server (wordpress1 in my case) and the backup server (wordpress2). In the ssl backend, I did the same but with the ssl service.

Everything so far would work fine with the exception of the key/value pair represented by the key service/haproxy/maxconn. To define that pair, I used the Consul key/value store API (this can be run on any member of the Consul cluster):

$ cat


curl -X PUT -d "$MAXCONN" http://localhost:8500/v1/kv/service/haproxy/maxconn

To verify that the value was set, I used:

$ cat

curl -v http://localhost:8500/v1/kv/?recurse

$ ./
* About to connect() to localhost port 8500 (#0)
*   Trying connected
> GET /v1/kv/?recurse HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/ libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 30563
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Mon, 17 Nov 2014 23:01:07 GMT
< Content-Length: 118
* Connection #0 to host localhost left intact
* Closing connection #0
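
One thing to keep in mind when reading the response body of a key/value query: Consul returns a JSON list of entries, and the Value field of each entry is base64-encoded. Below is a minimal Python sketch that decodes such a body. The function name is my own, and the sample body is constructed by hand for illustration; it is not the actual response from my cluster.

```python
import base64
import json


def decode_kv_response(body):
    """Map each key in a Consul KV response body to its decoded string value."""
    decoded = {}
    for entry in json.loads(body):
        raw = entry.get("Value")
        # Keys without a value come back with a null Value field.
        decoded[entry["Key"]] = (
            base64.b64decode(raw).decode("utf-8") if raw is not None else None
        )
    return decoded


if __name__ == "__main__":
    # Hand-built sample body, only for illustration.
    sample = json.dumps([{
        "Key": "service/haproxy/maxconn",
        "Flags": 0,
        "Value": base64.b64encode(b"4000").decode("ascii"),
    }])
    print(decode_kv_response(sample))
```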

At this point, everything is ready for starting up the consul-template service. In Ubuntu, I did it via this Upstart configuration file:

# cat /etc/init/consul-template.conf
# Consul Template (Upstart unit)
description "Consul Template"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec /opt/consul-template/bin/consul-template  -config=/opt/consul-template/config/consul-template.cfg >> /var/log/consul-template 2>&1

respawn limit 10 10
kill timeout 10

# start consul-template

Once consul-template starts, it will perform the actions corresponding to the functions defined in the template file /opt/consul-template/templates/haproxy.ctmpl. In my case, it will query Consul for the value of the key service/haproxy/maxconn and for information about the 2 Consul services varnish.service and ssl.service. It will then save the generated file to /etc/haproxy/haproxy.cfg and it will restart the haproxy service. The relevant snippets from haproxy.cfg are:

frontend http
  maxconn 4000
  default_backend servers-http

backend servers-http
  balance            roundrobin
  option httpchk GET /
  option  httplog

    server wordpress1 weight 1 check port 6081

    server wordpress2 backup weight 1 check port 6081


frontend https
  maxconn            4000
  mode               tcp
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin

    server wordpress1 weight 1 check port 443

    server wordpress2 backup weight 1 check port 443

I've been running this as a test on lb2. I don't consider my setup quite production-ready because I don't have monitoring in place, and I also want to experiment with consul security tokens for better security. But this is a pattern that I think will work.

Wednesday, October 15, 2014

Testing CDN and geolocation with

Assume you want to migrate to a new CDN provider. Eventually you'll have to point as a CNAME to a domain name handled by the CDN provider, let's call it To test this setup before you put it in production, the usual way is to get an IP address corresponding to, then associate with that IP address in your local /etc/hosts file.

This works well for testing most of the functionality of your web site, but it doesn't work when you want to test geolocation-specific features such as displaying the currency based on the user's country of origin. For this, you can use a nifty feature from the amazing free service WebPageTest.

On the main page of WebPageTest, you can specify the test location from the dropdown. It contains a generous list of locations across the globe. To fake your DNS setting and point, you can specify something like this in the Script tab:


This will effectively associate the page you want to test with the CDN provider-specified URL, so you will hit the CDN first from the location you chose.