Agile Testing: Docker orchestration with Rancher

For the last month or so I've been experimenting with Rancher as the orchestration layer for Docker-based deployments. I've been pretty happy with it so far. Here are some of my notes and a few tips and tricks. I also recommend reading through the very good Rancher documentation. In what follows I'll assume that the cluster management engine used by Rancher is its own engine called Cattle. Rancher also supports Kubernetes, Mesos and Docker Swarm.

Running the Rancher server

I provisioned an EC2 instance, installed Docker on it, then ran this command to launch the Rancher server as a Docker container (it will also get launched automatically if you reboot the EC2 instance):

# docker run -d --restart=always -p 8080:8080 rancher/server

Creating Rancher environments

It's important to think about the various environments you want to manage in Rancher. If you have multiple projects that you want to manage with Rancher, as well as multiple environments for your infrastructure, such as development, staging and production, I recommend you create a Rancher environment per project/infrastructure-environment combination, for example a Rancher environment called proj1dev, another one called proj1stage, another called proj1prod, and similarly for other projects: proj2dev, proj2stage, proj2prod etc.

Tip: Since all containers in the same Rancher environment can by default connect to all other containers in that Rancher environment, having a project/infrastructure-environment combination as detailed above will provide good isolation and security from one project to another, and from one infrastructure environment to another within the same project. I recommend you become familiar with Rancher environments by reading more about them in the documentation.

In what follows I'll assume the current environment is proj1dev.

Creating Rancher API key pairs

Within each environment, create an API key pair. Copy and paste the two keys (one access key and one secret access key) somewhere safe.

Adding Rancher hosts

Within each environment, you need to add Rancher hosts. They are the compute nodes that will run the various Docker containers that you will orchestrate with Rancher. In my case, I provisioned two hosts per environment as EC2 instances running Docker.

In the Rancher UI, when you go to Infrastructure -> Hosts then click the Add Host button, you should see a docker run command that you can run on each host in order to launch the Rancher Agent on that host. Something like this:

# docker run -d --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.0.2 http://your-rancher-server-name.example.com:8080/v1/scripts/5536854597A70149E388:1473267600000:rfQVqxXcvIPulNw72fUOQG66iGM

Note that you need to allow UDP ports 500 and 4500 from each Rancher host to/from any other host and to/from the Rancher server. This is because Rancher uses IPSec tunnels for inter-host communication. The Rancher hosts also need to talk to the Rancher server over port 8080 (or whatever port you have exposed for the Rancher server container).

Adding ECR registries

We use ECR as our Docker registry. Within each environment, I had to add our ECR registry. In the Rancher UI, I went to Infrastructure -> Registries, then clicked Add Registry and chose Custom as the registry type. In the attribute fields, I specified:

Address: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com
Email: none
Username: AWS
Password: the result of running these commands (you need to install and configure the awscli for this to work):

apt-get install python-pip; pip install awscli
aws configure (specify the keys for an IAM user allowed to access the ECR registry)
aws ecr get-login | cut -d ' ' -f 6

Application architecture

For this example I will consider an application composed of a Web application based on Apache/PHP running in 2 or more containers and mounting its shared files (configuration, media) over NFS. The Web app talks to a MySQL database server mounting its data files over NFS. The Web app containers are behind one or more instances of a Rancher load balancer, and the Rancher LB instances are fronted by an Amazon Elastic Load Balancer.

Rancher stacks

A 'stack' in Rancher corresponds to a set of services defined in a docker-compose YAML file. These services can also have Rancher-specific attributes (such as desired number of containers aka 'scale', health checks, etc) defined in a special rancher-compose YAML file. I'll show plenty of examples of these files in what follows. My stack naming convention will be projname-environment-stacktype, for example proj1-development-nfs, proj1--development-database etc.

Tip: Try to experiment with creating stacks in the Rancher UI, then either view or export their configurations via the stack settings button in the UI:

This was a life saver for me especially when it comes to lower-level stacks such as NFS or Rancher load balancers. Exporting the configuration will download a zip file containing two files: docker-compose.yml and rancher-compose.yml. It will save you from figuring out on your own the exact syntax you need to use in these files.

Creating an NFS stack

One of the advantages of using Rancher is that it offers an extensive catalog of services ready to be used within your infrastructure. One such service is Convoy NFS. To use it, I started out by going to the Catalog menu option in the Rancher UI, then selecting Convoy NFS. In the following screen I specified proj1-development-nfs as the stack name, as well as the NFS server's IP address and mount point.

Note that I had already set up an EC2 instance to act as an NFS server. I attached an EBS volume per project/environment. So in the example above, I exported a directory called /nfs/development/proj1.

After launching the NFS stack, you should see it in the Stacks screen in the Rancher UI. The stack will consist of 2 services, one called convoy-nfs and the other called convoy-nfs-storagepool:

Once the NFS stack is up and running, you can export its configuration as explained above.

To create or update a stack programmatically, I used the rancher-compose utility and wrapped it inside shell scripts. Here is an example of a shell script that calls rancher-compose to create an NFS stack:

$ cat rancher-nfssetup.sh

#!/bin/bash

COMMAND=$@

rancher-compose -p proj1-development-nfs --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-nfssetup.yml --rancher-file rancher-compose.yml $COMMAND

Note that there is no command line option for the target Rancher environment. It suffices to use the Rancher API keys for a given environment in order to target that environment.

Here is the docker-compose file for this stack, which I obtained by exporting the stack configuration from the UI:

$ cat docker-compose-nfssetup.yml convoy-nfs-storagepool: labels: io.rancher.container.create_agent: 'true' command: - storagepool-agent image: rancher/convoy-agent:v0.9.0 volumes: - /var/run:/host/var/run - /run:/host/run convoy-nfs: labels: io.rancher.scheduler.global: 'true' io.rancher.container.create_agent: 'true' command: - volume-agent-nfs image: rancher/convoy-agent:v0.9.0 pid: host privileged: true volumes: - /lib/modules:/lib/modules:ro - /proc:/host/proc - /var/run:/host/var/run - /run:/host/run - /etc/docker/plugins:/etc/docker/plugins

Here is the portion of my rancher-compose.yml file that has to do with the NFS stack, again obtained by exporting the NFS stack configuration:

convoy-nfs-storagepool: scale: 1 health_check: port: 10241 interval: 2000 unhealthy_threshold: 3 strategy: recreate response_timeout: 2000 request_line: GET /healthcheck HTTP/1.0 healthy_threshold: 2 metadata: mount_dir: /nfs/development/proj1 nfs_server: 172.31.41.108 convoy-nfs: health_check: port: 10241 interval: 2000 unhealthy_threshold: 3 strategy: recreate response_timeout: 2000 request_line: GET /healthcheck HTTP/1.0 healthy_threshold: 2 metadata: mount_dir: /nfs/development/proj1 nfs_server: 172.31.41.108 mount_opts: ''

To create the NFS stack, all I need to do at this point is to call:

$ ./rancher-nfssetup.sh up

To inspect the logs for the stack, I can call:

$ ./rancher-nfssetup.sh logs

Note that I passed various arguments to the rancher-compose utility. Most of them are specified as environment variables. This allows me to add the bash script to version control without worrying about credentials, secrets etc. I also use the --env-file .envvars option, which allows me to define environment variables in the .envvars file and have them interpolated by rancher-compose in the various yml files it uses. 

Creating volumes using the NFS stack

One of my goals was to attach NFS-based volumes to Docker containers in my infrastructure. To do this, I needed to create volumes in Rancher. One way to do it is to go to Infrastructure -> Storage in the Rancher UI, then go to the area corresponding to the NFS stack you want and click Add Volume, giving the volume a name and a description. Doing it manually is well and good, but I wanted to do it automatically, so I used another bash script around rancher-compose together with another docker-compose file:

$ cat rancher-volsetup.sh #!/bin/bash COMMAND=$@ rancher-compose -p proj1-development-volsetup --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-volsetup.yml --rancher-file rancher-compose.yml $COMMAND

$ cat docker-compose-volsetup.yml

volsetup: image: ubuntu:14.04 labels: io.rancher.container.start_once: true volumes: - volMysqlData:/var/lib/mysql - volAppShared:/var/www/shared volume_driver: proj1-development-nfs

A few things to note in the docker-compose-volsetup.yml file:

I used the ubuntu:14.04 Docker image and I attached two volumes, one called volMysqlData and once called volAppSharedData. The first one will be mounted on the Docker container as /var/lib/mysql and the second one will be mounted as /var/www/shared. These are arbitrary paths, since my goal was just to create the volumes as Rancher resources.
I wanted the volsetup service to run once so that the volumes get created, then stop. For that, I used the special Rancher label io.rancher.container.start_once: true
I used as the volume_driver the NFS stack proj1-development-nfs I created above. This is important, because I want these volumes to be created within this NFS stack.

I used the following commands to create and start the proj1-development-volsetup stack, then to show its logs, and finally to shut it down and remove its containers, which are not needed anymore once the volumes get created:

./rancher-volsetup.sh up -d sleep 30 ./rancher-volsetup.sh logs ./rancher-volsetup.sh down ./rancher-volsetup.sh rm --force

I haven't figured out yet how to remove a Rancher stack programmatically, so for these 'helper' type stacks I had to use the Rancher UI to delete them.

At this point, if you look in the /nfs/development/proj1 directory on the NFS server, you should see 2 directories with the same names as the volumes we created.

Creating a database stack

So far I haven't used any custom Docker images. For the database layer of my application, I will want to use a custom image which I will push to the Amazon ECR registry. I will use this image in a docker-compose file in order to set up and start the database in Rancher.

I have a directory called db containing the following Dockerfile:

$ cat Dockerfile
FROM percona

VOLUME /var/lib/mysql

COPY etc/mysql/my.cnf /etc/mysql/my.cnf
COPY scripts/db_setup.sh /usr/local/bin/db_setup.sh

I have a customized MySQL configuration file my.cnf (in my local directory db/etc/mysql) which gets copied to the Docker image as /etc/mysql.my.cnf. I also have a db_setup.sh bash script in my local directory db/scripts which gets copied to /usr/local/bin in the Docker image. In this script I grant rights to a MySQL user used by the Web app, and I also load a MySQL dump file if it exists:

$ cat scripts/db_setup.sh #

!/bin/bash set -e host="$1" until mysql -h "$host" -uroot -p$MYSQL_ROOT_PASSWORD -e "SHOW DATABASES"; do >&2 echo "MySQL is unavailable - sleeping" sleep 1 done >&2 echo "MySQL is up - executing GRANT statement" mysql -h "$host" -uroot -p$MYSQL_ROOT_PASSWORD \ -e "GRANT ALL ON $MYSQL_DATABASE.* TO $MYSQL_USER@'%' IDENTIFIED BY \"$MYSQL_PASSWORD\"" >&2 echo "Starting to load SQL dump" mysql -h "$host" -uroot -p$MYSQL_ROOT_PASSWORD $MYSQL_DATABASE < /dbdump/$MYSQL_DUMP_FILE >&2 echo "Finished loading SQL dump"

Note that the database name, database user name and password, as well as the MySQL root password are all passed in environment variables.

To build this Docker image, I ran:

$ docker build -t my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development .

Note that I tagged the image with the proj1-development tag.

To push this image to Amazon ECR, I first called:

$(aws get-login)

then:

$ docker push my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development

To run the db_setup.sh script inside a Docker container in order to set up the database, I put together the following docker-compose file:

$ cat docker-compose-dbsetup.yml

ECRCredentials:

environment:

AWS_REGION: $AWS_REGION

AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID

AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY

labels:

io.rancher.container.pull_image: always

io.rancher.container.create_agent: 'true'

io.rancher.container.agent.role: environment

io.rancher.container.start_once: true

tty: true

image: objectpartners/rancher-ecr-credentials

stdin_open: true

db:

image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development

labels:

io.rancher.container.pull_image: always

io.rancher.scheduler.affinity:host_label: dbsetup=proj1

volumes:

- volMysqlData:/var/lib/mysql

volume_driver: proj1-development-nfs

environment:

- MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD

dbsetup:

image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development

labels:

io.rancher.container.pull_image: always

io.rancher.container.start_once: true

io.rancher.scheduler.affinity:host_label: dbsetup=proj1

command: /usr/local/bin/db_setup.sh db

links:

- db:db

volumes:

- volMysqlData:/var/lib/mysql

- /dbdump/proj1:/dbdump

volume_driver: proj1-development-nfs

environment:

- MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD

- MYSQL_DATABASE=$MYSQL_DATABASE

- MYSQL_USER=$MYSQL_USER

- MYSQL_PASSWORD=$MYSQL_PASSWORD

- MYSQL_DUMP_FILE=$MYSQL_DUMP_FILE

A few things to note:

there are 3 services in this docker-compose file

a ECRCredentials service which connects to Amazon ECR and allows the ECR image db:proj1-development to be used by the other 2 services
a db service which runs a Docker container based on the db:proj1-development ECR image, and which launches a MySQL database with the root password set to the value of the MYSQL_ROOT_PASSWORD environment variable
a dbsetup service that also runs a Docker container based on the db:proj1-development ECR image, but instead of the default command, which would run MySQL, it runs the db_setup.sh script (specified in the command directive); this service also uses environment variables specifying the database to be loaded from the SQL dump file, as well as the user and password that will get grants to that database

the dbsetup service links to the db service via the links directive
the dbsetup service is a 'run once then stop' type of service, which is why it has the label io.rancher.container.start_once: true attached
both the db and the dbsetup service will run on a Rancher host with the label 'dbsetup=proj1'; this is because we want to load the SQL dump from a file that the dbsetup service can find

we will put this file on a specific Rancher host in a directory called /dbdump/proj1, which will then be mounted by the dbsetup container as /dbdump
the db_setup.sh script will then load the SQL file called MYSQL_DUMP_FILE from the /dbdump directory
this can also work if we'd just put the SQL file in the same NFS volume as the MySQL data files, but I wanted to experiment with host labels in this case

wherever NFS volumes are used, for example for volMysqlData, the volume_driver needs to be set to the proper NFS stack, proj1-development-nfs in this case

It goes without saying that mounting the MySQL data files from NFS is a potential performance bottleneck, so you probably wouldn't do this in production. I wanted to experiment with NFS in Rancher, and the performance I've seen in development and staging for some of our projects doesn't seem too bad.

To run a Rancher stack based on this docker-compose-dbsetup.yml file, I used this bash script:

$ cat rancher-dbsetup.sh

#!/bin/bash

COMMAND=$@

rancher-compose -p proj1-development-dbsetup --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-dbsetup.yml --rancher-file rancher-compose.yml $COMMAND

Note that all environment variables referenced in the docker-compose-dbsetup.yml file are set in the .envvars file.

I wanted to run the proj1-development-dbsetup stack and then shut down its services once the dbsetup service completes. I used these commands as part of a bash script:

./rancher-dbsetup.sh up -d

while :

./rancher-dbsetup.sh logs --lines "10" > dbsetup.log 2>&1

grep 'Finished loading SQL dump' dbsetup.log

result=$?

if [ $result -eq 0 ]; then

break

echo Waiting 10 seconds for DB load to finish...

sleep 10

done

./rancher-dbsetup.sh logs

./rancher-dbsetup.sh down

./rancher-dbsetup.sh rm --force

Once the database is setup, I want to launch MySQL and keep it running so it can be used by the Web application. I have a separate docker-compose file for that:

$ cat docker-compose-dblaunch.yml

ECRCredentials:

environment:

AWS_REGION: $AWS_REGION

AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID

AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY

labels:

io.rancher.container.pull_image: always

io.rancher.container.create_agent: 'true'

io.rancher.container.agent.role: environment

io.rancher.container.start_once: true

tty: true

image: objectpartners/rancher-ecr-credentials

stdin_open: true

db:

image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/db:proj1-development

labels:

io.rancher.container.pull_image: always

volumes:

- volMysqlData:/var/lib/mysql

volume_driver: proj1-development-nfs

The db service is similar to the one in the docker-compose-dbsetup.yml file. In this case the database is all set up, so we don't need anything except the NFS volume to mount the MySQL data files from.

As usual, I have a bash script that calls docker-compose in order to create a stack called proj1-development-database:

$ cat rancher-dblaunch.sh

#!/bin/bash

COMMAND=$@

rancher-compose -p proj1-development-database --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-dblaunch.yml --rancher-file rancher-compose.yml $COMMAND

I call this script like this:

./rancher-dblaunch.sh up -d

At this point, the proj1-development-database stack is up and running and contains the db service running as a container on one of the Rancher hosts in the Rancher 'proj1dev' environment.

Creating a Web application stack

So far, I've been using either off-the-shelf or slightly customized Docker images. For the Web application stack I will be using more heavily customized images. The building block is a 'base' image whose Dockerfile contains directives for installing commonly used packages and for adding users.

Here is the Dockerfile for a 'base' image running Ubuntu 14.04:

FROM ubuntu:14.04

RUN apt-get update && \
apt-get install -y ntp build-essential build-essential binutils zlib1g-dev \
git acl cronolog lzop unzip mcrypt expat xsltproc python-pip curl language-pack-en-base
RUN pip install awscli

RUN adduser --ui 501 --ingroup www-data --shell /bin/bash --home /home/myuser myuser
RUN mkdir /home/myuser/.ssh
COPY files/myuser_authorized_keys /home/myuser/.ssh/authorized_keys
RUN chown -R myuser:www-data /home/myuser/.ssh && \
chmod 700 /home/myuser/.ssh && \
chmod 600 /home/myuser/.ssh/authorized_keys

When I built this image, I tagged it as my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/base:proj1-development.

Here is the Dockerfile for an image (based on the base image above) that installs Apache, PHP 5.6 (using a custom apt repository), RVM, Ruby and the compass gem:

FROM  my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/base:proj1-development

RUN export LC_ALL=en_US.UTF-8 && export LC_ALL=en_US.UTF-8 && export LANG=en_US.UTF-8 && \
apt-get install -y mysql-client-5.6 software-properties-common && add-apt-repository ppa:ondrej/php5-5.6

RUN apt-get update && \
apt-get install -y --allow-unauthenticated apache2 apache2-utils libapache2-mod-php5 \
php5 php5-mcrypt php5-curl php-pear php5-gd \
php5-dev php5-mysql php5-readline php5-xsl php5-xmlrpc php5-intl

# Install composer
RUN curl -sSL https://getcomposer.org/composer.phar -o /usr/bin/composer \
&& chmod +x /usr/bin/composer \
&& composer selfupdate

# Install rvm and compass gem for SASS image compilation

RUN curl https://raw.githubusercontent.com/rvm/rvm/master/binscripts/rvm-installer -o /tmp/rvm-installer.sh && \
chmod 755 /tmp/rvm-installer.sh && \
gpg --keyserver hkp://keys.gnupg.net --recv-keys D39DC0E3 && \
/tmp/rvm-installer.sh stable --path /home/myuser/.rvm --auto-dotfiles --user-install && \
/home/myuser/.rvm/bin/rvm get stable && \
/home/myuser/.rvm/bin/rvm reload && \
/home/myuser/.rvm/bin/rvm autolibs 3

RUN /home/myuser/.rvm/bin/rvm install ruby-2.2.2 && \
/home/myuser/.rvm/bin/rvm alias create default ruby-2.2.2 && \
/home/myuser/.rvm/wrappers/ruby-2.2.2/gem install bundler && \
/home/myuser/.rvm/wrappers/ruby-2.2.2/gem install compass

COPY files/apache2-foreground /usr/local/bin/
EXPOSE 80
CMD ["apache2-foreground"]

When I built this image, I tagged it as  my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/apache-php:proj1-development

With these 2 images as building blocks, I put together 2 more images, one for building artifacts for the Web application, and one for launching it.

Here is the Dockerfile for an image that builds the artifacts for the Web application:

FROM my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/apache-php:proj1-development

ADD ./scripts/app_setup.sh /usr/local/bin/

The heavy lifting takes place in the app_setup.sh script. That's where you would do things such as pull a specified git branch from application repo on GitHub, then run composer (if it's a PHP app) or other build tools in order to generate the artifacts necessary for running the application. At the end of this script, I generate a tar.gz of the code + any artifacts and upload it to S3 so I can use it when I generate the Docker image for the Web app.

When I built this image, I tagged it as  my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/appsetup:proj1-development

To actually run a Docker container based on the appsetup image, I used this docker-compose file:

$ cat docker-compose-appsetup.yml
ECRCredentials:
environment:
AWS_REGION: $AWS_REGION
AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
labels:
io.rancher.container.pull_image: always
io.rancher.container.create_agent: 'true'
io.rancher.container.agent.role: environment
io.rancher.container.start_once: true
tty: true
image: objectpartners/rancher-ecr-credentials
stdin_open: true

appsetup:
image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/appsetup:proj1-development
labels:
io.rancher.container.pull_image: always
command: /usr/local/bin/app_setup.sh
external_links:
- proj1-development-database/db:db
volumes:
- volAppShared:/var/www/shared
volume_driver: proj1-development-nfs
environment:
- GIT_URL=$GIT_URL
- GIT_BRANCH=$GIT_BRANCH
- AWS_S3_REGION=$AWS_S3_REGION
- AWS_S3_ACCESS_KEY_ID=$AWS_S3_ACCESS_KEY_ID
- AWS_S3_SECRET_ACCESS_KEY=$AWS_S3_SECRET_ACCESS_KEY
- AWS_S3_RELEASE_BUCKET=$AWS_S3_RELEASE_BUCKET
- AWS_S3_RELEASE_FILENAME=$AWS_S3_RELEASE_FILENAME

Some things to note:

the command executed when a Docker container based on the appsetup service is launched is /usr/local/bin/app_setup.sh, as specified in the command directive

the app_setup.sh script runs commands that connect to the database, hence the need for the appsetup service to link to the MySQL database running in the proj1-development-database stack launched above; for that, I used the external_links directive

the appsetup service mounts an NFS volume (volAppShared) as /var/www/shared

the volume_driver needs to be proj1-development-nfs
before running the service, I created proper application configuration files under /nfs/development/proj1/volAppShared on the NFS server, specifying things such as the database server name (which needs to be 'db', since this is how the database container is linked as), the database name, user name and password, etc.

the appsetup service uses various environment variables referenced in the environment directive; it will pass these variables to the app_setup.sh script

To run the appsetup service, I used another bash script around the rancher-compose command:

$ cat rancher-appsetup.sh

#!/bin/bash

COMMAND=$@

rancher-compose -p proj1-development-appsetup --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-appsetup.yml --rancher-file rancher-compose.yml $COMMAND

Tip: When using its Cattle cluster management engine, Rancher does not add services linked to each other as static entries in /etc/hosts on the containers. Instead, it provides an internal DNS service so that containers in the same environment can reach each other by DNS names as long as they link to each other in docker-compose files. If you go to a shell prompt inside a container, you can ping other containers by name even from one Rancher stack to another. For example, from a web container in the proj1-development-app stack you can ping a database container in the proj1-development-database stack linked in the docker-compose file as db and you would get back a name of the type db.proj1-development-app.rancher.internal.

Tip: There is no need to expose ports from containers within the same Rancher environment. I spent many hours troubleshooting issues related to ports and making sure ports are unique across stacks, only to realize that the internal ports that the services listen on (3306 for MySQL, 80 and 443 for Apache) are reachable from the other containers in the same Rancher environment. The only ports you need exposed to the external world in the architecture I am describing are the load balancer ports, as I'll describe below.

Here is the Dockerfile for an image that runs the Web application:

FROM my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/apache-php:proj1-development

# disable interactive functions
ARG DEBIAN_FRONTEND=noninteractive

RUN a2enmod headers \
&& a2enmod rewrite \
&& a2enmod ssl

RUN rm -rf /etc/apache2/ports.conf /etc/apache2/sites-enabled/*
ADD etc/apache2/sites-enabled /etc/apache2/sites-enabled
ADD etc/apache2/ports.conf /etc/apache2/ports.conf

ADD release /var/www/html/release
RUN chown -R myuser:www-data /var/www/html/release

This image is based on the apache-php image but adds Apache customizations, as well as the release directory obtained from the tar.gz file uploaded to S3 by the appsetup service.

When I built this image, I tagged it as my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development

Code deployment

My code deployment process is a bash script (which can be used standalone, or as part of a Jenkins job, or can be turned into a Jenkins pipeline) that first runs the appsetup service in order to generate a tar.gz of the code and artifacts, then downloads it from S3 and uses it as the local release directory to be copied into the app image. The script then pushes the app Docker image to Amazon ECR. The environment variables are either defined in an .envvars file or passed via Jenkins parameters. The script assumes that the Dockerfile for the app image is in the current directory, and that the etc directory structure used for the Apache files in the app image is also in the current directory (they are all checked into the project repository, so Jenkins will find them).

./rancher-appsetup.sh up -d

sleep 20

cp /dev/null appsetup.log

while :

./rancher-appsetup.sh logs >> appsetup.log 2>&1

grep 'Restarting web server apache2' appsetup.log

result=$?

if [ $result -eq 0 ]; then

break

echo Waiting 10 seconds for app code deployment to finish...

sleep 10

done

./rancher-appsetup.sh logs

./rancher-appsetup.sh down

./rancher-appsetup.sh rm --force

# download release.tar.gz from S3 and unpack it

set -a

. .envvars

set +a

export AWS_ACCESS_KEY_ID=$AWS_S3_ACCESS_KEY_ID

export AWS_SECRET_ACCESS_KEY=$AWS_S3_SECRET_ACCESS_KEY

rm -rf $AWS_S3_RELEASE_FILENAME.tar.gz

aws s3 --region $AWS_S3_REGION ls s3://$AWS_S3_RELEASE_BUCKET/

aws s3 --region $AWS_S3_REGION cp s3://$AWS_S3_RELEASE_BUCKET/$AWS_S3_RELEASE_FILENAME.tar.gz .

tar xfz $AWS_S3_RELEASE_FILENAME.tar.gz

# build app docker image and push it to ECR

cat << "EOF" > awscreds

[default]

aws_access_key_id=$AWS_ACCESS_KEY_ID

aws_secret_access_key=$AWS_SECRET_ACCESS_KEY

EOF

export AWS_SHARED_CREDENTIALS_FILE=./awscreds

$(aws ecr --region=$AWS_REGION get-login)

/usr/bin/docker build -t my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development .

/usr/bin/docker push my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development

Launching the app service

At this point, the Docker image for the app service has been pushed to Amazon ECR, but the service itself hasn't been started. To do that, I use this docker-compose file:

$ cat docker-compose-app.yml
ECRCredentials:
environment:
AWS_REGION: $AWS_REGION
AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
labels:
io.rancher.container.pull_image: always
io.rancher.container.create_agent: 'true'
io.rancher.container.agent.role: environment
io.rancher.container.start_once: true
tty: true
image: objectpartners/rancher-ecr-credentials
stdin_open: true

app:
image: my_ecr_registry_id.dkr.ecr.my_region.amazonaws.com/app:proj1-development
labels:
io.rancher.container.pull_image: always
external_links:
- proj1-development-database/db:db
volumes:
- volAppShared:/var/www/shared
volume_driver: proj1-development-nfs

Nothing very different about this file compare to the files I've shown so far. The app service mounts the volAppShared NFS volume as /var/www/shared, and links to the MySQL database service db already running in the proj1-development-database Rancher stack, giving it the name 'db'.

To run the app service, I use this bash script wrapping rancher-compose:

$ cat rancher-app.sh
#!/bin/bash

COMMAND=$@

rancher-compose -p proj1-development-app --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-app.yml --rancher-file rancher-compose.yml $COMMAND

Since the proj1-development-app stack may already be running with an old version of the app Docker image, I will invoke rancher-app.sh with the force-upgrade option of the rancher-compose command:

./rancher-app.sh up -d --force-upgrade --confirm-upgrade --pull --batch-size "1"

This will perform a rolling upgrade of the app service, by stopping the containers for the app service one at a time (as indicated by the batch-size parameter), then pulling the latest Docker image for the app service, and finally starting each container again. Speaking of 'containers' plural, you can indicate how many containers should run at all times for the app service by adding these lines to rancher-compose.yml:

app:
scale: 2

In my case, I want 2 containers to run at all times. If you stop one container from the Rancher UI, you will see another one restarted automatically by Rancher in order to preserve the value specified for the 'scale' parameter.

Creating a load balancer stack

When I started to run load balancers in Rancher, I created them via the Rancher UI. I created a new stack, then added a load balancer service to it. It took me a while to figure out that I can then export the stack configuration and generate a docker-compose file and a rancher-compose snippet I can add to my main rancher-compose.yml file.

Here is the docker-compose file I use:

$ cat docker-compose-lbsetup.yml
lb:
ports:
- 8000:80
- 8001:443
external_links:
- proj1-development-app/app:app
labels:
io.rancher.loadbalancer.ssl.ports: '8001'
io.rancher.loadbalancer.target.proj1-development-app/app: proj1.dev.mydomain.com:8000=80,8001=443
tty: true
image: rancher/load-balancer-service
stdin_open: true

The ports directive tell the load balancer which ports to expose externally and what ports to map them to. This example shows that port 8000 will be exposed externally and mapped to port 80 on the target service, and port 8001 will be exposed externally and mapped to port 443 on the target service.

The external_links directive tells the load balancer which service to load balance. In this example, it is the app service in the proj1-development-app stack.

The labels directive does layer 7 load balancing by allowing you to specify a domain name that you want to send to a specific port. In this example, I want to send HTTP requests coming on port 8000 for proj1.dev.mydomain.com to port 80 on the target containers for the app service, and HTTPS requests coming on port 8001 for the same proj1.dev.mydomain.com name to port 443 on the target containers.

I could have also added a new line under labels, specifying that I want requests for proj1-admin.dev.mydomain.com coming on port 8000 to be sent to a different port on the target containers, assuming that I had Apache configured to listen on that port. You can read more about the load balancing features available in Rancher in the documentation.

Here is the load balancer section in rancher-compose.yml:

lb:
scale: 2
load_balancer_config:
haproxy_config: {}
default_cert: proj1.dev.mydomain.com
health_check:
port: 42
interval: 2000
unhealthy_threshold: 3
healthy_threshold: 2
response_timeout: 2000

Note that there is a mention of a default_cert. This is an SSL key + cert that I uploaded to Rancher via the UI by going to Infrastructure -> Certificates and that I named proj1.dev.mydomain.com. The Rancher Catalog does contain an integration for Let's Encrypt but I haven't had a chance to test it yet (from the Rancher Catalog: "The Let's Encrypt Certificate Manager obtains a free (SAN) SSL Certificate from the Let's Encrypt CA and adds it to Rancher's certificate store. Once the certificate is created it is scheduled for auto-renewal 14-days before expiration. The renewed certificate is propagated to all applicable load balancer services.")

Note also that the scale value is 2, which means that there will be 2 containers for the lb service.

Tip: In the Rancher UI, you can open a shell into any container, or view the logs for any container by going to the Settings icon of that container, and choosing Execute Shell or View Logs:

Tip: Rancher load balancers are based on haproxy. You can open a shell into a container running for the lb service, then look at the haproxy configuration file in /etc/haproxy/haproxy.cfg. To troubleshoot haproxy issues, you can enable UDP logging in /etc/rsyslog.conf by removing the comments before the following 2 lines:

#$ModLoad imudp
#$UDPServerRun 514

then restarting the rsyslog service. Then you can restart the haproxy service and inspect its log file in /var/log/haproxy.log.

To run the lb service, I use this bash script:

$ cat rancher-lbsetup.sh
#!/bin/bash

COMMAND=$@

rancher-compose -p proj1-development-lb --url $RANCHER_URL --access-key $RANCHER_API_ACCESS_KEY --secret-key $RANCHER_API_SECRET_KEY --env-file .envvars --file docker-compose-lbsetup.yml --rancher-file rancher-compose.yml $COMMAND

I want to do a rolling upgrade of the lb service in case anything has changed, so I invoke the rancher-compose wrapper script in a similar way to the one for the app service:

./rancher-lbsetup.sh up -d --force-upgrade --confirm-upgrade --batch-size "1"

Putting it all together in Jenkins

First I created a GitHub repository with the following structure:

All docker-compose-*.yml files
The rancher-compose.yml file
All rancher-*.sh bash scripts wrapping the rancher-compose command
A directory for the base Docker image (containing its Dockerfile and any other files that need to go into that image)
A directory for the apache-php Docker image
A directory for the db Docker image
A directory for the appsetup Docker image
A Dockerfile in the current directory for the app Docker image
An etc directory in the current directory used by the Dockerfile for the app image

Each project/environment combination has a branch created in this GitHub repository. For example, for the proj1 development environment I would create a proj1dev branch which would then contain any customizations I need for this project -- usually stack names, Docker tags, Apache configuration files under the etc directory.

My end goal was to use Jenkins to drive the launching of the Rancher services and the deployment of the code. Eventually I will use a Jenkins Pipeline to string together the various steps of the workflow, but for now I have 5 individual Jenkins jobs which all check out the proj1dev branch of the GitHub repo above. The jobs contain shell-type build steps where I actually call the various rancher bash scripts around rancher-compose. The Jenkins jobs also take parameters corresponding to the environment variables used in the docker-compose files and in the rancher bash scripts. I also use the Credentials section in Jenkins to store any secrets such as the Rancher API keys, AWS keys, S3 keys, ECR keys etc. On the Jenkins master and executor nodes I installed the rancher and rancher-compose CLI utilities (I downloaded the rancher CLI from the footer of the Rancher UI).

Job #1 builds the Docker images discussed above: base, apache-php, db, and appsetup (but not the app image yet).

Job #2 runs rancher-nfssetup.sh and rancher-volsetup.sh in order to set up the NFS stack and the volumes used by the dbsetup, appsetup, db and app services.

Job #3 runs rancher-dbsetup.sh and rancher-dblaunch.sh in order to set up the database via the dbsetup service, then launch the db service.

At this point, everything is ready for deployment of the application.

Job #4 is the code deployment job. It runs the sequence of steps detailed in the Code Deployment section above.

Job #5 is the rolling upgrade job for the app service and the lb service. If those services have never been started before, they will get started. If they are already running, they will be upgraded in a rolling fashion, batch-size containers at a time as I detailed above.

When a new code release needs to be pushed to the proj1dev Rancher environment, I would just run job #4 followed by job #5. Obviously you can string these jobs together in a Jenkins Pipeline, which I intend to do next.

Some more Rancher tips and tricks

To troubleshoot Rancher infrastructure-related issues, it helps to inspect the cattle-debug and cattle-error log files on the Rancher server:

cd /var/lib/docker
find . -name \*cattle\*log -ls
tail -f ./volumes/e426db7ae3def92c737d1b2eb7ed9423c93148e17df5c253a3b3a4dddfb42810/_data/logs/cattle-error.log

I recommend this great series of posts on "Lessons learned building a deployment pipeline with Docker, docker-compose and Rancher" from John Patterson and Chris Lunsford at This End Out
For production environments, I recommend looking into running your Rancher server in a multi-node HA configuration

Agile Testing

Friday, September 09, 2016

Docker orchestration with Rancher

No comments:

Modifying EC2 security groups via AWS Lambda functions

Followers