Friday, December 29, 2017

Experiences with the Kong API Gateway

Kong is an open-source API Gateway that you can install on premises if you cannot use, or don't want to use, the AWS API Gateway. The benefits of putting an API Gateway in front of your actual API endpoints are many: rate limiting, authentication, security, better logging, etc. Kong offers all of these and more via plugins. I've experimented with Kong a bit and I am writing down my notes here for future reference.

Installing Kong on Ubuntu 16.04

Install Kong from deb package

# wget https://bintray.com/kong/kong-community-edition-deb/download_file?file_path=dists/kong-community-edition-0.11.2.xenial.all.deb
# dpkg -i kong-community-edition-0.11.2.xenial.all.deb

Install PostgreSQL

# apt-get install postgresql

Create kong database and user

# su - postgres
$ psql
psql (10.1)
Type "help" for help.


postgres=# CREATE USER kong; CREATE DATABASE kong OWNER kong;
postgres=# ALTER USER kong PASSWORD 'somepassword';

Modify kong configuration

# cp /etc/kong/kong.conf.default /etc/kong/kong.conf

Edit kong.conf and set:

pg_host = 127.0.0.1             
pg_port = 5432                  
pg_user = kong                  
pg_password = somepassword
pg_database = kong              

Run kong migrations job

# kong migrations up

Increase open files limit

# ulimit -n 4096

Start kong

# kong start
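
To verify that Kong is up, you can query its admin API, which listens on port 8001 by default:

# curl -i http://localhost:8001/status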

Install kong dashboard

Install an updated version of Node.js first.

# curl -sL https://deb.nodesource.com/setup_6.x -o nodesource_setup.sh
# bash nodesource_setup.sh
# apt-get install nodejs
# npm install -g kong-dashboard

Create systemd service to start kong-dashboard with basic auth

Note that the --kong-url argument below points at the Kong admin API on port 8001; kong-dashboard itself listens on its default port 8080.

# cat /etc/systemd/system/multi-user.target.wants/kong-dashboard.service
[Unit]
Description=kong dashboard
After=network.target


[Service]
ExecStart=/usr/local/bin/kong-dashboard start --kong-url http://localhost:8001 --basic-auth someadminuser=someadminpassword


[Install]
WantedBy=multi-user.target


# systemctl daemon-reload
# systemctl start kong-dashboard

Adding an API to Kong

I used the Kong admin dashboard to add a new API Gateway object which points to my actual API endpoint. Note that the URL as far as Kong is concerned can be anything you want; in this example I chose /kong1. The Upstream URL is your actual API endpoint. The Hosts value is a comma-separated list of the host headers you want Kong to reply to. Here it contains the domain name of my actual API endpoint (stage.mydomain.com) as well as a new domain name that I will use later in conjunction with a load balancer in front of Kong (stage-api.mydomain.com).

  • Name: my-kong-api
  • Hosts: stage.mydomain.com, stage-api.mydomain.com
  • URLs: /kong1
  • Upstream URL: https://stage.mydomain.com/api/graphql/pub

At this point you can query the Kong admin endpoint for the existing APIs. We’ll use HTTPie (on a Mac you can install HTTPie via brew install httpie).

$ http http://stage.mydomain.com:8001/apis
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Fri, 29 Dec 2017 16:23:59 GMT
Server: kong/0.11.2
Transfer-Encoding: chunked

{
    "data": [
        {
            "created_at": 1513401906560,
            "hosts": [
                "stage-api.mydomain.com",
                "stage.mydomain.com"
            ],
            "http_if_terminated": true,
            "https_only": true,
            "id": "ad23a91a-b76a-417d-9889-0088a13e3419",
            "name": "my-kong-api",
            "preserve_host": true,
            "retries": 5,
            "strip_uri": true,
            "upstream_connect_timeout": 60000,
            "upstream_read_timeout": 60000,
            "upstream_send_timeout": 60000,
            "upstream_url": "https://stage.mydomain.com/api/graphql/pub",
            "uris": [
                "/kong1"
            ]
        }
    ],
    "total": 1
}

Note that any operation that you do via the Kong dashboard can also be done via API calls to the Kong admin backend. For lots of examples on how to do that, see the official documentation as well as this blog post from John Nikolai.
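
For example, the API object above could have been created with a single call to the admin API; a sketch using HTTPie in form mode and the same names as above:

$ http -f POST http://stage.mydomain.com:8001/apis \
    name=my-kong-api \
    hosts=stage.mydomain.com,stage-api.mydomain.com \
    uris=/kong1 \
    upstream_url=https://stage.mydomain.com/api/graphql/pub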

Using the Kong rate-limiting plugin

In the Kong admin dashboard, go to Plugins -> Add:
  • Plugin name: rate-limiting
  • Config: minute = 10
This means that we are limiting the rate of calls to our API to 10 per minute. You can also specify limits per other units of time. See the rate-limiting plugin documentation for more details.
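
The same configuration can also be applied via the Kong admin API instead of the dashboard; a sketch for the API defined earlier:

$ http -f POST http://stage.mydomain.com:8001/apis/my-kong-api/plugins \
    name=rate-limiting \
    config.minute=10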

Verifying that our API endpoint is rate-limited


Now we can verify that the API endpoint is working by issuing a POST request to the Kong URL (/kong1) which points to our actual API endpoint:


$ http --verify=no -v POST "https://stage.mydomain.com:8443/kong1?query=version"
POST /kong1?query=version HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 0
Host: stage.mydomain.com:8443
User-Agent: HTTPie/0.9.9


HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 43
Content-Type: application/json
Date: Sat, 16 Dec 2017 18:16:53 GMT
Server: nginx/1.10.3 (Ubuntu)
Via: kong/0.11.2
X-Kong-Proxy-Latency: 86
X-Kong-Upstream-Latency: 33
X-RateLimit-Limit-minute: 10
X-RateLimit-Remaining-minute: 9


{
   "data": {
       "version": "21633901644387004745"
   }
}

A few things to notice in the call above:

  • we are making an SSL call; by default Kong exposes SSL on port 8443
  • we are passing a query string to the /kong1 URL endpoint; Kong will pass this along to our actual API
  • the HTTP reply contains 2 headers related to rate limiting:
    • X-RateLimit-Limit-minute: 10 (this specifies the rate we set for the rate-limiting plugin)
    • X-RateLimit-Remaining-minute: 9 (this specifies the remaining calls we have)

After 9 requests in quick succession, the rate-limiting headers are:


X-RateLimit-Limit-minute: 10
X-RateLimit-Remaining-minute: 1


After 10 requests:


X-RateLimit-Limit-minute: 10
X-RateLimit-Remaining-minute: 0


After the 11th request we get an HTTP 429 error:


HTTP/1.1 429
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Sat, 16 Dec 2017 18:21:19 GMT
Server: kong/0.11.2
Transfer-Encoding: chunked
X-RateLimit-Limit-minute: 10
X-RateLimit-Remaining-minute: 0


{
   "message": "API rate limit exceeded"
}

Putting a load balancer in front of Kong

Calling Kong APIs on non-standard port numbers gets cumbersome. To make things cleaner, I put an AWS ALB in front of Kong. I added a proper SSL certificate via AWS Certificate Manager for the domain stage-api.mydomain.com and associated it with a listener on port 443 of the ALB. I also created two Target Groups, one for HTTP traffic hitting port 80 on the ALB and one for HTTPS traffic hitting port 443 on the ALB. The first Target Group points to port 8000 on the server running Kong, because that's the port Kong uses for plain HTTP requests. The second Target Group points to port 8443 on the server running Kong, for HTTPS requests.
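
For reference, the Target Groups can also be created and populated with the awscli; a sketch where the VPC ID, instance ID and Target Group ARN are placeholders:

$ aws elbv2 create-target-group --name kong-http --protocol HTTP --port 8000 --vpc-id vpc-0abc1234
$ aws elbv2 create-target-group --name kong-https --protocol HTTPS --port 8443 --vpc-id vpc-0abc1234
$ aws elbv2 register-targets --target-group-arn <kong-http-target-group-arn> --targets Id=i-0123456789abcdef0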

I was now able to make calls such as these:

$ http -v POST "https://stage-api.mydomain.com/kong1?query=version"
POST /kong1?query=version HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 0
Host: stage-api.mydomain.com
User-Agent: HTTPie/0.9.9

HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 42
Content-Type: application/json
Date: Fri, 29 Dec 2017 17:42:14 GMT
Server: nginx/1.10.3 (Ubuntu)
Via: kong/0.11.2
X-Kong-Proxy-Latency: 94
X-Kong-Upstream-Latency: 29
X-RateLimit-Limit-minute: 10
X-RateLimit-Remaining-minute: 9

{
    "data": {
        "version": "21633901644387004745"
    }
}

Using the Kong syslog plugin

I added the syslog plugin via the Kong admin dashboard. I set the following values:

  • server_errors_severity: err
  • successful_severity: notice
  • client_errors_severity: err
  • log_level: notice
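
The same plugin can be enabled through the Kong admin API; a sketch for the API defined earlier:

$ http -f POST http://localhost:8001/apis/my-kong-api/plugins \
    name=syslog \
    config.successful_severity=notice \
    config.client_errors_severity=err \
    config.server_errors_severity=err \
    config.log_level=notice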
I then created a file called kong.conf in /etc/rsyslog.d on the server running Kong:


# cat kong.conf
if ($programname == 'kong' and $syslogseverity-text == 'notice') then -/var/log/kong.log
& ~

# service rsyslog restart

Now every call to Kong is logged in syslog format in /var/log/kong.log. The message portion of the log entry is in JSON format.
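
Since the message portion of each line is JSON, individual fields can be pulled out with jq; a rough sketch (the field names follow the Kong log serializer and may vary slightly between versions):

# grep -o '{.*}' /var/log/kong.log | jq '{status: .response.status, uri: .request.uri, client: .client_ip}'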

Sending Kong logs to AWS ElasticSearch/Kibana

I used the process I described in my previous blog post on AWS CloudWatch Logs and AWS ElasticSearch. One difference was that for the Kong logs I had to use a new index in ElasticSearch. 

This was because the JSON object logged by Kong contains a field called response, which was clashing with another field called response already present in the cwl-* index in Kibana. What I ended up doing was copying the Lambda function used to send CloudWatch logs to ElasticSearch and replacing cwl- with cwl-kong- in the transform function. This created a new index cwl-kong-* in ElasticSearch, and at that point I was able to add that index to a Kibana dashboard and visualize and query the Kong logs.

This is just scratching the surface of what you can do with Kong.

Friday, November 24, 2017

Using AWS CloudWatch Logs and AWS ElasticSearch for log aggregation and visualization

If you run your infrastructure in AWS, then you can use CloudWatch Logs and AWS ElasticSearch + Kibana for log aggregation/searching/visualization as an alternative to either rolling your own ELK stack, or using a 3rd party SaaS solution such as Logentries, Loggly, Papertrail or the more expensive Splunk, Sumo Logic etc.

Here are some pointers on how to achieve this.

1) Create IAM policy and role allowing read/write access to CloudWatch logs

I created an IAM policy called cloudwatch-logs-access with the following content:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        }
    ]
}


Then I created an IAM role called cloudwatch-logs-role and attached the cloudwatch-logs-access policy to it.

2) Attach IAM role to EC2 instances

I attached the cloudwatch-logs-role IAM role to all EC2 instances from which I wanted to send logs to CloudWatch (I went to Actions --> Instance Settings --> Attach/Replace IAM Role and attached the role).
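
The same attachment can be done with the awscli, assuming an instance profile with the same name as the role exists (the instance ID below is a placeholder):

$ aws ec2 associate-iam-instance-profile \
    --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=cloudwatch-logs-role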

3) Install and configure CloudWatch Logs Agent on EC2 instances

I followed the instructions here for my OS, which is Ubuntu.

I first downloaded a Python script:

# curl https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py -O

Then I ran the script in the region where my EC2 instances are:


# python awslogs-agent-setup.py --region us-west-2
Launching interactive setup of CloudWatch Logs agent ...
Step 1 of 5: Installing pip ...DONE
Step 2 of 5: Downloading the latest CloudWatch Logs agent bits ... DONE
Step 3 of 5: Configuring AWS CLI ...
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [us-west-2]:
Default output format [None]:
Step 4 of 5: Configuring the CloudWatch Logs Agent ...
Path of log file to upload [/var/log/syslog]:
Destination Log Group name [/var/log/syslog]:
Choose Log Stream name:
  1. Use EC2 instance id.
  2. Use hostname.
  3. Custom.
Enter choice [1]: 2
Choose Log Event timestamp format:
  1. %b %d %H:%M:%S    (Dec 31 23:59:59)
  2. %d/%b/%Y:%H:%M:%S (10/Oct/2000:13:55:36)
  3. %Y-%m-%d %H:%M:%S (2008-09-08 11:52:54)
  4. Custom
Enter choice [1]: 3
Choose initial position of upload:
  1. From start of file.
  2. From end of file.
Enter choice [1]: 1
More log files to configure? [Y]:

I continued by adding more log files such as apache access and error logs, and other types of logs.

You can start/stop/restart the CloudWatch Logs agent via:

# service awslogs start

The awslogs service writes its logs in /var/log/awslogs.log and its configuration file is in /var/awslogs/etc/awslogs.conf.
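
Each log file you configure ends up as a stanza in awslogs.conf, roughly of this form (the Apache access log here is just an example):

[/var/log/apache2/access.log]
datetime_format = %d/%b/%Y:%H:%M:%S
file = /var/log/apache2/access.log
buffer_duration = 5000
log_stream_name = {hostname}
initial_position = start_of_file
log_group_name = /var/log/apache2/access.log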

4) Create AWS ElasticSearch cluster

Not much to say here. Follow the prompts in the AWS console :)

For the initial Access Policy for the ES cluster, I chose an IP-based policy and specified the source CIDR blocks allowed to connect:

 "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-west-2:accountID:domain/my-es-cluster/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "1.2.3.0/24",
            "4.5.6.7/32"
          ]
        }
      }
    }

5) Create subscription filters for streaming CloudWatch logs to ElasticSearch

First, make sure that the log files you configured with the AWS CloudWatch Log agent are indeed sent to CloudWatch. For each log file name, you should see a CloudWatch Log Group with that name, and inside the Log Group you should see multiple Log Streams, each Log Stream having the same name as the hostname sending those logs to CloudWatch.

I chose one of the Log Streams, went to Actions --> Stream to Amazon Elasticsearch Service, chose the ElasticSearch cluster created above, then created a new Lambda function to do the streaming. I had to create a new IAM role for the Lambda function. I created a role I called lambda-execution-role and associated with it the pre-existing IAM policy AWSLambdaBasicExecutionRole.

Once this Lambda function is created, subsequent log subscription filters for other Log Groups will reuse it for streaming to the same ES cluster.

One important note here is that you also need to allow the role lambda-execution-role to access the ES cluster. To do that, I modified the ES access policy and added a statement for the ARN of this role:

    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::accountID:role/lambda-execution-role"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-west-2:accountID:domain/my-es-cluster/*"
    }

6) Configure index pattern in Kibana

The last step is to configure Kibana to use the ElasticSearch index for the CloudWatch logs. If you look under Indices in the ElasticSearch dashboard, you should see indices of the form cwl-2017.11.24. In Kibana, add an Index Pattern of the form cwl-*. It should recognize the @timestamp field as the timestamp for the log entries, and create the Index Pattern correctly.

Now if you go to the Discover screen in Kibana, you should be able to visualize and search your log entries streamed from CloudWatch.

Monday, July 31, 2017

Apache 2.4 authentication and whitelisting scenarios

I have these examples scattered among many Apache installations, so I wanted to gather my notes here for my benefit, and hopefully for others as well. The following scenarios depict various requirements for Apache 2.4 authentication and whitelisting. They are all for Apache 2.4.x running on Ubuntu 14.04/16.04.

Scenario 1: block all access to Apache except to a list of whitelisted IP addresses and networks

Apache configuration snippet:

  <Directory /var/www/html/>
     IncludeOptional /etc/apache2/whitelist.conf
     Order allow,deny
     Allow from all
  </Directory>

Contents of whitelist.conf file:

# local server IPs
Require ip 127.0.0.1
Require ip 172.31.2.2

# Office network
Require ip 1.2.3.0/24

# Other IP addresses
Require ip 4.5.6.7/32
Require ip 5.6.7.8/32
etc.

Scenario 2: enable basic HTTP authentication but allow specific IP addresses through with no authentication

Apache configuration snippet:

  <Directory /var/www/html/>
     AuthType basic
     AuthBasicProvider file
     AuthName "Restricted Content"
     AuthUserFile /etc/apache2/.htpasswd

     Require valid-user
     IncludeOptional /etc/apache2/whitelist.conf
     Satisfy Any
  </Directory>

The contents of whitelist.conf are similar to the ones in Scenario 1.
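
If the password file doesn't exist yet, it can be created with htpasswd (from the apache2-utils package); -c creates the file, so omit it when adding more users:

# htpasswd -c /etc/apache2/.htpasswd someuser
# htpasswd /etc/apache2/.htpasswd anotheruser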

Scenario 3: enable basic HTTP authentication but allow access to specific URLs with no authentication

Apache configuration snippet:

  <Directory /var/www/html/>
     Order allow,deny
     Allow from all

     AuthType Basic
     AuthName "Restricted Content"
     AuthUserFile /etc/apache2/.htpasswd

     SetEnvIf Request_URI /.well-known/acme-challenge/*  noauth=1
     <RequireAny>
       Require env noauth
       Require valid-user
     </RequireAny>
  </Directory>

This is useful when you install SSL certificates from Let's Encrypt and you need to allow the Let's Encrypt servers access to the HTTP challenge directory.
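
After changing any of these snippets, it's a good idea to check the configuration syntax and reload Apache:

# apache2ctl configtest
# service apache2 reload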

Thursday, June 01, 2017

SSL termination and http caching with HAProxy, Varnish and Apache


A common requirement when setting up a development or staging server is to try to mimic production as much as possible. One scenario I've implemented a few times is to use Varnish in front of a web site but also use SSL. Since Varnish can't handle encrypted traffic, SSL needs to be terminated before it hits Varnish. One fairly easy way to do it is using HAProxy to terminate both HTTP and HTTPS traffic, then forwarding the unencrypted traffic to Varnish, which then forwards non-cached traffic to Apache or nginx. Here are the steps to achieve this on an Ubuntu 16.04 box.

1) Install HAProxy and Varnish

# apt-get install haproxy varnish


2) Get SSL certificates from Let’s Encrypt

# wget https://dl.eff.org/certbot-auto
# chmod +x certbot-auto
# ./certbot-auto -a webroot --webroot-path=/var/www/mysite.com -d mysite.com certonly

3) Generate combined chain + key PEM file to be used by HAProxy

# cat /etc/letsencrypt/live/mysite.com/fullchain.pem /etc/letsencrypt/live/mysite.com/privkey.pem > /etc/ssl/private/mysite.com.pem

4) Configure HAProxy

Edit haproxy.cfg and add frontend sections for ports 80 and 443, plus a backend section pointing to Varnish on port 8888:

# cat /etc/haproxy/haproxy.cfg
global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # Default ciphers to use on SSL-enabled listening sockets.
        # For more information, see ciphers(1SSL). This list is from:
        #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3
        tune.ssl.default-dh-param 2048

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

frontend www-http
   bind 172.31.8.204:80
   http-request set-header "SSL-OFFLOADED" "1"
   reqadd X-Forwarded-Proto:\ http
   default_backend varnish-backend

frontend www-https
   bind 172.31.8.204:443 ssl crt mysite.com.pem
   http-request set-header "SSL-OFFLOADED" "1"
   reqadd X-Forwarded-Proto:\ https
   default_backend varnish-backend

backend varnish-backend
   redirect scheme https if !{ ssl_fc }
   server varnish 172.31.8.204:8888 check

Enable UDP in rsyslog for haproxy logging by uncommenting 2 lines in /etc/rsyslog.conf:

# provides UDP syslog reception
module(load="imudp")
input(type="imudp" port="514")

Restart rsyslog and haproxy

# service rsyslog restart
# service haproxy restart

5) Configure varnish to listen on port 8888

Ubuntu 16.04 is using systemd for service management. You need to edit 2 files to configure the port varnish will listen on:

/lib/systemd/system/varnish.service
/etc/default/varnish

In both, set the port after the -a flag to 8888, then stop the varnish service, reload the systemctl daemon and restart the varnish service:

# systemctl stop varnish.service
# systemctl daemon-reload
# systemctl start varnish.service
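
For reference, after the edit the relevant line in /lib/systemd/system/varnish.service looks roughly like this (the other flags are the distribution defaults and may differ on your system):

ExecStart=/usr/sbin/varnishd -j unix,user=vcache -F -a :8888 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m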

By default, Varnish will send non-cached traffic to port 8080 on localhost.
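
The backend Varnish forwards non-cached traffic to is defined in /etc/varnish/default.vcl; the stock configuration points at 127.0.0.1:8080:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}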

6) Configure Apache or nginx to listen on 8080

For Apache, change port 80 to 8080 in all virtual hosts, and also change 80 to 8080 in /etc/apache2/ports.conf.
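
In other words, a minimal sketch of the two changes (file names and site details are examples):

# /etc/apache2/ports.conf
Listen 8080

# /etc/apache2/sites-enabled/mysite.com.conf
<VirtualHost *:8080>
    ServerName mysite.com
    DocumentRoot /var/www/mysite.com
</VirtualHost>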




Thursday, March 30, 2017

Working with AWS CodeDeploy

As usual when I make a breakthrough after bumping my head against the wall for a few days trying to get something to work, I hasten to write down my notes here so I can remember what I've done ;) In this case, the head-against-the-wall routine was caused by trying to get AWS CodeDeploy to work within the regular code deployment procedures that we have in place using Jenkins and Capistrano.

Here is the 30,000 foot view of how the deployment process works using a combination of Jenkins, Docker, Capistrano and AWS CodeDeploy:
  1. Code gets pushed to GitHub
  2. Jenkins deployment job fires off either automatically (for development environments, if so desired) or manually
    • Jenkins spins up a Docker container running Capistrano and passes it several environment variables such as GitHub repository URL and branch, target deployment directory, etc.
    • The Capistrano Docker image is built beforehand and contains rake files that specify how the code it checks out from GitHub is supposed to be built
    • The Capistrano Docker container builds the code and exposes the target deployment directory as a Docker volume
    • Jenkins archives the files from the exposed Docker volume locally as a tar.gz file
    • Jenkins uploads the tar.gz to an S3 bucket
    • For good measure, Jenkins also builds a Docker image of a webapp container which includes the built artifacts, tags the image and pushes it to Amazon ECR so it can be later used if needed by an orchestration system such as Kubernetes
  3. AWS CodeDeploy runs a code deployment (via the AWS console currently, using the awscli soon) while specifying the S3 bucket and the tar.gz file above as the source of the deployment and an AWS AutoScaling group as the destination of the deployment
  4. Everybody is happy 
You may ask: why Capistrano? Why not use a shell script or some other way of building the source code into artifacts? Several reasons:
  • Capistrano is still one of the most popular deployment tools. Many developers are familiar with it.
  • You get many good features for free just by using Capistrano. For example, it automatically creates a releases directory under your target directory, creates a timestamped subdirectory under releases where it checks out the source code, builds the source code, and if everything works well creates a 'current' symlink pointing to the releases/timestamped subdirectory
  • This strategy is portable. Instead of building the code locally and uploading it to S3 for use with AWS CodeDeploy, you can use the regular Capistrano deployment and build the code directly on a target server via ssh. The rake files are the same, only the deploy configuration differs.
I am not going to go into details for the Jenkins/Capistrano/Docker setup. I've touched on some of these topics in previous posts.

I will go into details for the AWS CodeDeploy setup. Here goes.

Create IAM policies and roles

There are two roles that need to be created for AWS CodeDeploy to work. One is an instance profile attached to the EC2 instances you want to deploy to (used by the CodeDeploy agent on those instances to fetch build artifacts from S3), and one is a service role used by the CodeDeploy service itself.

- Create the following IAM policy for EC2 instances, which allows those instances to list S3 buckets and download objects from S3 buckets (in this case the permissions cover all S3 buckets, but you can restrict them via the Resource element):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}


- Attach above policy to an IAM role and name the role e.g. CodeDeploy-EC2-Instance-Profile

- Create the following IAM policy for the CodeDeploy service role, which CodeDeploy uses to interact with Auto Scaling, EC2 and ELB on your behalf:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:CompleteLifecycleAction",
        "autoscaling:DeleteLifecycleHook",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeLifecycleHooks",
        "autoscaling:PutLifecycleHook",
        "autoscaling:RecordLifecycleActionHeartbeat",
        "autoscaling:CreateAutoScalingGroup",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:EnableMetricsCollection",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribePolicies",
        "autoscaling:DescribeScheduledActions",
        "autoscaling:DescribeNotificationConfigurations",
        "autoscaling:DescribeLifecycleHooks",
        "autoscaling:SuspendProcesses",
        "autoscaling:ResumeProcesses",
        "autoscaling:AttachLoadBalancers",
        "autoscaling:PutScalingPolicy",
        "autoscaling:PutScheduledUpdateGroupAction",
        "autoscaling:PutNotificationConfiguration",
        "autoscaling:PutLifecycleHook",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DeleteAutoScalingGroup",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus",
        "ec2:TerminateInstances",
        "tag:GetTags",
        "tag:GetResources",
        "sns:Publish",
        "cloudwatch:DescribeAlarms",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DescribeInstanceHealth",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer"
      ],
      "Resource": "*"
    }
  ]
} 
- Attach above policy to an IAM role and name the role e.g. CodeDeployServiceRole

Create a 'golden image' AMI

The whole purpose of AWS CodeDeploy is to act in conjunction with Auto Scaling Groups so that the app server layer of your infrastructure becomes horizontally scalable. You need to start somewhere, so I recommend the following:
  • set up an EC2 instance for your app server the old-fashioned way, either with Ansible/Chef/Puppet or with Terraform
  • configure this EC2 instance to talk to any other layers it needs, i.e. the database layer (either running on EC2 instances or, if you are in AWS, on RDS), the caching layer (dedicated EC2 instances running Redis/memcached, or AWS ElastiCache), etc. 
  •  deploy some version of your code to the instance and make sure your application is fully functioning
 If all this works as expected, take an AMI image from this EC2 instance. This image will serve as the 'golden image' that all other instances launched by the Auto Scaling Group / Launch Configuration will be based on.
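
If you prefer the command line, the AMI can be created with the awscli (the instance ID and image name here are placeholders):

$ aws ec2 create-image --instance-id i-0123456789abcdef0 --name "app-golden-image-v1"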

Create Application Load Balancer (ALB) and Target Group

The ALB will be the entry point into your infrastructure. For now just create an ALB and an associated Target Group. Make sure you add your availability zones into the AZ pool of the ALB.

If you want the ALB to handle the SSL certificate for your domain, add the SSL cert to Amazon Certificate Manager and add a listener on the ALB mapping port 443 to the Target Group. Of course, also add a listener for port 80 on the ALB and map it to the Target Group.

I recommend creating a dedicated Security Group for the ALB and allowing ports 80 and 443, either from everywhere or from a restricted subnet if you want to test it first.

For the Target Group, make sure you set the correct health check for your application (something like requesting a special file healthcheck.html over port 80). No need to select any EC2 instances in the Target Group yet.

Create Launch Configuration and Auto Scaling Group

Here are the main elements to keep in mind when creating a Launch Configuration to be used in conjunction with AWS CodeDeploy:
  • AMI ID: specify the AMI ID of the 'golden image' created above
  • IAM Instance Profile: specify CodeDeploy-EC2-Instance-Profile (role created above)
  • Security Groups: create a Security Group that allows access to ports 80 and 443 from the ALB Security Group above 
  • User data: each newly launched EC2 instance based on your golden image AMI will have to get the AWS CodeDeploy agent installed. Here's the user data for an Ubuntu-based AMI (taken from the AWS CodeDeploy documentation):
#!/bin/bash
apt-get -y update
apt-get -y install awscli
apt-get -y install ruby
cd /home/ubuntu
aws s3 cp s3://aws-codedeploy-us-west-2/latest/install . --region us-west-2
chmod +x ./install

./install auto

Alternatively, you can run these commands on your initial EC2 instance, then take a golden image AMI based off of that instance. That way you make sure that the CodeDeploy agent will be running on each new EC2 instance that gets provisioned via the Launch Configuration. In this case, there is no need to specify a User data section for the Launch Configuration.
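
Either way, you can verify that the agent is running on a given instance with:

# service codedeploy-agent status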

Once the Launch Configuration is created, you'll be able to create an Auto Scaling Group (ASG) associated with it. Here are the main configuration elements for the ASG:
  • Launch Configuration: the one defined above
  • Target Groups: the Target Group defined above
  • Min/Max/Desired: up to you to define the EC2 instance count for each of these. You can start with 1/1/1 to test
  • Scaling Policies: you can start with a static policy (corresponding to Min/Max/Desired counts) and add policies based on alarms triggered by various Cloud Watch metrics such as CPU usage, memory usage, etc as measured on the EC2 instances comprising the ASG
Once the ASG is created, depending on the Desired instance count, that many EC2 instances will be launched.

 Create AWS CodeDeploy Application and Deployment Group

We finally get to the meat of this post. Go to the AWS CodeDeploy page and create a new Application. You also need to create a Deployment Group while you are at it. For Deployment Type, you can start with 'In-place deployment' and, once you are happy with that, move to 'Blue/green deployment', which is more complex but better from a high-availability and rollback perspective.

In the Add Instances section, choose 'Auto scaling group' as the tag type, and the name of the ASG created above as the key. Under 'Total matching instances' below the Tag and Key you should see a number of EC2 instances corresponding to the Desired count in your ASG.

For Deployment Configuration, you can start with the default value, which is OneAtATime, then experiment with other types such as HalfAtATime (I don't recommend AllAtOnce unless you know what you're doing).

For Service Role, you need to specify the CodeDeployServiceRole service role created above.

Create scaffolding files for the AWS CodeDeploy Application lifecycle

At a minimum, the tar.gz or zip archive of your application's built code also needs to contain what is called an AppSpec file, which is a YAML file named appspec.yml. The file needs to be in the root directory of the archive. Here's what I have in mine:

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/mydomain.com/
hooks:
  BeforeInstall:
    - location: before_install
      timeout: 300
      runas: root
  AfterInstall:
    - location: after_install
      timeout: 300
      runas: root


The before_install and after_install scripts (you can name them anything you want) are shell scripts that will be executed after the archive is downloaded on the target EC2 instance.

The before_install script will be run before the files inside the archive are copied into the destination directory (as specified in the destination variable /var/www/mydomain.com). You can do things like create certain directories that need to exist, or change the ownership/permissions of certain files and directories.

The after_install script will be run after the files inside the archive are copied into the destination directory. You can do things like create symlinks, run any scripts that need to complete the application installation (such as scripts that need to hit the database), etc.

One note specific to archives obtained from code built by Capistrano: it's customary to have Capistrano tasks create symlinks for directories such as media or var to volumes outside of the web server document root (when media files are mounted over NFS/EFS for example). When these symlinks are unarchived by CodeDeploy, they tend to turn into regular directories, and the contents of potentially large mounted file systems get copied in them. Not what you want. I ended up creating all symlinks I need in the after_install script, and not creating them in Capistrano.
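
As an illustration, here is the general shape of an after_install script that re-creates those symlinks (a sketch only; the paths and user are made up for this example):

#!/bin/bash
set -e

DEPLOY_DIR=/var/www/mydomain.com
SHARED_DIR=/mnt/nfs/shared

# CodeDeploy turns archived symlinks into regular directories,
# so remove whatever was unarchived and re-create the symlink
rm -rf ${DEPLOY_DIR}/media
ln -sfn ${SHARED_DIR}/media ${DEPLOY_DIR}/media

# make sure the web server user owns the deployed files
chown -R www-data:www-data ${DEPLOY_DIR}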

There are other points in the Application deploy lifecycle where you can insert your own scripts. See the AppSpec hook documentation.


Deploy the application with AWS CodeDeploy

Once you have an Application and its associated Deployment Group, you can select this group and choose 'Deploy new revision' from the Action drop-down. For the Revision Type, choose 'My application is stored in Amazon S3'. For the Revision Location, type in the name of the S3 bucket where Jenkins uploaded the tar.gz of the application build. You can play with the other options according to the needs of your deployment.
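
For reference, the equivalent awscli invocation looks roughly like this (application, group, bucket and key names below are placeholders):

$ aws deploy create-deployment \
    --application-name my-app \
    --deployment-group-name my-deployment-group \
    --deployment-config-name CodeDeployDefault.OneAtATime \
    --s3-location bucket=my-build-bucket,key=builds/myapp.tar.gz,bundleType=tgz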

Finally, hit the Deploy button, baby! If everything goes well, you'll see a nice green bar showing success.


If everything does not go well, you can usually troubleshoot things pretty well by looking at the logs of the Events associated with that particular Deployment. Here's an example of an error log:

ScriptFailed
Script Name after_install
Message Script at specified location: after_install run as user root failed with exit code 1 

Log Tail [stderr]chown: changing ownership of ‘/var/www/mydomain.com/shared/media/images/85.jpg’:
Operation not permitted

 
In this case, the 'shared' directory was mounted over NFS, so I had to make sure the permissions and ownership of the source file system on the NFS server were correct.

I am still experimenting with AWS CodeDeploy and haven't quite used it 'in anger' yet, so I'll report back with any other findings.

Tuesday, January 31, 2017

Notes on setting up Elasticsearch, Kibana and Fluentd on Ubuntu

I've been experimenting with an EFK stack (with Fluentd replacing Logstash) and I hasten to write down some of my notes. I could have just as well used Logstash, but my goal is to also use the EFK stack for capturing logs out of Kubernetes clusters, and I wanted to become familiar with Fluentd, which is a Cloud Native Computing Foundation project.

1) Install Java 8

On Ubuntu 16.04:

# apt-get install openjdk-8-jre-headless

On Ubuntu 14.04:

# add-apt-repository -y ppa:webupd8team/java
# apt-get update
# apt-get -y install oracle-java8-installer

2) Download and install Elasticsearch (latest version is 5.1.2 currently)

# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.2.deb
# dpkg -i elasticsearch-5.1.2.deb


Edit /etc/elasticsearch/elasticsearch.yml and set

network.host: 0.0.0.0

# service elasticsearch restart
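
A quick sanity check that Elasticsearch is up and listening:

# curl http://localhost:9200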

3) Download and install Kibana

# wget https://artifacts.elastic.co/downloads/kibana/kibana-5.1.2-amd64.deb
# dpkg -i kibana-5.1.2-amd64.deb


Edit /etc/kibana/kibana.yml and set

server.host: "local_ip_address"


# service kibana restart

4) Install Fluentd agent (td-agent)

On Ubuntu 16.04:

# curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent2.sh | sh

On Ubuntu 14.04:

# curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh


Install Fluentd elasticsearch plugin (note that td-agent comes with its own gem installer):

# td-agent-gem install fluent-plugin-elasticsearch

5) Configure Fluentd agent

To specify the Elasticsearch server to send the local logs to, use a match stanza in /etc/td-agent/td-agent.conf:

<match **>
  @type elasticsearch
  logstash_format true
  host IP_ADDRESS_OF_ELASTICSEARCH_SERVER
  port 9200
  index_name fluentd
  type_name fluentd.project.stage.web01
</match>

Note that with logstash_format true, the Elasticsearch plugin writes Logstash-compatible indices, so Elasticsearch will create indices named logstash-*. Also, port 9200 needs to be open from the client to the Elasticsearch server.

I found it useful to set the type_name property to a name specific to the client running the Fluentd agent. For example, if you have several projects/tenants, each with multiple environments (dev, stage, prod) and each environment with multiple servers, you could use something like type_name fluentd.project.stage.web01. This label will then be parsed and shown in Kibana and will allow you to easily tell the source of a given log entry.

If you want Fluentd to parse Apache logs and send the log entries to Elasticsearch, use stanzas of this form in td-agent.conf:

<source>
  type tail
  format apache2
  path /var/log/apache2/mysite.com-access.log
  pos_file /var/log/td-agent/mysite.com-access.pos
  tag apache.access
</source>

<source>
  type tail
  format apache2
  path /var/log/apache2/mysite.com-ssl-access.log
  pos_file /var/log/td-agent/mysite.com-ssl-access.pos
  tag apache.ssl.access
</source>

For syslog logs, use:

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.local
</source>
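
Note that for the syslog source to receive anything, the local rsyslog daemon needs to forward its messages to that port. A minimal sketch for a drop-in file such as /etc/rsyslog.d/99-fluentd.conf (the file name is arbitrary), followed by a service rsyslog restart:

# forward all syslog messages to the Fluentd syslog input over UDP
*.* @127.0.0.1:5140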

Restart td-agent:

# service td-agent restart

Inspect the td-agent log file:

# tail -f /var/log/td-agent/td-agent.log

Some things I've had to do to fix errors emitted by td-agent:
  • change permissions on apache log directory and log files so they are readable by user td-agent
  • make sure port 9200 is open from the client to the Elasticsearch server

That's it in a nutshell. In the next installment, I'll show how to secure the communication between the Fluentd agent and the Elasticsearch server.
