Tuesday, January 31, 2017

Notes on setting up Elasticsearch, Kibana and Fluentd on Ubuntu

I've been experimenting with an EFK stack (Elasticsearch, Fluentd and Kibana, with Fluentd replacing Logstash) and I want to write down some notes while they are fresh. I could just as well have used Logstash, but my goal is to also use the EFK stack for capturing logs from Kubernetes clusters, and I wanted to become familiar with Fluentd, which is a Cloud Native Computing Foundation project.

1) Install Java 8

On Ubuntu 16.04:

# apt-get install openjdk-8-jre-headless

On Ubuntu 14.04:

# add-apt-repository -y ppa:webupd8team/java
# apt-get update
# apt-get -y install oracle-java8-installer

2) Download and install Elasticsearch (5.1.2 is the latest version at the time of writing)

# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.2.deb
# dpkg -i elasticsearch-5.1.2.deb


Edit /etc/elasticsearch/elasticsearch.yml and set

network.host: 0.0.0.0

# service elasticsearch restart
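
A quick way to confirm that Elasticsearch came back up and is listening on the new address is to request its root endpoint, which answers with a small JSON document (cluster name, version and so on). Here is a minimal sketch, assuming curl is installed and the default HTTP port 9200; the helper names es_url and es_check are made up for this example.

```shell
# Hypothetical helpers; the names are illustrative, not part of Elasticsearch.

# Print the base URL for an Elasticsearch node (port defaults to 9200).
es_url() {
  printf 'http://%s:%s\n' "$1" "${2:-9200}"
}

# Fetch the root endpoint; give up if there is no answer within 5 seconds.
es_check() {
  curl -sS --max-time 5 "$(es_url "$@")"
}
```

Run es_check localhost on the server first, then es_check with the server's IP address from a client machine. Elasticsearch can take a few seconds to start, so a refused connection right after the restart is not necessarily an error.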

3) Download and install Kibana

# wget https://artifacts.elastic.co/downloads/kibana/kibana-5.1.2-amd64.deb
# dpkg -i kibana-5.1.2-amd64.deb


Edit /etc/kibana/kibana.yml and set

server.host: "local_ip_address"


# service kibana restart

4) Install Fluentd agent (td-agent)

On Ubuntu 16.04:

# curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent2.sh | sh

On Ubuntu 14.04:

# curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh


Install the Fluentd Elasticsearch plugin (note that td-agent comes with its own gem installer):

# td-agent-gem install fluent-plugin-elasticsearch

5) Configure Fluentd agent

To specify the Elasticsearch server to send the local logs to, use a match stanza in /etc/td-agent/td-agent.conf:

<match **>
  @type elasticsearch
  logstash_format true
  host IP_ADDRESS_OF_ELASTICSEARCH_SERVER
  port 9200
  index_name fluentd
  type_name fluentd.project.stage.web01
</match>

Note that the elasticsearch plugin can mimic Logstash's index naming: with logstash_format true it ignores index_name and writes to daily indices named logstash-YYYY.MM.DD, which match the logstash-* index pattern that Kibana suggests by default. Also, port 9200 needs to be open from the client to the Elasticsearch server.
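
To check that entries are actually arriving, it helps to know the name of today's index: the logstash- prefix followed by the date. The sketch below assumes the plugin's default prefix and UTC dates (which I believe is the plugin's default), and that curl is installed; both helper names are hypothetical.

```shell
# Name of the daily index the plugin is expected to write to today.
# Assumes the default "logstash" prefix and UTC dates.
logstash_index_today() {
  date -u +logstash-%Y.%m.%d
}

# List the logstash-* indices on a given Elasticsearch host (port defaults to 9200).
list_logstash_indices() {
  curl -sS --max-time 5 "http://$1:${2:-9200}/_cat/indices/logstash-*?v"
}
```

If list_logstash_indices IP_ADDRESS_OF_ELASTICSEARCH_SERVER shows an index whose name matches the output of logstash_index_today, the agent is shipping logs.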

I found it useful to set the type_name property to a name specific to the client running the Fluentd agent. For example, if you have several projects/tenants, each with multiple environments (dev, stage, prod) and each environment with multiple servers, you could use something like type_name fluentd.project.stage.web01. This label is stored in the _type field of every log entry, so it shows up in Kibana and lets you easily tell the source of a given entry.
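
With more than a handful of servers, typing that label by hand gets error-prone, so it can be generated. This is only a sketch: the function names are made up, and the project/environment/host order simply follows the example above.

```shell
# Build a per-host label such as fluentd.project.stage.web01.
# Arguments: project, environment, host (host defaults to the short hostname).
fluentd_type_name() {
  printf 'fluentd.%s.%s.%s\n' "$1" "$2" "${3:-$(hostname -s)}"
}

# Print a match stanza for the given Elasticsearch host and label.
render_match() {
  cat <<EOF
<match **>
  @type elasticsearch
  logstash_format true
  host $1
  port 9200
  index_name fluentd
  type_name $2
</match>
EOF
}
```

For example, render_match 10.0.0.5 "$(fluentd_type_name project stage)" prints a stanza labeled with the local hostname, ready to be placed in /etc/td-agent/td-agent.conf. The 10.0.0.5 address is a placeholder.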

If you want Fluentd to parse Apache logs and send the log entries to Elasticsearch, use stanzas of this form in td-agent.conf:

<source>
  @type tail
  format apache2
  path /var/log/apache2/mysite.com-access.log
  pos_file /var/log/td-agent/mysite.com-access.pos
  tag apache.access
</source>

<source>
  @type tail
  format apache2
  path /var/log/apache2/mysite.com-ssl-access.log
  pos_file /var/log/td-agent/mysite.com-ssl-access.pos
  tag apache.ssl.access
</source>
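
Since each additional site needs one more stanza of the same shape, the stanzas can be generated from the log path and the tag. A minimal sketch, assuming every log file ends in .log and that the position files live in /var/log/td-agent as above; render_tail_source is a hypothetical helper name.

```shell
# Print a tail source for one Apache access log.
# Arguments: log file path, Fluentd tag.
# The pos_file name is derived from the log file name.
render_tail_source() {
  cat <<EOF
<source>
  @type tail
  format apache2
  path $1
  pos_file /var/log/td-agent/$(basename "$1" .log).pos
  tag $2
</source>
EOF
}
```

Calling render_tail_source /var/log/apache2/mysite.com-access.log apache.access reproduces the first stanza above.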

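For syslog logs, use the source below. It listens on port 5140 (UDP by default), so the local syslog daemon also has to be configured to forward its messages to that port: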

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.local
</source>
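
The syslog source only receives what is sent to it. Here is a sketch of the two pieces I would use to try it out, assuming rsyslog as the local syslog daemon and the util-linux version of logger that ships with Ubuntu; the function names and the fluentd-test tag are made up.

```shell
# Print an rsyslog rule that forwards every message over UDP
# (a single @; @@ would mean TCP) to a Fluentd syslog source.
rsyslog_forward_rule() {
  printf '*.* @%s:%s\n' "${1:-127.0.0.1}" "${2:-5140}"
}

# Send one test message over UDP straight to the source, bypassing rsyslog.
send_test_syslog() {
  logger -d -n "${1:-127.0.0.1}" -P "${2:-5140}" -t fluentd-test "test message"
}
```

The output of rsyslog_forward_rule can be written to a file under /etc/rsyslog.d/ (the file name is up to you), followed by a restart of rsyslog. After send_test_syslog, the message should show up in Elasticsearch with a tag starting with system.local, since the syslog source appends the facility and priority to the configured tag.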

Restart td-agent:

# service td-agent restart

Inspect the td-agent log file:

# tail -f /var/log/td-agent/td-agent.log

Some things I've had to do to fix errors emitted by td-agent:
  • change permissions on the Apache log directory and log files so they are readable by the td-agent user
  • make sure port 9200 is open from the client to the Elasticsearch server

That's it in a nutshell. In the next installment, I'll show how to secure the communication between the Fluentd agent and the Elasticsearch server.
