Tuesday, November 29, 2016

Kubernetes resource graphing with Heapster, InfluxDB and Grafana

I know that the Cloud Native Computing Foundation chose Prometheus as the monitoring platform of choice for Kubernetes, but in this post I'll show you how to quickly get started with graphing CPU, memory, disk and network in a Kubernetes cluster using Heapster, InfluxDB and Grafana.

The documentation in the kubernetes/heapster GitHub repo is actually pretty good. Here's what I did:

$ git clone https://github.com/kubernetes/heapster.git
$ cd heapster/deploy/kube-config/influxdb

Look at the yaml manifests to see if you need to customize anything. I left everything 'as is' and ran:

$ kubectl create -f .
deployment "monitoring-grafana" created
service "monitoring-grafana" created
deployment "heapster" created
service "heapster" created
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created

Then you can run 'kubectl cluster-info' and look for the monitoring-grafana endpoint. Since the monitoring-grafana service is of type LoadBalancer, if you run your Kubernetes cluster in AWS, the service creation will also involve the creation of an ELB. By default the ELB security group allows 80 from all, so I edited that to restrict it to some known IPs.

After a few minutes, you should see CPU and memory graphs shown in the Kubernetes dashboard. Here is an example showing pods running in the kube-system namespace:



You can also hit the Grafana endpoint and choose the Cluster or Pods dashboards. Note that if you have a namespace different from default and kube-system, you have to enter its name manually in the namespace field of the Grafana Pods dashboard. Only then you'll be able to see data corresponding to pods running in that namespace (or at least I had to jump through that hoop.)

Here is an example of graphs for the kubernetes-dashboard pod running in the kube-system namespace:


For info on how to customize the Grafana graphs, here's a good post from Deis.

No comments:

Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...