Thursday, July 15, 2010

Tracking and visualizing mail logs with MongoDB and gviz_api

To me, nothing beats a nice dashboard for keeping track of how your infrastructure and your application are doing. At Evite, sending mail is a core part of our business. One thing we need to ensure is that our mail servers are busily humming away, sending mail out to our users. To this end, I built a quick outgoing email tracking tool using MongoDB and pymongo, and I also put together a dashboard visualization of that data using the Google Visualization API via the gviz_api Python module.

Tracking outgoing email from the mail logs with pymongo

Mail logs are sent to a centralized syslog. I have a simple Python script that tails the common mail log file every 5 minutes, counts the lines that conform to a specific regular expression (looking for a specific msgid pattern), then inserts that count into a MongoDB database. Here's the snippet of code that does that:

import datetime
from pymongo import Connection

conn = Connection(host="")
db = conn.logs
maillogs = db.mail
d = {}
now =
d['insert_time'] = now
d['msg_count'] = msg_count

I use the pymongo module to open a connection to the host running the mongod daemon, then I declare a database called logs and a collection called maillogs within that database. Note that both the database and the collection are created on the fly in case they don't exist.

I then instantiate a Python dictionary with two keys, insert_time and msg_count. Finally, I use the save method on the maillogs collection to insert the dictionary into the MongoDB logs database. Can't get any easier than this.

Visualizing the outgoing email count with graph_viz

I have another simple Python script which queries the MongoDB logs database for all documents that have been inserted in the last hour. Here's how I do it:

conn = Connection()
db = conn.logs
maillogs = db.mail
now =
minutes_ago = now + datetime.timedelta(minutes=-MINUTES_AGO)
rows = maillogs.find({'insert_time': {"$gte": minutes_ago}})

As an aside, when querying MongoDB databases that contain documents with timestamp fields, the datetime module will become your intimate friend.

Just remember that you need to pass datetime objects when you put together a pymongo query. In the case above, I use the now() method to get the current timestamp, then I use timedelta with minutes=-60 to get the datetime object corresponding to 'now minus 1 hour'.

The gviz_api module has decent documentation, but it still took me a while to figure out how to use it properly (thanks to my colleague Dan Mesh for being the trailblazer and providing me with some good examples).

I want to graph the timestamps and message counts from the last hour. Using the pymongo query above, I get the documents inserted in MongoDB during the last hour. From that set, I need to generate the data that I am going to pass to gviz_api:

chart_data = []
for row in rows:
insert_time = row['insert_time']
insert_time = insert_time.strftime(%H:%M')
msg_count = int(row['msg_count'])
chart_data.append([insert_time, msg_count])

jschart("Outgoing_mail", chart_data)

In my case, chart_data is a list of lists, each list containing a timestamp and a message count.

I pass the chart_data list to the jschart function, which does the Google Visualization magic:

def jschart(name, chart_data):
description = [
("time", "string"),
("msg_count", "number", "Message count"),

data = []
for insert_time, msg_count in chart_data:
data.append((insert_time, msg_count))

# Loading it into gviz_api.DataTable
data_table = gviz_api.DataTable(description)

# Creating a JSON string
json = data_table.ToJSon()

html = TEMPL % {"title" : name, "json" : json}
open("charts/%s.html" % name, "w").write(html)

The important parts in this function are the description and the data variables. According to the docs, they both need to be of the same type, either dictionary or list. In my case, they're both lists. The description denotes the schema for the data I want to chart. I declare two variables I want to chart, insert_time of type string, and msg_count of type number. For msg_count, I also specify a user-friendly label called 'Message count', which will be displayed in the chart legend.

After constructing the data list based on chart_data, I declare a gviz_api DataTable, I load the data into it, I call the ToJSon method on it to get a JSON string, and finally I fill in a template string, passing it a title for the chart and the JSON data.

The template string is an HTML + Javascript snippet that actually talks to the Google Visualization backend and tells it to create an Area Chart. Click on this gist to view it.

That's it. I run the gviz_api script every 5 minutes via crontab and I generate an HTML file that serves as my dashboard.

I can easily also write a Nagios plugin based on the pymongo query, which would alert me for example if the number of outgoing email messages is too low or too high. It's very easy to write a Nagios plugin by just having a script that exits with 0 for success, 1 for warnings and 2 for critical errors. Here's a quick example, where wlimit is the warning threshold and climit is the critical threshold:

def check_maillogs(wlimit, climit):
# MongoDB
conn = Connection()
db = conn.logs
maillogs = db.mail
now =
minutes_ago = now + datetime.timedelta(minutes=-MINUTES_AGO)
count = maillogs.find({'insert_time': {"$gte": minutes_ago}}).count()
rc = 0
if count > wlimit:
rc = 1
if count > climit:
rc = 2
print "%d messages sent in the last %d minutes" % (count, MINUTES_AGO)
return rc

Update #1
See Mike Dirolf's comment on how to properly insert and query timestamp-related fields. Basically, use datetime.datetime.utcnow() instead of now() everywhere, and convert to local time zone when displaying.

Update #2
Due to popular demand, here's a screenshot of the chart I generate. Note that the small number of messages is a very, very small percentage of our outgoing mail traffic. I chose to chart it because it's related to some new functionality, and I want to see if we're getting too few or too many messages in that area of the application.


Anonymous said...

Very cool - thanks for the post! One thing to consider is using utcnow() instead of now() throughout, or attaching a timezone to the result of now(). Saving naive datetime instances that aren't UTC, like those returned by now(), is generally a bad idea - the dates inside of the server won't be UTC in that case. This FAQ has some more info:

- Mike

Grig Gheorghiu said...

Thanks for the quick comment and feedback, Mike! I'll definitely follow your advice and use utcnow() throughout.


Max Ischenko said...

Screenshot is missing ;)

Anonymous said...

I've been using Protovis for visualization stuff. It's probably one of the most impressive projects I've seen this year. Checkout:


Anonymous said...

nice work.

got a screenshot of the output or your dashboard?

Grig Gheorghiu said...

Thanks for all the comments. I updated the blog post with a reference to Mike's comment and with a screenshot.

Anonymous said...

You might be interested by this visualization I created on 15 years of emails. Basically each mail has a date AND a time. So I put a dot for each email: the date on the horizontal axis, the time on vertical axis (0-24h).

Anonymous said...

I am looking for a replacement of CouchDB because I am missing advanced query features. I considered MongoDB but tend now to PostreSQL. I want REST and JSON so I could natively call from a JavaScript client and I would also like to zero out the middleware part (thats the reason why I used CouchDB).
I mean I need small middle ware that can provide ultra fast REST/ JSON webservices on top of PostgreSQL. I remeber your talk about restish. Do you think there is something better than restish available now? What made you use MongoDB instead?


Grig Gheorghiu said...

Mark -- I use MongoDB as a convenient data store for the log data that I'm capturing. It's much easier to use in this scenario than a traditional RDBMS. I am not too worried about fault tolerance, high availability etc either, since it's just log data.

I think the issue of what frontend to use with it is orthogonal.


Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...