Agile Testing: May 2010

Friday, May 21, 2010

Setting up a mail server in EC2 with postfix and Postini

Setting up a mail server in 'the cloud' is notoriously difficult, mainly because many cloud-based IP addresses are already blacklisted, and also because you don't have control over reverse DNS in many cases. One way to mitigate these factors is to use Postini for outbound email from your cloud-based mail server. I've experimented with this in a test environment, with fairly good results. Here are some notes I jotted down. I am indebted to Jeff Roberts for his help with the postfix setup.

Let's assume you have a domain called mydomain.com and you want to set up a mail server in EC2 (could be any cloud provider) so that you can both send and receive email. I will also assume that you'll use Postini -- which currently costs $11/year/user. In my setup, I am mostly interested in sending mail out and less in receiving mail, so I only have one user: info@mydomain.com.

EC2 setup

I deployed a c1.medium EC2 instance running Ubuntu 9.04 and I also gave it an Elastic IP. The host name of my mail server is mail01.

I then installed postfix via 'apt-get install postfix'.

Postfix setup

For starters, I configured postfix to receive mail for mydomain.com, and to restrict the set of IP addresses that are allowed to use it as a relay. The relevant lines in /etc/postfix/main.cf are:

myhostname = mail01
myorigin = /etc/mailname
mydestination = mydomain.com, mail01, localhost.localdomain, localhost
mynetworks = cidr:/etc/postfix/allowed_relays, 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128

The contents of /etc/postfix/allowed_relays are similar to:

10.1.2.3 tstapp01
10.5.6.7 stgapp01

These IPs/names are application servers that I configured to use mail01 as their outgoing mail server, so they need to be allowed to send mail out.

Postini setup

When you sign up for Postini, you can specify that the user you created is of admin type. When you then log in to https://login.postini.com/ as that user and specify that you want to go to the System Administration area, you'll be directed to a URL such as http://ac-s9.postini.com. Make a note of the number after the s, in this case 9. It will dictate which outbound Postini mail servers you will need to use, per these instructions. For s9 (which was my case), the CIDR range of the outbound Postini mail servers is 74.125.148.0/22.

Within the Postini System Administration interface, I had to edit the settings for Inbound Servers and Outbound Servers.

For Inbound Servers, I edited the Delivery Mgr tab and specified the Elastic IP of mail01 as the Email Server IP address.

For Outbound Servers, I clicked on 'Add Outbound Email Server' and specified a range whose beginning and end are both the Elastic IP of mail01. For Reinjection Hosts, I also specified the Elastic IP of mail01.

The Reinjection Host setup is necessary for using Postini as an outbound mail service. See the Postini Outbound Services Configuration Guide (PDF) for details on how to do this depending on the mail server software you use.

More Postfix setup

Now we are ready to configure postfix to use Postini as its own outbound relay. I just added one line to /etc/postfix/main.cf:

relayhost = outbounds9.obsmtp.com

(this will be different, depending on the number you get when logging into Postini; in my case the number is 9)
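If you script your server builds, the relayhost value can be derived from the admin console URL. This is a minimal sketch that assumes the outbounds<N>.obsmtp.com naming seen for system 9 holds for other system numbers; that mapping and the helper name postini_relayhost are my assumptions, not something documented here, so verify the host name against Postini's instructions.

```python
import re
from urllib.parse import urlparse


def postini_relayhost(admin_url: str) -> str:
    """Map a Postini admin URL such as http://ac-s9.postini.com to a relayhost.

    ASSUMPTION: the outbounds<N>.obsmtp.com pattern observed for system 9
    applies to every system number N. Check Postini's docs before relying on it.
    """
    host = urlparse(admin_url).hostname or ""
    match = re.match(r"ac-s(\d+)\.postini\.com$", host)
    if match is None:
        raise ValueError(f"unexpected Postini admin host: {host!r}")
    return f"outbounds{match.group(1)}.obsmtp.com"


if __name__ == "__main__":
    # Print the line to add to /etc/postfix/main.cf.
    print("relayhost = " + postini_relayhost("http://ac-s9.postini.com"))
```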

I also had to allow the CIDR range of the Postini mail servers to use mail01 as a relay (they need to reinject the message into mail01, hence the need to specify mail01's external Elastic IP as the Reinjection Host IP above). I edited /etc/postfix/allowed_relays and added the line:

74.125.148.0/22 postini ac-s9
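To sanity-check which clients the relay list admits, the matching can be approximated with Python's ipaddress module. This is a sketch of the idea, not Postfix's own code: each line's first field (a bare address or a CIDR block) is the key, and the trailing words are just labels that this sketch ignores. The function names are hypothetical.

```python
import ipaddress


def load_allowed_relays(text: str):
    """Parse allowed_relays-style lines: '<address or CIDR> <label...>'."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A bare address such as 10.1.2.3 becomes the single-host network 10.1.2.3/32.
        networks.append(ipaddress.ip_network(line.split()[0]))
    return networks


def is_allowed(client_ip: str, networks) -> bool:
    """True if client_ip falls inside any of the listed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

For example, 74.125.148.0/22 spans 74.125.148.0 through 74.125.151.255 (1,024 addresses), so a Postini server at 74.125.150.20 would be allowed while 74.125.152.1 would not.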

DNS setup

To use Postini as an incoming mail server service, you need to configure mydomain.com to use Google Apps. Then you'll be able to specify the Postini mail servers that you'll use as MX records for mydomain.com.

DKIM setup

To further improve the chances that email you send out from the cloud will actually reach its intended recipients, it's a good idea to set up DKIM and SPF. For DKIM, see my blog post on "DKIM setup with postfix and OpenDKIM".

SPF setup

This is actually fairly easy to do. You basically need to add a DNS record which details the IP addresses of the mail servers which are allowed to send mail for mydomain.com. You can use an SPF setup wizard for this, and an SPF record testing tool to see if your setup is correct. In my case, I added a TXT record with the contents:

v=spf1 ip4:207.126.144.0/20 ip4:64.18.0.0/20 ip4:74.125.148.0/22 ip4:A.B.C.D -all

where the first 3 subnets are the 3 possible CIDR ranges for Postini servers, and the last IP which I denoted with A.B.C.D is mail01's Elastic IP address.
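To see how a receiving server would judge a given sender IP against this record, here is a deliberately simplified evaluator. It only understands ip4: mechanisms, their +/-/~/? qualifiers, and all; real SPF checking (RFC 7208) also handles a, mx, include, redirect and more, so treat this as an illustration. Since A.B.C.D is a placeholder, the test substitutes the documentation address 203.0.113.10 for mail01's Elastic IP.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip4_result(record: str, sender_ip: str) -> str:
    """Evaluate only ip4: and all terms of an SPF record, left to right."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    addr = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            if addr in ipaddress.ip_network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        elif term == "all":
            return QUALIFIERS[qualifier]
        # Any other mechanism is skipped in this simplified sketch.
    return "neutral"  # nothing matched
```

With -all at the end, any IP outside the listed ranges gets a hard fail, which is why mail01's own address has to be listed alongside the Postini ranges.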

At this point, you should be able to send email out through mail01 (from the servers you allowed to do so) and then through Postini, and also to receive email for the users you created within mydomain.com. If you also have DKIM and SPF set up, you're in pretty good shape in terms of email deliverability. Still, sending email reliably is one of the hardest things to do on today's Internet, so good luck!

Deploying MongoDB with master-slave replication

This is a quick post so I don't forget how I set up replication between a MongoDB master server running in a data center and a MongoDB slave running in EC2. I won't go into how to use MongoDB (I'll cover that in one or more future posts); I'll just talk about installing MongoDB and setting up replication.

Setting up the master server

My master server runs Ubuntu 9.10 64-bit, but similar instructions apply to other flavors of Linux.

1) Download the MongoDB binary tarball from the mongodb.org downloads page (the current production version is 1.4.2; the commands below use 1.4.1, so substitute the version you actually downloaded). Get the 64-bit version: since MongoDB uses memory-mapped files, the 32-bit version limits your data to roughly 2 GB.

2) Unpack the tar.gz somewhere on your file system. I unpacked it in /usr/local on the master server and created a mongodb symlink:

# cd /usr/local
# tar xvfz mongodb-linux-x86_64-1.4.1.tar.gz
# ln -s mongodb-linux-x86_64-1.4.1 mongodb

3) Create a mongodb group and a mongodb user. Create a data directory on a partition with plenty of disk space, and a log directory. Set the ownership of these 2 directories to user and group mongodb:

# groupadd mongodb
# useradd -g mongodb mongodb
# mkdir -p /data/mongodb
# mkdir /var/log/mongodb
# chown -R mongodb:mongodb /data/mongodb /var/log/mongodb

4) Create a startup script in /etc/init.d. I modified this script so it also handles logging the way I wanted. I posted my version of the script as this gist. In my case the script is called mongodb.master. I chkconfig-ed it on and started it:

# chkconfig mongodb.master on
# service mongodb.master start

5) Add the location of the MongoDB utilities to your PATH. In my case, I added this line to .bashrc:

export PATH=$PATH:/usr/local/mongodb/bin

That's it, you should be in business now. If you type

# mongo

you should be connected to your MongoDB instance. You should also see a log file created in /var/log/mongodb/mongodb.log.

Also check out the very good administration-related documentation on mongodb.org.

Setting up the slave server

The installation steps 1) through 3) are the same on the slave server as on the master server.

Now let's assume you already created a database on your master server. If your database name is mydatabase, you should see files named mydatabase.0 through mydatabase.N and mydatabase.ns in your data directory. You can shut down the master server (via "service mongodb.master stop"), then simply copy these files over to the data directory of the slave server. Then start up the master server again.
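The file selection is easy to get wrong when several databases share a data directory, so here is a sketch of copying just one database's files. It assumes both directories are reachable from the machine running it (in practice you would probably scp or rsync to the slave), and copy_database_files is a hypothetical helper name. Run it only while the master is stopped so the files are consistent.

```python
import re
import shutil
from pathlib import Path


def copy_database_files(dbname: str, src_dir: str, dest_dir: str) -> list:
    """Copy <dbname>.ns and <dbname>.0 ... <dbname>.N from src_dir to dest_dir."""
    # Match exactly '<dbname>.ns' or '<dbname>.<digits>', nothing else.
    pattern = re.compile(re.escape(dbname) + r"\.(ns|\d+)$")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file() and pattern.match(path.name):
            shutil.copy2(path, dest / path.name)  # copy2 keeps timestamps
            copied.append(path.name)
    return copied
```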

At this point, in step 4 you're going to use a slightly different startup script. The main difference is in the way you start the mongod process:

DAEMON_OPTS="--slave --source $MASTER_SERVER --dbpath $DBPATH --logpath $LOGFILE --logappend run"

where MASTER_SERVER is the name or the IP address of your MongoDB master server. Note that if the master runs behind a firewall, port 27017 must be open on it so the slave can connect.

Here is a gist with my slave startup script (which I call mongodb.slave).

Now when you run 'service mongodb.slave start' you should have your MongoDB slave database up and running, and syncing from the master. Check the log file if you run into any issues.

Log rotation

The mongod daemon has a nifty feature: if you send it a SIGUSR1 signal via 'kill -USR1 PID', it will rotate its log file. The current mongodb.log will be timestamped, and a brand new mongodb.log will be started.
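This lends itself to a cron job. A minimal sketch, assuming the mongod PID is available in a pidfile; the pidfile path you pass in is an assumption, since the startup script may not write one, and rotate_mongod_log is a hypothetical name. It does exactly what 'kill -USR1 PID' does.

```python
import os
import signal


def rotate_mongod_log(pidfile: str) -> int:
    """Send SIGUSR1 to the process whose PID is stored in pidfile; return the PID."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGUSR1)  # equivalent to: kill -USR1 PID
    return pid
```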

Monitoring and graphing

I use Nagios for monitoring and Munin for resource graphing/visualization. I found a good Nagios plugin for MongoDB at the Tag1 Consulting git repository: check_mongo. The 10gen CTO Eliot Horowitz maintains the MongoDB munin plugin on GitHub: mongo-munin.

Here are the Nagios commands I use on the master to monitor connectivity, connection counts and long operations:

/usr/local/nagios/libexec/check_mongo -H localhost -A connect
/usr/local/nagios/libexec/check_mongo -H localhost -A count
/usr/local/nagios/libexec/check_mongo -H localhost -A long

On the slave I also monitor slave lag:

/usr/local/nagios/libexec/check_mongo -H localhost -A slavelag
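For readers writing their own checks, the part that matters to Nagios is the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. This sketch shows only the threshold logic for a lag value measured elsewhere; the 60s/300s thresholds and the function name are placeholders, not check_mongo's actual defaults or code.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_slave_lag(lag_seconds, warn=60, crit=300):
    """Return (exit_code, status_line) for a slave lag in seconds.

    The warn/crit defaults are hypothetical; tune them for your setup.
    """
    if lag_seconds is None or lag_seconds < 0:
        return UNKNOWN, "UNKNOWN - slave lag unavailable"
    if lag_seconds >= crit:
        return CRITICAL, f"CRITICAL - slave lag {lag_seconds}s"
    if lag_seconds >= warn:
        return WARNING, f"WARNING - slave lag {lag_seconds}s"
    return OK, f"OK - slave lag {lag_seconds}s"
```

A plugin wrapper would print the status line and then exit with the returned code.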

The munin plugin is easy to install. Just copy the files from GitHub into /usr/share/munin/plugins, create symlinks to them in /etc/munin/plugins, then restart munin-node.

mongo_btree -> /usr/share/munin/plugins/mongo_btree
mongo_conn -> /usr/share/munin/plugins/mongo_conn
mongo_lock -> /usr/share/munin/plugins/mongo_lock
mongo_mem -> /usr/share/munin/plugins/mongo_mem
mongo_ops -> /usr/share/munin/plugins/mongo_ops
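The symlinks above can be created in one go. This sketch assumes the plugin files were already copied into /usr/share/munin/plugins and skips links that already exist, so it is safe to rerun; link_munin_plugins is a hypothetical helper. Restart munin-node afterwards as described above.

```python
import os
from pathlib import Path

MONGO_PLUGINS = ["mongo_btree", "mongo_conn", "mongo_lock", "mongo_mem", "mongo_ops"]


def link_munin_plugins(src_dir="/usr/share/munin/plugins",
                       dest_dir="/etc/munin/plugins",
                       names=MONGO_PLUGINS):
    """Create dest_dir/<name> -> src_dir/<name> symlinks; return the names linked."""
    created = []
    for name in names:
        link = Path(dest_dir) / name
        if link.is_symlink() or link.exists():
            continue  # leave existing links or files alone
        os.symlink(Path(src_dir) / name, link)
        created.append(name)
    return created
```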

You should then see nice graphs for btree stats, current connections, write lock percentage, memory usage and mongod operations.

That's it for now. I'll talk more about how to actually use MongoDB with pymongo in a future post. But just to express an opinion, I think MongoDB rocks! It hits a very sweet spot between SQL and noSQL technologies. At Evite we currently use it for analytics and reporting, but I'm keeping a close eye on sharding becoming production-ready so that we can potentially use it for web-facing traffic too.

Sunday, May 09, 2010

Python Testing Tools Taxonomy on bitbucket

Michael Foord (aka Fuzzyman aka @voidspace) has kindly jumpstarted the task of porting the Python Testing Tools Taxonomy (aka PTTT) to reStructuredText and putting it up on bitbucket for easier collaboration. The new bitbucket project is called taxonomy. So far, Michael has ported the unit testing section.

Porting the original trac format to reST is not the most fun way to spend one's time, so we need collaborators. If you're interested in contributing to this project, please fork and let us know so we can commit your patches. Thanks!

Thursday, May 06, 2010

RabbitMQ clustering in Ubuntu

I've been banging my head against various RabbitMQ-related walls in the last couple of days, so I wanted to quickly jot down how I got clustering to work on Ubuntu boxes, before I forget.

Scenario A

2 Ubuntu 9.04 64-bit servers (app01 and app02)
1 Ubuntu 9.10 64-bit server (app03)
all servers are part of an internal DNS zone

My initial mistake here was to install the rabbitmq-server package via "apt-get install". On 9.04, I got version 1.5.4 of rabbitmq-server, while on 9.10 I got 1.6. As it turns out, the database versions used by these 2 releases (rabbitmq uses the Mnesia database) are incompatible when it comes to clustering.

When I tried to join the server running Ubuntu 9.10 to the cluster, it complained with:

error: {schema_integrity_check_failed,
           {aborted,{no_exists,rabbit_user,version}}}

So I removed rabbitmq-server via "apt-get remove" on all 3 servers. I also removed /var/lib/rabbitmq (which contains the database directory mnesia) and /var/log/rabbitmq (which contains the logs).

I then downloaded the rabbitmq-server_1.7.2-1_all.deb package and installed it, after first installing erlang. So:

# apt-get install erlang
# dpkg -i rabbitmq-server_1.7.2-1_all.deb

The install process automatically starts up the rabbitmq server. You can see its status by running:

# rabbitmqctl status
Status of node 'rabbit@app02' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA CXC 138 12","4.4.7"},
                        {os_mon,"CPO CXC 138 46","2.1.8"},
                        {sasl,"SASL CXC 138 11","2.1.5.4"},
                        {stdlib,"ERTS CXC 138 10","1.15.5"},
                        {kernel,"ERTS CXC 138 10","2.12.5"}]},
{nodes,['rabbit@app02']},
{running_nodes,['rabbit@app02']}]
...done.

Very important step: make sure the 3 servers have the same .erlang.cookie file. By default, every new installation creates a file containing a unique cookie in /var/lib/rabbitmq/.erlang.cookie. For clustering, the cookie needs to be identical on every node; otherwise the nodes refuse connections from each other and cannot share cluster state.

Here's what I did:

1) I stopped rabbitmq-server on app02 and app03 via '/etc/init.d/rabbitmq-server stop'.
2) I copied /var/lib/rabbitmq/.erlang.cookie from app01 to app02 and app03.
3) I removed /var/lib/rabbitmq/mnesia on app02 and app03 (you need to do this when you change the cookie).
4) Finally, I restarted rabbitmq-server on app02 and app03 via '/etc/init.d/rabbitmq-server start'.
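A quick way to confirm the copy worked is to compare the cookie files byte for byte. This sketch assumes you have first fetched each node's /var/lib/rabbitmq/.erlang.cookie to a local path (for example with scp); the local file names and the cookies_match helper are hypothetical.

```python
from pathlib import Path


def cookies_match(cookie_paths) -> bool:
    """True if every cookie file has exactly the same contents.

    Compares raw bytes so that even a stray trailing newline counts as a difference.
    """
    values = {Path(p).read_bytes() for p in cookie_paths}
    return len(values) == 1
```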

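Now comes the actual clustering setup. I wanted all nodes to be disk nodes rather than RAM nodes, for durability. With the rabbitmqctl cluster command in this version, a node becomes a disk node when it includes itself in the list of cluster nodes, which is why rabbit@app02 appears in the cluster command run on app02.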

Here's what I did on app02, following the RabbitMQ clustering doc:

# rabbitmqctl stop_app
Stopping node 'rabbit@app02' ...
...done.

# rabbitmqctl reset
Resetting node 'rabbit@app02' ...
...done.

# rabbitmqctl cluster rabbit@app01 rabbit@app02
Clustering node 'rabbit@app02' with ['rabbit@app01',
                                         'rabbit@app02'] ...
...done.

# rabbitmqctl start_app
Starting node 'rabbit@app02' ...
...done.

# rabbitmqctl status
Status of node 'rabbit@app02' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA CXC 138 12","4.4.7"},
                        {os_mon,"CPO CXC 138 46","2.1.8"},
                        {sasl,"SASL CXC 138 11","2.1.5.4"},
                        {stdlib,"ERTS CXC 138 10","1.15.5"},
                        {kernel,"ERTS CXC 138 10","2.12.5"}]},
{nodes,['rabbit@app02','rabbit@app01']},
{running_nodes,['rabbit@app01','rabbit@app02']}]

At this point, make sure you can see both nodes app01 and app02 in the nodes list and also in the running_nodes list. If you forgot the cookie step, the commands above will still appear to succeed, but you'll only see app02 listed under nodes and running_nodes. Only if you inspect the log file /var/log/rabbitmq/rabbit.log on app01 will you see errors such as

=ERROR REPORT==== 6-May-2010::11:12:03 ===
** Connection attempt from disallowed node rabbit@app02 **

Assuming you see both nodes app01 and app02 in the nodes list, you can proceed with the same steps on app03. At this point, you should see something similar to this if you run 'rabbitmqctl status' on each node:

# rabbitmqctl status
Status of node 'rabbit@app03' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA CXC 138 12","4.4.10"},
                        {os_mon,"CPO CXC 138 46","2.2.2"},
                        {sasl,"SASL CXC 138 11","2.1.6"},
                        {stdlib,"ERTS CXC 138 10","1.16.2"},
                        {kernel,"ERTS CXC 138 10","2.13.2"}]},
{nodes,['rabbit@app03','rabbit@app02','rabbit@app01']},
{running_nodes,['rabbit@app01','rabbit@app02','rabbit@app03']}]
...done.
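Eyeballing the nodes and running_nodes lists gets tedious with more nodes, so here is a sketch that pulls both lists out of the status text and checks that every expected node appears in each. It is written against the 1.7.x output format shown above and will need adjusting if that format changes; the function names are hypothetical.

```python
import re


def cluster_nodes(status_output: str):
    """Return (nodes, running_nodes) as sets of node names from `rabbitmqctl status` text."""
    def atoms(key):
        match = re.search(r"\{" + key + r",\[(.*?)\]\}", status_output, re.S)
        # Node names are single-quoted Erlang atoms, e.g. 'rabbit@app01'.
        return set(re.findall(r"'([^']+)'", match.group(1))) if match else set()
    return atoms("nodes"), atoms("running_nodes")


def cluster_is_healthy(status_output: str, expected) -> bool:
    """True if every expected node is both a cluster member and currently running."""
    nodes, running = cluster_nodes(status_output)
    return set(expected) <= nodes and set(expected) <= running
```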

Now you can shut down and restart 2 of these 3 nodes, and the cluster state will be preserved. I am still experimenting with what happens if all 3 nodes go down. I tried it and I got a timeout when trying to start up rabbitmq-server again on one of the nodes. I could only get it to start by removing the database directory /var/lib/rabbitmq/mnesia -- but this kinda defeats the purpose of persisting the state of the cluster on disk.

Now for the 2nd scenario, where the servers you want to cluster are not part of an internal DNS zone (for example, if they're EC2 instances).

Scenario B

2 Ubuntu 9.04 servers running as EC2 instances (ec2app01 and ec2app02)
no internal DNS

Let me cut to the chase and say that the proper solution to this scenario is to set up an internal DNS zone. The fact is that rabbitmq-server nodes address each other by host name, so if you choose not to use DNS, and instead use entries in /etc/hosts or even IP addresses, you need to resort to ugly hacks: editing the rabbitmq-server, rabbitmq-multi and rabbitmqctl scripts in /usr/lib/rabbitmq/bin and specifying -name instead of -sname everywhere that option appears in those scripts. Even then things might not work, and things will break again when you upgrade rabbitmq-server. See this discussion and this blog post for more details on this issue. In a nutshell, the RabbitMQ clustering mechanism expects the host name of each node to be resolvable via DNS on all other nodes in the cluster.

So my solution was to deploy an internal DNS server running bind9 in EC2, which is something you'll need to do sooner or later if you run more than a handful of EC2 instances.
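Before clustering, it helps to confirm on each node that the host part of every other node name resolves. This sketch uses the system resolver by default (which may also consult /etc/hosts, depending on nsswitch configuration), and accepts a stub resolver for testing. Passing this check does not guarantee Erlang's own name handling will succeed; unresolvable_nodes is a hypothetical name.

```python
import socket


def unresolvable_nodes(node_names, resolve=socket.gethostbyname):
    """Return node names (e.g. 'rabbit@ec2app01') whose host part fails to resolve."""
    failures = []
    for node in node_names:
        host = node.split("@", 1)[-1]  # 'rabbit@ec2app01' -> 'ec2app01'
        try:
            resolve(host)
        except (socket.gaierror, socket.herror):
            failures.append(node)
    return failures
```

Run it on every node with the full list of cluster members; an empty result on all of them is a good sign before you run rabbitmqctl cluster.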

Future work

This post is strictly about RabbitMQ clustering setup. There are many other facets of deploying a highly-available RabbitMQ system.

For example, I am also experimenting with putting an HAProxy load balancer in front of my RabbitMQ servers. I want all clients to communicate with a single entry point in my RabbitMQ cluster, and I want that entry point to load balance across all servers in the cluster. I'll report on that setup later.

Another point that I need to figure out is how to deal with failures at the individual node level, given that RabbitMQ queues are not distributed across all nodes in a cluster, but instead they stay on the node where they were initially created. See this discussion for more details on this issue.

If you have deployed a RabbitMQ cluster and worked through some of these issues already, I'd love to hear about your experiences, so please leave a comment.

Sunday, May 02, 2010

Getting wireless NIC to work on Thinkpad T410s with Ubuntu Lucid

I tweeted this before, but a blog post is less ephemeral and serves better as a longer-term memory for myself and maybe for others.

The Thinkpad T410s uses the Realtek rtl8192se wireless chip, which doesn't play well with Ubuntu Lucid, even with the general release from last week. To get it to work, I had to:

1) Download the Linux driver for rtl8192se from this Realtek page.
2) Untar the package, run make and 'make install' as root.
3) Reboot.

Unfortunately, you'll need to do this every time you upgrade the kernel, unless support for this chipset gets better.
