Monday, January 28, 2008

How modern Web design is conducted

Via the 40. (with egg) blog, a time breakdown on how Web design is conducted these days. Hmmm...maybe there's room for allocating some slice of that pie to testing, using Firebug for example. UPDATE: I meant debugging with Firebug and testing with twill and Selenium of course :-)

Friday, January 25, 2008

Checklist automation and testing

This is a follow-up to my previous post on writing automated tests for sysadmin-related checklists. That post seems to have struck a chord, judging by the comments it generated.

Here's the scenario I'm thinking about: you need to deploy a standardized set of packages and configurations to a bunch of servers. You put together a checklist detailing the steps you need to take on each server -- kickstart the box, run some post-install scripts, do some configuration customization, etc. At this point, you're already ahead of the game, and you're not relying solely on human memory. However, if you rely on a human being going manually through each step of the checklist on each server, you're in for some surprises in the guise of missed steps. The answer of course is to automate as many steps as you can, ideally all of them.

Now we're getting to the main point of my post: assuming you did automate all the steps of the checklist, and you ran your scripts on each server, do you REALLY have that warm and fuzzy feeling that everything is OK? You don't, unless you also have a comprehensive automated test suite that runs on every server and actually checks that stuff happened the way you intended.

Here are some concrete examples of stuff I'm verifying after the deployment of a certain type of our servers (running Apache/Tomcat).

OS-specific tests

* does the sudoers file contain certain users that need those rights
* is the sshd process set to start at boot time
* is the ClientAliveInterval variable set correctly in /etc/ssh/sshd_config
* are certain NFS mount points defined in /etc/fstab, and do they actually exist on the server
* is sendmail set to start at boot time, and running
* are iptables and/or SELinux configured the way they should be
* ....and more
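
As a rough sketch of how two of the checks above (the sshd setting and the fstab mount points) might be written, the helpers below parse the text of the two files. The function names are made up for illustration, and the parsing assumes the common whitespace-separated layout of both files.

```python
def sshd_option(config_text, name):
    """Return the value of the first uncommented `name` directive, or None."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        parts = line.split(None, 1)
        # sshd_config keywords are case-insensitive; the first match wins
        if parts[0].lower() == name.lower():
            return parts[1].strip() if len(parts) > 1 else ''
    return None


def fstab_mount_points(fstab_text):
    """Return the set of mount points (second field) found in fstab text."""
    points = set()
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points
```

Inside a nose test, one could read the real file and assert on the result, for example assert sshd_option(text, 'ClientAliveInterval') == '300' (the value 300 is hypothetical), or check that an expected set of mount points is a subset of fstab_mount_points(text) and that os.path.isdir() is true for each of them.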

Apache-specific tests

* is httpd set to start at boot time
* do the virtual host configuration files in /etc/httpd/conf.d contain the expected information
* has mod_jk been installed and configured properly (mod_jk provides the glue between Apache and Tomcat)
* is SSL configured properly
* does the /etc/logrotate.d/httpd configuration file contain the correct options (for example keep the logs for N days and compress them)
* etc.

Tomcat-specific tests

* has a specific version of Java been installed
* has Tomcat been installed in the correct directory, with the correct permissions
* has Tomcat been set to start at boot time
* etc.
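
The Java version check above could look something like the sketch below. It assumes the usual banner format printed by java -version (a line containing version "..."); the function names are invented for this example.

```python
import re
import subprocess


def parse_java_version(output):
    """Pull the quoted version string out of `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def java_version(java_binary='java'):
    """Run `java -version` and return the reported version string."""
    # java -version writes its banner to stderr, so merge stderr into stdout
    proc = subprocess.Popen([java_binary, '-version'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    output = proc.communicate()[0]
    return parse_java_version(output)
```

A test would then assert that java_version() equals whatever version the checklist calls for.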

Media-specific tests

* has ImageMagick been installed in the correct location
* does ImageMagick support certain file formats (JPG, PNG, etc)
* can ImageMagick actually process certain types of files (JPG, PNG, etc.)

Some of these tests could be run from a monitoring system (one of the commenters on my previous post mentioned that their sysadmins use Zimbrix; an Open Source alternative is Nagios, and there are many others.) However, a monitoring system typically doesn't go into the level of detail I described, especially when it comes to configuration files and other more advanced customizations. That's why I think it's important to use a real test framework and a real scripting language for this type of automated test.

In my case, each type of test resides in its own file -- for example test_os.py, test_apache.py, test_tomcat.py, test_media.py. I run the tests using the nose test framework.

Here are some examples of small test functions. I'm using sets for making sure that expected lines are in certain files or in the output of certain commands, since most of the time I don't care about the order in which those lines appear.

From test_os.py:

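import popen2  # Python 2 only; subprocess replaces it on Python 3

def test_sshd_on():
    # popen2.popen2 returns (child_stdout, child_stdin)
    stdout, stdin = popen2.popen2('chkconfig sshd --list')
    lines = stdout.readlines()
    assert "sshd \t0:off\t1:off\t2:on\t3:on\t4:on\t5:on\t6:off\n" in lines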
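
Comparing against one exact output string is sensitive to spacing. As an alternative sketch (not what the test above does), the chkconfig line can be split into fields so the test only looks at runlevels. The helper names are hypothetical, and the format assumed is the service name followed by level:state pairs.

```python
def parse_chkconfig(line):
    """Turn 'sshd 0:off 1:off 2:on ...' into ('sshd', {0: False, 2: True, ...})."""
    fields = line.split()
    service = fields[0]
    levels = {}
    for field in fields[1:]:
        level, state = field.split(':')
        levels[int(level)] = (state == 'on')
    return service, levels


def enabled_runlevels(line):
    """Return the sorted list of runlevels in which the service is on."""
    service, levels = parse_chkconfig(line)
    return sorted(level for level, on in levels.items() if on)
```

A test could then assert that enabled_runlevels(line) == [2, 3, 4, 5] for the sshd line, regardless of how many spaces or tabs separate the columns.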

From test_apache.py:

def test_logrotate_httpd():
    # Order doesn't matter, so compare the lines as sets
    lines = open('/etc/logrotate.d/httpd').readlines()
    lines = set(lines)
    expected = set([
        " rotate 100\n",
        " compress\n",
    ])
    assert lines.issuperset(expected)

From test_tomcat.py:

import os

def test_homedir():
    # TARGET_UID and TARGET_GID are assumed to be constants defined in this module
    target_dir = '/opt/target'
    assert os.path.isdir(target_dir)
    st = os.stat(target_dir)
    assert st.st_uid == TARGET_UID, 'User wrong for %s' % target_dir
    assert st.st_gid == TARGET_GID, 'Group wrong for %s' % target_dir

From test_media.py:

import popen2  # Python 2 only; subprocess replaces it on Python 3

def test_ImageMagick_listformat():
    stdout, stdin = popen2.popen2('/usr/local/bin/identify --list format')
    lines = stdout.readlines()
    lines = set(lines)
    # Lines copied from the expected 'identify --list format' output
    expected = set([
        " JNG* PNG rw- JPEG Network Graphics\n",
        " JPEG* JPEG rw- Joint Photographic Experts Group JFIF format (62)\n",
        " JPG* JPEG rw- Joint Photographic Experts Group JFIF format\n",
        " PJPEG* JPEG rw- Progessive Joint Photographic Experts Group JFIF\n",
        " MNG* PNG rw+ Multiple-image Network Graphics (libpng 1.2.10)\n",
        " PNG* PNG rw- Portable Network Graphics (libpng 1.2.10)\n",
        " PNG24* PNG rw- 24-bit RGB PNG, opaque only (zlib 1.2.3)\n",
        " PNG32* PNG rw- 32-bit RGBA PNG, semitransparency OK\n",
        " PNG8* PNG rw- 8-bit indexed PNG, binary transparency only\n",
    ])
    assert lines.issuperset(expected)
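
When a set comparison like the ones above fails, the bare assert doesn't say which line was missing. A small helper along these lines (hypothetical names, and it deliberately ignores leading and trailing whitespace) could make failures easier to read:

```python
def missing_lines(expected, actual_lines):
    """Return the expected lines that do not appear in actual_lines.

    Lines are compared after stripping surrounding whitespace, so
    indentation and trailing newlines do not matter.
    """
    actual = set(line.strip() for line in actual_lines)
    return sorted(line.strip() for line in expected
                  if line.strip() not in actual)


def assert_contains_lines(expected, actual_lines):
    """Fail with a message that lists every missing line."""
    missing = missing_lines(expected, actual_lines)
    assert not missing, 'missing lines: %r' % (missing,)
```

For example, assert_contains_lines(['rotate 100', 'compress'], open('/etc/logrotate.d/httpd').readlines()) would name the absent option in the failure message.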

As always, comments and suggestions are very welcome! Also see Titus's post for some sysadmin-related automated tests that he's running on a regular basis.

Tuesday, January 22, 2008

Stay away from the AT&T Tilt phone

Why? Because it's extremely fragile. I got one a couple of weeks ago (sponsored by my company, otherwise I wouldn't have shelled out $399) and after just 2 days I found the screen cracked in 2 places. It's true I carried it in my jacket's pocket while driving, and it probably jammed against my leg or something, but I've done that with other phones and didn't have this issue.

Of course, calls to the store, AT&T warranty and the manufacturer were all fruitless. The store won't accept it back because it's not in 'like new' state, and AT&T's warranty doesn't cover cracks in the screen. My best option at this point is to send it to the manufacturer for repairs, which will run me another $190. For now, I'm just using it as is, but I just want to tell whoever bothers to read this that I'm not happy -- not with the Tilt, and not with AT&T's customer service. There. Take that, AT&T.

Joel on checklists

Another entertaining blog post from Joel Spolsky, this time on some issues they had with servers and networking equipment hosted at a data center in Manhattan. It all comes down to a network switch which had its ports configured to automatically negotiate their speed. As a result, one port was misbehaving and brought their whole web site down. The conclusion reached by Joel and his sysadmin team: we need documentation, we need checklists. I concur, but as I said in a recent post, this is still not enough. Human beings are notoriously prone to skipping tests on checklists. What Joel and his team really need are AUTOMATED TESTS that run periodically and check every single thing on those checklists. You can easily automate the step which verifies that a port on the switch is set to 100 Mbps or 1 Gbps; you can either use SNMP, or some expect-like script.
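
As one possible sketch of the SNMP route, the code below shells out to the Net-SNMP snmpget command and reads IF-MIB::ifSpeed for an interface. It assumes snmpget is installed, that the switch answers SNMP v2c, and that the output has the usual 'OID = Gauge32: value' shape. The function names are made up, and ifSpeed reports the port's current bandwidth in bits per second, which is only a proxy for how the port is configured.

```python
import re
import subprocess

# 100 Mbps and 1 Gbps, expressed in bits per second as ifSpeed reports them
ALLOWED_SPEEDS = (100 * 10**6, 10**9)


def parse_if_speed(snmpget_output):
    """Extract the integer from 'IF-MIB::ifSpeed.1 = Gauge32: 100000000'."""
    match = re.search(r'=\s*Gauge32:\s*(\d+)', snmpget_output)
    return int(match.group(1)) if match else None


def port_speed(host, community, if_index):
    """Query ifSpeed for one interface index via the snmpget command."""
    cmd = ['snmpget', '-v2c', '-c', community, host,
           'IF-MIB::ifSpeed.%d' % if_index]
    output = subprocess.check_output(cmd, universal_newlines=True)
    return parse_if_speed(output)
```

A test for a given switch port would then assert that port_speed(host, community, if_index) is in ALLOWED_SPEEDS, with the host name, community string and interface index filled in for your own equipment.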

In fact, at my own company I'm developing a pretty extensive automated test suite (written in Python of course, and using nose) that verifies all the steps we go through whenever we deploy a server or a network device. It's very satisfying to see those dots and have a total of N passed tests and 0 failed tests, with N increasing daily. Automated testing of sysadmin tasks is a little-explored area, so there's lots of potential for cool stuff to happen. If you're doing something similar and have ideas to share, please leave a comment.

Wednesday, January 16, 2008

MySQL has been assimilated

...by Sun, for $1 billion. Bummer. I shudder whenever I see companies at the forefront of Open Source being gobbled up by giants such as Sun. I still don't know what Sun's Open Source strategy is -- they've been going back and forth with their support for Linux, and they seem to be pushing OpenSolaris pretty heavily these days, although I personally don't know anybody in the OSS community who is using OpenSolaris. UPDATE: Tim O'Reilly thinks this is a great fit for both Sun and MySQL, and says that "Sun has staked its future on open source, releasing its formerly proprietary crown jewels, including Solaris, Java, and the Ultra-Sparc processor design." Hmmm...maybe, but Sun has always struck me as being bent on world domination, just as bad as Microsoft.

Update 01/18/08: Here's a really good recap on the MySQL acquisition at InfoQ. Most people express a warm fuzzy feeling about this whole thing. I hope my apprehensions are unfounded.

In other acquisition-related news, Oracle agreed to buy BEA (the makers of WebLogic) for a paltry $7.85 billion.

Friday, January 11, 2008

Looking to hire a MySQL DBA

If you're based in the Los Angeles area and are looking for a job as a MySQL DBA, send me an email at grig at gheorghiu dot net. RIS Technology, the Web hosting company I'm working for, is looking for somebody to administer MySQL databases -- things such as database design, replication, data migration, SQL query analysis and optimization. The position can be either contract-based or full time. Experience with PostgreSQL is a plus. Experience with Python is a huge plus :-)

Wednesday, January 02, 2008

Testing Tutorial accepted at PyCon08

"Practical Applications of Agile (Web) Testing Tools", the tutorial that Titus and I proposed to the PyCon08 organizers, has been accepted -- cool! Should be a lot of fun. The list of accepted tutorials looks really good.

Here's the summary of our tutorial:

Practical Applications of Agile (Web) Testing Tools
---------------------------------------------------

Have Web site? Need testing? Bring your tired (code), huddled (unit
tests), and cranky AJAX to us; we'll help you come up with tactics,
techniques, and infrastructure to help solve your problems. This
includes integration with a unit test runner (nose); use of coverage
analysis (figleaf); straight HTTP driver Web testing (twill); Web
recording, examination, and playback (scotch); Selenium and Selenium
RC test script development; and continuous integration (buildbot).
We will focus on techniques for automating your Web testing for quick
turnaround, i.e. "agile" test automation.

If you have an application that needs automated tests, and if you're planning to attend our tutorial, drop us a line or leave a comment here with some details about your application.

10 technologies that will change your future

...according to an article in the Sydney Morning Herald (found via the O'Reilly Radar). Personally, I just want a chumby.

What's with the rants?

Some stars may have been aligned in a particularly nasty way on December 31, 2007, which might explain some rants that were published on various blogs. One of them at least has the quality of being humorous in a scatological sort of way: 'Rails is a Ghetto' by Zed Shaw, the creator of Mongrel -- although I'm fairly sure it doesn't seem humorous to the people he names in his post; and BTW, make sure there are no kids around when you read that post. I'd like to meet Zed one day, he's an interesting character who sure wears his heart on his sleeve.

I didn't find much humor though in James Bennett's rant against a blog post written by Noah Gift. I did find many gratuitous insults and uncalled-for name-calling. As somebody said already -- chill, James! The comments on James's post are also revealing in their variety. Good thing the Python community also contains people like Ian Bicking who are trying to inject some civility and sanity into this.

For the record, I agree with Noah that documentation and marketing are two extremely important driving forces in the adoption of any framework. RoR's success is in no small part due to documentation, flashy screencasts and tireless marketing. And I also agree that Zope and its descendants would be so much better off with more marketing.
