Tuesday, August 26, 2008

RTFL

No, this is not a misspelling of ROTFL, but rather a variant of RTFM. It stands for Read The F...riendly Log. It's a very basic troubleshooting technique, yet it is surprisingly often overlooked. I use it all the time, and I just want to draw attention to it in case you find yourself stumped by a problem that seems mysterious.

Here are some recent examples from my work.

Apache wouldn't start properly

A 'ps -def | grep http' would show only the main httpd process, with no worker processes. The Apache error log showed this line:

Digest: generating secret for digest authentication

A Google search for this line turned up this article:

http://www.raptorized.com/2006/08/11/apache-hangs-on-digest-secret-generation/

It turns out the randomness/entropy on that box had been exhausted. I grabbed the rng-tools tar.gz from SourceForge, compiled and installed it, then ran

rngd -r /dev/urandom

...and Apache started its worker processes instantly.
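If you want to confirm a diagnosis like this before installing anything, Linux exposes the size of the kernel entropy pool in /proc/sys/kernel/random/entropy_avail. Here is a minimal sketch that reads it. The threshold of 200 bits is a hypothetical value picked for illustration, not something from the article above, and on newer kernels the number may not vary much, so treat it as a rough hint only.

```python
from pathlib import Path

# Kernel-provided file holding the current entropy estimate, in bits (Linux only).
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text: str) -> int:
    """Parse the contents of entropy_avail, which is a single integer."""
    return int(text.strip())


def entropy_is_low(bits: int, threshold: int = 200) -> bool:
    """Return True when the pool is below the threshold.

    The default threshold is a hypothetical cut-off chosen for illustration.
    """
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_PATH.exists():
        bits = parse_entropy(ENTROPY_PATH.read_text())
        state = "low" if entropy_is_low(bits) else "ok"
        print(f"entropy_avail = {bits} bits ({state})")
    else:
        print("entropy_avail not found; this check only works on Linux")
```

A very small number here, together with a process that hangs while generating a secret, would point in the same direction as the log line did.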

Cannot create InnoDB tables in MySQL

Here, all it took was to read the MySQL error log in /var/lib/mysql. It's very friendly indeed, and tells you exactly what to do!

InnoDB: Error: data file ./ibdata1 is of a different size
InnoDB: 2176 pages (rounded down to MB)
InnoDB: than specified in the .cnf file 128000 pages!
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!
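The page counts in that message are easier to act on once you turn them into megabytes. A small sketch of the arithmetic, assuming the default InnoDB page size of 16 KB (if your server uses a different page size, the numbers change accordingly):

```python
# Assumption: InnoDB is using its default page size of 16 KB.
PAGE_SIZE_KB = 16


def pages_to_mb(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> int:
    """Convert an InnoDB page count to whole megabytes (rounded down)."""
    return pages * page_size_kb // 1024


# Page counts copied from the error log above.
on_disk_mb = pages_to_mb(2176)
configured_mb = pages_to_mb(128000)

print(f"ibdata1 on disk:        {on_disk_mb} MB")
print(f"size expected by .cnf:  {configured_mb} MB")
```

Under that assumption the file on disk is 34 MB while the configuration asks for 2000 MB, so the innodb_data_file_path setting in my.cnf is the place to look, just as the log suggests.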

Windows-based Web sites are displaying errors

Many times I've seen Windows/IIS-based Web sites displaying cryptic errors such as:

Server Error in '/' Application.
Runtime Error

The IIS logs contain much less useful information than the Apache logs. However, the Event Viewer is a good source of information. In a recent case, inspecting the Event Viewer told us that the account used to connect from the Web server to the DB server had expired, so re-enabling it was all it took to fix the issue.

In conclusion -- RTFL and Google it! You'll be surprised how large a percentage of issues you can solve this way.
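For the "read the log, then search for the line" routine, here is a rough sketch of a helper that grabs the tail of a log file and picks out the most recent line that looks like an error. The function names and the keyword list are hypothetical, made up for this illustration; real logs will need their own patterns.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def last_matching(lines, keywords=("error", "fatal", "could not")):
    """Return the most recent line containing any keyword, or None.

    The keyword list is an illustrative guess, matched case-insensitively.
    """
    for line in reversed(lines):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            return line
    return None


if __name__ == "__main__":
    import sys

    # Usage: python rtfl.py /path/to/some.log
    if len(sys.argv) > 1:
        hit = last_matching(tail(sys.argv[1], n=50))
        print(hit if hit is not None else "no obvious error line found")
```

Whatever line it prints is a reasonable first thing to paste into a search engine.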

3 comments:

fumanchu said...

And the corollary--design new systems to log enough information to be nicely googlable. ;)

Michele Smith said...

Good reminder... I use them both. I also attach copies of them to my defect reports.

Srikanth said...

True, I learned this the hard way. But RTFL certainly enforces the habit. Thanks for the wonderful blog posts.
