Tuesday, June 21, 2005

Keyword-based logging with the py library

I've been experimenting lately with the logging functionality recently introduced in the py library. Most of it is still at an early, experimental stage, but the main ideas are pretty well defined. I want to review some of those ideas here and hopefully get some feedback.

Holger Krekel, the main py library developer, introduced the idea of keyword-based logging and of logging producers and consumers. A log producer is an object used by the application code to send messages to various log consumers. The consumers can be STDOUT, STDERR, log files, syslog, the Windows Event Log, etc. When you create a log producer, you define a set of keywords that are then used to both route the logging messages to consumers, and to prefix those messages.

Here's a quick example of code that works with the current svn version of the py library (which you can get from http://codespeak.net/svn/py/dist). I saved the following code in a file called logmodule.py:
# logmodule.py

import py

log = py.log.Producer("logmodule")

def runcmd(cmd):
    log.cmd(cmd)
    # real code follows

def post2db(data):
    log.post2db(data)
    # real code follows

if __name__ == "__main__":
    runcmd("Running command CMD1")
    post2db("PASS")
I started by instantiating a log object of class py.log.Producer with the keyword "logmodule". Since I didn't associate any consumer with the log producer, a default consumer will be used, which prints all messages to STDOUT.

The log producer object does not have any pre-defined methods associated with it. Instead, you can invoke arbitrarily-named methods. When accessing a brand-new method (such as log.cmd or log.post2db), another producer object is returned on the fly, with the name of the method added as a keyword to the existing set of keywords. When you call the returned object as a function, with the log message as the argument, the keywords are looked up to see if any log consumer is associated with them. If no consumer is found, the default one is used and the message is printed to STDOUT.
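The dynamic-attribute mechanism is easy to picture with a small, self-contained sketch. The MiniProducer class below is my own illustration of the idea, not the actual py.log implementation:

```python
# Illustrative sketch of the producer idea -- not the real py.log code.
class MiniProducer:
    def __init__(self, keywords):
        self.keywords = tuple(keywords.split())

    def __getattr__(self, name):
        # accessing log.cmd returns a new producer with 'cmd' added to the keywords
        return MiniProducer(" ".join(self.keywords + (name,)))

    def __call__(self, msg):
        # default consumer behavior: prefix with the keywords, print to STDOUT
        print("[%s] %s" % (":".join(self.keywords), msg))

log = MiniProducer("logmodule")
log.cmd("Running command CMD1")   # prints: [logmodule:cmd] Running command CMD1
```

The real py.log producer also handles consumer lookup and registration, but the on-the-fly creation of keyword-extended producers works essentially like this __getattr__ trick.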

Running the code above produces:
[logmodule:cmd] Running command CMD1
[logmodule:post2db] PASS
What happened behind the scenes when invoking log.cmd("Running command CMD1") was that a log producer object called log.cmd was created with ('logmodule', 'cmd') as the set of keywords. When calling the object as a function with the message as the argument, the message was by default printed to STDOUT, since no consumer was found to be associated with those keywords.
The message was also prefixed with the keywords, joined by ":".

Now let's assume we don't want any messages to be logged when using the logmodule module directly. To achieve this, we associate a null consumer with the "logmodule" keyword by adding the following line somewhere at the top of our module:
py.log.setconsumer("logmodule", None)
(NOTE: the name of the setconsumer function might change, but the idea will be the same, namely that any keyword or set of keywords can be associated with log consumers).

Now, when we invoke log.cmd("Running command CMD1"), the keyword matching logic will first look for a consumer associated with ('logmodule', 'cmd'), since this is the complete set of keywords. If it doesn't find it, it looks for a consumer associated with 'logmodule'. In our case, it will find the consumer we designated (None), so it will use that consumer and not log the message anywhere.
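The fallback behavior can be sketched as follows. This is hypothetical code illustrating the matching idea, not py.log's actual internals:

```python
# Hypothetical illustration of the keyword-matching fallback.
def find_consumer(keywords, registry, default):
    # try the full keyword tuple first, then shorter and shorter prefixes
    for i in range(len(keywords), 0, -1):
        if keywords[:i] in registry:
            return registry[keywords[:i]]
    return default

registry = {("logmodule",): None}   # None acts as the null consumer
match = find_consumer(("logmodule", "cmd"), registry, default="stdout-consumer")
# no entry for ('logmodule', 'cmd'), so the ('logmodule',) consumer (None) wins
```

An exact match on the full keyword tuple always takes precedence over a shorter prefix, which is what makes per-keyword overrides like the "logmodule post2db" example below possible.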

The simple mechanism of associating log consumers with keywords offers a tremendous amount of flexibility in configuring logging for an application. Keywords can be associated with consumers at run time, via a configuration file or via command-line arguments, and the application code doesn't need to know where logging gets routed. As an example, let's assume that we want to use the two top-level functions runcmd and post2db from the logmodule module, and at the same time we want to log messages to other consumers. Here is some code I saved in a file called use_logmodule.py:
# use_logmodule.py

import py
from logmodule import runcmd, post2db

py.log.setconsumer("logmodule", py.log.STDERR)
py.log.setconsumer("logmodule post2db", None)

runcmd("Running command CMD2")
post2db("FAIL")
My goal was to print all messages to STDERR by default, and not print any messages associated with the "post2db" keyword (in a real life situation, those messages could be associated with some database-management code for example).

To achieve my goal, I associated a consumer that uses STDERR with the keyword "logmodule" and a "/dev/null"-type consumer (i.e. None) with the keywords "logmodule post2db". Now when I run the above code, I get the following message printed to STDERR:
[logmodule:cmd] Running command CMD2
I could just as easily have routed the logging messages to a log file by associating a consumer of type py.log.Path (currently there are three types of pre-defined consumers offered by the py.log package: py.log.STDOUT, py.log.STDERR and py.log.Path; in the future, they will be augmented with other types of consumers such as syslog, Windows Event Logs, etc.). Here's how to log all "post2db"-related messages to a file:
py.log.setconsumer("logmodule post2db", py.log.Path("/tmp/logmodule.out"))
When running the code in use_logmodule.py, the following line will be printed to /tmp/logmodule.out:
[logmodule:post2db] FAIL
A question you might have at this point is how all this relates to a more traditional use of logging, based on severity levels such as debug, info, warn, error and critical. Let's assume we want our application code to issue calls to log.debug, log.info and log.error, while being able at the same time to direct these various levels of severity to different logging mechanisms. We could for example log debug messages to a log file, informational messages to the console (STDOUT or STDERR), and error messages to both a log file and the console. Here's one way to achieve this via the py.log keyword-based approach. I saved the following code in a file called config.py:
# config.py

import sys
import py

log = py.log.Producer("")
log.debug = py.log.Producer("logfile")
log.info = py.log.Producer("console")
log.error = py.log.Producer("console logfile")

logfile = "/tmp/myapp.out"
py.log.setconsumer("logfile", py.log.Path(logfile))
py.log.setconsumer("console", py.log.STDOUT)

l = open(logfile, 'a', 1)

def console_logfile_logger(msg):
    print >>sys.stdout, str(msg)
    print >>l, str(msg)

py.log.setconsumer("console logfile", console_logfile_logger)
Most of this stuff should look familiar by now. I instantiated an object called log of class Producer, with no keywords associated with it. This object will serve as the default producer object, so the application will be able to call log(some_msg) and have some_msg printed to STDOUT by default. Next, I created the debug, info and error attributes for the log object. This will enable me to call log.debug(msg), log.info(msg) and log.error(msg) in my application code, without necessarily caring where those messages get routed. As it stands right now in the configuration file, the log.debug object is declared as a log.Producer object associated with the keyword "logfile". Similarly, the log.info object is associated with the keyword "console" and the log.error object is associated with both keywords "console logfile".

The next step is defining consumers for the various keywords. For "logfile" I set a consumer of type Path, pointing to the file /tmp/myapp.out. For "console" I set the STDOUT consumer, and for "console logfile" I set a custom consumer that prints its argument (the message to be logged) to STDOUT and also appends it to the log file. As an aside, I want to note that consumers are simply functions that take a message as an argument and manage the message as they see fit (if you look in the testing code for the py.log module in the py library, you'll see consumers that simply append the message to a list for example).
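Since a consumer is just a callable, the console_logfile_logger above can be generalized. Here is a sketch of a "tee" consumer factory; make_tee_consumer is my own helper for illustration, not part of py.log:

```python
import sys

def make_tee_consumer(*streams):
    # build a consumer that writes each message, newline-terminated,
    # to every given stream (e.g. sys.stdout plus an open log file)
    def consume(msg):
        for stream in streams:
            stream.write(str(msg) + "\n")
    return consume

# usage sketch (assuming the py.log setconsumer call described above):
#   py.log.setconsumer("console logfile",
#                      make_tee_consumer(sys.stdout, open("/tmp/myapp.out", "a", 1)))
```

Anything with the right call signature works: a function, a bound method, or even list.append when you just want to collect messages, as the py.log test code does.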

Here is an example of application code that uses the above config.py file:
# appcode.py

import config

log = config.log

log.debug("DEBUG MESSAGE")
log.info("INFO MESSAGE")
log.warn("WARN MESSAGE")
log.error("ERROR MESSAGE")
Running appcode.py produces:
[console] INFO MESSAGE
[warn] WARN MESSAGE
[console:logfile] ERROR MESSAGE
At the same time, the file /tmp/myapp.out contains:
[logfile] DEBUG MESSAGE
[console:logfile] ERROR MESSAGE
Note that we didn't declare log.warn anywhere. It was dynamically transformed by py.log into a producer object with keyword "warn" and printed by default to STDOUT, since the "warn" keyword didn't have any consumer associated with it.

I didn't want to complicate things too much in the code above, but in my opinion it would be better to have 'debug', 'info' and 'error' as part of the keywords that also prefix each message, so that you can see at a glance what kind of severity level the message has. To achieve that, you would change the declarations for log.debug, log.info and log.error in config.py to:
log.debug = py.log.Producer("logfile").debug
log.info = py.log.Producer("console").info
log.error = py.log.Producer("console logfile").error
or alternatively to:
log.debug = py.log.Producer("logfile debug")
log.info = py.log.Producer("console info")
log.error = py.log.Producer("console logfile error")
Now if you run appcode.py, you'll see this printed on the screen:
[console:info] INFO MESSAGE
[console:logfile:error] ERROR MESSAGE
This also makes it easier to turn off debugging for example. All you need to do is to associate the null consumer with the keywords "logfile debug":
py.log.setconsumer("logfile debug", None)
Note that the consumer associated with the single keyword "logfile" is still the Path consumer that prints to /tmp/myapp.out, so non-debug messages would still get routed to that file.

In conclusion, I think that the keyword-based approach works really well for the logging needs of any application. I emphasize once more that the py.log code is still in its infancy and as such is prone to changes (including naming changes), so I wouldn't rely on py.log just yet for production code. But the main idea of associating log producers with log consumers via keywords should stay the same, and this, I think, is the novelty of this approach.

One enhancement that I've discussed with Holger is to have a way to specify that you want multiple keywords to be routed each to a different consumer. So for example, if you declared
log.error = py.log.Producer("console logfile")
a call to log.error(some_msg) would try to match the "console" keyword individually to a consumer, and the "logfile" keyword to another consumer.

Right now, the keyword matching logic only looks for a single consumer that matches either ("console", "logfile") or just ("console",), so you need to declare a custom consumer such as the console_logfile_logger function I described above.

In the multiple-keyword approach, you wouldn't need to do that, instead you would just associate "logfile" with a consumer of type Path, "console" with a consumer of type STDOUT/STDERR, and both consumers would then receive the message when log.error(some_msg) was called.
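That proposed behavior could look roughly like this. This is a sketch of the idea under discussion, not an existing py.log feature:

```python
# Sketch of the proposed multiple-keyword routing (not implemented in py.log).
def fanout(keywords, registry):
    # deliver each message to the consumer of every individual keyword
    def consume(msg):
        for kw in keywords:
            consumer = registry.get(kw)
            if consumer is not None:
                consumer(msg)
    return consume

console, logfile = [], []
registry = {"console": console.append, "logfile": logfile.append}
fanout(("console", "logfile"), registry)("ERROR MESSAGE")
# both the 'console' and the 'logfile' consumers receive the message
```

With this scheme, declaring log.error with the keywords "console logfile" would automatically fan out to the two existing consumers, with no custom glue function required.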

If you think you can use the keyword-based approach to logging in your own application code, or if you have a use case that you would like to see implemented via py.log, or if you have any suggestions at all, please leave a comment here or send email to py-dev at codespeak dot net.

Wednesday, June 08, 2005

Handling the 'Path' Windows registry value correctly

A recent recipe posted to the Python Cookbook site shows how to manipulate environment variables in Windows by modifying the registry. I've been using something similar in my code, mainly for adding directories on the fly to the 'Path' environment variable by modifying the \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment\Path registry value.

Here are two simple functions for getting and setting a registry value of type string (REG_SZ):
import _winreg

def get_registry_value(reg, key, value_name):
    k = _winreg.OpenKey(reg, key)
    value = _winreg.QueryValueEx(k, value_name)[0]
    return value

def set_registry_value(reg, key, value_name, value):
    k = _winreg.OpenKey(reg, key, 0, _winreg.KEY_WRITE)
    value_type = _winreg.REG_SZ
    _winreg.SetValueEx(k, value_name, 0, value_type, value)
To add a directory dir to the Path registry value, you would do:
reg = _winreg.ConnectRegistry(None, _winreg.HKEY_LOCAL_MACHINE)
key = r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"
path = get_registry_value(reg, key, "Path")
path += ";" + dir
set_registry_value(reg, key, "Path", path)
However, there was a problem with this approach. If you look at a typical Path registry value (using for example regedit), you'll see that it contains directories such as %SystemRoot% and %SystemRoot%\system32, where %SystemRoot% is a variable which has the Windows system directory as its value. When I modified the Path value, I mistakenly set its type to REG_SZ, and as a consequence the command prompt interpreter did not replace the %SystemRoot% variable with its value. The unfortunate side effect of all this was that typical Windows binaries residing in the system32 directory, such as ping, ipconfig, etc., were no longer found by the command prompt.

As a workaround, I replaced %SystemRoot% with os.environ['SYSTEMROOT'] in the path variable, like this:

import os
import re

dirs = path.split(';')
try:
    systemroot = os.environ['SYSTEMROOT']
    dirs = [re.sub('%SystemRoot%', systemroot, dir)
            for dir in dirs]
except KeyError:
    pass
path = ';'.join(dirs)

A much better approach, suggested by Ori Berger in a comment to the recipe I mentioned, is to set the value type for the Path registry value to REG_EXPAND_SZ, and not to REG_SZ. This makes variables like %SystemRoot% be automatically expanded by the command prompt interpreter. So the modified version of my set_registry_value function is now:

def set_registry_value(reg, key, value_name, value, value_type=_winreg.REG_SZ):
    k = _winreg.OpenKey(reg, key, 0, _winreg.KEY_WRITE)
    _winreg.SetValueEx(k, value_name, 0, value_type, value)
For modifying the 'Path' value, you would now do:
reg = _winreg.ConnectRegistry(None, _winreg.HKEY_LOCAL_MACHINE)
key = r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"
path = get_registry_value(reg, key, "Path")
path += ";" + dir
set_registry_value(reg, key, "Path", path, _winreg.REG_EXPAND_SZ)
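To see why the value type matters, here's a portable sketch of the %VAR% expansion that the command interpreter performs for REG_EXPAND_SZ values. The expand_windows_vars helper is my own illustration; on Windows the real work is done by the Win32 ExpandEnvironmentStrings API:

```python
import re

def expand_windows_vars(value, env):
    # hypothetical helper: replace each %NAME% with its value from env,
    # the way the shell expands a REG_EXPAND_SZ registry value;
    # unknown variables are left untouched
    def repl(match):
        name = match.group(1)
        return env.get(name.upper(), match.group(0))
    return re.sub(r'%([^%]+)%', repl, value)

path = r'%SystemRoot%;%SystemRoot%\system32'
print(expand_windows_vars(path, {'SYSTEMROOT': r'C:\WINDOWS'}))
# prints: C:\WINDOWS;C:\WINDOWS\system32
```

A REG_SZ value skips this expansion step entirely, which is exactly why the literal string %SystemRoot%\system32 ended up on the Path and the system32 binaries could no longer be found.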
Of course, the whole registry value manipulation code can be neatly packaged in a class, like here.
