The XP machines run the IE browser in an automated fashion, pointing it to sites known or suspected for hosting malware. Each machine also runs monitoring software that records every single file and Registry read/write, as well as any attempt to hook malware into Auto-Start Extensibility Points -- for many more details on this see this research report from Microsoft. The machines act as "monkeys" by merely pointing the browser to suspected malicious Web sites and then waiting for a few minutes. The automated IE drivers do not click on any dialog box elements that might prompt for installation of software. Thus, every file that gets created outside the browser's temporary directory, and every Registry write means that malware was installed automatically, without the action of the "user" (i.e. the monkey in this case). When a machine detects that malware was installed, it forwards the URL to a "better" machine (in terms of service packs and patches installed on it) in the cluster. If the URL gets to a fully patched machine and still results in the installation of malware, it means that a zero-day exploit has been found, i.e. an exploit that exists in the wild for which there is no available patch.
As the authors of the research report point out, this approach qualifies as "black-box", since it simply points the browser to various URLs and watches for modifications to the file system, the registry and the memory. A more "white-box" approach would be to attempt to identify malware by trying to match signatures or behaviors against a known list/database. The black-box approach turns out to be much simpler to implement and very effective. The authors report finding the first zero-day exploit using their HoneyMonkeys setup in July 2005.
I think there are a lot of lessons in this stories for us testers:
- Use Virtual Machine technologies such as VMWare or VirtualPC for easy rollout and reload of multiple OS/software configurations -- when a HoneyMonkey machine is infected with malware, its Virtual Machine image is simply reloaded from a "golden image"
- Automate, automate, automate -- there is no way "real monkeys" in the shape of humans can click through thousands of URLs in order to find the ones that host malware
- Apply the KISS principle -- the monkey software is purposely kept simple and stupid; the intelligence resides with the various pieces of monitoring software that watch for modifications to the host machine
- Don't underestimate black-box techniques -- there is a tendency to relegate black-box techniques to a second-rate status compared to white-box testing; as the HoneyMonkey project demonstrates, sometimes the easier way out is better