Trusting the hardware too much
Consider this snippet of code from drivers/char/hpet.c:
    do {
        m = read_counter(&hpet->hpet_mc);
        write_counter(t + m + hpetp->hp_delta, &timer->hpet_compare);
    } while (i++, (m - start) < count);
Here, read_counter() is a thin macro wrapper around readl(). The driver is writing to the timer compare register in a loop, assuming that the "main counter" value read from the HPET will eventually exceed the threshold value. Almost always, that is exactly what happens. But if the HPET ever goes a little bit weird and stops returning something meaningful when the main counter is read, the above code could easily turn into an infinite loop. The kernel is trusting the hardware to be rational, but the hardware may not choose to live up to that expectation.
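One way to defend against that failure mode is to cap the number of polls. What follows is a minimal user-space sketch of the idea, not the actual hpet.c code or any fix proposed for it: the read_reg_fn callback stands in for a register read like readl(), and wait_for_counter(), MAX_POLL_ITERATIONS, and the two simulated counters are hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a register read such as readl(). */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Arbitrary cap chosen for illustration; a real driver would tune this. */
#define MAX_POLL_ITERATIONS 1000u

/*
 * Poll until the counter has advanced at least `count` ticks past `start`.
 * Returns true on success, false if the cap is reached first.
 */
static bool wait_for_counter(read_reg_fn read_counter, void *ctx,
                             uint32_t start, uint32_t count)
{
    unsigned int tries;

    for (tries = 0; tries < MAX_POLL_ITERATIONS; tries++) {
        uint32_t now = read_counter(ctx);

        /* Unsigned subtraction stays correct across counter wraparound. */
        if ((uint32_t)(now - start) >= count)
            return true;
    }

    /* The hardware never made progress; the caller must handle this. */
    return false;
}

/* Simulated well-behaved device: the counter advances on every read. */
static uint32_t ticking_counter(void *ctx)
{
    uint32_t *value = ctx;

    return (*value)++;
}

/* Simulated broken device: the counter is stuck at zero. */
static uint32_t stuck_counter(void *ctx)
{
    (void)ctx;
    return 0;
}
```

With ticking_counter() the wait succeeds after a handful of reads; with stuck_counter() it gives up and returns false rather than spinning forever. In-kernel code would more likely bound the loop by elapsed time than by iteration count, but the principle is the same: the exit condition must not depend on the hardware alone.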
"Usbmuxd" is a daemon which facilitates communications with various Apple iDevices. Recently, this patch to usbmuxd was recognized to be a security fix for a bug eventually designated as CVE-2012-0065. In short, this daemon would read a serial number string from the device and copy it into an internal array without checking its length. Exploiting this vulnerability is not easy - it requires the ability to plug in a USB device that has been designed to overflow that particular buffer with something interesting. But it is a vulnerability, and it is worth noting that an increasing number of USB devices are really just Linux systems using the "USB gadget" code; creating that malicious device would not be hard to do. So this vulnerability could be interesting to the "leave a malicious USB stick in the parking lot" school of attacker.
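The missing check is a small one. The sketch below is not the actual usbmuxd patch; it simply illustrates the general pattern of validating a device-supplied length before copying. The function copy_device_serial() and the SERIAL_BUF_SIZE constant are hypothetical names, and the buffer size is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the daemon's internal serial-number array. */
#define SERIAL_BUF_SIZE 256

/*
 * Copy a device-supplied string of `len` bytes into `dst`. The source is
 * not assumed to be NUL-terminated. Returns 0 on success, or -1 if the
 * string (plus its terminator) would not fit.
 */
static int copy_device_serial(char dst[SERIAL_BUF_SIZE],
                              const unsigned char *src, size_t len)
{
    /* Reject anything that would overflow; never trust the device's length. */
    if (len >= SERIAL_BUF_SIZE)
        return -1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

Whether an oversized string should be rejected outright, as here, or truncated is a policy decision; the important part is that the device does not get to decide how many bytes land in the buffer.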
This bug, too, is the result of trusting the hardware. As seen here, the hardware could be overtly evil. In other cases, it is just subject to electrical wear, power spikes, cosmic rays, and the varying skills of those who write the firmware - closed source which is never reviewed by anybody. Even in a world where price pressures didn't mandate that each component must cost as little as possible, hardware bugs would be a problem.
By now, the lesson should be clear: driver developers should always regard their hardware with extreme suspicion and take nothing for granted. The problem is that even highly diligent developers (and reviewers) can easily let this kind of bug slip by. In almost all cases, the driver appears to work just fine without the extra sanity checks; the hardware plays along most of the time, after all, until that especially inopportune moment arrives. Sometimes the developer sees the failure firsthand, leading to that "oh, yeah, I have to make sure that the hardware doesn't flake there" moment that is discouragingly common in driver development. Other times, some far away user sees strange problems and nobody really knows why.
What would be nice would be a way for the computer to tell developers when they are being overly trusting of the hardware; then it might be possible to skip the "tracking down the weird problem" experience. As it happens, such a way exists in the form of a static analysis tool called Carburizer, developed by Asim Kadav, Matthew J. Renzelmann and Michael M. Swift. Those wanting a lot of information about this tool can find it in this one-page poster [PDF], this ACM Symposium on Operating Systems Principles (SOSP) paper [PDF], or in this rather over-the-top web site.
In short: Carburizer analyzes kernel code, looking for insufficiently robust dealings with the hardware. Its key strength at the moment appears to be the identification of possible infinite loops - loops whose exit condition depends solely on a value obtained from the hardware. There are, it seems, over 1000 such loops in the 3.2.1 kernel. The tool also looks for cases where unchecked values from hardware are used to index arrays or are used directly as pointers, though the false-positive rate seems to be higher for these checks. The result is an output file as linked above, from which developers can go and investigate.
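The array-index case is easy to picture. The following sketch is not taken from any real driver or from Carburizer's output; the status-register layout, the handler table, and the name dispatch_event() are all assumptions made up for illustration. It shows a value read from the device being checked before it is used to index a table of function pointers.

```c
#include <stdint.h>

#define NUM_HANDLERS 4

typedef int (*event_handler)(void);

/* Hypothetical per-event handlers; each returns a distinct code. */
static int handle_rx(void)    { return 1; }
static int handle_tx(void)    { return 2; }
static int handle_link(void)  { return 3; }
static int handle_error(void) { return 4; }

static const event_handler handlers[NUM_HANDLERS] = {
    handle_rx, handle_tx, handle_link, handle_error,
};

/*
 * Dispatch on an event code taken from a (simulated) status register.
 * The low byte is assumed to hold the event code. Returns the handler's
 * result, or -1 if the device reported a code the driver does not know.
 */
static int dispatch_event(uint32_t status_reg)
{
    uint32_t idx = status_reg & 0xff;

    /* Without this check, a confused device could send us off the table. */
    if (idx >= NUM_HANDLERS)
        return -1;

    return handlers[idx]();
}
```

A driver written without the bounds check would work on every healthy device and jump through a wild pointer on the first one that returned, say, 0xff.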
Naturally enough, the tool shows some signs of its academic origins. It is written in OCaml and requires some modifications to the kernel source tree before it can be run. Carburizer also requires that multi-file drivers be merged into one big file, with the result that the line numbers in the resulting diagnostics do not correspond to the source tree everybody else has. That may be part of why, despite a positive response to a posting of the tool on kernel-janitors in January, 2011, little in the way of actual fixes seems to have resulted. Or it may just be that, so far, these results have only been seen by a relatively small group of developers.
Interestingly, Carburizer can propose fixes of its own. These include putting time limits into potentially infinite loops and adding bounds checks to suspect array references. While it is at it, Carburizer fixes up seemingly unnecessary panic() calls and adds logging code to places where, it thinks, the driver neglects to report a hardware failure. With a separate runtime module, it can even deal with stuck interrupts (the driver is forced into a polling mode) and more. The resulting code has not been posted for consideration, which is not entirely surprising; the fixes are, necessarily, of a highly conservative "don't break the driver" nature. Such fixes are almost certain not to be what a human would write after looking at the code. But the tool is open source, so interested developers can run it themselves to see what it would do.
Meanwhile, even without automatic fixes, these results seem worthy of some
attention. Computers can be far better than humans at finding many classes
of bugs; when computers have been used in that role, some types of bugs
have nearly disappeared from the kernel. Maybe someday we'll have a
version of Carburizer that can be folded into a tool like checkpatch; for
now, though, we'll have to look at its complaints about our code separately
and decide what action is needed.