Trusting the hardware too much

By Jonathan Corbet
February 15, 2012

Anybody who does low-level kernel programming for long enough learns that the hardware is not their friend. Expecting the hardware to be nice is a recipe for disaster; instead, one must treat the hardware as if it were a clever and willful dog. With some effort, it can be made to perform impressive tricks, but, given a moment of inattention, it will snag your dinner from the grill and hide under the couch. The good news is that Linux kernel developers understand the nature of their relationship with the hardware and take great care not to trust it too far. Or, at least, that is what we would like to think.

Consider this snippet of code from drivers/char/hpet.c:

	do {
		m = read_counter(&hpet->hpet_mc);
		write_counter(t + m + hpetp->hp_delta, &timer->hpet_compare);
	} while (i++, (m - start) < count);

Here, read_counter() is a thin macro wrapper around readl(). The driver is writing to the timer compare register in a loop, assuming that the "main counter" value read from the HPET will eventually exceed the threshold value. Almost always, that is exactly what happens. But if the HPET ever goes a little bit weird and stops returning something meaningful when the main counter is read, the above code could easily turn into an infinite loop. The kernel is trusting the hardware to be rational, but the hardware may not choose to live up to that expectation.

"Usbmuxd" is a daemon which facilitates communications with various Apple iDevices. Recently, this patch to usbmuxd was recognized to be a security fix for a bug eventually designated as CVE-2012-0065. In short, this daemon would read a serial number string from the device and copy it into an internal array without checking its length. Exploiting this vulnerability is not easy - it requires the ability to plug in a USB device that has been designed to overflow that particular buffer with something interesting. But it is a vulnerability, and it is worth noting that an increasing number of USB devices are really just Linux systems using the "USB gadget" code; creating that malicious device would not be hard to do. So this vulnerability could be interesting to the "leave a malicious USB stick in the parking lot" school of attacker.

This bug, too, is the result of trusting the hardware. As seen here, the hardware could be overtly evil. In other cases, it is just subject to electrical wear, power spikes, cosmic rays, and the varying skills of those who write the firmware - closed source which is never reviewed by anybody. Even in a world where price pressures didn't mandate that each component must cost as little as possible, hardware bugs would be a problem.

By now, the lesson should be clear: driver developers should always regard their hardware with extreme suspicion and take nothing for granted. The problem is that even highly diligent developers (and reviewers) can easily let this kind of bug slip by. In almost all cases, the driver appears to work just fine without the extra sanity checks; the hardware plays along most of the time, after all, until that especially inopportune moment arrives. Sometimes the developer sees the resulting failure, resulting in that "oh, yeah, I have to make sure that the hardware doesn't flake there" moment that is discouragingly common in driver development. Other times, some far away user sees strange problems and nobody really knows why.

What would be nice would a way for the computer to tell developers when they are being overly trusting of the hardware; then it might be possible to skip the "tracking down the weird problem" experience. As it happens, such a way exists in the form of a static analysis tool called Carburizer, developed by Asim Kadav, Matthew J. Renzelmann and Michael M. Swift. Those wanting a lot of information about this tool can find it in this one-page poster [PDF], this ACM Symposium on Operating Systems Principles (SOSP) paper [PDF], or in this rather over-the-top web site.

In short: Carburizer analyzes kernel code, looking for insufficiently robust dealings with the hardware. Its key strength at the moment appears to be the identification of possible infinite loops - loops whose exit condition depends solely on a value obtained from the hardware. There are, it seems, over 1000 such loops in the 3.2.1 kernel. The tool also looks for cases where unchecked values from hardware are used to index arrays or are used directly as pointers, though the false-positive rate seems to be higher for these checks. The result is an output file as linked above, from which developers can go and investigate.

Naturally enough, the tool shows some signs of its academic origins. It is written in Ocaml and requires some modifications to the kernel source tree before it can be run. Carburizer also requires that multi-file drivers be merged into one big file, with the result that the line numbers in the resulting diagnostics do not correspond to the source tree everybody else has. That may be part of why, despite a positive response to a posting of the tool on kernel-janitors in January, 2011, little in the way of actual fixes seems to have resulted. Or it may just be that, so far, these results have only been seen by a relatively small group of developers.

Interestingly, Carburizer can propose fixes of its own. These include putting time limits into potentially infinite loops and adding bounds checks to suspect array references. While it is at it, Carburizer fixes up seemingly unnecessary panic() calls and adds logging code to places where, it thinks, the driver neglects to report a hardware failure. With a separate runtime module, it can even deal with stuck interrupts (the driver is forced into a polling mode) and more. The resulting code has not been posted for consideration, which is not entirely surprising; the fixes are, necessarily, of a highly conservative "don't break the driver" nature. Such fixes are almost certain not to be what a human would write after looking at the code. But the tool is open source, so interested developers can run it themselves to see what it would do.

Meanwhile, even without automatic fixes, these results seem worthy of some attention. Computers can be far better than humans at finding many classes of bugs; when computers have been used in that role, some types of bugs have nearly disappeared from the kernel. Maybe someday we'll have a version of Carburizer that can be folded into a tool like checkpatch; for now, though, we'll have to look at its complaints about our code separately and decide what action is needed.

Index entries for this article
Kernel	Development tools
Security	Hardware
Security	Linux kernel/Tools

Trusting the hardware too much

Posted Feb 16, 2012 9:12 UTC (Thu) by dvrabel (subscriber, #9500) [Link] (7 responses)

I checked one of the drivers I'd written and the infinite loop report was a false positive.

Trusting the hardware too much

Posted Feb 16, 2012 23:43 UTC (Thu) by marcH (subscriber, #57642) [Link] (6 responses)

Could you please post here a simplified version of the code falsely reported as an infinite loop + a link to the real code?

Trusting the hardware too much

Posted Feb 17, 2012 0:29 UTC (Fri) by corbet (editor, #1) [Link] (5 responses)

I'm not the poster you're responding to, but I did also check out a report in my code. It thinks this function in drivers/media/video/ov7670.c is an infinite loop:

static int ov7670_write_array(struct v4l2_subdev *sd, struct regval_list *vals)
{
	while (vals->reg_num != 0xff || vals->value != 0xff) {
		int ret = ov7670_write(sd, vals->reg_num, vals->value);
		if (ret < 0)
			return ret;
		vals++;
	}
	return 0;
}

...but it's just reading through a static array, looking for the end marker.

Trusting the hardware too much

Posted Feb 17, 2012 1:47 UTC (Fri) by hummassa (guest, #307) [Link] (1 responses)

Why a market, and not the length or an end-pointer?

Trusting the hardware too much

Posted Feb 17, 2012 16:54 UTC (Fri) by hummassa (guest, #307) [Link]

obviously the question is "why a MARKER, and not the length or and and-pointer?" ... damn autocorrect.

Trusting the hardware too much

Posted Feb 17, 2012 2:09 UTC (Fri) by asimkadav (guest, #82931) [Link]

Thanks for pointing that out Jon! The one you reported was an issue with our new i2c support - I just fixed them across all results. Most commonly, false positive arise due to counters that we fail to detect though. There were about 8% false positives or so when we reported the results thoroughly for 2.6.18 kernel. There are still about 800+ bugs out there, and many of them occur in larger groups (although there are many one off bugs as well).

Trusting the hardware too much

Posted Feb 20, 2012 17:47 UTC (Mon) by proski (guest, #104) [Link]

Well, if the CPU fails to increment vals, then you have an infinite loop ;)

Trusting the hardware too much

Posted Feb 21, 2012 7:30 UTC (Tue) by geuder (subscriber, #62854) [Link]

> ... but it's just reading through a static array, looking for the end marker.

Looking for an end marker, which is not guaranteed to exist. At least not until you do inter function dependency analysis and have all potential callers' code and make sure there is no path your array data origins from some not to be trusted hardware or user input. Could be easy or impossible.

Of course it would not be an endless loop, but still a potentially dangerous construct overreading the allocated array.

Trusting the hardware too much

Posted Feb 17, 2012 7:53 UTC (Fri) by Felix.Braun (guest, #3032) [Link]

[O]ne must treat the hardware as if it were a clever and willful dog. With some effort, it can be made to perform impressive tricks, but, given a moment of inattention, it will snag your dinner from the grill and hide under the couch.

May I suggest this as the quote of the year?

Trusting the hardware too much

Posted Feb 17, 2012 10:07 UTC (Fri) by marcH (subscriber, #57642) [Link]

A critical feature for any such tool is the ability to silence false positives and well known bugs with ease. I mean:

/** @carburizer_shut_up_because_I_know_you_are_wrong_here */
some_code_triggering_a_false_positive()

/** @carburizer_shut_up_because_this_issue_cannot_be_fixed_yet_see_bug_4325 */
dangerous_code()

Of course it does not have to be inline comments, some external exclusion file could be even better.

How does carburizer fare with respect to this?

> ... or in this rather over-the-top web site.

I'm looking forward to the new LWN CSS inspired by it!