
KS2009: Regressions

By Jonathan Corbet
October 19, 2009

LWN's 2009 Kernel Summit coverage
Regressions are the worst sort of bug. As long as working systems continue to work, we can be reasonably confident that the kernel is getting better over time. But if we break working systems, the situation is far more questionable. For this reason, it is important to avoid introducing regressions into the kernel. When they occur, they must be recognized as such, tracked, and fixed as quickly as possible. The kernel's regression tracker is Rafael Wysocki; he presented some information on how we have been doing over the last year.

Various plots showing the number of regressions reported and the number fixed against the age of each kernel were presented. These plots can be fitted to an exponential distribution, much like that seen for phenomena such as radioactive decay. Looking at things that way, Rafael says, regressions have a half-life of about 17 days and a mean lifetime of 24 days.
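As a sanity check (not part of the talk): in an exponential decay model, the mean lifetime is the half-life divided by ln 2, so the two figures quoted here are consistent with each other.

```python
import math

# Exponential decay model of regression lifetimes:
# mean lifetime tau and half-life t_half are related by tau = t_half / ln 2.
t_half = 17.0                # days, the half-life from Rafael's fit
tau = t_half / math.log(2)   # implied mean lifetime

print(round(tau, 1))         # ~24.5 days, matching the quoted ~24-day figure
```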

There is an important qualification to bear in mind here, though: these plots only show regressions which have been fixed. Over the year or so surveyed, 858 regressions were reported, but only 738 of those were reported as fixed. So, says Rafael, the full picture has to include a second type of regression which is harder to find and to fix.

The 2.6.30 kernel, it seems, has a steeper regression curve than its predecessors; it is not flattening out in the same way. But the number of regressions reported is lower at this point in the cycle. A fair amount of time was spent trying to figure out just what that means. Perhaps this kernel is simply better, with fewer bugs to report in the first place. Or perhaps there are fewer testers. Rafael believes that the number of testers is roughly constant, as is the rate at which they can find bugs. But, as the kernel grows, there is more code which will surely have bugs in it. So Rafael fears that more regressions are slipping through the cracks.

This worry was echoed by others, some of whom noted that Rafael's regression list contains only a small subset of the problems. Distribution bug trackers tend to fill with many more issues. There doesn't seem to be any easy way to propagate distribution bug data upward, though; developers for distributions which ship relatively young kernels tend to be working flat-out already.

Ted Ts'o noted, though, that he feels much safer running -rc1 kernels now than in the past. They really seem to be getting more stable.

Arjan van de Ven put up some data from Kerneloops.org suggesting that the number of bugs per kernel release remains roughly constant. As always, most users are affected by a relatively small number of bugs. In an aside, some questioned whether the posting of certain types of crashes - null pointer dereferences in particular - could be a security concern, but Alan Cox responded that null-pointer problems are being reported on the mailing lists all the time and few people are doing anything about them. A listing on kerneloops.org seems unlikely to worsen the situation.

So how can we make things better? Rafael suggested that, in the current development model, the opening of the merge window tends to distract developers just when the just-released kernel is starting to get more users. That, in turn, can keep them from looking at regression reports. So he suggested that it might make sense to delay the opening of the merge window for one week after a major kernel release. Linus did not like the idea, though, saying that it would just drag out releases without helping things much. Rather than pay attention to regression reports, developers would just continue to bash out new stuff for the delayed merge window, creating even more regressions the next time around.

What would really help, it was agreed, is more testers - hardly a new or novel conclusion. Perhaps if more people would run linux-next, more bugs would be found (and fixed) early. Linux-next is hard to test, though, and hard to debug. It changes radically from one day to the next, and it is not bisectable. Rafael thinks that there is little benefit to users in testing it, since any bugs found there are relatively unlikely to be fixed. Andrew Morton thought that developers should be testing their code in linux-next as a way of finding bugs caused by interactions with other changes. That is a hard sell, though; working with linux-next can force developers to contend with bugs from completely unrelated changes. Alan Cox noted that linux-next is often more reliable than the -rc1 releases.

There was some talk about how long it can take for regression fixes to get into the mainline. Often these fixes will sit in maintainer trees for some time. Might there be a way to get them in more quickly? As it turns out, a number of subsystem maintainers feel that these fixes should age for a while in a testing tree, lest they introduce other problems. So that situation is not likely to change much.

Next: The future of perf events

Index entries for this article
Kernel: Development model/Regressions
Conference: Kernel Summit/2009



KS2009: Regressions

Posted Oct 19, 2009 17:22 UTC (Mon) by dyqith (guest, #31406) [Link] (2 responses)

Maybe what we need is a group/company/foundation/person to come up with a huge collection of machines for automatic regression testing for new kernels.
There should be a couple of machines from different categories (servers, laptops, desktops, mobiles) and manufacturers (Dell, HP, etc), and for testing at different levels (configure/compile, static analysis, boot, devices, filesystems, memory, etc).

We have a set standard of tests, and users/devs can add new tests when new features need to be tested.
The output of the tests can then be posted to a bug tracker somewhere.

The way I see it is testers have two problems (time and materials).
Some have lots of time to test things, but no machine to play with;
others have no time, but lots of machines.

KS2009: Regressions

Posted Oct 19, 2009 19:12 UTC (Mon) by dmk (guest, #50141) [Link]

KS2009: Regressions

Posted Oct 19, 2009 21:07 UTC (Mon) by nix (subscriber, #2304) [Link]

We *have* that. The problem is that users tend to have stranger hardware
and stranger configurations than anything that we can actually *test*:
this is as opposed to Ingo's randconfig does-it-boot testing, which while
otherwise excellent doesn't spot problems where drivers not needed for
booting and building kernels don't work properly. I've had at least one
regression with every one of the last four released kernels, and not one
of them was something that would have been spotted by Ingo's randconfig
testing (because that testing had wiped out all the low-hanging fruit
already).

However, all those regressions got squashed *fast* once I reported them:
even the one I didn't realise was a regression until someone else reported
it (the stuck-keys-in-X-on-SMP 2.6.31 locking bug). The key is to report
and test :)

KS2009: Regressions

Posted Oct 23, 2009 23:22 UTC (Fri) by jmspeex (subscriber, #51639) [Link] (2 responses)

I think one problem is that older hardware doesn't seem to get tested. From my (limited) experience, what happens is that when a new machine gets released, Linux support is often not great because it doesn't know how to handle some of the new hardware. Then, things get better as the bugs are fixed. Then after a year or so, you start seeing regressions as the machine becomes old and less tested. For example, my last two laptops both had suspend fully working at some point and both had it broken by some kernel update.

KS2009: Regressions

Posted Oct 24, 2009 11:19 UTC (Sat) by nix (subscriber, #2304) [Link] (1 responses)

That's when you report the bugs :))) I've kept some machines working for a
decade and more simply by reporting bugs when their old hardware breaks.
The only machine I've ever given up on was a box with an ancient Promise
ISA disk controller, which suddenly broke in 2.4.x versus 2.2.x. Of course
this was before git and bisection: these days I could bisect to find the
faulty commit and report it, but in those days I was pretty stuffed.

KS2009: Regressions

Posted Oct 24, 2009 18:01 UTC (Sat) by jmspeex (subscriber, #51639) [Link]

For the first laptop on which suspend got broken I was able to pinpoint that the problem happened between 2.6.12-rc5-git5 and 2.6.12-rc5-git6 after several months of testing (the problem occurred only after several days and it was my main work machine). This was by using the git "pre-releases" on kernel.org. After finding that, the response I got was simply "you need to learn git to further bisect otherwise we're not even looking at it" (I had already bisected to within one *day* of patches). At this point I just gave up. I *do* still report bugs for projects that treat reports better (gcc is one that comes to mind), but I'm no longer bothering with kernel bugs unless it *really* annoys me or there's a developer committed to putting some time into finding the problem instead of forever asking me to do more work without even looking at the code.

KS2009: Regressions

Posted Oct 24, 2009 10:47 UTC (Sat) by modernjazz (guest, #4185) [Link] (1 responses)

If the issue is getting more testers, then it seems clear that the "easy"
solution is to start making real use of the data in the distributions' bug
trackers. Since the distribution developers are swamped, people who care
about the kernel should be browsing the distribution bug databases
themselves. Saying "we need more testers!" but then not exploiting this
resource seems somewhat disingenuous.

KS2009: Regressions

Posted Oct 25, 2009 6:42 UTC (Sun) by kreutzm (guest, #4700) [Link]

Although not mentioned in the text, I guess the real problem here is that distribution kernels tend to be old by kernel developer standards. And if someone reported "my upgrade from Etch to Lenny broke ..." then it is not clear that this person would be able either to git-bisect the exact faulty patch or to run the latest kernel (which might not even run out of the box).

The dull (but probably working) solution would be for someone with deep knowledge of kernel regressions to occasionally run through the distribution BTSs and mark things "fixed-upstream" when he notices that some regression is already solved, so that the reporter knows about it (and maybe the distro can cherry-pick something if they so desire). But I can hardly estimate how much this would actually help.

Maybe someone with lots of disk space could regularly compile the latest kernels (let's say every stable, -rc, ...) for all major distributions such that "mere mortals" could yum or apt-get them. Then experienced users could narrow the problem down by quite a bit, and either distro kernel people could jump in (if the problem is solved in kernels newer than the one shipped) or kernel devs could notice if it is still present in the latest upstream one. This, and a good explanation of the procedures involved (git-bisect, ...), could maybe help some more.


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds