Is the kernel development process broken?
It is true that the process has changed in 2.6, and that each 2.6 release tends to contain a great deal of new stuff. The situation is nowhere near as bad as some people claim, however. The problems which have turned up have tended to be minor, and most have not affected all that many users. Big, embarrassing security bugs, data corruption issues, etc. have been mostly notable in their absence. Kernel developers like Andrew Morton don't think there is a problem:
Even so, there is a certain feeling that some 2.6 kernels have been released with problems which should not have been there. Last week, in an effort to improve the situation, Linus posted a proposal for a slight modification to the kernel release process. The new scheme would have set aside even-numbered kernel releases (2.6.12, 2.6.14, ...) as "extra-stable" kernels which would include nothing but bug fixes. Odd-numbered releases would continue to include more invasive patches. The idea was that an even-numbered release would follow fairly closely after the previous odd-numbered release and would fix any regressions or other problems which had turned up. With luck, people could install an even-numbered release with relative confidence.
Over the course of a lengthy discussion, an apparent consensus formed: the real problem is a lack of testing. In theory, most patches are extensively tested in the -mm tree before being merged. -mm does work well for many things, and it has helped to improve the quality of patches being merged into the mainline. But the -mm kernels are considered to be far too unstable by many users, so they are not tested as widely as anybody would like. Even quite a few kernel developers work with the mainline kernels, since they provide a more stable development platform.
The next step in the testing process is Linus's -rc releases. These kernels, too, are not tested as heavily as one might like. Many developers blame the fact that most of the -rc kernels are not really release candidates; they are merge points and an indication that a release is getting closer. Since users do not see the -rc kernels as true release candidates, they tend to shy away from them. For what it's worth, Linus disagrees with the perception of his -rc kernels:
But it's very much a call to "ok, guys, calm down now".
The fact remains, however, that many people see a "release candidate" rather differently than Linus does.
There are some -rc kernels which clearly are release candidates; 2.6.11-rc5 is an obvious example. But even that kernel did not see enough testing to turn up the Dell keyboard problem.
The real problem seems to have two components. The first is that widespread testing by users is a vital part of the free software development process. This is especially true for the kernel: no kernel developer has access to all of the strange hardware out there, but the user community, as a whole, does. The only way to get the necessary level of testing coverage is to have large numbers of users do it. But here is where the second piece of the puzzle comes in: most users are unwilling to perform this testing on anything other than official mainline kernel releases. So certain classes of bugs are only found after such a release takes place.
A solution which was proposed was to bring back the concept of a four-number release: 2.6.11.1, for example. These releases would exist solely to deal with any show-stopper bugs which turn up after a major mainline release. Linus was negative about this idea, mostly because he didn't think anybody would be willing to do that work:
Linus went on, however, to outline how the process might work if a "sucker" were found who wanted to do it. The charter for this tree would have to be extremely restricted, with many rules limiting which patches could be accepted. The "sucker tree" would only take very small, clearly correct patches which fix a serious, user-visible bug. Some sort of committee would rule on patches, and would easily be able to exclude any which do not appear to meet the criteria. These conditions, says Linus, might make it possible to maintain the sucker tree, if a suitable sucker could be found.
As it turns out, a sucker stepped forward. Greg Kroah-Hartman has volunteered to maintain this tree for now, and to find a new maintainer when he reaches his limit. Chris Wright has volunteered to help. Greg released 2.6.11.1 as an example of how the process would work; it contains three patches: two compile fixes, and the obligatory Dell keyboard fix. 2.6.11.2 followed on March 9 with a single security fix. So the process has begun to operate.
Greg and Chris have also put together a set of
rules on how the extra-stable tree will operate. To be considered for
this tree, a patch must be "obviously correct," no bigger than 100 lines, a
fix for a real bug which is seen to be affecting users, etc. There is a
new stable@kernel.org address to which such patches should be
sent. Patches which appear to qualify will be added to the queue and
considered by a review committee (which has not yet been named, but it
"will be made up of a number of kernel developers who have
volunteered for this task, and a few that haven't
").
The rules seem to be acceptable to most developers. There was one suggestion that, to qualify, patches must also be accepted into the mainline kernel. Being merged into the mainline would ensure wider testing of the patches, and would also serve to minimize the differences between the stable and mainline trees. The problem with this idea is that, often, the minimal fix which is best suited to an extra-stable tree is not the fix that the developers want for the long term. The real fix for a bug may involve wide-ranging changes, API changes, etc., but that sort of patch conflicts with the other rules for the extra-stable tree. So a "must be merged into the mainline" rule probably will not be added, at least not in that form.
How much this new tree will help is yet to be seen. It may be that its
presence will simply cause many users to hold off testing until the first
extra-stable release is made. This tree provides a safe repository
for critical fixes, but those fixes cannot be made until the bugs are
found. Finding those bugs requires widespread testing; no new kernel tree
can change that fact.
| Index entries for this article | |
|---|---|
| Kernel | Development model/Kernel quality |
| Kernel | Sucker tree |