Is the kernel development process broken?

[Posted March 9, 2005 by corbet]

According to some, the 2.6 development process has gone far out of control. Wildly destabilizing patches are routinely accepted, to the point that every 2.6.x release is really a development kernel in disguise. There are no stable kernels anymore. As evidence, they point out certain high-profile regressions, such as the failure of 2.6.11 to work with certain Dell keyboards.

It is true that the process has changed in 2.6, and that each 2.6 release tends to contain a great deal of new stuff. The situation is nowhere near as bad as some people claim, however. The problems which have turned up have tended to be minor, and most have not affected all that many users. Big, embarrassing security bugs, data corruption issues, etc. have been mostly notable by their absence. Kernel developers like Andrew Morton don't think there is a problem:

I would maintain that we're still fixing stuff faster than we're breaking stuff. If you look at the fixes which are going into the tree (and there are a HUGE number of fixes), many of them are addressing problems which have been there for a long time.

Even so, there is a certain feeling that some 2.6 kernels have been released with problems which should not have been there. Last week, in an effort to improve the situation, Linus posted a proposal for a slight modification to the kernel release process. The new scheme would have set aside even-numbered kernel releases (2.6.12, 2.6.14, ...) as "extra-stable" kernels which would include nothing but bug fixes. Odd-numbered releases would continue to include more invasive patches. The idea was that an even-numbered release would follow fairly closely after the previous odd-numbered release and would fix any regressions or other problems which had turned up. With luck, people could install an even-numbered release with relative confidence.
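The proposed even/odd split is simple enough to state as code. What follows is only a minimal sketch of the proposal as described above; the function name `release_kind` and the labels it returns are illustrative assumptions, not anything taken from the kernel tree or from Linus's posting.

```python
def release_kind(version: str) -> str:
    """Classify a 2.6.x release under the proposed even/odd scheme.

    Hypothetical helper: even third numbers would be bug-fix-only
    "extra-stable" releases, odd ones would carry invasive patches.
    """
    parts = version.split(".")
    if len(parts) != 3 or parts[:2] != ["2", "6"]:
        raise ValueError(f"expected a 2.6.x version, got {version!r}")
    patchlevel = int(parts[2])
    return "extra-stable" if patchlevel % 2 == 0 else "feature"


for v in ("2.6.12", "2.6.13", "2.6.14"):
    print(v, release_kind(v))
```

Under this reading, 2.6.12 and 2.6.14 would be fix-only releases, with 2.6.13 between them carrying new development.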

Over the course of a lengthy discussion, an apparent consensus formed: the real problem is a lack of testing. In theory, most patches are extensively tested in the -mm tree before being merged. -mm does work well for many things, and it has helped to improve the quality of patches being merged into the mainline. But the -mm kernels are considered to be far too unstable by many users, so they are not tested as widely as anybody would like. Even quite a few kernel developers work with the mainline kernels, since they provide a more stable development platform.

The next step in the testing process is Linus's -rc releases. These kernels, too, are not tested as heavily as one might like. Many developers blame the fact that most of the -rc kernels are not really release candidates; they are merge points and an indication that a release is getting closer. Since users do not see the -rc kernels as true release candidates, they tend to shy away from them. For what it's worth, Linus disagrees with the perception of his -rc kernels:

Have people actually _looked_ at the -rc releases? They are very much done when I reach the point and say "ok, let's calm down". The first one is usually pretty big and often needs some fixing, simply because the first one is _inevitably_ (and by design) the one that gets the pent-up demand from the previous calming down period.

But it's very much a call to "ok, guys, calm down now".

The fact remains, however, that many people see a "release candidate" rather differently than Linus does.

There are some -rc kernels which clearly are release candidates; 2.6.11-rc5 is an obvious example. But even that kernel did not see enough testing to turn up the Dell keyboard problem.

The real problem seems to have two components. The first is that widespread testing by users is a vital part of the free software development process. This is especially true for the kernel: no kernel developer has access to all of the strange hardware out there, but the user community, as a whole, does. The only way to get the necessary level of testing coverage is to have large numbers of users do it. But here is where the second piece of the puzzle comes in: most users are unwilling to perform this testing on anything other than official mainline kernel releases. So certain classes of bugs are only found after such a release takes place.

A solution which was proposed was to bring back the concept of a four-number release: 2.6.11.1, for example. These releases would exist solely to deal with any show-stopper bugs which turn up after a major mainline release. Linus was negative about this idea, mostly because he didn't think anybody would be willing to do that work:

I'll tell you what the problem is: I don't think you'll find anybody to do the parallel "only trivial patches" tree. They'll go crazy in a couple of weeks. Why? Because it's a _damn_ hard problem. Where do you draw the line? What's an acceptable patch? And if you get it wrong, people will complain _very_ loudly, since by now you've "promised" them a kernel that is better than the mainline. In other words: there's almost zero glory, there are no interesting problems, and there will absolutely be people who claim that you're a dick-head and worse, probably on a weekly basis.

Linus went on, however, to outline how the process might work if a "sucker" were found who wanted to do it. The charter for this tree would have to be extremely restricted, with many rules limiting which patches could be accepted. The "sucker tree" would only take very small, clearly correct patches which fix a serious, user-visible bug. Some sort of committee would rule on patches, and would easily be able to exclude any which do not appear to meet the criteria. These conditions, says Linus, might make it possible to maintain the sucker tree, if a suitable sucker could be found.

As it turns out, a sucker stepped forward. Greg Kroah-Hartman has volunteered to maintain this tree for now, and to find a new maintainer when he reaches his limit. Chris Wright has volunteered to help. Greg released 2.6.11.1 as an example of how the process would work; it contains three patches: two compile fixes, and the obligatory Dell keyboard fix. 2.6.11.2 followed on March 9 with a single security fix. So the process has begun to operate.
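For anybody scripting around the new numbering, the four-number releases slot in between the three-number ones in the obvious way. The sketch below is an illustration only, and `parse_version` is a hypothetical helper; it relies on Python's tuple comparison, where a shorter tuple that is a prefix of a longer one sorts first.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.6.11.2' into (2, 6, 11, 2) so releases compare numerically."""
    return tuple(int(part) for part in version.split("."))


releases = ["2.6.12", "2.6.11.2", "2.6.11", "2.6.11.1"]
ordered = sorted(releases, key=parse_version)
print(ordered)
# ['2.6.11', '2.6.11.1', '2.6.11.2', '2.6.12']
```

Comparing the strings directly would not work in general, since "2.6.9" sorts after "2.6.11" as text; converting each component to an integer avoids that.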

Greg and Chris have also put together a set of rules on how the extra-stable tree will operate. To be considered for this tree, a patch must be "obviously correct," no bigger than 100 lines, a fix for a real bug which is seen to be affecting users, etc. There is a new stable@kernel.org address to which such patches should be sent. Patches which appear to qualify will be added to the queue and considered by a review committee (which has not yet been named, but it "will be made up of a number of kernel developers who have volunteered for this task, and a few that haven't").
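The mechanical part of those rules could be sketched as a simple pre-check. This is an assumption-laden illustration rather than a tool anybody has proposed: the `Patch` record and `rejection_reasons` function are hypothetical, the size limit is measured here as total lines of patch text (one possible reading of "no bigger than 100 lines"), and the "obviously correct" and "real bug" criteria are judgment calls which only reviewers can make, so they appear as plain flags.

```python
from dataclasses import dataclass
from typing import List

MAX_PATCH_LINES = 100  # size limit quoted in the rules above


@dataclass
class Patch:
    """Hypothetical record for a submission to the extra-stable tree."""
    diff: str                # full text of the patch
    obviously_correct: bool  # a reviewer's judgment, not computable
    fixes_real_bug: bool     # a reviewer's judgment, not computable


def rejection_reasons(patch: Patch) -> List[str]:
    """Return the rules a patch fails; an empty list means none of them."""
    reasons = []
    # Assumption: size is counted as total lines of patch text.
    if len(patch.diff.splitlines()) > MAX_PATCH_LINES:
        reasons.append("larger than 100 lines")
    if not patch.obviously_correct:
        reasons.append("not obviously correct")
    if not patch.fixes_real_bug:
        reasons.append("does not fix a real bug affecting users")
    return reasons


small = Patch(diff="-old line\n+new line\n",
              obviously_correct=True, fixes_real_bug=True)
print(rejection_reasons(small))
```

An empty result only means that a patch clears these three checks; the decision would still rest with the review committee.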

The rules seem to be acceptable to most developers. There was one suggestion that, to qualify, patches must also be accepted into the mainline kernel. Being merged into the mainline would ensure wider testing of the patches, and would also serve to minimize the differences between the stable and mainline trees. The problem with this idea is that, often, the minimal fix which is best suited to an extra-stable tree is not the fix that the developers want for the long term. The real fix for a bug may involve wide-ranging changes, API changes, etc., but that sort of patch conflicts with the other rules for the extra-stable tree. So a "must be merged into the mainline" rule probably will not be added, at least not in that form.

How much this new tree will help is yet to be seen. It may be that its presence will simply cause many users to hold off testing until the first extra-stable release is made. This tree provides a safe repository for critical fixes, but those fixes cannot be made until the bugs are found. Finding those bugs requires widespread testing; no new kernel tree can change that fact.

Index entries for this article
Kernel	Development model/Kernel quality
Kernel	Sucker tree


Is the kernel development process broken?

Posted Mar 9, 2005 17:20 UTC (Wed) by jwb (guest, #15467) [Link] (1 responses)

Is the extra-stable tree going to merge up to 2.6.x for every value of x? If so I don't see the value, because that would mean the 2.6.x.y kernel contains all the problems of 2.6.x, minus a few fixes. There's only so many fixes you can get in y, and none of them are going to address the fact that Linus replaced the SCSI layer and the MMU subsystem between x-1 and x.

I used to test a lot of kernels, but eventually the problems I was seeing became untestable. The problems tend to crop up on the larger multiprocessor machines with significant i/o abilities and vast storage under heavy load, and unfortunately I don't have a "spare" one of those I can use for purposes of Linux kernel quality assurance.

Maybe there's not enough new features and obnoxious bugs to cause people to try out the kernels. I know when my hardware was unsupported I would try every -mm, -ac, -aa, -et al release to see if it fixed whatever problem I was having. But now that everything works, I'd rather not change.

Indeed, the opposite is now true. Since everything works correctly, I'm afraid to try new releases. I moved from 2.6.8 to 2.6.10-rc on a desktop machine of mine, and so many things were broken that I just gave up and went back to the previous.

Is the kernel development process broken?

Posted Mar 9, 2005 23:28 UTC (Wed) by PaulDickson (subscriber, #478) [Link]

Is the extra-stable tree going to merge up to 2.6.x for every value of x? If so I don't see the value, because that would mean the 2.6.x.y kernel contains all the problems of 2.6.x, minus a few fixes.

I think you have it mentally backward: the 2.6.x.y kernel contains 2.6.x, plus a few fixes. Which is the same except for attitude. y can continue to increment even after 2.6.x+1 is released (which might happen if 2.6.x+1 is not considered to be stable enough).

Is it any wonder unreported bugs might persist? Especially on uncommon machines that developers might not have ready access. And while you may not have a spare machine, you are testing the kernels. Even if you only use kernels from a distribution for your exact system, you'll still have to do this testing, except in that case you'd send the report back to the distribution's creator.

Both positive and negative reports are always helpful, but more detail is better for negative results.

Is the kernel development process broken?

Posted Mar 9, 2005 17:53 UTC (Wed) by JoeBuck (guest, #2330) [Link]

The main reason why the 2.6.x.y kernels are needed is for security patches and emergency fixes for breakage (e.g. the Dell keyboard problem). That means that the process for producing them has to be efficient and streamlined. In some cases, an inelegant but known-safe patch is preferable to the "right solution" when the right solution involves a major redesign of a flawed API or something similar; the "right solution" can be deferred to 2.6.(x+1).

Right now, -rc kernels aren't going to get much testing because more and more of the community does not trust even the releases to be sound. I think that the four-part numbering has the potential to restore trust.

Where is the difference between doing a 2.6.x.y or 2.7.0

Posted Mar 9, 2005 18:22 UTC (Wed) by dambacher (subscriber, #1710) [Link] (1 responses)

Why do they add a new dot-number to fix bugs but don't want to bump the secondary number?
e.g. do a 2.6.11.2 instead of a 2.6.x for stable and a 2.7 for new features?

For me it seems to be the same stable-unstable release cycle like it was with 2.4 and 2.5 kernels.

Where is the difference between doing a 2.6.x.y or 2.7.0

Posted Mar 9, 2005 22:01 UTC (Wed) by iabervon (subscriber, #722) [Link]

The main difference is that 2.6.11.y will stop when 2.6.12 is released, whereas 2.4.x, 2.2.x, and 2.0.x continue to get updates. The reason is that 2.6.12 shouldn't be so different from 2.6.11 that it has major new problems that can't be resolved quickly in a 2.6.12.y, so people encountering a bug in 2.6.11.last can switch to 2.6.12.latest to resolve it; seemingly, people don't switch from 2.4.x to 2.6.11 as soon as they run into a bug, so using the minor number is not really a suitable extension of history to date.

Alter the pace of "mainline"

Posted Mar 9, 2005 18:49 UTC (Wed) by shredwheat (guest, #4188) [Link] (1 responses)

I wonder how this will affect development of the "mainline" 2.6.x releases if developers know there is now a safety net in 2.6.x.y. Will this support bigger destabilizing patches? I am suspecting the 2.6.x's will move to longer term releases and we'll be moving in the direction we already are.

As a casual user I've been pleased with the 2.6 releases, although I have hidden behind the shield of Debian and Ubuntu 2.6 kernels.

Alter the pace of "mainline"

Posted Mar 9, 2005 23:27 UTC (Wed) by JoeBuck (guest, #2330) [Link]

If you're running Debian or Ubuntu kernels, you are running patched versions of somewhat older kernel.org kernels.

Is the kernel development process broken?

Posted Mar 9, 2005 18:57 UTC (Wed) by pm101 (guest, #3011) [Link] (1 responses)

I'd much rather if they periodically 'froze' some 2.6 version, so we'd get 2.6.10.x, and that'd be updated until it, and a replacement version, reached stability (rather than when the next 2.6.x comes out). Once 2.6.10.24 was considered "stable," we'd freeze another 2.6 (probably around 2.6.22 or something), and we'd maintain both 2.6.10.x and 2.6.22.x. Once 2.6.22.x reached stability, we'd discontinue 2.6.10.x, and move on to maintaining 2.6.22.x and 2.6.48.x. This way, at every point in time, we'd have at least one known-stable kernel. I'd also do a minimum of 6 months of maintenance on a kernel before it was considered known-stable.

I'm interpolating from between this article and a previous one on kerneltrap, so I might be out-of-date on what they're doing.

Frozen kernels

Posted Mar 9, 2005 20:09 UTC (Wed) by eru (subscriber, #2753) [Link]

I'd much rather if they periodically 'froze' some 2.6 version, so we'd get 2.6.10.x, and that'd be updated until it, and a replacement version, reached stability

That is just what major distribution makers do. They pick some kernel as baseline (presumably after evaluating it for stability) and keep patching it until the next revision of the distro is released, and even after that in distributions (like RHEL) that have a long supported lifetime.

Is the kernel development process broken?

Posted Mar 9, 2005 20:24 UTC (Wed) by dang (guest, #310) [Link]

"There was one suggestion that, to qualify, patches must also be accepted into the mainline kernel"

Roughly that is going to happen. Alan Cox suggested:

"It must be accepted to mainline, or the accepted mainline patch be
deemed too complex or risky to backport and thus a simple obvious
alternative fix applied to stable ONLY."

Andi Kleen replied:

"That is what I wrote later in my mail anyways (did you really read it completely?:) See also the followup discussion with Russel and Arjan.
In general stable specific fixes should be the exception, not the rule though."

And folks seemed generally happy.

Is the kernel development process broken?

Posted Mar 9, 2005 20:59 UTC (Wed) by mrshiny (guest, #4266) [Link] (7 responses)

I've made these comments before in other discussions, but what this amounts to is that some of the kernel developers seem unwilling to develop the kernel the way most people develop software: work on something until it's "finished", then fix it up until it's "stable". With the 2.6 kernel nothing is ever finished and lots of things are never stable. Maybe the 2.6 kernels are more stable than any 2.5 or 2.3 kernels ever were, but even the early days of 2.4 seem better than this.

I'd like to see someone step forward to maintain 2.6, and let the rest of the developers go off with 2.7. I don't understand why this hasn't happened really... 2.6 is really just like what a 2.7 might be. If people are happy with the current 2.6, they'll be just as happy with the 2.7 kernel. Those of us who want stuff to work can stick to 2.6.

Is the kernel development process broken?

Posted Mar 9, 2005 22:28 UTC (Wed) by iabervon (subscriber, #722) [Link]

The problem with that idea is that the "fix it up until it's stable" step actually properly involves sitting for three months waiting for bug reports to come in and otherwise doing nothing at all. The average bugfix involves one developer writing two emails a day for a week to whoever reported the bug trying to work out what's going on, and there are maybe a dozen such bugs outstanding at a time. That's maybe 120 manhours/week out of the 16000 available. Kernel developers seem to be unwilling to spend half of their time waiting, and users are willing to wait until it's too late to report bugs anyway.

The reality is that developers will continue to work on their projects at all times, because that's what they are paid by their employers to do. The mainline has to pick up this work at some point, or it eventually becomes irrelevant. A kernel version cannot begin to stabilize until it gets a mainline release.

The changes which were needed to the process are exclusively that developer priority has to be given to fixing bugs as soon as they are reported, and those fixes have to be released in a version that doesn't have any other changes that haven't seen widespread testing in the real world. It doesn't matter to stability what the developers are doing during the 99% of the bugfixing period that doesn't actually involve any work on fixing bugs; what matters is that the results of bugfixing actually get applied to the kernel with the bugs in such a way that the process converges to a series of kernels with 0 bugs instead of to a series of kernels with a small constant number of bugs. There is a huge difference to users if each minor release creates a few new bugs and then fixes them in a few patch releases versus each minor release fixing a few bugs and creating a few new ones.

Is the kernel development process broken?

Posted Mar 9, 2005 22:39 UTC (Wed) by PaulDickson (subscriber, #478) [Link]

Which 2.4 series were you using? :-) I found the 2.6.0-test series to be far better than the early 2.4 series. The latest kernels are even better. The last two I used were 2.6.10-rc1 and 2.6.11.

As for "finished": I seriously doubt that any acceptable definition of finished would be fair to most kernel developers. Because of the sheer number of contributors, some project/feature will always be just ready to add their code to the kernel. Only small groups or funded developers can halt development and switch to bug fixing. With a bunch of volunteers, you either allow them to contribute or you lock them out.

Is the kernel development process broken?

Posted Mar 10, 2005 6:59 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

Bingo! You've got a prize. Just a clarification: "some of the kernel developers" is something like 95% of them. Most development work on the kernel is not done by "big kernel hackers" who can coordinate their work with kernel releases but by independent developers (or not-so-independent developers, if we are talking about people who are paid for kernel work). And they care about exactly one feature: their own. They want it to be included in the kernel when it's ready, and not a year later when it has become obsolete. And yes, a year or so is what's needed to "properly stabilize" a kernel.

With 2.4 we had the mainline kernel (forever obsolete, not really interesting for 90% of users) and vendor kernels (heavily patched and not as stable, but with USB support where appropriate, with timely SATA support, with XFS and so on). With 2.6 we have exactly the opposite, and this is a good thing(tm), since it's easier for everyone to have vendor kernels which are more stable and have a reduced number of features than the opposite situation: there is a very small chance of API fragmentation and such.

Looks like you missed the whole article. Development kernels do not receive testing! 99% of users will stick with 2.6 until 2.8 is out, and then we'll see a huge number of complaints: "oh, my dell keyboard does not work" and "aaa! my CD is not detected anymore". 2.6 is only stable since it's declared stable. This is not a joke: a kernel can only be made stable when it's widely tested, and it can only be widely tested when it's declared stable. Thus there is no realistic way to transform an unstable kernel into a stable one! The 2.4.x kernels were disasters for most small x, for exactly that reason. I do not see why we need a repeat performance with 2.8.x...

Is the kernel development process broken?

Posted Mar 10, 2005 15:40 UTC (Thu) by mrshiny (guest, #4266) [Link] (2 responses)

I'm not sure where you get the statistic that the 2.4 kernel was not interesting to 90% of the users. I'd be very surprised if that were true.

Furthermore, I don't believe that the 2.6 kernel makes life easier for vendors. Preventing API fragmentation is a good goal, but that can be met by having shorter development cycles. But by shorter development cycles I don't mean 2.6.x to 2.6.x+1. Each 2.6 kernel is "unstable", as in, untested. You claim I didn't read the article but you got the point backwards: all of the 2.6 kernels are untested and thus are exactly like 2.5 kernels. Except that with bitkeeper we can better manage the changes and make sure that each release is somewhat working. But nothing about the kernel dev process changes the fact that users want working kernels, not experimental kernels.

As to your comment that the 2.8.0 kernel would be unstable, because nobody tested 2.7.99, you're right. That's exactly what I'd expect, except for this: The kernel developers should declare a feature freeze, and then a code freeze, before releasing 2.7.99. 2.7.99 should contain fixes for bugs found in previous 2.7 kernels. 2.7.99 can be announced to the world, to distro-makers, to app developers, to anyone. And then, once bugs stop being found, it can be released as 2.8.0. This is the QA process that many software companies follow. Open-source companies also follow this process; this is exactly what KDE does. Will this catch all bugs? Of course not. But if there was a real effort to make a release candidate that was actually suitable for release, people might test it. Especially if it offers features that the so-called obsolete 2.6 offers.

How can we offer QA, and real release candidates, and stable kernels, while still preventing obsolescence and API fragmentation? Have shorter release cycles. The Gnome project releases every 6 months. I'm sure the kernel team could come up with something reasonable along these lines.

Is the kernel development process broken?

Posted Mar 10, 2005 20:47 UTC (Thu) by oak (guest, #2786) [Link] (1 responses)

Kernel is one *huge* project. Gnome is a lot of "trivial" to medium sized
projects, where each one sits on top of a stable API (Gtk, API-stable
since 2.0, after that only new APIs have been added, no old ones have been
changed).

Just calculate the lines of each gnome project together (ignoring
configure files) and compare that to the kernel code. Then look at how
many lines of that code changes in the 6 month period in both projects...

Is the kernel development process broken?

Posted Mar 10, 2005 21:15 UTC (Thu) by mrshiny (guest, #4266) [Link]

So what? That gives the kernel developers an excuse to release broken software? The technical challenges behind Gnome or KDE may be much smaller than the kernel, but then, I expect that the calibre of developer on the kernel is much higher. And I expect that their level of quality control should be higher, since their product is actually more important.

As for the problem of stable APIs, I'm a believer in stable APIs. The kernel developers like to complain that stable APIs make their jobs harder, but I think the reality is that they make certain jobs harder and other jobs easier. For example, it's easier to make any change you want if the API is unstable, because you don't have to worry about compatibility. On the other hand, it's hard to work on drivers that need to work on multiple versions of the kernel, since there's an unlimited number of API revisions. It's harder to make a new module of any kind that needs to run on multiple kernel versions.

Anyway, the hardware API that the kernel runs on changes very slowly. The kernel has hardly any dependencies, except on itself. So any problems that are due to API flux are created by the kernel developers themselves. However, they've claimed that flexible API is how they want to work, so, fine, that's how they work.

I think the kernel process could be improved by separating the drivers into a separate project. That way we could have new drivers for old kernels, instead of having to install new kernels just to get new drivers. If we just accomplished that separation, then I bet most users would be totally happy since they could pick whatever 2.6.x works for them, and use the 2.6-series drivers to their heart's content. Most people don't need new kernels, they only need new drivers. But I won't elaborate on this since Greg KH has already flamed me about this very topic. Let's just say that I disagree with the kernel developers about their development practices.

Is the kernel development process broken?

Posted Mar 11, 2005 14:19 UTC (Fri) by jschrod (subscriber, #1646) [Link]

Why are the 2.4 kernels not of interest to users? Like, working APM instead of broken ACPI brain damage?

I would use it happily weren't it for the stopped support of SUSE 8.1 and my thus-forced upgrade.

Joachim

Is the kernel development process broken?

Posted Mar 10, 2005 8:09 UTC (Thu) by beejaybee (guest, #1581) [Link] (2 responses)

I think so. I also think this sort of thing is going to be used by proprietary software vendors as a stick to beat sysadmins into rejecting linux (and, by implication, _all_ FLOSS). This is actually more dangerous than legal arguments over copyright issues, about which most people have at least a bit of common sense.

Those who shoot themselves in the foot tend not to get too much sympathy.

I regard myself as fairly tech-sav yet I've not yet heard any real reason to switch from the 2.4 kernel to the 2.6 - apart from support for hardware features I apparently don't possess. OTOH upgrading the kernel is a _real_ bind - just about the only thing that requires a linux system to be rebooted, so you should see why I'm not keen to track minor version numbers.

As for odd minor minor version numbers, what the hell's the point? The same thing will happen all over again - someone will start pushing development (inadequately tested) code into the "stable" stream.

Sorry guys but if we can't get volunteers to test development kernels properly before inflicting them on the mainstream, I'm afraid we'll just have to stagnate. And maybe die.

Is the kernel development process broken?

Posted Mar 10, 2005 9:28 UTC (Thu) by Duncan (guest, #6647) [Link] (1 responses)

Well, I think not. <g> For the most part, people who want ultra-stable
kernels to run year-plus uptimes on are running distribution kernels,
which are literally what you are asking for, six-month "out-dated" kernels
that they picked and stabilized for six months, backporting a few drivers,
features, and security fixes, as needed. That's actually what the current
kernel development process was designed to encourage -- faster development
in the mainline kernel so less stuff had to be backported from the
development kernel to something /years/ outdated, so distribution kernels
stuck closer to mainline, and only had to freeze and stabilize it. Those
wanting ultra-stable kernels aren't /supposed/ to be running the
kernel.org kernel, with the current 2.6 arrangement.

You don't see any reason to switch from 2.4? Why aren't you still on 2.2
then, or even 2.0? Talk about wanting and getting outdated stability.
No, I'm not joking. It's a serious question. Actually, that has often
been pointed out as one of the benefits of the open source development
method. There's nothing forcing you to run on that upgrade treadmill
unless you /want/ to run on it. You can continue to backport security
fixes and what-not as necessary.

Just as 2.4 had certain improvements over 2.2, so 2.6 has improvements
over 2.4. The improved scheduling is one I like. Another is the
data=ordered and data=journal options for reiserfs, which happens to be my
chosen file system, altho it's possible that has been backported to 2.4 by
now (I know it was in the 2.4 SuSE kernels for some time b4 it was added
to 2.6 but don't know if it ever hit 2.4 mainline), I haven't kept track.
Another is the increased flexibility of UDEV as compared to devfs or a
static /dev. Still another is the IDE rewrite, including the simpler and
less confusing (from a user perspective) ATAPI burner drivers (yes I know
that ATAPI is a reduced functionality SCSI, but it still doesn't make
sense to an ordinary user, even or especially a semi-technical one that
actually cares about such things, why SCSI drivers were required for an
IDE/ATAPI CDR/RW). Yet another is the /far/ simpler mouse driver, with the
ability to set up a /dev/input/mice that looks the same from user mode no
matter /what/ sort of pointing device (or how many) are in use. One more
is the built-in ALSA sound drivers. Any one of these is IMO worth
switching to 2.6 for. However, there's nothing forcing you to upgrade if
you don't find those or other features useful, and you are comfortable
with 2.4. Again, that's considered one of the benefits of open source --
no forced upgrade treadmill. Comfortable where you are? You are welcome
to stay there, and don't have to worry about your license expiring, or the
only people that know the code deciding it's not worth supporting any
more.

As for the nano-version numbers, no, the same thing is /not/ going to
happen all over again. It's simply the micro-release with a few very
limited fixes. The whole point is to only have fixes to anything found by
the wider testing of full release, no "development" at all in the
nano-versions.

Finally, keep in mind that despite the popularity of open source and Linux
in particular, development is still done because the developers find it
fun and challenging to do. Linux existed long before it became popular,
long before people ran or even could run "mission critical" systems on it,
and it will continue to develop and change regardless of whether people
continue to do so, as long as there remains at least one person with the
ability and interest to continue that work. You may wish it to stagnate
because you don't like the methods being used, and as far as you are
concerned, if you /want/ to continue to use an old stagnant version, you
are certainly free to do so, because that's one of the things open source
lets you do. However, the rest of the Linux world will continue to change
and evolve as it's going to, in any case. Again, that's the way open
source works. Since the developers are in a very real way volunteering
(paid or unpaid) to continue development in whatever way they find
rewarding (fun, interest, or money-wise), they will continue development
as they find best suits their wishes. Others who disagree are of course
free to stay where they are, or fork the code and develop it in a
different direction, just as recently happened with xfree/xorg. Success
isn't defined as continuing to gain market share, tho that happens to be a
nice side benefit when it occurs, but as continuing development in a way
that continues to be rewarding to those participating. Linus and the
others find the current system better matches their style and is therefore
more efficiently rewarding with the less hassle, than the old odd/even
minor version thing, and they are the ones driving the development, so
that's the way it's going to be, unless someone else decides they want to
try it a different way, and can get enough to agree with them to make the
fork.

My thoughts, no personal attacks intended, just that I found your post an
appropriate place to tack on mine, taking as it does the opposite
viewpoint.

Duncan

Why are people surprised?

Posted Mar 10, 2005 14:31 UTC (Thu) by amarjan (guest, #25108) [Link]

Those wanting ultra-stable kernels aren't /supposed/ to be running the kernel.org kernel, with the current 2.6 arrangement.

Thanks for posting that, it confirmed my understanding from previous articles. Which makes me wonder why people are surprised that the kernel.org kernel isn't getting as much testing as it used to: many potential casual testers were specifically warned off from testing it, and told to use distribution kernels instead.

Also, the churn is too much for would-be casual testers to track. That's certainly the case with me. I used to run kernel.org kernels, up to about 2.6.5, but the rate of change got to be too much (and sometimes the changes were a bit scary), and it was simply far easier for me to install newer kernel versions from my distribution.

Or am I missing something?

Is the kernel development process broken?

Posted Mar 10, 2005 16:12 UTC (Thu) by bluefoxicy (guest, #25366) [Link] (1 responses)

I've posted many, many times on the LKML and on IRC channels and KernelTrap about moving to a new development model (blog has rough, light touches only; find my posts on LKML). Apparently my ideas aren't good enough even to look at for improvement; hardly anyone ever comments.

I like the 2.6 model because it produces continuously near-stable releases. NEAR stable, not stable. It produces something you could release reasonably, as we've seen; but as we all know the first "stable" release of any software needs some bug fixes. The 2.6 model gives you that perpetual "fresh release" feeling, complete with perpetual bugs and security concerns.

What I repeatedly propose is that the 2.6 model become the "Volatile" model for odd-numbered kernels (2.7, 2.9, 3.1). Because it can be considered a fresh release whenever, it can be released at any point. A 6 month release cycle and 18 month support cycle would produce kernels stabilized over 3 releases. The "Support Cycle" would have to be strict bug fix only to improve stability and security; new code should not be introduced.

Use of the "Volatile" branch would still be encouraged because it is apparently good enough right now anyway; however, cautious users such as businesses with critical servers who need a real guarantee could use a stable branch.

Drivers should not be merged back because they introduce new code and add too much maintainer overhead. I haven't mentioned this before, but putting the UDI in the kernel would allow core and driver development to be separated on many fronts, which would possibly facilitate a new development model much less taxing on developers and on kernel.org mirror space.

But whatever, you know? Nobody's gonna listen to me ever, so these ideas will probably never make it to discussion for improvement and assessment of whether they're actually viable; or if they do they'll be credited to Linus or something.

Is the kernel development process broken?

Posted Mar 10, 2005 22:23 UTC (Thu) by giraffedata (guest, #1954) [Link]

Maybe I can give a clue as to why there's so little discussion of your idea: We can't figure out what it is.

I read over your description a few times, and was not able to figure out what sort of process you have in mind and how it would differ from what's in place now and from alternatives that have been proposed.

I have trouble following the syntax. If there were more words and consistent, defined terms, or I were a lot smarter, maybe I could see what you're proposing.

Automated testing?

Posted Mar 10, 2005 16:15 UTC (Thu) by cthulhu (guest, #4776) [Link] (4 responses)

Hi all,

I'm a perfect candidate to be a kernel tester, but I've never "tested," per se. Lately, I've tended to resist upgrades because I finally got all the things I need working, and due to lack of time. I do want to keep going forward, and I am currently engaged in a big (for me) effort to upgrade my several home machines to 2.6.x.

My method of testing is to configure/compile/install a new kernel, then see if my favorite things still work: X, audio, CD/DVD burner, USB devices. If they do, then I keep running said kernel until something crops up. At that point, I might make the connection with the kernel upgrade, but mostly only if it's a show-stopper.

The reason I'm commenting is that I'm wondering whether there's some project that collects a bunch of test scripts for most of the standard features, like the things I mentioned above, plus perhaps file system testing to make sure there's no corruption.

Perhaps there is and it's commonly used - either way, it would probably be good to mention it here in case there are others like me who would get more active in testing if there was an easy way to do a bunch of it quickly...

Automated testing?

Posted Mar 10, 2005 17:48 UTC (Thu) by iabervon (subscriber, #722) [Link]

The hard thing is that there's a ton of possible hardware of the sort you mention, not all of it you want to use in testing (burn another set of test DVDs each time to test the burner?), and lots of it requires user interaction to verify the results. People had problems with 2.6.11 where audio worked fine, except that a changed interpretation of their saved mixer settings meant that no sound actually came out of their laptop docking station's speakers. From the point of view of the user, that's a bug, but you need a sense of hearing, a particular configuration, and a particular idea of what that configuration is supposed to mean in order to notice it. Many of the audio-related bugs these days seem to cause the sound to be distorted in some way, rather than giving wrong results that are accessible to the computer.

That said, it would probably be useful to have a program which would notice when you changed kernels, and would notice when you tried a device you hadn't tried with the current kernel, and inform you that this is the first time with that kernel and ask whether it worked. Then you could compare the hardware that you'd tested with the new kernel against the hardware you'd used previously, and consider testing your USB PDA cradle in advance of the PDA's batteries being low.
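A minimal sketch of the bookkeeping such a program would need, in Python. Everything here is hypothetical: the `DeviceTestLog` class, the device names, and the idea of identifying a kernel by its release string are assumptions made for illustration, and a real tool would also have to detect device use and ask the user whether the device actually worked.

```python
import platform


class DeviceTestLog:
    """Remembers which devices have been tried under each kernel release."""

    def __init__(self):
        # Maps a kernel release string to the set of device names tried on it.
        self.tried = {}

    def first_use(self, device, kernel=None):
        """Record a device use; return True if it is new for this kernel."""
        # Default to the running kernel's release string.
        kernel = kernel or platform.release()
        seen = self.tried.setdefault(kernel, set())
        if device in seen:
            return False
        seen.add(device)
        return True

    def untested(self, old_kernel, new_kernel):
        """Devices used under old_kernel but not yet tried under new_kernel."""
        return self.tried.get(old_kernel, set()) - self.tried.get(new_kernel, set())


log = DeviceTestLog()
for device in ("audio", "dvd-burner", "usb-pda-cradle"):
    log.first_use(device, kernel="2.6.10")
log.first_use("audio", kernel="2.6.11")

# Hardware that worked on the old kernel and still needs a try on the new one.
print(sorted(log.untested("2.6.10", "2.6.11")))
```

A `True` result from `first_use` is the point at which the program would ask "did it work?", and `untested` gives the list of hardware worth checking before it is urgently needed.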

Automated testing?

Posted Mar 10, 2005 18:26 UTC (Thu) by cliffman (guest, #13144) [Link]

I'm always interested in getting more test scripts, and more test examples/tools. OSDL does a bunch of automated testing, and we need all the help we can get. If you have any scripts, I'll give them a home and a web page.
Send to cliffw@osdl.org

Automated testing?

Posted Mar 11, 2005 5:28 UTC (Fri) by dlang (guest, #313) [Link]

automated testing is great for making sure that a particular feature works (usually one that you broke in the past, causing you to write the test), but it's not very good for testing various hardware combinations (you have to HAVE that hardware combination, and you only have so much room for systems) or finding _new_ problems (because you haven't written a test for it yet)

the kernel does go through automated testing, and it can be improved (see the request for help above), but to think that automated testing can eliminate all problems is not understanding the problem.

Coming to a distro near you...

Posted Mar 13, 2005 20:24 UTC (Sun) by Tobu (subscriber, #24111) [Link]

Ubuntu is experimentally doing that sort of shallow, wide ranging testing:
I now have a small app that asks a few questions about how well the automatic config went, did the sound work out of the box, etc... and then sends that data plus a bunch of logs and hald reports upstream.

If ever a user is left in the cold with a kernel upgrade, a bit of datamining will make sure the problem is identified and fixed, without actually having to read all the logs.

Keep it stable (except for the things I want to change)

Posted Mar 11, 2005 5:34 UTC (Fri) by dlang (guest, #313) [Link]

this seems to be what people are asking for.

they want the kernel kept stable, but they still want the latest drivers for all their goodies to be in it

what people are refusing to recognise is that most of the problems are caused by driver updates, not by the big new features (yes the big new features do cause problems once in a while, but the Dell keyboard error was a driver update)

2.6 is doing a phenomenal job of integrating the big new features with very little problem, the problems are more with missing the fact that this driver update (which affects 500 pieces of hardware) breaks one hardware combination (even if it fixes two other combinations)

replace hardware combination with workload and you cover just about all of the rest of the problems (but while not all workloads are tested many are)

get any group of 1000 kernel users together and ask which parts should be updated so that they aren't left out in the cold and you will find that you probably have allowed so much of the kernel to be updated that you may as well allow updates to everything

Strange indeed

Posted Mar 13, 2005 1:25 UTC (Sun) by stock (guest, #5849) [Link] (1 responses)

I admit I still run my AMD64 machine using kernel 2.6.7. Shortly after
that indeed strange things started to happen with the release of kernel
2.6.8, followed up with kernel 2.6.8.1, which is even more strange in
itself.

The problems with 2.6.8 and 2.6.8.1 actually were with the burning of
CDR's and DVD recordables, which all was working superb using 2.6.7.

I think Linus should step in here, as indeed the commercial focus of
corporations has reached to the doors of kernel.org. I would say that
commercial vendors like RedHat, SuSE, Mandrake and others should back
off and leave kernel.org untouched.

What i mean by that is, that the old principle of running a stable kernel
by downloading a kernel source tree from a stable branch from kernel.org
should be restored, and at the same time stop pushing users to the
COMMERCIAL Linux distro builders, for running a stable kernel. This is
evil in itself. There should be a range of v2.6 kernels on kernel.org
which one can rely on, and build a stable kernel from.

So here my wake-up call to Linus Torvalds, to step in and resume the
development process as we know it, and expect it to be.

Robert

Strange indeed

Posted Mar 20, 2005 18:12 UTC (Sun) by wolfrider (guest, #3105) [Link]

[[ What i mean by that is, that the old principle of running a stable kernel
by downloading a kernel source tree from a stable branch from kernel.org
should be restored, and at the same time stop pushing users to the
COMMERCIAL Linux distro builders, for running a stable kernel. This is
evil in itself. There should be a range of v2.6 kernels on kernel.org
which one can rely on, and build a stable kernel from.

So here my wake-up call to Linus Torvalds, to step in and resume the
development process as we know it, and expect it to be. ]]

--I'm with you 100%. The kernel devels should realize that making things simple for the sysadmins is important. Putting New and Funky Shiznit into the kernel is all FWAG (fine well and good) -- but NOT if it makes life harder for the people who are trying to keep up with a stable kernel on their system!!

Is the kernel development process broken?

Posted Mar 18, 2005 17:39 UTC (Fri) by Nicolas (guest, #28602) [Link]

Free software is based on the gift culture, no? Everyone is valued by the
quantity and quality of the gifts s-he makes to the community.
One must be given credit for what one does.

Let us give credit to the testers as much as we give to the developers

How do we do this?
I am not sure, but let me propose the following wild idea:

Create a tool that inventories all the hardware and software installed
on a given computer (is that possible? I guess it is but can't tell for
sure).

Then having this tool report on a centralized site: "Such-one used/tested
this-version-of-the-kernel for this-amount-of-time and reported X bugs.
Said bugs being: ...."
(Note: Would work for any application, not solely for the kernel)

Set an appropriate searching mechanism on the centralized site (bugoogle,
bugle, whatever)

I suppose one could then look in the site for a configuration similar to
his-her own and see how well the given kernel behaves.

Anybody could become a tester, this could be something like the voting
scheme for most used applications in Debian (what's the name of this
application again?)

Could be used also to automatically create a top-10-testers list (give
people credit for their work)

Could also work like the film critics. How do you know if you might like a
movie? You find a critic who is almost always right (for you) and elect
him-her as your favorite. Everyone could elect his very own private tester
that has almost the same configuration and does actually a real good job
at testing the kernel (or whatever other piece of software).
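
A rough sketch, in Python, of how the "find a tester with almost the same configuration" lookup might work. This is only an illustration under stated assumptions: the report format, the tester names, the component names and the choice of Jaccard similarity (shared components divided by all components) are all made up here, not part of any existing site or tool.

```python
def similarity(config_a, config_b):
    """Jaccard similarity: shared components divided by all components."""
    union = config_a | config_b
    if not union:
        return 0.0
    return len(config_a & config_b) / len(union)


def closest_reports(my_config, reports, kernel):
    """Reports for one kernel version, most similar configuration first."""
    matching = [r for r in reports if r["kernel"] == kernel]
    return sorted(
        matching,
        key=lambda r: similarity(my_config, r["config"]),
        reverse=True,
    )


# Hypothetical reports as the centralized site might store them.
reports = [
    {"tester": "alice", "kernel": "2.6.11",
     "config": {"amd64", "usb-keyboard", "dvd-burner"}, "bugs": 0},
    {"tester": "bob", "kernel": "2.6.11",
     "config": {"i386", "ps2-keyboard"}, "bugs": 2},
    {"tester": "carol", "kernel": "2.6.10",
     "config": {"amd64", "dvd-burner"}, "bugs": 1},
]

mine = {"amd64", "dvd-burner", "usb-keyboard", "audio"}
best = closest_reports(mine, reports, "2.6.11")[0]
print(best["tester"], best["bugs"])
```

The same ranking could drive the "favorite tester" idea: whoever keeps turning up at the top of the list for your configuration is the one whose reports are most worth reading.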

just my 2 centavos

nicolas