Long-term support and backport risk
So it is interesting that, at the recently-concluded Linux Foundation Collaboration Summit, numerous people were heard expressing concerns about this model. Grumbles were voiced in the official panels and over beer in the evening; they came from representatives of the relevant vendors, their customers, and from not-so-innocent bystanders. The "freeze and support" model has its merits, but there appears to be a growing group of people who are wondering if it is the best way to support a fast-moving system like Linux.
The problem is that there is a great deal of tension between the "completely stable" ideal and the desire for new features and hardware support. That tension leads to the distribution of some interesting kernels. Consider, for example, Red Hat Enterprise Linux 4, which was released in February 2005 with a stabilized 2.6.9 kernel. RHEL4 systems are still running a 2.6.9 kernel, but it has seen a few changes:
- Update 1 added a disk-based crash dump facility (requiring driver-level support), a completely new Megaraid driver, a number of block I/O subsystem and driver changes to support filesystems larger than 2TB, and new versions of a dozen or so device drivers.
- Update 2 threw in SystemTap, an updated ext3 filesystem, the in-kernel key management subsystem, a new OpenIPMI module, a new audit subsystem, and about a dozen updated device drivers.
- For update 3, Red Hat added the InfiniBand subsystem, access control list support, the error detection and correction (EDAC) subsystem, and plenty of updated drivers.
- Update 4 added WiFi protected access (WPA) capability, ACL support in NFS, support for a number of processor models and low-level chipsets, and a large number of new and updated drivers.
The end result is that, while running uname -r on a RHEL4 system will yield "2.6.9", what Red Hat is shipping is a far cry from the original 2.6.9 kernel, and, more to the point, it is far removed from the kernel shipped with RHEL4 when it first became available. This enterprise kernel is not quite as stable as one might have thought.
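The mismatch between the reported base version and the heavily-patched reality can be sketched with a trivial bit of shell. The release string below is invented for illustration, but real RHEL4 kernels follow the same pattern: the frozen upstream base version up front, with the vendor revision suffix recording years of accumulated backports.

```shell
# Invented RHEL4-style kernel release string (not a real build)
release="2.6.9-42.0.3.EL"

# The part "uname -r" leads with: the frozen upstream base version
base="${release%%-*}"

# The vendor revision suffix, where the real churn is recorded
revision="${release#*-}"

echo "base: $base"            # the unchanging "2.6.9" face
echo "vendor revision: $revision"  # the part that keeps growing
```

A system reporting "2.6.9" may thus be running code that has diverged substantially from any kernel.org release of that name.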
Greg Kroah-Hartman recently posted an article on this topic which makes it clear that Red Hat is not alone in backporting features into its stable kernels:
Similar things have been known to happen in the embedded world. In every case, the distributors are responding to two conflicting wishes expressed by their customers: those customers want stability, but they also want useful new features and support for new hardware. This conflict forces distributors to walk a fine line, carefully backporting just enough new stuff to keep their customers happy without breaking things.
The word from the summit is that this balancing act does not always work. There were stories of production systems falling over after updates were applied - to the point that some high-end users are starting to reconsider their use of Linux in some situations. It is hard to see how this problem can be fixed: the backporting of code is an inherently risky operation. No matter how well the backported code has been tested, it has not been tested in the older environment into which it has been transplanted. This code may depend on other, seemingly unrelated fixes which were merged at other times; all of those fixes must be picked up to do the backport properly. It is also not the same code which is found in current kernels; distributor-private changes will have to be made to get the backported code to work with the older kernel. Backporting code can only serve to destabilize it, often in obscure ways which do not come to light until some important customer attempts to put it into production.
All of this argues against the backporting of code into the stabilized kernels used in long-term-support distributions. But customer demand for features, and (especially) hardware support will not go away. In fact, it is likely to get worse. Quoting Greg again:
So, if one goes on the assumption that the Plan For World Domination includes moving Linux out of the server room onto a wider variety of systems, the pressure for additional hardware support in "stabilized" kernels can only grow.
What is to be done? Greg offers three approaches, the first two of which are business as usual and the elimination of backports. The disadvantages of the first option should be clear by now; going to a "bug fixes only" mode has its appeal, but the resulting kernels would look old and obsolete in a very short time. Greg's third option is one which your editor heard advocated by several people at the Collaboration Summit: the long-term-support distributions would simply move to a current kernel every time they do a major update.
Such a change would have obvious advantages: all of the new features and new drivers would come automatically, with no need for backporting. Distributors could focus more on stabilizing the mainline, knowing that those fixes would get to their customers quickly. Many more bug fixes would get into kernel updates in general; no distributor can possibly hope to backport even a significant percentage of the fixes which get into the mainline. The attempt to graft onto Linux an old support model better suited to proprietary systems would end, and long-term support customers would get something that looks more like Linux.
Of course, there may be some disadvantages as well. Dave Jones has expressed some discomfort with this idea:
As Dave also notes, some mainline kernel releases are better than others; the current 2.6.21 kernel would probably not be welcomed in many stable environments. So any plan which involved upgrading to current kernels would have to give some thought to the problem of ensuring that those kernels are suitably stable.
Some of the key ideas to achieve that goal may already be in place. There was talk at the summit of getting the long-term support vendors to coordinate their release schedules to be able to take advantage of an occasional extra-stable kernel release cycle. It has often been suggested that the kernel could go to an even/odd cycle model, where even-numbered releases are done with stability as the primary goal. Such a cycle could work well for distributors; an odd release could be used in beta distribution releases, with the idea of fixing the resulting bugs for the following even release. The final distribution release (or update) would then use the resulting stable kernel. There is opposition to the even/odd idea, but that could change if the benefits become clear enough.
Both Greg and Dave consider the effects such a change would have on the providers of binary-only modules. Greg thinks that staying closer to the upstream would make life easier by reducing the number of kernel variants that these vendors have to support. Dave, instead, thinks that binary-only modules would break more often, and "this kind of breakage in an update isn't acceptable for the people paying for those expensive support contracts". If the latter position proves true, it can be seen as an illustration of the costs imposed on the process by proprietary modules.
Dave concludes with the thought that the status quo will not change anytime soon. Certainly distribution vendors would have to spend a lot of time thinking and talking with their customers before making such a fundamental change in how their products are maintained. But the pressures for change would appear to be strong, and customers may well conclude that they would be better off staying closer to the mainline. Linux and free software have forced many fundamental changes in how the industry operates; we may yet have a better solution to the long-term support problem as well.