Kernel development [LWN.net]

Kernel release status

The current 2.6 prepatch is 2.6.22-rc5, released by Linus on June 16. It contains a long list of fixes - enough that Linus complains a bit about the amount of stuff which is still going in this late in the cycle. See the long-format changelog for the details.

A very small number of patches have gone into the mainline git repository since -rc5 was released.

There have been no -mm releases over the last week, and no releases of older kernel trees. Evidently everybody has been too busy "discussing" GPLv3.

Quotes of the week

So, I've had enough. I'm out of here forever. I want to leave before I get so disgruntled that I end up using windows. I may play occasionally with userspace code but for me the kernel is a black hole that I don't want to enter the event horizon of again.

-- Con Kolivas

The moral of the story is that currently it just doesn't pay off to do code reviews. From personal POV it pays much more to wait until buggy patch hits the mainline and then fix the issues yourself (at least you will get some credit). To change this we should put more emphasis on the importance of code reviews by "rewarding" people investing their time into reviews and "rewarding" developers/maintainers taking reviews seriously.

-- Bartlomiej Zolnierkiewicz

More quotes of the week - scenes from a flame war

As it turns out, there is very little from the recent, 1000-message GPLv3 flame war that justified the expenditure of so many bits. For those who haven't gotten around to reading the whole thing, here are a few selections.

I think that the Open Source community (and the FSF too) is much better off *not* concentrating so much on "legal rules" of what can and cannot be done, and instead spend much more effort on showing people why the whole "Open Source" thing actually works. And in fact, I think that's _exactly_ what Linux has been doing for the last decade!

-- Linus Torvalds

But if by the question you mean "would you think the GPLv3 is fine without the new language in section 6 about the 'consumer devices'", then the answer is that yes, I think that the current GPLv3 draft looks fine apart from that.

-- Linus Torvalds

I don't see how you can claim that the vendor is infringing on your freedom, _you_ made the decision to go out and buy the product knowing that the vendor wasn't going to go out of their way to help you hack the device. In many cases the vendor doesn't even have the option (802.11b channels and certification come to mind, GSM, etc.) of opening things up to the end user, and making changes to the license isn't going to magically change any of this.

-- Paul Mundt

I see a lot more prohibitions than freedoms in what TiVo does. I don't understand why you'd stand up for it. Is it more important that a single company be allowed to impose prohibitions on others in order for its business model to work, than to maintain the spirit of hacking and sharing that enabled Free Software and Linux to flourish? Do you expect Linux would have flourished if computers had locks that stopped people from modifying Linux in them?

-- Alexandre Oliva

So instead of thinking of Tivo as something "evil", I think of Tivo as the working bee who will never pass on its genes, but it actually ended up helping the people who *do* pass on their genes: the kernel (to a small degree - not so much because of the patches themselves, as the *mindshare* in the PVR space) and projects like MythTV (again, not so much because of any patches, but because it helped grow peoples understanding of the problem space!).

-- Linus Torvalds

I believe RMS should accept the fact that most of that code was written without people having bought into his ideology, and he should accept _responsibility_ for the power he has acquired by genius or by accident (your choice) and he should try to _understand_ how those people tick - instead of trying to further his own personal agenda.

-- Ingo Molnar

I beg to differ. By adopting _his_ license you adopted his view. If you don't like that then choose a different license (which obviously you are free to do).

-- Michael Gerdau

The GPLv2 does not state that you have to become a slave of rms and follow him in all things, and agree with him. Really. You must have read some other (perhaps unreleased early draft?) version.

-- Linus Torvalds

What the fsck it is, linux-kernel or bleeding Council of Nikea?

-- Al Viro

btrfs and NILFS

Almost exactly one year ago, as the developers were discussing changes to the venerable ext3 filesystem, Andrew Morton was heard to say:

All that being said, Linux's filesystems are looking increasingly crufty and we are getting to the time where we would benefit from a greenfield start-a-new-one. That new one might even be based on reiser4 - has anyone looked? It's been sitting around for a couple of years.

Reiser4 looks like it may continue to sit around for a while yet. But that does not mean that there is no interest in the creation of interesting new filesystems. LogFS was discussed here in May, but it's not the only newcomer in the filesystem arena.

The most interesting new contender, perhaps, is btrfs, which was announced by Chris Mason on June 12. It is an entirely new filesystem intended for standard rotating storage with a number of interesting features. These include:

Btrfs is a fully extent-based filesystem, meaning that it can store large files far more efficiently than ext3 (the in-development ext4 filesystem has extent support). An extent-based filesystem does away with the long lists of pointers to the individual blocks contained within a file; instead, groups of contiguous blocks ("extents") are tracked together. The result is far less metadata overhead, especially with large files. For very small files, btrfs will store the file contents themselves within the extent structure, eliminating the need for a separate block allocation.
Filesystems can be split into "subvolumes," each of which has its own directory structure and disk quota. Subvolumes can be used to subdivide a btrfs filesystem, but there is another interesting use of them...
Btrfs can do snapshotting - freezing the state of the filesystem at any given time. Snapshots are just subvolumes; each becomes a separate directory tree which can be navigated independently from the "live" filesystem. Interestingly, though, btrfs snapshots are also live: they can be modified after being taken, and they can themselves be snapshotted.
Supporting subvolumes and snapshots forces a copy-on-write structure onto btrfs. If a given extent is written to, it will be copied and the new data written to the copy. Extents have reference counts; creating a snapshot, for example, will cause reference counts to be incremented. When an extent contained in both a snapshot and the "real" filesystem is modified, it will be copied for whichever subvolume is being changed but will remain in place, unchanged, in the other. If the snapshot is eventually removed, all associated reference counts will be decremented and any unused extents will be reclaimed.
The subvolume and snapshot mechanism eliminates the need for a separate journaling feature. Changes to the filesystem can be made transactional simply by taking a snapshot which only lasts until the transaction completes.
This filesystem checksums everything - data and metadata both. As a result, it is able to detect many types of filesystem corruption on the fly.
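
The reference-counting behavior described in the list above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not btrfs code: the names (`struct extent`, `struct subvol`, `subvol_snapshot()` and so on), the fixed sizes, and the in-memory layout are all hypothetical. As a simplification, it copies an extent only when another subvolume still shares it, whereas the description above has every write going to a copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXTENT_SIZE 16          /* bytes per extent; tiny, for illustration */
#define MAX_EXTENTS 4           /* extents per subvolume in this toy model  */

/* A reference-counted run of contiguous data (hypothetical layout). */
struct extent {
    int refs;
    char data[EXTENT_SIZE];
};

/* A subvolume is modeled as a table of pointers to shared extents. */
struct subvol {
    struct extent *ext[MAX_EXTENTS];
};

/* Allocate a zero-filled extent holding one reference. */
static struct extent *extent_new(void)
{
    struct extent *e = calloc(1, sizeof(*e));

    assert(e != NULL);
    e->refs = 1;
    return e;
}

/* Drop one reference; reclaim the extent once nobody uses it. */
static void extent_put(struct extent *e)
{
    if (e && --e->refs == 0)
        free(e);
}

/* Snapshot: copy the pointer table and bump every reference count. */
static void subvol_snapshot(const struct subvol *src, struct subvol *snap)
{
    for (int i = 0; i < MAX_EXTENTS; i++) {
        snap->ext[i] = src->ext[i];
        if (snap->ext[i])
            snap->ext[i]->refs++;
    }
}

/* Write: copy the extent first if another subvolume still shares it. */
static void subvol_write(struct subvol *sv, int idx, const char *buf, size_t len)
{
    struct extent *e;

    assert(idx >= 0 && idx < MAX_EXTENTS && len <= EXTENT_SIZE);
    e = sv->ext[idx];
    if (e == NULL) {
        e = extent_new();
    } else if (e->refs > 1) {
        struct extent *copy = extent_new();

        memcpy(copy->data, e->data, EXTENT_SIZE);
        e->refs--;              /* this subvolume lets go of the shared extent */
        e = copy;
    }
    memcpy(e->data, buf, len);
    sv->ext[idx] = e;
}

/* Remove a subvolume: decrement all counts, freeing unused extents. */
static void subvol_destroy(struct subvol *sv)
{
    for (int i = 0; i < MAX_EXTENTS; i++) {
        extent_put(sv->ext[i]);
        sv->ext[i] = NULL;
    }
}
```

In this model, writing to the live subvolume after `subvol_snapshot()` leaves the snapshot pointing at the old extent, and calling `subvol_destroy()` on the snapshot frees only those extents that the live subvolume no longer references.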

Fast filesystem checking is also an important design goal for btrfs. The data and metadata are laid out in a way that allows the offline filesystem checker to read the disk in a nearly sequential manner. That should speed the process considerably; filesystem checking usually involves vast numbers of seek operations. Online filesystem checking is also in the plans, though it has not been implemented yet; once it is working, this feature could eliminate the need for separate, mount-time filesystem checks entirely.

This filesystem is in a very early state - not recommended for data which one might actually want to keep. Not much benchmarking has been done and, presumably, a lot of optimization work has yet to happen. For example, the entire filesystem is currently protected by a single mutex, a solution which is unlikely to perform well on those leading-edge 4096-processor systems. Little details - like not oopsing when the filesystem runs out of space, direct I/O, writing via mmap(), extended attributes, asynchronous I/O, and more - have yet to be taken care of. But btrfs has garnered a considerable amount of interest; if it lives up to its initial promise we could find ourselves using btrfs-based systems in the future.

(For more information, see the btrfs project page).

Another recently-announced filesystem is NILFS, which is now at version 2.0. NILFS is a log-structured filesystem, in that the storage medium is treated like a circular buffer and new blocks are always written to the end. These filesystems tend to do very well on benchmarks which measure write performance, since all writes go to a contiguous set of blocks; read performance is not always quite as good. Log-structured filesystems are often used for flash media since they will naturally perform wear-leveling; it would appear, however, that NILFS is not aimed at flash devices.

Instead, NILFS emphasizes snapshots. The log-structured approach is a specific form of copy-on-write behavior, so it naturally lends itself to the creation of filesystem snapshots. The NILFS developers talk about the creation of "continuous snapshots" which can be used to recover from user-initiated filesystem problems - those of the "rm -r" variety. NILFS claims scalability through 64-bit data structures, but, interestingly, support for the x86_64 architecture remains on the "TODO list." The filesystem does not yet have support for extents.
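
To see why a log-structured layout lends itself to snapshots, consider the following toy model. It is a sketch only and makes no claim about how NILFS is implemented: the structure, the function names, and the sizes are hypothetical, there is a single "file," and the garbage collection that a real log-structured filesystem needs before the log wraps around is omitted entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_BLOCKS 8    /* size of the toy "disk", in blocks            */
#define NFILEBLKS  4    /* logical blocks in the single toy file        */
#define BLKSZ      8    /* bytes per block, including the trailing NUL  */

struct logfs {
    char disk[LOG_BLOCKS][BLKSZ];
    int head;               /* next log slot to be written              */
    int map[NFILEBLKS];     /* logical block -> log slot, -1 if unset   */
};

static void logfs_init(struct logfs *fs)
{
    memset(fs, 0, sizeof(*fs));
    for (int i = 0; i < NFILEBLKS; i++)
        fs->map[i] = -1;
}

/* Every write, even an overwrite, goes to the current end of the log. */
static int logfs_write(struct logfs *fs, int lblk, const char *buf)
{
    int slot = fs->head;

    assert(lblk >= 0 && lblk < NFILEBLKS);
    snprintf(fs->disk[slot], BLKSZ, "%s", buf);
    fs->map[lblk] = slot;
    fs->head = (fs->head + 1) % LOG_BLOCKS;    /* wrap: circular buffer */
    return slot;
}

/* A snapshot is a saved copy of the block map; old data stays in the log. */
static void logfs_snapshot(const struct logfs *fs, int saved[NFILEBLKS])
{
    memcpy(saved, fs->map, sizeof(fs->map));
}

/* Read a logical block through either the live map or a saved one. */
static const char *logfs_read(const struct logfs *fs,
                              const int map[NFILEBLKS], int lblk)
{
    assert(lblk >= 0 && lblk < NFILEBLKS);
    return map[lblk] < 0 ? NULL : fs->disk[map[lblk]];
}
```

Because an overwrite never touches the old block, reading through a saved map still returns the earlier contents. That is the copy-on-write property the text refers to, and it holds in this model only until the log wraps and reuses the slot.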

More information on NILFS can be found on nilfs.org.

Getting the message from the kernel

As a general rule, Linux users would rather not hear from their kernel. If all is well, devices are working, applications are running, and the kernel just quietly makes it all happen. When things go wrong, however, it may become necessary to dig through the messages that the kernel puts out. These messages sometimes make sense to the developers who created them, but they are not always clear to the rest of the world. Neal Stephenson, in his In the Beginning was the Command Line, describes Linux kernel messages as having "the semi-inscrutable menace of graffiti tags." For a kernel developer, often as not, the main value of a kernel message is to pinpoint the location of the complaining code - from which the real problem can be determined.

Non-developers have a harder time using kernel messages in that way, though, and people who are not native English speakers are at even more of a disadvantage. So it is not surprising that the topic of fixing up kernel messages has popped up occasionally. It's back, possibly in a more serious form this time around.

People who would reform kernel messages generally have two goals in mind:

They would like for every message to have a unique identifier attached to it. This idea brings back memories of VMS or most IBM operating systems, which have used message identifiers for decades. The main purpose behind message identifiers is to allow the system administrator (or the support person they have called) to look up the identifier in a manual and figure out what the message is really saying. Various legacy operating systems have come with message manuals which take up significant amounts of shelf space; they contain a (relatively) detailed explanation of the problem and suggestions for how to make the problem go away.
It is much easier to maintain translations for messages which have unique identifiers attached to them. A Linux system which could output messages in multiple languages would be more approachable for much of the potential user base.

The problem, of course, is that attaching identifiers to messages is a significant job. There are tens of thousands of printk() calls in the kernel; each of them would need to have an identifier assigned and the code changed. New messages are added - in large numbers - with every kernel release; it's easy to imagine that the overhead of putting identifiers onto all of those messages would irritate developers in a hurry. For these reasons, Linus has, in the past, rejected schemes aimed at improving kernel messaging.

The idea has come back anyway. A new approach has been proposed by users in Japan who are having trouble supporting Linux as well as they would like. In this scheme, every kernel message would be assigned a component name and a message number. The component would be a per-file define:

    #define KMSG_COMPONENT "railgun"

Then printk calls would be modified to include the message number:

    printk(KMSG_ERR(100) "Rail gun fired accidentally - sorry\n");

The end result would be a message prepended with the string "railgun.100:", enabling the message to be translated or looked up in a manual. To help ensure that there is a manual, the proposal requires kerneldoc-style documentation of messages within the source; something like:

    /**
     * message
     * @100: 
     *
     * Description:
     * The rail gun fired accidentally in the absence of a specific 
     * user request.  
     *
     * User Response:
     * Operator should be sure to stand to the side.
     */

The kerneldoc scripts would be upgraded to collect all of these message descriptions and turn them into a printable manual. Another tool would check source files and complain about messages which lack accompanying descriptions.
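
One way the "railgun.100:" prefix could be produced is through compile-time string concatenation. The following user-space sketch is an assumption about how such a macro might look, not the definition from the proposal: `KMSG_ERR()` here is a guess at the expansion, `KERN_ERR` is a local stand-in for the kernel's log-level prefix, and `kmsg_format()` is a hypothetical replacement for printk() that writes into a buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Per-file component name, as in the proposal. */
#define KMSG_COMPONENT "railgun"

/* Local stand-in for the kernel's error log-level prefix. */
#define KERN_ERR "<3>"

/*
 * Hypothetical expansion: adjacent string literals are merged by the
 * compiler, so KMSG_ERR(100) "text" becomes "<3>railgun.100: text".
 */
#define KMSG_ERR(id) KERN_ERR KMSG_COMPONENT "." #id ": "

/* User-space substitute for printk(): format the message into buf. */
static int kmsg_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With these definitions, `kmsg_format(buf, sizeof(buf), KMSG_ERR(100) "Rail gun fired accidentally - sorry\n")` leaves the component name and message number at the front of the string, after the log level. The cost to the developer is the extra macro in each call, which is part of the overhead that the objections are about.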

Schemes like this have been greeted with complaints in the past, and the same happened this time around. The overhead of documenting messages in this way is more than many developers want to take on; David Miller expressed this feeling well:

I think my general response to something like this, if it goes in, would be to stop emitting useful kernel log messages in the code I write because having to document it too on top of that is just too much extra work to be worthwhile.

Keeping the message descriptions current would also be a challenge - code is often changed without updating the neighboring comments; there is no reason to believe that message descriptions would get a higher level of attention.

Andrew Morton has come back with a counterproposal designed for easier developer acceptance. His scheme would add a new form of printk() which would take a message ID in some as-yet-undetermined format. That ID would be output with the message, but everything else - translations, descriptions, condolences, etc. - would be kept in a database outside of the kernel.

The key point is that developers would not be expected to do much of anything with this database - or even with their kernel messages. Instead, there would be a "kernel messages team" charged with maintaining this information. Occasionally somebody from that team would look over new code, add message IDs where needed, and send a patch to the maintainer. Unless they were personally interested in helping, developers would not have to worry about the new mechanism at all.
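
Since the ID format and the database are both undetermined, any example can only be speculative. The sketch below assumes, purely for illustration, that IDs keep the "component.number:" form from the earlier proposal and that the out-of-kernel database is a simple table consulted by a user-space tool; `struct kmsg_entry`, `kmsg_db`, and `kmsg_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One record of the hypothetical out-of-kernel message database. */
struct kmsg_entry {
    const char *id;             /* e.g. "railgun.100" (assumed format) */
    const char *description;
    const char *user_response;
};

static const struct kmsg_entry kmsg_db[] = {
    {
        "railgun.100",
        "The rail gun fired accidentally in the absence of a specific "
        "user request.",
        "Operator should be sure to stand to the side.",
    },
};

/*
 * Find the database entry for the ID that prefixes a logged line.
 * Returns NULL if the line has no "id:" prefix or the ID is unknown.
 */
static const struct kmsg_entry *kmsg_lookup(const char *line)
{
    const char *colon;
    size_t idlen;

    assert(line != NULL);
    colon = strchr(line, ':');
    if (colon == NULL)
        return NULL;
    idlen = (size_t)(colon - line);

    for (size_t i = 0; i < sizeof(kmsg_db) / sizeof(kmsg_db[0]); i++) {
        if (strlen(kmsg_db[i].id) == idlen &&
            strncmp(kmsg_db[i].id, line, idlen) == 0)
            return &kmsg_db[i];
    }
    return NULL;
}
```

The point of the design is visible in where the text lives: the kernel emits only the short ID, while descriptions and translations can be corrected or extended in the table without touching kernel source.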

There are a few gaps in this proposal; how the kernel message team would be funded (or otherwise motivated) is one of them. But it may be sufficiently low-impact to be accepted by the rest of the development community. Someday soon, Linux users, too, may have to make room on their shelves for a hefty messages manual.

Patches and updates

Linus Torvalds And now for something _totally_ different: Linux v2.6.22-rc5

Geert Uytterhoeven PS3 Storage Drivers for 2.6.23, take 2

Thomas Gleixner High resolution timer updates and x86_64 support - V2

Tony Breeds clocksouce implementation for powerpc

John Blackwood [PATCH] selective signal ptracing

Ingo Molnar CFS scheduler, -v17

Andy Whitcroft update checkpatch.pl to version 0.05

David Wilder Generic Trace Setup and Control (GTSC) kernel API (1/3)

Keiichi KII proposal for dynamic configurable netconsole

Mathieu Desnoyers Linux Kernel Markers

Rodolfo Giometti LinuxPPS & syscalls support

Ayyappan.Veeraiyan@intel.com new driver ixgbe for Intel(R) 10GbE PCI Express adapters.

Rodolfo Giometti I2C: TSL2550 support.

Michael Buesch bcm43xx QoS support

Keshavamurthy, Anil S Intel IOMMU support, take #2

Haavard Skinnemoen Atmel USBA UDC driver

Michael Kerrisk man-pages-2.56 is released

Michael Kerrisk man-pages-2.57 is released

Michal Piotrowski Linux Kernel Tester's Guide

holzheu Documentation of kernel messages

Jan Kara Quota netlink interface

Tejun Heo sysfs: make directory dentries/inodes reclaimable, take#2

Josef 'Jeff' Sipek Unionfs cleanups, fixes, and mmap

Bharata B Rao New approach to VFS based union mount

amagai@osrg.net NILFS version 2 now available

Ian Kent autofs 5.0.2 release

Girish Shilamkar Updated patches for journal checksums.

clameter@sgi.com Page cache cleanup in anticipation of Large Blocksize support

clameter@sgi.com Large Blocksize Support V4

Peter Zijlstra per device dirty throttling -v7

Mel Gorman Memory Compaction v2

Paul Mundt slob: poor man's NUMA support.

Patrick McHardy Netlink link creation API + driver conversions

Patrick McHardy rtnl_link support

Kieran Mansley [Net] Support accelerated network plugin modules

PJ Waskiewicz NET: Multiple queue hardware support

Kentaro Takeda TOMOYO Linux security module.

Mimi Zohar [RFC][Patch 0/3] integrity: Linux Integrity Module(LIM) and provider

Mimi Zohar IBAC Patch

Serge E. Hallyn containers: implement subsys->post_clone()

Vaidyanathan Srinivasan [RFC][PATCH 0/4] Containers: Pagecache accounting and control subsystem (v4)

Jeremy Fitzhardinge paravirt/subarchitecture boot protocol

Avi Kivity KVM updates for 2.6.23

Anthony Liguori KVM paravirt_ops implementation

Rusty Russell Virtio draft III

Rusty Russell Lguest implemention of virtio draft III

Richard Purdie Add LZO1X algorithm to the kernel
