Kernel development
Brief items
Kernel release status
The current 2.6 prepatch is 2.6.22-rc5, released by Linus on June 16. It contains a long list of fixes - enough that Linus complains a bit about the amount of stuff which is still going in this late in the cycle. See the long-format changelog for the details. A very small number of patches have gone into the mainline git repository since -rc5 was released.
There have been no -mm releases over the last week, and no releases of older kernel trees. Evidently everybody has been too busy "discussing" GPLv3.
Kernel development news
Quotes of the week
More quotes of the week - scenes from a flame war
As it turns out, there is very little from the recent, 1000-message GPLv3 flame war that justified the expenditure of so many bits. For those who haven't gotten around to reading the whole thing, here are a few selections.
btrfs and NILFS
Almost exactly one year ago, as the developers were discussing changes to the venerable ext3 filesystem, Andrew Morton was heard to say:
Reiser4 looks like it may continue to sit around for a while yet. But that does not mean that there is no interest in the creation of interesting new filesystems. LogFS was discussed here in May, but it's not the only newcomer in the filesystem arena.
The most interesting new contender, perhaps, is btrfs, which was announced by Chris Mason on June 12. It is an entirely new filesystem intended for standard rotating storage with a number of interesting features. These include:
- Btrfs is a fully extent-based filesystem, meaning that it can store
large files far more efficiently than ext3 (the in-development ext4
filesystem has extent support). An extent-based filesystem does away
with the long lists of pointers to the individual blocks contained
within a file; instead, groups of contiguous blocks ("extents") are
tracked together. The result is far less metadata overhead,
especially with large files. For very small files, btrfs will store
the file contents themselves within the extent structure, eliminating
the need for a separate block allocation.
- Filesystems can be split into "subvolumes," each of which has its own
directory structure and disk quota. Subvolumes can be used to
subdivide a btrfs filesystem, but there is another interesting use of
them...
- Btrfs can do snapshotting - freezing the state of the filesystem at
any given time. Snapshots are just subvolumes; they become a
separate, independent directory tree which can be navigated
independently from the "live" filesystem. Interestingly, though,
btrfs snapshots are also live: they can be modified after being
taken, and they can be snapshotted as well.
- Supporting subvolumes and snapshots forces a copy-on-write structure
onto btrfs. If a given extent is written to, it will be copied and
the new data written to the copy. Extents have reference counts;
creating a snapshot, for example, will cause reference counts to be
incremented. When an extent contained in both a snapshot and the "real"
filesystem is modified, it will be copied for whatever subvolume is
being changed but will remain in place, unchanged in the other. If
the snapshot is eventually removed, all associated reference counts
will be decremented and any unused extents will be reclaimed.
- The subvolume and snapshot mechanism eliminates the need for a
separate journaling feature. Changes to the filesystem can be made
transactional simply by taking a snapshot which only lasts until the
transaction completes.
- This filesystem checksums everything - data and metadata both. As a result, it is able to detect many types of filesystem corruption on the fly.
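The reference-counting behavior described in the list above can be illustrated with a small user-space sketch. None of this is btrfs code: the structure and function names (struct extent, subvol_snapshot(), and so on) are hypothetical, each "subvolume" tracks just one tiny extent, and, as a simplification, the extent is only copied when it is actually shared.

```c
#include <stdlib.h>
#include <string.h>

#define EXTENT_BYTES 16   /* tiny, for illustration only */

/* Hypothetical in-memory "extent": a run of data plus a count of the
 * subvolumes which currently reference it. */
struct extent {
    int refs;
    char data[EXTENT_BYTES];
};

/* Hypothetical subvolume tracking exactly one extent. */
struct subvol {
    struct extent *ext;
};

/* Create a subvolume holding a fresh extent with the given contents. */
static int subvol_create(struct subvol *sv, const char *contents)
{
    struct extent *e = calloc(1, sizeof(*e));

    if (!e)
        return -1;
    e->refs = 1;
    strncpy(e->data, contents, EXTENT_BYTES - 1);
    sv->ext = e;
    return 0;
}

/* Taking a snapshot copies no data; it only bumps the reference count. */
static void subvol_snapshot(const struct subvol *src, struct subvol *snap)
{
    snap->ext = src->ext;
    snap->ext->refs++;
}

/* Writing to a shared extent copies it first, so the other holder
 * continues to see the old contents. */
static int subvol_write(struct subvol *sv, const char *contents)
{
    if (sv->ext->refs > 1) {
        struct extent *copy = malloc(sizeof(*copy));

        if (!copy)
            return -1;
        *copy = *sv->ext;
        copy->refs = 1;
        sv->ext->refs--;
        sv->ext = copy;
    }
    memset(sv->ext->data, 0, EXTENT_BYTES);
    strncpy(sv->ext->data, contents, EXTENT_BYTES - 1);
    return 0;
}

/* Dropping a subvolume releases its reference; unused extents are freed. */
static void subvol_delete(struct subvol *sv)
{
    if (--sv->ext->refs == 0)
        free(sv->ext);
    sv->ext = NULL;
}
```

Creating a subvolume, snapshotting it, and then writing to the original leaves the snapshot pointing at the untouched extent while the original gets a private copy; deleting both drops the last references and frees the memory.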
Fast filesystem checking is also an important design goal for btrfs. The data and metadata are laid out in a way that allows the offline filesystem checker to read the disk in a nearly sequential manner. That should speed the process considerably; filesystem checking usually involves vast numbers of seek operations. Online filesystem checking is also in the plans, though it has not been implemented yet; once it is working, this feature could eliminate the need for separate, mount-time filesystem checks entirely.
This filesystem is in a very early state - not recommended for data which one might actually want to keep. There's not been a whole lot of benchmarking done, and, presumably, a lot of optimization work still to happen. For example, the entire filesystem is currently protected by a single mutex, a solution which is unlikely to perform well on those leading-edge 4096-processor systems. Little details - like not oopsing when the filesystem runs out of space, direct I/O, writing via mmap(), extended attributes, asynchronous I/O, and more - have yet to be taken care of. But btrfs has garnered a considerable amount of interest; if it lives up to its initial promise we could find ourselves using btrfs-based systems in the future.
(For more information, see the btrfs project page.)
Another recently-announced filesystem is NILFS, which is now at version 2.0. NILFS is a log-structured filesystem, in that the storage medium is treated like a circular buffer and new blocks are always written to the end. These filesystems tend to do very well on benchmarks which measure write performance, since all writes go to a contiguous set of blocks; read performance is not always quite as good. Log-structured filesystems are often used for flash media since they will naturally perform wear-leveling; it would appear, however, that NILFS is not aimed at flash devices.
Instead, NILFS emphasizes snapshots. The log-structured approach is a specific form of copy-on-write behavior, so it naturally lends itself to the creation of filesystem snapshots. The NILFS developers talk about the creation of "continuous snapshots" which can be used to recover from user-initiated filesystem problems - those of the "rm -r" variety. NILFS claims scalability through 64-bit data structures, but, interestingly, support for the x86_64 architecture remains on the "TODO list." The filesystem does not yet have support for extents.
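To see why a log-structured layout lends itself to snapshots, consider the following toy model. It is a sketch under loud assumptions, not a description of how NILFS is implemented: the names (struct logfs, logfs_write(), logfs_snapshot()) are invented for illustration, there is a single file with a handful of blocks, and the cleaner which a real filesystem needs before the log wraps around is left out entirely.

```c
#include <string.h>

#define LOG_BLOCKS   8    /* size of the circular log, in blocks */
#define FILE_BLOCKS  4    /* logical blocks in our one hypothetical file */
#define BLOCK_BYTES  8

struct logfs {
    char log[LOG_BLOCKS][BLOCK_BYTES]; /* the "disk", used as a ring */
    int  head;                         /* next log slot to be written */
    int  map[FILE_BLOCKS];             /* logical block -> log slot, -1 if unwritten */
};

static void logfs_init(struct logfs *fs)
{
    memset(fs, 0, sizeof(*fs));
    for (int i = 0; i < FILE_BLOCKS; i++)
        fs->map[i] = -1;
}

/* Every write lands at the head of the log, never on top of the old copy.
 * Returns the slot used, or -1 for a bad block number.  Wrapping around
 * would overwrite old slots; no garbage collection is attempted here. */
static int logfs_write(struct logfs *fs, int blk, const char *data)
{
    int slot;

    if (blk < 0 || blk >= FILE_BLOCKS)
        return -1;
    slot = fs->head;
    memset(fs->log[slot], 0, BLOCK_BYTES);
    strncpy(fs->log[slot], data, BLOCK_BYTES - 1);
    fs->map[blk] = slot;
    fs->head = (fs->head + 1) % LOG_BLOCKS;
    return slot;
}

/* Reads must go through the map, since blocks move with every write. */
static const char *logfs_read(const struct logfs *fs, int blk)
{
    if (blk < 0 || blk >= FILE_BLOCKS || fs->map[blk] < 0)
        return NULL;
    return fs->log[fs->map[blk]];
}

/* A snapshot is just a saved copy of the block map; the old data is
 * still sitting in the log where it was first written. */
static void logfs_snapshot(const struct logfs *fs, int saved_map[FILE_BLOCKS])
{
    memcpy(saved_map, fs->map, sizeof(fs->map));
}
```

Since an overwrite never touches the old block, a saved map continues to describe the file as it was when the snapshot was taken. The extra indirection on every read also hints at why read performance can lag behind write performance.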
More information on NILFS can be found on nilfs.org.
Getting the message from the kernel
As a general rule, Linux users would rather not hear from their kernel. If all is well, devices are working, applications are running, and the kernel just quietly makes it all happen. When things go wrong, however, it may become necessary to dig through the messages that the kernel puts out. These messages sometimes make sense to the developers who created them, but they are not always clear to the rest of the world. Neal Stephenson, in his essay "In the Beginning was the Command Line," describes Linux kernel messages as having "the semi-inscrutable menace of graffiti tags." For a kernel developer, often as not, the main value of a kernel message is to pinpoint the location of the complaining code - from which the real problem can be determined.
Non-developers have a harder time using kernel messages in that way, though, and people who are not native English speakers are at even more of a disadvantage. So it is not surprising that the topic of fixing up kernel messages has popped up occasionally. It's back, possibly in a more serious form this time around.
People who would reform kernel messages generally have two goals in mind:
- They would like every message to have a unique identifier attached
to it. This idea brings back memories of VMS or most IBM operating
systems, which have used message identifiers for decades. The main
purpose behind message identifiers is to allow the system
administrator (or the support person they have called) to look up the
identifier in a manual and figure out what the message is really
saying. Various legacy operating systems have come with message
manuals which take up significant amounts of shelf space; they contain
a (relatively) detailed explanation of the problem and suggestions for
how to make the problem go away.
- It is much easier to maintain translations for messages which have unique identifiers attached to them. A Linux system which could output messages in multiple languages would be more approachable for much of the potential user base.
The problem, of course, is that attaching identifiers to messages is a significant job. There are tens of thousands of printk() calls in the kernel; each of them would need to have an identifier assigned and the code changed. New messages are added - in large numbers - with every kernel release; it's easy to imagine that the overhead of putting identifiers onto all of those messages would irritate developers in a hurry. For these reasons, Linus has, in the past, rejected schemes aimed at improving kernel messaging.
The idea has come back anyway. A new approach has been proposed by users in Japan who are having trouble supporting Linux as well as they would like. In this scheme, every kernel message would be assigned a component name and a message number. The component would be a per-file define:
#define KMSG_COMPONENT "railgun"
Then printk calls would be modified to include the message number:
printk(KMSG_ERR(100) "Rail gun fired accidentally - sorry\n");
The end result would be a message prepended with the string "railgun.100:", enabling the message to be translated or looked up in a manual. To help ensure that there is a manual, the proposal requires kerneldoc-style documentation of messages within the source; something like:
/**
 * message
 * @100:
 *
 * Description:
 * The rail gun fired accidentally in the absence of a specific
 * user request.
 *
 * User Response:
 * Operator should be sure to stand to the side.
 */
The kerneldoc scripts would be upgraded to collect all of these message descriptions and turn them into a printable manual. Another tool would check source files and complain about messages which lack accompanying descriptions.
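How a macro like KMSG_ERR() might produce the "railgun.100:" prefix can be shown in user space. The definitions below are an assumption made for illustration, not the macros from the proposal: they rely on preprocessor stringification and on the compiler joining adjacent string literals, and they leave out the log-level marker which a real printk() call would carry. KMSG_STR() and railgun_report() are hypothetical helpers.

```c
#include <stdio.h>

/* Per-file component name, as in the proposal. */
#define KMSG_COMPONENT "railgun"

/* Hypothetical helper: turn the message number into a string literal. */
#define KMSG_STR(x) #x

/* Assumed expansion: KMSG_ERR(100) becomes "railgun" "." "100" ": ",
 * which the compiler joins into the single literal "railgun.100: ". */
#define KMSG_ERR(id) KMSG_COMPONENT "." KMSG_STR(id) ": "

/* Stand-in for the printk() call; formats the message into buf. */
static int railgun_report(char *buf, size_t len)
{
    return snprintf(buf, len,
                    KMSG_ERR(100) "Rail gun fired accidentally - sorry\n");
}
```

With these definitions, the prefix is built entirely at compile time; nothing is added to the work done when the message is actually printed.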
Schemes like this have been greeted with complaints in the past, and the same happened this time around. The overhead of documenting messages in this way is more than many developers want to take on; David Miller expressed this feeling well:
Keeping the message descriptions current would also be a challenge - code is often changed without updating the neighboring comments; there is no reason to believe that message descriptions would get a higher level of attention.
Andrew Morton has come back with a counterproposal designed for easier developer acceptance. His scheme would add a new form of printk() which would take a message ID in some as-yet-undetermined format. That ID would be output with the message, but everything else - translations, descriptions, condolences, etc. - would be kept in a database outside of the kernel.
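The division of labor in this scheme might look something like the sketch below. Since the ID format has not been determined, a plain number in square brackets is assumed here; printk_id(), kmsg_lookup(), and the two-entry table are all hypothetical, and both the "kernel" and the "database" are simulated in a single user-space file.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical entry in a message database kept outside of the kernel. */
struct kmsg_entry {
    unsigned int id;
    const char *description;
};

/* Illustrative contents only; entry 101 is invented for the example. */
static const struct kmsg_entry kmsg_db[] = {
    { 100, "The rail gun fired accidentally in the absence of a "
           "specific user request." },
    { 101, "The rail gun is out of ammunition." },
};

/* Simulated kernel side: emit the ID along with the message and
 * nothing more.  The "[id] " prefix is an assumed format. */
static int printk_id(char *buf, size_t len, unsigned int id, const char *msg)
{
    return snprintf(buf, len, "[%u] %s", id, msg);
}

/* User-space side: pull the ID back out of a log line and look it up.
 * Returns NULL if the line carries no ID or the ID is unknown. */
static const char *kmsg_lookup(const char *line)
{
    unsigned int id;

    if (sscanf(line, "[%u]", &id) != 1)
        return NULL;
    for (size_t i = 0; i < sizeof(kmsg_db) / sizeof(kmsg_db[0]); i++)
        if (kmsg_db[i].id == id)
            return kmsg_db[i].description;
    return NULL;
}
```

Translations could be handled the same way, with one table per language keyed on the same IDs; the kernel itself would never need to know about any of them.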
The key point is that developers would not be expected to do much of anything with this database - or even with their kernel messages. Instead, there would be a "kernel messages team" charged with maintaining this information. Occasionally somebody from that team would look over new code, add message IDs where needed, and send a patch to the maintainer. Unless they were personally interested in helping, developers would not have to worry about the new mechanism at all.
There are a few gaps in this proposal; how the kernel message team would be funded (or otherwise motivated) is one of them. But it may be sufficiently low-impact to be accepted by the rest of the development community. Someday soon, Linux users, too, may have to make room on their shelves for a hefty messages manual.
Page editor: Jonathan Corbet