Kernel development

Brief items

Kernel release status

The current 2.6 kernel is 2.6.7, which was announced by Linus on June 15. Changes since the last release candidate include a fix for the latest denial of service vulnerability (see below), an NTFS update, some more CPU frequency controller work, and lots of fixes. The biggest changes since 2.6.6 include scheduling domains, a big rework of the reverse-mapping VM code, filtered waitqueues, the removal of the InterMezzo filesystem, quota and extended attribute support in reiserfs, a new API for NUMA systems, the removal of IDE tagged command queueing support, and the usual pile of fixes. See the long-format changelog for the details.

Linus's BitKeeper repository contains no patches beyond 2.6.7 as of this writing.

The current tree from Andrew Morton is 2.6.7-rc3-mm2. Recent additions to -mm include ext3 resizing support (see below), an O_NOATIME option to open(), and various fixes.

The current 2.4 prepatch is 2.4.27-pre6, which was released on June 15. It includes the FPU denial of service fix, of course, along with some architecture updates, DVD-RW write support, and a fair number of fixes.

Kernel development news

Quote of the week

This is all part of what responsible release management is about. I was the junior whiz kid in professional release management teams before starting Namesys. I listened to my elders and learned from them. My standards for professional conduct in this arena are higher than yours as a result of that. You are a bunch of young kids who lack professional experience in release management. That is ok, but don't get aggressive about it.

-- Hans Reiser

A nasty FPU bug

The problem was initially reported as a gcc bug. If you execute this code:

    static void Handler(int ignore)
    {
	char fpubuf[108];
	__asm__ __volatile__ ("fsave %0\n" : : "m"(fpubuf));
	__asm__ __volatile__ ("frstor %0\n" : : "m"(fpubuf));
    }

in a signal handler, the system (or, at least, the CPU that was running the code) will freeze up hard. Ways of locking up the system from an unprivileged user-space program are generally considered to be bad news; they also, in general, are not seen as compiler bugs. A bit of digging turned up the real problem, and the latest kernel denial of service vulnerability was found.

In theory, the fsave instruction above saves the floating-point unit (FPU) status into the fpubuf array; the subsequent frstor should simply restore the same state back into the FPU. Unfortunately, the above code is incorrect; the assembly instructions should read "m"(*fpubuf) to actually store the state into the fpubuf array. The code, as written, restores from the wrong address, corrupting the state of the FPU and, in particular, setting some exception flags.

FPU exceptions do not result in immediate kernel traps; instead, the trap happens when the next floating-point instruction is executed. As it happens, the kernel checks when a signal handler returns and, if that handler has used any floating-point instructions, the kernel performs an fwait instruction to ensure that the last operation is complete. That fwait causes the floating-point exception raised by the corrupt restore to be delivered as a kernel trap.

The kernel has a way of dealing with floating point traps; it saves the FPU state and queues up a floating point exception signal for the current process. It also sets the TS ("task switched") processor flag to indicate that the FPU state may be other than expected. At that point, it returns to the place where the exception occurred.

Normally, as part of returning from the trap, the kernel would simply deliver the floating-point exception signal to user space and get on with life. But, in this case, the kernel is returning to kernel space, and to the same fwait instruction that caused the problem in the first place. That instruction sees the TS flag and generates another trap. The handler for this trap knows just what to do in response to a TS flag; it restores the saved FPU state and returns. The saved FPU state is, however, the corrupted state which was in effect before the first attempt to execute fwait. So, at this point, the loop is closed and a new floating-point trap will be generated. This will go on for a while.

The fix is relatively straightforward, once the problem is understood. The kernel simply clears any pending exceptions before executing fwait, and the problem goes away. All that is left is the updating and rebooting of large numbers of vulnerable systems.
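The loop described above, and the effect of the fix, can be illustrated with a toy state machine. This is only a sketch for following the argument: the structure, its fields, and the toy_fwait() function are invented here and do not correspond to real kernel code. The max_traps limit stands in for what is, on real hardware, an endless loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are illustrative assumptions,
 * not the kernel's real data structures. */
struct toy_cpu {
    bool pending_exception; /* exception flag left set by the bad frstor */
    bool ts_flag;           /* "task switched": FPU state must be reloaded */
    bool saved_exception;   /* exception flag inside the saved FPU state */
};

/* Simulate the kernel executing fwait after a signal handler returns.
 * Returns the number of traps taken, or -1 if max_traps is reached. */
int toy_fwait(struct toy_cpu *cpu, bool clear_first, int max_traps)
{
    int traps = 0;

    assert(max_traps > 0);
    if (clear_first)
        cpu->pending_exception = false; /* the fix: clear before fwait */

    while (cpu->pending_exception || cpu->ts_flag) {
        if (traps == max_traps)
            return -1;                  /* still looping: give up */
        traps++;
        if (cpu->ts_flag) {
            /* TS trap: reload the saved (corrupt) state, clear TS */
            cpu->pending_exception = cpu->saved_exception;
            cpu->ts_flag = false;
        } else {
            /* FP exception trap: save state, set TS, retry fwait */
            cpu->saved_exception = cpu->pending_exception;
            cpu->pending_exception = false;
            cpu->ts_flag = true;
        }
    }
    return traps;
}
```

Starting from a state with pending_exception set, calling toy_fwait() with clear_first false never leaves the loop and hits the trap limit, while the same call with clear_first true returns immediately without taking a single trap.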

(Thanks to Sergey Vlasov, whose analysis of the problem made this article much easier to write.)

Online resizing of ext3 filesystems

One of the patches which slipped into 2.6.7-rc3-mm2 is one by Andreas Dilger and others which makes it possible to resize a running ext3 filesystem on the fly. This patch has been shipped with Fedora kernels for a little while, but has not seen a lot of wider use. That could change, of course, if the resize patch finds its way into the mainline.

The resize patch is conceptually quite simple: it adds one or more block groups which make use of extra space which, one hopes, is sitting there idle at the end of the existing filesystem. Once the block groups are hooked into the filesystem data structures, a simple ioctl() call or remount will make the space available. Behind this apparent simplicity, of course, is a significant amount of code which makes the resize operation happen on a modern, complex filesystem in a robust manner.
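For a rough sense of scale, the number of block groups a resize would add can be estimated with a little arithmetic. The sketch below is not taken from the patch; it assumes the usual ext2/ext3 layout in which each group's block bitmap occupies one block, limiting a group to 8 * block_size blocks, and it ignores details such as the first data block offset and the growth of a partially filled last group. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One bitmap block holds block_size * 8 bits, one bit per block. */
uint64_t blocks_per_group(uint32_t block_size)
{
    return 8ULL * block_size;
}

/* Number of block groups needed to cover total_blocks (rounded up). */
uint64_t group_count(uint64_t total_blocks, uint32_t block_size)
{
    uint64_t bpg = blocks_per_group(block_size);

    assert(bpg > 0);
    return (total_blocks + bpg - 1) / bpg;
}

/* Simplified estimate of the groups added when growing a filesystem. */
uint64_t groups_added(uint64_t old_blocks, uint64_t new_blocks,
                      uint32_t block_size)
{
    assert(new_blocks >= old_blocks);
    return group_count(new_blocks, block_size) -
           group_count(old_blocks, block_size);
}
```

With 4096-byte blocks, each group covers 32768 blocks (128MB), so growing a ten-group filesystem by 100000 blocks would, under these assumptions, add four groups.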

People wanting to try out resizing will need a few things:

  • A kernel (such as 2.6.7-rc3-mm2) with the online resize patch included.

  • A patch to e2fsprogs to make use of the resize capability; it is available from the ext2resize SourceForge download area.

  • Free disk space into which the filesystem can expand. Usually this means that the filesystem should live in a device mapper partition which can be expanded as well.

  • A very good backup of your filesystem.

This patch and its associated documentation (or lack thereof) still require some work before being ready for widespread deployment. Once they get there, however, life should get easier for system administrators who, throughout history, have routinely found out that all that "extra space" they figured into their filesystems is never enough.

On the alignment of IP packets

Device drivers for network interfaces must allocate a "socket buffer" ("skb") for each incoming packet. A standard idiom in the skb allocation code is a line like this:

    skb_reserve(skb, 2);

This call tells the socket buffer code to set aside the first two bytes of the data buffer. The reason why this is done can be seen by looking at the resulting layout of an IP packet in the buffer:

[Packet header layout]

The network stack makes frequent use of the IP addresses stored in the packet. By padding the beginning of an ethernet-style packet by two bytes, a network driver can cause those addresses to be aligned on a four-byte boundary. On some architectures, at least, that alignment will speed access to the addresses and make the networking system faster.
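The arithmetic behind the two-byte pad can be checked directly. The sketch below assumes a plain 14-byte ethernet header (no VLAN tag) followed by an IPv4 header, and a buffer which itself starts on a four-byte boundary; the constant and function names are made up for this example and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ETH_HEADER_LEN  = 14, /* dest MAC (6) + source MAC (6) + ethertype (2) */
    IP_SADDR_OFFSET = 12, /* source address offset within the IPv4 header */
    IP_DADDR_OFFSET = 16  /* destination address offset */
};

/* Offset of the IPv4 source address from the start of the buffer,
 * given `pad` bytes reserved in front of the ethernet header. */
size_t ip_saddr_offset(size_t pad)
{
    return pad + ETH_HEADER_LEN + IP_SADDR_OFFSET;
}

/* Same, for the destination address. */
size_t ip_daddr_offset(size_t pad)
{
    return pad + ETH_HEADER_LEN + IP_DADDR_OFFSET;
}

int is_aligned4(size_t offset)
{
    return offset % 4 == 0;
}

/* Self-check of the two cases discussed in the text. */
void check_padding_cases(void)
{
    assert(is_aligned4(ip_saddr_offset(2)));  /* 2 + 14 + 12 = 28 */
    assert(!is_aligned4(ip_saddr_offset(0))); /* 14 + 12 = 26 */
}
```

With the pad, the addresses land at offsets 28 and 32; without it they sit at 26 and 30, straddling four-byte boundaries.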

Or so it might seem. As Anton Blanchard recently figured out, this padding is not always helpful. A number of modern architectures (Anton works with PPC64, but Intel-style architectures qualify too) have no real problem with unaligned memory accesses, so the two-byte offset on IP packets does not necessarily help things. Unfortunately, the DMA engines in a number of systems do have trouble working with unaligned addresses. A padded packet buffer does not start on an aligned address, with the result that DMA operations to that buffer can be slower than they should be. As network adapters get faster, the DMA performance penalty becomes increasingly significant.

Anton's proposal was to change the skb_reserve() calls into calls to a new skb_align() function, which could, depending on the architecture, decide whether to insert the padding or not. David Miller pointed out, however, that the magic constant "2" appears in quite a few places, and simply removing the padding could create bugs elsewhere in the driver code.

The real solution is likely to be the addition of a defined constant called something like NET_IP_ALIGN; this constant would be the amount of padding needed for packet alignment on the current architecture. Yes, things probably should have been done that way from the beginning, but life is like that. In any case, once the constant is in, each individual driver can be looked over and fixed up as need be. And one small obstacle to top performance on high-end network adapters will have been removed.
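How a driver might use such a constant can be sketched in user space. Everything here is an assumption made for illustration: the default value of 2, the idea that an architecture would override it by defining NET_IP_ALIGN first, and the mock_skb structure and helpers, which only imitate the shape of the real socket buffer interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: an architecture with slow unaligned DMA would define
 * NET_IP_ALIGN as 0 before this point; everybody else gets 2. */
#ifndef NET_IP_ALIGN
#define NET_IP_ALIGN 2
#endif

/* Minimal stand-in for a socket buffer; not the kernel's struct sk_buff. */
struct mock_skb {
    unsigned char buf[2048];
    unsigned char *data;  /* where packet data will start */
    size_t headroom;      /* bytes reserved in front of data */
};

void mock_skb_init(struct mock_skb *skb)
{
    skb->data = skb->buf;
    skb->headroom = 0;
}

/* Set aside len bytes at the front of the buffer. */
void mock_skb_reserve(struct mock_skb *skb, size_t len)
{
    assert(skb->headroom + len <= sizeof(skb->buf));
    skb->data += len;
    skb->headroom += len;
}

/* What a receive path might do instead of hard-coding "2". */
void mock_rx_prepare(struct mock_skb *skb)
{
    mock_skb_init(skb);
    mock_skb_reserve(skb, NET_IP_ALIGN);
}
```

The point of the exercise is that the driver no longer knows the magic number; any other code which depends on the padding can use the same constant and stay consistent.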

Patches and updates

Kernel trees

Linus Torvalds: Linux 2.6.7
Con Kolivas: 2.6.7-ck1
Andrew Morton: 2.6.7-rc3-mm2
Andrew Rodland: -ar patchset
Marcelo Tosatti: Linux 2.4.27-pre6

Development tools

Takao Indoh: Diskdump Update
David Howells: R/W semaphore tester module

Miscellaneous

christophe.varoqui@free.fr: multipath-tools-0.2.3

Page editor: Jonathan Corbet


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds