printk() problems
One of the core design features of the new printk() is a change from byte-streamed output to record-oriented output. Current kernels can easily corrupt messages on their way to the log; for example, when the log buffer overflows, the kernel simply wraps around and partially overwrites older messages. Messages from multiple CPUs can also get confused, especially if one or more CPUs are using multiple printk() calls to output a single line of text. The switch to the record-oriented mechanism eliminates these problems; it also makes it possible to attach useful structured information to messages. As a whole, it looks like a solid improvement to the kernel logging subsystem.
There is just one little problem, though: when the kernel outputs a partial message (by passing a string to printk() that does not end with a newline), the logging system will buffer the text until the rest of the message arrives. The good news is that this buffering causes the full line to be output together once it's complete—if things go well. The situation when things do not go well was best summarized by Andrew Morton:
printk("testing the frobnozzle ...");
do_test();
printk(" OK\n");
and do_test() hangs up, we really really want the user to know that there was a frobnozzle testing problem. Please tell me this isn't broken.
Not only is this behavior now broken, but it has also burned at least one developer who ended up spending a lot of time trying to figure out why the kernel was hanging. Kernel developers depend heavily on printk(), so this change has caused a fair amount of concern.
Bugs happen, of course; the important thing is to fix them. A number of possible fixes have been discussed on the list, including:
- Leave printk() as it is, and change specific callers to
output only full lines. Kay Sievers, the author of the
printk() changes, suggested
that approach, saying "
We really should not optimize for cosmetics (full lines work reliably, they are not buffered) of self-tests, for the price of the reliability and integrity of all other users.
" - Adding a printk_flush() function to be called in places where
it is important to see partial lines even if things go wrong before
the newline character is printed. The problem with this approach is
that, like printing full lines only, it requires changing every place
in the code where the problem might hit. Experience says that many of
those places can only be found the hard way.
- Add a global knob by which buffering can be turned on or off; this
knob might be set by either user space or the kernel. This idea was
not particularly popular; it seems unlikely that the knob will be set
for unbuffered output when it really matters.
- Simply revert the printk() changes for 3.5 and try again for 3.6 or later. Ingo Molnar posted a patch to this effect, seemingly as a way of pressuring Kay to take the problem more seriously.
As of this writing, most of the discussion centers around this patch from Steven Rostedt which simply removes the buffering from printk(). For the most part, the advantages of the new code remain. But it is now possible that a single line of output created with multiple printk() calls may be split into multiple lines, with messages from other CPUs mixed in between. It seems to many to be a reasonable compromise fix.
Except that Kay still doesn't like the splitting
of continuation lines. Andrew Morton is also concerned about where the printk()
code is going, saying "The core printk code is starting to make one
think of things like 'cleansing with fire'.
" Steven, meanwhile, is
reconsidering the whole thing, saying that,
perhaps, printk() is not the right tool for structured logging and
other approaches should be considered. And Greg Kroah-Hartman has suggested that it might be better just to fix
the call sites rather than further complicating the printk() code.
Linus, however, has argued strongly for the
merging of Steven's patch. His view is that buffering at the logging level
is fine, but text emitted with printk() has to get to the console
immediately. So chances are that some version of Steven's fix will be
applied for the 3.5 release. But it has become clear, again, that
adding structured logging to the kernel while not making life harder for
kernel developers is a difficult problem.
| Index entries for this article | |
|---|---|
| Kernel | Kernel messages |
| Kernel | Messages |