Two sessions on review

By Jonathan Corbet
August 20, 2014

Kernel Summit 2014

Like almost every other free software project out there, the kernel has a fundamental problem: not enough reviewers. Some areas of development need more review than others; at the top of the list must be the creation of binary interfaces for user space, since those interfaces must be maintained for a very long time. But the problem goes beyond ABI definition. Two sessions at the 2014 Kernel Summit looked at the issue of review and how it can be improved.

ABI changes

The first session, led by Michael Kerrisk and Andy Lutomirski, was concerned with the ABI issue in particular. Michael started by saying that, whenever he gets around to testing a new system call, he finds a bug 50% of the time. Christoph Hellwig remarked that the 50% number meant he wasn't looking hard enough. The point being made was that there is clearly not enough review and testing of code that is going out the door in a stable release. Indeed, sometimes there has been no testing at all. The recvmmsg() system call, Michael said, was implemented with a timeout value that cannot possibly have done anything sensible in the initial release.

Sometimes we change ABIs as well, often without meaning to. For example, with the inotify interface, the IN_ONESHOT option in early kernels did not trigger the IN_IGNORED option; in later kernels that behavior has changed.

There is, he said, no specification for new ABIs, a fact which makes them hard to review and test. The lack of a specification also leads to subtle implementation problems. Again citing inotify as an example, Michael talked about the difficulties involved in tracking files that move between directories; see this article for details. There are no man pages for most new system calls and not enough reviewers, with the result that questionable designs get through. The O_TMPFILE option serves as a good example, Michael said; beyond its other problems, there should also have been consideration given to implementing that functionality as a separate system call.

Andy added that a specification is good, but a unit test for a new ABI is also a good thing to have. At this point, Peter Zijlstra asked if maybe the Linux Test Project would be a better place than the kernel tree for unit tests. But LTP is concerned with testing more than just system calls, and many kernel developers see the LTP system as a big extra thing that they would have to install.

Ted Ts'o observed that developers must have tests for the new features they have. Otherwise they would not be exercising the expected level of diligence in their work. Dave Airlie added that it would be good to have those tests in the kernel tree. He also suggested that, perhaps, the community should insist on the existence of a man page for any new system call before it could be added to the mainline. Michael responded that this has been tried before, without much success. But, it was pointed out, all four new system calls added in 3.17 came with manual pages.

Ben Herrenschmidt pointed out that system calls are just the tip of the iceberg. The kernel's ABI includes other aspects, such as ioctl() calls, sysfs, netlink, and more.

A topic that came up repeatedly is that any patch changing the kernel ABI should be copied to the linux-api mailing list. Perhaps, it was suggested, it should be the subsystem maintainers' duty to ensure that this copying happens when needed. Josh Triplett suggested that the get_maintainer script could be modified to make that happen, but that idea was not received all that warmly. That script tends to add lots of unrelated recipients to patch mailings and is not universally popular among kernel developers.

Peter Anvin made the claim that the linux-api list simply is not working. Perhaps it would be better, he said, to just merge the man pages into the kernel tree so that the code and documentation could be patched together. Michael responded that this idea has come up before. There are advantages and disadvantages to it. On the down side, there is a lot of material in the man pages that does not describe the kernel interface. The man pages are documentation for application developers, not for kernel developers, so they cover a lot of glibc interfaces, wrappers around system calls, and more.

The discussion wound down with some repeated suggestions that maintainers should insist on man pages before accepting new system calls into the kernel. The plea to copy the linux-api list for changes affecting the kernel ABI was also repeated. For a while, at least, kernel developers may try to do better, but no true solution to the problem was found at this session.

Getting more reviewers

James Bottomley stepped up to run a more general discussion on the problem of patch review. How, he asked, can we increase the amount of review that code going into the mainline gets? Are there any new ideas out there on how the kernel's review process can be improved? No answers resulted, but the session did cover some of the mechanics of how review is performed.

Peter Zijlstra said that he has been getting patches with bogus Reviewed-by tags. "Bogus," in this case, means that no in-depth review of the code has actually happened. Often, these tags come from other developers working for the same company as the patch author, but not always. James said that he ignores Reviewed-by tags unless they are offered in an email that has substantive comments in it.

Darren Hart said, though, that these tags can be the result of internal review that happens before patches are posted on a public list. In some companies, at least, this review process is serious and rigorous; it makes sense to document those reviews in the patch. Dave asked why the process has to be done internally; why not expose the whole process on the public lists? Darren answered that, with almost any project, it is natural to go for "small-group consensus" before facing the wider world.

James added that he is often suspicious of same-vendor reviews, but they are not necessarily invalid. It is really a matter of how much one trusts the specific reviewers involved.

He went on to ask a general process question: how much can a patch change before a Reviewed-by tag needs to be reviewed? A white-space change probably should not bring about a need for a new review, but substantial changes to how the patch works should. There were some differences of opinion in the room about just where the line should be drawn; in the end it seems to come down to common sense and the subsystem maintainer's opinion on the matter.

The session ended with Linus jumping in and saying that, in the end, the Reviewed-by, Acked-by, and Cc tags all mean the same thing: the person named in the tag will be copied on the report if the patch turns out to be buggy. Some developers use one tag, while others use a different one, but there is no real difference between them. The session closed with some general disagreement over the meanings of the different tags — and no new ideas on how to get more review of kernel code.

Next: One year of Coverity work.

Index entries for this article
Kernel	Development model/Code review
Conference	Kernel Summit/2014

Two sessions on review

Posted Aug 22, 2014 1:19 UTC (Fri) by kjp (guest, #39639) [Link]

I'm a bit shocked that the man pages aren't in the kernel tree. Seeing how easy it is these days to write them in asciidoc, and how important they are to the ABI, and you want them versioned with the kernel...

Two sessions on review

Posted Sep 2, 2014 20:44 UTC (Tue) by tomgj (guest, #50537) [Link] (1 responses)

I am often left wondering what articles such as this mean by "ABI", and whether there is any meaningful distinction intended by the author from the concept commonly known as "API".

Jonathan: I'd be interested to hear you expand a bit on what you intend to convey by the term, but I'll also note what I (used to) understand by it.

I was brought up with the following distinction, which I found a useful one. Consider a program which consists of source code, and is built into an executable image for a platform using some build tools. The program's source consumes whatever APIs it consumes. The build tools consume the ABI of the platform for which the program is built.

A couple of useful results stem from this. One is that if you need to run the program on a different platform, you do not need to alter the program's source code, you just need to change the target platform for which it is built. If you have access to the source, a simple rebuild adapts it to a different ABI.

What I see in articles such as this is the term "ABI" being used in a way indistinguishable from "API". The meaning of the IN_ONESHOT option, or ioctl() calls, sysfs, netlink, etc, are all relevant to the program's source code, not the way it's compiled / built. The semantics of these interface elements are consumed in the program's source, not by the build tools. If the definition of one of these interface elements "changes", it is the program's source which will need to be modified; the program image can not simply be rebuilt from the same source to fix the issue.

I don't really want to quibble if people want to use the term "ABI" in this way, since I am not one to argue definitions, but I am concerned we are losing a useful distinction. So does anyone have a suggestion for a term to unambiguously refer to the useful concept described earlier: that set of interface elements consumed in the process of building a program's source code into an executable program image for a particular platform (so calling conventions, image layout, etc, but not anything visible from within the program's own source)?

API v. ABI

Posted Sep 2, 2014 21:02 UTC (Tue) by corbet (editor, #1) [Link]

I'm probably overly sloppy on this score, but I try to be consistent. An ABI is the interface used by a binary program (not the build tools). If you want compatibility, that ABI must be held steady so that binary programs do not break; that encompasses both calling conventions and semantics. That is the kind of "ABI stability" that the kernel tries for.

An API is a source-level concept. Again, it covers both calling conventions and semantics; consider the POSIX API as an example. If you get the same results on a new system after rebuilding, the API is compatible. So, in my mind, the setjmp() change in glibc that broke things on s390 systems was an ABI break, but not an API break.

Perhaps others disagree, but that's always been how I've seen it...