Seccomp filters: No clear path
Patches to expand the functionality of seccomp ("secure computing") have been floating around for two years or more without making any real progress into the mainline. There are a number of projects that are interested in using an expanded seccomp, but the patches themselves seem to have run into a "catch-22" situation. There are conflicting visions of how the feature should be added, without a clear sense that any of the options will be acceptable to all of the maintainers involved. That leaves a useful feature without a clear path into the kernel, which is undoubtedly frustrating to some.
We first looked at seccomp sandboxing a little over two years ago, when Adam Langley posted patches that would provide a way for a process to restrict the system calls that it (and its children) could make. The idea is to allow processes to sandbox themselves by choosing which system calls are available, rather than being restricted to just the four hard-coded system calls that the existing seccomp implementation allows (read(), write(), exit(), and sigreturn()). The impetus behind Langley's patches was to provide an easier mechanism for sandboxing processes in the Chromium web browser—and to eventually remove the somewhat convoluted sandbox that Chromium currently uses on Linux.
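As a rough illustration of the difference between the hard-coded list and a process-chosen one, the following Python sketch simulates both in user space. It is not kernel code and makes no real system calls. The `SandboxedProcess` class and its methods are hypothetical names invented for this example, and the rule that a later restriction can only narrow the allowed set is an assumption, not something the patches are known to guarantee.

```python
# User-space simulation only: no real seccomp calls are made.
# SandboxedProcess, restrict(), fork(), and syscall() are hypothetical names.

# The four calls permitted by the existing seccomp implementation.
STRICT_MODE_CALLS = frozenset({"read", "write", "exit", "sigreturn"})


class SandboxedProcess:
    def __init__(self, allowed=None):
        # None means "not sandboxed": every system call is permitted.
        self.allowed = allowed

    def restrict(self, calls):
        """Choose which system calls this process may make from now on."""
        calls = frozenset(calls)
        # Assumption: restrictions can only narrow, never widen.
        self.allowed = calls if self.allowed is None else self.allowed & calls

    def fork(self):
        """A child starts with the same restrictions as its parent."""
        return SandboxedProcess(self.allowed)

    def syscall(self, name):
        """Return True if the call would be permitted, False otherwise."""
        return self.allowed is None or name in self.allowed
```

A process that calls `restrict(STRICT_MODE_CALLS)` models the existing behavior, while passing a larger set models the kind of self-chosen sandbox the patches aim for.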
At the time of that proposal, Ingo Molnar suggested that Ftrace-style filtering would make the expanded seccomp much more useful. That idea wasn't universally hailed at the time, and the seccomp feature went mostly dormant until it was restarted by Will Drewry back in April. Drewry took Molnar's suggestions and implemented a version of seccomp that would allow system calls to be enabled, disabled, or filtered with simple boolean expressions (e.g. sys_read: (fd == 0)).
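The three possibilities (enabled, disabled, or filtered) can be modeled with a small Python sketch. This is a user-space illustration under assumed semantics, not Drewry's implementation: the `POLICY` table, the `allowed()` helper, and the rule that unlisted system calls are denied are all assumptions made for the example. Only the `sys_read: (fd == 0)` expression comes from the description above.

```python
# Illustration only; POLICY and allowed() are hypothetical names.
# Each entry is True (enabled), False (disabled), or a predicate over
# the system call's arguments (filtered).
POLICY = {
    "sys_read": lambda args: args["fd"] == 0,  # sys_read: (fd == 0)
    "sys_write": True,
    "sys_open": False,
}


def allowed(syscall, args):
    """Decide whether a call with the given arguments passes the policy."""
    rule = POLICY.get(syscall, False)  # assumption: unlisted calls are denied
    if callable(rule):
        return bool(rule(args))
    return rule
```

Under this model, `allowed("sys_read", {"fd": 0})` passes the filter, while a read from any other file descriptor is rejected.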
While Molnar was pleased with the progress, he didn't think it went far enough and suggested that a perf-like interface be used instead of prctl(), which is used by the existing seccomp. He had some fairly wide-ranging ideas that using perf events in a more active way could lead to better kernel security solutions than the existing Linux Security Modules (LSM) approach provides. Once again, this idea was not universally popular. The LSM developers, in particular, were not enamored of it.
Nevertheless, Drewry implemented a proof of concept along the lines of what Molnar had suggested. That led to complaints from a somewhat surprising direction, as both Peter Zijlstra and Thomas Gleixner strongly objected to perf being used in an active role. Their responses didn't leave room for any middle ground, with Zijlstra, who is one of the perf maintainers along with Molnar, saying that he and Gleixner would NAK "any and all patches that extend perf/ftrace beyond the passive observing role".
All of which led Drewry, who must be feeling a bit whipsawed at this point, to return to the patchset that seemed to have the most support: using Ftrace/perf-style filters, but maintaining the prctl() interface that is currently used by seccomp. Linus Torvalds had expressed some skepticism that the feature would have any real users, but Drewry outlined how it would be used by Chromium, and several other developers spoke up in favor of expanding seccomp, saying that QEMU, Linux containers (LXC), and others would use the feature. Those endorsements, along with the resolution of some other technical concerns, were enough for Torvalds to remove his objection to the feature. But, as might be guessed, Molnar is still not satisfied with the approach.
When Drewry reposted the patchset toward the end of June, and asked what the next steps were, Molnar noted that his concerns were not being addressed: "You are pushing the 'filter engine' approach currently, not the (much) more unified 'event filters' approach." But Drewry is trying to find a balance between the needs of the potential users, other maintainers, and Molnar's requests, which is somewhere between difficult and impossible:
But Molnar is adamant that the "filter engine" approach is short-sighted, citing the diffstats of the various implementations as evidence:
bitmask (2009): 6 files changed, 194 insertions(+), 22 deletions(-)
filter engine (2010): 18 files changed, 1100 insertions(+), 21 deletions(-)
event filters (2011): 5 files changed, 82 insertions(+), 16 deletions(-)
are pretty hollow arguments to me. That diffstat sums up my argument
of proper structure pretty well.
But, as Drewry points out, there is still a lot of work to be done to get beyond the proof-of-concept and to a fully fleshed-out solution. Given that the approach has already received several NAKs, doing all of that work has a very uncertain future. Drewry would like to see the feature be available soon, and is concerned that working on the larger problem is likely to delay that significantly, if it can ever get beyond the objections: "If all the other work is a prerequisite for system call restriction, I'll be very lucky to see anything this calendar year assuming I can even write the patches in that time."
Molnar is undeterred, however, suggesting that there is a path into the kernel through the tree that he co-maintains:
The problem, of course, is that the 5% is the piece that Drewry and others are most interested in seeing (i.e. the system call restrictions for sandboxing) in the kernel. So, what Molnar seems to be offering is a fairly sizable chunk of work that could, in the end, still leave the "interesting" part out in the cold. Molnar may be confident that he can overcome the objections from Zijlstra and Gleixner, but Drewry can hardly be as sanguine. He describes the problem as he sees it:
Both Zijlstra and Gleixner have been absent from the most recent discussion, so it's a little hard to guess what their thoughts are. In the absence of any kind of posting softening their stances, though, it would be a bad idea to believe that they have changed their minds.
It's a problem that we have seen before, where a new feature is, to some extent, held hostage to requests that a larger problem be solved. The problem was discussed at the 2009 Kernel Summit, where there was agreement that those requests should be advisory in nature, rather than demands. In this case, Molnar is not really demanding that the bigger task be done; he is simply uninterested in taking the code via the -tip tree unless it solves the larger problem.
It is unclear where things go from here. Drewry said that he would look at trying to do things Molnar's way ("but if my only chance of any form of this being ACK'd is to write it such that it shares code with perf and has a shiny new ABI, then I'll queue up the work for when I can start trying to tackle it"), but it may be a ways off. In the meantime, there are various projects interested in using the feature.
If falling back to the bitmask version of the feature solves enough of the problem for those projects, there is the possibility of trying to get that into the kernel via another tree (e.g. the security tree). There would undoubtedly be objections from Molnar, but if enough users lined up behind it, that might be a reasonable approach. It would create an ABI that would need to be maintained going forward, which is one of Molnar's objections, but it would solve problems for Chromium and others.
Steven Rostedt suggested adding the seccomp expansion as a discussion item for the Kernel Summit in October, which might provide a path forward. It's likely that most or all of the interested parties will be there (unlike the Linux Security Summit that will be held with Plumbers in September, which was suggested as an alternative). While a face-to-face discussion could be helpful, it might be a stretch to believe that the disagreement over active versus passive uses of perf could be resolved that way. On the other hand, it could lead to some kind of decree about the proper direction from Torvalds. That could go a long way toward resolving the issue.