The inherent fragility of seccomp()
seccomp() allows the establishment of a filter that will restrict the set of system calls available to a process. It has obvious uses related to sandboxing; if an application has no need for, say, the open() system call, blocking access to that call entirely can reduce the damage that can be caused if that application is compromised. As more systems and programs are subjected to hardening, the use of seccomp() can be expected to continue to increase.
Michael Kerrisk recently reported that upgrading to glibc 2.26 broke one of his demonstration applications. That program was using seccomp() to block access to the open() system call. The problem that he ran into comes down to the fact that applications almost never invoke system calls directly; instead, they call wrappers that have been defined by the C library.
The glibc open() wrapper has, since the beginning, been a wrapper around the kernel's open() system call. But open() is an old interface that has long been superseded by openat(). The older call still exists because applications expect it to be there, but it is implemented as a special case of openat() within the kernel itself. In glibc 2.26, the open() wrapper was changed to call openat() instead. This change was not visible to ordinary applications, but it will break seccomp() filters that behave differently for open() and openat().
Kerrisk was not really complaining about the change, but he did want to
inform the glibc developers that there were user-visible effects from
it: "I want to raise awareness that these sorts of changes have the
potential to possibly cause breakages for some code using seccomp, and note
that I think such changes should not be made lightly or
gratuitously
". The developers should, he was suggesting, keep the
possibility of breaking seccomp() filters in mind when making
changes, and they should document such changes when they cannot be avoided.
Florian Weimer, however, disagreed:
Another way of putting this might be: seccomp() filters are not considered to be a part of the ABI that is provided by glibc, so incompatible changes there are not considered regressions. They are, instead, a consequence of filtering below the glibc level while expecting behavior above that level to remain unchanged.
Weimer's point of view would appear to be the one that will govern glibc development going forward. So Kerrisk has proposed some man-page changes to make the fragility of seccomp() filters a bit less surprising to developers. Playing the game at this level will require a fairly deep understanding of what is going on and the ability to adapt to future C-library changes.
This outcome could be seen as an argument in favor of a filtering interface like OpenBSD's pledge(). Like seccomp(), pledge() is used to limit the set of system calls available to a process, but pledge() is defined in terms of broad swathes of functionality rather than working at the level of individual system calls. It can be used to allow basic file I/O, for example, while disabling the opening (or creation) of new files. pledge() is far less expressive than seccomp() and cannot implement anything close to the same range of policies but, for basic filtering, it seems far less likely to generate surprises after a kernel or library update.
But Linux doesn't have pledge() and seems unlikely to get it.
seccomp() can certainly get the sandboxing job done, but
developers who use it should expect to spend some ongoing effort
maintaining their filters.
| Index entries for this article | |
|---|---|
| Kernel | Security/seccomp |
| Security | Linux kernel/Seccomp |