Unprivileged bpf()

By Jonathan Corbet
October 12, 2015

Over the last couple of years, the Berkeley packet filter (BPF) in-kernel virtual machine has gained capabilities and moved beyond its origins in the networking subsystem. Among other things, it has gained its own system call — bpf() — to enable the loading of BPF programs into the kernel and various ancillary functions. In current kernels, bpf() is a root-only system call, and truly root-only at that: one must be root in the initial system namespace to use it. But the plan always included making bpf() available to unprivileged users; now patches are circulating to take that last step.

Kernel developers are understandably nervous about allowing unprivileged users to load code into the kernel for execution in kernel context. It does not take long to think of a number of ways in which things could go wrong. Getting past the initial reflex that says to simply disallow such access requires a high degree of certainty that there is no way for a rogue BPF program to compromise the kernel in any way.

The job of providing this assurance belongs to the BPF verifier module. It checks that any program presented for loading does not exceed the maximum number of instructions (4096 by default) and that it does not contain any loops, thus ensuring that it will not take excessive time to run. All jumps are checked to be sure that they land within the program (and don't create loops). Accesses to memory are not allowed to go outside the memory area provided by the kernel. The type of data stored in each accessible memory location is tracked by simulating the program's execution; instructions are not allowed to operate on inappropriate data types, and uninitialized memory cannot be accessed.
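The structural part of those checks can be illustrated with a toy model. What follows is a minimal sketch, not the kernel's verifier: the `Insn` type and `check_structure()` are hypothetical names invented for illustration, and the rule "no backward jumps" is a conservative stand-in for real loop detection (a program whose jumps all go forward cannot loop).

```python
from dataclasses import dataclass

MAX_INSNS = 4096  # instruction limit cited in the article


@dataclass(frozen=True)
class Insn:
    """Hypothetical toy instruction; not the real BPF encoding."""
    op: str          # "alu", "jmp", or "exit"
    offset: int = 0  # for "jmp": target is index + 1 + offset


def check_structure(prog):
    """Return None if the program passes, else a string giving the reason."""
    if not prog:
        return "empty program"
    if len(prog) > MAX_INSNS:
        return "too many instructions"
    for i, insn in enumerate(prog):
        if insn.op == "jmp":
            target = i + 1 + insn.offset
            if not 0 <= target < len(prog):
                return f"jump at {i} leaves the program"
            if target <= i:
                # Simplification: any backward jump is treated as a loop.
                return f"jump at {i} goes backward (possible loop)"
    if prog[-1].op != "exit":
        return "program can fall off the end"
    return None
```

With only forward jumps and a bounded instruction count, the running time of an accepted program is bounded by its length, which is the property the real checks are after.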

According to BPF developer Alexei Starovoitov, there is just one thing that is missing: ensuring that a BPF program cannot leak information about the kernel — and kernel pointers in particular — to user space. This information can be highly useful to an attacker trying to exploit a vulnerability, so quite a bit of effort has gone into plugging such leaks in recent years. BPF programs do not have access to a great deal of kernel information, but a hostile program still might succeed in exfiltrating something that an attacker can use. That makes it dangerous to allow unprivileged users to load and run BPF programs.

Avoiding this problem requires extending the capabilities of the BPF verifier; that is the intent of this patch set from Alexei. Since the verifier already knows the data types of the values stored in each memory location, this is a matter of restricting what can be done with locations containing pointer values. So simply returning a pointer to user space is clearly to be disallowed, as is storing a pointer into a BPF map. Perhaps more subtly, comparison of pointers is also disallowed; otherwise a BPF program could arrive at a pointer's value indirectly.
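The restrictions described here amount to type tracking with a few forbidden sinks. A rough sketch of the idea follows; the instruction tuples, the `Kind` enum, and `verify_no_leak()` are all assumptions made up for this example and bear no relation to the actual patch set, which works on real BPF instructions and branches.

```python
from enum import Enum


class Kind(Enum):
    SCALAR = "scalar"    # plain number, safe to expose
    POINTER = "pointer"  # kernel address, must never reach user space


class Rejected(Exception):
    """Raised when the toy verifier refuses a program."""


def verify_no_leak(prog, regs):
    """Walk a straight-line toy program, tracking the kind in each register.

    `prog` is a list of tuples; `regs` maps register names to an initial Kind.
    """
    regs = dict(regs)
    for insn in prog:
        op = insn[0]
        if op == "mov":                      # ("mov", dst, src)
            regs[insn[1]] = regs[insn[2]]
        elif op == "load_imm":               # ("load_imm", dst)
            regs[insn[1]] = Kind.SCALAR
        elif op == "cmp":                    # ("cmp", a, b)
            if Kind.POINTER in (regs[insn[1]], regs[insn[2]]):
                raise Rejected("pointer comparison")
        elif op == "map_store":              # ("map_store", src)
            if regs[insn[1]] is Kind.POINTER:
                raise Rejected("pointer stored into map")
        elif op == "ret":                    # ("ret", src)
            if regs[insn[1]] is Kind.POINTER:
                raise Rejected("pointer returned to user space")
            return
        else:
            raise Rejected(f"unknown op {op!r}")
    raise Rejected("missing ret")
```

The point of the sketch is that copying a pointer around is harmless in itself; only the operations that would let its value escape, directly or by comparison, need to be refused.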

Another interesting possibility has to do with pointers stored in the stack. A clever program could try to overwrite parts of a pointer with numeric values in a way that makes it possible to recover the original pointer value, then return the resulting numeric value to user space. The verifier clearly has to catch this case and block it. The list goes on, but the basic idea should be clear by now: there are a lot of ways to sneak pointer values out to user space; the verifier must anticipate and catch them all. Alexei's patch set tries to get the verifier to the point that it can meet that challenge.
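One way to picture the stack case is a per-byte tag on each stack location. The sketch below is a simplified model under stated assumptions: pointers are 8 bytes wide, they are only ever stored whole at aligned offsets, and the `StackTracker` class is hypothetical rather than anything taken from the kernel.

```python
PTR_SIZE = 8  # assumed pointer width in bytes


class StackTracker:
    """Toy stack model tagging each byte as 'unset', 'scalar', or 'pointer'."""

    def __init__(self, size=64):
        self.tags = ["unset"] * size

    def _in_bounds(self, off, width):
        return 0 <= off and width > 0 and off + width <= len(self.tags)

    def spill_pointer(self, off):
        """Record a full, aligned pointer store; return False if not allowed."""
        if off % PTR_SIZE or not self._in_bounds(off, PTR_SIZE):
            return False
        self.tags[off:off + PTR_SIZE] = ["pointer"] * PTR_SIZE
        return True

    def store_scalar(self, off, width):
        """Return True if the store is allowed, False if it must be rejected."""
        if not self._in_bounds(off, width):
            return False
        touched = range(off, off + width)
        for slot in {b - b % PTR_SIZE for b in touched}:
            holds_pointer = self.tags[slot] == "pointer"
            covers_slot = off <= slot and slot + PTR_SIZE <= off + width
            if holds_pointer and not covers_slot:
                return False  # would leave part of a pointer behind
        for b in touched:
            self.tags[b] = "scalar"
        return True
```

A store that replaces a spilled pointer completely is fine, since nothing of the pointer survives; a store that covers only some of its bytes is what the model refuses.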

Even with these checks in place, there are still some limits on the use of BPF programs by unprivileged users. In particular, with the current patch set, only socket-filter programs can be loaded by those users. BPF programs used in other settings (tracing, for example) inherently have to deal with kernel data, so they will remain restricted to root. BPF programs also pin down some kernel memory; to keep users from occupying too much memory, the space used is charged against their RLIMIT_MEMLOCK resource limit. On many systems, the default value of that limit may prove to be too small to load useful programs, so it may need to be increased by the system administrator.
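A user wondering whether the limit is likely to get in the way can inspect it from user space. This is a small sketch using Python's standard `resource` module; the `memlock_headroom()` helper is a hypothetical name, and the comparison ignores any locked memory the user has already been charged for.

```python
import resource


def memlock_headroom(needed_bytes):
    """Report whether `needed_bytes` fits under the soft RLIMIT_MEMLOCK limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    unlimited = soft == resource.RLIM_INFINITY
    return {
        "soft": soft,
        "hard": hard,
        # Assumption: nothing else is currently charged against the limit.
        "fits": unlimited or needed_bytes <= soft,
    }
```

Calling `memlock_headroom(64 * 1024)`, for example, would say whether 64KB (an arbitrary figure chosen for illustration) fits under the current soft limit.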

Finally, there is a sysctl knob (kernel.unprivileged_bpf_disabled) that can be used to disable unprivileged access to bpf() entirely. It defaults to "false" (in the patch; distributors may choose differently); if it is set to "true" it cannot be reset without a reboot. There was some talk of adding more fine-grained control over what BPF programs from unprivileged users can do, but Alexei dismissed that idea as being driven only by "fear." If the verifier is working as intended, he said, there is no reason for such fear and no use for finer-grained controls.
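The knob's state can be read through the usual procfs view of sysctl settings. The following is a sketch only: `unprivileged_bpf_disabled()` is a hypothetical helper, and treating any nonzero value as "disabled" is an assumption rather than something the patch description spells out.

```python
from pathlib import Path

# Path follows the standard sysctl-to-procfs mapping for the knob named above.
KNOB = Path("/proc/sys/kernel/unprivileged_bpf_disabled")


def unprivileged_bpf_disabled(path=KNOB):
    """Return True or False for the knob, or None if it cannot be read."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None  # knob absent (kernel without the patch) or unreadable
    try:
        return int(text) != 0  # assumption: nonzero means disabled
    except ValueError:
        return None
```

A `None` result simply means the running kernel does not expose the knob, or that the file could not be read.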

That does leave open the question of whether the verifier has indeed earned the trust that is being placed in it, or whether perhaps some fear might be justified. It is 2000 lines or so of moderately complex code that has been reviewed by a relatively small number of (highly capable) people. It is, in a real sense, an implementation of a blacklist of prohibited behaviors; for it to work as advertised, all possible attacks must have been thought of and effectively blocked. That is a relatively high bar. There is a reason why David Miller described the patch set as "scary stuff".

But David applied the patch set for the 4.4 merge window despite his misgivings for a reason: as with user namespaces, unprivileged BPF access has the potential to increase system security by reducing the number of places where code must be run with elevated privileges. But first it has to get to a point where all of the exploitable loose ends have been found and tied up — and the introduction of new loose ends must be prevented. That point may well be reached in the relatively near future. But it would not be surprising if this feature were to be disabled by distributors for a while after it hits the mainline.



Unprivileged bpf()

Posted Oct 15, 2015 6:31 UTC (Thu) by ibukanov (subscriber, #3942)

This sounds like the verifier itself is going to be rather complex code. I wonder if that could be simplified if BPF were not a binary bytecode but rather an encoded AST for a higher-level language, as is used, for example, for WebAssembly. It is much simpler to verify an AST and translate it into an internal bytecode, or even JIT it, than to write a correct verifier for bytecode.

Unprivileged bpf()

Posted Oct 15, 2015 7:10 UTC (Thu) by iq-0 (subscriber, #36655)

I think the low-level verification is better than a higher level one. A high level verifier might give a better explanation about why it would reject some particular construct, but multiple higher level constructs would map to similar lower level bytecode, so it would have to take into consideration a lot more cases.

In contrast the low-level verification only has to permit certain operations on certain values (the trick is knowing which operations on which type of value are needed and which can be abused), no matter the construct used to generate that operation.

And the more directly those ASTs correspond to their bytecode, the smaller the difference between the two approaches will be. A bytecode can easily be transformed into an AST, albeit one for a very low-level language.

Unprivileged bpf()

Posted Oct 15, 2015 7:43 UTC (Thu) by ibukanov (subscriber, #3942)

Preventing information leakage is very difficult, so it is better to start with a restrictive whitelist of allowed constructs that one can be reasonably sure are safe and gradually widen the set. It is much easier to implement such a whitelist in a higher-level verifier precisely because it can distinguish a particular code pattern with much less effort and O(N) complexity.

Otherwise I am afraid that the story of Java bytecode verification would repeat itself, albeit on a smaller scale. It took years for Sun to iron out bugs. Then it turned out that they could not do better than O(N**4) complexity in the worst case of the verification, making DoS against a browser trivial. So they were forced to extend the bytecode format with extra information to simplify the job for the verifier. That bloated the already rather fat bytecode and introduced subtle compatibility bugs in quite a few applications.

Correctness by construction

Posted Oct 16, 2015 0:38 UTC (Fri) by ncm (guest, #165)

Unless the description is inaccurate, it looks to me like a whitelist. There is a list of permitted operations that can be chained together, with optional skipping of subsequences. As long as sequence ABC is OK, and operation D is OK, then sequence ABCD is OK, and ABD is too.

Unprivileged bpf()

Posted Oct 16, 2015 6:42 UTC (Fri) by jezuch (subscriber, #52988)

Looks like a perfect target for fuzzing to me!

Unprivileged bpf()

Posted Oct 16, 2015 13:31 UTC (Fri) by deater (subscriber, #11746)

> Looks like a perfect target for fuzzing to me!

Both perf_fuzzer and trinity have some form of bpf() fuzzing code.

So far the perf_fuzzer implementation hasn't been that useful because it has to run as root, but if unprivileged bpf() does get in (and perf_event can somehow handle that) then fuzzing will be happening once I figure out how to set it up.

Unprivileged bpf()

Posted Oct 16, 2015 7:24 UTC (Fri) by kleptog (subscriber, #1183)

Sounds like a variation on the "tainting" that Perl has. The pointers in memory are tainted, any operation on a tainted value produces a tainted result. You cannot return a tainted value.

Although, this wouldn't work because you have to deal with control flow being tainted, using branches to test a tainted value. The difference with Perl being that in Perl the data is untrusted, and in BPF the code is untrusted.

Still, it feels like a problem that if approached the right way should be solvable in an easy to verify way. I hope...

Unprivileged bpf()

Posted Oct 17, 2015 11:06 UTC (Sat) by robert_s (subscriber, #42402)

"fear" is not always a bad thing.

Unprivileged bpf()

Posted Oct 18, 2015 10:46 UTC (Sun) by paulj (subscriber, #341)

Why are unprivileged BPF user programmes even allowed to access kernel pointer values in any way?

Unprivileged bpf()

Posted Oct 22, 2015 12:15 UTC (Thu) by welinder (guest, #4699)

A group at CMU was looking into what they called "proof carrying code" for this purpose in the late 1990s or early 2000s.

The idea is that you present two things: (1) the program and (2) a proof that the program works. The kernel would need to verify the proof, but verification is typically trivial, i.e., the code that does it is small and easy to understand.

Coming up with the proof in the first place is mechanical for the class of filter programs you are mostly interested in. For a hand-written and clever program it might not be easy.

Unprivileged bpf()

Posted Jan 11, 2016 6:22 UTC (Mon) by lambda (subscriber, #40735)

I've always wondered what happened to proof-carrying code. Why didn't it catch on, especially for these kinds of use cases? Why hasn't it been used for things like NaCl/WASM? It seemed to be making good progress in the late '90s, but I haven't really heard much about it since. Was there some critical flaw found? Was it just not trendy enough, so that it lost funding?

And the CVE...

Posted Apr 27, 2016 14:11 UTC (Wed) by justincormack (subscriber, #70439)

... was http://git.kernel.org/cgit/linux/kernel/git/davem/net.git... giving root access from userspace.

And the CVE...

Posted Apr 27, 2016 14:40 UTC (Wed) by raven667 (subscriber, #5198)

Judging from the link in the article, davem thought this was a feature that was useful but had a high risk of security-critical bugs, at least until it had been tested against the real world for a while, which seems to have been a completely correct assessment.