unsafe_put_user() turns out to be unsafe

By Jonathan Corbet
October 13, 2017

When a veteran kernel developer introduces a severe security hole into the kernel, it can be instructive to look at how the vulnerability came about. Among other things, it can point the finger at an API that lends itself toward the creation of such problems. And, as it turns out, the knowledge that the API is dangerous at the outset and marking it as such may not be enough to prevent problems.

Back in late 2015, kernel developers were looking for ways to speed up the movement of data between user space and kernel space. Accessing user data from the kernel, whether for reading or writing, has clear security implications, so the kernel must ensure that the requested access is something that the requester is allowed to do. That involves checking that the intended area of memory is indeed accessible; on newer hardware, it can also require temporarily disabling mechanisms like supervisor mode access prevention (SMAP). Those preparations are expensive relative to a simple data copy, and they can add up if the kernel must perform a sequence of several reads and/or writes to user space.

The obvious solution is to do the preparations once for a series of operations, then perform the operations themselves directly; that amortizes the cost of a single set of checks across multiple accesses. To that end, Linus Torvalds added a new set of access functions for the 4.5 kernel:

    unsafe_get_user(value, source);
    unsafe_put_user(value, destination);

These "functions" are actually macros that expand differently depending on the type of source or destination. If, for example, source is a pointer to a u16 value, then unsafe_get_user() will fetch an unsigned 16-bit value from that location and store it in value. These functions are thus like the traditional get_user() and put_user() macros, but with one change: they dispense with the normal permission checking.

The lack of checks is the clear motivation for the "unsafe" naming, even if, according to the comments in the code, that term does not really apply:

    /*
     * The "unsafe" user accesses aren't really "unsafe", but the naming
     * is a big fat warning: you have to not only do the access_ok()
     * checking before using them, but you have to surround them with the
     * user_access_begin/end() pair.
     */

The required access_ok() call ensures that the pointer involved refers to a user-space address; it is there to prevent user space from asking the kernel to overwrite itself. The user_access_begin() and user_access_end() calls, instead, disable and enable SMAP. The whole idea behind the "unsafe" functions is that access_ok() and (especially) user_access_begin() can be called once before several user-space accesses, thus speeding up the code overall.

During the 4.13 merge window, Al Viro reworked the implementation of the waitid() system call to use the "unsafe" functions. This variant of wait() requests that the kernel fill in a siginfo_t structure in user space with information on how the waited-for process died; doing so requires writing several values to user space from the kernel. The new code switched to unsafe_put_user() to speed these writes. Viro duly called user_access_begin() and user_access_end(), but left out the access_ok() calls. As a result, a call to waitid() could ask that the siginfo_t structure be stored in kernel space, overwriting a piece of kernel memory. Local attackers tend to be most pleased by this sort of unintended functionality.

The result is deemed CVE-2017-5123; it only affects the 4.13 and 4.14-rc kernels. It is fixed by this commit in the mainline, and will be fixed in the 4.13.7 stable update. The good news is that it was caught early enough that the number of machines that are exposed to the problem should be quite small.

The bad news, of course, is that it happened at all. It turns out that this was the first use of unsafe_put_user() in the mainline kernel; this use was written by a developer who understands the issues involved, and he still got it wrong. Even better, the patch tickled a bug in that until-then unused function, with the result that Torvalds looked at the specific patch in question, but the problem still was not noticed. It would seem that, contrary to the comments, the "unsafe" functions are well named indeed, and that truly unsafe use does not jump out at people looking at the code — even if the functions proclaim themselves to be unsafe.

Hopefully reviewers in the future will be better tuned into the fact that there are two necessary preconditions to the use of those functions. Even better would be some code that runs in kernels configured for debugging that would detect a lack of checking and raise the alarm. It might well be possible to apply some sort of static checking to the problem as well. Without such assistance, it seems likely that this type of bug will find its way into the kernel again in the future; it may not be so quickly detected next time.

Index entries for this article
Kernel	Security/Vulnerabilities
Security	Linux kernel

to post comments

unsafe_put_user() turns out to be unsafe

Posted Oct 13, 2017 21:25 UTC (Fri) by josh (subscriber, #17465) [Link]

> duly called user_access_begin() and user_access_end(), but left out the access_ok() calls.

This seems like an excellent argument for integrating the access_ok into user_access_begin. Have user_access_begin take a base and length, call access_ok, and fail if access_ok fails.

(That would not handle cases where you want to make multiple access_ok calls to discontiguous regions with a single user_access_begin, but that seems even less common than wanting to aggregate multiple get_user or put_user calls.)

unsafe_put_user() turns out to be unsafe

Posted Oct 13, 2017 23:30 UTC (Fri) by sorokin (guest, #88478) [Link] (10 responses)

I don't know if it is applicable to Linux kernel and C, but in C++ there is a widespread trick on how to allow calling one function only after you called the other one.

For example, let assume that we want to allow function g to be called only after function f. We can add a parameter of some type T to function g. T is some type that can not be easily constructed. And then make function f to return the type T. Now, to call function g the user needs to get object of type T and the most natural way for him to do this is calling function f.

For example, how to require calling user_access_begin before unsafe_put_user? Add a parameter to unsafe_put_user of some type that is returned by user_access_begin. The same can be made for access_ok. Let's add a parameter to unsafe_put_user that is returned by access_ok. Actually this object can hold the pointer and the size of checked region. And in debug mode unsafe_put_user can even check that the bounds of the area checked by access_ok are not violated.

The downside of this method is that it require more typing on the user side. And again I don't know if this C++ trick is easily expressible in C (without friends and private constructors).

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 4:58 UTC (Sat) by dark_knight (subscriber, #47846) [Link]

I was about to make a similar comment.

In C++, one could even build an object to emulate user_access_begin/end in RAII style. Then, all calls to unsafe_*_user are made in the scope where such object is defined.

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 7:53 UTC (Sat) by NAR (subscriber, #1313) [Link] (4 responses)

I don't think it could work in C. There are no private constructors in C - the best one can have is a struct defined privately and only a pointer publicly, but then the programmer still can create a pointer and only the pointer could be passed to the access_* functions, so not much gain (maybe an "uninitialized value" warning). And there's a performance penalty too for passing around a pointer (maybe not measurable). Also the code would be slightly harder to understand due to the seemingly unused parameter.

My gut feeling is that it is possible to do this access safely (there are functions to do that), but this whole unsafe stuff is required(?) to gain some performance and any trickery to make this unsafe stuff safe (except static analysis) will result in some performance overhead that they tried to avoid in the first place.

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 12:26 UTC (Sat) by alonz (subscriber, #815) [Link] (3 responses)

Well, in theory at least, you could make access_ok return the supplied pointer cast to a new type, and have the unsafe_* function require this type. Then calling unsafe_* with the original pointer will at least cause a compilation warning, and anyone forcibly casting to bypass the check will be caught and shamed in review.

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 14:28 UTC (Sat) by NAR (subscriber, #1313) [Link] (2 responses)

The problem is that access_ok already returns a boolean value, so it would be complicated to make it return a pointer too. What I don't understand is that why is there a need for a separate access_ok() and user_access_begin()? If user_access_begin() must not be called without calling access_ok(), why is user_access_begin() not calling it itself?

There was a thread here a couple of weeks ago about using C++ in the kernel. This is a textbook example where C++ could have avoided the problem. Also if you take a look at the code, you can see that user_access_end() needs to be called both on the happy path and on the error path - great opportunity to forget one (usually the error path). This is again a case which RAII in C++ would help.

unsafe_put_user() turns out to be unsafe

Posted Oct 15, 2017 15:08 UTC (Sun) by Paf (subscriber, #91811) [Link]

Access_ok specifies a range. The other calls are open-ended and do not have to stick to any easily describable range.

unsafe_put_user() turns out to be unsafe

Posted Oct 16, 2017 7:54 UTC (Mon) by mjthayer (guest, #39183) [Link]

It seems to me that the C++ idea also requires changing the semantics of access_ok(), so that objection applies to that too if I understood it correctly. One way around this would be a new variant of access_ok that returned the typecast pointer (or something else which also contained information about the range) or NULL. What C has trouble replicating is the trickery C++ uses to try to prevent misuse. My feeling is that this shortcoming of C is often a feature and not a bug. My feeling is that C should be good enough here to prevent accidental misuse in most cases. I am very dubious when I see what looks like attempts to prevent deliberate misuse using hard to understand C++ code.

Regarding your second objection, see also Luto's comment below.

unsafe_put_user() turns out to be unsafe

Posted Oct 16, 2017 7:27 UTC (Mon) by mm7323 (subscriber, #87386) [Link] (1 responses)

It doesn't need C++ to make the API better, though an object orientated mindset may help.

To me the start of the problem is that user_access_begin/end() don't do any checking that they match up and can't nest - that's a source of bugs right there. They could be better designed by having something like the local_irq_save/restore() API where they get passed a little area of storage, but with a type specific only to these functions. This storage could then need to be passed to access_ok() and unsafe_put/get_user().

With a debug toggle you could then use this extra storage area to track correct API use, including that executed accesses have actually been previously covered by access_ok(), as well as some call order tracking. In a performance kernel, the extra parameter could be left unused to avoid overhead.

If there are so few call sites to these functions now, it should be easy to fix the API. Just needs someone to do the work (and I'm not getting up from this armchair :)

unsafe_put_user() turns out to be unsafe

Posted Oct 16, 2017 23:07 UTC (Mon) by nix (subscriber, #2304) [Link]

In userspace, you could do this transparently, if nonportably, with a user_access_begin() macro that introduced a block and set up a local variable marked as __attribute__((__cleanup__)), but that won't fly in kernelspace: it relies on EH stack unwinding :(

unsafe_put_user() turns out to be unsafe

Posted Oct 19, 2017 9:53 UTC (Thu) by swilmet (subscriber, #98424) [Link]

> For example, let assume that we want to allow function g to be called only after function f. We can add a parameter of some type T to function g. T is some type that can not be easily constructed. And then make function f to return the type T. Now, to call function g the user needs to get object of type T and the most natural way for him to do this is calling function f.

Yes, as a general rule of thumb, a best-practice for API design is to encode the restrictions into the API itself instead of writing comments, if possible. If when documenting an API you write a comment like “Don't do that”, or “You need to do that and that before doing this”, it's a code smell, it's maybe a sign that the API needs to be refactored, or the implementation improved, to be able to remove that comment. That's why it's important to document an API early in development.

unsafe_put_user() turns out to be unsafe

Posted Oct 19, 2017 15:59 UTC (Thu) by ecree (guest, #95790) [Link]

This sounds like a job for sparse. Make access_ok take a __user pointer and return a (say) __userok pointer — one that is in a separate address space from both __user and unannotated (kernel) pointers. Then, make unsafe_put_user take a __userok pointer; that way, you can only call it with something you've passed through access_ok.
If you're sufficiently perverse, you can still lie to access_ok with the size you give it — but you can't forget to call it at all.

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 6:23 UTC (Sat) by luto (subscriber, #39314) [Link]

I admit to some puzzlement here. unsafe_put_user() contains two optimizations: it skips STAC/CLAC and it skips the range check. The former is a big win; the latter is tiny.

Wouldn't it be more sensible to only skip STAC/CLAC and keep the range check? It ought to be possible for gcc to prove that the range check isn't needed anyway. Consider:

if ((unsigned long)p < limit - 100) {
  /* write 4 bytes at offset 8 */
  if ((unsigned long)p + 8 < limit - 4)
     do it;
  else
     return -EFAULT;
} else {
  /* access_ok() failure */
}

The compiler should be able to omit the inner check: p is known to be low enough that p+8 can't overflow, so the condition is p < limit - 12, which is guaranteed to be true.

We have set_fs(), though, which sadly makes the analysis much nastier.

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 15:54 UTC (Sat) by vadim (subscriber, #35271) [Link] (2 responses)

Just had a thought: Wouldn't it make sense to have some sort of pre-processor, compiler patch or such thing enforce coding conventions?

It probably wouldn't be all that hard to make GCC able to support annotations for things like function A must be always called before B, and function C must always be followed by function D (for things like locks). I'm sure such functionality could find use in other projects.

In fact, once you get to the scope of something like the Linux kernel, why not start considering extending the language?

unsafe_put_user() turns out to be unsafe

Posted Oct 14, 2017 17:08 UTC (Sat) by corbet (editor, #1) [Link]

We have sparse, which deals with certain kinds of annotations and can find some types of problems. And coccinelle, which can pick out some sorts of bad code patterns. So the tools are not entirely missing, but they have not been applied to this particular problem.

unsafe_put_user() turns out to be unsafe

Posted Oct 16, 2017 4:54 UTC (Mon) by ncm (guest, #165) [Link]

There are reasons why languages are standardized, and related reasons why kernels are written in standard languages. It turns out that it is important for all the people who need to read kernel code to understand what they are reading, and the people writing it to understand what they wrote. That is difficult enough when the language is standard. If the language adheres to no formal standard, and changes on demand, it is increasingly difficult to know if something means what you think it might, or used to mean, or if it will mean the same thing next week.

Of course that is not an argument against using a more powerful, standard language. But for some reason, or for no defensible reason, that alternative is not on the table. That last fact leads people to suggest non-standard language extensions. Invariably the non-standard extension is a misbegotten version of a feature of the more powerful language.

Classic "protocol" error

Posted Oct 15, 2017 21:31 UTC (Sun) by davecb (subscriber, #1574) [Link]

Protocol in the sense a psycologist would use it, you understand!

If I do a b c d e, then c and d return something I want. a, c and e are overhead. If I don't do them in that order, though, I get a silent failure.

The general solution is to ensure that you can only do them in that order: this can be arbitrarily hard. In this case, it might just be a kind of scatter-gather I/O

struct pair { void *from, *to };
magic_transfer_thingie(int count, pair pairs[])