Trying to get STACKLEAK into the kernel
The STACKLEAK kernel security feature has been in the works for quite some time now, but has not, as yet, made its way into the mainline. That is not for lack of trying, as Alexander Popov has posted 15 separate versions of the patch set since May 2017. He described STACKLEAK and its tortuous path toward the mainline in a talk [YouTube video] at the 2018 Linux Security Summit.
STACKLEAK is "an awesome security feature" that was originally developed by The PaX Team as part of the PaX/grsecurity patches. The last public version of the patch set was released in April 2017 for the 4.9 kernel. Popov set himself the goal of getting STACKLEAK into the kernel shortly after that; he thanked both his employer (Positive Technologies) and his family for giving him working and free time to push STACKLEAK.
The first step was to extract STACKLEAK from the more than 200K lines of code in the grsecurity/PaX patch set. He then "carefully learned" about the patch and what it does "bit by bit". He followed the usual path: post the patch, get feedback, update the patch based on the feedback, and then post it again. He has posted 15 versions and "it is still in progress", he said.
Vulnerability types
There are three kinds of vulnerabilities that STACKLEAK is meant to defend against. The first is information disclosure that can come from leaving data on the stack that can be exfiltrated to user space. To combat that, STACKLEAK overwrites the used portion of the kernel stack with STACKLEAK_POISON (-0xBEEF) values at the end of each system call. After that, there is no lingering, potentially sensitive data on the kernel stack to be copied.
The second STACKLEAK feature is closely related. It targets uninitialized variables on the kernel stack with the same mitigation: writing STACKLEAK_POISON to the stack after every system call. That way, the contents of uninitialized automatic variables will not be whatever was written to that stack location before, but will instead be a known value. That would have blocked CVE-2010-2963 and CVE-2017-17712, Popov said; he pointed to a writeup by Kees Cook as a good description of how CVE-2010-2963 can be exploited. (Popov's slides [PDF] also provide details of how these types of vulnerabilities can be exploited.)
One important limitation of the STACKLEAK stack-poisoning mitigation, he said, is that it only works for multi-system-call attacks; since the poisoning is done at the end of system calls, it cannot protect against attacks that complete during a single system call.
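The poisoning behavior and its single-system-call limitation can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the `model_stack` structure, the `model_*` function names, and the 16-word stack size are all hypothetical, and only the STACKLEAK_POISON value is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Poison value quoted in the article. */
#define STACKLEAK_POISON (-0xBEEF)

/* Hypothetical model: a tiny "kernel stack" of machine words. */
#define STACK_WORDS 16

struct model_stack {
    unsigned long words[STACK_WORDS];
    size_t lowest_used;   /* lowest index written since the last erase */
};

/* Start with an empty, fully poisoned stack. */
void model_init(struct model_stack *s)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        s->words[i] = (unsigned long)STACKLEAK_POISON;
    s->lowest_used = STACK_WORDS;
}

/* Model "system call" that leaves a value behind in a stack slot. */
void model_syscall_store(struct model_stack *s, size_t slot,
                         unsigned long value)
{
    assert(slot < STACK_WORDS);
    s->words[slot] = value;
    if (slot < s->lowest_used)
        s->lowest_used = slot;
}

/* Model "system call" that reads a slot it never initialized. */
unsigned long model_syscall_read_uninit(const struct model_stack *s,
                                        size_t slot)
{
    assert(slot < STACK_WORDS);
    return s->words[slot];
}

/* Erase step at the end of a system call: poison the used portion. */
void model_erase(struct model_stack *s)
{
    for (size_t i = s->lowest_used; i < STACK_WORDS; i++)
        s->words[i] = (unsigned long)STACKLEAK_POISON;
    s->lowest_used = STACK_WORDS;
}
```

In this model, a read that happens before `model_erase()` runs still sees the stored value, which corresponds to an attack that completes within a single system call; a read after the erase sees only the poison value.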
The third piece is kernel stack overflow detection at runtime. This will guard against problems like Stack Clash, but it requires some other kernel features: virtually mapped kernel stacks (CONFIG_VMAP_STACK) and moving thread_info into task_struct (CONFIG_THREAD_INFO_IN_TASK). Stack Clash is an old bug; it was first described in 2005, incorrectly fixed in 2010, and raised again by Qualys in 2017 (which is where the name "Stack Clash" came from). Popov pointed to a 2017 grsecurity blog post that gives the history and also describes how the PaX STACKLEAK feature would guard against the problem.
In order to stop stack overflows, STACKLEAK checks the allocation size for each alloca() call—generated by the compiler to support variable-length arrays (VLAs)—in the kernel at runtime to see if it will overrun the stack. That is done with a plugin to GCC. But the alloca() check was not well-received by Linus Torvalds.
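The kind of size check described here can be sketched in plain C. This is an assumption-laden illustration rather than the plugin's actual logic: `alloca_fits()` and `stack_room()` are hypothetical names, the stack is assumed to grow downward toward `stack_low`, and the caller decides what to do when the check fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes left between the stack pointer and the lowest valid address. */
static size_t stack_room(uintptr_t sp, uintptr_t stack_low)
{
    assert(sp >= stack_low);
    return (size_t)(sp - stack_low);
}

/*
 * Return true if an allocation of `size` bytes below `sp` stays inside
 * the stack. Comparing against the remaining room, rather than computing
 * sp - size, avoids unsigned wraparound for very large sizes.
 */
bool alloca_fits(uintptr_t sp, uintptr_t stack_low, size_t size)
{
    if (sp < stack_low)
        return false;   /* already outside the stack */
    return size <= stack_room(sp, stack_low);
}
```

A caller would test `alloca_fits()` before moving the stack pointer; returning a boolean leaves the failure policy to the caller instead of building it into the check.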
Popov said that there is a cost, of course, to the STACKLEAK feature. It is, not surprisingly, highly workload dependent. The time-honored kernel-build benchmark showed a 0.85% performance degradation, but a hackbench run was 4.3% slower. The former may be acceptable, while the latter likely is not. He has added the STACKLEAK_METRICS option to help potential users evaluate the performance penalty on their workloads.
The current STACKLEAK patch set consists of two parts. There is the code that erases the part of the kernel thread stack that has been used, which runs at the end of system calls, and there is a GCC plugin that tracks the deepest point of the stack, so that the erase functionality covers everything that has been used. The part of the GCC plugin that does the alloca() checking has been removed because it is "hated by Linus".
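How tracking the deepest point of the stack bounds the erase work can be shown with a short model. It is a sketch under stated assumptions: `task_model`, `track_stack()`, and `bytes_to_erase()` are hypothetical names, the stack is assumed to grow downward from `stack_top`, and the calls to `track_stack()` stand in for whatever instrumentation the plugin inserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping for the model. */
struct task_model {
    uintptr_t stack_top;   /* highest address; the stack grows down */
    uintptr_t lowest_sp;   /* deepest stack pointer seen so far */
};

/* Forget the recorded depth, as would happen after an erase. */
void task_model_reset(struct task_model *t)
{
    t->lowest_sp = t->stack_top;
}

/* Stand-in for an instrumented function entry: record a deeper sp. */
void track_stack(struct task_model *t, uintptr_t sp)
{
    assert(sp <= t->stack_top);
    if (sp < t->lowest_sp)
        t->lowest_sp = sp;
}

/* Size of the region the erase step has to overwrite. */
size_t bytes_to_erase(const struct task_model *t)
{
    return (size_t)(t->stack_top - t->lowest_sp);
}
```

Because only the minimum is kept, the erase step covers the deepest excursion made during the system call, even when later calls use less stack.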
Upstreaming timeline
The timeline for STACKLEAK in the mainline begins in April 2017 when grsecurity decided to start releasing its patches only to its customers. Shortly thereafter, Popov decided to work on upstreaming STACKLEAK, an effort that had been started by Tycho Andersen. As he posted the first versions, he was learning about the patch set; he started by looking into the assembly language stack-clearing code. In June, Stack Clash was announced and grsecurity put out its blog post that "trolled" his efforts as simply a copy and paste of the PaX feature without understanding it.
Popov had documented what he still needed to learn in the to-do list on his patches; next up was the GCC plugin, where he found and fixed a few bugs. As time went on, he dug into other pieces of STACKLEAK, found and fixed other problems, and so on. There are multiple ways to get from kernel space to user space at the end of a system call. He tracked all of those down and found a place where stack erasing had been missed, for example.
In January 2018, he was alerted to the page-table isolation (PTI) patches, so he rebased on top of those and, once Meltdown and Spectre were announced, changed STACKLEAK to deal with return trampolines (retpolines). At the time, "I felt like I was in the middle of this hurricane" with everything that was going on in the kernel; "it was very impressive".
With version 9, he thought STACKLEAK was ready to be merged, but that was when the patches got "burned by Linus". The stack-clearing feature did not pass muster with Torvalds and he said so in no uncertain terms. There were lots of "angry words" in Torvalds's responses, Popov said. But, as part of the exchange, Torvalds did say that variable-length arrays should be removed from the kernel; that started the process of removing them, which has made good progress, but is still ongoing.
That interaction left Popov "emotionally dead for several weeks". His wife suggested that he go back to the replies and try to extract the technical complaints from them. The main complaint was that the stack-clearing code was written in assembler, so he started looking to write that part in C. That was difficult to do because it is tricky to make GCC emit code that is similar to hand-written assembly code. But he got it to work and posted v10 of the patch set, which Brad Spengler of grsecurity called the "Stockholm syndrome patch series", Popov said.
Several more versions were released in the months since March and, with version 14, he once again thought it was ready to be merged. But the pull request for 4.19 was "burned by Linus a second time". In his rejection, Torvalds complained about the use of BUG_ON() in the alloca() checking and the stack-erasing code. He had several strongly worded responses in that thread. Popov once again extracted the technical objections from the angry words to produce another version that is targeted for the next version of Linux (4.20 or, perhaps, 5.0).
STACKLEAK changes
Along the way, he has made multiple changes to the original STACKLEAK feature. Bugs in the GCC plugin have been fixed, some assertions in the stack tracking and alloca() checks were wrong and have been corrected, and STACKLEAK was missing places where the stack needed erasing, which have been added. There was a lot of refactoring as well. Popov extracted the common parts (including the C rewrite of the stack-erasing code) to make it easier to port STACKLEAK to new platforms. The initial version was far from the "usual requirements" for upstream inclusion, in terms of documentation and code style, but he has cleaned that all up.
There is also new functionality that has been added to the original feature. Trampoline stack support has been added for x86_64, for example. He and Andersen put together some "nice tests" to go with the feature. Laura Abbott added support for arm64 and worked with him on GCC 8 support for the plugin. Two features that were requested by Ingo Molnar have been added: the metrics feature that tracks stack usage in system calls to help performance evaluations and a way to disable STACKLEAK at runtime, which Popov said he was opposed to, but Molnar insisted on. The runtime disable is only available if it is configured into the kernel, which it is not by default; that was the compromise that he and Molnar found, Popov said.
There is functionality that has been dropped from the PaX STACKLEAK feature as well. The erroneous assertions in the stack-tracking code are gone as he noted earlier. There was code to do stack erasing at the beginning of system calls after calls like ptrace() and seccomp(), but that got dropped early on due to complaints from Torvalds. Most recently, the alloca() checking has been dropped since it is believed that all VLAs are on their way out of the kernel (and -Wvla will be enabled so none creep back in), though that job has not been completed yet. In addition, it is now abundantly clear that BUG_ON() is completely prohibited for hardening patches.
Popov noted that Spengler has said that upstream security developers often do not understand the code they have copy-pasted from grsecurity. "I am sure that is not applicable to the STACKLEAK upstreaming efforts", Popov said.
He went on to explain what he meant by "burned by Linus" in his talk and on his slides. There is strong language in some of Torvalds's replies, "even swearing, which I don't quote", mixed with the technical objections. So people need to put their emotions aside and try to extract the actual complaints from these kinds of messages. Torvalds will also NAK patches without even looking at them, which is difficult to handle, Popov said. It makes him wonder if Torvalds is by default irritated by the kernel hardening initiatives.
Popov said that all of that "kills my motivation" to work on Linux. It remains to be seen if STACKLEAK will get merged or if all of his efforts have simply been like those of Sisyphus. In conclusion, Popov said, the attendees represent the Linux kernel community, which is responsible for all of the different systems that run "our favorite operating system". He suggested putting more effort toward kernel security features so that those efforts cannot be ignored.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend the Linux Security Summit in Vancouver.]