Trying to get STACKLEAK into the kernel

By Jake Edge
September 12, 2018

The STACKLEAK kernel security feature has been in the works for quite some time now, but has not, as yet, made its way into the mainline. That is not for lack of trying, as Alexander Popov has posted 15 separate versions of the patch set since May 2017. He described STACKLEAK and its tortuous path toward the mainline in a talk [YouTube video] at the 2018 Linux Security Summit.

STACKLEAK is "an awesome security feature" that was originally developed by The PaX Team as part of the PaX/grsecurity patches. The last public version of the patch set was released in April 2017 for the 4.9 kernel. Popov set himself the goal of getting STACKLEAK into the kernel shortly after that; he thanked both his employer (Positive Technologies) and his family for giving him working and free time to push STACKLEAK.

The first step was to extract STACKLEAK from the more than 200K lines of code in the grsecurity/PaX patch set. He then "carefully learned" about the patch and what it does "bit by bit". He followed the usual path: post the patch, get feedback, update the patch based on the feedback, and then post it again. He has posted 15 versions and "it is still in progress", he said.

Vulnerability types

There are three kinds of vulnerabilities that STACKLEAK is meant to defend against. The first is information disclosure that can come from leaving data on the stack that can be exfiltrated to user space. To combat that, STACKLEAK overwrites the used portion of the kernel stack with STACKLEAK_POISON (-0xBEEF) values at the end of each system call. After that, there is no lingering, potentially sensitive data on the kernel stack to be copied.
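The erase step can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the toy_stack structure, its size, and all toy_* function names are hypothetical, and only the STACKLEAK_POISON value comes from the talk.

```c
#include <assert.h>
#include <stddef.h>

#define STACKLEAK_POISON ((unsigned long)-0xBEEF) /* value named in the talk */
#define STACK_WORDS 64                            /* hypothetical toy stack size */

/* Toy model: the stack grows down from index STACK_WORDS - 1 toward 0. */
struct toy_stack {
    unsigned long words[STACK_WORDS];
    size_t lowest_used; /* lowest index written since the last erase */
};

/* Start with a fully poisoned stack and nothing marked as used. */
static void toy_stack_init(struct toy_stack *s)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        s->words[i] = STACKLEAK_POISON;
    s->lowest_used = STACK_WORDS;
}

/* Simulate a system call that leaves `depth` words of data behind. */
static void toy_syscall(struct toy_stack *s, size_t depth, unsigned long secret)
{
    assert(depth <= STACK_WORDS);
    for (size_t i = STACK_WORDS - depth; i < STACK_WORDS; i++)
        s->words[i] = secret;
    if (STACK_WORDS - depth < s->lowest_used)
        s->lowest_used = STACK_WORDS - depth;
}

/* Overwrite the used portion with poison; meant to run when a call ends. */
static void toy_stack_erase(struct toy_stack *s)
{
    for (size_t i = s->lowest_used; i < STACK_WORDS; i++)
        s->words[i] = STACKLEAK_POISON;
    s->lowest_used = STACK_WORDS;
}

/* Count words that still hold something other than poison. */
static size_t toy_stack_residue(const struct toy_stack *s)
{
    size_t n = 0;
    for (size_t i = 0; i < STACK_WORDS; i++)
        if (s->words[i] != STACKLEAK_POISON)
            n++;
    return n;
}
```

Calling toy_syscall() and then toy_stack_erase() leaves toy_stack_residue() at zero, which is the property the mitigation relies on: a later call finds only the known poison value in the region an earlier call used.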

The second STACKLEAK feature is closely related. It targets uninitialized variables on the kernel stack with the same mitigation: writing STACKLEAK_POISON to the stack after every system call. That way, the contents of uninitialized automatic variables will not be whatever was written to that stack location before, but will instead be a known value. That would have blocked CVE-2010-2963 and CVE-2017-17712, Popov said; he pointed to a writeup by Kees Cook as a good description of how CVE-2010-2963 can be exploited. (Popov's slides [PDF] also provide details of how these types of vulnerabilities can be exploited.)

One important limitation of the STACKLEAK stack-poisoning mitigation, he said, is that it only works for multi-system-call attacks; since the poisoning is done at the end of system calls, it cannot protect against attacks that complete during a single system call.

The third piece is kernel stack overflow detection at runtime. This will guard against problems like Stack Clash, but it requires some other kernel features: virtually mapped kernel stacks (CONFIG_VMAP_STACK) and moving thread_info into task_struct (CONFIG_THREAD_INFO_IN_TASK). Stack Clash is an old bug; it was first described in 2005, incorrectly fixed in 2010, and raised again by Qualys in 2017 (where the name "Stack Clash" came about). Popov pointed to a 2017 grsecurity blog post that gives the history and also describes how the PaX STACKLEAK feature would guard against the problem.

In order to stop stack overflows, STACKLEAK checks the allocation size for each alloca() call—generated by the compiler to support variable-length arrays (VLAs)—in the kernel at runtime to see if it will overrun the stack. That is done with a plugin to GCC. But the alloca() check was not well-received by Linus Torvalds.
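The kind of test such a check performs can be sketched as a bounds comparison. The following is a hypothetical illustration, not the plugin's or the kernel's actual code: the function name alloca_fits() and its parameters are assumptions, and it presumes a stack that grows downward from the stack pointer toward a known base address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: would carving `size` bytes out of the stack
 * below `sp` still stay strictly above `stack_base`?
 * Assumes the stack grows downward, so the free space is sp - stack_base.
 */
static bool alloca_fits(uintptr_t sp, uintptr_t stack_base, size_t size)
{
    if (sp < stack_base)
        return false; /* the stack pointer is already out of bounds */
    return size < sp - stack_base;
}
```

A real implementation has to decide what happens when the check fails; as described later in the article, the use of BUG_ON() for that purpose was one of the objections raised against the patch set.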

Popov said that there is a cost, of course, to the STACKLEAK feature. It is, not surprisingly, highly workload dependent. The time-honored kernel-build benchmark showed a 0.85% performance degradation, but a hackbench run was 4.3% slower. The former may be acceptable, while the latter likely is not. He has added the STACKLEAK_METRICS option to help potential users evaluate the performance penalty on their workloads.

The current STACKLEAK patch set consists of two parts. There is the code that erases the part of the kernel thread stack that has been used, which runs at the end of system calls, and there is a GCC plugin that tracks the deepest point of the stack, so that the erase functionality covers everything that has been used. The part of the GCC plugin that does the alloca() checking has been removed because it is "hated by Linus".
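The tracking half can be pictured as a call placed at function entry that records how deep the stack has reached. The sketch below is a user-space approximation under several assumptions: track_stack(), nested_work(), and lowest_seen are hypothetical names, the address of a local variable stands in for the stack pointer, and the stack is assumed to grow toward lower addresses.

```c
#include <stdint.h>

/* Lowest stack address observed so far (running minimum). */
static uintptr_t lowest_seen = UINTPTR_MAX;

/* Stand-in for the tracking call that instrumentation would insert. */
static void track_stack(void)
{
    volatile char marker = 0;
    uintptr_t sp = (uintptr_t)&marker; /* approximates the stack pointer */

    if (sp < lowest_seen)
        lowest_seen = sp;
}

/* A function with its own frame that records its depth on entry. */
static unsigned int nested_work(unsigned int depth)
{
    volatile char frame[128];

    track_stack(); /* the call a plugin would add at function entry */
    frame[0] = (char)depth;
    if (depth == 0)
        return (unsigned int)frame[0];
    return nested_work(depth - 1) + 1;
}
```

With a record like lowest_seen available, an erase routine only needs to poison the range between that address and the top of the stack, rather than the whole stack, on every system call.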

Upstreaming timeline

The timeline for STACKLEAK in the mainline begins in April 2017 when grsecurity decided to start releasing its patches only to its customers. Shortly thereafter, he decided to work on upstreaming STACKLEAK, an effort that had been started by Tycho Andersen. As he posted the first versions, he was learning about the patch set; he started by looking into the assembly language stack-clearing code. In June, Stack Clash was announced and grsecurity put out its blog post that "trolled" his efforts as simply a copy and paste of the PaX feature without understanding it.

Popov had documented what he still needed to learn in the to-do list on his patches; next up was the GCC plugin, where he found and fixed a few bugs. As time went on, he dug into other pieces of STACKLEAK, found and fixed other problems, and so on. There are multiple ways to get from kernel space to user space at the end of a system call. He tracked all of those down and found a place where stack erasing had been missed, for example.

In January 2018, he was alerted to the page-table isolation (PTI) patches, so he rebased on top of those and, once Meltdown and Spectre were announced, changed STACKLEAK to deal with return trampolines. At the time, "I felt like I was in the middle of this hurricane" with everything that was going on in the kernel; "it was very impressive".

With version 9, he thought STACKLEAK was ready to be merged, but that was when the patches got "burned by Linus". The stack-clearing feature did not pass muster with Torvalds and he said so in no uncertain terms. There were lots of "angry words" in Torvalds's responses, Popov said. But, as part of the exchange, Torvalds did say that variable-length arrays should be removed from the kernel; that started the process of removing them, which has made good progress, but is still ongoing.

That interaction left Popov "emotionally dead for several weeks". His wife suggested that he go back to the replies and try to extract the technical complaints from them. The main complaint was that the stack-clearing code was written in assembler, so he started looking to write that part in C. That was difficult to do because it is tricky to make GCC emit code that is similar to hand-written assembly code. But he got it to work and posted v10 of the patch set, which Brad Spengler of grsecurity called the "Stockholm syndrome patch series", Popov said.

Several more versions were released in the months since March and, with version 14, he once again thought it was ready to be merged. But the pull request for 4.19 was "burned by Linus a second time". In his rejection, Torvalds complained about the use of BUG_ON() in the alloca() checking and the stack-erasing code. He had several strongly worded responses in that thread. Popov once again extracted the technical objections from the angry words to produce another version that is targeted for the next version of Linux (4.20 or, perhaps, 5.0).

STACKLEAK changes

Along the way, he has made multiple changes to the original STACKLEAK feature. Bugs in the GCC plugin have been fixed, some assertions in the stack tracking and alloca() checks were wrong and have been corrected, and STACKLEAK was missing places where the stack needed erasing, which have been added. There was a lot of refactoring as well. Popov extracted the common parts (including the C rewrite of the stack-erasing code) to make it easier to port STACKLEAK to new platforms. The initial version was far from the "usual requirements" for upstream inclusion, in terms of documentation and code style, but he has cleaned that all up.

There is also new functionality that has been added to the original feature. Trampoline stack support has been added for x86_64, for example. He and Andersen put together some "nice tests" to go with the feature. Laura Abbott added support for arm64 and worked with him on GCC 8 support for the plugin. Two features that were requested by Ingo Molnar have been added: the metrics feature that tracks stack usage in system calls to help performance evaluations and a way to disable STACKLEAK at runtime, which Popov said he was opposed to, but Molnar insisted on. The runtime disable is only available if it is configured into the kernel, which it is not by default; that was the compromise that he and Molnar found, Popov said.

There is functionality that has been dropped from the PaX STACKLEAK feature as well. The erroneous assertions in the stack-tracking code are gone as he noted earlier. There was code to do stack erasing at the beginning of system calls after calls like ptrace() and seccomp(), but that got dropped early on due to complaints from Torvalds. Most recently, the alloca() checking has been dropped since it is believed that all VLAs are on their way out of the kernel (and -Wvla will be enabled so none creep back in), though that job has not been completed yet. In addition, it is now abundantly clear that BUG_ON() is completely prohibited for hardening patches.
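One common shape of VLA removal is to replace the runtime-sized array with a fixed-size one plus an explicit length check. The example below is a generic sketch, not taken from any kernel patch: SCRATCH_MAX and sum_bytes_bounded() are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define SCRATCH_MAX 32 /* hypothetical compile-time upper bound */

/*
 * The VLA form that -Wvla would flag looks like:
 *     unsigned char scratch[len];
 * The bounded form below uses a fixed-size array and rejects
 * requests that exceed it.
 */
static int sum_bytes_bounded(const unsigned char *src, size_t len,
                             unsigned int *sum)
{
    unsigned char scratch[SCRATCH_MAX];

    if (len > SCRATCH_MAX)
        return -1; /* caller asked for more than the fixed buffer holds */

    memcpy(scratch, src, len);
    *sum = 0;
    for (size_t i = 0; i < len; i++)
        *sum += scratch[i];
    return 0;
}
```

The stack usage of the bounded form is known at compile time, which is what makes a runtime alloca() check unnecessary; the cost is that a sensible maximum has to be chosen for each converted call site.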

Popov noted that Spengler has said that upstream security developers often do not understand the code they have copy-pasted from grsecurity. "I am sure that is not applicable to the STACKLEAK upstreaming efforts", Popov said.

He went on to explain what he meant by "burned by Linus" in his talk and on his slides. There is strong language in some of Torvalds's replies, "even swearing, which I don't quote" mixed with the technical objections. So people need to put their emotions aside and try to extract the actual complaints from these kinds of messages. Torvalds will also NAK patches without even looking at them, which is difficult to handle, Popov said. It makes him wonder if Torvalds is by default irritated by the kernel hardening initiatives.

Popov said that all of that "kills my motivation" to work on Linux. It remains to be seen if STACKLEAK will get merged or if all of his efforts have simply been like those of Sisyphus. In conclusion, Popov said, the attendees represent the Linux kernel community, which is responsible for all of the different systems that run "our favorite operating system". He suggested putting more effort toward kernel security features so that those efforts cannot be ignored.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend the Linux Security Summit in Vancouver.]

Index entries for this article
Kernel	Security/Kernel hardening
Security	Linux kernel/Hardening
Conference	Linux Security Summit North America/2018


Trying to get STACKLEAK into the kernel

Posted Sep 12, 2018 22:53 UTC (Wed) by roc (subscriber, #30627) [Link] (16 responses)

I don't understand why Cook and Popov and others keep putting up with Linus' abusive behaviour. There are plenty of other interesting projects with non-toxic communities. I suppose in Cook's case Google makes it worth his while.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 16:48 UTC (Thu) by flussence (guest, #85566) [Link] (5 responses)

Linus' abusive behaviour is unnecessary theatrics, a curable condition. I hope one day it is so, rehabilitation is better than isolation.

grsecurity's abusive behaviour on the other hand is sincere, unwavering, and part of their business model.

I see this work, and similar efforts, as a long term project to extinguish the latter. With no more unique product to sell and a bit of patience, the company will perish and we'll suffer no more of them; they bring no skills to the table besides public trolling of anyone in the same field as them (= burnout, less security work being done, more for them to charge for), and code that takes a herculean effort every time to be hammered into fit-for-upstream condition (I don't think Hanlon's Razor applies here).

It seems to be worth money to some companies, and it's a lot more bang for the buck in that regard than trying to kick Linus out.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 20:09 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

> Linus' abusive behaviour is unnecessary theatrics, a curable condition. I hope one day it is so, rehabilitation is better than isolation.

I agree, but he needs help. It's unclear from the outside whether anyone he respects is willing to call him on it.

Maybe the LWN editors? They surely appreciate the problem, given that if someone repeated Linus' behaviour in these LWN comments, they'd be banned.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 17:13 UTC (Fri) by nix (subscriber, #2304) [Link]

I doubt that. People have been much more offensive than Linus without being banned here. It *is* possible to be so offensive you get banned, but it's hard. (Linus levels of offensiveness would probably only get a warning.)

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 22:29 UTC (Thu) by sjfriedl (✭ supporter ✭, #10111) [Link] (1 responses)

> grsecurity's abusive behaviour on the other hand is sincere, unwavering, and part of their business model.

Not having any of the history, I'm trying to figure out what led the universe here. Everything I've seen about grsecurity is that it's real-deal security, but my review has been extremely superficial (and I'm not a user).

Is this a case of some really smart people doing yeoman security work on the kernel, but nobody wants to pay for security, so they react badly when their business model doesn't pan out?

Or is this something else?

I really don't know the answer (and I have no dog in this fight).

Trying to get STACKLEAK into the kernel

Posted Sep 15, 2018 12:40 UTC (Sat) by flussence (guest, #85566) [Link]

The grsec code does add some security from what I've observed second-hand, but it goes about it like windows antivirus vendors: sticking hooks in random places and doing many things to cause userspace breakage (one of the reasons it never got upstreamed) and worse, kernel breakage (a normal user should never be able to hang the kernel with `cat`).

The way they respond to criticism of those things, you'd think they were malware authors.

Trying to get STACKLEAK into the kernel

Posted Sep 17, 2018 23:41 UTC (Mon) by ThinkRob (guest, #64513) [Link]

> Linus' abusive behaviour is unnecessary theatrics, a curable condition. I hope one day it is so, rehabilitation is better than isolation.

I think part of why Linus doesn't get called out on it as much is that it's not always obvious from the various quotes and excerpts that make it into the mainstream trade press. (And they certainly make for attention-getting headlines, so I get why they're reprinted.) Before I followed kernel dev that closely I would have been surprised if this were the case. But after having gotten more into kernel dev in the last year or two now, it's obvious that yeah, it is. Why? Because he often leads in to a rejection with a flame, but then [frequently] has solid technical critiques... later on in the thread.

There's some argument to be made that "you have to be blunt or exceptions start getting made". And I get, and am even sympathetic to that... to some degree. But that doesn't mean that you have to 1) go in all-guns-blazing every single time 2) make the attacks personal. You can still have an honest response to a bad patch that rips the code apart [1] and that doesn't come off as condescending or malicious or personal, but far too often I've read flames from him that make it sound like he dislikes the coder rather than the code. And that makes it a problem, whether that's his intent or not.

He doesn't even have to stop calling bad code bad! We've probably all ranted to a friend or coworker about some busted, oddball library we were forced to use. And most of us probably get a kick out of things like that infamous rant on the PSD file format or some of JWZ's musings on "the Xwindows disaster". But this isn't the same, because Linus's screeds often aren't aimed at the code, they're about people. And that takes it from a potential solution to a technical problem to a serious problem for people trying to write technical solutions.

[1] although whether or not such a blunt response is necessary or appropriate is situation dependent

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 0:23 UTC (Fri) by curtis3389 (guest, #127185) [Link] (9 responses)

> There are plenty of other interesting projects with non-toxic communities.

This struck me.

Is there a consensus that the Linux community is toxic?

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 0:50 UTC (Fri) by roc (subscriber, #30627) [Link] (1 responses)

I assume a lot of developers in the kernel community itself would disagree, though selection effects must partially explain that.

But I don't think there are many other open-source projects where Linus' behaviour would be tolerated. In that sense, I think there is a consensus.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 9:36 UTC (Fri) by blackwood (guest, #44174) [Link]

Linus and Linux are fairly regularly used in keynotes as _the_ example of a toxic/dysfunctional community. e.g. rust just recently:

https://www.youtube.com/watch?v=J9OFQm8Qf1I&feature=y...

(jumps directly to the right spot)

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 6:00 UTC (Fri) by seyman (subscriber, #1172) [Link]

> Is there a consensus that the Linux community is toxic?

I believe there's a consensus that a number of former kernel devs are now working on other projects because they did not appreciate the Linux community mindset. There's also consensus that a number of people have chosen to not get involved with kernel development in the first place for the same reason.

As roc said, the kernel community itself probably disagrees and I suspect only legal action will change their minds.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 7:21 UTC (Fri) by lkundrak (subscriber, #43452) [Link] (5 responses)

> Is there a consensus that the Linux community is toxic?

What would justify calling the whole community toxic? Is a couple of individuals who occasionally say something that insults someone else's feelings sufficient?

If so, then any sufficiently large community is guaranteed to get toxic.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 8:34 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

It's the tolerance of this behavior that makes it toxic.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 12:35 UTC (Fri) by deater (subscriber, #11746) [Link] (1 responses)

I would argue that Linux devel has become toxic, but for the complete opposite reason that you think.

Linux devel used to be fun, exciting, risky, interesting. But the big push has come in to make it corporate. And the more corporate it becomes, it's mostly now indistinguishable from coding for IBM or Microsoft or similar.

So despite years and years of being a committed Linux user and developer, I find myself caring less and less. Because with the limited free time I have to code, why volunteer that time to a project that has become bland and boring?

But anyway, feel free to keep up your push to stamp out what little flame there is left in the community.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 13:40 UTC (Fri) by excors (subscriber, #95769) [Link]

Perhaps the issue is that Linux's original culture has become a victim of its own success. Nowadays Linux is so big and so important that it would be irresponsible to leave its development to just a bunch of hackers having fun. Maintaining it and extending it requires thousands of developers working together, because of the sheer amount of work involved, and getting that many people to cooperate productively requires boring management and professional communication etc to minimise interpersonal conflicts - the kind of culture that has helped Microsoft and IBM successfully develop software over decades with thousands of developers. It's not fun or efficient, but it works.

If you want to work in a culture like Linux had when it was small and unimportant, there are plenty of other projects that are small and unimportant that you could work on. And hopefully you will help those to become large and successful and boring, and then can move on to another one.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 11:35 UTC (Fri) by roc (subscriber, #30627) [Link] (1 responses)

It's disingenuous to refer to Linus as just "an individual" in the community.

As Cyberax says, it's the toleration, defense and even sometimes celebration of this behaviour that is toxic.

individuals, and robustness in the face of extreme diversity thereof

Posted Sep 24, 2018 5:18 UTC (Mon) by Garak (guest, #99377) [Link]

> It's disingenuous to refer to Linus as just "an individual" in the community.

lkundrak was clearly enough describing a general logical proof involving sets and individuals. Extrapolating the situation to the general, not referring to the specific instance. Thus not disingenuous.

I too think that a significant aspect of this is the 'specialness' of the 'supreme leader'(core trademark holder). At the end of the day, anyone is free to fork linux, call it anything else, and do whatever they want. It's not like anything Linus Torvalds does or does not do interferes with anyone's freedom to do that. Given that dynamic (the core dynamic of FOSS), this doesn't seem very important to me (given the context of billions if not trillions of processors out there running 'linux' and keeping the world turning)

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 6:01 UTC (Thu) by Lionel_Debroux (subscriber, #30014) [Link] (10 responses)

Here's what spender has to say about this article:
https://twitter.com/grsecurity/status/1039987873683070977
https://twitter.com/grsecurity/status/1039988230811250689
https://twitter.com/grsecurity/status/1039988817745375233

'Some false quotes from LWN's regurgibloid article on STACKLEAK: "some assertions in the stack tracking and alloca() checks were wrong and have been corrected" / "STACKLEAK was missing places where the stack needed erasing"
In fact, the current upstream-proposed STACKLEAK is weaker in a number of areas where it matters, but LWN will never report that because they need it on some public mailing list and written by an upstream developer they can copy+paste their uncritical articles from
(It's also slower for reasons that serve no security purpose at all, and their manual VLA removal has resulted in slower/buggier code in general -- what's faster, a simple check inserted by the compiler to make sure a VLA use is safe, or a whole kmalloc/kfree in a function?)'

Since two persons are asserting opposite things about a matter, we know that at least one of them is lying, whether voluntarily or not.
Given the stellar track record of spender and PaXTeam on:
* creating quality defenses and being able to explain their tradeoffs and why they work: KERNEXEC, MEMORY_UDEREF, CONSTIFY, RANDSTRUCT, RAP (none of which mainline Linux has at a breadth that resembles grsecurity's, or at all, despite their protective ability), etc.;
* criticizing / not implementing poor defenses: KASLR, Intel's very weak CFI called CET, etc.
* IME, usually thanking / pointing when their own mistakes are found: for instance, some bugs I reported in grsecurity over time, such as a double stable patch security backport; more recently, spender pointing that he found that his initial thought about Meltdown being already inexploitable on properly configured grsec kernels was wrong, although the standard exploit which tripped other kernels was indeed blocked
their vision of Linux security is more effective, and they usually warrant more trust wrt. statements related to Linux security, than basically anyone else. Even if they can certainly make mistakes, like everyone else.

Mainline Linux security is a lost cause... even if a semi-official curated hardened Linux tree, like strcat's, were to be created, and maintained in a sustainable way over the long term, as a way to widen testing of patches in the real world before they hit mainline.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 6:38 UTC (Thu) by k8to (guest, #15413) [Link]

Please aim higher.

This presentation does little to convince any who are not already convinced. If you don't care about that goal, then I do not see the point of this at all.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 12:50 UTC (Thu) by jkingweb (subscriber, #113039) [Link] (7 responses)

I'm mystified as to why grsecurity is attacking LWN for the patch's supposed failings. Is Jake supposed to know the patch better than its author does? Where's the hostility coming from?

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 13:14 UTC (Thu) by sjfriedl (✭ supporter ✭, #10111) [Link]

> [jkingweb] Where's the hostility coming from?

No kidding. I knew nothing about this issue before this article, and the responses don't give a very good impression.

> [Lionel_Debroux] Since two persons are asserting opposite things about a matter, we know that at least one of them is lying, whether voluntarily or not.

Uh, "involuntarily lying"? Maybe an alternate explanation is that one of them is merely mistaken?

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 18:15 UTC (Thu) by seyman (subscriber, #1172) [Link]

> I'm mystified as to why grsecurity is attacking LWN for the patch's supposed failings.

I share that sentiment. One of the tweets pointed out by Lionel criticizes LWN for requiring quoted information to be public but I view that as a good thing.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 19:53 UTC (Thu) by josh (subscriber, #17465) [Link] (4 responses)

Once the community no longer supports you and the current state of the world no longer supports your business model, and you start building your business on FUD, then good reporting, facts, and sunlight become your enemy. (Sunlight is the best disinfectant, which is good news unless you're building your business around the infection.) grsecurity is pretty much SCO at this point, lawsuits included.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 22:14 UTC (Thu) by seyman (subscriber, #1172) [Link]

> grsecurity is pretty much SCO at this point, lawsuits included.

Speaking of which, did grsecurity ever refile the defamation suit they had going against Bruce Perens? Last I heard, the case had been thrown out by the judge.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 10:23 UTC (Fri) by Lionel_Debroux (subscriber, #30014) [Link] (2 responses)

As long as the PaX and grsecurity patchsets were publicly downloadable at no extra cost (beyond some form of Internet connection, that is), until April 2017:

* about community support: they did have a loyal following of users aware of the insecurity of mainline kernels, and willing to go through limited extra pain (especially after multiple large distros gained linux-grsec packages, at the cost of slightly reduced security because it nullifies RANDSTRUCT) to use more secure kernels than the default.
Unfortunately, too many people trust words by Linus and friends more than words by the real security experts PaXTeam and spender. Many people - including subsystem and driver maintainers - didn't even bother to dig deeper into the huge security benefits of PaX / grsecurity + the hundreds of small, scattered bugfixes relevant to subsystem maintainers + the many additional stable backports (see Twitter thread about hundreds more patches being backported in grsecurity than in mainline) relevant to, well, pretty much everyone not trying to run mainline kernels all the time, i.e. lots of users. Instead of making their informed opinion by themselves, some of these people based their decisions on hearsay...
The insecurity of mainline kernels is technically alleviable, as shown by PaX / grsecurity, but politically unfixable as shown by Linus rejecting some useful features and watering others down - as reported in this article and other earlier articles.
* about supporting their business model: I can agree with that part of your post. It's a fact that the corporate customers, and community supporters, paying spender's company used not to be enough for PaXTeam + spender to be able to work on PaX / grsecurity near-full-time (assuming they wished to be able to do that anyway - I simply don't know). Most people just used the free version even in professional setups, few of them contributed money.
The KSPP made their business model even more unsustainable by creating more work for them by integrating buggy, watered down derivatives of outdated versions of small PaX / grsecurity subsets: PaXTeam and spender had to fix conflicts, debug issues, review mainline changes which often turned out to be more bugs than fixes they should reintegrate.
* good reporting, facts and sunlight: it's not clear to me how good reporting / facts / sunlight would be real enemies to PaX/grsecurity. If "reporters" pretend to provide good information, based on facts, and shine some sunlight, then they have no choice but point the large feature set. I found the earlier version clearer, with its single table full of ticks for grsecurity and missing features for mainline, but it was less detailed.
Maybe on the communication style front, as spender's style is known to be abrasive ? Sure, but then Linus' style is also well documented as repeatedly offensive, turning some developers away from the kernel community (see multiple posts in the sub-thread above the sub-thread I started - I'll add a mention of Sarah Sharp, who created the USB 3 stack in Linux, making Linux the first kernel with decent USB 3 support). Good reporting needs to point it too.
The defamation suit ? Indeed, that wasn't a step in the right direction, and definitely earned them negative publicity... But Bruce Perens contributing negative things against a technically unmatched product - from a relatively famous and supposedly trusted person, that other people can use to justify more FUD against grsecurity - might not have been a smart thing to do for the progress of mankind. There would have been no reason to make a suit against him without that trigger.

Now that they only provide the PaX and grsecurity patchsets behind a paywall accessible only to corporations (AFAIK):

* they have lost their community of individual users, obviously;
* however, they have gained customers (more logos on their web page, at least, but...), so the state of the world does support their business model to some extent, at least better than it used to when nearly everyone was free-riding. The defamation suit probably alienated them some potential customers, but the increased visibility might have unexpected benefits, who knows.
* no change on good reporting, facts and sunlight: they obviously need to mention the defamation suit and the abrasiveness, but they also have to mention the technical advantages of the product, the severe systemic insecurity of the product it is based on, and the toxicity of mainline development, starting with Linus' abrasiveness.

TL;DR: I tried to imagine what you meant in multiple areas, but I partially failed. Are you willing to give more details on what you meant ? :)

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 12:26 UTC (Fri) by seyman (subscriber, #1172) [Link]

> But Bruce Perens contributing negative things [...] might not have been a smart thing to do for the progress of mankind.

Bruce Perens warning people that using Grsecurity's Linux kernel security could invite legal trouble is indeed a smart thing to do, contrary to what you claim.

> There would have been no reason to make a suit against him without that trigger.

There was no reason to sue him even with that trigger, as a judge has ruled.

Trying to get STACKLEAK into the kernel

Posted Sep 15, 2018 5:40 UTC (Sat) by pabs (subscriber, #43278) [Link]

Re RANDSTRUCT, any idea if PaX/grsec folks have evaluated multicompiler?

http://ssllab.org/#multic
https://github.com/securesystemslab/multicompiler

Trying to get STACKLEAK into the kernel

Posted Sep 20, 2018 11:14 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Since two persons are asserting opposite things about a matter, we know that at least one of them is lying, whether voluntarily or not.

From personal experience I can assure you that you're wrong. And I'm pretty certain the correct conclusion in policing is that if two people assert the SAME thing about a matter, then they are probably lying (colluding).

Firstly there is the law of relativity - two observers in two different places will see the same event in two different ways.

And secondly, I have had plenty of experience of friends describing things to me - where I have the same personal experience as them - and I beg to differ strongly with what they see. That doesn't mean one of us is lying. It means one of us is *wrong*, but that's a very different matter - chances are *both* of us are wrong.

Cheers,
Wol

GCC stackleak function attribute

Posted Sep 13, 2018 6:47 UTC (Thu) by mjw (subscriber, #16740) [Link] (3 responses)

At the GNU Tools Cauldron last week there was a presentation on a similar topic: getting a new stackleak-like function attribute into GCC that would clear the stack used by a function when it returns.

This might be used instead of the gcc plugin currently used (assuming the goals are similar enough). It might be interesting to collaborate on how these kinds of functionality/security mitigations can be made more generic so they can be used across user and kernel space.

Slides and video should appear here soon: https://gcc.gnu.org/wiki/cauldron2018#Slides.2C_Videos_an...

GCC stackleak function attribute

Posted Sep 13, 2018 10:08 UTC (Thu) by mjthayer (guest, #39183) [Link]

That certainly sounds sensible. Following up on roc's comment above, might the working environment be more friendly as well? Perhaps other people working on security in the kernel should be looking at whether some of their work can be done in the toolchain instead.

GCC stackleak function attribute

Posted Sep 13, 2018 13:45 UTC (Thu) by a13xp0p0v (guest, #118926) [Link] (1 responses)

Nice! Thanks for the link.

GCC stack_erase function attribute

Posted Sep 13, 2018 13:52 UTC (Thu) by mjw (subscriber, #16740) [Link]

It looks like the video of the talk isn't there yet, but I found the slides already here: https://gmarkall.files.wordpress.com/2018/09/secure_and_g...

The suggested attribute name was actually "stack_erase".

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 12:56 UTC (Thu) by a13xp0p0v (guest, #118926) [Link] (4 responses)

Thanks for the article, Jake!
You've done a very good job since grsecurity doesn't complain about crediting them right.

Let me correct this:

> In January 2018, he was alerted to the page-table isolation (PTI) patches, so he rebased on top
> of those and, once Meltdown and Spectre were announced, changed STACKLEAK to deal with
> return trampolines (retpolines).

STACKLEAK has nothing to do with Spectre and retpolines.

The PTI patches introduced the trampoline stack on x86_64. The kernel switches to
this stack from a thread stack just before returning to userspace. However,
there are cases when we return to userspace directly from the thread stack,
without switching to this trampoline stack.

So during rebasing I adjusted the stack erasing. It detects which stack is used and:
- if it is called from the trampoline stack, erasing goes up to the thread stack top,
- if it is called from the thread stack, erasing goes up to the stack pointer.
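
As a rough illustration, the choice of the erase boundary can be modeled in user space like this. It is only a sketch with made-up names (struct thread_stack, sp_on_thread_stack(), erase_boundary()) and arbitrary addresses, not the code from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a thread stack occupying [base, base + size). */
struct thread_stack {
	uintptr_t base;
	uintptr_t size;
};

/* True if sp points into the thread stack. */
bool sp_on_thread_stack(const struct thread_stack *ts, uintptr_t sp)
{
	return sp >= ts->base && sp < ts->base + ts->size;
}

/*
 * Upper (exclusive) bound of the region to erase. The stack grows down,
 * and erasing stops at the returned address.
 */
uintptr_t erase_boundary(const struct thread_stack *ts, uintptr_t sp)
{
	if (sp_on_thread_stack(ts, sp))
		return sp;		/* live frames above sp must survive */
	return ts->base + ts->size;	/* on another stack: erase up to the top */
}

/* Self-check with arbitrary example addresses. */
void erase_boundary_demo(void)
{
	struct thread_stack ts = { .base = 0x10000, .size = 0x4000 };

	assert(erase_boundary(&ts, 0x12000) == 0x12000);
	assert(erase_boundary(&ts, 0x90000) == 0x14000);
}
```

Calling erase_boundary_demo() exercises both cases: a stack pointer inside the thread stack, and one that lies elsewhere (standing in for the trampoline stack).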

----

Now let me give the technical details in response to Brad Spengler's comments on Twitter.

First, I should say that having any technical feedback from him about my patches
is a VERY rare event. So I'm glad that he posted it, let's talk about that.

> 'Some false quotes from LWN's regurgibloid article on STACKLEAK: "some assertions
> in the stack tracking and alloca() checks were wrong and have been corrected"

1. The original assertion in track_stack() for detecting stack exhaustion
looks like this:
+ if (unlikely((sp & ~(THREAD_SIZE - 1)) < (THREAD_SIZE / 16)))
+ BUG();

After a careful look you can see that this check will never work because of the erroneous '~'.

But if we remove this '~', we run into another problem: when the kernel stack is exhausted
and this check is hit, we get a recursive BUG(), since the functions which handle
BUG() are instrumented and call track_stack() themselves.

As a result, this recursive BUG() handling hits the guard page below the thread stack or
corrupts the neighboring memory (if CONFIG_VMAP_STACK or a similar feature is disabled).

That's why I say that the original assertion in stack tracking was wrong.
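
To see why the '~' turns the check into a no-op, here is a small user-space model. The 16K THREAD_SIZE and the sample address are assumptions for illustration, and the function names are hypothetical. With '~', the expression keeps the stack's base address and throws away the offset, so it is compared against THREAD_SIZE / 16 and can never be smaller for a real kernel address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SIZE 0x4000ULL	/* assumed 16K thread stack */

/* The check as originally written: masks off the offset, keeps the base. */
bool near_bottom_with_tilde(uint64_t sp)
{
	return (sp & ~(THREAD_SIZE - 1)) < (THREAD_SIZE / 16);
}

/* Without '~': compares the offset within the stack, as intended. */
bool near_bottom_without_tilde(uint64_t sp)
{
	return (sp & (THREAD_SIZE - 1)) < (THREAD_SIZE / 16);
}

void tilde_demo(void)
{
	/* Made-up address 0x100 bytes above the bottom of an aligned stack. */
	uint64_t sp = 0xffffc90000004100ULL;

	assert(!near_bottom_with_tilde(sp));	/* base address is not below 1K */
	assert(near_bottom_without_tilde(sp));	/* 0x100 < 0x400 */
}
```

The model only covers the arithmetic; it says nothing about the recursive BUG() problem described above, which appears once the check actually fires.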

2. Now about the assertion in check_alloca().
In v4 I also fixed the surplus and erroneous code for calculating stack_left in check_alloca()
on x86_64. That code repeats the work which is already done in get_stack_info() and it
misses the fact that different exception stacks on x86_64 have different sizes.
https://www.openwall.com/lists/kernel-hardening/2017/10/0...

> "STACKLEAK was missing places where the stack needed erasing"

I added the missing erase_kstack() call in ret_from_fork() for x86_32.

Anyway, if Brad proves that I'm wrong, I'll be only too happy to learn from it.
By the way, he never thanked me for the STACKLEAK fixes which I shared with him.

> In fact, the current upstream-proposed STACKLEAK is weaker in a number
> of areas where it matters

I absolutely agree with Brad. Linus and Ingo made me do several changes
which I'm not happy about. It's the price for our compromise. In fact, Linus
doesn't want STACKLEAK at all, but I fight for it.

> (It's also slower for reasons that serve no security purpose at all,

Yes, technically that's true. However, I didn't see a noticeable difference during
performance tests. Original stack erasing in assembly language and my stack erasing
in C show the same numbers.

> and their manual VLA removal has resulted in slower/buggier code in general
> -- what's faster, a simple check inserted by the compiler to make sure a VLA use
> is safe, or a whole kmalloc/kfree in a function?)'

I don't have a strong opinion on that.
Please see Kees' talk at Linux Security Summit, which gives more details:
https://www.youtube.com/watch?v=XfNt6MsLj0E

Anyway, I don't like that Linus forced me to drop check_alloca().
Let's imagine that all VLAs are removed from the mainline kernel.
But what about VLAs in non-upstream code? STACKLEAK with check_alloca()
could protect it from Stack Clash!

----

Let's see what will happen with my patches in the next merge window.
The current v15 fits Linus' requirements.
I wish Kees Cook the best of luck negotiating with Linus.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 14:02 UTC (Thu) by jake (editor, #205) [Link]

> STACKLEAK has nothing to do with Spectre and retpolines.

ah, sorry about that ... my brain badly wanted to turn 'return trampoline' into 'retpoline' for some reason ... i have adjusted the article ...

thanks,

jake

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 14:48 UTC (Fri) by a13xp0p0v (guest, #118926) [Link] (2 responses)

Ok, Brad Spengler has replied with a very detailed post about STACKLEAK:
https://grsecurity.net/~spender/stackleak_response.txt

Cool, I appreciate that! Normally such discussions happen on the Linux Kernel
Mailing List (LKML), but in our case I have to reply here at LWN.

I also see that Brad is desperately trying to be always right. Actually he can relax:
he is a genius, he is always right, I'm not protesting :)

I just do my work: I'm pushing this feature into the mainline.

Let me reply to Brad.

============

> RE: the assertion in track_stack, the flaw in it (the added ~) was found by Tycho Andersen,
> not Alex (though it's not claimed directly, it's implied as Alex is taking credit for the STACKLEAK
> upstreaming work). This was properly credited below:
> commit 12927d314b2763dd791ef11e56c42184fba4d3f8
> Author: Brad Spengler <spender@grsecurity.net>
> Date: Tue Aug 15 07:11:47 2017 -0400
>
> Fix 32bit stackleak stack_left test present in grsec only, as spotted
> by Tycho Andersen

Oh, Brad, come on! Tycho is my good friend, he supports my upstreaming efforts very
much, I always praise his work. Yes, he was the first who spotted your mistake with '~':
https://www.openwall.com/lists/kernel-hardening/2017/08/15/1

Later I investigated the recursive BUG() problem caused by this check and finally
dropped it in v6:
https://www.openwall.com/lists/kernel-hardening/2017/12/0...

> This test however doesn't have to do with PAX_STACKLEAK as mentioned there -- you can look
> at any PaX patch and see that the test doesn't exist. What the check in grsecurity (tried) to do was
> piggy-back off being called in useful places throughout the entire kernel and in the lack of
> KSTACKOVERFLOW wanted to avoid a recursion-based stack overflow from being able to cleanly
> overwrite its intended target. In fact, the same day I made the above change I added an #ifndef
> to make this explicit:
> commit 16e1332faabc9f270fde9787ddb23e95cb2aad9c
> Author: Brad Spengler <spender@grsecurity.net>
> Date: Tue Aug 15 07:16:30 2017 -0400
>
> Make 32bit stack_left check depend on !KSTACKOVERFLOW to improve performance a bit
>
> So it is correct that there was a bug in the check I added that caused it to be a no-op, but it's
> not part of the STACKLEAK defense and I don't believe we ever advertised that particular added check.

Ok, I see. I know that it is your code, not PaX Team's.

But I didn't know that it is NOT a part of the STACKLEAK feature. That's because all we have
is a single giant grsecurity patch and NOT the git history which you quote here.
N.B. I don't blame you.

> A further claim was "STACKLEAK was missing PLACES where the stack needed erasing" (emphasis mine).
> Alex identified one location in ret_from_fork which in our 4.9 patch was missing instrumentation
> for x86_32. This instrumentation wasn't missing in the original STACKLEAK code, nor in our stable
> patches for 3.2 or 3.14. Alex is free to verify this as we have done. It was introduced during
> some upstream churn in the entry code via the following commit:
> commit 39e8701f33d65c7f51d749a5d12a1379065e0926
> Author: Andy Lutomirski <luto@kernel.org>
> Date: Mon Oct 5 17:48:13 2015 -0700
>
> x86/entry/32: Open-code return tracking from fork and kthreads
>
> This open-coding changed ret_from_fork from following a path that would perform the stack clearing
> to one that would not, and since we didn't have any comment-based guards in place there, it slipped
> our notice. As mentioned, this only affected i386, and would be rendered benign by having
> RANDKSTACK enabled (as it is by default in autoconfig) which would clear the full stack on entry to
> the following syscall (as the lowest_stack field is set to the end of the stack for the new
> process). Further, the newly-created process' stack would already be cleared in the presence of
> CONFIG_DEBUG_STACK_USAGE or in any kernel with commit e01e80634ecdde1dd113ac43b3adad21b47f3957
> "fork: unconditionally clear stack on fork". Further, it is likely (possibly guaranteed, I'd have
> to confirm this) that the presence of PAX_MEMORY_SANITIZE (which would be auto-enabled in every
> instance where STACKLEAK was auto-enabled) would ensure the newly-allocated stack would be cleared
> with the SANITIZE poison value.

Ok, so it's evil Andy Lutomirski who has broken your patch :)
Ok, it's not security relevant, that is good.

> There was no other location identified by Alex (and I went ahead and confirmed with v14 that no
> other location has been identified), so the claim that "places" (a plural) were missing
> instrumentation is false.

Oh, I'm sorry for 'PLACES', it's definitely my fault...
In v6 I added two missing erase_kstack() calls:
https://www.openwall.com/lists/kernel-hardening/2017/12/0...
But later one of them turned out to be surplus (kudos to Dmitry V. Levin).

So let me sum up:
- it is not your mistake, it is evil upstream which has broken your patch :)
- there was only ONE erase_kstack() missing, which I fixed.

> Regarding errors in the alloca() checking, Alex's claims there are false. get_stack_info didn't
> exist when STACKLEAK was first written, but when it was introduced we did convert to using it.
> We don't needlessly duplicate functionality of get_stack_info, we only have some additional code
> for correctly computing the amount of stack space left, and our checks there are correct.

Hm... Let's compare the code. It's funny to do it here and not on LKML.

Here is my version (ouch, lwn breaks the indentation):

+void __used stackleak_check_alloca(unsigned long size)
+{
+ unsigned long sp = (unsigned long)&sp;
+ struct stack_info stack_info = {0};
+ unsigned long visit_mask = 0;
+ unsigned long stack_left;
+
+ BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));
+
+ stack_left = sp - (unsigned long)stack_info.begin;
+
+ if (size >= stack_left) {
+ /*
+ * Kernel stack depth overflow is detected, let's report that.
+ * If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
+ * If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
+ * the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
+ * panic() in a similar situation, so let's do the same if that
+ * option is on. Otherwise just use BUG() and hope for the best.
+ */
+#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
+ panic("alloca() over the kernel stack boundary\n");
+#else
+ BUG();
+#endif
+ }
+}

And here is grsecurity code for x86_64 from the last public patch (there is a
separate implementation for x86_32):

+void __used pax_check_alloca(unsigned long size)
+{
+ struct stack_info stack_info = {0};
+ unsigned long visit_mask = 0;
+ unsigned long sp = (unsigned long)&sp;
+ unsigned long stack_left;
+
+ BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));
+
+ switch (stack_info.type) {
+ case STACK_TYPE_TASK:
+ stack_left = sp & (THREAD_SIZE - 1);
+ break;
+
+ case STACK_TYPE_IRQ:
+ stack_left = sp & (IRQ_STACK_SIZE - 1);
+ break;
+
+ case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+ stack_left = sp & (EXCEPTION_STKSZ - 1);
+ break;
+
+ case STACK_TYPE_SOFTIRQ:
+ default:
+ BUG();
+ }
+
+ BUG_ON(stack_left < 256 || size >= stack_left - 256);
+}

First, I think there is no reason for this 'switch', since get_stack_info() calculates
the stack size itself, so we can simply do:
+ stack_left = sp - (unsigned long)stack_info.begin;

Moreover, I have previously stated that different exception stacks have different
size at x86_64. All of them are 4K, except the debug stack which is 8K:
static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
[0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
[DEBUG_STACK - 1] = DEBUG_STKSZ
};

Maybe it's not critical for the alloca check... Anyway, I prefer to rely on get_stack_info()
to follow the "Don't Repeat Yourself" rule. And if it changes, we don't have to patch
check_alloca().
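
The difference between the two ways of computing stack_left can be modeled in user space as below. This is a sketch only: struct stack_range is a hypothetical stand-in for the bounds reported by get_stack_info(), the function names are made up, and the addresses are arbitrary. Masking gives the same answer as subtracting the stack start only when the stack is aligned to its size and the size constant matches the real stack; with a mismatched constant the two diverge:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the stack bounds reported by get_stack_info(). */
struct stack_range {
	uint64_t begin;
	uint64_t end;
};

/* Mask-based: valid only for a size-aligned stack of exactly assumed_size. */
uint64_t stack_left_by_mask(uint64_t sp, uint64_t assumed_size)
{
	/* The mask trick needs a non-zero power-of-two size. */
	assert(assumed_size != 0 && (assumed_size & (assumed_size - 1)) == 0);
	return sp & (assumed_size - 1);
}

/* Range-based: uses the recorded start of the stack. */
uint64_t stack_left_by_range(uint64_t sp, const struct stack_range *r)
{
	assert(sp >= r->begin && sp < r->end);
	return sp - r->begin;
}

void stack_left_demo(void)
{
	struct stack_range r = { .begin = 0x10000, .end = 0x12000 };	/* 8K */
	uint64_t sp = 0x11800;

	assert(stack_left_by_range(sp, &r) == 0x1800);
	assert(stack_left_by_mask(sp, 0x2000) == 0x1800);	/* matching size */
	assert(stack_left_by_mask(sp, 0x1000) == 0x800);	/* mismatched size */
}
```

Whether a mismatch can actually happen depends on how the stacks are really laid out, which this model does not try to answer.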

Now the second difference: grsecurity code uses this '256' magic value in BUG_ON().

Mark Rutland and I had a long discussion about it in this thread:
http://openwall.com/lists/kernel-hardening/2018/05/11/12

I think that this '256' is useless here, since we don't know how much stack space
the BUG_ON() handling consumes. So it can overflow these 256 bytes and corrupt the
neighbour memory.
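
For comparison, the two predicates can be modeled in user space as below. The function names are hypothetical, and RESERVE is the 256-byte value from the grsecurity code above. The reserve makes the check fire earlier, but the model cannot tell us whether 256 bytes are enough for the BUG() handling itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RESERVE 256u	/* the magic value from the grsecurity check */

/* Modeled on BUG_ON(stack_left < 256 || size >= stack_left - 256). */
bool overflows_with_reserve(size_t size, size_t stack_left)
{
	/* The first test also keeps stack_left - RESERVE from wrapping around. */
	return stack_left < RESERVE || size >= stack_left - RESERVE;
}

/* Modeled on if (size >= stack_left). */
bool overflows_plain(size_t size, size_t stack_left)
{
	return size >= stack_left;
}

void reserve_demo(void)
{
	/* 900 bytes requested with 1024 left: only the reserve check fires. */
	assert(overflows_with_reserve(900, 1024));
	assert(!overflows_plain(900, 1024));

	/* 500 bytes requested with 1024 left: neither check fires. */
	assert(!overflows_with_reserve(500, 1024));
	assert(!overflows_plain(500, 1024));

	/* A request as large as the remaining stack trips both. */
	assert(overflows_with_reserve(1024, 1024));
	assert(overflows_plain(1024, 1024));
}
```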

And here is our solution:

+ if (size >= stack_left) {
+ /*
+ * Kernel stack depth overflow is detected, let's report that.
+ * If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
+ * If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
+ * the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
+ * panic() in a similar situation, so let's do the same if that
+ * option is on. Otherwise just use BUG() and hope for the best.
+ */
+#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
+ panic("alloca() over the kernel stack boundary\n");
+#else
+ BUG();
+#endif

At the same time I see one tricky aspect in my code.
We don't know in which order the compiler puts the local variables on the stack.
So calculating the stack pointer with this:
+ unsigned long sp = (unsigned long)&sp;
can make the alloca size check incorrect (but the 256 magic value mitigates that).

If one day I come up with a "check_alloca() add-on" to my v15, I will use
'current_stack_pointer' instead to avoid that problem.

> If Alex would like us to explain to him how his change there is incorrect and our checks are correct,

Brad, you know all my arguments perfectly well, I've already posted them at LKML,
and I always put you in CC. My development process is completely open.

> I'd be happy to explain it in full provided he agree to donate $1000 to a charity of my choosing.
> Now that I've stated he's wrong, he's able to either figure out the reason on his own and correct
> his statement publicly, or if he's so certain he's correct, has nothing to lose by entering
> into this challenge. In the case that we're wrong (not possible as we re-confirmed it prior to
> writing this), I'll be happy to admit defeat and donate $1000 to charity, providing full proof to
> the public and correcting this statement.

Huh. Organizing such a bet to donate money to charity?
We do it on a regular basis without such bets. I'm sure Grsecurity is a company with
social responsibility which regularly donates to charity, isn't it?

Anyway, let me thank you once again for all the information that you've already shared with us.
All my arguments and patches are open, feel free to use them for your version of STACKLEAK.

By the way, you keep silent about my fixes in the gcc plugin... Didn't you apply them?

> So to sum up:
> Yes there was a bug in an added check in grsecurity that depended on STACKLEAK being enabled, but
> which wasn't advertised and wasn't part of the STACKLEAK defense. This was found by Tycho
> Andersen, not Alex, and credited properly in our changelogs.

You are absolutely right. I should only add that your check also causes the recursive BUG()
which corrupts the memory below the stack bottom or hits the guard page.

> Yes, in some newer versions of grsecurity (after the commit mentioned above) we were missing
> explicit STACKLEAK clearing in returning from fork on i386 in a newly-forked process. Due to the
> other factors mentioned above, this likely had 0 real-life impact.

I absolutely agree. I've fixed this SINGLE flaw.

> No, our alloca() tests aren't wrong and don't needlessly duplicate code. We have made a public
> offer to donate $1000 to charity if we're wrong on this point (with us offering to provide all the
> details to easily determine the truth of the statement) provided that Alex agrees to the same
> terms, as we won't do the KSPP's work for free.

Sorry Brad, I don't like this idea. I'm not going to donate to charity because of such a bet.
I've already described all my technical arguments above.
But I'm also not going to force you to "do the KSPP's work for free".

I sincerely thank you for such an interesting discussion!
Maybe later I will post the link or digest to LKML, since discussing Linux kernel patches
in twitter-lwn-something doesn't work well.

Best regards,
Alexander

Trying to get STACKLEAK into the kernel

Posted Sep 17, 2018 15:31 UTC (Mon) by shemminger (subscriber, #5739) [Link]

This is great technical feedback. And it is why security fixes should be done in the open on mailing lists instead of flame wars and twitter storms.

Trying to get STACKLEAK into the kernel

Posted May 27, 2019 15:27 UTC (Mon) by a13xp0p0v (guest, #118926) [Link]

[resurrecting an old thread]

> Moreover, I have previously stated that different exception stacks have different
> size at x86_64. All of them are 4K, except the debug stack which is 8K:
> static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
> [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
> [DEBUG_STACK - 1] = DEBUG_STKSZ
> };

This particular statement was wrong - my mistake!
Brad has revealed (thanks to him) why the stack size calculation in grsecurity was correct:
"The debug IST stack is actually two separate debug stacks to handle #DB recursion"
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.g...
Great!

I will *not* make a patch for the upstream kernel since alloca() checking didn't get into the mainline -
all VLAs (variable-length arrays) should be removed instead.

Best regards,
Alexander

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 17:43 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

I normally do not care about Linus' behavior, but this really starts to cross the line: https://lkml.org/lkml/2018/8/15/510

No. The correct reply here would not be:
> "Is there someone else up there we can talk to?"

It would be an apology with a reference to Monty Python to explain it, and remembering next time that in some cultures "your mom" jokes are especially EXTREMELY offensive.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 18:34 UTC (Thu) by patrick_g (subscriber, #44470) [Link] (5 responses)

>in some cultures "your mom" jokes are especially EXTREMELY offensive.

Problem is you'll always find a culture in which a specific referential joke is offensive.
The only realistic solution is to accept jokes depending on the work environment and sub-culture you are in. And Monty Python references/jokes are an important part of the hacker culture so I don't see Linus' mail crossing the line.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 18:41 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Doesn't matter. In this case one person clearly indicated that Linus had crossed the line. The correct answer here is to apologize and maybe try to explain yourself.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 18:56 UTC (Thu) by patrick_g (subscriber, #44470) [Link]

> The correct answer here is to apologize and maybe try to explain yourself.

IMHO Linus tried to explain the joke because he wrote: "just google for it if you haven't seen the Holy Grail".

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 21:33 UTC (Thu) by GennaroReinger (guest, #127208) [Link] (2 responses)

The thing is he wasn't chatting with his buddies while drinking in the pub. He was communicating with people who have only a professional relationship with him. Some of them may not even know him (in person) or like him.

It also says something that those "jokes" and other rude comments travel only one way.

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 17:29 UTC (Fri) by hkario (subscriber, #94864) [Link] (1 responses)

you could look at it the other way, as Linus degrading himself and casting the person "attacked" as King Arthur and himself as the French buffoon who won't let the king enter for entirely petty reasons

Trying to get STACKLEAK into the kernel

Posted Sep 15, 2018 15:14 UTC (Sat) by nilsmeyer (guest, #122604) [Link]

Thus offending the French ;)

Trying to get STACKLEAK into the kernel

Posted Sep 15, 2018 8:25 UTC (Sat) by meyert (subscriber, #32097) [Link]

👏 for your work, Mr. Popov! Please keep on pushing this to the mainline kernel!