Kernel runtime security instrumentation

By Jake Edge
September 4, 2019

Finding ways to make it easier and faster to mitigate an ongoing attack against a Linux system at runtime is part of the motivation behind the kernel runtime security instrumentation (KRSI) project. Its developer, KP Singh, gave a presentation about the project at the 2019 Linux Security Summit North America (LSS-NA), which was held in late August in San Diego. A prototype of KRSI is implemented as a Linux security module (LSM) that allows eBPF programs to be attached to the kernel's security hooks.

Singh began by laying out the motivation for KRSI. When looking at the security of a system, there are two sides to the coin: signals and mitigations. The signals are events that might, but do not always, indicate some kind of malicious activity is taking place; the mitigations are what is done to thwart the malicious activity once it has been detected. The two "go hand in hand", he said.

For example, the audit subsystem can provide signals of activity that might be malicious. If you have a program that determines that the activity actually is problematic, then you might want it to update the policy for an LSM to restrict or prevent that behavior. Audit may also need to be configured to log the events in question. He would like to see a unified mechanism for specifying both the signals and mitigations so that the two work better together. That is what KRSI is meant to provide.

He gave a few examples of different types of signals. For one, a process that executes and then deletes its executable might well be malicious. A kernel module that loads and then hides itself is also suspect. A process that executes with suspicious environment variables (e.g. LD_PRELOAD) might indicate something has gone awry as well.
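The first and third signals above can already be checked from user space. The following Python sketch is illustrative only and is not how KRSI works. It assumes the Linux convention that the target of the /proc/<pid>/exe link gains a " (deleted)" suffix once the executable is unlinked. Treating variables other than LD_PRELOAD as suspicious is also an assumption.

```python
from typing import List, Mapping

# Loader variables treated as suspicious in this sketch. The article names only
# LD_PRELOAD; LD_AUDIT and LD_LIBRARY_PATH are added here as assumptions.
SUSPICIOUS_ENV_VARS = frozenset({"LD_PRELOAD", "LD_AUDIT", "LD_LIBRARY_PATH"})


def exe_was_deleted(exe_link_target: str) -> bool:
    """True if a /proc/<pid>/exe link target shows the binary was unlinked.

    Linux appends " (deleted)" to that link target once the executable file
    has been removed while the process keeps running. This is a heuristic: a
    file literally named "... (deleted)" would be a false positive.
    """
    return exe_link_target.endswith(" (deleted)")


def suspicious_env(environ: Mapping[str, str]) -> List[str]:
    """Names of suspicious loader variables present in the environment."""
    return sorted(name for name in environ if name in SUSPICIOUS_ENV_VARS)


def signals_for(exe_link_target: str, environ: Mapping[str, str]) -> List[str]:
    """Collect signal labels for one process; an empty list means none fired."""
    found = []
    if exe_was_deleted(exe_link_target):
        found.append("exe-deleted")
    found.extend("env:" + name for name in suspicious_env(environ))
    return found


# Example use against the current process (Linux only):
#   import os
#   signals_for(os.readlink("/proc/self/exe"), os.environ)
```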

On the mitigation side, an administrator might want to prevent mounting USB drives on a server, perhaps after a certain point during the startup. There could be dynamic whitelists or blacklists of various sorts, for kernel modules that can be loaded, for instance, to prevent known vulnerable binaries from executing, or stopping binaries from loading a core library that is vulnerable to ensure that updates are done. Adding any of these signals or mitigations requires reconfiguration of various parts of the kernel, which takes time and/or operator intervention. He wondered if there was a way to make it easy to add them in a unified way.
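Here is a minimal Python sketch of a dynamic mitigation that can be updated at runtime. Several choices are made purely for illustration: the class name DynamicPolicy, the use of SHA-256 digests to identify binaries, and the single "boot finished" switch for USB mounts. The return values follow the LSM hook convention of 0 to allow and a negative errno to deny.

```python
import errno
import hashlib


def digest_of(contents: bytes) -> str:
    """SHA-256 hex digest used as the identity of a binary in this sketch."""
    return hashlib.sha256(contents).hexdigest()


class DynamicPolicy:
    """Hypothetical runtime-updatable policy: a blocklist of binary digests plus
    a switch that forbids USB mounts once startup has finished.

    Checks return 0 to allow or a negative errno to deny, the convention LSM
    hooks use.
    """

    def __init__(self) -> None:
        self._blocked = set()
        self._boot_complete = False

    def block(self, digest: str) -> None:
        self._blocked.add(digest)

    def unblock(self, digest: str) -> None:
        self._blocked.discard(digest)

    def finish_boot(self) -> None:
        self._boot_complete = True

    def check_exec(self, contents: bytes) -> int:
        return -errno.EPERM if digest_of(contents) in self._blocked else 0

    def check_mount(self, is_usb: bool) -> int:
        return -errno.EPERM if is_usb and self._boot_complete else 0
```

An operator can push a new digest with block() the moment a vulnerable build is identified. The policy changes without reconfiguring anything else.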

eBPF + LSM

He has created a new eBPF program type that can be used by the KRSI LSM. There is a set of eBPF helpers that provide a "unified policy API" for signals and mitigations. They are security-focused helpers that can be combined to create the required behavior.

Singh is frequently asked why he chose to use an LSM, rather than other options. Security behaviors map better to LSMs, he said, than to things like seccomp filters, which are based on system call interception. Various security-relevant behaviors can be accomplished via multiple system calls, so it would be easy to miss one or more, whereas the LSM hooks intercept the behaviors of interest. He also hopes this work will benefit the overall LSM ecosystem.
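The point about multiple system calls can be modeled in a few lines of Python. In this toy sketch, all names are hypothetical, and real seccomp filters see system-call numbers and raw register arguments rather than strings. A filter that denies only open() misses openat(). A check on the shared file-open path, loosely modeled on the LSM file_open hook, catches every route.

```python
import errno

SECRET_PREFIX = "/secret/"  # hypothetical protected directory


def file_open_hook(path: str) -> int:
    """LSM-style check on the behavior itself: a file is being opened."""
    return -errno.EACCES if path.startswith(SECRET_PREFIX) else 0


def do_open(path: str, hooks) -> int:
    """Simplified common path that open(), openat() and creat() all reach."""
    for hook in hooks:
        rc = hook(path)
        if rc:
            return rc
    return 3  # pretend file descriptor


def syscall(name: str, path: str, denied_syscalls=(), hooks=()) -> int:
    """Entry point for any of the open-like system calls in this toy model."""
    if name in denied_syscalls:  # filter keyed on the system call, not the behavior
        return -errno.EPERM
    return do_open(path, hooks)
```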

He talked with some security engineers about their needs, and one mentioned logging LD_PRELOAD values on process execution. The way that could be done with KRSI would be to add a BPF program to the bprm_check_security() LSM hook that gets executed when a process is run. So KRSI registers a function for that hook, which gets called along with the bprm_check_security() hooks of any other LSMs. When the KRSI hook is run, it calls out to the BPF program, which communicates with user space (e.g. a daemon that makes decisions to add further restrictions) via an output buffer.
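The flow described above can be sketched as a toy model in Python. Everything here is hypothetical. The classes stand in for LSM stacking, for KRSI's hook, and for the output buffer. The stop-at-first-error behavior reflects how stacked LSM hooks that return an int are generally combined.

```python
from collections import deque


class HookChain:
    """Toy model of stacked LSMs: hooks run in registration order and the first
    non-zero (error) return value stops the chain."""

    def __init__(self):
        self.hooks = []

    def register(self, fn):
        self.hooks.append(fn)

    def call(self, *args):
        for fn in self.hooks:
            rc = fn(*args)
            if rc != 0:
                return rc
        return 0


class KrsiLikeLsm:
    """Hypothetical KRSI-style LSM: its bprm_check_security hook runs the
    attached programs, which may emit events for a user-space daemon."""

    def __init__(self, capacity=1024):
        self.programs = []
        # Bounded queue standing in for the output buffer; when full, deque
        # discards the oldest entry.
        self.output = deque(maxlen=capacity)

    def attach(self, prog):
        self.programs.append(prog)

    def bprm_check_security(self, bprm):
        for prog in self.programs:
            rc = prog(bprm, self.output.append)
            if rc != 0:
                return rc
        return 0


def log_ld_preload(bprm, emit):
    """Attached program: report LD_PRELOAD, never deny (always returns 0)."""
    value = bprm["envp"].get("LD_PRELOAD")
    if value is not None:
        emit({"filename": bprm["filename"], "LD_PRELOAD": value})
    return 0


chain = HookChain()
chain.register(lambda bprm: 0)  # some other LSM that allows everything
krsi = KrsiLikeLsm()
krsi.attach(log_ld_preload)
chain.register(krsi.bprm_check_security)
```

Calling chain.call() with an exec whose environment sets LD_PRELOAD leaves one event in krsi.output for the daemon to drain. If an earlier LSM denies the exec, the KRSI hook is never reached.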

The intent is that the helpers are "precise and granular". Unlike the BPF tracing API, they will not have general access to internal kernel data structures. His slides [PDF] had bpf_probe_read() in a circle with a slash through it as an indication of what he was trying to avoid. The idea is to maintain backward compatibility by not tying the helpers to the internals of a given kernel.
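Python's struct module can show why hard-coded access to internal structures hurts portability. The same concern applies to the kprobes approach discussed below. The two layouts are invented. The sketch only contrasts reading a fixed offset with calling a purpose-built accessor that knows the current layout.

```python
import struct

# Two hypothetical layouts of the same kernel structure in two kernel versions.
# Each field maps to (byte offset, struct format). In V2 a field was inserted
# ahead of 'uid', moving it from offset 8 to offset 16.
LAYOUT_V1 = {"pid": (0, "<i"), "uid": (8, "<I")}
LAYOUT_V2 = {"pid": (0, "<i"), "uid": (16, "<I")}


def make_task(layout, pid, uid, size=32):
    """Build a fake in-memory structure laid out according to 'layout'."""
    buf = bytearray(size)
    struct.pack_into(layout["pid"][1], buf, layout["pid"][0], pid)
    struct.pack_into(layout["uid"][1], buf, layout["uid"][0], uid)
    return bytes(buf)


def raw_read_uid(mem, offset=8):
    """Tracing-style access: the reader hard-codes an offset taken from one
    kernel build (V1 here), so it silently breaks on other layouts."""
    return struct.unpack_from("<I", mem, offset)[0]


def helper_get_uid(mem, layout):
    """Helper-style access: the kernel side knows its own layout, so the
    program only asks for "the uid" and never sees offsets."""
    offset, fmt = layout["uid"]
    return struct.unpack_from(fmt, mem, offset)[0]
```

On the V2 structure, raw_read_uid() returns whatever happens to sit at offset 8, which here is zero padding. helper_get_uid() keeps working.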

He then went through various alternatives for implementing this scheme and described the problems he saw with them. To start with, why not use audit? One problem is that the mitigations have to be handled separately. But there is also a fair amount of performance overhead when adding more things to be audited; he would back that up with some numbers later in the presentation. Also, audit messages have rigid formatting that must be parsed, which might delay how quickly a daemon could react.
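This small Python sketch gives a feel for the parsing a daemon has to do. It extracts the arguments from an audit EXECVE record. It assumes the common key=value layout: quoted values are taken literally, and unquoted argument values are hex-encoded, as audit does for arguments containing spaces or other special characters. It ignores the extra fields audit emits for very long arguments. The sample record is made up.

```python
import re

# key=value pairs whose value is either "quoted" or a bare token.
_FIELD = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_execve_record(line):
    """Return the argument list (a0..a(argc-1)) from one EXECVE record."""
    fields = {}
    for key, raw, quoted in _FIELD.findall(line):
        # Remember whether the value was quoted, since that decides decoding.
        fields[key] = (quoted, True) if raw.startswith('"') else (raw, False)
    argc = int(fields["argc"][0])
    args = []
    for i in range(argc):
        value, was_quoted = fields[f"a{i}"]
        if not was_quoted:
            value = bytes.fromhex(value).decode("utf-8", "replace")
        args.append(value)
    return args


sample = ('type=EXECVE msg=audit(1567600000.123:456): argc=3 '
          'a0="ls" a1="-l" a2=2F746D702F6D792066696C65')
print(parse_execve_record(sample))  # ['ls', '-l', '/tmp/my file']
```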

Seccomp with BPF was up next. As he said earlier, security behaviors map more directly into LSM hooks than to system-call interception. He is also concerned about time-of-check-to-time-of-use (TOCTTOU) races when handling the system call parameters from user space, though he said he is not sure that problem actually exists.
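This Python sketch models the general check-then-use hazard deterministically. It shows the pattern only, not any real seccomp code path. The "user memory" is an object whose contents change after the first read, standing in for another thread racing the system call. Copying the argument once, then checking and using that same copy, closes the window.

```python
class RacyUserBuffer:
    """Simulated user memory: the first read sees 'first', and every later
    read sees 'then', as if another thread rewrote it in between."""

    def __init__(self, first: str, then: str) -> None:
        self._first = first
        self._then = then
        self.reads = 0

    def read(self) -> str:
        value = self._first if self.reads == 0 else self._then
        self.reads += 1
        return value


def is_forbidden(path: str) -> bool:
    return path.startswith("/secret/")  # hypothetical protected directory


def check_then_refetch(buf: RacyUserBuffer) -> str:
    """Racy: the argument is read once to check it and again to use it."""
    if is_forbidden(buf.read()):
        return "denied"
    return "opened " + buf.read()


def copy_then_check(buf: RacyUserBuffer) -> str:
    """Safe: copy once, then check and use the same copy."""
    path = buf.read()
    if is_forbidden(path):
        return "denied"
    return "opened " + path
```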

Using kernel probes (kprobes) and eBPF was another possibility. It is a "very flexible" solution, but it depends on the layout of internal kernel data structures. That makes deployment hard, as things need to be recompiled for each kernel that is targeted. In addition, kprobes is not a stable API; functions can be added to and removed from the kernel, which may necessitate changes.

The final alternative was the Landlock LSM. It is geared toward providing a security sandbox for unprivileged processes, Singh said. KRSI, on the other hand, is focused on detecting and reacting to security-relevant behaviors. While Landlock is meant to be used by unprivileged processes, KRSI requires CAP_SYS_ADMIN to do its job.
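This Python sketch illustrates the privilege requirement from user space. It tests capability bits in the hexadecimal CapEff mask that Linux reports in /proc/<pid>/status. CAP_SYS_ADMIN (21) and CAP_MAC_ADMIN (33) are the values from the kernel's capability header. The sketch does not show how KRSI itself checks privileges.

```python
CAP_SYS_ADMIN = 21  # from include/uapi/linux/capability.h
CAP_MAC_ADMIN = 33


def has_caps(cap_eff_hex: str, *caps: int) -> bool:
    """True if every capability number in 'caps' is set in the hex mask."""
    mask = int(cap_eff_hex, 16)
    return all((mask >> cap) & 1 for cap in caps)


def cap_eff_of(status_path: str = "/proc/self/status") -> str:
    """Read the effective-capability mask from a Linux status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError("no CapEff line in " + status_path)


# On Linux:
#   has_caps(cap_eff_of(), CAP_SYS_ADMIN)
```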

Case study

He then described a case study: auditing the environment variables set when executing programs on a system. It sounds like something that should be easy to do, but it turns out not to be. For one thing, there can be up to 32 pages of environment variables, which he found surprising.

He looked at two different designs for an eBPF helper: one that would return all of the environment variables and one that would return just the variable of interest. The latter has less overhead, so it might be better, especially if there is a small set of variables to be audited. But either of those helpers could end up sleeping because of a page fault, which is something that eBPF programs are not allowed to do.
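The two helper designs can be sketched over an environment block laid out as NUL-terminated NAME=value strings, the format of /proc/<pid>/environ. The function names are hypothetical. Because this is ordinary Python, it cannot show the page-fault problem. It only shows that the single-variable design can stop early instead of copying everything. With an assumed 4KiB page size, the 32-page bound comes to 128KiB.

```python
PAGE_SIZE = 4096               # assumption: 4KiB pages
ENV_LIMIT = 32 * PAGE_SIZE     # the 32-page bound mentioned above: 131072 bytes


def get_all_env(block: bytes) -> dict:
    """Design 1 (hypothetical): copy out every NAME=value entry.
    If a name repeats, the first occurrence wins."""
    env = {}
    for entry in block.split(b"\0"):
        if entry:
            name, _, value = entry.partition(b"=")
            env.setdefault(name, value)
    return env


def get_env_var(block: bytes, name: bytes):
    """Design 2 (hypothetical): scan for one variable and stop at the first
    match, returning its value or None."""
    prefix = name + b"="
    start = 0
    while start < len(block):
        end = block.find(b"\0", start)
        if end == -1:
            end = len(block)
        if block.startswith(prefix, start, end):
            return block[start + len(prefix):end]
        start = end + 1
    return None
```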

Singh did some rough performance testing in order to ensure that KRSI was not completely unworkable, but the actual numbers need to be taken with a few grains of salt, he said. He ran a no-op binary 100 times and compared the average execution time (over N iterations of the test) across a few different configurations: a kernel with audit configured out, a kernel with audit but no audit rules, one where audit was used to record execve() calls, and one where KRSI recorded the value of LD_PRELOAD. The first two were measured at a bit over 500µs (518µs and 522µs), while the audit test with rules came in at 663µs (with a much wider distribution of values than any of the other tests). The rudimentary KRSI test clocked in at 543µs, which gave him reason to continue; had it been a lot higher, he would have shelved the whole idea.
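Putting the reported averages side by side makes the comparison easier. This sketch uses only the numbers quoted above. Taking the kernel with audit configured out as the baseline is a choice made here, not in the talk. The caveat about these rough measurements still applies.

```python
# Average execution times quoted in the talk, in microseconds.
TIMINGS_US = {
    "audit configured out": 518,
    "audit, no rules": 522,
    "audit, execve() rule": 663,
    "KRSI, LD_PRELOAD": 543,
}


def overhead_pct(t_us, baseline_us):
    """Extra time relative to the baseline, as a percentage."""
    return 100.0 * (t_us - baseline_us) / baseline_us


baseline = TIMINGS_US["audit configured out"]
for config, t in TIMINGS_US.items():
    print(f"{config:22} {t:4d} us  +{t - baseline:3d} us  {overhead_pct(t, baseline):5.1f}%")
```

Relative to that baseline, audit with the execve() rule adds about 28% (145µs), while the KRSI test adds about 4.8% (25µs). Measured against the audit-without-rules kernel, the figures are about 27% and 4%.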

There are plenty of things that are up for discussion, he said. Right now, KRSI uses the perf ring buffer to communicate with user space; it is fast and eBPF already has a helper to access it. But that ring buffer is a per-CPU buffer, so it uses more memory than required, especially for systems with a lot of CPUs. There is already talk of allowing eBPF programs to sleep, which would simplify KRSI and allow it to use less memory. Right now, the LSM hook needs to pin the memory for use by the eBPF program. He is hopeful that discussions in the BPF microconference at the Linux Plumbers Conference will make some progress on that.
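A back-of-the-envelope calculation shows why per-CPU buffers get expensive on large machines. The 64-pages-per-CPU size and the 4KiB page size are assumptions, not KRSI's settings. This compares memory footprint only: a single shared buffer of the same size also holds fewer events in total.

```python
PAGE_SIZE = 4096  # assumption: 4KiB pages


def per_cpu_buffer_bytes(ncpus, pages_per_cpu):
    """Per-CPU design: every CPU gets its own buffer whether or not it is busy."""
    return ncpus * pages_per_cpu * PAGE_SIZE


def shared_buffer_bytes(pages):
    """A single buffer shared by all CPUs."""
    return pages * PAGE_SIZE


for ncpus in (4, 64, 256):
    mib = per_cpu_buffer_bytes(ncpus, pages_per_cpu=64) / 2**20
    print(f"{ncpus:4d} CPUs x 64 pages: {mib:6.1f} MiB "
          f"(shared 64 pages: {shared_buffer_bytes(64) / 2**20:.2f} MiB)")
```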

As part of the Q&A, Landlock developer Mickaël Salaün spoke up to suggest working together. He had gone through the same evaluation of alternative kernel facilities that Singh presented and believes that Landlock would integrate well with KRSI. Singh said that he was not fully up-to-speed on Landlock but was amenable to joining forces if the two are headed toward the same goals.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for funding to travel to San Diego for LSS-NA.]

Index entries for this article
Kernel	BPF/Security
Kernel	Security/Security modules
Security	Linux kernel/BPF
Security	Linux Security Modules (LSM)
Conference	Linux Security Summit North America/2019

Posted Sep 4, 2019 20:40 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

eBPF and LSM to create what is essentially an antivirus. Mmm... What could possibly go wrong?

Posted Sep 4, 2019 22:41 UTC (Wed) by kpsingh (subscriber, #112411)

Would you consider audit and SELinux to be antiviruses? What KRSI intends to do is provide a unified API to flexibly configure auditing and mitigation based on the audited data, very similar to what Linux does today, except that today it is done in multiple different places.

Also "antivirus" is an overloaded term, I guess you mean it as something that detects and prevents malicious activity based on known signals?

As to what could go wrong? We will find out when we try it :)

Posted Sep 4, 2019 23:23 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

I consider SELinux to be an anti-feature and auditing a giant slowdown and a waste of time.

> mitigation based on the audited data
The only valid mitigation for a detected intrusion is to bring down or isolate the host.

Posted Sep 6, 2019 13:40 UTC (Fri) by cpitrat (subscriber, #116459)

You can easily imagine cases where isolating the host could have worse consequences than anything the attacker could do. Having ways to react automatically and limit attacker's possibilities is still useful.

This could also be useful in honeypots.

Posted Sep 6, 2019 16:24 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

> You can easily imagine cases where isolating the host could have worse consequences than anything the attacker could do.
For example?

> Having ways to react automatically and limit attacker's possibilities is still useful.
Then why not do it from the start?

Posted Sep 6, 2019 16:53 UTC (Fri) by cpitrat (subscriber, #116459)

For example, if the host is supporting a critical service, then switching to a highly protected mode (think read-only, potentially degraded mode) allows it to continue serving while investigating, rather than having a DoS caused by a script kiddie doing a prank.

This is just one scenario. This seems like a flexible solution that allows for some interesting tools.

Posted Sep 6, 2019 16:54 UTC (Fri) by cpitrat (subscriber, #116459)

For a more concrete example, the degraded mode could be a self driving car pulling over or giving back control to the driver.

Posted Sep 6, 2019 20:33 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

This is actually an example where isolation is the best policy.

Posted Sep 6, 2019 16:58 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

There are so many things wrong with this picture:
1) A single critical server.
2) Accessible through the Internet.
3) To a script kiddie.

> This is just one scenario. This seems like a flexible solution that allows for some interesting tools.
This seems like an overengineered solution for a non-problem.

Posted Sep 6, 2019 18:19 UTC (Fri) by cpitrat (subscriber, #116459)

1) A single critical server.
I didn't say there was a single one. There can be multiple ones, and they could all get compromised at (more or less) the same time by the same person.
2) Accessible through the Internet.
If the service is available through the Internet, that's unavoidable. The server could have been exploited through the service it provides.
3) To a script kiddie.
See 2)

Posted Sep 6, 2019 20:35 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

So isolate all of them. The "active countermeasures" nonsense is just crap. There's not much you can do once an attacker is in and you have to let them continue.

Posted Sep 7, 2019 16:47 UTC (Sat) by kpsingh (subscriber, #112411)

You say that once you find out you are attacked, the host should be completely isolated. (While I do not agree with this). Even if one was to agree with you, the detection cannot simply happen without effective monitoring of security signals.

Whether you choose to block the specific malicious activity or the host itself is a decision you can make.

Posted Sep 7, 2019 18:44 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

> You say that once you find out you are attacked, the host should be completely isolated. (While I do not agree with this). Even if one was to agree with you, the detection cannot simply happen without effective monitoring of security signals.
And so why does this need yet even more eBPF crap?

Posted Sep 7, 2019 18:58 UTC (Sat) by kpsingh (subscriber, #112411)

Because you can create signals dynamically with KRSI's eBPF programs.

PS: I don't intend to reply further if your communication stays unprofessional.

Posted Sep 7, 2019 19:07 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

What signals? Seriously, where is an example of use with an example of mitigated damage and that can't be done with existing audit infrastructure?

All I see is handwaving like:

> There could be dynamic whitelists or blacklists of various sorts, for kernel modules that can be loaded, for instance, to prevent known vulnerable binaries from executing, or stopping binaries from loading a core library that is vulnerable to ensure that updates are done.
If you have a "vulnerable binary" then why the hell it's not deleted?

For me personally the last thing I want is more of SELinux-style security theater that _will_ inevitably break in various exciting ways.

Posted Sep 7, 2019 19:21 UTC (Sat) by kpsingh (subscriber, #112411)

You yourself mentioned auditing is a giant slowdown. So, you are now contradicting yourself!

Patching / deleting a binary on a really huge number of servers cannot be done in seconds.

Can you audit environment variables with audit? No you cannot!

What do you need to do to add support? Change a lot of stuff, the policy language, auditd, parsers etc.

The development cycle for adding a new signal and then some new policy based on the signal, e.g. a permission error if you set the same environment variable twice, touches many components and this is an attempt to fix that.
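The example policy in this comment refuses an execution whose environment sets the same variable twice. It reduces to a few lines of logic. This Python sketch shows that logic only. The function name is hypothetical, and the sketch says nothing about how a KRSI program would actually read the environment.

```python
import errno


def deny_duplicate_env(envp) -> int:
    """Hypothetical policy: return -EPERM if any variable name appears more
    than once in an exec's environment (a list of "NAME=value" strings),
    otherwise 0."""
    seen = set()
    for entry in envp:
        name = entry.split("=", 1)[0]
        if name in seen:
            return -errno.EPERM
        seen.add(name)
    return 0
```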

Posted Sep 7, 2019 20:26 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

> You yourself mentioned auditing is a giant slowdown. So, you are now contradicting yourself!
And so will be eBPF. There's also a low-hanging fruit of using BPF to JIT-compile the audit rules.

> Patching / deleting a binary on a really huge number of servers cannot be done in seconds.
What does it have to do with audit slowness?

> Can you audit environment variables with audit? No you cannot!
> What do you need to do to add support? Change a lot of stuff, the policy language, auditd, parsers etc.
Do environment variables actually pose a significant threat to warrant a new full-blown, user-controlled arbitrary code injection facility on the critical paths? Can it itself be abused to create livelocks/deadlocks? Can an adversary use it to frustrate efforts to recover? ....

> The development cycle for adding a new signal and then some new policy based on the signal, e.g. a permission error if you set the same environment variable twice, touches many components and this is an attempt to fix that.
I contend that none of this is even needed, as it's going to be useless and trivial to bypass.

Posted Sep 7, 2019 20:52 UTC (Sat) by kpsingh (subscriber, #112411)

>And so will be eBPF. There's also a low-hanging fruit of using BPF to JIT-compile the audit rules.
^^^^^^^^^^^^^^^^^^^
We are doing performance comparisons and it's not.

>> Patching / deleting a binary on a really huge number of servers cannot be done in seconds.
> What does it have to do with audit slowness?

It's got to do with your statement: "If you have a "vulnerable binary" then why the hell it's not deleted?"

> Do environment variables actually pose a significant threat to warrant a new full-blown, user-controlled arbitrary code injection facility on the critical paths? Can it itself be abused to create livelocks/deadlocks? Can an adversary use it to frustrate efforts to recover? ....

Environment variables is one use case where one needs to use a signal that audit does not currently provide. We are **not** talking about unprivileged eBPF here; it needs CAP_SYS_ADMIN and CAP_MAC_ADMIN. If privileged users want to shoot themselves in the foot, they have plenty of other opportunities.

> The development cycle for adding a new signal and then some new policy based on the signal, e.g. a permission error if you set the same environment variable twice, touches many components and this is an attempt to fix that.
> I contend that none of this is even needed, as it's going to be useless and trivial to bypass.

I disagree. It's about building defense in depth. The more hoops an attacker has to jump to attack you, the slower and harder it gets for them. Anyways, I am happy to hear if you have a constructive solution. Otherwise, this discussion is simply leading nowhere.

Posted Sep 7, 2019 22:04 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

> We are doing performance comparisons and it's not.
Then improve it. Translate audit rules into BPF and run them.

> It's got to do with your statement: "If you have a "vulnerable binary" then why the hell it's not deleted?"
How do you recognize that a binary was used for nefarious purposes?

> Environment variables is one use case where one needs to use a signal that audit does not currently provide.
Then extend it, rather than create a completely new system. Is there anything else that is not covered by the audit subsystem and that is not a trivial addition?

> I disagree. It's about building defense in depth. The more hoops an attacker has to jump to attack you, the slower and harder it gets for them. Anyways, I am happy to hear if you have a constructive solution. Otherwise, this discussion is simply leading nowhere.
The constructive solution is simple: improve the audit subsystem instead of adding more eBPF.

Posted Sep 7, 2019 22:17 UTC (Sat) by kpsingh (subscriber, #112411)

> We are doing performance comparisons and it's not.
> Then improve it. Translate audit rules into BPF and run them.

Feel free to go that route and suggest or make improvements to audit. Audit does not meet our other key requirement of having MAC and signaling (auditing) possible with a single API, which is something that you are not constrained by (based on your comments).

Posted Sep 11, 2019 5:17 UTC (Wed) by ssmith32 (subscriber, #72404)

If you don't isolate the host, even ignoring security concerns, you're just kinda being a jerk, because you're knowingly providing connected resources to a bad actor. Yes, caveats may apply, but, in general, you should isolate it. Note that trying to justify not isolating it by wanting to gather more info for an exciting security blog/research paper does not qualify as a caveat.

Posted Sep 11, 2019 5:48 UTC (Wed) by cpitrat (subscriber, #116459)

I'd expect some kind of justification when you call jerks a significant number of security researchers who use honeypots.

Otherwise, I can do it too:
If you don't isolate the host, you're not being a jerk.

Ok you said: "because you're knowingly providing connected resources to a bad actor." But anybody can have connected resources and it's very cheap. Look, I'm using one to answer you.

If you think about a botnet of honeypots, I think you're either overestimating the number of honeypots or their lifespan, or underestimating the number of hosts required for a useful botnet.

Posted Sep 11, 2019 4:54 UTC (Wed) by ssmith32 (subscriber, #72404)

Sooo... a "Safe Mode". Just hold insert while it boots!

Yeah, cheap shot, but toooo easy. I'll be quiet now.

Posted Sep 8, 2019 6:48 UTC (Sun) by jezuch (subscriber, #52988)

Well, my thought was that you would investigate while limiting potential damage so that you don't alert the attackers so that you have some more time to identify them. But I'm not a security expert and this sounds dangerous even to me so I don't expect this to be a plausible scenario.

Posted Sep 8, 2019 7:08 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

It's pretty clear that the idea here is to create something like Windows antiviruses, an automatic tool to detect malicious patterns and try to counteract them.

Unfortunately, the patch authors don't seem to have nearly enough experience with that kind of stuff. Modern Windows antiviruses have multiple layers of defenses; they intrude into the very heart of the OS. Windows itself scans and checksums its internal control structures (PatchGuard, CodeIntegrity), and antiviruses turn it up to 11. Which is kinda awe-inspiring: it's like watching CoreWar.

Yet it's still not enough. All the OS protections have been bypassed ( https://www.symantec.com/content/dam/symantec/docs/securi... ) and malware now routinely bypasses antiviruses. This is because attacks don't get worse, they always keep getting better.

Posted Sep 11, 2019 5:05 UTC (Wed) by ssmith32 (subscriber, #72404)

Ok, I'm not sure how relevant it is, but that paper was from over 10 years ago, when Symantec was all bent out of shape that Microsoft's drivers were going to be able to do things its drivers (written largely without code review and QA) couldn't.

Some rumblings about antitrust later, an API was provided, Symantec realized Windows was a dying revenue stream, and you haven't seen much work in the area since. So it's a bit of an unknown.

Posted Sep 11, 2019 4:51 UTC (Wed) by ssmith32 (subscriber, #72404)

Well, in the defense of Cyberax, the only other software that I've ever seen that attempts the same goal (detecting and reacting to actions that violate a security policy), in the same manner (a kernel level vm running a limited language that allows hooks to be dynamically added over time), was commercially marketed as antivirus software.

{rant}
On the other hand, KRSI is not attempting to call out to central servers, nor is it maintaining a large and ever-growing blacklist in something referred to as a "memory-mapped" file, which actually refers to home brewed code designed to swap memory contents to disk. A list from which nothing can be removed, even if designed to detect old DOS floppy malware, out of fear of failing some dog & pony show that putatively somehow relates to how effective it is. Nor does it incorporate code from a wide variety of engineers that is completely unreviewed and never QA'd. Nor does it attempt to do whatever it can to insert itself wherever it can in kernel memory. And, oh yeah, is produced and managed solely for profit.
{\rant}

In other words, there are a few differences too... ;)

Posted Sep 11, 2019 7:51 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

> a kernel level vm running a limited language
In the case of Kaspersky Antivirus, this VM actually is (or used to be as of ~5 years ago) position-independent x86 code that is checked to not include loops or external calls. Kinda like JIT-compiled eBPF :)
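A toy verifier in Python can sketch the restriction mentioned here, checking that code contains no loops or unapproved external calls. The instruction format is invented. Jump offsets are relative to the next instruction, as in eBPF. The real eBPF verifier does far more, such as tracking register types and memory bounds. This sketch covers only the backward-jump and call-allowlist checks.

```python
# Toy instruction format (hypothetical): (opcode, operand). A "jmp" operand is
# an offset relative to the next instruction, as in eBPF.

def verify(program, allowed_calls=frozenset()):
    """Return (ok, reason). Rejects backward jumps (which could form loops),
    jumps that leave the program, calls outside an allowlist, and programs
    that do not end in "exit"."""
    n = len(program)
    if n == 0 or program[-1][0] != "exit":
        return False, "program must end with exit"
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            target = pc + 1 + arg
            if target <= pc:
                return False, f"backward jump at {pc}"
            if target >= n:
                return False, f"jump out of range at {pc}"
        elif op == "call" and arg not in allowed_calls:
            return False, f"call to {arg!r} not allowed at {pc}"
    return True, "ok"
```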