Auditing io_uring

By Jonathan Corbet
June 3, 2021

The io_uring subsystem, first introduced in 2019, has quickly become the leading way to perform high-bandwidth, asynchronous I/O. It has drawn the attention of many developers, including, more recently, those who are focused more on security than performance. Now some members of the security community are lamenting a perceived lack of thought about security support in io_uring, and are trying to remedy that shortcoming by adding audit and Linux security module support there. That process is proving difficult, and has raised the prospect of an unpleasant fallback solution.

The Linux audit mechanism allows the monitoring and logging of all significant activity on the system. If somebody wants to know, for example, who looked at a specific file, an audit-enabled system can provide answers. This capability is required to obtain any of a number of security certifications which, in turn, are crucial if one wants to deploy Linux in certain types of security-conscious settings. It is probably fair to say that a relatively small percentage of Linux systems have auditing turned on, but distributors, almost without exception, enable auditing in their kernels.

The audit mechanism relies, in turn, on a large array of hooks sprinkled throughout the kernel source. Whenever an event that may be of interest occurs, it is reported via the appropriate hook to the audit code. There, a set of rules loaded from user space controls which events are reported to user space.
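
As a rough illustration of the hook-plus-rules arrangement described above, here is a minimal user-space sketch in C. It is not the kernel's audit code: the event kinds, the rule layout, and the names `audit_rule_model`, `audit_event_model`, and `audit_should_report` are all hypothetical, chosen only to show a hook consulting a rule table to decide whether an event gets reported.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; the real audit subsystem tracks far more. */
enum event_kind { EV_OPEN, EV_UNLINK, EV_EXEC };

/* One rule, as user space might load it (illustrative layout only). */
struct audit_rule_model {
    enum event_kind kind;  /* event kind the rule selects */
    bool any_uid;          /* true: match every user */
    unsigned int uid;      /* consulted only when any_uid is false */
};

struct audit_event_model {
    enum event_kind kind;
    unsigned int uid;
};

/* Called from a hook: returns true if some rule says to report the event. */
static bool audit_should_report(const struct audit_rule_model *rules,
                                size_t nrules,
                                const struct audit_event_model *ev)
{
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].kind != ev->kind)
            continue;
        if (rules[i].any_uid || rules[i].uid == ev->uid)
            return true;
    }
    return false;  /* no rule matched: the hook fired but nothing is logged */
}
```

Note that, in this model, the hook does some work even when the rule table is empty; that residual cost is what matters on a hot path.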

When io_uring was being developed (which is still happening now, of course), the developers involved were deeply concerned about performance and functionality. Supporting security features like auditing was not at the top of their list, so they duly neglected to add the needed hooks — or to think about how auditing could be supported in a way consistent with the performance goals. Now that io_uring is showing up in more distributor kernels (and, in particular, the sorts of kernels where auditing is relatively likely to be enabled), security-oriented developers are starting to worry about it. Having io_uring serve as a way to circumvent the otherwise all-seeing audit eye does not seem like a good way to maintain those security certifications.

Adding security support

In late May, Paul Moore (a maintainer of the audit subsystem) posted a set of patches adding Linux security module (LSM) and audit capabilities to io_uring. The LSM side is relatively straightforward; the operations performed by io_uring are already covered by LSM hooks, so all that was needed was a pair of new hooks to pass judgment on io_uring-specific actions. Specifically, these hooks control the sharing (between processes) of credentials that are stored with the ring buffer that is used to communicate operation requests to the kernel; see this patch for details. This part of the patch set does not seem to be controversial.

The audit code is another story. The core io_uring code has been carefully optimized to dispatch requests and their results as quickly as possible. Use cases for io_uring can involve performing millions of I/O operations per second, so any added overhead will prove most unwelcome. Adding the audit hooks to cover operations submitted through the ring slowed down one of the most performance-critical parts of io_uring, leading Pavel Begunkov to react negatively: "So, it adds two if's with memory loads (i.e. current->audit_context) per request in one of the hottest functions here... No way, nack".
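
The shape of the overhead Begunkov objected to can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions, not the posted patch: `task_model`, `audit_context_model`, `request_model`, and `issue_request_model` are hypothetical stand-ins, and the only detail taken from the discussion is that each request pays for two conditional checks, each of which loads the task's audit context.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's real structures. */
struct audit_context_model {
    unsigned long records;   /* number of audit records emitted */
};

struct task_model {
    struct audit_context_model *audit_context;  /* NULL when auditing is off */
};

struct request_model {
    long input;
    long result;
};

/* Placeholder for the real work of carrying out one request. */
static void do_request_model(struct request_model *req)
{
    req->result = req->input + 1;
}

/* Every request pays for both checks, even when no audit context exists. */
static void issue_request_model(struct task_model *task,
                                struct request_model *req)
{
    if (task->audit_context)            /* load + branch on entry */
        task->audit_context->records++;
    do_request_model(req);
    if (task->audit_context)            /* load + branch on exit */
        task->audit_context->records++;
}
```

With a NULL `audit_context` the function produces the same result as the unaudited path, but the two loads and branches are still executed for every request.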

Begunkov suggested that perhaps the audit hooks could be patched in at run time when they are actually enabled, the way tracepoints and kprobes work. Moore responded that the audit subsystem doesn't support that sort of patching, and that doing so could raise problems of its own: "I fear it would run afoul of the various security certifications". So that does not appear to be the route to a possible solution.

Meanwhile, Jens Axboe, the io_uring maintainer, ran some tests. A simple random-read test slowed down by nearly 5% with the audit hooks installed, even in the absence of any actual audit rules. Various other benchmarks, even when run with an updated version of the patch set (which was not posted publicly), gave the same results. Kernel developers can work for months for a 5% performance gain; losing that amount to audit hooks is a bitter pill for them to swallow.

Axboe pointed out that read and write operations are not audited when they are initiated through the older asynchronous I/O system calls. "In the past two decades, I take it that hasn't been a concern?" He agreed that some operations (such as opening or removing files) should be audited, but said that auditing read and write operations was "just utter noise and not useful at all". Since those operations are the ones where performance matters the most, taking the audit hooks out of the fast path for them might be a possible solution.

Moore suggested an approach based on that idea; only a specific, carefully chosen set of operations would have the audit hooks applied. There is a handy switch statement in the io_uring dispatcher that makes it easy to instrument just the desired operations. He asked for feedback, but has not gotten much so far. The important question, as Begunkov pointed out, is which operations in particular need audit support. Adding an audit call when opening a file is unlikely to bother anybody; a call added to, say, a poll operation would be another story. Moore has posted an initial set of operations that he thinks merit auditing.
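
A minimal sketch of the selective approach, again as user-space C, might look like the following. The opcode names and the choice of which ones are audited are assumptions made for illustration (opening and removing files audited; reads, writes, and polls not); they are not Moore's posted list, and `op_model`, `op_wants_audit`, and `dispatch_model` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical opcodes; the real io_uring opcode set is much larger. */
enum op_model { OP_READ, OP_WRITE, OP_POLL, OP_OPEN, OP_UNLINK };

struct audit_stats_model {
    unsigned long dispatched;  /* requests seen */
    unsigned long audited;     /* requests that reached the audit call */
};

/* The switch decides, per opcode, whether the audit call is made at all. */
static bool op_wants_audit(enum op_model op)
{
    switch (op) {
    case OP_OPEN:
    case OP_UNLINK:
        return true;       /* rare, security-relevant operations */
    case OP_READ:
    case OP_WRITE:
    case OP_POLL:
        return false;      /* fast-path operations stay unaudited */
    }
    return false;
}

static void dispatch_model(enum op_model op, bool audit_enabled,
                           struct audit_stats_model *st)
{
    st->dispatched++;
    if (op_wants_audit(op) && audit_enabled)
        st->audited++;     /* stands in for the call into the audit code */
}
```

The design point is that the cost of the audit check is paid only by the opcodes listed in the `true` cases, so the set of cases becomes the thing to negotiate.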

Threats and grumbles

With luck, that solution will prove acceptable to everybody. The alternative to adding audit support to io_uring is, according to Moore, not particularly pleasant:

"If we can't find a solution here we would need to make io_uring and audit mutually exclusive and I don't think that is in the best interests of the users, and would surely create a headache for the distros."

"Headache" is not really the word for it. If the two features are made exclusive, then it will not be possible to configure a kernel containing both of them. So distributors would have to either ship two different kernels (something they will go far out of their way to avoid) or pick one of the two features to support. Hopefully it will not come to that.

Meanwhile, there has been some disgruntlement expressed by developers on both sides, but the security developers have made it especially clear that they would have liked to see audit designed into io_uring from the beginning. As Casey Schaufler put it:

"It would have been real handy had the audit and LSM requirements been designed into io_uring. The performance implications could have been addressed up front. Instead it all has to be retrofit."

Richard Guy Briggs also complained that "multiple aspects of security appear to have been a complete afterthought to this feature, necessitating it to be bolted on after the fact". The implication in both cases is that, with adequate forethought, the difficulties being encountered now could have been avoided.

That is arguably a fair criticism. Kernel developers working on new features often leave security as something to be thought about later; that is especially true for relatively niche features like auditing, which is unlikely to be enabled on development systems. The kernel community can be a bit unfriendly toward its security developers, characterizing them as prioritizing security above anything else (and above performance in particular). Such an environment seems like a recipe for leaving security concerns by the wayside, to be fixed up later.

It is also fair to point out, though, that io_uring has been developed in public since early 2019. It has been heavily discussed on the mailing lists (and in LWN), but the security community did not see that as their cue to make suggestions on how features like auditing could be supported. It is a rare kernel developer who can summon the focus to implement a million-operation-per-second I/O subsystem while simultaneously making provisions for security hooks that won't kill performance. Perhaps the io_uring developers should have been considering security from the beginning, but they should also have had help from the beginning.

The kernel community has surprisingly few rules regarding the addition of new features like io_uring. In theory, new system calls should come with manual pages, but it's somewhat surprising when that actually happens. In a project with a more bureaucratic process, it would make sense to insist that new features do not go in until they have proper support for mechanisms like LSM and auditing. That might force earlier interactions with security developers and avoid this kind of problem.

That is not the world that we live in, though; there is nobody with a checklist making sure that all of the relevant boxes have been marked before a new subsystem can be merged. So the kernel community will have to continue to muddle along, supporting the needed features as best it can. This is not the last time that a security mechanism will have to be retrofitted into an existing kernel feature. It's arguably not the best approach, but it generally gets the job done in the end.



Auditing io_uring

Posted Jun 3, 2021 19:13 UTC (Thu) by willy (subscriber, #9762) [Link] (8 responses)

FWIW, this is the kind of thing that leads people to say things like "The kernel is too slow at FOO; I did a prototype in userspace and it was eleventy-bajillion percent faster!"

Having worked on one or two of these projects, my first response is "What functionality did your prototype leave out to achieve that speedup, are real users actually ok without those features, and can we make those features optional in the current implementation?"

Yeah, audit support is slow. But it's also mandatory in some environments.

Auditing io_uring

Posted Jun 3, 2021 20:46 UTC (Thu) by JanC_ (guest, #34940) [Link] (6 responses)

But how many would need both hyper-detailed audit support and hyper-fast I/O?

Auditing io_uring

Posted Jun 3, 2021 20:50 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

A database server with important information might want fast IO and good security/auditing support.

Auditing io_uring

Posted Jun 3, 2021 22:05 UTC (Thu) by andresfreund (subscriber, #69562) [Link] (1 responses)

Seems unlikely that kernel-level audit support for individual read/write operations would be useful for a database server. The audit information wouldn't contain enough context (which client, how did they authorize, what statement), and the audited operations would often end up far from the user actions due to the database buffer pools, including I/O potentially happening in different processes/threads from the user action.

Auditing io_uring

Posted Jun 3, 2021 23:04 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Typically you want audit for other stuff, to detect and trace intrusions into the overall system.

Auditing io_uring

Posted Jun 14, 2021 0:27 UTC (Mon) by bartoc (guest, #124262) [Link]

A huge percentage of users. SELinux depends on AUDIT, and basically all managed Linux systems have that on. All Fedora, Red Hat, and Ubuntu systems have either SELinux or AppArmor turned on by default, with various levels of enforcement and permissiveness in their settings.

I doubt anyone really needs audit support for async reads and writes, but lacking file open and close operations would add a pretty trivial way around SELinux.

Further, the whole point of SELinux is defense in depth should some application get compromised, and there are lots of applications that are network accessible and require fast IO, or want async IO that has the model of io_uring.

Auditing io_uring

Posted Jun 14, 2021 16:18 UTC (Mon) by flussence (guest, #85566) [Link]

If it were possible to have both at the same time, MS Windows probably wouldn't be as miserably slow as it is with active antivirus scanners that intercept file ops. That's a problem they've been saddled with for an extremely long time and not for lack of resources.

Auditing io_uring

Posted Jun 21, 2021 0:23 UTC (Mon) by roblucid (guest, #48964) [Link]

Someone who wants auditing won't be happy if it can be circumvented using different system calls.
Everyone cares about performance sometimes, even those audit-crazy types, so the best solution is to make auditing low-overhead.

Auditing io_uring

Posted Jun 4, 2021 14:52 UTC (Fri) by jbarns231 (subscriber, #152627) [Link]

At large-scale customers I worked for in the past, I saw flamegraphs from systems that had audit on with a number of rules. Man... audit took ~80% of the cycles needed for the syscall, leaving ~20% for the actual logic. If you can turn it off by default and opt in globally, the speedup is really immense. No wonder people complain that the kernel is too slow. Whether it's because of regulations or whatever, the argument here applies to <1% of kernel users.

Auditing io_uring

Posted Jun 3, 2021 22:39 UTC (Thu) by roc (subscriber, #30627) [Link] (2 responses)

Seems to me that people should push harder on the code-patching approach. The objections to it amount to "might be a problem". Well, find out if it IS a problem.

Auditing io_uring

Posted Jun 4, 2021 0:35 UTC (Fri) by Paf (subscriber, #91811) [Link]

It’s a fascinating idea, actually, with potentially a lot of benefit for expensive (and commonly disabled) security checks anywhere they exist. I hope it resurfaces.

Auditing io_uring

Posted Jun 4, 2021 8:48 UTC (Fri) by leromarinvit (subscriber, #56850) [Link]

Also, in case adding audit hooks at runtime really is a problem for certification - removing them when they aren't needed seems like it can't possibly be an issue. You want to make sure they're never disabled? Just don't load/install/compile the module that removes them.

This seems like it's mostly the same situation we have now, just the runtime patcher module is something an attacker needs to bring/write instead of it being bundled with the kernel. And when an attacker can load kernel modules, you've already lost anyway.

Auditing io_uring

Posted Jun 4, 2021 1:58 UTC (Fri) by dancol (guest, #142293) [Link] (6 responses)

Runtime code patching *is* the answer here, sorry. Static keys work fine. What *exactly* is the security concern behind runtime code patching to enable audit rules?

It seems to me like the opposition there is just more "it sounds scary, so no way" FUD-based superstition. Runtime code patching works fine and has worked fine for many years. I can't think of a single good reason that the audit subsystem is too precious for it.

Auditing io_uring

Posted Jun 4, 2021 13:35 UTC (Fri) by Paf (subscriber, #91811) [Link]

A very good variation was suggested above:

Patch *out* auditing at runtime, if not having it compiled in gives certification people the willies. (Now if we’re opposed to that, we have to ask ourselves why... and if the security certification people aren’t being so crazy after all.)

Auditing io_uring

Posted Jun 4, 2021 15:59 UTC (Fri) by Nahor (subscriber, #51583) [Link] (4 responses)

My bikeshed guess is that they don't want the attacker to be able to disable the audit. Patching out would not solve that problem.

Auditing io_uring

Posted Jun 4, 2021 19:03 UTC (Fri) by zlynx (guest, #2285) [Link] (3 responses)

What do they imagine would stop the attacker from loading a module of their own to patch out the audit code?

Auditing io_uring

Posted Jun 4, 2021 20:05 UTC (Fri) by Nahor (subscriber, #51583) [Link] (2 responses)

I was thinking that with patching out disallowed, the kernel could be made read-only by the bootloader, but I guess people who care about auditing could still make the kernel RO anyway, while others keep the kernel RW and can patch out the auditing code.

Auditing io_uring

Posted Jun 4, 2021 20:46 UTC (Fri) by zlynx (guest, #2285) [Link] (1 responses)

I am not sure about all other hardware but on x86 I am fairly sure that the only way to make kernel memory truly read-only is with a hypervisor enforcing it. Otherwise anything running at the kernel level can set it read-write again.

This is what Windows does with the Core Isolation setting, which uses Hyper-V to protect the kernel and selected drivers even from other driver code.

So I suppose that if you have a boot loader which can set up and configure a virtual machine to load the kernel into then you could have a read-only kernel image.

Auditing io_uring

Posted Jun 4, 2021 20:54 UTC (Fri) by Nahor (subscriber, #51583) [Link]

An article about making x86 trustworthy was just mentioned yesterday in the LWN weekly edition (https://mjg59.dreamwidth.org/57199.html).

But I'm guessing the issue of audit and io_uring is not specific to x86 anyway.

Auditing io_uring

Posted Jun 4, 2021 9:28 UTC (Fri) by Freeaqingme (guest, #103259) [Link] (1 responses)

I'm not sure I understand where the slowdown comes from. If it's in a hot path, the code is executed a gazillion times. Shouldn't the CPU be able to optimize it away pretty quickly using branch prediction?

I realize branch prediction has been somewhat of a hassle around the kernel (and elsewhere), but I'd expect it to be quite useful in scenarios like these.

Auditing io_uring

Posted Jun 4, 2021 12:12 UTC (Fri) by matthias (subscriber, #94967) [Link]

The CPU cannot optimize away the comparisons. The audit flag could change in the meantime. What branch prediction does is allow execution to continue while the comparison is done. If the prediction is correct (which it should be almost always), you only have the impact of loading the values and doing the comparison. If the prediction is wrong, the CPU pipeline needs to be flushed (because execution continued in the wrong branch) and you lose many cycles.

It seems that the slowdown is just because of the overhead of the instructions added to the hot path. However, 5% is really quite a lot. Either the auditing code is not optimized that much, or io_uring only has a few instructions on this path. This means that the "real code" only takes 20 times as much time as the auditing code that just has to verify that auditing is indeed turned off.

Auditing io_uring

Posted Jun 7, 2021 7:33 UTC (Mon) by jezuch (subscriber, #52988) [Link] (1 responses)

How about exporting the audit events via... io_uring?

Auditing io_uring

Posted Jun 10, 2021 11:07 UTC (Thu) by k3ninho (subscriber, #50375) [Link]

I have an example implementation right here, but this eBPF format is too small to contain it.

K3n.