From O_MAYEXEC to trusted_for()

By Jonathan Corbet
October 1, 2020

The ability to execute the contents of a file is controlled by the execute-permission bits — some of the time. If a given file contains code that can be executed by an interpreter — such as shell commands or code in a language like Perl or Python, for example — there are easy ways to run the interpreter on the file regardless of whether it has execute permission enabled or not. Mickaël Salaün has been working on tightening up the administrator's control over execution by interpreters for some time, but has struggled to find an acceptable home for this feature. His latest attempt takes the form of a new system call named trusted_for().

Tightly locked-down systems are generally set up to disallow the execution of any file that has not been approved by the system's overlords. That control is nearly absolute when it comes to binary machine code, especially when security modules are used to enforce signature requirements and prevent techniques like mapping a file into some process's address space with execute permission. Execution of code by an interpreter, though, just looks like reading a file to the kernel so, without cooperation from the interpreter itself, the kernel cannot know whether an attempt is being made to execute code contained within a given file. As a result, there is no way to apply any kernel-based policies to that type of access.

Enabling that cooperation is the point of Salaün's work; it is, at its core, a way for an interpreter to inform the kernel that it intends to execute the contents of a file. Back in May 2020, the first attempt tried to add an O_MAYEXEC flag to be used with the openat2() system call. If system policy does not allow a given file to be executed, an attempt to open it with O_MAYEXEC will fail.

This feature was controversial for a number of reasons, but Salaün persisted with the work; version 7 of the O_MAYEXEC patch set was posted in August. At that point, Al Viro asked, in that special way he has, why this check was being added to openat2() rather than being made into its own system call. Florian Weimer added that doing so would allow performing checks on an already-open file; that would enable interpreters to indicate an intent to execute code read from their standard input, for example — something that O_MAYEXEC cannot do. Salaün replied that controlling the standard input was beyond the scope of what he was trying to do.

Nonetheless, he tried to address this feedback in version 8, which implemented a new flag (AT_INTERPRETED) for the proposed faccessat2() system call instead. That allowed the check to be performed on either a file or an open file descriptor. This attempt did not fly either, though, with Viro insisting that a separate system call should be provided for this feature. This approach also introduced a potential race condition if an attacker could somehow change a file between the faccessat2() call and actually opening the file. So Salaün agreed to create a new system call for this functionality.

Thus, the ninth version introduced introspect_access(), which would ask the kernel if a given action was permissible for a given open file descriptor. There comes a point in kernel development (and beyond) when one can tell that the substantive issues have been addressed: when everybody starts arguing about naming instead. That is what happened here; Matthew Wilcox didn't like the name, saying that checking policy on a file descriptor is not really "introspection". Various suggestions then flew by, including security_check(), interpret_access(), entrusted_access(), fgetintegrity(), permission(), lsm(), should_faccessat(), and more.

In the tenth version, posted on September 24, Salaün chose none of those. The proposed new system call is now:

    int trusted_for(const int fd, const enum trusted_for_usage usage,
                    const unsigned int flags);

The fd argument is, of course, the open file descriptor, while usage describes what the caller intends to do with that file descriptor; in this patch set, the only possible option is TRUSTED_FOR_EXECUTION, but others could be added in the future. There are no flags defined, so the flags argument must be zero. The return value is zero if the system's security policy allows the indicated usage; otherwise the call fails with EACCES. In the latter case, it is expected that the caller will refuse to proceed with executing the contents of the file.
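As a rough illustration of how an interpreter might use such a call, here is a minimal sketch in C. Several things in it are assumptions rather than facts from the patch set: the numeric value of TRUSTED_FOR_EXECUTION, the `__NR_trusted_for` syscall-number macro (no released kernel header defines it, so the wrapper normally falls through to the ENOSYS branch), and the helper name `open_script_checked()`. Treating ENOSYS as "allowed" is a design choice made here so that the sketch behaves like a current interpreter on kernels without the call; a locked-down system might well prefer to refuse instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed value; the article only names the constant. */
enum trusted_for_usage { TRUSTED_FOR_EXECUTION = 1 };

/*
 * Hypothetical wrapper. __NR_trusted_for is not defined by released kernel
 * headers, so this normally compiles to the ENOSYS branch.
 */
int trusted_for(int fd, enum trusted_for_usage usage, unsigned int flags)
{
#ifdef __NR_trusted_for
    return (int)syscall(__NR_trusted_for, fd, usage, flags);
#else
    (void)fd; (void)usage; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/*
 * Open a script for interpretation. Returns a file descriptor, or -1 with
 * errno set if the file cannot be opened or policy forbids executing it.
 * The check is made on the already-open descriptor, so the file that was
 * checked is the file that will be read.
 */
int open_script_checked(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (trusted_for(fd, TRUSTED_FOR_EXECUTION, 0) != 0 && errno != ENOSYS) {
        int saved = errno;   /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;               /* allowed, or the kernel lacks the call */
}
```

An interpreter would call `open_script_checked("foo.py")` where it currently calls `open()`, and report an error to the user if the result is -1.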

The patch also adds a new sysctl knob called fs.trust_policy for setting a couple of policy options. Setting bit zero disables execution access for files located on a filesystem that was mounted with the noexec option; bit one disables execution for any file that does not have an appropriate permission bit set. Both of these are checks that are not made by current kernels. There are no extra security-module hooks added at this time, but that would appear to be the plan in the future; that will allow more complex policies and techniques like signature verification to be applied.
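The two policy bits can be modeled as a small truth function. The sketch below is only a user-space model of the description above, not the kernel's implementation; the names `TRUST_POLICY_MOUNT`, `TRUST_POLICY_FILE`, and `policy_allows()` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two fs.trust_policy bits described above. */
enum {
    TRUST_POLICY_MOUNT = 1u << 0,  /* bit zero: honor the noexec mount option */
    TRUST_POLICY_FILE  = 1u << 1   /* bit one: require an execute permission bit */
};

static_assert((TRUST_POLICY_MOUNT & TRUST_POLICY_FILE) == 0,
              "policy bits must be distinct");

/*
 * Model of the decision: would execution be allowed for a file, given the
 * sysctl value, whether its filesystem is mounted noexec, and whether the
 * caller has execute permission on it?
 */
bool policy_allows(unsigned int policy, bool on_noexec_mount, bool has_exec_perm)
{
    if ((policy & TRUST_POLICY_MOUNT) && on_noexec_mount)
        return false;
    if ((policy & TRUST_POLICY_FILE) && !has_exec_perm)
        return false;
    return true;
}
```

With the sysctl set to zero, `policy_allows()` returns true for every file, matching the statement that current kernels make neither check; setting both bits (a value of 3) requires an executable file on a filesystem that is not mounted noexec.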

This time around, even the name of the system call has avoided complaints — as of this writing, at least. So it may just be that this long-pending feature will finally make its way into the mainline kernel. That is not a complete solution to the problem, of course. Security-module support will eventually be needed, but also support within the interpreters themselves. That will require getting patches accepted into a variety of user-space projects. Fully locking down access to files by interpreters, in other words, is going to take a while yet.



From O_MAYEXEC to trusted_for()

Posted Oct 1, 2020 21:10 UTC (Thu) by zev (subscriber, #88455) [Link] (17 responses)

The cover letter of the patch series mentions things like modifying the python interpreter to disable -c, but if you can control a shell enough to attempt python -c "$(cat foo.py)" as a workaround for not being able to run python foo.py directly, why not just have the shell do your evil bidding itself?

What exactly even defines an interpreter though anyway? Or execution of code? As soon as you've got a data-dependent conditional branch, the line between data and code starts to melt into much more of a continuous spectrum.

From O_MAYEXEC to trusted_for()

Posted Oct 1, 2020 21:58 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

> What exactly even defines an interpreter though anyway? Or execution of code?

My understanding of this entire effort is that these questions are fundamentally userspace's problem. If the Python interpreter tells the kernel "I want to execute this file," then that counts as "executing" that file. If the Python interpreter does not so indicate, then it doesn't count. As far as the kernel is concerned, userspace is a black box that can call trusted_for() whenever it wants, and the kernel gives it answers based on security policies, in the same way as userspace can call flock() whenever it wants, and the kernel gives it answers based on previous calls to flock(). flock() is not enforced in any meaningful way; it's up to userspace to decide what it means and when it ought to be called.

In practice, I would imagine that it will be completely impractical for Python to audit all possible instances of code being evaluated at runtime, *and* tie those audit events back to an identifiable file or file descriptor (for example, if the user does exec(open("foo.py").read()), how does exec() know what file that string came from?). The former is already in PEP 578, but the latter does not seem possible from my understanding of how Python actually works. So instead, you basically have two options:

1. Some kind of "restricted mode Python," which would disable most runtime code-evaluation altogether unless it can be directly tied to a file (but see https://www.python.org/dev/peps/pep-0578/#why-not-a-sandbox).
2. Put Python inside of a container, VM, or hypervisor, and use that as your security boundary instead of trying to lock down Python.

IMHO #2 is by far the most straightforward approach to something like this. It does not require any changes to either Python or the kernel. But there are certainly other interpreted languages that are easier to lock down than Python, and defense in depth is a good idea anyway, so this is probably still worth doing regardless of factors specific to Python.

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 5:15 UTC (Fri) by comex (subscriber, #71521) [Link]

I can think of two possibilities; both pretty marginal though:

- Preventing attackers from pivoting from some limited or unstable form of ‘code execution’, like return-oriented programming after exploiting some C code, to a more flexible and stable Python script. But ROP is Turing complete, so there’s no real *need* to pivot in the first place; it’s just more convenient for an attacker.

- Guarding against some kind of command injection vulnerability that involves arguments directly passed to execve() or similar rather than going through a shell. But that would be a very unusual vulnerability.

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 5:42 UTC (Fri) by richiejp (guest, #111135) [Link] (4 responses)

Because most people won't understand the security implications of using `-c` in a shell script. You have to put up barriers to people taking the insecure path.

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 7:13 UTC (Fri) by zarak (guest, #132244) [Link] (3 responses)

Can you elaborate on the issue with `-c` in a script? I must be part of "most people".

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 8:38 UTC (Fri) by richiejp (guest, #111135) [Link] (2 responses)

Well the problem is not with -s in general, but using it when you could pass the file name instead. If you load the script contents into memory with `cat` and then pass it as an argument to Python with -s, Python can't check the original file with `trusted_for`. It either has to assume the script is trusted, disable -s or sh/cat needs to check the permissions before passing the data to Python. I suppose there is the same issue with passing data on stdio, which is mentioned in the article.

Also, on a partially related note, there was some buffer overflow or "stack smashing" attack involving large command lines and now the linux command line length is much more limited to prevent that, so you probably don't want to use `-s` in shell scripts unless it is a string of known length generated in the script or static.

BTW "most people" includes myself when I'm in a less trusted state.

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 8:41 UTC (Fri) by richiejp (guest, #111135) [Link]

I mean -c not -s xD

From O_MAYEXEC to trusted_for()

Posted Nov 3, 2020 20:53 UTC (Tue) by nix (subscriber, #2304) [Link]

> Also, on a partially related note, there was some buffer overflow or "stack smashing" attack involving large command lines

This was incredibly annoying and continues to break my workflows to this day, but apparently not breaking userspace doesn't apply when it can be abused by attackers and it might be inconvenient to fix it.

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 11:41 UTC (Fri) by NAR (subscriber, #1313) [Link] (2 responses)

The shell is less powerful than Python, e.g. as far as I know, you can't open TCP connections to remote sites from the shell without using external programs like curl, wget, netcat, etc.

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 12:40 UTC (Fri) by geert (subscriber, #98403) [Link] (1 responses)

I thought so, too. Until I ran "man bash", and found:

       Bash handles several filenames specially when they are used in
       redirections, as described in the following table. If the operating
       system on which bash is running provides these special files, bash
       will use them; otherwise it will emulate them internally with the
       behavior described below.

              /dev/tcp/host/port
                     If host is a valid hostname or Internet address, and port
                     is an integer port number or service name, bash attempts
                     to open the corresponding TCP socket.
              /dev/udp/host/port
                     If host is a valid hostname or Internet address, and port
                     is an integer port number or service name, bash attempts
                     to open the corresponding UDP socket.

And it works!

machineA$ nc -l -p 8000

machineB$ echo hello > /dev/tcp/machineA/8000

From O_MAYEXEC to trusted_for()

Posted Oct 2, 2020 15:30 UTC (Fri) by mwsealey (subscriber, #71282) [Link]

> If the operating system on which bash is running provides these special files, bash will use them; otherwise it will emulate them internally

About time for someone to write a system-level /dev/tcp and /dev/udp... isn't this something systemd-networkd should provide?

From O_MAYEXEC to trusted_for()

Posted Oct 3, 2020 4:38 UTC (Sat) by martin.pitt (subscriber, #26246) [Link] (6 responses)

I totally don't see the point of this entire approach. As the author admits himself, this is super-easy to work around, and there is not even a well-defined goal. There may be embedded systems where the user is not supposed to do arbitrary stuff, but then disable login or bash/python/awk/sed/perl/etc. But this doesn't work with standard distros obviously.

If you want to lock down a system, protect the filesystem/OS *objects* with MAC like SELinux or AppArmor, and restrict resource usage with cgroups. That's a proven and robust path. Whereas this is just dead weight and snake oil, sorry.

From O_MAYEXEC to trusted_for()

Posted Oct 3, 2020 7:03 UTC (Sat) by dxin (guest, #136611) [Link] (5 responses)

>There may be embedded systems where the user is not supposed to do arbitrary stuff

That's the entire Android ecosystem BTW, aka 90%+ of Linux users. I'm pretty sure the need for this comes from Android as it's the only distro that has complete control over the entire FS. For any other distro that control is partially in the hands of admins and partially in the users'.

The use case is pretty clear IMO. The interpreter wants to know if it should proceed to execute a script. Until this feature there's simply nowhere to ask. To take full advantage of this feature, of course, every interpreter that ships with the distro will be patched to disable executing from stdin and respect what the kernel returns from trusted_for().

TBH it's kind of a design mistake for an interpreter to not check for execute flag from the beginning.

From O_MAYEXEC to trusted_for()

Posted Oct 3, 2020 19:58 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (4 responses)

> TBH it's kind of a design mistake for an interpreter to not check for execute flag from the beginning.

The only way (that I know of) to do that is to call access(2), which has a Big Honking Warning telling you not to use it as a security boundary because of TOCTTOU races.

Sure, you can call fstat(2) and manually evaluate the mode bits against the EUID, but then the operating system cannot enforce filesystem noexec and similar policies, which the Android people almost certainly want to do. So instead the interpreter has to know about every system-level policy that might possibly want to restrict execute permission, for every platform where it runs, and manually check them all one at a time, which is gross and uncivilized. Much better to just ask the kernel "Hey, can I execute that?" - exactly the question trusted_for() is intended to answer.

From O_MAYEXEC to trusted_for()

Posted Oct 3, 2020 23:09 UTC (Sat) by MrWim (subscriber, #47432) [Link] (3 responses)

access("/proc/self/fd/n", X_OK) should be safe from TOCTTOU I would have thought.
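A minimal sketch of that idea in C follows: open the file first, then run the permission check against the /proc/self/fd entry for the descriptor, so that the check and the later reads refer to the same open file. The helper names are invented for illustration. The sketch assumes /proc is mounted, and note that access() checks against the real (not effective) user ID.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if access() grants execute permission on the open file, else 0. */
int fd_is_executable(int fd)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/fd/%d", fd);

    assert(n > 0 && (size_t)n < sizeof path);
    return access(path, X_OK) == 0;
}

/*
 * Open path read-only and keep the descriptor only if the file is
 * executable. Returns the descriptor, or -1 with errno set.
 */
int open_if_executable(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (!fd_is_executable(fd)) {
        close(fd);
        errno = EACCES;
        return -1;
    }
    return fd;
}
```

On a typical system `open_if_executable("/bin/sh")` succeeds while `open_if_executable("/dev/null")` fails with EACCES, since /dev/null has no execute bits set.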

From O_MAYEXEC to trusted_for()

Posted Oct 6, 2020 16:38 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (2 responses)

I think you're right, but I can still think of a number of problems with this:

- euidaccess(3) appears to fall back to mode bits if uid != euid. The version I looked at doesn't even check for EROFS in this case (a fact which is conveniently omitted from the man page). But interpreters probably shouldn't be setuid in the first place.
- stdin might be a tty/pty, an anonymous pipe, etc., which are non-executable on most systems most of the time. But most interpreters want to treat a tty, at least, as if it were executable (because otherwise you can't have a REPL), so now you have to figure out whether to allow any other stdin-related exceptions, and if so, how to check for them.
- You need proc to be mounted and accessible.
- /proc/[pid]/fd is Linux-specific. So is trusted_for(), but trusted_for() was specifically designed for this individual use case, while /proc was not, so interpreters would have had to foresee that this extra permissions enforcement would be useful on Linux in particular. They would be unable to do it on systems that lacked a /proc or lacked a /proc/[pid]/fd.
- I'm having some difficulty figuring out exactly when /proc/[pid]/fd was introduced. For very old interpreters, it might not have been available at the time they made this decision (especially for languages that originated on a pre-Linux Unix).

And the single most important reason:

- Most people are going to read the access/euidaccess man pages, which tell you not to use them, and then they will conclude that you should not use them. This is the same problem that /dev/urandom had.

From O_MAYEXEC to trusted_for()

Posted Oct 7, 2020 6:46 UTC (Wed) by zev (subscriber, #88455) [Link] (1 responses)

> stdin might be a tty/pty, an anonymous pipe, etc., which are non-executable on most systems most of the time. But most interpreters want to treat a tty, at least, as if it were executable (because otherwise you can't have a REPL), so now you have to figure out whether to allow any other stdin-related exceptions, and if so, how to check for them.

On the kinds of systems this is (I guess?) purportedly targeted at, I'd expect losing REPL functionality wouldn't be considered a problem.

However, that in turn makes me wonder if they might be better served by eliminating the general-purpose interpreter binary entirely and replacing each script that depends on it with the equivalent of

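#include <Python.h>
/* The embedded interpreter must be initialized before it can run anything. */
int main(void) { Py_Initialize(); int rc = PyRun_SimpleString("...script source code..."); Py_Finalize(); return rc != 0; }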

(A partial application of the script to the interpreter, basically.) Seems like it'd be a much more foolproof approach.

From O_MAYEXEC to trusted_for()

Posted Oct 8, 2020 3:29 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

You would also need to C-ify every single .py file in both the standard library and all of your external dependencies. Since you can't use PyRun_SimpleString() for anything that doesn't live in __main__, this quickly becomes a lot more complicated than what you describe.

Cython can *mostly* do that work automatically, but I don't think they've yet managed to reach perfect 1:1 compatibility with pure Python modules, so you would need to run a battery of unit tests over the resulting output to be sure it actually works.

From O_MAYEXEC to trusted_for()

Posted Oct 3, 2020 21:34 UTC (Sat) by sbaugh (guest, #103291) [Link] (5 responses)

With this kind of interface, it's less clear why the kernel is even involved. Couldn't this check be performed in userspace, by querying some kind of userspace policy daemon? If a userspace daemon can't get enough information to determine whether the access should be allowed, perhaps userspace should be given that information in some way...

Maybe that sounds excessively micro-kernel-y, but this whole scheme will require a carefully tuned and controlled userspace anyway...

From O_MAYEXEC to trusted_for()

Posted Oct 6, 2020 5:14 UTC (Tue) by matthias (subscriber, #94967) [Link]

> If a userspace daemon can't get enough information to determine whether the access should be allowed, perhaps userspace should be given that information in some way...

You mean by adding a trusted_for() system call? ;)

Sorry, could not resist.

From O_MAYEXEC to trusted_for()

Posted Oct 6, 2020 16:31 UTC (Tue) by jamesmorris (subscriber, #82698) [Link] (2 responses)

One use-case is the flag then being used by an LSM (e.g. IPE) to cause the file to be integrity-verified.

From O_MAYEXEC to trusted_for()

Posted Oct 6, 2020 16:48 UTC (Tue) by sbaugh (guest, #103291) [Link] (1 responses)

That could happen in user space too, though, couldn't it? Can't user space query the integrity-verified-status of a file?

From O_MAYEXEC to trusted_for()

Posted Oct 8, 2020 6:16 UTC (Thu) by jamesmorris (subscriber, #82698) [Link]

Yes, but this crosses a trust boundary.

From O_MAYEXEC to trusted_for()

Posted Oct 8, 2020 5:21 UTC (Thu) by skx (subscriber, #14652) [Link]

It is funny you say that, because I wrote a Linux Security Module which does almost exactly that:

Every time the kernel tries to execute a binary...
It calls back to user-space to run an "is this permitted" executable.
If the user-space program returns 0 the execution is permitted, otherwise it is denied.

On the one hand this is horrific, on the other hand it does work. It would allow you to write policies in userspace.

The overhead is high, for every executable that is launched you have to also launch the policy-program.

No chance in hell of this getting merged into the kernel, so I didn't even try, but I had fun learning:

https://github.com/skx/linux-security-modules/tree/master/security/can-exec

From O_MAYEXEC to trusted_for()

Posted Oct 9, 2020 23:18 UTC (Fri) by sbelmon (subscriber, #55438) [Link] (1 responses)

I'm baffled. Since the interpreter has to cooperate anyway, why not have it check some setting in /etc that says "hey, interpreters, don't run any script that isn't +x"?

From O_MAYEXEC to trusted_for()

Posted Oct 10, 2020 2:15 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

Just leaving it up to the interpreters to do validation means you can't do signature validation without implementing equivalent code in every interpreter. Passing information up to the kernel lets you apply an appropriate IMA check instead.

From O_MAYEXEC to trusted_for()

Posted Nov 3, 2020 21:01 UTC (Tue) by nix (subscriber, #2304) [Link] (4 responses)

Hm. Don't like that syscall much. Not because of the name, though:

- what's with all the const? Const applied to pointers is meaningful. Const applied to integers... it has no effect, but it doesn't even feel useful for documentation purposes. I can't figure out what it might mean even conceptually. You can't modify this... integer constant? Well, no, you generally can't mutate abstract mathematical entities like integers without being God (or older FORTRAN ;) ).
- is it sane to use enums in system calls, given that their bit-width is implementation-defined and can change merely because you *add values* to the enumeration?

From O_MAYEXEC to trusted_for()

Posted Nov 10, 2020 17:09 UTC (Tue) by ksandstr (guest, #60862) [Link]

The consting applied to parameters passed by value is just noise. And the enum issue applies in userspace alone, since the user-to-kernel syscall interface promotes integral-typed parameters to native word size anyway; so the manpage trusted_for(2) would specify that as an int.

Sort of smells like a literal-minded machine translation of a completely different language, in fact.

From O_MAYEXEC to trusted_for()

Posted Nov 10, 2020 22:31 UTC (Tue) by anselm (subscriber, #2796) [Link] (2 responses)

> Const applied to integers... it has no effect, but it doesn't even feel useful for documentation purposes. I can't figure out what it might mean even conceptually.

Hm. No effect?

$ cat t.c
void foo(int k) {
    k++;
}

void cfoo(const int k) {
    k++;
}
$ gcc t.c
t.c: In function ‘cfoo’:
t.c:6:6: error: increment of read-only parameter ‘k’
    6 |     k++;
      |      ^~

Looks like you don't get to change a const int parameter inside a function. This doesn't seem to require huge conceptual leaps.

As far as the C language is concerned, it's OK to change the (non-const) int parameter inside the foo() function, if one doesn't mind that the caller of foo() never sees the change. In effect, the parameter is just another local variable.

From O_MAYEXEC to trusted_for()

Posted Nov 12, 2020 0:29 UTC (Thu) by foom (subscriber, #14868) [Link] (1 responses)

Const in a declaration is what's meaningless (and ignored by the compiler).

E.g. if you have a declaration in a header:

void foo(const int k);

You can still entirely validly and correctly have the implementation:

void foo(int k) {
    k++;
}

From O_MAYEXEC to trusted_for()

Posted Nov 20, 2020 22:12 UTC (Fri) by nix (subscriber, #2304) [Link]

Aha! Thank you: as with so many decades-old misconceptions this one is really obvious in hindsight once pointed out.

Occasionally I fall into the trap of thinking I know a lot about C, and then something dead obvious and simple like this pops up and I realise I know nothing.