SA_IMMUTABLE and the hazards of messing with signals

By Jonathan Corbet
December 17, 2021

There are some parts of the kernel where even the most experienced and capable developers fear to tread; one of those is surely the code that implements signals. The nature of the signal API almost guarantees that any implementation will be full of subtle interactions and complexities, and the version in Linux doesn't disappoint. So the inclusion of a signal-handling change late in the 5.16 merge window might have been expected to have the potential for difficulties; it didn't disappoint either.

Forced signals

The signal API normally allows any process to control what happens on receipt of a signal; that can include catching the signal, masking it temporarily, or blocking it outright. There are a few exceptions, of course, including for SIGKILL, which cannot be blocked or caught. Within the kernel, there is a more subtle exception wherein a process can be forced to take a signal and exit regardless of any other arrangements that process may have made. Most of these situations come about in response to hardware problems that cannot be recovered from, but the signals sent by the seccomp() mechanism are also forced in this way. At times, a signal is so important that it simply cannot be ignored.

The kernel has a function to force a signal in this way called force_sig_info_to_task(). It will unblock the intended signal if need be and deliver the signal to the target process; it can also remove the process's handler for this signal, resetting it to the default action (which is normally to kill the process for the signals in question). Interestingly, though, this function wasn't always being used in forced-signal situations; instead, the kernel would just kill the target process and set its exit status to look like the signal had done it. In October, Eric Biederman sent out a patch set causing the kernel to do what it was pretending to do and actually deliver the signal rather than fake it.

This change works as intended — except for a small problem that was pointed out by Andy Lutomirski. It seems that, in current kernels, a call to sigaction() in the target process can re-block a signal in the window between when force_sig_info_to_task() unblocks it and the actual delivery of that signal. A call to ptrace() can also race with forced signals in this way. This race can cause misbehavior at best; in some cases (such as the delivery of a signal from seccomp()) it may be a security problem.

Not wanting to introduce potential vulnerabilities, Biederman set out to close this race; the solution took the form of a new flag (SA_IMMUTABLE) that can be applied to a process's internal signal-handling information. This flag is normally not set; if that changes, then any subsequent attempts to change the handling of the signal in question will fail with an EINVAL error. The flag is hidden from user space and can only be set by the kernel, and that only happens in force_sig_info_to_task(). There is no way to clear SA_IMMUTABLE once it has been set; the assumption is that the process is on its way out anyway. This change fixes the race in question and, since the flag is invisible to user space, there are no user-space ABI concerns. Or so it was thought.

This patch was posted on October 29, and found its way into the mainline on November 10 — near the end of the 5.16 merge window — as part of a set of "exit cleanups". Once the 5.16-rc1 kernel was released, the patch was picked up by the stable series and appeared in 5.15.3 on November 18.

Debugger bugs

Unfortunately, the day before the 5.15.3 release, Kyle Huey reported that the SA_IMMUTABLE change breaks debuggers. It is common for interactive debuggers to catch signals on behalf of the debugged process, including some of the signals that are forced by the kernel (not all of which are fatal). With this change, ptrace() is no longer able to change the handling of SA_IMMUTABLE signals and, in fact, is no longer even notified of them. The patch, Huey said, should be reverted. It was nonetheless shipped in 5.15.3 the next day.

After some discussion, it was concluded that the SA_IMMUTABLE change was indeed overly broad; it blocked some legitimate signal-related operations that were possible before. Biederman posted a pair of patches on the 18th to address the problems. The first of those reflects the conclusion that not all signals forced by the kernel should have the SA_IMMUTABLE flag set on them; instead, that should be restricted to situations where the kernel intends for the target process to exit. That intent is communicated via a new parameter to force_sig_info_to_task(); the call from within the seccomp() subsystem is changed to indicate that intent. The second patch adds a new function (force_exit_sig()) to be used in other places where a forced exit is intended, and adds a number of callers.

It's worth noting that, in the forced-exit case, ptrace() is still unable to trap the signal after these changes. But that is no different from the way things worked before all of these patches went in. The previous implementation, remember, bypassed the signal-delivery mechanism entirely, so there was never any opportunity for a debugger to influence things. The kernel, as seen from user space, now behaves as it did before.

These patches appear to have fixed the problem; they were merged for 5.16-rc2, then found their way into 5.15.5 on November 25. The regression caused by the original patch, in other words, existed for a full week in the 5.15-stable kernel. The rules state that no patch should go into a stable kernel until after it has appeared in a mainline -rc release. That rule was followed in this case; 5.16-rc1 came out between the patch landing in the mainline and it showing up in a stable update. That same rule may have delayed the inclusion of the fixes into the stable 5.15 releases; it could not be considered until after 5.16-rc2 came out on November 21.

The relevant question may well be: is the one-release rule enough to prevent this kind of regression in the stable kernels? That rule was added in response to previous problems, and may well have prevented some bugs from being backported, but some still clearly get through. There is an argument to be made that, for patches that reach into tricky subsystems like signals, a more cautious approach to backporting would make sense. In the absence of the developer resources to make those decisions, though, the current policy is unlikely to change and the occasional regression will make a (hopefully brief) appearance in stable kernels.

Index entries for this article
Kernel	Releases/5.16
Kernel	Security/seccomp
Kernel	Signal handling

to post comments

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 18, 2021 8:10 UTC (Sat) by roc (subscriber, #30627) [Link] (11 responses)

What is actually needed here is not "a more cautious approach to backporting", but automated tests of basic functionality, which are actually run on every "stable" kernel at least once before it is released, and whose failure blocks the release of that "stable" kernel.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 2:29 UTC (Mon) by khim (subscriber, #9252) [Link] (10 responses)

How exactly would tests help? This is clear case of Hyrum's Law: feature which developers haven't designed, planned or even expected to work… yet which users discovered and started using.

Tests rarely (if ever!) help in cases like that. They only verify that aspects of the implementation which developers know about work.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 5:00 UTC (Mon) by roc (subscriber, #30627) [Link] (9 responses)

Tests not only *could have* caught this regression, they *actually did*: rr has an extensive automated test suite which caught this regression (and has caught many other kernel regressions). There is no reason why kernel CI tests shouldn't cover at least as much kernel functionality as rr's tests --- they could even just run rr's tests.

As it happens, this isn't Hyrum's Law. Ptracers are supposed to be notified every time a signal is delivered to a tracee (other than SIGKILL). It's clearly documented in the man page:

> While being traced, the tracee will stop each time a signal is delivered, even if the signal is being ignored. (An exception is SIGKILL, which has its usual effect.)

The changes here broke that guarantee.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 21, 2021 16:10 UTC (Tue) by walters (subscriber, #7396) [Link] (8 responses)

Robert, I happen to agree with you overall. However, I am doubtful that LWN comments are going to make someone just wake up tomorrow and try to gate kernel commits on rr's test suite.

What is more likely to work (I'd say) is directly getting in touch with one of the CI groups. Maybe https://01.org/lkp/documentation/0-day/lkp-test-suites-de... ?

Bear in mind that this type of "reverse dependency testing" also incurs a maintenance burden; for example, any flakes in the rr test suite suddenly become flakes that kernel maintainers need to deal with. There's questions of versioning (pin to rr commit or track git main), build system, etc.

So if you're serious about this then I think some effort on your side would need to happen both initially to drive it, then there's an ongoing maintenance commitment.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 22, 2021 2:04 UTC (Wed) by roc (subscriber, #30627) [Link]

You're right that there's a certain amount of shouting into the wind on my part.

> I think some effort on your side would need to happen both initially to drive it then there's an ongoing maintenance commitment.

I'm sure you're right about this too. Unfortunately I'm not confident the expected value *for us* of doing this would actually be greater than what we're currently doing; we could easily do a bunch of work and then the CI test failures end up being ignored (unless we do the work to diagnose and chase down each regression, which is not all that different from what we have to do today). At a less rational level, it grates to be a tiny, unpaid team doing work to clean up the messes of people being paid megabucks to do kernel development; this is already the case, but upping the stakes in the hope of a net win makes it more so.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 22, 2021 2:16 UTC (Wed) by roc (subscriber, #30627) [Link] (6 responses)

> Bear in mind that this type of "reverse dependency testing" also incurs a maintenance burden; for example, any flakes in the rr test suite suddenly become flakes that kernel maintainers need to deal with. There's questions of versioning (pin to rr commit or track git main), build system, etc.

This is all true too.

Ideally there would be a standard test harness and repository for user-space tests of kernel APIs, and we would be able to merge tests from rr into that system. We wouldn't have to merge all of them, and we could run them hundreds of times pre-merge and reject any that failed even once. But as far as I know that harness and repository don't actually exist. Or am I wrong?

Kernel API test harnesses

Posted Dec 22, 2021 3:01 UTC (Wed) by corbet (editor, #1) [Link] (5 responses)

kselftest is probably the droid you are looking for here. I am sure they would welcome contributions!

The Linux Test Project might also be a good home.

Kernel API test harnesses

Posted Dec 22, 2021 9:41 UTC (Wed) by roc (subscriber, #30627) [Link] (4 responses)

Thanks.

Thinking about it some more, I guess I was wrong to suggest that merging rr tests into an existing test framework makes sense. Maybe 0-Day CI is the place for this after all.

Kernel API test harnesses

Posted Dec 22, 2021 14:09 UTC (Wed) by corbet (editor, #1) [Link] (3 responses)

Getting tests merged into kselftest is a good way to get them run in a number of the current testing operations, including those that run for stable kernel candidates.

Kernel API test harnesses

Posted Dec 22, 2021 20:59 UTC (Wed) by roc (subscriber, #30627) [Link] (2 responses)

Yeah, but you wouldn't want to merge a copy of rr itself into kselftest, and that's what would be needed.

Most rr tests are small programs that exercise certain kernel functionality. Running the test requires recording execution of the small program, followed by replaying it, and we verify that the replay worked correctly. Some of the kernel regressions rr caught could have been detected just by running the small programs normally --- because the kernel functionality they use regressed --- and those could be trivially added to kselftest. But many of the kernel regressions rr caught were related to ptrace, signals, etc that were exercised by rr itself.

rr isn't huge (50K-ish lines) but it's written in C++ and I doubt it would be acceptable to paste it into the kernel tree.

I think 0-Day-CI is really the right approach here. But do test failures there block kernel "stable" releases?

Kernel API test harnesses

Posted Dec 22, 2021 21:10 UTC (Wed) by corbet (editor, #1) [Link] (1 responses)

My suggestion would be to submit tests that only run if rr is available. But I'm not the kselftest maintainer, so I can't guarantee they would accept such a thing; some of those folks have been known to read here, maybe they could speak up.

As for blocking stable releases: regression reports will delay them and cause the offending patches to be fixed or removed; see the discussion that follows any of the review postings to see how that works. You can see an example here. The problem is not (usually) responding to reports of regressions, it's knowing that the regressions exist in the first place. As you have often (rightly) pointed out, more and better tests would help a lot in that regard.

Kernel API test harnesses

Posted Dec 23, 2021 20:59 UTC (Thu) by roc (subscriber, #30627) [Link]

Thanks for pointing that out. It looks like 0-Day CI results were not reported there. But results from https://lkft.linaro.org/ were reported there --- and despite the site saying "the mission of LKFT is to improve the quality of the Linux kernel by performing functional testing on Arm hardware" they actually run tests on x86 as well. Maybe that's the right place to add a new test suite.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 18, 2021 12:32 UTC (Sat) by smcv (subscriber, #53363) [Link] (2 responses)

The recent FUSE regression triggered by "fuse: fix page stealing" in 5.16-rc2, 5.14.20, 5.15.3(?) and contemporary stable kernels, subsequently fixed by "fuse: release pipe buf after last use" in 5.16-rc3, 5.15.6, 5.10.83, 5.4.163 and and 4.19.219, is another example of a fix in mainline kernels being backported and then found to cause a regression.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 18, 2021 12:41 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

To me, saying it needs to appear in one mainline rc kernel is just asking for trouble.

It's been released to mainline FOR TESTING. To me that means it is definitely NOT known to be stable. So why is it being cherry-picked into stable? The requirement should be it needs to make it into a .0 kernel ...

Cheers,
Wol

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 18, 2021 17:31 UTC (Sat) by matthias (subscriber, #94967) [Link]

There cannot be one requirement that is appropriate for all patches. How fast a patch is backported to stable (if at all) should depend on the complexity of the patch and on the importance of the fix. Just as an example: Delaying the backport of KPTI until after the next kernel release would have been a very bad idea, meltdown was just too important. And there is the obvious problem that there are not enough developers to determine for each and every patch how much testing that needs and how important it is.

But yes, cherry-picking directly after rc1 seems not to be the safest option.

Best,
Matthias

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 18, 2021 20:00 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (2 responses)

If a signal falls in the forest, but nobody can hear it, does it make a sound?

More prosaically, what exactly is the *observable* difference between sending a signal that can't be caught or ignored, and "faking it?" If there is no observable difference, why bother?

This is a genuine question, not a rhetorical "they wasted a bunch of time" argument. I don't know enough about the kernel's internals to answer it for myself.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 18, 2021 21:42 UTC (Sat) by matthias (subscriber, #94967) [Link] (1 responses)

I just quote Eric from this (https://lwn.net/ml/linux-kernel/87sfwqxv8g.fsf@disp2133/) mail:

> The why is that do_exit as an interface needs to be refactored.
>
> As it exists right now "do_exit" is bad enough that on a couple of older
> architectures do_exit in a random location results in being able to
> read/write the kernel stack using ptrace.
>
> So to addresses the issues I need to get everything that really
> shouldn't be using do_exit to use something else.

Also it seems that there is indeed a difference: The "fake signal and do_exit()"-approach only terminates a single thread, however the signal should bring down the whole process.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 23, 2021 4:51 UTC (Thu) by mmirate (guest, #143985) [Link]

> Also it seems that there is indeed a difference: The "fake signal and do_exit()"-approach only terminates a single thread, however the signal should bring down the whole process.

Ah, so *that* is why my multithreaded program isn't working on new kernels anymore. Whenever each thread finished its assigned work, it would clean itself up by committing a seccomp violation. Very annoying I have to change the program now. I'm going to complain on the mailing list.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 19, 2021 19:48 UTC (Sun) by marcH (subscriber, #57642) [Link] (13 responses)

> Within the kernel, there is a more subtle exception wherein a process can be forced to take a signal and exit regardless of any other arrangements that process may have made.

Imagine "forcing" a specific interruption and interrupting a piece of kernel code that explicitly requested for it to be temporarily ignored. What could possibly go wrong!?

Oh wait, this is just user space code, so who cares :-)

> The nature of the signal API almost guarantees that any implementation will be full of subtle interactions and complexities, and the version in Linux doesn't disappoint.

A lost cause then...

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 19, 2021 22:18 UTC (Sun) by roc (subscriber, #30627) [Link] (12 responses)

The forcing behavior makes sense where it's used. For example if a process ignores/masks SIGILL and then executes an illegal instruction, that cannot really be "ignored", so there's no obviously better alternative to the kernel's approach unblocking the signal and then delivering it.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 19, 2021 23:03 UTC (Sun) by marcH (subscriber, #57642) [Link] (11 responses)

> For example if a process ignores/masks SIGILL and then executes an illegal instruction, that cannot really be "ignored", so there's no obviously better alternative to the kernel's approach unblocking the signal and then delivering it.

So why can SIGILL be ignored in the first place?

This ignore/force "arms race" demonstrates the lack of clear, top-down classification of signals. For instance the fact that SIGKILL cannot be ignored is always mentioned "in passing" deep down in the SIGKILL-specific section. That always struck me as incredibly bad documentation and/or design: the distinction between signals that can be ignored versus not should be made _at the very top_ with further subsections for the various other classes of signals-that-can-be-ignored-except-when-not. In fact they should even have different prefixes, I mean SIGKILL should not even be called SIGKILL but something else like UNMASKABLE_KILL.

> The forcing behavior makes sense where it's used.

And what is the formal definition of "makes sense where it's used"? As long as there's none, race conditions and unanticipated corner cases will keep happening. Do we still need proof in 2021 that concurrency is hard and requires very careful and formal design? I heard that the high level design of signals was botched 50 years ago but Linux is now in a unique, quasi-monopolistic position to clean things up without caring about portability. However this requires a whiteboard and some computer _science_, not everything can be solved with a hands-on, incremental, "send patches" approach.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 4:40 UTC (Mon) by roc (subscriber, #30627) [Link] (7 responses)

This is almost entirely unrelated to concurrency so I'm not sure why you're bringing that up.

> So why can SIGILL be ignored in the first place?

Yes, arguably it shouldn't be ignorable. However, it does make sense to install a signal handler for SIGILL (e.g. to emulate the instruction or recover in some other way). I don't know the history of the development of Unix signal handling, but it's easy to imagine that once signal handlers were invented, it was convenient to use a single mechanism for both async signals (e.g. SIGINT, SIGTERM) and "deterministic" signals such as SIGILL and SIGSEGV.

I certainly agree that when we look at what signals and signal handling have evolved into, the design looks pretty nasty.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 6:54 UTC (Mon) by marcH (subscriber, #57642) [Link] (2 responses)

> This is almost entirely unrelated to concurrency

When you need exclusive access to some data, there is not much difference between 1. another thread accessing the data concurrently on a different core 2. another thread/handler pausing you to access the data on the same core. The concepts and tools are the same and people usually try to write code that handles both cases indifferently.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 11:15 UTC (Mon) by roc (subscriber, #30627) [Link] (1 responses)

> The concepts and tools are the same

I don't agree with this at all. Signal-safety and thread-safety are actually very different concepts with very different constraints. For example, you can make code thread-safe by using mutexes, but that actually doesn't work at all for signal handlers because you will instantly deadlock if a signal handler tries to acquire a mutex held by the thread it interrupted (assuming non-reentrant mutexes; if your mutexes are reentrant the situation is even worse since you won't get mutual exclusion).

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 16:14 UTC (Mon) by marcH (subscriber, #57642) [Link]

Please try to describe the problems and solutions using concepts and terms not used when dealing with concurrency.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 21, 2021 14:34 UTC (Tue) by jrtc27 (subscriber, #107748) [Link] (3 responses)

To add to the ways in which OpenSSL is a terrifying piece of software, it also installs a SIGILL handler in order to probe hardware features on Arm. Most are covered by AT_HWCAP* these days, but apparently not all, at least on 32-bit Arm. This makes debugging OpenSSL-using software particularly fun on Arm, since it breaks on the SIGILL during initialisation, and you have to continue past it.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 23, 2021 19:38 UTC (Thu) by pm215 (subscriber, #98099) [Link] (2 responses)

Using SIGILL for that on Arm is a bad idea which has been disrecommended for more than a decade (the alternatives today being hwcaps or directly reading the ID registers, which the kernel trap-and-emulates). There is at least one case I know of where trying an insn might succeed without causing a SIGILL but where it is not safely available for userspace. (To be fair, that case is a bit old and obscure now -- a few early Cortex-A8 based boards had issues with Neon, so the kernel reported "no Neon" to userspace, but since there was no way to make Neon trap separately from VFP, code that took the "see if it SIGILLs" approach would get the wrong answer.)

On the bright side, openssl seems to skip all but one of the SIGILL tests if it can get the hwcaps through getauxval. Maybe the kernel needs to grow another hwcap for that last test?

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 29, 2021 3:22 UTC (Wed) by foom (subscriber, #14868) [Link] (1 responses)

There are also arm devices which have heterogeneous sets of CPU cores, e.g. it might have two armv8.0 and two armv8.2 cores available.

The kernel hwcaps mechanism ensures that only the capabilities available on all cores are reported available (modulo buggy kernels, sigh). But, if you ignore that, and just try executing instructions to see what looks like it works, you'll be in for a bad time when your process happens to get moved to a different core with fewer instructions available later on.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 30, 2021 17:24 UTC (Thu) by pm215 (subscriber, #98099) [Link]

Yeah, I thought of the heterogenous-cluster case, but I couldn't remember whether the kernel reported a lowest-common-denominator hwcaps or just flatly refused to boot (as a "don't do this silly thing" hint to hardware designers :-)), so I didn't mention it.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 4:42 UTC (Mon) by roc (subscriber, #30627) [Link] (2 responses)

> Linux is now in a unique, quasi-monopolistic position to clean things up without caring about portability.

Linux does have to care about compatibility though, so at this point creating something brand-new in place of signals would only increase complexity since the existing signal system would still need to be supported.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 7:01 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

I wrote "clean things up", not "something brand new". I meant: clean up the documentation, add a couple cleaner APIs while keeping the old ones for backward compatibility, define and document undefined behaviors, explicitly forbid the sequences that never worked... I think design decisions like these already happened in other, dark POSIX areas?

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 20, 2021 11:15 UTC (Mon) by roc (subscriber, #30627) [Link]

Sure, I guess, but it's not clear to me you can make things much better this way.

SA_IMMUTABLE and the hazards of messing with signals

Posted Dec 27, 2021 16:56 UTC (Mon) by ebiederm (subscriber, #35028) [Link] (1 responses)

The deep issue with area of the kernel is it has been extended many times since it has been written to the point much of what happens is not well expressed in the code, and the number of active kernel developers who can figure it out is dwindling.

My goal is to clean up the implementation in the kernel so it is more straight forward and less error prone to touch.

There is also the challenge that "Fixes:" is the new "CC: stable@kernel.org". Which means that code that has no need to be backported (because no one has seen the bug in practice) gets backported amplifying any mistakes made in cleanups, or new features meant to address old issues.

I don't know of any solutions other than keep plodding along and make the code less error prone to use.

SA_IMMUTABLE and the hazards of messing with signals

Posted Jan 13, 2022 0:13 UTC (Thu) by sammythesnake (guest, #17693) [Link]

Greg KH gave a talk a while back that was reported here on LWN describing some of his process for deciding what goes into -stable. He mentioned that his criteria vary depending on the source of the code.

It might be worth getting in touch with him and agreeing the right process for your contributions when you have an opinion regarding -stable suitability. He doesn't want to make mistakes any more than you want him to :-P