SA_IMMUTABLE and the hazards of messing with signals
Forced signals
The signal API normally allows any process to control what happens on receipt of a signal; that can include catching the signal, blocking it temporarily with a signal mask, or ignoring it outright. There are a few exceptions, of course, including for SIGKILL, which cannot be caught, blocked, or ignored. Within the kernel, there is a more subtle exception wherein a process can be forced to take a signal and exit regardless of any other arrangements that process may have made. Most of these situations come about in response to hardware problems that cannot be recovered from, but the signals sent by the seccomp() mechanism are also forced in this way. At times, a signal is so important that it simply cannot be ignored.
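As a reminder of what that user-space control looks like, here is a minimal sketch using the standard POSIX calls sigaction(), sigprocmask(), and sigpending(). The helper names (try_catch(), blocked_signal_stays_pending(), handler_ran()) are invented for this illustration; the code assumes a single-threaded process on a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the handler so callers can see that it ran. */
static volatile sig_atomic_t caught;

static void on_signal(int signo)
{
    (void)signo;
    caught = 1;
}

/* Hypothetical helper: install a handler for signo.
 * Returns 0 on success, or the errno value if sigaction() refuses. */
int try_catch(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL) == 0 ? 0 : errno;
}

/* Hypothetical helper: block signo, raise it, and report whether it is
 * sitting in the pending set (1) or not (0). The previous mask is then
 * restored, at which point the pending signal is delivered. */
int blocked_signal_stays_pending(int signo)
{
    sigset_t set, old, pending;
    int is_pending;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    raise(signo);                 /* queued, not delivered, while blocked */
    sigpending(&pending);
    is_pending = sigismember(&pending, signo);

    sigprocmask(SIG_SETMASK, &old, NULL);   /* unblock: handler runs now */
    return is_pending;
}

/* Hypothetical helper: has the handler run since program start? */
int handler_ran(void)
{
    return caught != 0;
}
```

Calling try_catch(SIGUSR1) succeeds, and a blocked SIGUSR1 waits in the pending set until the mask is restored. The same try_catch() call for SIGKILL fails with EINVAL, which is the ordinary, documented exception; forced signals are a separate, kernel-internal matter.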
The kernel has a function for forcing a signal in this way: force_sig_info_to_task(). It will unblock the intended signal if need be and deliver the signal to the target process; it can also remove the process's handler for this signal, resetting it to the default action (which is normally to kill the process for the signals in question). Interestingly, though, this function wasn't always being used in forced-signal situations; instead, the kernel would just kill the target process and set its exit status to look like the signal had done it. In October, Eric Biederman sent out a patch set causing the kernel to do what it was pretending to do and actually deliver the signal rather than fake it.
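The effect of forcing a signal can be illustrated with a small user-space model. This is not kernel code: every name here (model_sigstate, model_deliver(), model_force()) is invented, and the exact condition under which the handler is reset (the signal being blocked or ignored) is an assumption of the sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one signal's disposition in one process. */
enum model_action  { ACT_DEFAULT, ACT_IGNORE, ACT_HANDLER };
enum model_outcome { OUT_PENDING, OUT_DISCARDED, OUT_HANDLED, OUT_KILLED };

struct model_sigstate {
    bool blocked;               /* is the signal in the blocked mask? */
    enum model_action action;   /* what the process asked to happen */
};

/* Ordinary delivery: the process's own arrangements are respected. */
enum model_outcome model_deliver(const struct model_sigstate *s)
{
    assert(s != NULL);
    if (s->blocked)
        return OUT_PENDING;
    switch (s->action) {
    case ACT_IGNORE:
        return OUT_DISCARDED;
    case ACT_HANDLER:
        return OUT_HANDLED;
    default:
        return OUT_KILLED;      /* default action for the signals at issue */
    }
}

/* Forced delivery (assumed logic): if the process tried to block or
 * ignore the signal, reset it to the default action and unblock it,
 * then deliver as usual. */
enum model_outcome model_force(struct model_sigstate *s)
{
    assert(s != NULL);
    if (s->blocked || s->action == ACT_IGNORE) {
        s->action = ACT_DEFAULT;
        s->blocked = false;
    }
    return model_deliver(s);
}
```

In this model, a process that has blocked the signal sees OUT_PENDING from model_deliver() but OUT_KILLED from model_force(); a process with an unblocked handler still gets to run that handler.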
This change works as intended — except for a small problem that was pointed out by Andy Lutomirski. It seems that, in current kernels, a call to sigaction() in the target process can re-block a signal in the window between when force_sig_info_to_task() unblocks it and the actual delivery of that signal. A call to ptrace() can also race with forced signals in this way. This race can cause misbehavior at best; in some cases (such as the delivery of a signal from seccomp()) it may be a security problem.
Not wanting to introduce potential vulnerabilities, Biederman set out to close this race; the solution took the form of a new flag (SA_IMMUTABLE) that can be applied to a process's internal signal-handling information. This flag is normally not set; if that changes, then any subsequent attempts to change the handling of the signal in question will fail with an EINVAL error. The flag is hidden from user space and can only be set by the kernel, and that only happens in force_sig_info_to_task(). There is no way to clear SA_IMMUTABLE once it has been set; the assumption is that the process is on its way out anyway. This change fixes the race in question and, since the flag is invisible to user space, there are no user-space ABI concerns. Or so it was thought.
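The behavior of the flag can be sketched in a few lines of user-space C. This is a model only: the structure, the function names, and the bit value chosen for MODEL_SA_IMMUTABLE are all invented for illustration and do not reflect the kernel's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_SA_IMMUTABLE 0x1u   /* hypothetical bit value */

enum model_handler { H_DEFAULT, H_IGNORE, H_CUSTOM };

/* Toy stand-in for the kernel's per-signal handling information. */
struct model_sigaction {
    enum model_handler handler;
    unsigned int flags;
};

/* Models a sigaction()-style request from the process (or a tracer):
 * once the immutable flag is set, every change is refused. */
int model_set_handler(struct model_sigaction *sa, enum model_handler h)
{
    assert(sa != NULL);
    if (sa->flags & MODEL_SA_IMMUTABLE)
        return -EINVAL;
    sa->handler = h;
    return 0;
}

/* Models the forcing path: reset to the default action and lock it.
 * There is deliberately no function that clears the flag again. */
void model_force(struct model_sigaction *sa)
{
    assert(sa != NULL);
    sa->handler = H_DEFAULT;
    sa->flags |= MODEL_SA_IMMUTABLE;
}
```

A model_set_handler() call that arrives after model_force() returns -EINVAL and leaves the default action in place, which is how the window described above gets closed.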
This patch was posted on October 29, and found its way into the mainline on November 10 — near the end of the 5.16 merge window — as part of a set of "exit cleanups". Once the 5.16-rc1 kernel was released, the patch was picked up by the stable series and appeared in 5.15.3 on November 18.
Debugger bugs
Unfortunately, the day before the 5.15.3 release, Kyle Huey reported that the SA_IMMUTABLE change breaks debuggers. It is common for interactive debuggers to catch signals on behalf of the debugged process, including some of the signals that are forced by the kernel (not all of which are fatal). With this change, ptrace() is no longer able to change the handling of SA_IMMUTABLE signals and, in fact, is no longer even notified of them. The patch, Huey said, should be reverted. It was nonetheless shipped in 5.15.3 the next day.
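For readers unfamiliar with how a debugger sees signals, the following Linux-only sketch shows the basic mechanism using ptrace() and waitpid(). The function name observe_child_signal() is invented, and the child uses raise() as a stand-in for a kernel-generated signal; it assumes the environment permits ptrace().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a traced child that raises signo, and
 * return the signal number the tracer observed. The tracer then
 * suppresses the signal, so the child should exit normally.
 * Returns -1 if anything does not go as expected. */
int observe_child_signal(int signo)
{
    int status, seen;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced by the parent, then take a signal. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        raise(signo);
        _exit(0);       /* reached only if the tracer drops the signal */
    }

    /* Tracer: the child stops before the signal is delivered. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    seen = WSTOPSIG(status);

    /* Resume with no signal (data == 0), discarding the one we saw. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        return -1;
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return seen;
}
```

Here the tracer sees the child's SIGUSR1 and discards it, so the child never notices the signal. It is this ability to observe and alter signal handling that the SA_IMMUTABLE change took away for forced signals.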
After some discussion, it was concluded that the SA_IMMUTABLE change was indeed overly broad; it blocked some legitimate signal-related operations that were possible before. Biederman posted a pair of patches on the 18th to address the problems. The first of those reflects the conclusion that not all signals forced by the kernel should have the SA_IMMUTABLE flag set on them; instead, that should be restricted to situations where the kernel intends for the target process to exit. That intent is communicated via a new parameter to force_sig_info_to_task(); the call from within the seccomp() subsystem is changed to indicate that intent. The second patch adds a new function (force_exit_sig()) to be used in other places where a forced exit is intended, and adds a number of callers.
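The shape of that fix can be sketched with another small user-space model. As before, this is not kernel code: the enum, the structure, and the function names (including model_force_exit(), which only mimics the role described for force_exit_sig()) are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_SA_IMMUTABLE 0x1u   /* hypothetical bit value */

enum model_handler { H_DEFAULT, H_CUSTOM };

/* Hypothetical stand-in for the new parameter: what does the caller
 * intend to happen to the target process? */
enum model_intent { INTENT_KEEP_CURRENT, INTENT_EXIT };

struct model_sigaction {
    enum model_handler handler;
    unsigned int flags;
};

/* Changes are refused only while the immutable flag is set. */
int model_set_handler(struct model_sigaction *sa, enum model_handler h)
{
    assert(sa != NULL);
    if (sa->flags & MODEL_SA_IMMUTABLE)
        return -EINVAL;
    sa->handler = h;
    return 0;
}

/* Forcing a signal locks the disposition only when an exit is intended. */
void model_force(struct model_sigaction *sa, enum model_intent intent)
{
    assert(sa != NULL);
    if (intent == INTENT_EXIT) {
        sa->handler = H_DEFAULT;
        sa->flags |= MODEL_SA_IMMUTABLE;
    }
}

/* Convenience wrapper for callers that always want the process gone. */
void model_force_exit(struct model_sigaction *sa)
{
    model_force(sa, INTENT_EXIT);
}
```

With INTENT_KEEP_CURRENT, a later model_set_handler() call still succeeds, so a debugger-style intervention remains possible; after model_force_exit(), it fails with -EINVAL.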
It's worth noting that, in the forced-exit case, ptrace() is still unable to trap the signal after these changes. But that is no different from the way things worked before all of these patches went in. The previous implementation, remember, bypassed the signal-delivery mechanism entirely, so there was never any opportunity for a debugger to influence things. The kernel, as seen from user space, now behaves as it did before.
These patches appear to have fixed the problem; they were merged for 5.16-rc2, then found their way into 5.15.5 on November 25. The regression caused by the original patch, in other words, existed for a full week in the 5.15-stable kernel. The rules state that no patch should go into a stable kernel until after it has appeared in a mainline -rc release. That rule was followed in this case; 5.16-rc1 came out between the patch landing in the mainline and its showing up in a stable update. That same rule may have delayed the inclusion of the fixes into the stable 5.15 releases; they could not be considered until after 5.16-rc2 came out on November 21.
The relevant question may well be: is the one-release rule enough to prevent this kind of regression in the stable kernels? That rule was added in response to previous problems, and may well have prevented some bugs from being backported, but some still clearly get through. There is an argument to be made that, for patches that reach into tricky subsystems like signals, a more cautious approach to backporting would make sense. In the absence of the developer resources to make those decisions, though, the current policy is unlikely to change and the occasional regression will make a (hopefully brief) appearance in stable kernels.