May the FOLL_FORCE not be with you
The special file /proc/PID/mem provides read and write access to the virtual address space of the process identified by PID. It is used primarily by debuggers, but it has a place in other applications (certain types of user-space hardening, for example) as well. Writing to this file will overwrite the process's memory at the current file offset. Interestingly, the kernel function that implements writing to this file — mem_rw() — uses the FOLL_FORCE flag when accessing the target memory. That flag causes the write to succeed, regardless of whether the normal memory protections at the target address would allow writing. As a result, /proc/PID/mem can be used to overwrite executable memory.
This sort of capability serves as a sort of giant "welcome" mat to those who would compromise a system; in this case, it has been present for a long time. LWN covered one exploit that used /proc/PID/mem — in early 2012. Linus Torvalds put a fix into the mainline in response to that disclosure, but it turned out to be insufficient; another serious exploit surfaced in 2017. That time around, Torvalds applied a patch during the 4.12 merge window that removed the use of FOLL_FORCE from mem_rw() on the theory that, perhaps, nobody actually needs that behavior. The reality of the situation soon became clear, though, as various users complained; the patch restoring FOLL_FORCE that Torvalds merged for 4.12-rc4 included descriptions of some of those users, including the just-in-time compiler for the Julia language and the rr debugger.
So FOLL_FORCE remained for another seven years. Surprisingly, attackers have not gone away over that time either. So the interest in closing the FOLL_FORCE loophole is still around as well. Just before the opening of the 6.11 merge window, Christian Brauner sent a pull request containing a patch from Adrian Ratiu that tried to improve the situation. Since disabling FOLL_FORCE entirely has proved to not be possible, Ratiu's patch added four kernel command-line parameters to allow administrators to control access to /proc/PID/mem:
- proc_mem.restrict_open_read: controls whether the file can be opened for read access.
- proc_mem.restrict_open_write: controls opening for write access.
- proc_mem.restrict_write: controls whether writes are allowed on an open file descriptor.
- proc_mem.restrict_foll_force: disables the use of FOLL_FORCE.
Setting any of these options to "all" applies the restriction to all processes in the system. If, instead, an option is set to "ptracer", then the given access is allowed only to a process that has already attached to the target process with ptrace(), as debuggers tend to do. The default is to apply no restrictions, retaining the behavior of current kernels.
In the pull request, Brauner indicated a lack of enthusiasm for this
approach: "The level of fine-grained management isn't my favorite as it
requires distributions to have some level of knowledge around the
implications of FOLL_FORCE and /proc/<pid>/mem access in
general
". He was able to overcome his reluctance to push the work to
Torvalds, but the request found
a chillier welcome there: "I pulled this, and looked at it, and then
I decided I can't live with something this ugly
". He had complaints
about the implementation, but also about the complicated set of
command-line options and the logic needed to implement them; he rejected
the pull request and suggested adding a single option to control
FOLL_FORCE instead.
Ratiu responded with a patch adding a single kernel-configuration option regulating the use of FOLL_FORCE. Kees Cook responded that a command-line parameter was still needed for developers to be able to test the restricted mode. Torvalds dug up an old patch from previous iterations that enabled FOLL_FORCE only for the ptrace() case, suggesting that it could be adapted and used now. On July 23, Ratiu posted a new patch adding a boot-time parameter, called proc_mem.force_override, that can be set to "never" (to disable FOLL_FORCE entirely), "ptrace" (to enable just the ptrace() case), or "always" to preserve the existing behavior. There is also a build-time configuration option to control the default behavior in the absence of a command-line parameter.
Torvalds was
much happier with this version, though he still pointed out that it
could be "prettied up some more
" with a few changes. Ratiu agreed with the
suggestions, and posted a followup
version with minor changes.
The end of the 6.11 merge window is approaching; there are no guarantees
that this work will be ready and accepted in time to land in this release.
Given that it is a small change, though, and that it addresses a
longstanding security problem, there will be interest in merging it sooner
rather than later if possible. After that, it will be interesting to see
how distributors set the default; the ptrace mode avoids breaking
debuggers like GDB, but does not address all of the other use cases for
FOLL_FORCE. If the damage from restricting FOLL_FORCE
proves too painful, it may turn out that this loophole will remain open for
years to come.
| Index entries for this article | |
|---|---|
| Kernel | Releases/6.12 |
| Kernel | Security/Vulnerabilities |
| Security | Linux kernel |