An unexpected perf feature
Local privilege escalations seem to be regularly found in the Linux kernel these days, but they usually aren't quite so old—more than two years since the release of 2.6.37—or backported into even earlier kernels. But CVE-2013-2094 is just that kind of bug, with a now-public exploit that apparently dates back to 2010. It (ab)uses the perf_event_open() system call, and the bug was backported to the 2.6.32 kernel used by Red Hat Enterprise Linux (and its clones: CentOS, Oracle, and Scientific Linux). While local privilege escalations are generally considered less worrisome on systems without untrusted users, it is easy to forget that UIDs used by network-exposed services should also qualify as untrusted—compromising a service, then using a local privilege escalation, leads directly to root.
The bug was found by Tommi Rantala when running the Trinity fuzz tester and was fixed in mid-April. At that time, it was not recognized as a security problem; the release of an exploit in mid-May certainly changed that. The exploit is dated 2010 and contains some possibly "not safe for work" strings. Its author expressed surprise that it wasn't seen as a security problem when it was fixed. That alone is an indication (if one was needed) that people in various colored hats are scrutinizing kernel commits—often in ways that the kernel developers are not.
The bug itself was introduced in 2010, and made its first appearance in the 2.6.37 kernel in January 2011. It treated the 64-bit perf event ID differently in an initialization routine (perf_swevent_init() where the ID was sanity checked) and in the cleanup routine (sw_perf_event_destroy()). In the former, it was treated as a signed 32-bit integer, while in the latter as an unsigned 64-bit integer. The difference may not seem hugely significant, but, as it turns out, it can be used to effect a full compromise of the system by privilege escalation to root.
The key piece of the puzzle is that the event ID is used as an array index in the kernel. It is a value that is controlled by user space, as it is passed in via the struct perf_event_attr argument to perf_event_open(). Because it is sanity checked as an int, the upper 32 bits of event_id can be anything the attacker wants, so long as the lower 32 bits are considered valid. Because event_id is used as a signed value, the test:
if (event_id >= PERF_COUNT_SW_MAX)
    return -ENOENT;
doesn't exclude negative IDs, so anything with bit 31 set (i.e. 0x80000000) will be considered valid.
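The mismatch is easy to reproduce in user space. What follows is a minimal sketch rather than the kernel code: SW_MAX is a stand-in for PERF_COUNT_SW_MAX (its value here is an assumption), and the helper names are hypothetical. One helper mimics the initialization path, which truncates the 64-bit ID to a signed int before the range check; the other compares the full unsigned value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PERF_COUNT_SW_MAX; the exact value is an assumption. */
#define SW_MAX 9

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int, as on Linux");

/* Mimics the buggy init path: the 64-bit ID is truncated to a signed int,
 * so the upper 32 bits are ignored and bit 31 becomes the sign bit. */
static int init_style_index(uint64_t config)
{
    uint32_t low = (uint32_t)config;   /* drop the upper 32 bits */
    int id;
    memcpy(&id, &low, sizeof id);      /* reinterpret as signed, portably */
    return id;
}

/* Same shape as the test quoted above: only an upper bound is enforced. */
static int buggy_check_passes(uint64_t config)
{
    return !(init_style_index(config) >= SW_MAX);
}

/* Comparing the full unsigned value rejects the same input. */
static int unsigned_check_passes(uint64_t config)
{
    return !(config >= SW_MAX);
}
```

With a config value of 0xffffffff, init_style_index() yields -1, so buggy_check_passes() accepts it while unsigned_check_passes() rejects it.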
The exploit code itself is rather terse, obfuscated, and hard to follow, but Brad Spengler has provided a detailed description of the exploit on Reddit. Essentially, it uses a negative value for the event ID to cause the kernel to change user-space memory. The exploit uses mmap() to map an area of user-space memory that will be targeted when the negative event ID is passed. It sets the mapped area to zeroes, then calls perf_event_open(), immediately followed by a close() on the returned file descriptor. That triggers:
static_key_slow_dec(&perf_swevent_enabled[event_id]);
in the sw_perf_event_destroy() function. Since the mapped area started out zeroed, any non-zero value found there afterward marks the location the kernel wrote to. The code looks for such a value, which can be used (along with the event ID value and the size of the array elements) to calculate the base address of the perf_swevent_enabled array.
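The calculation is plain pointer arithmetic run in reverse. The sketch below is illustrative only: the function names are hypothetical, and the base address, index, and element size used with it are made-up values, since the real element size depends on the kernel build.

```c
#include <assert.h>
#include <stdint.h>

/* Address of element `index` in an array starting at `base`. Unsigned
 * 64-bit arithmetic wraps, which is how an index applied to a kernel
 * address can end up pointing into user space. */
static uint64_t element_address(uint64_t base, uint64_t index,
                                uint64_t elem_size)
{
    assert(elem_size != 0);
    return base + index * elem_size;
}

/* Inverse: given the address that was modified, the index that was used,
 * and the element size, recover the array's base address. */
static uint64_t array_base_from_hit(uint64_t hit, uint64_t index,
                                    uint64_t elem_size)
{
    assert(elem_size != 0);
    return hit - index * elem_size;
}
```

For a hypothetical base of 0xffffffff81f00000, an index of 0xffffffff, and 16-byte elements, element_address() wraps around to 0x0000000f81effff0, a user-space address; feeding that address back into array_base_from_hit() returns the original base.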
But that value is just a stepping stone toward the real goal. The exploit gets the base address of the interrupt descriptor table (IDT) by using the sidt assembly language instruction. From that, it targets the overflow interrupt vector (0x4), using the increment in perf_swevent_init():
static_key_slow_inc(&perf_swevent_enabled[event_id]);
By setting event_id appropriately, it can turn the address of the overflow interrupt handler into a user-space address.
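To see how a single increment could move a handler address out of kernel space, consider the layout of a 64-bit x86 gate descriptor, which splits the handler address across three fields. The sketch below only manipulates an ordinary structure in user space; it assumes (the text above does not say so explicitly) that the increment lands on the field holding the upper 32 bits of the address, and the handler address used with it is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of a 16-byte x86-64 IDT gate descriptor. */
struct idt_gate {
    uint16_t offset_low;    /* handler address bits 0..15  */
    uint16_t selector;      /* code segment selector       */
    uint8_t  ist;           /* interrupt stack table index */
    uint8_t  type_attr;     /* gate type and attributes    */
    uint16_t offset_mid;    /* handler address bits 16..31 */
    uint32_t offset_high;   /* handler address bits 32..63 */
    uint32_t reserved;
};

/* Reassemble the 64-bit handler address from its three pieces. */
static uint64_t gate_handler(const struct idt_gate *g)
{
    assert(g != NULL);
    return ((uint64_t)g->offset_high << 32) |
           ((uint64_t)g->offset_mid << 16) |
           (uint64_t)g->offset_low;
}

/* Split a 64-bit handler address into the descriptor fields. */
static void gate_set_handler(struct idt_gate *g, uint64_t addr)
{
    assert(g != NULL);
    g->offset_low  = (uint16_t)(addr & 0xffff);
    g->offset_mid  = (uint16_t)((addr >> 16) & 0xffff);
    g->offset_high = (uint32_t)(addr >> 32);
}

/* Simulates a 32-bit increment applied to the upper-address field. */
static void increment_offset_high(struct idt_gate *g)
{
    assert(g != NULL);
    g->offset_high += 1u;   /* 0xffffffff wraps to 0 */
}
```

Starting from a made-up kernel handler address of 0xffffffff81234567, one call to increment_offset_high() leaves gate_handler() returning 0x0000000081234567, which lies in the user-space half of the address space.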
The exploit arranges to mmap() the range of memory where the clobbered interrupt handler will point and fills it with a NOP sled followed by shellcode that accomplishes its real task: finding the UID/GIDs and capabilities in the credentials of the current process so that it can modify them to be UID and GID 0 with full capabilities. At that point, in what almost feels like an afterthought, it spawns a shell—a root shell.
Depending on a number of architecture- or kernel-build-specific features (not least x86 assembly) makes the exploit itself rather fragile. It also contains bugs, according to Spengler. It doesn't work on 32-bit x86 systems because it uses a hard-coded system call number (298) passed to syscall(), which is different (336) for 32-bit x86 kernels. It also won't work on Ubuntu systems because the size of the perf_swevent_enabled array elements is different. The following will thwart the existing exploit:
echo 2 > /proc/sys/kernel/perf_event_paranoid
But a minor change to the flags passed to perf_event_open() will still allow the privilege escalation. None of these is a real defense of any sort against the vulnerability, though they do defend against this specific exploit. Spengler's analysis has more details, both of the existing exploit and of ways to change it to work around its fragility.
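An administrator who wants to confirm the current setting programmatically could read the sysctl file directly. The following is a small sketch; the function names are hypothetical, and the path passed in would normally be /proc/sys/kernel/perf_event_paranoid. As noted above, the setting only hinders this particular exploit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of the sysctl file. Returns 0 on success, -1 on error. */
static int parse_paranoid_level(const char *text, int *level)
{
    assert(text != NULL && level != NULL);
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;              /* out of range, or no digits at all */
    *level = (int)value;
    return 0;
}

/* Read the setting from `path`. Returns 0 on success, -1 if the file
 * cannot be opened or does not contain a number. */
static int read_paranoid_level(const char *path, int *level)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_paranoid_level(line, level) : -1;
}
```

A caller would pass the sysctl path and compare the resulting level against 2, treating a failure to read the file as "unknown" rather than "safe".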
The code uses syscall(), presumably because perf_event_open() is not (yet?) available in the GNU C library, but it could also be done to evade any argument checks done in the library. Any sanity checking done by the library must also be done in the kernel, because using syscall() can avoid the usual system call path. Kernels configured without support for perf events (i.e. CONFIG_PERF_EVENTS not set) are unaffected by the bug as they lack the system call entirely.
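For reference, a legitimate program typically reaches the system call through a thin wrapper around syscall(). The sketch below uses the SYS_perf_event_open constant from the system headers rather than a hard-coded number, so it picks up the right value on each architecture; the names sys_perf_event_open() and open_sw_cpu_clock() are hypothetical, and the event chosen is just a harmless software counter.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw system call. */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Open a software CPU-clock counter for the calling process on any CPU.
 * Returns a file descriptor, or -1 with errno set on failure. */
static long open_sw_cpu_clock(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CPU_CLOCK;   /* a valid, in-range event ID */
    attr.exclude_kernel = 1;                 /* count user space only */

    return sys_perf_event_open(&attr, 0, -1, -1, 0);
}
```

The call may legitimately fail, for example when perf events are not configured into the kernel or when the perf_event_paranoid setting forbids it, so callers should check for -1 and close the descriptor when done.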
There are several kernel hardening techniques that would help to avoid this kind of bug leading to system compromise. The grsecurity UDEREF mechanism would prevent the kernel from dereferencing the user-space addresses so that the perf_swevent_enabled base address could not be calculated. The PaX/grsecurity KERNEXEC technique would prevent the user-space shellcode from executing. While these techniques can inhibit this kind of bug from allowing privilege escalation, they impose costs (e.g. performance) that have made them unattractive to the mainline developers. Suitably configured kernels on hardware that supports it would be protected by supervisor mode access prevention (SMAP) and supervisor mode execution protection (SMEP); the former would prevent access to the user-space addresses much like UDEREF, while the latter would prevent execution of user-space code as does KERNEXEC.
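Whether a given processor advertises SMEP or SMAP can be checked from the "flags" line in /proc/cpuinfo, where the features appear as "smep" and "smap". The helper below is a sketch with a hypothetical name; it only does whole-word matching on a line the caller has already read, and it says nothing about whether the running kernel actually enables the feature.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `flag` appears as a whole whitespace-separated word in
 * `flags` (e.g. the "flags" line of /proc/cpuinfo), 0 otherwise. */
static int has_cpu_flag(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return 0;

    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start and end on a word boundary. */
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p++;    /* keep searching after a partial-word match */
    }
    return 0;
}
```

Matching whole words matters here: a naive substring search for "smep" would also accept any longer flag that happens to contain those letters.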
This is a fairly nasty hole in the kernel, in part because it has existed for so long (and apparently been known by some, at least, for most of that time). Local privilege escalations tend to be somewhat downplayed because they require an untrusted local user, but web applications (in particular) can often provide just such a user. Dave Jones's Trinity has clearly shown its worth over the last few years, though he was not terribly pleased with how long it took for fuzzing to find this bug.
Jones suspects there may be "more fruit on that branch somewhere", so more and better fuzzing of the perf system calls (and the kernel as a whole) is indicated. In addition, the exploit author at least suggests that he has more exploits waiting in the wings (not necessarily in the perf subsystem); it is quite likely that others do as well. Finding and fixing these security holes is an important task; auditing the commit stream to help ensure that these kinds of problems aren't introduced in the first place would be quite useful. One hopes that companies using Linux find a way to fund more work in this area.