Memory protection keys v5

By Jonathan Corbet
December 9, 2015

The first memory protection keys patch set showed up in May; it adds support for an upcoming feature in high-end Intel processors. This mechanism allows applications to assign an integer key value to each of their pages; each key has associated with it a protection mask that can deny access regardless of what the regular protection bits say. The feature is useful as a way to quickly change the restrictions applying to large ranges of memory without having to change each page's protection bits independently. Protection keys can support "read-mostly" memory, for example, or make memory containing sensitive information (cryptographic keys) entirely inaccessible most of the time.

The fifth version of the memory protection keys patch set has been posted. The API that is proposed for this feature has shifted considerably since May, so it merits another look.

Intel's memory protection keys feature works by making use of four page-table bits to assign one of sixteen key values to each page. A separate register then allows the assertion of "write-disable" and "access-disable" bits for each key value. Setting the write-disable bit for key seven, for example, will cause all pages marked with that key to be read-only, even if the protection bits on those pages would otherwise allow write access. The write- and access-disable bits are local to each thread, and they can be modified without privilege. Since keys are assigned to pages in the page-table entries, only the kernel can change those.
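As a rough illustration of that register, the sketch below models the per-thread rights word in plain C. It follows Intel's documented layout, in which each key owns two adjacent bits (the even bit is access-disable, the odd bit is write-disable); the function names are invented for this example and come from neither the patch set nor any library. It only models the key's contribution to an access decision; the regular page protections still apply on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_COUNT 16u  /* four page-table bits give sixteen keys */

/* Hypothetical helpers: bit positions within the per-thread rights
 * register. Each key owns two bits; the even one is access-disable
 * and the odd one is write-disable. */
static uint32_t pkru_access_disable_bit(unsigned key)
{
    assert(key < PKEY_COUNT);
    return UINT32_C(1) << (2 * key);
}

static uint32_t pkru_write_disable_bit(unsigned key)
{
    assert(key < PKEY_COUNT);
    return UINT32_C(1) << (2 * key + 1);
}

/* Does the key permit a data read of a page tagged with `key`? */
static bool pkey_allows_read(uint32_t pkru, unsigned key)
{
    return (pkru & pkru_access_disable_bit(key)) == 0;
}

/* Writes are blocked if either disable bit is set for the key. */
static bool pkey_allows_write(uint32_t pkru, unsigned key)
{
    uint32_t blockers = pkru_access_disable_bit(key) |
                        pkru_write_disable_bit(key);
    return (pkru & blockers) == 0;
}
```

Setting the write-disable bit for key seven in this model (pkru |= pkru_write_disable_bit(7)) makes pkey_allows_write() fail for that key while reads still succeed, and every other key is unaffected. No per-page state is touched, which is where the speed of the mechanism comes from.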

The original patch set allowed processes to assign keys to pages with any system call that changed page permissions — mprotect() and mmap(), for example. Four new "permissions" bits were defined, corresponding to the four bits of the key value. This API ran into some difficulties in the review process, though; it was criticized as being too closely tied to one specific implementation of memory protection keys. It might not extend well even to future changes in Intel's mechanism, and might not fit equivalent mechanisms on other processors at all. So a rethink of the API was called for.

In the middle of the discussion of this feature, Ingo Molnar came up with an interesting use case. The access-disable bit applies to data access, not execution access; as a result, it can be used, in conjunction with the regular "execute access" bit, to create regions of memory that can be executed by the processor, but which cannot be read by the executing process. That, he said, could be used to frustrate attacks against address-space layout randomization that read the executable text in order to try to locate a specific data structure or chunk of code. There could be security advantages to protecting library code, at least, in this manner.
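The reasoning behind execute-only memory can be sketched as a small decision function. This is a simplified model rather than a description of the hardware's full checks: the struct and function names are made up for illustration, and the only rule it encodes is the one described above, namely that the key's disable bits are consulted for data accesses but not for instruction fetches.

```c
#include <assert.h>
#include <stdbool.h>

enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

/* Hypothetical model of one page as seen by one thread. */
struct page_model {
    bool readable, writable, executable; /* regular protection bits */
    bool key_access_disabled;            /* access-disable set for the page's key */
    bool key_write_disabled;             /* write-disable set for the page's key */
};

static bool access_permitted(const struct page_model *p, enum access_kind kind)
{
    assert(p);
    switch (kind) {
    case ACCESS_EXEC:
        /* Instruction fetches are not checked against the key. */
        return p->executable;
    case ACCESS_READ:
        return p->readable && !p->key_access_disabled;
    case ACCESS_WRITE:
        return p->writable && !p->key_access_disabled &&
               !p->key_write_disabled;
    }
    return false;
}
```

A page of library text that is readable and executable according to its regular bits, but tagged with a key whose access-disable bit is set, passes the ACCESS_EXEC check and fails the ACCESS_READ check. Clearing the access-disable bit makes it readable again, which is why the protection is a barrier rather than a guarantee.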

This idea seemed popular among the security-oriented developers in the discussion. Like anything else, this protection would not be absolute, since the access-disable bit can be turned off. But it adds another barrier that must be overcome by an attacker; in many cases, it may be enough to thwart an attack.

Fully implementing this feature could be challenging for a number of reasons, not the least of which is that it's common to mix executable code and read-only data in an executable image. Most of the work to implement this feature would have to be done in user space, and is thus beyond the immediate reach of the kernel community, but, as Ingo said: "That does not mean we can not try!" As part of getting there, he suggested that, rather than just giving user space total control over the protection keys, the kernel should manage them and allocate them on request. That would, among other things, allow the kernel to reserve some keys for its own use in the future.

That suggestion was implemented in the fourth revision of the patch set in the form of two new system calls:

    int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
    int pkey_free(int pkey);

The flags value to pkey_alloc() is currently unused and must be zero. The initial access restrictions associated with the allocated key are provided in init_access_rights; either of the PKEY_DENY_WRITE and PKEY_DENY_ACCESS bits can be set. If a key is available for allocation, the kernel will allocate it and return the associated key number as the return value from pkey_alloc().

If an application is done with a particular key, that key can be returned to the system with pkey_free(). The code does not check whether any pages have that key value assigned to them; applications will want to be careful there or surprising things might happen.
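The allocation semantics described here can be modeled in user space with a simple bitmap. What follows is a sketch under several assumptions, not the patch set's code: the model_ names are invented, the numeric values of the two deny bits are placeholders (the article does not give their encoding), and the choices to hand out the lowest free key and to return -1 on any failure are guesses made for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_PKEYS 16

/* Placeholder encodings, assumed for this sketch only. */
#define MODEL_PKEY_DENY_ACCESS 0x1ul
#define MODEL_PKEY_DENY_WRITE  0x2ul

struct pkey_table {
    uint16_t allocated;                    /* bit n set: key n is handed out */
    unsigned long rights[MODEL_NR_PKEYS];  /* initial restrictions per key */
};

/* Returns the lowest free key, or -1 on bad arguments or exhaustion. */
static int model_pkey_alloc(struct pkey_table *t, unsigned long flags,
                            unsigned long init_access_rights)
{
    assert(t);
    if (flags != 0)  /* flags is unused and must be zero */
        return -1;
    if (init_access_rights &
        ~(MODEL_PKEY_DENY_ACCESS | MODEL_PKEY_DENY_WRITE))
        return -1;
    for (int key = 0; key < MODEL_NR_PKEYS; key++) {
        if (!(t->allocated & (1u << key))) {
            t->allocated |= (uint16_t)(1u << key);
            t->rights[key] = init_access_rights;
            return key;
        }
    }
    return -1;
}

/* Like the proposed pkey_free(), this does not look for pages still
 * tagged with the key; it only marks the key as reusable. */
static int model_pkey_free(struct pkey_table *t, int pkey)
{
    assert(t);
    if (pkey < 0 || pkey >= MODEL_NR_PKEYS ||
        !(t->allocated & (1u << pkey)))
        return -1;
    t->allocated &= (uint16_t)~(1u << pkey);
    return 0;
}
```

The model makes the hazard with pkey_free() easy to see: a freed key can be handed out again by the next allocation, and any pages still carrying that key value would silently pick up the new owner's restrictions.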

Assigning a key to a specific set of pages is done with the new mprotect_key() system call:

    int mprotect_key(void *start, size_t len, unsigned long prot, int pkey);

This system call will set both the page protections and the protection key for the pages starting at start and extending for len bytes. The given pkey must have been allocated to the process using pkey_alloc() or the call will fail. For what it's worth, this system call is called mprotect_pkey() and pkey_mprotect() in other parts of the patch set, so the final name may not yet be set in stone.
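To make the described behavior concrete, here is a toy model of the call operating on a small array of pages instead of a real address space. Everything in it is an assumption made for illustration: the model_ names, the eight-page region, the 4096-byte page size, the requirement that start be page-aligned (borrowed from mprotect()), and the -1 return on failure. The one rule taken from the text is that a key that has not been allocated is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  8u

struct model_mm {
    uint16_t allocated_keys;             /* bit n set: key n was allocated */
    unsigned long prot[MODEL_NR_PAGES];  /* per-page protection bits */
    int pkey[MODEL_NR_PAGES];            /* per-page protection key */
};

/* start and len are byte offsets into the modeled region.
 * Returns 0 on success, -1 on failure. */
static int model_mprotect_key(struct model_mm *mm, size_t start, size_t len,
                              unsigned long prot, int pkey)
{
    assert(mm);
    if (start % MODEL_PAGE_SIZE != 0)
        return -1;
    if (len > (size_t)MODEL_NR_PAGES * MODEL_PAGE_SIZE)
        return -1;
    /* The key must have been handed out by the allocator first. */
    if (pkey < 0 || pkey >= 16 || !(mm->allocated_keys & (1u << pkey)))
        return -1;

    size_t first = start / MODEL_PAGE_SIZE;
    size_t count = (len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
    if (first > MODEL_NR_PAGES || count > MODEL_NR_PAGES - first)
        return -1;

    /* Protections and key are set together for every page in range. */
    for (size_t i = first; i < first + count; i++) {
        mm->prot[i] = prot;
        mm->pkey[i] = pkey;
    }
    return 0;
}
```

With key 3 marked as allocated, a call covering two pages tags exactly those two pages and leaves their neighbors alone, while the same call with an unallocated key changes nothing and fails.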

Comments on the patch set this time around have been relatively subdued; it would seem that most developers have had their say and are happy with the direction that this work is taking. So it may find its way into the kernel in a near-future development cycle. What may take a bit longer, though, is actual availability of hardware that supports this feature, which is slated to first show up in Skylake server chips.


Memory protection keys v5

Posted Dec 10, 2015 19:08 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (8 responses)

Hmm. I wonder if this could be combined with the kernel keyring functionality. Any reason it couldn't be? I don't imagine that the userspace would have much to say about it, but having the keys locked to a specific kernel thread would be an improvement.

Memory protection keys v5

Posted Dec 10, 2015 22:06 UTC (Thu) by hansendc (subscriber, #7363) [Link] (7 responses)

The feature only affects access to pages with the _PAGE_USER bit set, so it can't be used to control access to kernel memory, unfortunately.

Memory protection keys v5

Posted Dec 10, 2015 22:54 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (6 responses)

This is different from the memory sealing stuff from a few weeks ago, right? That isn't also user-mode only, is it? *crosses fingers*

Memory protection keys v5

Posted Dec 10, 2015 22:57 UTC (Thu) by hansendc (subscriber, #7363) [Link] (5 responses)

This is completely separate from the memory sealing stuff (https://lwn.net/Articles/591108/).

Memory protection keys v5

Posted Dec 11, 2015 15:50 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (4 responses)

I guess the terminology is confusing here. That's not the feature I was referring to. It was a feature which would allow a process to set up memory that only it could read (not even the kernel could read it). There were worries about DRM. My search-fu is failing me :( .

Memory protection keys v5

Posted Dec 11, 2015 15:58 UTC (Fri) by PaXTeam (guest, #24616) [Link] (3 responses)

https://software.intel.com/en-us/isa-extensions/intel-sgx ?

Memory protection keys v5

Posted Dec 11, 2015 22:21 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

Yes. The LWN article which mentioned it: https://lwn.net/Articles/656750/

Memory protection keys v5

Posted May 17, 2016 16:16 UTC (Tue) by hailfinger (subscriber, #76962) [Link] (1 responses)

SGX is also one of the best anti-debugging mechanisms usable by malware so far.

Memory protection keys v5

Posted May 20, 2016 17:07 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

And encryption is used by ransomware. What's your point?

Memory protection keys v5

Posted Dec 11, 2015 3:08 UTC (Fri) by spender (guest, #23067) [Link]

From May of this year: https://twitter.com/grsecurity/status/600091830185762816 , predating this September discussion.

Won't make any difference for distros though unless they add support for runtime binary diversification. It doesn't matter if the code is non-readable when everyone in the world is running the exact same binaries. There are ways it can be (ab)used for the kernel as well, even though it was designed for userland...

-Brad

Memory protection keys v5

Posted Dec 11, 2015 15:54 UTC (Fri) by cov (guest, #84351) [Link] (1 responses)

On the surface, this sounds a lot like the Memory Attribute Indirection Register (MAIR) on ARM.

Memory protection keys v5

Posted Dec 11, 2015 15:59 UTC (Fri) by cov (guest, #84351) [Link]

That is, if the RWX bits were removed from block and page descriptors into the MAIR. Currently the MAIR doesn't have the RWX bits but rather attributes such as cacheability.

pkey_change_prot ?

Posted Dec 12, 2015 9:31 UTC (Sat) by Jandar (subscriber, #85683) [Link] (2 responses)

Where is the syscall to change the access rights? The special value of this PMK is the ability to *change* the protection quickly without touching any page tables.

pkey_change_prot ?

Posted Dec 12, 2015 14:12 UTC (Sat) by corbet (editor, #1) [Link] (1 responses)

There isn't one; as I understand it, changing the access flags is a simple, unprivileged operation, so no syscall is needed.

pkey_change_prot ?

Posted Dec 14, 2015 10:03 UTC (Mon) by pbonzini (subscriber, #60935) [Link]

Yes, you can use the new RDPKRU and WRPKRU instructions to read or set which protection keys are allowed.

Interestingly, the new instructions require register RCX to be zero, so it looks like they will evolve into a generic mechanism for "user-mode accessible special registers" (i.e. similar to MSRs, many of which aren't that much model-specific anymore, but for user space).