The Arm64 memory tagging extension in Linux

By Jonathan Corbet
October 15, 2020

One of the first features merged for the 5.10 kernel development cycle was support for the Arm v8.5 memory tagging extension [PDF]. By adding a "key" value to pointers, this mechanism enables the automated detection of a wide range of memory-safety issues. The result should be safer and more secure code — once support for the feature shows up in actual hardware.

As one might expect, the Arm64 architecture uses 64-bit pointers to address memory. There is no need (yet!) for an address space that large, though, so normally only 48 of those bits are actually used by the hardware — or 52 bits if a special large-address-space option is enabled. So there are 12-16 bits that can be used for other purposes. Arm systems have long supported a "top byte ignore" feature that allows software to store arbitrary data in the uppermost byte of a virtual address, but the hardware designers have been busy coming up with other uses for those bits as well. The memory tagging extension (MTE) is one of those uses.

Specifically, MTE allows the storage of a four-bit "key" in bits 59-56 of a virtual address — the lower "nibble" of the top byte. It is also possible to associate a specific key value with one or more 16-byte ranges of memory. When a pointer is dereferenced, the key stored in the pointer itself is compared to that associated with the memory the pointer references; if the two do not match, a trap may be raised. Keys can be managed by the application, or they can be randomly generated by the CPU.
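
The bit layout described above can be sketched in portable C. This is only an illustration of where the key sits in a 64-bit address; the helper names (set_key(), get_key(), granule_base()) are invented for this example and are not part of any kernel or libc interface.

```c
#include <assert.h>
#include <stdint.h>

#define MTE_TAG_SHIFT 56
#define MTE_TAG_MASK  (UINT64_C(0xf) << MTE_TAG_SHIFT)   /* bits 59-56 */

/* Return addr with its key field (bits 59-56) replaced by key. */
static inline uint64_t set_key(uint64_t addr, unsigned key)
{
    assert(key < 16);   /* only four bits are available */
    return (addr & ~MTE_TAG_MASK) | ((uint64_t)key << MTE_TAG_SHIFT);
}

/* Extract the key stored in bits 59-56 of addr. */
static inline unsigned get_key(uint64_t addr)
{
    return (unsigned)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
}

/* Start of the 16-byte granule that addr falls in, with the top byte cleared. */
static inline uint64_t granule_base(uint64_t addr)
{
    return addr & ~(UINT64_C(0xff) << 56) & ~UINT64_C(0xf);
}
```

For instance, set_key(0x0000ffff12345678, 0xa) yields 0x0a00ffff12345678, and the upper nibble of the top byte is left untouched.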

Four bits only allow for 16 distinct key values, but that is enough to do some interesting things. If a function like malloc() ensures that allocations that are adjacent in memory have different key values, then an access that overruns any given allocation will be detected by the processor. Use-after-free bugs can be detected by changing the key value immediately when a range of memory is freed. If each stack frame is given its own key, buffer overruns on the stack will also generate traps. An attempt to dereference a completely wild pointer (or one injected by an attacker) also has a good chance of being detected.
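
To make the allocator idea concrete, here is a small software model of it. It is a sketch only: real MTE keeps keys in hardware and checks them on every access, while this model stores one key per granule in an ordinary array. All names (model_alloc(), model_free(), model_access_ok()) are hypothetical, and a real allocator would more likely pick random keys than cycle through them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE   16u   /* bytes covered by one key */
#define NGRANULES 64u   /* size of the modeled arena, in granules */

static uint8_t granule_key[NGRANULES]; /* modeled key storage; 0 is the default key */
static size_t  next_free;              /* bump-allocator cursor, in granules */
static uint8_t last_key;               /* key handed to the previous allocation */

/* A modeled tagged pointer: byte offset into the arena plus a 4-bit key. */
struct tagged_ptr {
    size_t  off;
    uint8_t key;
};

/* Allocate size bytes; the block's key is nonzero and differs from its left neighbor's. */
struct tagged_ptr model_alloc(size_t size)
{
    size_t n = (size + GRANULE - 1) / GRANULE;      /* round up to whole granules */
    assert(n > 0 && next_free + n <= NGRANULES);

    uint8_t key = (uint8_t)(last_key % 15 + 1);     /* cycles through 1..15 */
    for (size_t i = 0; i < n; i++)
        granule_key[next_free + i] = key;

    struct tagged_ptr p = { next_free * GRANULE, key };
    next_free += n;
    last_key = key;
    return p;
}

/* Free a block by resetting its granules to key 0, so stale pointers stop matching. */
void model_free(struct tagged_ptr p, size_t size)
{
    size_t first = p.off / GRANULE;
    size_t n = (size + GRANULE - 1) / GRANULE;
    for (size_t i = 0; i < n; i++)
        granule_key[first + i] = 0;
}

/* Would an access at p + delta pass the key check? */
bool model_access_ok(struct tagged_ptr p, size_t delta)
{
    size_t g = (p.off + delta) / GRANULE;
    return g < NGRANULES && granule_key[g] == p.key;
}
```

With a 32-byte allocation followed by a 16-byte one, an access at offset 32 from the first pointer lands in the neighbor's granule and fails the check, as does any access through the first pointer after model_free(). Note that an overrun that stays within the allocation's last granule is not caught, since keys apply to whole granules.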

MTE thus has two levels of applicability. If enabled during the normal software-development process, it should help to identify a range of bugs before they ever make it into a release. But it can also be enabled on production systems to add one more obstacle that an attacker must overcome to exploit a known vulnerability.

MTE is disabled by default on Linux systems, even on hardware that supports it. A user-space process can enable MTE for a specific region of memory by specifying the new PROT_MTE flag in the mmap() call creating that region. mprotect() can also be used to enable MTE on already-mapped memory. Only anonymous memory can have PROT_MTE set; attempts to use it with file-backed memory will fail.

The default key associated with all memory is zero; using any other value requires a couple of steps. The first of those is usually to create a tagged address for the memory of interest; that is simply a matter of storing the key value in bits 59-56 of the address. There is a new instruction (IRG) that will generate a random key and store it into an address. The other piece is to associate the key with the memory itself. To that end, another new instruction (STG) takes a pointer value and sets the key for the 16-byte "granule" containing the pointed-to memory to the key found in that pointer. Various other instructions exist for managing tags, setting the contents of memory along with the tag, etc. These are all unprivileged operations that do not require assistance from the kernel.
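
Since a tag store of this kind covers a single granule, an object larger than 16 bytes needs one store per granule it touches. The arithmetic can be sketched as follows; the function names are hypothetical, and the code only counts granules rather than issuing any MTE instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 16u

/* Granule-aligned address at or below addr; key bits in the top byte are kept. */
uint64_t granule_align_down(uint64_t addr)
{
    return addr & ~(uint64_t)(GRANULE_SIZE - 1);
}

/* Number of granules (and so single-granule tag stores) covering [addr, addr + len). */
size_t granules_spanned(uint64_t addr, size_t len)
{
    assert(len > 0);
    uint64_t first = granule_align_down(addr);
    uint64_t last  = granule_align_down(addr + len - 1);
    return (size_t)((last - first) / GRANULE_SIZE) + 1;
}
```

A 4096-byte page starting on a granule boundary spans 256 granules, while a two-byte object that straddles a boundary spans two.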

If a process attempts to access memory with the wrong key, the processor will, by default, do nothing. This can be changed by using the PR_SET_TAGGED_ADDR_CTRL command to the prctl() system call. Providing a value of PR_MTE_TCF_NONE disables tag checking (the default). There are two values (PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC) that will cause a SIGSEGV signal to be delivered on a key mismatch. In synchronous mode, the signal is delivered immediately to the offending thread and the operation is not performed; if the signal is not handled, the process is terminated. In asynchronous mode, the signal is queued for later delivery to the process, and the mismatched operation proceeds.
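
The prctl() call might look like the sketch below, which is modeled on the sample in the kernel documentation rather than taken from this article. PR_TAGGED_ADDR_ENABLE and the tag-inclusion mask shifted by PR_MTE_TAG_SHIFT are not discussed above; they come from the kernel's prctl header, and the fallback definitions are assumptions for systems whose headers predate 5.10. The function names are hypothetical.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values as in the kernel's linux/prctl.h. */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif
#ifndef PR_MTE_TCF_SYNC
#define PR_MTE_TCF_SYNC  (1UL << 1)
#endif
#ifndef PR_MTE_TCF_ASYNC
#define PR_MTE_TCF_ASYNC (2UL << 1)
#endif
#ifndef PR_MTE_TAG_SHIFT
#define PR_MTE_TAG_SHIFT 3
#endif

/* Build the prctl() argument: checking mode plus the set of keys IRG may generate. */
unsigned long mte_ctrl_value(int synchronous, unsigned include_mask)
{
    assert(include_mask <= 0xffff);   /* one bit per possible key value */
    return PR_TAGGED_ADDR_ENABLE
         | (synchronous ? PR_MTE_TCF_SYNC : PR_MTE_TCF_ASYNC)
         | ((unsigned long)include_mask << PR_MTE_TAG_SHIFT);
}

/* Ask the kernel to start checking keys; 0 on success, -1 (with errno set) otherwise. */
int mte_enable_checks(int synchronous)
{
    /* 0xfffe allows every key except 0, the default key of untagged memory. */
    return prctl(PR_SET_TAGGED_ADDR_CTRL, mte_ctrl_value(synchronous, 0xfffe), 0, 0, 0);
}
```

On hardware or kernels without MTE support the call is expected to fail, so the return value should always be checked.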

There are some other features associated with MTE that are supported by the kernel, including a set of ptrace() commands for manipulating tags for another process. Some more information (and a sample program) can be found in Documentation/arm64/memory-tagging-extension.rst in the kernel source. Note that, in 5.10, use of MTE is only supported for user space; support for MTE within the kernel itself will come in a future development cycle.

Some readers may note a resemblance to the Arm pointer authentication feature, which stores a short cryptographic signature into the upper bits of pointer values. Pointer authentication can prevent the creation of new pointers by an attacker; it depends entirely on the knowledge of a secret key value and does not associate any sort of key with the memory itself. This feature and MTE can be used together, though MTE will rob some bits and make the pointer-authentication signature shorter. There is value in both; MTE can prevent overruns on the stack, while authentication can prevent the corruption of the stack pointer itself.

While the MTE feature seems useful, the number of applications that will gain direct support for it is likely to be small. Happily, much of the benefit can be had without the need to change applications at all. If the C library (and its memory allocator in particular) supports MTE, then all applications should gain the extra memory-safety checks automatically. MTE patches for the GNU C library have been posted for consideration, so that support should be available eventually. The LLVM compiler has support for stack tagging now; GCC should gain that support eventually.

None of this is helpful to anybody now, though, since hardware with MTE support is not actually shipping yet. The good news is that, once that hardware is available, the software side should be ready for it immediately. That, with any luck at all, should lead to more secure systems and software with fewer bugs, even on hardware without the memory-tagging feature.

The Arm64 memory tagging extension in Linux

Posted Oct 16, 2020 1:08 UTC (Fri) by mm7323 (subscriber, #87386) [Link] (8 responses)

Does anyone know where the 'Tag Granule' keys are stored?

Presumably somewhere there needs to be 4-bits stored for every 16-bytes of tagged memory, so 3.125% memory overhead. I'm guessing something like a page table and TLB type arrangement is used by the processor to lookup the key for each memory area, but I can't find a description of it.

The Arm64 memory tagging extension in Linux

Posted Oct 16, 2020 2:33 UTC (Fri) by songmaster (subscriber, #1748) [Link] (1 responses)

Also the STG instruction only sets the tag for one 16-byte granule. If an address is a pointer to a structure that may be tens to thousands of bytes in size, presumably the code would have to loop through the whole structure setting the same tag for every granule that belongs to it. Not a problem for code that only uses malloc(), which should take care of that, but some programs use their own allocators. I guess they shouldn’t break as long as they’re using malloc() to begin with, but unless they are made aware of this the advantage would be reduced.

The Arm64 memory tagging extension in Linux

Posted Oct 16, 2020 2:57 UTC (Fri) by mm7323 (subscriber, #87386) [Link]

Yes, it sounds like there is some overhead initialising memory too, though there is a STZGM instruction to set tags while zeroing blocks of memory.

Elsewhere I read that if the compiler is modified to use MTE on each stack frame, things like char path[PATH_MAX] on the stack can have excessive overhead as it requires tagging in the function prologue, but paths will typically be shorter than PATH_MAX.

PATH_MAX is kinda broken anyway, but it's an example where large stack buffers may have increased cost.

Security is rarely free, but still usually worthwhile.

Tags are stored in separate physical memory

Posted Oct 16, 2020 3:13 UTC (Fri) by CChittleborough (subscriber, #60775) [Link] (5 responses)

I had the same question as mm7323. I happen to have the Armv8-A Architecture Reference Manual open, so I did a text search. I was surprised by what I found: you need separate physical memory!

“Tag load and store instructions to access Allocation tags in a tag physical address space, separate to the data physical address space accessed by data load and store instructions to access data in normal memory and devices.”
— §D6.1 on p2660 (of 7900!)

So chips which support MTE need to store a 4-bit tag for every 16-byte ‘granule’ of data. Moreover, tags apply to Logical Addresses, i.e., virtual addresses, so you need enough tag storage to cover all the virtual addresses you will ever allocate.

If you require physical memory to be contiguous, you could just reserve some of it at boot time by setting a single implementation-specific register. But if you want to allow non-contiguous physical memory ranges, you might want multiple registers. Supporting hot-plugging of memory would be quite hairy. Maybe MTE and hot-pluggable memory would be mutually exclusive?

Are there any chips which support MTE on the market yet? Does anyone know how they handle tag storage?

Tags are stored in separate physical memory

Posted Oct 16, 2020 3:35 UTC (Fri) by mm7323 (subscriber, #87386) [Link] (1 responses)

Wowsers. I wonder what happens when you try and set more tags than there is memory to store the granule keys...

I also don't see an instruction to 'unset' a key either.

Tags are stored in separate physical memory

Posted Oct 16, 2020 5:23 UTC (Fri) by mm7323 (subscriber, #87386) [Link]

In answer to my own question, and having looked at the reference manual, it looks like addresses are translated from virtual to physical first, and the keys are then looked up by physical address.

Therefore there just needs to be $mem/32 bytes of extra RAM set aside for storing the Granule tags for each physical address, and it can never run out.

Other things of curiosity - the synchronous tag error mode has a significant runtime overhead, as does a tag value of all 1's, according to the manual.
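
The mem/32 figure follows directly from four bits of key per 16-byte granule. A short sketch of the arithmetic, with a hypothetical function name and assuming the memory size is a whole number of granules:

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE_BYTES 16u
#define TAG_BITS      4u

/* Bytes of tag storage needed to cover mem_bytes of taggable memory. */
uint64_t tag_storage_bytes(uint64_t mem_bytes)
{
    assert(mem_bytes % GRANULE_BYTES == 0);   /* whole granules only */
    uint64_t granules = mem_bytes / GRANULE_BYTES;
    return (granules * TAG_BITS + 7) / 8;     /* round up to whole bytes */
}
```

For 8GiB of RAM this gives 256MiB of tag storage, and 128 bytes for a 4096-byte page, which is the 3.125% overhead mentioned earlier in the thread.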

Tags are stored in separate physical memory

Posted Oct 16, 2020 13:15 UTC (Fri) by anton (subscriber, #25547) [Link]

I expect it to be in physical memory: Why would you tag memory that is not physically backed? Well, actually there is quite a bit of virtual memory that is never used and is only ever backed by the same zero-filled page; do you need such unused memory to have tags already before use?

For physical RAM, I expect that it uses some bits that you get with ECC memory (you only need 8 SECDED ECC bits for 128 bits (16 bytes) of payload, leaving 8 bits for other purposes).

For virtual memory backed by mass storage, some systems have mass storage with larger sectors to accommodate meta-data, but this mass storage is rare and therefore expensive. It may be cheaper to provide enough ECC RAM that you don't need swap space.

Tags are stored in separate physical memory

Posted Oct 16, 2020 16:30 UTC (Fri) by mwsealey (subscriber, #71282) [Link] (1 responses)

It does need a bus protocol that can transport it, so new interconnects and a memory controller that can receive and respond to it as the endpoint.

A "separate PA space" isn't really "separate physical memory" in the sense that you don't need a *dedicated* memory controller or a particular SRAM block, for example, for MTE alone.

It's no different to, for example, the logical separation of Secure and Non-Secure memory. In theory, Secure 0x8000 and Non-Secure 0x8000 are in two separate PA spaces - the two numerical addresses aren't the same address. In practical reality, it's an "n+1th bit" of addressing, and the underlying memory technology (the cells or gates storing the information) is the same for each address.

Most modern DRAM controllers have a TrustZone address space controller built in (or something broadly similar) which can effectively allow or deny access to particular regions of RAM based on the security state. It's just an address range, and the differentiator between a secure and a non-secure read or write in the system is a single bit. So you can have 4GB of RAM and 2GB of it be Secure and 2GB of it be Non-Secure, but they're on the same 32Gbit DRAM die, or on separate 16Gbit ones, or striped across them, whatever you like.

Where that memory 'lives' is up to the system; maybe the top MBs will be partitioned off by the memory controller and interconnect for the tag space, and the tag management instructions will essentially be putting data in there. Your OS will be none the wiser, except that it may be told that it only has 7.xGB of 'Normal' physical memory available (which we already see, since we can have software carve-outs for secure firmware or other purposes). It's not really an architectural question in the CPU sense, more of a thing for memory controller vendors to describe how they want to make it happen.

Tags are stored in separate physical memory

Posted Sep 15, 2022 4:47 UTC (Thu) by nikhildevshatwar (guest, #159628) [Link]

I am still wondering how it is sufficient to maintain tags for only that portion of virtual memory that is resident (i.e. not swapped out) and has valid mapping to physical addresses.

In reality, there will be swapping, and the total virtual memory will be much larger than the physical memory.
Is the kernel going to modify the swap in/out handler to also back up the tag memory corresponding to the data memory when swapping in/out?

If it is not done this way, a process which behaved nicely will start seeing tag mismatch exceptions if it loses its tag memory when its pages are swapped.