The intersection of shadow stacks and CRIU
Shadow stacks are one of the methods employed to enforce control-flow integrity and thwart attackers; they are a mechanism for fine-grained, backward-edge protection. Most of the time, applications are not even aware that shadow stacks are in use. As is so often the case, though, life gets more complicated when the Checkpoint/Restore in Userspace (CRIU) mechanism is in use. Not breaking CRIU turns out to be one of the big challenges facing developers working to get user-space shadow-stack support into the kernel.
The idea behind shadow stacks is simple: in addition to the normal program stack (which holds return addresses, local variables, and more) there is a special memory area, called the "shadow stack", that stores only return addresses. Whenever a CALL instruction is executed, the return address is pushed onto both the normal and the shadow stacks. When, later, a function ends with a RET instruction, the return address that's popped from the normal stack is compared to that on the shadow stack. If they match, the execution continues; if they don't, a violation of control-flow integrity has just been detected.
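The CALL/RET bookkeeping described above can be modeled in a few lines of C. This is a toy software model, not a way of programming the hardware. The struct, the fixed depth, and the function names are hypothetical, and where a real CPU would raise a control-protection exception, the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SHSTK_DEPTH 64   /* arbitrary depth chosen for the model */

/* Toy shadow stack: it holds return addresses and nothing else. */
struct model_shstk {
    uintptr_t ret[MODEL_SHSTK_DEPTH];
    size_t top;                /* number of valid entries */
};

/* CALL: the return address goes onto the normal stack (not modeled here)
 * and a copy of it goes onto the shadow stack. */
static bool model_call(struct model_shstk *ss, uintptr_t ret_addr)
{
    assert(ss != NULL);
    if (ss->top == MODEL_SHSTK_DEPTH)
        return false;
    ss->ret[ss->top++] = ret_addr;
    return true;
}

/* RET: compare the address popped from the normal stack with the shadow
 * copy. A false result stands for a control-flow-integrity violation. */
static bool model_ret(struct model_shstk *ss, uintptr_t popped_ret_addr)
{
    assert(ss != NULL);
    if (ss->top == 0)
        return false;
    return ss->ret[--ss->top] == popped_ret_addr;
}
```

A return address overwritten on the normal stack, for example by a stack buffer overflow, makes model_ret() return false even though the normal stack itself looks consistent.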
Recent x86 processor models implement shadow stacks in hardware, meaning that no instrumentation is required for a program to get the protection that shadow stacks provide and that the cost of using a shadow stack is negligible. Once the feature is enabled, the CPU takes care of pushing and popping the return address on the shadow stack and comparing the return addresses. If the return addresses do not match, the CPU generates a control protection exception. To support shadow stacks, the x86 architecture has been extended with a model-specific register (MSR) that controls the use of the shadow stack and its features. There are also shadow-stack pointer MSRs (one for each possible privilege level) and a set of instructions for manipulating shadow-stack contents.
The discussion about how the kernel should support shadow stacks for user space started a long time ago, but it has still not concluded. One of the difficulties in enabling this feature is the possibility that some applications will be broken by shadow stacks because they use non-standard ways to change their control flow. The list of problematic applications includes GDB, various JIT engines, and, of course, CRIU.
CRIU and shadow stacks
CRIU is known for its intimate relations with the kernel and its use of obscure kernel interfaces. Among other things, CRIU has to intervene in the control flow of the tasks to be checkpointed in order to extract the information that cannot be obtained by other means (such as from the /proc file system). When restoring a saved process, CRIU has to be able to recreate its state as it was at checkpoint time, so if the process had a shadow stack enabled, that shadow stack has to be restored exactly as it was before the checkpoint.
To checkpoint (or "dump" in CRIU jargon) a process, CRIU injects a blob with parasite code into the target to get parts of the process state that are not visible from the outside or which can only be obtained slowly and painfully. To inject the parasite, CRIU stops the task with ptrace(), finds a free area in the task's memory layout, puts the parasite code there, and makes the task jump into that code. So far, there are no conflicts with shadow-stack enforcement because, after the parasite starts running, the CALL and RET instructions within the parasite are properly paired.
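The text does not say how CRIU picks the free area, so the helper below is purely illustrative. Given the target's mappings (for instance as parsed from /proc/PID/maps), sorted by start address, it returns the lowest gap in a range that can hold a parasite of a given size. struct vma and find_free_area() are hypothetical names, not CRIU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping of the target task, as a half-open range [start, end). */
struct vma {
    uintptr_t start, end;
};

/* Hypothetical helper: return the lowest address in [lo, hi) where `len`
 * bytes fit between the (sorted, non-overlapping) mappings, or 0 if none. */
static uintptr_t find_free_area(const struct vma *maps, size_t n,
                                uintptr_t lo, uintptr_t hi, size_t len)
{
    uintptr_t cand = lo;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || maps[i - 1].end <= maps[i].start);
        if (maps[i].end <= cand)
            continue;                   /* mapping lies below the candidate */
        if (maps[i].start >= cand && maps[i].start - cand >= len)
            break;                      /* the gap before it is big enough */
        cand = maps[i].end;             /* overlap or small gap: skip past it */
    }
    return (cand <= hi && hi - cand >= len) ? cand : 0;
}
```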
There is, however, a problem when the parasite's job is done and the normal process execution should be resumed. CRIU uses the sigreturn() system call, which is normally only invoked at the end of a signal handler, to "cure" the task of the parasite and resume its normal execution. This operation could be done with ptrace(), but sigreturn() reduces the synchronization complexity between CRIU and the parasite and, more importantly, allows the task to continue even if CRIU itself fails.
The implementation of sigreturn() in the kernel takes special measures to ensure that its usage does not violate shadow-stack integrity. Whenever the kernel delivers a signal to a process, it sets up the return frame that will be used when the signal handler returns; it also pushes some data to the shadow stack and then verifies the integrity of that data when sigreturn() is called. Since CRIU calls sigreturn() directly, without any signal being delivered to the process that is being dumped, it has to tweak the shadow-stack contents to match the state expected by the kernel. The shadow-stack pointer is modified using a couple of ptrace() calls that are part of the latest API proposed for shadow-stack enablement; the shadow-stack contents can already be adjusted using existing ptrace() calls. This shadow-stack modification is performed early during parasite injection, so that normal task execution can still be resumed if anything goes wrong.
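The following sketch models why CRIU must prepare the shadow stack before triggering sigreturn(). It assumes, loosely following the proposed kernel code, that the data pushed at signal delivery is a single token holding the pre-signal shadow-stack pointer with bit 63 set; the real frame layout and checks may differ. All names are hypothetical, and in reality CRIU writes the token from outside with ptrace() rather than by calling a function inside the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOKEN_BIT (UINT64_C(1) << 63)  /* assumed marker bit for signal tokens */
#define SIM_WORDS 32

/* Simulated shadow stack with modeled addresses; it grows downward. */
struct sim_shstk {
    uint64_t mem[SIM_WORDS];
    uint64_t base;   /* modeled address of mem[0] */
    uint64_t ssp;    /* modeled shadow-stack pointer */
};

static uint64_t *sim_slot(struct sim_shstk *s, uint64_t addr)
{
    assert(addr >= s->base && addr < s->base + 8 * SIM_WORDS);
    return &s->mem[(addr - s->base) / 8];
}

static void sim_push(struct sim_shstk *s, uint64_t val)
{
    s->ssp -= 8;
    *sim_slot(s, s->ssp) = val;
}

/* Done by the kernel at signal delivery; CRIU must do the equivalent by
 * hand before it triggers sigreturn() in a task that never got a signal. */
static void push_sigframe_token(struct sim_shstk *s)
{
    sim_push(s, s->ssp | TOKEN_BIT);
}

/* The sigreturn() side: accept only a well-formed token that records the
 * SSP just above itself, then restore that SSP. */
static bool sigreturn_check(struct sim_shstk *s)
{
    uint64_t token = *sim_slot(s, s->ssp);
    if (!(token & TOKEN_BIT) || (token & ~TOKEN_BIT) != s->ssp + 8)
        return false;
    s->ssp = token & ~TOKEN_BIT;
    return true;
}
```

In this model, calling sigreturn_check() with only an ordinary return address on top of the shadow stack fails; that is the situation CRIU would be in if it skipped the adjustment.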
Once parasite injection and removal are handled, dumping a process with a shadow stack enabled is simple. The only difference from a "normal" dump is the need to save the shadow-stack enable/disable state and the shadow-stack pointer, both of which can easily be retrieved with the ptrace() calls mentioned above. The shadow-stack memory area is saved just like any other anonymous memory and does not require any special care.
CRIU restore
Restoring a process with a shadow stack is slightly more involved than dumping it. When CRIU restores a process tree, it creates all of the tasks and threads found in the checkpoint and then modifies them so that their state exactly matches the state that was saved at dump time. After the state of each thread is restored, CRIU sets up a sigreturn() frame for each thread, cleans up leftovers of the original CRIU process, and calls sigreturn() to restart the execution of the restored tasks. In order to restore the shadow stack, CRIU needs to be able to map the shadow-stack memory at exactly the same address as it had before the dump. CRIU also needs a way to efficiently populate the shadow stack with the saved data, and the ability to set the shadow-stack control bits and pointer. Additionally, the kernel API lets the C library and program loader lock various shadow-stack features, so CRIU must be able to ensure that these feature locks are preserved across a restore.
Since shadow-stack memory is somewhat special, the virtual memory area for it should be created with the proposed map_shadow_stack() system call (covered in an earlier article) rather than with mmap(). Shadow-stack memory cannot be modified by ordinary memory writes, and it cannot be remapped. Based on feedback from the CRIU developers, the latest version of the kernel patches that enable shadow stacks for user space allows passing a desired address to map_shadow_stack(). This allows CRIU to map the shadow stack of each restored process exactly where it was before the dump.
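A restore tool might request a shadow stack at its dump-time address roughly as follows. This sketch assumes the interface as it was later merged into the mainline kernel: map_shadow_stack(addr, size, flags), system call number 453 on x86-64, no C-library wrapper (hence syscall()), and a SHADOW_STACK_SET_TOKEN flag that asks the kernel to place a restore token at the top of the new stack. Those values should be checked against the installed kernel headers, and map_shadow_stack_at() is a hypothetical helper. On machines without user shadow-stack support the call simply fails with errno set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: the interface as later merged (x86-64 system call 453). */
#if defined(__x86_64__) && !defined(__NR_map_shadow_stack)
#define __NR_map_shadow_stack 453
#endif
#ifndef SHADOW_STACK_SET_TOKEN
#define SHADOW_STACK_SET_TOKEN 0x1  /* ask the kernel to write a restore token */
#endif

/* Hypothetical helper: try to create a shadow stack exactly at `addr`.
 * Returns MAP_FAILED with errno set if the kernel or CPU lacks support. */
static void *map_shadow_stack_at(void *addr, size_t size)
{
#ifdef __NR_map_shadow_stack
    return (void *)syscall(__NR_map_shadow_stack, (unsigned long)addr,
                           (unsigned long)size,
                           (unsigned long)SHADOW_STACK_SET_TOKEN);
#else
    (void)addr;
    (void)size;
    errno = ENOSYS;
    return MAP_FAILED;
#endif
}
```

The restore token requested with SHADOW_STACK_SET_TOKEN is what later allows user space to switch onto the new stack.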
As a result of the way CRIU recreates the process's memory layout and restores its memory contents, mapping shadow-stack memory requires some additional care beyond having it at the correct address. To avoid conflicts between the memory layouts of CRIU and the restored process, CRIU reserves enough virtual memory to hold all of the restored process's memory areas, partially populates that memory, and then uses mremap() to map chunks of the reserved area to the appropriate addresses; it then finishes restoring the memory contents. The remapping happens late in the restore process and, since the shadow-stack memory cannot be remapped, it has to be created after the memory layout is nearly finalized; otherwise map_shadow_stack() could clobber an existing mapping.
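The reserve-populate-move pattern can be tried out with ordinary mmap() and mremap(). The sketch below, with hypothetical helper names, fills a scratch mapping and then moves it with MREMAP_FIXED onto a placeholder mapping that stands in for the dump-time address. It does not create a shadow stack; a comment marks the point at which that would have to happen, because shadow-stack memory could not be moved this way afterward.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Move a populated chunk to the address it must occupy. MREMAP_FIXED
 * replaces whatever is mapped at `target`, much like MAP_FIXED does. */
static void *move_into_place(void *chunk, size_t len, void *target)
{
    return mremap(chunk, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target);
}

/* Returns 0 if data written in the scratch area shows up at the target. */
static int demo_reserve_and_move(void)
{
    size_t len = 2 * (size_t)sysconf(_SC_PAGESIZE);

    /* 1. Reserve and populate a scratch area (CRIU: room for all VMAs). */
    char *scratch = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (scratch == MAP_FAILED)
        return -1;
    memset(scratch, 0xab, len);

    /* 2. Placeholder for the dump-time address, so that MREMAP_FIXED
     *    cannot clobber any unrelated mapping. */
    char *target = mmap(NULL, len, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED)
        return -1;

    /* 3. Move the contents into place; the scratch range becomes unmapped. */
    char *moved = move_into_place(scratch, len, target);
    if (moved != target)
        return -1;

    /* 4. Only now, with the layout nearly final, would map_shadow_stack()
     *    be called, since a shadow stack cannot be moved later. */
    int ok = (unsigned char)moved[len - 1] == 0xab;
    munmap(moved, len);
    return ok ? 0 : -1;
}
```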
Once the shadow stack has been put into the correct place, CRIU switches the shadow-stack pointer to it using the x86 RSTORSSP and SAVEPREVSSP instructions. At this point, the shadow stack can be populated with the WRSS instruction. After restoring the saved shadow-stack data, CRIU uses WRSS again to set up a frame for sigreturn() that will later resume normal execution of the restored tasks.
Restoring the shadow-stack contents could also be done with ptrace(), but user-space stacks can grow quite deep and there may be a lot of threads, so restoring shadow-stack contents that way would require complex synchronization between the CRIU control process and the tasks being restored. Additionally, filling memory with ptrace() is terribly slow. Although WRSS is not as efficient as memcpy(), it is still much faster than ptrace(). Before WRSS can be used, though, it must be enabled in the shadow-stack control MSR, where it is disabled by default. CRIU can enable WRSS before restoring the shadow-stack memory with an arch_prctl() call that manipulates bits in the shadow-stack control MSR, and switch it back off before letting the restored tasks run.
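The enable, write, disable sequence can be modeled as follows. The two bit positions follow the user-mode CET control MSR (IA32_U_CET), where bit 0 enables the shadow stack and bit 1 permits WRSS; everything else, including model_wrss() and restore_shstk_contents(), is hypothetical. Real code would execute WRSS after switching stacks with RSTORSSP and SAVEPREVSSP, and would toggle the bit through arch_prctl() rather than by writing the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the user-mode CET control MSR (IA32_U_CET). */
#define CET_SHSTK_EN (UINT64_C(1) << 0)   /* shadow stack enabled */
#define CET_WRSS_EN  (UINT64_C(1) << 1)   /* WRSS permitted in user mode */

#define MODEL_WORDS 16

struct model_cpu {
    uint64_t u_cet;                /* modeled control MSR */
    uint64_t shstk[MODEL_WORDS];   /* modeled shadow-stack memory */
};

/* Model of one WRSS store: refused (an exception, on real hardware)
 * unless WRSS has been enabled. */
static bool model_wrss(struct model_cpu *cpu, size_t idx, uint64_t val)
{
    if (!(cpu->u_cet & CET_WRSS_EN) || idx >= MODEL_WORDS)
        return false;
    cpu->shstk[idx] = val;
    return true;
}

/* Hypothetical restore step: turn WRSS on, copy the saved words, turn it off. */
static bool restore_shstk_contents(struct model_cpu *cpu,
                                   const uint64_t *saved, size_t n)
{
    assert(n <= MODEL_WORDS);
    cpu->u_cet |= CET_WRSS_EN;      /* done with arch_prctl() in the real flow */
    bool ok = true;
    for (size_t i = 0; i < n && ok; i++)
        ok = model_wrss(cpu, i, saved[i]);
    cpu->u_cet &= ~CET_WRSS_EN;     /* restored tasks must not run with WRSS on */
    return ok;
}
```

Clearing the bit unconditionally at the end, even after a failed write, keeps the restored task from inheriting a writable shadow stack.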
The last task that CRIU has to take care of is the locking of the shadow-stack features. The GNU C Library (glibc) will enable shadow stacks for a process if it finds the corresponding feature bits in the ELF properties (the .note.gnu.property section) of the running program, and disables the feature if these bits are absent. Once the shadow stack is enabled or disabled, glibc locks its state with an arch_prctl() call. The same call allows locking the state of WRSS enablement but, at the moment, glibc does not use it. The feature locks are inherited across a clone() call. So, if CRIU runs with shadow stacks enabled, it cannot restore a process that has shadow stacks disabled; similarly, if CRIU starts without shadow stacks, it has no way to enable them after clone()ing the restored tasks. To resolve this problem, the proposed kernel API introduces another arch_prctl() call that unlocks the shadow-stack features. This call is only available via ptrace(), so an attacker cannot use it to disable shadow stacks from within a process. With this arch_prctl() call, CRIU can control the shadow-stack feature locks for the clone()ed tasks and then reset them to the final, secure state after the shadow stack is restored.
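The locking rules can be summarized with a small state model. The feature names are modeled on the ARCH_SHSTK_SHSTK and ARCH_SHSTK_WRSS flags of the merged kernel interface, but the struct, the functions, and the choice of -EPERM as the error are assumptions of this sketch. The via_ptrace flag stands in for the rule that unlocking is only possible from a tracer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits, modeled on ARCH_SHSTK_SHSTK and ARCH_SHSTK_WRSS. */
#define FEAT_SHSTK (UINT64_C(1) << 0)
#define FEAT_WRSS  (UINT64_C(1) << 1)

struct task_shstk_state {
    uint64_t enabled;   /* features currently on */
    uint64_t locked;    /* features whose state may not change */
};

/* Enable or disable features; refused if any requested feature is locked. */
static int set_features(struct task_shstk_state *t, uint64_t feat, bool on)
{
    if (t->locked & feat)
        return -EPERM;
    if (on)
        t->enabled |= feat;
    else
        t->enabled &= ~feat;
    return 0;
}

static void lock_features(struct task_shstk_state *t, uint64_t feat)
{
    t->locked |= feat;
}

/* Unlocking is only honored when requested by a tracer. */
static int unlock_features(struct task_shstk_state *t, uint64_t feat,
                           bool via_ptrace)
{
    if (!via_ptrace)
        return -EPERM;
    t->locked &= ~feat;
    return 0;
}

/* clone(): the child inherits both the feature state and the locks. */
static struct task_shstk_state model_clone(const struct task_shstk_state *p)
{
    return *p;
}
```

For example, a child cloned from a CRIU instance with locked shadow stacks cannot turn them off by itself; only after a tracer unlocks the feature can it be disabled and then locked again in its final state.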
Conclusions
Shadow stacks on the x86 architecture provide efficient protection against return-oriented programming (ROP) and similar attacks, but their use necessitates updates to certain applications. Hopefully, CRIU's experience with shadow stacks will be useful to other projects that need to address shadow-stack compatibility issues.
Enabling shadow-stack support in CRIU revealed several gaps in the earlier versions of the proposed kernel APIs, and the initial implementation of shadow-stack support in CRIU relied on API extensions that were not included in the original kernel patches. The latest version of those patches has incorporated feedback from the CRIU developers and has all the necessary knobs to support checkpoint and restore of applications with shadow stacks.
| Index entries for this article | |
|---|---|
| Kernel | Checkpointing |
| Kernel | Security/Control-flow integrity |
| GuestArticles | Rapoport, Mike |