GB2638803A

GB2638803A - N429457gb

Info

Publication number: GB2638803A
Application number: GB2411611.3A
Authority: GB
Inventors: Sanderson Graham; Wren Luke
Original assignee: Raspberry Pi Ltd
Current assignee: Raspberry Pi Ltd
Priority date: 2024-08-07
Filing date: 2024-08-07
Publication date: 2025-09-03
Also published as: GB202411611D0

Abstract

A security coprocessor 200 for use with a processor having a stack is configured to receive a tag value 204 from the processor, generate a canary value 206 based on the tag value and a salt, and return the canary value to the processor for incorporation in the stack. The canary value may be generated as a logical combination of bits of the tag and bits of the salt such that different tags are guaranteed to yield different canary values and/or for any two different tags, each is a function of at least one salt bit that the other is not a function of. In another embodiment, a security coprocessor is configured to receive a redundant value from the processor, validate the redundant value and issue a panic instruction if the redundant value is not valid. In a further embodiment, a security coprocessor is configured to initiate a counter in response to a write counter instruction, and in response to a check counter instruction including a count value, check that the count value matches a current value of the counter and increment the counter.

Description

COPROCESSOR

Field of the Invention

[0001] The present invention relates to coprocessors, in particular to coprocessors that can improve the security of operations of a computing device or system.

Background

[0002] Security is a very important issue in computing. In particular, it is desirable that the code executed by a computer is the code that was intended to be executed and has not been modified or replaced by an adversary. Another requirement is that any secrets, e.g. cryptographic keys, are not discoverable by an adversary. A variety of different hardware devices have been proposed to improve security in computers and may be referred to under the general term of "security coprocessor", although there is no common understanding of what functions may be performed by a security coprocessor.

[0003] For example, some security coprocessors may be designed for storing sensitive information such as user credentials, passwords, fingerprints, certificates, encryption keys, etc. in tamper-resistant or tamper-evident memory. Other forms of security coprocessor are designed to efficiently encrypt data and may include keys stored in tamper-resistant memory. Still further functions of security coprocessors are validation of hardware and software, especially for secure boot processes, and isolation of sensitive operations and data from a main processor.

[0004] There remains a need for improvements in the field of hardware to support security in computing environments.

Summary

[0005] It is an aim of the invention to provide a hardware security device having additional functionality.

[0006] According to the invention, there is provided a security coprocessor for use with a processor having a stack, the coprocessor configured to: receive a tag value from the processor; generate a canary value based on the tag value and a salt; and return the canary value to the processor for incorporation in the stack.

[0007] According to another aspect of the invention, there is provided a security coprocessor for use with a processor, the coprocessor configured to: receive a redundant value from the processor; validate the redundant value; and issue a panic instruction if the redundant value is not valid.

[0008] According to another aspect of the invention, there is provided a security coprocessor for use with a processor, the coprocessor configured to: in response to a write counter instruction from the processor, initiate a counter inside the security coprocessor; and in response to a check counter instruction from the processor, the check counter instruction including a count value, check that the count value matches a current value of the counter and increment the counter.

[0009] Embodiments of the invention are able to assure the integrity of execution of secure software, e.g. bootrom software, and provide secure hardware that can efficiently and securely perform certain additional security functions that might otherwise need to be carried out in software in a less efficient manner.

Brief Description of the Drawings

[0010] The present invention will be described further below with reference to exemplary embodiments and the accompanying drawings, in which:

[0011] Figure 1 is a schematic of a microcontroller in which embodiments of the invention may be embedded;

[0012] Figure 2 is a schematic of the system bus architecture of the microcontroller of Figure 1;

[0013] Figure 3 is a schematic of a general-purpose processor (CPU) to which an embodiment of the invention may be connected;

[0014] Figure 4 is a schematic diagram of instruction and data flow in a redundancy coprocessor according to an embodiment; and

[0015] Figure 5 is a logic diagram of an arrangement for cross-core triggering according to an embodiment.

[0016] In the various figures, like parts are indicated by like references.

Detailed Description

[0017] The present invention is described below in the context of a dual core microcontroller referred to below as the RP2350. However, it will be appreciated that security coprocessors embodying the principles of the present invention may be applied in other devices and entirely different architectures. For example, security coprocessors of the invention may be incorporated in a System on Chip, in particular but not exclusively for use in single board computers. The security coprocessor may also be referred to as a redundancy coprocessor, abbreviated to RCP. A general description of the RP2350 is provided first for context.

[0018] As shown in Figures 1 and 2, the RP2350 system 100 is built around a central AHB5 crossbar 101a for processors, DMA 102 and memories, with a separate AHB5 layer 101c for some fast peripherals, and an APB layer 101b for other peripherals. It is a symmetric dual-core 103, 104 system. Dual Cortex-M33 (Arm) processors 103a, 104a or dual Hazard3 (RISC-V) processors 103b, 104b are selected via OTP or software and multiplexed 105 onto the top-level bus ports. Based on this configuration, the system comes out of reset as either a dual-Arm or dual-RISC-V microcontroller. The Cortex-M33 has two bus master ports. For full performance, these must be connected to two independent top-level fabric ports, which implies a 6-master crossbar. The Hazard3 processors are 3-stage processors implementing the RV32IMACZb* instruction set. Various parts of the RP2350, for example the programmable input/output devices PIO0-PIO2 106, are the same as the corresponding parts described in WO2022153025A1, which document is hereby incorporated by reference in its entirety.

[0019] A schematic block diagram of a Cortex-M33 processor is shown in Figure 3. Further details of this processor are given in documentation provided by ARM Limited. Although a description is given below in reference to the ARMv8-M architecture, the invention may be used with other versions and other architectures.

[0020] The Cortex-M33 processor provides four separate stacks for distinct security and privilege levels. However, stack-based attacks apply even to the most secure of these.

[0021] The RP2350 has a one-time programmable memory 107 (referred to herein as the OTP) to store:
* Hardware-relevant security configuration, e.g. secure boot enabled
* Public key fingerprint for secure boot
* Symmetric keys for decryption of flash contents into SRAM
* Device information, e.g. unique device identifier, oscillator trim values
* Boot configuration, e.g. enable serial slave boot
* Customer data, possibly including a boot image for flashless operation

[0022] For basic key and boot configuration support, a few hundred bytes would be sufficient, but boot-from-OTP 107 needs at least a few kilobytes to be useful. For storing decryption keys it is desirable that the OTP is user-programmable, since these keys are provisioned by the customer. A handful of critical configuration bits are desirably resilient against fault injection (i.e. deliberate, malicious manipulation of core power supplies). It is undesirable that critical bits can be flipped from a 1 to a 0 by glitching the core power supply at the time they are read from an OTP array. It is desirable to be able to hard-lock (i.e. with permissions stored in OTP itself) parts of the OTP 107 against reads/writes by either security domain. One example of this is the chip information written in the factory, which should never be altered by the user. There is no need to lock the OTP word-by-word, so a page is defined as a contiguous OTP region controlled by a single set of locks. The page size is a trade-off between lock granularity and OTP capacity dedicated to locking.

[0023] It is also desirable to be able to further lock down OTP permissions in a non-persistent manner, so that early boot stages can restrict the access of later stages, irrespective of security domain. The prime example of this is the decryption keys used to load an encrypted secure binary: these should be accessible only to the code that does the decryption, and then sequestered until the next boot.

[0024] To support board configuration that can be edited by the user but is not accessible to firmware running on the device, an OTP access key is implemented in hardware. Since OTP is unreliable, general data stored in the OTP is provided with Forward Error Correction (FEC) protection or similar. However, since the OTP needs to support some data structures (such as thermometer-code counters used for rollback protection) that require the reading and writing of individual bits, it must be possible to bypass any FEC protection. Ideally FEC vs non-FEC reads should be a non-stateful operation, e.g. two different address windows, one FEC and one non-FEC, so that read accesses can be freely mixed. There is no such requirement for writes, since OTP writing is inherently a stateful operation.
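By way of illustration only, a thermometer-code counter of the kind mentioned above may be modelled as follows. This is a minimal sketch, not the RP2350 implementation: it assumes that OTP bits can only be programmed from 0 to 1, that the counter value is the number of programmed bits, and that a 32-bit row is used. The function names are hypothetical.

```python
def thermometer_value(word: int) -> int:
    """Return the count encoded by a thermometer-code word (number of set bits)."""
    return bin(word).count("1")


def thermometer_increment(word: int, width: int = 32) -> int:
    """Return the word with its lowest clear bit set.

    Assumption: OTP bits can only move from 0 to 1, so the counter can
    be advanced by programming one more bit but can never be rolled back.
    """
    if word == (1 << width) - 1:
        raise OverflowError("thermometer counter is full")
    lowest_clear_bit = ~word & (word + 1)
    return word | lowest_clear_bit


# Usage: advance a blank counter three times.
row = 0
for _ in range(3):
    row = thermometer_increment(row)
print(bin(row), thermometer_value(row))
```

Because each increment programs exactly one bit, such a counter cannot be covered by a multi-bit error-correcting code over the whole word, which is why a non-FEC access path is needed.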

[0025] Various commercially available non-volatile memories (NVM) can meet the above requirements, e.g. NVM designs provided by Synopsys, Inc of Sunnyvale, CA, USA.

[0026] The redundancy coprocessor 200, which may also be considered a security coprocessor, is attached to the Cortex-M33 coprocessor port, and is used in the bootrom to provide runtime integrity checking during boot, and to mitigate return-oriented programming attacks from user code using the bootrom as a ROP surface.

[0027] The security requirements implemented in the RP2350 are the following:
1. Prevent unauthorised code from running on the device, even with physical access
2. Protect against unauthorised reading of user code and data, even with physical access
3. Isolate trusted and untrusted software, running concurrently on the device, from one another.

[0028] Security requirement (1), preventing unauthorised code from running, is a prerequisite for the other two high-level requirements. What we mean here, more concretely, is:
* The recipient of a blank device can permanently alter it, such that it will only run their code
* Further alterations can revoke the ability to run older software versions

[0029] RP2350 can run software in the following ways:
* Flash XIP
* Loaded into SRAM from flash, OTP or a USB/UART/I2C host
* Written into SRAM by the debugger

[0030] As a flashless microcontroller, we must assume that anyone with physical access can:
* Read and write the flash
* Intercept flash accesses at runtime (MITM)

[0031] Therefore, we can only trust code that has been loaded into internal SRAM and then checked for authenticity, to avoid a time-of-check/time-of-use issue with flash XIP. (Though checking signatures on XIP'd code may still be useful for circumstantial security.) Note that meeting requirement 3 (isolating trusted and untrusted software running concurrently) means we can still XIP untrusted software without interfering with the execution of trusted software that was loaded and checked in SRAM.

[0032] As a proxy for authenticity, we rely on cryptographic signatures. A signature is a hash that has been encrypted using a private key. To check the signature, the recipient computes its own hash of the data, and decrypts the signature using a pre-provided public key. If these two hashes match, it proves that the author of the data was in possession of the private key (or of an infinite amount of compute power).

[0033] The requirements that follow are:
* An option to enforce a cryptographic signature check on any loaded code
* An option to disable debug interfaces, to prevent direct loading of code into SRAM

[0034] These will be implemented with:
* OTP storage for:
  o Feature enable for the signature check
  o Public key fingerprint
  o Feature disable for debug access
* Bootrom software to perform the signature check
* Additional hardware to increase confidence that the bootrom executes correctly:
  o Canary flops, to directly detect glitching of the core supply
  o Control flow and data checks performed by a Redundancy Coprocessor described below
  o In either case, the response should be to either hard-reset or lock up the chip (fail-safe)

[0035] Various different hash and cipher algorithms can be used, for example:
* SHA-256 (SHA-2) hash
* ECDSA-256 boot public key, stored with the image
* SHA-256 boot public key fingerprint stored on-device

[0036] Protection against rollback to old firmware images is provided by thermometer counters stored in OTP, and additional bootrom software.
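By way of illustration only, the use of an on-device public key fingerprint may be sketched as follows. This sketch covers only the step of confirming that a public key supplied with an image matches the SHA-256 fingerprint held on-device; it does not perform the ECDSA signature verification itself. The function names and the example key bytes are hypothetical.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> bytes:
    """Return the SHA-256 fingerprint of a public key blob."""
    return hashlib.sha256(public_key).digest()


def key_matches_fingerprint(public_key: bytes, stored_fingerprint: bytes) -> bool:
    """Check a key shipped with an image against the fingerprint held on-device.

    hmac.compare_digest is used so that the comparison time does not
    depend on where the first mismatching byte occurs.
    """
    return hmac.compare_digest(fingerprint(public_key), stored_fingerprint)


# Usage with placeholder key material (not a real ECDSA key).
provisioned = fingerprint(b"placeholder public key bytes")
print(key_matches_fingerprint(b"placeholder public key bytes", provisioned))
print(key_matches_fingerprint(b"some other key", provisioned))
```

Storing only the fingerprint keeps the OTP footprint to 32 bytes, while the full public key travels with the image.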

[0037] To protect user code and data (security requirement (2)), it is desirable to prevent unauthorised readback of firmware, to make it more difficult to clone a product or reverse-engineer the firmware to find vulnerabilities. Preventing unauthorised data readback is necessary for storage of private/symmetric keys, or any other sensitive user data.

[0038] As a flashless microcontroller, flash is external and we must assume that anyone with physical access can read the flash. However, since flash is our only nonvolatile, mutable storage, there is often no choice but to store our code and data in flash. The solution is to encrypt flash contents, reducing the problem of protecting flash, which is large and mutable, to protecting the decryption key, which is small and immutable and can therefore be stored in on-device OTP. Once flash contents are decrypted into SRAM, we lean on requirement (3) to ensure untrusted code can't access them from within the chip.

[0039] Note that this does allow someone with access to the flash to destroy the sensitive data by overwriting it. However, they can also do this by hitting the flash with a hammer, so this isn't a huge concern.

[0040] Some further properties that are desirable are:
* Decryption is not implemented in hardware
  o Unlike signature checks, decryption raises sidechannel analysis (e.g. power sidechannel) concerns, as it has direct contact with secrets
  o This is difficult to get right, therefore it is desirable to make it as cheap as possible to patch
  o This rules out encrypted XIP (which is problematic anyway)
* Decryption is not directly implemented in the bootrom
  o Again, it is desirable to be able to patch the code
  o Meeting requirement (1) allows us to trust a flash-loaded second stage to handle the keys

[0041] So, to boot an encrypted software image:
* Store an encryption key in OTP
* Bootrom loads a decryption stage into SRAM
  o This program itself is not encrypted, since it contains no secrets
  o This program is authenticated, so we can trust it to access the key in OTP
* Decryption stage chainloads the encrypted flash text into SRAM
* Decryption stage can then lock down the OTP encryption key region from further access until the chip is reset, if the key is not required by the encrypted image itself
* Decryption stage erases its working areas and registers before transferring control to the loaded image

[0042] Therefore, the following hardware features are desirable:
* Ability to lock read access to regions of OTP at runtime
* Ability to disable or filter debugger accesses, to prevent direct reads to OTP

[0043] Thanks to requirement (3) we have the option of leaving the decryption key open after decrypting the image, so that the encrypted program can perform further decryption of flash-resident data, without exposing the key or data to untrusted code.

[0044] Isolating trusted and untrusted programs (security requirement (3)) maps to the Armv8-M Security Extension concepts of the Secure and NonSecure domains, which are described in the Armv8-M Architecture Reference Manual published by ARM Limited and therefore only brief details are given here. Some key invariants are:
* NonSecure code must not access Secure memory, even via roundabout means such as DMA
* NonSecure memory must not be executed in the Secure processor state
* NonSecure code must not be able to interfere with peripherals managed by Secure code
* NonSecure code must not be able to access device secrets (e.g. keys stored in OTP)

[0045] The Redundancy Coprocessor (RCP) 200 is attached to the Cortex-M33 coprocessor port. There is one identical instance per processor, accessed as coprocessor number 7. The RCPs are accessible to both Secure and NonSecure code, though NonSecure operations cannot observe or alter the RCP's internal state.

[0046] The RCP 200 provides some mitigation against fault injection attacks during boot, and prevents return-oriented programming attacks using the bootrom as a source of ROP gadgets. It may also be referred to as the canary coprocessor (though RCP is preferred), named after stack canaries: tell-tale values placed between stack variables and the function return frame, which are checked on function exit to detect stack corruption.

[0047] When an inconsistency is detected in Secure software, the RCP 200 asserts the processor's non-maskable interrupt line 108, and stalls any further RCP accesses indefinitely. Note: the bootrom NMI vector immediately puts the processor to sleep, so that it can't be glitched further. If the NMI is escaped, the coprocessor stall is a second attempt to bring the processor to a safe, halted state if it continues executing ROM code.

[0048] The RCP's error state can only be cleared by a warm reset of the processor.

[0049] It is possible to attach the debugger and reset the processor when the RCP is in its error state. Note: the RCP may fire due to non-malicious power supply instability, and it should be possible to diagnose via the debugger that the NMI state has been entered, so we can't stop the processor clock or hold it in reset.

[0050] RCP writes are no-ops in the NonSecure state, and reads return 0. Note: this allows NonSecure software to share code with Secure software, without leaking coprocessor state that would allow NonSecure code to construct valid stack canaries.

[0051] The RCP's instructions fall into five categories:
* Stack canary generation and checking
* Checking redundant 32-bit integers
* Checking redundant booleans
* Checking sequence counts
* Panicking

[0052] The RCP supports three canary operations:
* Write salt (mcrr instruction)
* Read canary (mrc instruction)
* Check canary (mcr instruction)

[0053] NB as used herein, "mcr instruction" refers to a "Move to Coprocessor Register" instruction to transfer data from the main processor's general-purpose registers to coprocessor registers. "mcrr instruction" refers to a "Move to Coprocessor Register" instruction used to move a doubleword (64 bits) of data from two general-purpose registers into two coprocessor registers. "mrc instruction" refers to a "Move from Coprocessor Register" instruction used to transfer data from coprocessor registers to general-purpose registers within the main processor.

[0054] Write salt is executed at power-on or whenever core 0 is reset, and seeds the canary with a 64-bit value read from the system true-random number generator, in a single 64-bit write (mcrr instruction). This value may be referred to as a cryptographic salt. Writing the salt twice, or executing any other coprocessor instruction without writing the salt first, is fatal. The salt is cleared (and becomes writable again) only by a warm reset of core 0. Note: except in debugger use cases, a reset of core 0 is also expected to cause a reset of core 1.

[0055] To simplify the early boot path, both RCPs are initialised by core 0. The core 0 RCP provides instructions for writing the core 0 salt register and the core 1 salt register. Attempting to execute a salt-write instruction on core 1 is fatal. Core 1 is expected to spin in the bootrom until its RCP has been initialised by core 0.

[0056] Read canary causes the security coprocessor to generate a 32-bit value which is derived from the salt and an 8-bit tag value encoded in the instruction sent to the coprocessor, as discussed below. The 8 LSBs of the canary value are all-zeroes, so that string operations terminate when reaching a canary.

[0057] Check canary writes back to the security coprocessor a 32-bit canary value to be validated, and has an 8-bit tag value that should match the corresponding canary read. Checking a canary value that does not match the corresponding canary read is fatal.

[0058] Note: the intended use is for ROM functions to read a canary value with a unique tag at the beginning of each function, and place it on the stack. At the end of the function, the value is read off the stack, and checked with the same tag. This gives some confidence that the function was entered from the beginning rather than part way through, and that there was no stack corruption in between that overflowed into the stack return frame. Writing the canary to the relevant stack and reading it back are functions carried out by a process running on the main processor, not the redundancy coprocessor.
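By way of illustration only, the three canary operations may be modelled in software as follows. This is a behavioural sketch under stated assumptions and is not the hardware implementation: the class name, the exception type and, in particular, the mixing function that combines salt and tag are hypothetical. The sketch preserves only the properties described above: the salt is written once, the canary is 32 bits wide with its 8 least significant bits zero, and a mismatched check is fatal.

```python
class RcpFault(Exception):
    """Stands in for the fatal RCP fault (NMI plus stall) in this model."""


class CanaryModel:
    def __init__(self) -> None:
        self._salt = None  # no salt until write_salt() is called

    def write_salt(self, salt: int) -> None:
        # Writing the salt twice is fatal.
        if self._salt is not None:
            raise RcpFault("salt already written")
        self._salt = salt & 0xFFFFFFFFFFFFFFFF

    def read_canary(self, tag: int) -> int:
        # Any use before the salt is written is fatal.
        if self._salt is None:
            raise RcpFault("salt not written")
        if not 0 <= tag <= 0xFF:
            raise ValueError("tag must fit in 8 bits")
        # Hypothetical mixing: fold the 64-bit salt to 24 bits, then XOR in
        # the tag replicated across three bytes. For a fixed salt, distinct
        # tags therefore give distinct canaries.
        folded = (self._salt ^ (self._salt >> 24) ^ (self._salt >> 48)) & 0xFFFFFF
        upper = folded ^ (tag * 0x010101)
        return upper << 8  # low 8 bits are zero so string operations stop here

    def check_canary(self, tag: int, value: int) -> None:
        if value != self.read_canary(tag):
            raise RcpFault("canary mismatch")


# Usage: function prologue reads a canary, epilogue checks it with the same tag.
rcp = CanaryModel()
rcp.write_salt(0x0123456789ABCDEF)
canary = rcp.read_canary(0x42)   # placed on the stack by the main processor
rcp.check_canary(0x42, canary)   # passes; a different tag or value would fault
```

In this model a stack frame built for one function cannot be replayed against another, because the tag supplied at check time selects a different expected value.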

[0059] The RCP supports one redundant 32-bit integer instruction:
* Validate 32-bit integer (mcrr instruction)

[0060] Note: a redundant 32-bit integer is a single 32-bit value stored redundantly in two 32-bit variables which always differ by a fixed XOR mask of 0x96009600. This value is advantageous as it can be loaded in a single instruction and has a Hamming weight of 8. Critical operations with 32-bit arithmetic can be performed redundantly on the two halves, and at various points the two halves can be compared to confirm the two redundant chains of operations calculated the same value.

[0061] The RCP supports another instruction which uses similar logic:
* Assert that two registers are equal (mcrr instruction), equivalent to the validation instruction but with an XOR mask of 0.

[0062] Validating a redundant integer whose two sides do not XOR to the correct value is fatal, i.e. locks up both processors.
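By way of illustration only, the redundant integer check may be modelled as follows. The XOR mask is taken from the description above; the function names and the exception type are hypothetical, and the exception merely stands in for the fatal fault.

```python
XOR_MASK = 0x96009600  # mask between the two halves of a redundant integer


class RcpFault(Exception):
    """Stands in for the fatal RCP fault in this model."""


def make_redundant(value: int) -> tuple[int, int]:
    """Return the two 32-bit words that together store one redundant integer."""
    value &= 0xFFFFFFFF
    return value, value ^ XOR_MASK


def validate_redundant(a: int, b: int, mask: int = XOR_MASK) -> None:
    """Fault unless the two words differ by exactly the expected mask."""
    if (a ^ b) & 0xFFFFFFFF != mask:
        raise RcpFault("redundant integer mismatch")


def assert_equal(a: int, b: int) -> None:
    """Same comparison logic with a mask of 0: the words must be identical."""
    validate_redundant(a, b, mask=0)


# Usage: a value and its shadow copy validate; the mask has 8 set bits.
lo, hi = make_redundant(1234)
validate_redundant(lo, hi)
print(bin(XOR_MASK).count("1"))
```

A single bit flipped in either word changes the XOR of the pair, so the validation fails unless both words are corrupted in a matching way.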

[0063] The RCP supports 9 redundant boolean instructions:
* Validate boolean (mcr instruction)
* Assert boolean true (mcr instruction)
* Assert boolean false (mcr instruction)
* Three previous instructions, but with an additional register value XOR'd first (mcrr instructions)
* Assert logical OR (mcrr instruction): assert two booleans are valid and one or more is true
* Assert logical AND (mcrr instruction): assert two booleans are true
* Validate two (mcrr instruction): assert two booleans are valid (provided for code size reduction)

[0064] A boolean value of true is represented by the bit pattern 0xa500a500, and a boolean value of false is represented by the bit pattern 0x00c300c3. All other values are poisonous.

[0065] Validate boolean is fatal if the written value matches neither the true nor false bit pattern.

[0066] Assert true is fatal if the written value does not match the true bit pattern.

[0067] Assert false is fatal if the written value does not match the false bit pattern.

[0068] The two-operand XOR variants of the boolean instructions are identical to the single-operand instructions, except they first XOR two registers together and then check that result.

[0069] Note: redundant booleans can be used to check the consistency of branch decisions. For example, before an if ( ) statement, validate that you are branching on a valid boolean, and then in the if/else branches assert that the boolean is true/false.

[0070] Note: the XOR variants may be used for redundant booleans used as function return codes. In this case, the callee XORs a unique per-function constant into its return code, and the caller XORs the same constant in before validating the return code. This confirms that the expected function was actually called.
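By way of illustration only, the redundant boolean checks and the XOR return-code convention may be modelled as follows. The true and false bit patterns are taken from the description above; the function names, the exception type and the per-function constant are hypothetical.

```python
TRUE = 0xA500A500
FALSE = 0x00C300C3


class RcpFault(Exception):
    """Stands in for the fatal RCP fault in this model."""


def validate_boolean(value: int) -> None:
    if value not in (TRUE, FALSE):
        raise RcpFault("not a valid redundant boolean")


def assert_true(value: int) -> None:
    if value != TRUE:
        raise RcpFault("boolean is not true")


def assert_false(value: int) -> None:
    if value != FALSE:
        raise RcpFault("boolean is not false")


def assert_true_xor(value: int, key: int) -> None:
    """XOR variant: combine the two operands first, then check the result."""
    assert_true(value ^ key)


CHECK_IMAGE_KEY = 0x5A5A0001  # hypothetical per-function constant


def check_image() -> int:
    """Hypothetical callee: returns 'true' masked with its own constant."""
    return TRUE ^ CHECK_IMAGE_KEY


# Usage: the caller unmasks with the same constant before asserting.
result = check_image()
assert_true_xor(result, CHECK_IMAGE_KEY)

# Branch consistency: validate before branching, assert inside each arm.
flag = FALSE
validate_boolean(flag)
if flag == TRUE:
    assert_true(flag)
else:
    assert_false(flag)
```

If a glitch diverts execution into a different function, the return code is masked with a different constant and the caller's check faults.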

[0071] Note: there will be software functions for forming redundant booleans from redundant u32s.

[0072] The RCP provides two sequence count instructions, which can be used to check that multiple sections of a function are reached in order:
* Write counter (mcr instruction)
* Check and increment counter (mcr instruction)

[0073] Write counter writes to an 8-bit counter inside the RCP.

[0074] Check and increment counter writes a value which must match the current counter value, else fatal. The counter post-increments by 1.

[0075] The cdp instruction is reserved for panicking, i.e. software has detected some inconsistency and wishes to halt immediately.
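By way of illustration only, the two sequence count instructions may be modelled as follows. The class and method names and the exception type are hypothetical, and wrap-around of the 8-bit counter modulo 256 is an assumption of this sketch.

```python
class RcpFault(Exception):
    """Stands in for the fatal RCP fault in this model."""


class SequenceCounter:
    def __init__(self) -> None:
        self._count = 0

    def write(self, value: int) -> None:
        """Write counter: load the 8-bit counter."""
        self._count = value & 0xFF

    def check_and_increment(self, expected: int) -> None:
        """Fault unless 'expected' equals the counter, then post-increment."""
        if expected != self._count:
            raise RcpFault("sequence step out of order")
        self._count = (self._count + 1) & 0xFF  # assumed to wrap at 8 bits


# Usage: three sections of a function, each announcing its step number.
seq = SequenceCounter()
seq.write(0x10)
seq.check_and_increment(0x10)  # section 1
seq.check_and_increment(0x11)  # section 2
seq.check_and_increment(0x12)  # section 3; skipping section 2 would fault here
```

Because each check carries a constant that only matches if every earlier check has run, a skipped or repeated section is detected at the next check.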

[0076] Panic instructions stall the coprocessor port indefinitely and hence block the main processor until a hardware reset. Note: stalling is the safest thing to do, as any further action is also liable to be glitched. However, due to the design of the Cortex-M33 coprocessor port, stalls are not to be data-dependent, so only the panic instruction has this behaviour.

[0077] If the coprocessor stall on panic falls through, a non-maskable interrupt (NMI) is asserted. Note that the Cortex-M33 will abandon a stalled coprocessor access if a higher-priority interrupt arrives, in which case plan B is to go to the bootrom NMI vector and sleep the core that way.

[0078] Coprocessor accesses after a panic will stall indefinitely. Plan C is to halt the processor as soon as possible if it manages to escape the NMI vector. Note that there is no higher priority interrupt than the NMI, so this halt will not be abandoned.

[0079] All coprocessor instructions support a random delay, which stalls for up to 127 cycles based on an internal multi-LFSR random number generator. Different canary salts give different sequences of random delays (the delay pseudo random number generator is seeded by the canary salt). Instigation of random delays hardens against fault injection attacks, especially attacks that manipulate the supply voltage.
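By way of illustration only, a salt-seeded pseudorandom delay of 0 to 127 cycles may be sketched as follows. This is not the RCP's multi-LFSR generator, whose structure is not given here: the sketch substitutes a single, widely used 16-bit Fibonacci LFSR (taps 16, 14, 13 and 11), seeds it from the low 16 bits of the salt, and clocks it seven times per delay. All of these choices, and the function names, are assumptions.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one bit."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def delay_sequence(salt: int, count: int) -> list[int]:
    """Return 'count' pseudorandom delays, each in the range 0..127 cycles."""
    state = (salt & 0xFFFF) or 1  # an all-zero LFSR state would never change
    delays = []
    for _ in range(count):
        for _ in range(7):  # clock in seven fresh bits per delay
            state = lfsr16_step(state)
        delays.append(state & 0x7F)  # 7 bits give a maximum of 127
    return delays


# Usage: the same salt always reproduces the same delay sequence.
print(delay_sequence(0x0123456789ABCDEF, 5))
```

Since the generator is deterministic for a given seed, unpredictability across boots comes from the salt being freshly drawn from the true-random number generator.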

[0080] For convenience, the delay is enabled on all instructions by default, and can be explicitly disabled for some instructions by setting one of their opc2 bits. It may be disabled on instructions in user-facing functions where the delay is not desirable.

[0081] NonSecure code is not delayed by random delays. We don't want to make it too easy for NonSecure code to determine the seed of the PRNG.

[0082] Desirably, the redundancy coprocessor and the main processor are combined in a single system-on-chip (SoC); in other words, the main processor(s) and redundancy coprocessor(s) are formed on the same semiconductor (silicon) substrate. This ensures that the timing of instructions and responses is consistent and avoids delays. The security coprocessor is small (in terms of silicon real estate) and need not be a layout block of its own. In a system having multiple processors, it is desirable that each processor be associated with a respective security coprocessor. In a system having a host SoC and an IO southbridge with its own embedded processor, the embedded processor desirably has a security coprocessor. In particular, any boot coprocessors which aid in launching host processors desirably have a security coprocessor.

[0083] Communication between the processor and its redundancy coprocessor is desirably through synchronous instructions, ensuring deterministic and predictable behaviour.

[0084] Figure 4 is a schematic diagram of instruction and data flow in the redundancy coprocessor 200. The redundancy coprocessor implements hardware-checked assertions, to aid control flow and data flow integrity checking. Its two-phase pipeline is closely coupled to the Cortex-M33 pipeline. A 64-bit salt register 201 holds a once-per-boot random number, which is used to generate and validate stack canary values, and generate pseudorandom delay sequences on RCP instructions. Other comparison functions provide more general hardware-checked assertion support.

[0085] The redundancy coprocessor (RCP) is used in the RP2350 bootrom to provide hardware-assisted mitigation against fault injection and return-oriented programming attacks. This includes the following instructions:
* generate and validate stack canary values based on a per-boot random seed
* assert that certain points in the program are executed in the correct order without missing steps
* validate booleans stored as one of two valid bit patterns in a 32-bit word
* validate 32-bit integers stored redundantly in two words with an XOR parity mask
* halt the processor upon reaching a software-detected panic condition

[0086] The RCP can be used by other secure software running on the chip; it is not restricted to the bootrom.

[0087] The RCP instruction set is set out in full below. RCP instruction encodings contain a parity bit, and executing an invalid instruction or an instruction with bad parity triggers an RCP fault.

[0088] Each Cortex-M33 processor is equipped with a single RCP instance, mapped as coprocessor number 7 in the coprocessor opcode space. The two RCP instances are linked: an RCP fault on one core immediately triggers a fault on the other. An RCP fault takes place in two steps:
* The non-maskable interrupt (NMI) is asserted, and remains asserted until a warm reset of the processor.
* Any further RCP instructions stall the coprocessor port until a warm reset of the processor. This stall cannot be interrupted, as the processor is already in the NMI state.

[0089] In the RP2350 bootrom, the NMI and HardFault vectors are implemented with an rcp_panic instruction, which unconditionally stalls the coprocessor port. This is intended to prevent the processor from retiring any more instructions until either a debugger connects to reset the processors, or the processors are reset by some other mechanism such as the system watchdog timer. The processor quickly reaches a quiescent state where it is far less vulnerable to further fault injection, deliberate or otherwise.

[0090] Each core's RCP has a 64-bit seed value, which the RCP uses to generate stack canary values, and to add short pseudorandom delays to RCP instructions. Both RCP instances are seeded by core 0 during the early boot path in the bootrom, using the system true-random number generator. Until a salt value is provided, any RCP instruction immediately triggers an RCP fault, making it difficult to skip the initialisation. The use of random data in stack canary values makes it difficult to reuse return-oriented-programming stack payloads across multiple boots.

[0091] Figure 4 gives a dataflow-level overview of the RCP hardware. The RCP is structured as a two-phase pipeline (Opcode Phase and Data Phase), which overlays the Cortex-M33 execution pipeline. It exchanges data with the core via a 64-bit incoming bus (CPWDATA) and a 32-bit outgoing bus (CPRDATA). The Cortex-M33 can issue two register reads to the coprocessor in one cycle through the CPWDATA bus, and the RCP leverages this throughput for some of its assertion instructions, such as rcp_iequal, which raises a fault when two Arm registers do not contain the same 32-bit value. Processor Opcode interface 202 decodes instructions received on the CPOPC bus to obtain control signals 203, tag 204 and decode error flag 205.

[0092] The 8-bit "tag" value 204 in Figure 4 is an 8-bit instruction immediate value, encoded by the instruction CRn and CRm fields. These 8-bit values are used to uniquely identify functions for canary value generation 206, so that stack frames are not interchangeable between functions. They also provide 8-bit counter values for rcp_count_set and rcp_count_check instructions. Encoding the tags using the CRn, CRm fields makes RCP instruction sequences more compact, as it obviates additional instructions to materialise these small constants in registers and pass them through CPWDATA. It also makes the tag values less vulnerable to glitching, because the instruction opcode fields are available earlier in the cycle than the register values passed on CPWDATA.

[0093] RCP instructions may also execute in the Non-secure state, with certain differences to prevent Non-secure code from triggering RCP faults or observing the value of the salt register. This supports Non-secure software executing shared ROM routines which contain RCP instructions, but does not allow probing of the RCP's internal state from a Non-secure context. Further details and rationale for Non-secure execution support are given below.

[0094] Fault conditions arising as a result of a decode error 205 or a failed comparison 207 are combined by OR gate 208. The output of OR gate 208 and the current state of Fault Flag 210 are ORed by OR gate 209 to set Fault Flag 210.

[0095] Certain details are elided from Figure 4 for clarity, such as the delay counter used for pseudorandom instruction delays, and the logic for suppressing faults under Non-secure execution. This behaviour is described in full below.

[0096] Salt Register

Each RCP instance is provisioned with a 64-bit salt register 201, which provides a seed for stack canary values and random instruction delays. This is expected to be initialised with a random value early in the boot process: the RP2350 bootrom uses the true random number generator to generate the salt values.

[0097] Initially the salt register is in the invalid state. In this state only the following operations are permitted:

* Checking the valid state of the salt register, via rcp_canary_status

* Writing a salt via rcp_salt_core0 or rcp_salt_core1, which writes a 64-bit value to that core's salt register, and changes its state to valid.

[0098] Any other RCP instruction unconditionally triggers an RCP fault when the salt register is invalid. This makes it difficult to skip RCP initialisation via fault injection, because the RP2350 bootrom contains a high density of RCP instructions.

[0099] Similarly, attempting to write to an already-valid RCP salt register triggers an RCP fault. There is no reason to initialise the RCP salt register twice, so this case is detected as an anomaly that indicates loss of control flow integrity.
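The life cycle of the salt register described above can be modelled in software as a small state machine. The following C sketch is illustrative only: the type rcp_model_t and all function names are hypothetical, and the model makes no claim about how the hardware is implemented. It shows the three rules stated in the text: a first salt write marks the register valid, a second write faults, and any other instruction faults while the salt is invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one RCP instance (names are illustrative). */
typedef struct {
    uint64_t salt;
    bool salt_valid;  /* false until the first salt write */
    bool fault;       /* sticky in this model; a real fault persists until warm reset */
} rcp_model_t;

/* Models a salt write: legal only while the salt is still invalid. */
static void rcp_model_write_salt(rcp_model_t *rcp, uint64_t salt) {
    if (rcp->salt_valid) {
        rcp->fault = true;  /* a second initialisation is treated as an anomaly */
        return;
    }
    rcp->salt = salt;
    rcp->salt_valid = true;
}

/* Models the status query, which is permitted in either state. */
static bool rcp_model_salt_is_valid(const rcp_model_t *rcp) {
    return rcp->salt_valid;
}

/* Models any other RCP instruction: faults if the salt is not yet valid. */
static void rcp_model_other_instruction(rcp_model_t *rcp) {
    if (!rcp->salt_valid) {
        rcp->fault = true;
    }
}

/* Self-check of the three rules stated in the text. */
static void rcp_model_selfcheck(void) {
    rcp_model_t a = {0, false, false};
    rcp_model_other_instruction(&a);   /* before the salt is written: faults */
    assert(a.fault);

    rcp_model_t b = {0, false, false};
    rcp_model_write_salt(&b, 0x1122334455667788ULL);
    rcp_model_other_instruction(&b);   /* after the salt is written: no fault */
    assert(rcp_model_salt_is_valid(&b) && !b.fault);
    rcp_model_write_salt(&b, 1);       /* second write: faults */
    assert(b.fault);
}
```

Calling rcp_model_selfcheck() exercises each rule once; the salt constant is arbitrary.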

[0100] Core 0's coprocessor port writes the salt registers for both cores' RCP instances, to simplify multicore interactions during early boot. In the RP2350 bootrom, core 1's first steps are to lock down its MPU execute permissions to a small region of the ROM containing its wait-for-launch code, and then poll for its RCP salt to become valid once core 0 has cleared boot memory, performed some minimal hardware setup, and generated the RCP salts.

[0101] When core 0 is switched to RISC-V architecture, and core 1 is Arm, the core 1 salt register is forcibly marked as valid to permit core 1 to execute the ROM. This has no impact on secure boot because RISC-V cores are only enabled when secure boot is disabled: ability to set core 0 to RISC-V already implies subversion of secure boot.

[0102] Access from Non-secure

Setting bit 7 of the Cortex-M33 NSACR register permits Non-secure code to set bit 7 of CPACR_NS, which in turn enables Non-secure access to the RCP. Non-secure RCP access is useful for executing shared Secure/Non-secure routines which contain RCP instructions. For example, the memcpy implementation in the RP2350 bootrom is shared by Secure code in the main boot path, and Non-secure code such as the USB bootloader.

[0103] Since an RCP fault is fatal for all software running on the system, Non-secure code must not be able to trigger RCP faults at will. Similarly, if Non-secure code were able to read out the RCP salt register, it would make it easier to engineer stack payloads which can control Secure execution without triggering RCP faults. Therefore, the RCP handles Non-secure accesses differently from Secure:

* Read data is masked to all-zeroes

* Write data is ignored: any instruction which would generate a data-dependent RCP fault becomes a no-op

* Invalid instructions report coprocessor errors instead of RCP faults, which the processor maps to Non-secure UNDEFINSTR UsageFault

* The pseudorandom instruction delay is skipped: all RCP instructions execute in one cycle, assuming the Cortex-M33 is able to issue them at one instruction per cycle

[0104] The lack of pseudorandom instruction delays makes it more difficult for Non-secure code to extract the seed value used to add delays to Secure execution of RCP instructions.

[0105] Instruction Validation

Processor Opcode interface 202 applies the following rules to all coprocessor instructions which target coprocessor 7:

* The number of 1 bits in the Opc1 field, plus the instruction parity bit, must be an even number.

o For mcr, mrc and cdp instructions, the parity bit is encoded by bit 0 of the Opc2 field.

o For mcrr, it is encoded by bit 3 of the CRm field.

* The instruction must not be an mrrc (64-bit coprocessor-to-core)

* For mcr instructions (32-bit core-to-coprocessor):

o The Opc1 field must be in the range 0 through 6.

o If there is no 8-bit tag (i.e. any instruction other than rcp_canary_check, rcp_count_check, rcp_count_set) then the CRn and CRm opcode fields must be all-zeroes.

* For mrc instructions (32-bit coprocessor-to-core):

o The Opc1 field must be in the range 0 through 2.

o For instructions other than rcp_canary_valid and rcp_canary_get, the CRn and CRm opcode fields must be all-zeroes.

* For mcrr instructions (64-bit core-to-coprocessor):

o The Opc1 field must be in the range 0 through 8.

o For the rcp_salt_core instructions, the CRm field must be 0 or 1 (referred to as rcp_salt_core0 and rcp_salt_core1 respectively)

o For all other mcrr instructions, the CRm field must be 0

[0106] The terms Opc1, Opc2, CRm and CRn in the description above refer to standard encoding fields in the Arm T32 instruction encoding for coprocessor instructions. See the Armv8-M Architecture Reference Manual for full details of the encoding and assembler syntax.

[0107] Any coprocessor instruction targeting coprocessor 7 that fails these validation rules will have one of two outcomes, depending on the security domain in which the instruction is executed:

* Secure execution of an invalid instruction is an immediate, unconditional RCP fault. The RCP asserts the core's non-maskable interrupt signal, and any further RCP instructions stall the coprocessor port indefinitely. This continues until the core receives a warm reset. This also triggers RCP faults on other cores, as described in "Cross-core Triggering" below

* Non-secure execution of an invalid instruction returns an error on the opcode-phase coprocessor interface, which is interpreted as a Non-secure UNDEFINSTR UsageFault by the core. (The full description of this Armv8-M-specific fault is furnished by the Armv8-M Architecture Reference Manual.)

[0108] Cross-core Triggering

An RCP fault indicates that the integrity of the software environment as a whole is compromised. Though the fault may originate on a single processor, all processors which share the same trusted memory may behave unpredictably if they continue to execute, since:

* The physical condition which caused one processor to mis-execute in a detectable way, such as low supply voltage, may cause other processors to mis-execute in a manner which was not detected

* The processor which triggered an RCP fault may already have corrupted shared, trusted memory contents in such a way that other processors misbehave: particularly corrupting the other core's stack

[0109] Therefore an RCP fault on one core also triggers an RCP fault on other cores. RP2350 has only two cores; an RCP fault on core 0 always triggers a fault on core 1, and vice versa.

[0110] A logic circuit 300 to effect cross-core triggering is depicted in Figure 5. Triggering an RCP fault on one core also triggers a fault on the other core. Triggers are accumulated into respective fault registers 305, 306, which remain set until the core is reset. The NMI asserts when the fault register is set.

[0111] Each core locally ORs in the trigger signal from the other core. The outputs of the two OR gates 301, 302 on the left are logically equivalent, but the gates are kept local to the core to minimise delay on the routing of the core's own fault trigger to its own fault register. A further two OR gates 303, 304 each OR the output of the respective one of the first two OR gates 301, 302 with the output of the respective fault register 305, 306. Outputs of the further OR gates 303, 304 are used to set the respective fault register 305, 306.
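The behaviour of the cross-core trigger logic can be illustrated with a short software model. The C sketch below is an assumption-laden illustration, not a description of the circuit itself: the type rcp_fault_model_t and the function names are hypothetical, and one call to rcp_fault_step stands in for one update of the two fault registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two sticky fault registers (305, 306). */
typedef struct {
    bool fault_reg[2];
} rcp_fault_model_t;

/* One update step: each core ORs its own trigger with the other core's
 * trigger (first pair of OR gates), then ORs in its current register
 * value (second pair of OR gates) so that a set register stays set. */
static void rcp_fault_step(rcp_fault_model_t *m, bool trigger0, bool trigger1) {
    bool any_trigger0 = trigger0 || trigger1;  /* gate local to core 0 */
    bool any_trigger1 = trigger1 || trigger0;  /* gate local to core 1 */
    m->fault_reg[0] = m->fault_reg[0] || any_trigger0;
    m->fault_reg[1] = m->fault_reg[1] || any_trigger1;
}

/* In this model the NMI for a core simply follows its fault register. */
static bool rcp_nmi_asserted(const rcp_fault_model_t *m, int core) {
    assert(core == 0 || core == 1);
    return m->fault_reg[core];
}
```

In this model a single trigger on either core sets both registers, and both remain set on later steps with no trigger, which corresponds to the NMI remaining asserted until reset.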

[0112] Stack Canary Values

Canaries are values written to the stack on function entry, and validated on function exit, to assure that:

* The exit matches the entry: when leaving through the back door, you must have entered through the front door

* The stack was not completely overwritten in the course of executing the function

[0113] This helps to mitigate two classes of attack:

* Fault injection: any physical fault condition which corrupts the program counter or causes a wild indirect branch is likely to cause the processor to execute a function epilogue which does not match the prologue. Any branch into the middle of a function is likely to eventually reach the epilogue.

* Return-oriented programming: deliberate stack corruption, for example by exploiting missing bounds checks on stack buffer operations, can redirect control flow through a sequence of function tails which perform arbitrary operations. The random canary values make it difficult to craft such a stack payload.

[0114] Return-oriented programming mitigation is particularly important for the bootrom because it exposes an API surface that is mapped at a known location at runtime (the bootrom is physically always mapped at 0x00000000) and therefore provides a well-known exploit surface in the same way as the C standard library.

[0115] The RCP's support for canary values is in the form of two instructions:

* rcp_canary_get generates a 32-bit value as a function of the salt register and an 8-bit tag

* rcp_canary_check validates a 32-bit value, and raises an RCP fault if the value does not match that produced by an rcp_canary_get for the same tag.

[0116] The 32-bit canary value is as follows:

* Bits 7:0 are all-zero

* Bits 15:8 are the XOR of bits 7:0 of the salt, with the AND of bits 31:24 of the salt and the 8-bit tag

* Bits 23:16 are the XOR of bits 15:8 of the salt, with the AND of bits 39:32 of the salt and the bitwise NOT of the 8-bit tag

* Bits 31:24 are the XOR of bits 23:16 of the salt with the 8-bit tag

[0117] This can equivalently be expressed as C code:

    #include <stdint.h>

    uint32_t canary_value(uint64_t salt, uint8_t tag) {
        /* Replicate the tag into three byte lanes, inverting the middle one.
           The (uint8_t) cast keeps ~tag to 8 bits after integer promotion. */
        uint32_t tag_expanded =
            (uint32_t)tag |
            ((uint32_t)(uint8_t)~tag << 8) |
            ((uint32_t)tag << 16);
        /* Mask the two low lanes with salt bits 39:24; keep the top lane. */
        tag_expanded &= (0xff0000u | ((salt >> 24) & 0x00ffffu));
        uint32_t result24 = tag_expanded ^ salt;
        /* The shift discards bits 31:24 and leaves bits 7:0 zero. */
        return result24 << 8;
    }

[0118] This canary value is chosen such that:

* Different tags are guaranteed to yield different canary values

* For any two different tags, each is a function of at least one salt bit that the other is not a function of (so it is difficult to calculate canaries for different tags even if one value is known)

* Null-terminated string operations on the stack terminate before reading or writing a canary

[0119] Desirably, each function uses a different canary tag, to prevent a stack frame for one function being used to return through another function's epilogue. It is also desirable to avoid using canary values for other purposes than stack canaries. Other functions to derive a canary value having the above properties can also be used.
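Two of the properties listed above can be checked exhaustively for a given salt. The following C sketch is a minimal illustration under stated assumptions: canary_value is written directly from the bit-level description in the text (with an explicit 8-bit cast on the inverted tag, which is an addition of this sketch), and check_canary_properties is a hypothetical helper that is not part of the RCP.

```c
#include <assert.h>
#include <stdint.h>

/* Canary function following the bit-level description in the text.
 * The (uint8_t) cast on ~tag is an addition in this sketch: it keeps the
 * inverted tag to 8 bits after C integer promotion. */
static uint32_t canary_value(uint64_t salt, uint8_t tag) {
    uint32_t tag_expanded =
        (uint32_t)tag |
        ((uint32_t)(uint8_t)~tag << 8) |
        ((uint32_t)tag << 16);
    tag_expanded &= (0xff0000u | (uint32_t)((salt >> 24) & 0x00ffffu));
    uint32_t result24 = tag_expanded ^ (uint32_t)salt;
    return result24 << 8;
}

/* Hypothetical helper: for one salt, checks that the low byte of every
 * canary is zero and that all 256 tags give distinct canaries. */
static void check_canary_properties(uint64_t salt) {
    uint8_t seen[256] = {0};
    for (unsigned tag = 0; tag < 256; ++tag) {
        uint32_t c = canary_value(salt, (uint8_t)tag);
        assert((c & 0xffu) == 0);          /* bits 7:0 are all-zero */
        uint8_t top = (uint8_t)(c >> 24);  /* salt bits 23:16 XOR tag */
        assert(!seen[top]);                /* the top byte alone is unique per tag */
        seen[top] = 1;
    }
}
```

Distinctness follows from the top byte alone, since XOR with a fixed salt byte is a one-to-one mapping of the tag. The zero low byte is what stops null-terminated string operations before they reach the rest of the canary when it is stored least significant byte first.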

[0120] The RP2350 bootrom uses 8-bit tags in the range 0x40 through 0xbf. The remaining tags are free for use by user code.

[0121] Pseudorandom Instruction Delays

By default, all RCP instructions execute with a pseudorandom delay in the range of 0 to 127 cycles. These delays make it more difficult for an outside observer to precisely time a fault injection event with respect to an RCP instruction, or the critical code path it protects.

[0122] Setting bit 12 of the first halfword of an instruction disables the pseudorandom delay for that instruction only. The instruction executes in a single cycle, assuming the Cortex-M33 does not insert stall cycles due to other microarchitectural constraints. To set this bit, assemble the *2 variant of any given coprocessor instruction, e.g. mrc2 rather than mrc. In the Non-secure state, RCP instructions always execute without delay.

[0123] The RCP implements instruction execution delays by stalling the coprocessor opcode interface 202 during the opcode phase (shown in the Figure 4 pipeline diagram). The Cortex-M33 may choose to abandon a stalled coprocessor instruction due to an interrupt. When this happens, the delay counter continues counting down, waiting for the delay period to elapse. If the Cortex-M33 issues another RCP instruction whilst the delay counter is still running (either in the interrupt, or after returning to the interrupted RCP instruction), this instruction executes once the existing countdown completes. However, if the delay counter of an abandoned instruction has already expired before the next RCP instruction executes, the next instruction samples a pseudorandom delay count, and begins a new countdown.

[0124] The pseudorandom delay sequence is a function of bits 63:40 of the salt value. As such the pattern of delays is unique per-boot, provided each boot writes a different 64-bit value to the salt register.

[0125] The pseudorandom number generator (PRNG) used for delays implements a number of small linear feedback shift registers (LFSRs) in bits 63:40 of the salt register, and returns a nonlinear function of the 24-bit state. The LFSR feedback functions on the 24-bit state are:

* Bits 23:20: 4-bit LFSR with taps 0xc

* Bits 19:15: 5-bit LFSR with taps 0x14

* Bits 14:8: 7-bit LFSR with taps 0x60

* Bits 7:0: 8-bit LFSR with taps 0xb4

[0126] The LFSRs are implemented by shifting the XOR reduction of (state AND taps) into the LSB with each state update. When an LFSR's state is all-zeroes, a one bit is shifted into the LSB. The LFSR state advances each time a random number is generated: this happens when executing an instruction with a pseudorandom delay, or when executing a rcp_random_byte instruction.
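The LFSR update rule described above can be sketched in C as follows. This is an illustrative model only: the function names are hypothetical, and it is an assumption of this sketch that all four registers advance together on each state update. Only the state update is modelled; the nonlinear output function is not.

```c
#include <assert.h>
#include <stdint.h>

/* XOR reduction (parity) of a word: 1 if an odd number of bits are set. */
static unsigned xor_reduce(uint32_t x) {
    unsigned p = 0;
    while (x) {
        p ^= (unsigned)(x & 1u);
        x >>= 1;
    }
    return p;
}

/* One update of a 'width'-bit LFSR: shift left by one and shift the XOR
 * reduction of (state AND taps) into the LSB. An all-zero state shifts in
 * a one bit instead, so the register cannot remain stuck at zero. */
static uint32_t lfsr_step(uint32_t state, uint32_t taps, unsigned width) {
    assert(width >= 1 && width < 32);
    uint32_t mask = (1u << width) - 1u;
    unsigned in = (state == 0) ? 1u : xor_reduce(state & taps);
    return ((state << 1) | in) & mask;
}

/* Advances all four LFSRs packed into a 24-bit state word, using the
 * field positions and taps listed in the text. Advancing them together
 * is an assumption of this sketch. */
static uint32_t prng_state_step(uint32_t s24) {
    uint32_t a = lfsr_step((s24 >> 20) & 0xfu,  0xcu,  4);  /* bits 23:20 */
    uint32_t b = lfsr_step((s24 >> 15) & 0x1fu, 0x14u, 5);  /* bits 19:15 */
    uint32_t c = lfsr_step((s24 >> 8)  & 0x7fu, 0x60u, 7);  /* bits 14:8  */
    uint32_t d = lfsr_step(s24 & 0xffu,         0xb4u, 8);  /* bits 7:0   */
    return (a << 20) | (b << 15) | (c << 8) | d;
}
```

For example, under this model the 4-bit register with taps 0xc, started from state 1, returns to state 1 after 15 updates, and an all-zero 24-bit state advances to a state in which each of the four registers holds the value 1.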

[0127] Each bit of the pseudorandom output is the XOR of six bits of the 24-bit state, XORed with the majority-3 vote of three other bits of the state:

Output Bit XOR Taps Majority-3 Taps

7 7 17 6 16 13 8 9 12 21 6 4 21 19 6 16 13 4 14 6 7 5 2 18 11 1 18 14 7 4 4 19 17 0 18 7 18 II 3 3 2 12 7 16 14 5 17 3 15 2 1 13 20 21 8 12 7 22 9 1 4 16 11 18 9 6 14 21 16 0 1 3 4 19 10 14 1 2 9

[0128] Bits 6:0 of this function are used for pseudorandom instruction delays, producing delays in the range of 0 to 127 cycles. The delay is in addition to the one-cycle base cost of executing a coprocessor instruction. The full 8-bit result is available through the rcp_random_byte instruction.

[0129] This is a simple pseudorandom number generator which makes it difficult to recover the initial 24-bit state from a small number of observations, by making the observation size much smaller than the state size, and using a nonlinear combination function for the output. It has a number of statistical aberrations which make it unsuitable for general random number generation, not to mention its small state size. For high-quality random number generation, either use the system true-random number generator (TRNG) directly, or use a high-quality software PRNG with a large state that has been seeded from the TRNG.

[0130] Note that the 24 MSBs of the salt value used to seed the delay PRNG do not overlap with the 40 LSBs used to generate stack canary values. Therefore, measuring the random delays externally provides no information on the canary values.

[0131] Instruction Listing

The Cortex-M33 processors access the RCP using mcr, mcrr, mrc and cdp instructions. The Armv8-M Architecture Reference Manual describes the intricacies of these instructions in relation to the processor's architectural state, but from the coprocessor's point of view:

* mcr writes a 32-bit value to the coprocessor, from a single Arm integer register

* mcrr writes a 64-bit value to the coprocessor, from a pair of Arm integer registers

* mrc reads a 32-bit value from the coprocessor, writing to either a single Arm integer register or to the processor status flags

* cdp performs some internal coprocessor operation without exchanging data with the processor

[0132] For each mcr, mcrr, mrc and cdp instruction, the RCP also accepts the matching mcr2, mcrr2, mrc2 and cdp2 opcode variant. These differ only in bit 12 of the opcode. The plain versions have a pseudorandom delay of up to 127 cycles on their execution, whereas the 2-suffixed versions have no such delay.

[0133] Most RCP instructions are in the form of hardware-checked assertions. The phrase "assert that" in the following instruction listings means: if some asserted condition is not true, raise an RCP fault.

[0134] Initialisation

[0135] rcp_salt_core0

Assert that the core 0 salt register is currently invalid. Write a 64-bit value, and mark it as valid.

Opcode: mcrr p7, #8, Rt, Rt2, c0

Rt is the 32 LSBs of the salt, Rt2 is the 32 MSBs.

[0136] rcp_salt_core1

Assert that the core 1 salt register is currently invalid. Write a 64-bit value, and mark it as valid.

Opcode: mcrr p7, #8, Rt, Rt2, c1

[0137] rcp_canary_status

Return true/false bit pattern (0xa500a500 or 0x00c300c3 respectively) for whether the salt register for this core has been initialised.

Opcode: mrc p7, #1, Rt, c0, c0, #0

Invoking with Rt = 0xf will set the Arm N and C flags if and only if the salt register is valid. If the salt has not been initialised then any operation other than initialising the salt or checking the canary status triggers an RCP fault.

This opcode is used on core 0 to skip the RCP initialisation sequence if the bootrom has been re-entered without reset under debugger control, and on core 1 to wait for its RCP salt to be initialised.

[0138] Canary

[0139] rcp_canary_get

Get a 32-bit canary value, as a function of the salt register and the 8-bit tag encoded by two 4-bit coprocessor register numbers CRn and CRm. CRn contains the four MSBs, and CRm the LSBs.

Opcode: mrc p7, #0, Rt, CRn, CRm, #1

[0140] The 32-bit value returned by this instruction is described above (Stack Canary Values), but in general this should be treated as an opaque value to be consumed by rcp_canary_check.

[0141] rcp_canary_check

Assert that a value matches the result of an rcp_canary_get with the same 8-bit tag. The tag is encoded by two 4-bit coprocessor register numbers CRn and CRm. CRn contains the four MSBs, and CRm the LSBs.

Opcode: mcr p7, #0, Rt, CRn, CRm, #1

[0142] Boolean Validation

The RCP defines 0xa500a500 as the true value for 32-bit booleans, and 0x00c300c3 as the false value. All other bit patterns are poison, and trigger an RCP fault when consumed by any RCP boolean instructions. These values are chosen as they are valid immediates in Armv8-M Main.

[0143] This provides limited runtime type checking that boolean values are used in boolean contexts. The RP2350 bootrom occasionally uses redundant operations to generate booleans in a way that results in an invalid bit pattern if the two redundant operations did not return the same value, such as when checking boot flags in OTP.
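The idea of a redundant boolean can be illustrated in C. The sketch below is hypothetical throughout: the names are illustrative, and make_redundant_bool shows only one possible way of combining two evaluations of a condition so that a disagreement yields a poison pattern. It is not asserted to be the construction used in the RP2350 bootrom.

```c
#include <assert.h>
#include <stdint.h>

#define RCP_TRUE  0xa500a500u
#define RCP_FALSE 0x00c300c3u

typedef enum { RCP_BOOL_FALSE, RCP_BOOL_TRUE, RCP_BOOL_POISON } rcp_bool_class_t;

/* Classifies a 32-bit word: anything other than the two defined
 * patterns is poison. */
static rcp_bool_class_t rcp_classify_bool(uint32_t v) {
    if (v == RCP_TRUE)  return RCP_BOOL_TRUE;
    if (v == RCP_FALSE) return RCP_BOOL_FALSE;
    return RCP_BOOL_POISON;
}

/* Hypothetical redundant construction: each halfword of the result comes
 * from a separate evaluation of the same condition, so two evaluations
 * that disagree produce a word that is neither true nor false. */
static uint32_t make_redundant_bool(int first_check, int second_check) {
    uint32_t hi = first_check  ? 0xa5000000u : 0x00c30000u;
    uint32_t lo = second_check ? 0x0000a500u : 0x000000c3u;
    return hi | lo;
}

/* Example: agreeing checks give a valid boolean, disagreeing checks do not. */
static void bool_example(void) {
    assert(rcp_classify_bool(make_redundant_bool(1, 1)) == RCP_BOOL_TRUE);
    assert(rcp_classify_bool(make_redundant_bool(0, 0)) == RCP_BOOL_FALSE);
    assert(rcp_classify_bool(make_redundant_bool(1, 0)) == RCP_BOOL_POISON);
}
```

In hardware, a word classified here as poison would raise an RCP fault when consumed by a boolean instruction, rather than being silently treated as true or false.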

[0144] rcp_bvalid

Assert that Rt is a valid boolean (0xa500a500 or 0x00c300c3).

Opcode: mcr p7, #1, Rt, c0, c0,

[0145] rcp_btrue

Assert Rt is true (0xa500a500).

Opcode: mcr p7, #2, Rt, c0, c0, #0

[0146] rcp_bfalse

Assert Rt is false (0x00c300c3).

Opcode: mcr p7, #3, Rt, c0, c0, #1

[0147] rcp_b2valid

Assert Rt and Rt2 are both valid booleans.

Opcode: mcrr p7, #0, Rt, Rt2, c8

[0148] rcp_b2and

Assert Rt and Rt2 are both true.

Opcode: mcrr p7, #1, Rt, Rt2, c0

[0149] rcp_b2or

Assert both Rt and Rt2 are valid, and at least one is true.

Opcode: mcrr p7, #2, Rt, Rt2, c0

[0150] rcp_bxorvalid

Assert Rt XOR Rt2 is a valid boolean. The XOR mask is generally a fixed bit pattern used to validate the origin of the boolean, such as a return value from a critical function.

Opcode: mcrr p7, #3, Rt, Rt2, c8

[0151] rcp_bxortrue

Assert Rt XOR Rt2 is true.

Opcode: mcrr p7, #4, Rt, Rt2, c0

[0152] rcp_bxorfalse

Assert Rt XOR Rt2 is false.

Opcode: mcrr p7, #5, Rt, Rt2, c8

[0153] Integer Validation

[0154] rcp_ivalid

Assert Rt XOR Rt2 is equal to 0x96009600. This is used to validate 32-bit integers stored redundantly in two memory words. The XOR difference provides assurance that two parallel chains of integer operations have not mixed.

Opcode: mcrr p7, #6, Rt, Rt2, c8

[0155] rcp_iequal

Assert Rt is equal to Rt2. Useful for general software assertions that are worth checking in hardware.

Opcode: mcrr p7, #7, Rt, Rt2, c0

[0156] Random

[0157] rcp_random_byte

Return a random 8-bit value generated from the upper 24 bits of the 64-bit salt value. Bits 31:8 of the result are all-zero.

Opcode: mrc p7, #2, Rt, c0, c0, #0

[0158] This is the same PRNG used for random delay values. It is mainly exposed for debugging purposes, and should not be used for general software RNG purposes, because the 24-bit state space is inadequate for scenarios where the quality and predictability of the random numbers is important.

[0159] This instruction never has an execution delay. Once the Cortex-M33 issues the coprocessor access, it always completes in one cycle.

[0160] Sequence Count Checking

These instructions are used to assert that a sequence of operations happens in the correct order. The count is initialised to an 8-bit value based on tag 204 at the beginning of such a sequence, and then repeatedly checked 207, incrementing 211 with each check. If the 8-bit check value does not match the current counter 213 value, the coprocessor raises an RCP fault. Multiplexor 212 supplies either the initial value or the incremented value to sequence counter 213.
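The sequence counter behaviour can be modelled with a few lines of C. The sketch below is illustrative: the type rcp_seq_model_t and the function names are hypothetical, and it is an assumption of the model that the counter still increments after a mismatch (which is immaterial once a fault has been raised). The 8-bit counter wraps from 0xff to 0x00.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 8-bit sequence counter and a fault flag. */
typedef struct {
    uint8_t count;
    bool fault;
} rcp_seq_model_t;

/* Models the count-set instruction: load the counter from the 8-bit tag. */
static void seq_count_set(rcp_seq_model_t *m, uint8_t tag) {
    m->count = tag;
}

/* Models the count-check instruction: compare, flag a fault on mismatch,
 * then increment (the uint8_t arithmetic wraps from 0xff to 0x00). */
static void seq_count_check(rcp_seq_model_t *m, uint8_t tag) {
    if (tag != m->count) {
        m->fault = true;
    }
    m->count = (uint8_t)(m->count + 1u);
}

/* Example: three ordered steps starting at 0x10, then a skipped step. */
static void seq_example(void) {
    rcp_seq_model_t m = {0, false};
    seq_count_set(&m, 0x10);
    seq_count_check(&m, 0x10);
    seq_count_check(&m, 0x11);
    seq_count_check(&m, 0x12);
    assert(!m.fault);           /* the steps ran in order */
    seq_count_check(&m, 0x15);  /* steps 0x13 and 0x14 were skipped */
    assert(m.fault);
}
```

Because each check carries its expected position as an immediate, skipping or repeating a step (for example as a result of a glitched branch) leaves the counter out of step with the next check.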

[0161] rcp_count_set

Write an 8-bit count value to the RCP sequence counter. The 8-bit value is encoded by two 4-bit coprocessor numbers: CRn provides the MSBs, and CRm the LSBs.

Opcode: mcr p7, #4, r0, CRn, CRm, #0

[0162] rcp_count_check

Assert that an 8-bit count value matches the current value of the RCP sequence counter. Increment the counter by one, wrapping back to 0x00 after reaching 0xff. The 8-bit count value is encoded by two 4-bit coprocessor numbers: CRn provides the MSBs, and CRm the LSBs.

Opcode: mcr p7, #5, r0, CRn, CRm,

[0163] Panic

[0164] rcp_panic

Stall the coprocessor port indefinitely. If the processor abandons the coprocessor access, assert NMI and continue stalling the coprocessor port. Also, immediately raise an RCP fault on other cores.

Opcode: cdp p7, #0, c0, c0, c0, #1

[0165] Software executes an rcp_panic instruction when it detects a condition that makes it unsafe to continue executing the current program. The RCP responds by stalling the processor's CDP access forever, which should cause the processor to stop fetching and executing instructions.

[0166] The processor is allowed to abandon a stalled coprocessor instruction when interrupted, which may cause it to continue executing in an unsafe state. The RCP responds to an abandoned transfer by asserting the non-maskable interrupt, preempting the interrupt handler that caused the coprocessor access to be abandoned. This should swiftly encounter another RCP instruction and once again stall the processor, this time without allowing interruption.

[0167] Panic is specified in this way, rather than simply gating the processor clock, so that the debugger can still attach cleanly to the processor after a panic.

[0168] Conclusion

Logic modules and components of the present invention can be incorporated in a variety of other devices, such as IO modules, interfaces, single board computers, micro-controller devices, etc. and are particularly useful in portable devices such as smart phones due to their low power consumption. Logic modules of the invention can be embodied in separate integrated circuits or incorporated in other devices, such as System on Chip devices. Security coprocessors embodying the principles of the present invention require little silicon real estate, and so can be included in a die with other modules.

[0169] The methods of the present invention may be performed by computer systems comprising one or more computers. A computer used to implement the invention may comprise one or more processors, including general purpose CPUs, graphical processing units (GPUs), tensor processing units (TPUs) or other specialised processors. A computer used to implement the invention may be physical or virtual. A computer used to implement the invention may be a server, a client or a workstation. Multiple computers used to implement the invention may be distributed and interconnected via a network such as a local area network (LAN) or wide area network (WAN). Individual steps of the method may be carried out by a computer system but not necessarily the same computer system. Results of a method of the invention may be displayed to a user or stored in any suitable storage medium. The present invention may be embodied in a non-transitory computer-readable storage medium that stores instructions to carry out a method of the invention. Any suitable programming language may be used to implement the invention. The present invention may be embodied in a computer system comprising one or more processors and memory or storage storing instructions to carry out a method of the invention.

[0170] Having described the invention it will be appreciated that variations may be made on the above described embodiments, which are not intended to be limiting. The invention is defined in the appended claims and their equivalents.

Claims

1. A security coprocessor for use with a processor having a stack, the coprocessor configured to: receive a tag value from the processor; generate a canary value based on the tag value and a salt; and return the canary value to the processor for incorporation in the stack.

2. A security coprocessor according to claim 1 wherein the canary value is generated as logical combination of bits of the tag and bits of the salt such that (a) different tags are guaranteed to yield different canary values and/or (b) for any two different tags, each is a function of at least one salt bit that the other is not a function of.

3. A security coprocessor according to claim 1 or 2 wherein the canary value has a predetermined number of least significant bits set to 0.

4. A security coprocessor according to claim 1, 2 or 3 further configured to: wait for a salt based on a random number at power on or reset.

5. A security coprocessor according to any preceding claim further configured to: receive a canary value read from the stack and a tag value; check that the canary value matches the tag value using the cryptographic function.

6. A security coprocessor according to claim 5 further configured to: issue a panic instruction if the canary value does not match the tag value.

7. A security coprocessor for use with a processor, the coprocessor configured to: receive a redundant value from the processor; validate the redundant value; and issue a panic instruction if the redundant value is not valid.

8. A security coprocessor according to claim 7 wherein the redundant value is a redundant integer or a redundant Boolean.

9. A security coprocessor according to claim 6, 7 or 8 wherein the panic instruction does at least one of:
a. stalling the coprocessor port of the processor;
b. asserting a non-maskable interrupt; and
c. halting the processor.

10. A security coprocessor for use with a processor, the coprocessor configured to: in response to a write counter instruction from the processor, initiate a counter inside the security coprocessor; and in response to a check counter instruction from the processor, the check counter instruction including a count value, check that the count value matches a current value of the counter and increment the counter.

11. A security coprocessor according to any preceding claim configured to delay by a random number of cycles before responding to an instruction.

12. A system on chip comprising a processor and a security coprocessor according to any preceding claim.

13. A system on chip according to claim 12 wherein the processor is a reduced instruction set computer.

14. A system on chip according to claim 12 or 13 wherein there are a plurality of processors and a security coprocessor for each processor.

15. A system on chip according to claim 14 wherein each security coprocessor is configured to stall if another security coprocessor stalls.