CN119760794A - Instruction prefix encoding for cryptographic capability calculation data types - Google Patents
- Publication number: CN119760794A (application CN202311850381.8A)
- Authority
- CN
- China
- Legal status: Pending
Abstract
The present application discloses instruction prefix encoding for cryptographic capability computing data types. Techniques for instruction prefix encoding for cryptographic capability computing data types are described. In an embodiment, an apparatus includes: an instruction decoder for decoding a first instruction including a first prefix; and cryptographic circuitry for performing a cryptographic operation on data, the cryptographic operation being based at least in part on the first prefix and a relative enumeration in a pointer to the data.
Description
Statement regarding federally sponsored research and development
The invention was made with government support under Agreement No. N66001-23-9-4004, awarded by the Naval Information Warfare Center Pacific and sponsored by the Defense Advanced Research Projects Agency. The government has certain rights in this invention.
Technical Field
The present disclosure relates to instruction prefix encoding for cryptographic capability computing data types.
Background
Computers and other information handling systems may store confidential, proprietary, and secret information in their memory. Software may have vulnerabilities that can be exploited to steal such information. Hardware may also have exploitable vulnerabilities, and/or an adversary may physically modify the system to steal information. Memory safety and security are therefore important issues in computer system architecture and design.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a processor device for instruction prefix encoding, comprising an instruction decoder to decode a first instruction comprising a first prefix, and cryptographic circuitry to perform a cryptographic operation on data, the cryptographic operation based at least in part on the first prefix and relative enumeration in a pointer to the data.
According to a second aspect of the present disclosure, there is provided a method for instruction prefix encoding, comprising decoding, by an instruction decoder of a processing device, a first instruction comprising a first prefix, and performing, by cryptographic circuitry of the processing device, a cryptographic operation on data, the cryptographic operation based at least in part on the first prefix and relative enumeration in a pointer to the data.
According to a third aspect of the present disclosure, there is provided a non-transitory machine-readable medium storing at least one instruction that, when executed by a machine, causes the machine to perform a method for instruction prefix encoding, the method comprising decoding, by an instruction decoder of a processing device, a first instruction comprising a first prefix, and performing, by cryptographic circuitry of the processing device, a cryptographic operation on data, the cryptographic operation based at least in part on the first prefix and relative enumeration in a pointer to the data.
Drawings
Various examples according to the present disclosure will be described with reference to the accompanying drawings, in which:
FIG. 1 illustrates a computing system for instruction prefix encoding according to an embodiment.
Fig. 2A, 2B, 2C, 2D, 2E, and 2F illustrate a method for instruction prefix encoding according to an embodiment.
Fig. 3 illustrates an encryption/decryption example that may include a location-dependent pointer according to an embodiment.
FIG. 4 illustrates an example computing system.
Fig. 5 illustrates a block diagram of an example processor and/or system on a chip (SoC) that may have one or more cores and an integrated memory controller.
FIG. 6A is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to an example.
FIG. 6B is a block diagram illustrating an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to an example.
Fig. 7 illustrates examples of execution unit circuitry.
FIG. 8 is a block diagram illustrating the use of a software instruction converter to convert binary instructions in a source instruction set architecture to binary instructions in a target instruction set architecture according to an example.
Detailed Description
The present disclosure relates to methods, apparatus, systems, and non-transitory computer-readable storage media for instruction prefix encoding for cryptographic capability computing data types. According to some examples, an apparatus includes an instruction decoder to decode a first instruction including a first prefix, and cryptographic circuitry to perform a cryptographic operation on data accessed by the first instruction, the cryptographic operation based at least in part on the first prefix and a relative enumeration in a pointer to the data.
As mentioned in the background section, memory safety and security are important issues in computer system architecture and design.
Examples of computer memory safety violations include overflowing a buffer so that adjacent memory locations are overwritten, reading beyond the end of a buffer, and creating a dangling pointer that does not resolve to a valid destination (e.g., use-after-free). Some approaches to protecting memory from such attacks tag data with metadata that encodes information about ownership, memory size, location, type, version, etc. However, particularly for embodiments with fine-grained tags, these approaches may require additional storage and/or instructions and may negatively impact performance. Embodiments according to this description may provide approaches that use less storage (e.g., no or less metadata) and/or have less impact on performance, even when protecting data at fine granularity. Additionally, the use of cryptography defends against physical and hardware adversaries, such as an interposer on a memory bus or a hardware Trojan.
For example, embodiments may use fine-grained cryptography to isolate objects and their individual member variables. In an embodiment, encryption of a variable may be associated with an object instance and an instruction prefix enumeration of the object's members (e.g., each member). By uniquely encrypting the object and its member variables, protection can be maintained even in the presence of vulnerabilities, such as an overflow from one variable into another.
As may be described and/or illustrated below and/or in the accompanying figures, embodiments may use instruction prefix enumeration and cryptographic addressing for the individual members of an object. For example, based on the instruction prefix and the memory address, a cryptographic tweak per object instance and per object member variable may be used to provide cryptographic differentiation between objects, and between members within objects, without additional instructions, tags, or isolated memory for metadata.
Descriptions of embodiments based on instruction prefix enumeration, relative enumeration in pointers, cryptographic capability computing (C3), etc. are provided as examples. Embodiments may include and/or relate to other memory safety technologies.
Fig. 1 illustrates an apparatus (e.g., computing system) 100 for instruction prefix encoding according to an embodiment. Apparatus 100 may correspond to a computer system, such as multiprocessor system 400 in fig. 4.
The apparatus 100 is shown in fig. 1 as including a processor 110 and a memory 150, each of which may represent any number of corresponding components (e.g., multiple processors and/or processor cores, multiple Dynamic Random Access Memories (DRAMs), etc.).
For example, processor 110 may represent all or part of one or more hardware components, including one or more processors, processor cores, or execution cores, integrated on a single substrate or packaged within a single package, each of which may include multiple execution threads and/or multiple execution cores in any combination. Each processor represented as or within processor 110 may be any type of processor, including a general purpose microprocessor, a special purpose processor, or a microcontroller from any processor family or company, or any other device or component in an information handling system in which an embodiment may be implemented. Processor 110 may be constructed and designed to operate according to any instruction set architecture (ISA), with or without microcode control. For convenience and/or illustration, some features (e.g., instructions, etc.) may be referred to by names associated with a particular processor architecture (e.g., 64-bit and/or IA-32 architectures), but embodiments are not limited to those features, names, architectures, etc.
The processor 110 may be implemented in circuitry, gates, logic, structures, hardware, etc., all or portions of which may be included in discrete components and/or circuitry of a processing device or any other means integrated into a computer or other information handling system. For example, processor 110 in FIG. 1 may correspond to and/or be implemented/included in any of processors 470, 480, or 415 in FIG. 4, processor 500 or one of cores 502A-502N in FIG. 5, and/or core 690 in FIG. 6B, each of which is described below.
Memory 150 may represent one or more DRAMs and/or other memory components that provide system memory or other memory or storage within or for apparatus 100. Memory 150 may contain one or more memory objects 152. Any such memory object may represent an object, region, structure, segment, stack frame, etc. in memory, along with associated program code or instructions, with which the use of instruction prefix encodings according to an embodiment may be associated; any of these may be referred to as an object for convenience.
As shown, processor 110 includes an instruction unit 120, an execution unit 140, and a secure memory access unit 130 (which may be included within execution unit 140). Processor 110 may include any number of each of these elements (e.g., a plurality of execution units) and/or any other elements not shown in fig. 1.
Instruction unit 120 may correspond to front-end unit 630 in fig. 6B and/or be implemented/included in front-end unit 630, as described below, and/or may include any circuitry, gates, logic, structure, hardware, etc., such as an instruction decoder, to fetch, receive, decode, interpret, schedule, and/or process instructions to be executed by processor 110, such as data move instruction (e.g., mov, add, sub, etc.) 122, address calculate instruction (e.g., load effective address or lea) 124, etc. In fig. 1, instructions that may be decoded or otherwise processed by instruction unit 120 are represented as blocks with dashed boundaries, because these instructions are not hardware themselves, but rather instruction unit 120 may include hardware or logic capable of decoding or otherwise processing these instructions.
Any instruction format may be used in embodiments; for example, an instruction may include an opcode and one or more operands, where the opcode may be decoded into one or more micro-instructions or micro-operations for execution by execution unit 140. An operand or other parameter may be associated with an instruction implicitly, directly, indirectly, or in any other manner. A prefix may be added to an instruction to modify instruction behavior, and the prefix may also carry additional prefix data. Prefixes have the added benefit that some prefix values may be ignored by existing processors, so that a binary compiled with new prefixed instruction functionality built for a new processor remains compatible with existing processors, which simply ignore the prefix.
Execution unit 140 may correspond to and/or be implemented/included in execution engine 650 in fig. 6B and/or execution unit circuitry 662 in fig. 6B and 7, each of which is described below, and/or execution unit 140 includes any circuitry, gates, logic, structure, hardware, etc., such as arithmetic units, logic units, floating point units, shifters, load/store units, etc., to process data and execute instructions (including those shown in fig. 1), micro-instructions, and/or micro-operations. Execution units 140 may represent any one or more physically or logically distinct execution units.
Secure memory access unit 130 may represent and/or include any circuitry, gates, logic, structures, hardware, etc., such as cryptographic units, arithmetic units, logic units, load/store units, etc., to control, perform, participate in encryption, decryption, encoding, decoding, etc., of objects and their members described in this specification. The secure memory access unit 130 may include a cryptographic unit 132, which cryptographic unit 132 may represent and/or include any circuitry, gates, logic, structures, hardware, etc., e.g., to control, perform, participate in encryption, decryption, encoding, decoding, etc., of the objects and their members described in this specification.
The secure memory access unit, in conjunction with the processor's address generation unit, may decrypt a partially encrypted pointer before the generated linear/virtual address is translated to a physical address. The secure memory access unit may use the encrypted pointer, combined with a secret key, as a tweak or keystream generator to encrypt data stored to memory/cache or decrypt data loaded from memory/cache. The secure memory access unit may also create an integrity check value associated with encrypted data on a store, and verify that integrity check value for the encrypted data when it is loaded from memory/cache.
Although shown as separate units in fig. 1, in various embodiments secure memory access unit 130 and/or cryptographic unit 132 may be partially or wholly integrated or included in another unit of processor 110, such as instruction unit 120 and/or execution unit 140.
For example, embodiments may use a combination of instruction prefix enumeration and relative enumeration fields in pointers to provide cryptographic protection for individual member variables, data elements, etc. (any of which may be referred to as members) of an object, data structure, etc. (any of which may be referred to as an object) to provide fine-tuning of encryption of the members. In an embodiment, the relative enumeration field in the pointer may be combined with version fields, location fields, etc. in the pointer to provide fine-tuning encryption for various instances of the encrypted object. In an embodiment, encryption of members may be separated from encryption of object instances to allow operation of unstructured memory copies (e.g., memcpy) and other generic functions that are not aware of the internal composition of the individual memory allocations/objects.
Embodiments may be described by way of example using a sample of C source code and its corresponding x86 assembly code.
Embodiments may include and/or be implemented according to two concepts. The first concept is to flatten the structure and enumerate its variables, as shown in the example in FIG. 2A. The second concept is to provide a relative enumeration field in the pointer (e.g., linear/virtual address), as shown in the example in FIG. 2B.
A prefix enumeration for an individual instruction may then be added to the relative enumeration in the pointer to generate a value that tweaks the encryption of the member, as shown in the example in FIG. 2C.
Referring to FIG. 2C, the compiler adds an enumeration prefix to instructions that access the named SStrings structure, so that there is a direct association between an instruction and the data members of the composite structure. The prefix is then used as a tweak for data encryption on store instructions, or data decryption on load instructions, associated with the respective memory data accesses. In this case, the instructions operate on the first byte of each of the strings: the first mov sets the first byte of the first string to the value 0 using the first prefix value as a tweak, and the second mov sets the first byte of the second string to the value 0 using the second prefix value as a tweak. Because the prefix value of each mov instruction is different, the resulting 0s are encrypted with different tweaks, producing different ciphertexts. And because the tweaks of the two string members differ, an overflow from one member into the other will produce different ciphertext, preventing a write adversary from leaking or deterministically controlling adjacent and/or non-adjacent members of the same data structure/object, or members across different instances of data structures/objects.
Referring now to the pointer relative enumeration manipulation example shown in FIG. 2D: to allow library functions that operate across functions and on more primitive data types to manipulate structures, the relative enumeration field of the pointer allows the compiler to specify a relative offset using a prefix on the LEA instruction. The LEA operation thus provides an address to the associated substructure with the correct relative enumeration. For example, the LEA may add the prefix value to the relative enumeration field of the input memory address in the input register, producing a correctly formatted/encoded/encrypted address with an updated enumeration in the output register. A first LEA instruction with prefix value 0 operates on the address of the first nested SString structure in the SStrings structure, while a second LEA instruction with prefix value 2 operates on the address of the second nested SString structure. The first LEA instruction, with prefix value 0, does not increment the relative enumeration field of the address, because the first SString is at the beginning of the nested SStrings structure. The second LEA instruction, with prefix value 2, adds 2 to the relative enumeration field of the address, because the second SString structure is two member enumerations away from the first SString structure in the nested SStrings structure.
Then, referring to the example shown in FIG. 2E, an access function designed for a function that only understands a nested structure, or a more primitive substructure of the object, may combine the relative enumeration pointer field with an instruction prefix, adding the relative enumeration to the enumeration the compiler specified in the prefix. For example, the SStringCompare function understands the more primitive SString data type rather than the compound SStrings data type; however, the source and destination addresses passed into the function carry the correct relative enumeration for each SString substructure nested within the SStrings data structure. The prefix value added to each mov instruction is then used to properly access the byte array members of the SString structures. The prefix of the first mov instruction is added to the relative enumeration field of the address, producing the correct enumeration of the first SString byte array member, with a value of 1, so the processor will decrypt the value of the first byte array with the correct tweak. The second mov, to the byte array of the destination SString structure, adds the prefix value 1 to the relative enumeration field of the destination address, producing the correct enumeration value of 3 for the second byte array member of the SStrings data structure. The processor will use this value 3 as the tweak for data encryption of the destination byte array.
Some embodiments may isolate individual array elements. For example, a different relative enumeration value or prefix value may be used to address each array element. This can also be applied in a nested manner. For example, in a structure array containing three fields per structure, a relative enumeration or prefix value of 3 may be added each time the pointer needs to advance to the next structure entry in the array. For example, if one or more fields in the structure are themselves arrays or nested structures, additional enumerations or prefix value offsets may be added to these fields.
Fig. 2F illustrates a method 200 for instruction prefix encoding according to an embodiment. Method 200 may be performed by and/or in connection with the operation of an apparatus, such as apparatus 100 in fig. 1, and/or in connection with the compilation of code to be performed by an apparatus, such as apparatus 100, and thus, all or any portion of the foregoing description of apparatus 100 may be applicable to method 200.
At 210, one or more structures are flattened, and one or more variables of the structure(s) are enumerated, for example by a compiler, as shown in the example in FIG. 2A. At 220, a relative enumeration field is provided in a pointer (e.g., linear address), as shown in the example in FIG. 2B.
At 230, the prefix enumeration of a given instruction is added to the relative enumeration in the pointer, e.g., by a compiler, as shown in the example in FIG. 2C, to generate an encryption tweak value. At 232, the generated value is used as a tweak for encryption of the member.
At 240, a relative offset may be specified using the relative enumeration field of the pointer, e.g., by a compiler using a prefix on an LEA instruction, as shown in the example in FIG. 2D. At 242, the LEA operation(s) calculate and/or provide the address of the associated substructure with the correct relative enumeration.
At 250, an access function uses the relative enumeration pointer field together with an instruction prefix, as shown in the example in FIG. 2E.
In the processor pipeline, the data object may be encrypted based on the version field (or location) to provide instance-specific cryptographic distinction, and then based on the enumeration to provide per-member-variable cryptographic distinction of the data. Thus, a function (e.g., memcpy) may decrypt a source instance using the version field as a tweak and re-encrypt the same object using a different version in the destination instance, without knowing the enumeration of the individual members of the data object.
The pointer itself may also be partially encrypted or authenticated, first using the relative enumeration and then the version/location, to prevent pointer forgery and deterministic manipulation of the fields by an adversary. Encrypting the fields prevents an adversary from deterministically predicting their values. In an embodiment, the fields may be combined with pointer authentication to provide further protection.
The detailed encoding of the prefix may take various forms. For example, existing prefix bytes (such as those used for segment override selection or repeated data accesses) may be redefined to number fields and array elements. However, few existing prefix bytes are available for redefinition in this manner. Larger field or array element numbers may be specified using larger prefix encodings, such as REX, REX2, VEX, EVEX, and extended EVEX. Fields within such prefix encodings may be redefined, although this may interfere with their previous uses, such as addressing extended register files. To avoid interfering with those uses, additional byte(s) may be appended to these extended prefixes to accommodate the numbering of fields and array elements.
Fig. 3 illustrates an encryption/decryption example that may include a location-dependent pointer. In FIG. 3, an encoded pointer 310 is provided to an address cryptographic unit 302, which generates a data/code tweak 317 and decodes a linear address 312 based on the encoded pointer 310. The data/code key 316 and the data/code tweak 317 are provided to a block cipher 372 in a cryptographic computing engine 370 to generate a keystream 376 for a logic function 374 in the cryptographic computing engine 370, which produces decrypted data/code 322 by decrypting encrypted data/code 324.
In embodiments, the cryptographic computing engine may instead diffuse the data directly using a block cipher, or use a combination of a block cipher that diffuses the data using the enumeration value as a tweak and a stream cipher that encrypts the data using the version/location and a secret key stored in a processor register.
In an embodiment, a cryptographic computing engine, such as Advanced Encryption Standard (AES) Galois Counter Mode (GCM) or Ascon, may provide authenticated encryption, allowing an integrity value (e.g., a message authentication code or MAC) to be stored with the data granule. Using the integrity data, corruption may be detected when an incorrect key or tweak is used in an attempt to access the data. Pointer encryption may use the K-cipher, a tweakable BipBip cipher, or a similar cipher, or may use a pointer authentication code, to encrypt or authenticate the pointer using a secret key stored in a processor register.
In an embodiment, the data (and/or pointer) encryption key may be configured in registers accessible to privileged software and may be switched when switching a process, virtual machine to a kernel, or the like.
Techniques according to embodiments may also be used for memory tagging: e.g., the pointer tag may provide the relative enumeration, the instruction prefix may be added to the relative enumeration, and the resulting value may be compared to the stored memory tag for a memory granule (e.g., 8 or 16 bytes).
Example apparatus, methods, and the like
According to some examples, an apparatus (e.g., a processing device) includes an instruction decoder to decode a first instruction including a first prefix, and cryptographic circuitry to perform a cryptographic operation on data, the cryptographic operation based at least in part on the first prefix and relative enumeration in a pointer to the data.
According to some examples, a method includes decoding, by an instruction decoder of a processing device, a first instruction including a first prefix, and performing, by cryptographic circuitry of the processing device, a cryptographic operation on data, the cryptographic operation based at least in part on the first prefix and relative enumeration in a pointer to the data.
Any such examples may include any one or any combination of the following aspects. The instruction decoder may also be configured to decode a second instruction that includes a second prefix based on the relative enumeration in the pointer. The method may further include decoding, by the instruction decoder, a second instruction including a second prefix based on the relative enumeration in the pointer. The data may correspond to a member of an object. The first prefix may be associated with the member. The second prefix may be associated with the object. The cryptographic operation may be based at least on a tweak derived from the first prefix and the relative enumeration. The tweak may be derived from the sum of the first prefix and the relative enumeration. The processor device may also include execution circuitry to perform one or more operations corresponding to the first instruction, the one or more operations including moving data. The method may also include performing, by execution circuitry of the processing device, one or more operations corresponding to the first instruction, the one or more operations including moving data. The processor device may also include execution circuitry to perform one or more operations corresponding to the second instruction, the one or more operations including calculating an effective address of the data based at least in part on the pointer. The method may also include performing, by execution circuitry of the processing device, one or more operations corresponding to the second instruction, the one or more operations including calculating an effective address of the data based at least in part on the pointer.
According to some examples, an apparatus may include means for performing any of the functions disclosed herein; an apparatus may include a data storage device storing code that, when executed by a hardware processor or controller, causes the hardware processor or controller to perform any of the methods or portions of methods disclosed herein; an apparatus, method, system, etc. may be as described in the detailed description; and a non-transitory machine-readable medium may store instructions that, when executed by a machine, cause the machine to perform any of the methods or portions of methods disclosed herein. Embodiments may include any of the details, features, etc. or combinations of the details, features, etc. described in this specification.
Example computer architecture
A description of an example computer architecture is detailed below. Other system designs and configurations known in the art for laptop computers, desktop computers, handheld personal computers (PCs), personal digital assistants, engineering workstations, servers, disaggregated servers, network devices, network hubs, switches, routers, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cell phones, portable media players, handheld devices, and various other electronic devices are also suitable. In general, a variety of systems or electronic devices capable of incorporating the processors and/or other execution logic disclosed herein are generally suitable.
FIG. 4 illustrates an example computing system. Multiprocessor system 400 is an interfaced system and includes a plurality of processors or cores, including a first processor 470 and a second processor 480 coupled via an interface 450, such as a point-to-point (P-P) interconnect, a fabric, and/or a bus. In some examples, the first processor 470 and the second processor 480 are homogeneous. In some examples, the first processor 470 and the second processor 480 are heterogeneous. Although the example system 400 is shown with two processors, the system may have three or more processors, or may be a single-processor system. In some examples, the computing system is a system on a chip (SoC).
Processors 470 and 480 are shown including Integrated Memory Controller (IMC) circuitry 472 and 482, respectively. Processor 470 also includes interface circuits 476 and 478 and similarly, second processor 480 includes interface circuits 486 and 488. Processors 470, 480 may exchange information via an interface 450 using interface circuits 478, 488. IMCs 472 and 482 couple processors 470, 480 to respective memories, namely a memory 432 and a memory 434, which may be portions of main memory locally attached to the respective processors.
Processors 470, 480 may each exchange information with a network interface (NW I/F) 490 via respective interfaces 452, 454 using interface circuits 476, 494, 486, 498. The network interface 490 (e.g., one or more of an interconnect, bus, and/or fabric, and in some examples a chipset) may optionally exchange information with the coprocessor 438 via interface circuitry 492. In some examples, coprocessor 438 is a special-purpose processor, such as, for example, a high-throughput processor, a network or communication processor, compression engine, graphics processor, general-purpose graphics processing unit (GPGPU), neural Network Processing Unit (NPU), embedded processor, or the like.
A shared cache (not shown) may be included in either processor 470, 480 or may be included outside of both processors, but connected to the processors via an interface such as a P-P interconnect, such that if the processors are in a low power mode, local cache information for either or both of the processors may be stored in the shared cache.
The network interface 490 may be coupled to the first interface 416 via an interface circuit 496. In some examples, the first interface 416 may be an interface such as a Peripheral Component Interconnect (PCI) interconnect, a PCI express interconnect, or another I/O interconnect. In some examples, the first interface 416 is coupled to a Power Control Unit (PCU) 417, which power control unit 417 may include circuitry, software, and/or firmware for performing power management operations with respect to the processors 470, 480 and/or the co-processor 438. The PCU 417 provides control information to a voltage regulator (not shown) to cause the voltage regulator to generate the appropriate regulated voltage. The PCU 417 also provides control information to control the generated operating voltage. In various examples, PCU 417 may include various power management logic units (circuitry) for performing hardware-based power management. Such power management may be entirely under the control of the processor (e.g., by various processor hardware and may be triggered by workload and/or power, thermal or other processor constraints) and/or may be performed in response to an external source (such as a platform or power management source or system software).
PCU 417 is illustrated as residing as logic separate from processor 470 and/or processor 480. In other cases, PCU 417 may execute on a given one or more of the cores (not shown) of processor 470 or 480. In some cases, PCU 417 may be implemented as a microcontroller (dedicated or general-purpose) or other control logic configured to execute its own dedicated power management code (sometimes referred to as P-code). In still other examples, the power management operations to be performed by PCU 417 may be implemented external to the processor, such as by a separate Power Management Integrated Circuit (PMIC) or another component external to the processor. In still other examples, the power management operations to be performed by PCU 417 may be implemented within the BIOS or other system software.
Various I/O devices 414 may be coupled to the first interface 416, along with a bus bridge 418 that couples the first interface 416 to a second interface 420. In some examples, one or more additional processors 415 (e.g., coprocessors, high-throughput many integrated core (MIC) processors, GPGPUs, accelerators (such as graphics accelerators or digital signal processing (DSP) units), field-programmable gate arrays (FPGAs), or any other processor) are coupled to the first interface 416. In some examples, the second interface 420 may be a Low Pin Count (LPC) interface. Various devices may be coupled to the second interface 420, including, for example, a keyboard and/or mouse 422, communication devices 427, and storage circuitry 428. Storage circuitry 428 may be one or more non-transitory machine-readable storage media, such as a disk drive or other mass storage device, which may include instructions/code and data 430. Further, an audio I/O 424 may be coupled to the second interface 420. Note that other architectures besides the point-to-point architecture described above are possible. For example, instead of the point-to-point architecture, a system such as multiprocessor system 400 may implement a multi-drop interface or other such architecture.
Example core architecture, processor, and computer architecture
Processor cores may be implemented in different ways, for different purposes, and in different processors. For instance, implementations of such cores may include: 1) a general-purpose in-order core intended for general-purpose computing; 2) a high-performance general-purpose out-of-order core intended for general-purpose computing; and 3) a special-purpose core intended primarily for graphics and/or scientific (throughput) computing. Implementations of different processors may include: 1) a CPU including one or more general-purpose in-order cores intended for general-purpose computing and/or one or more general-purpose out-of-order cores intended for general-purpose computing; and 2) a coprocessor including one or more special-purpose cores intended primarily for graphics and/or scientific (throughput) computing. Such different processors lead to different computer system architectures, which may include: 1) the coprocessor on a separate chip from the CPU; 2) the coprocessor on a separate die in the same package as the CPU; 3) the coprocessor on the same die as the CPU (in which case, such a coprocessor is sometimes referred to as special-purpose logic, such as integrated graphics and/or scientific (throughput) logic, or as special-purpose cores); and 4) a system on a chip that may include on the same die the described CPU (sometimes referred to as the application core(s) or application processor(s)), the above-described coprocessor, and additional functionality. An example core architecture is described next, followed by a description of example processor and computer architectures.
Fig. 5 illustrates a block diagram of an example processor and/or SoC 500 that may have one or more cores and an integrated memory controller. The solid-lined boxes illustrate a processor 500 with a single core 502 (A), system agent unit circuitry 510, and a set of one or more interface controller unit circuitry 516, while the optional addition of the dashed-lined boxes illustrates an alternative processor 500 with multiple cores 502 (A) through 502 (N), a set of one or more integrated memory controller unit circuitry 514 in the system agent unit circuitry 510, and dedicated logic 508, as well as a set of one or more interface controller unit circuitry 516. Note that processor 500 may be one of processors 470 or 480 or coprocessors 438 or 415 of fig. 4.
Thus, different embodiments of processor 500 may include: 1) a CPU, with the dedicated logic 508 being integrated graphics and/or scientific (throughput) logic (which may include one or more cores, not shown), and the cores 502 (A) through 502 (N) being one or more general-purpose cores (e.g., general-purpose in-order cores, general-purpose out-of-order cores, or a combination of the two); 2) a coprocessor, with the cores 502 (A) through 502 (N) being a large number of special-purpose cores intended primarily for graphics and/or scientific (throughput) computing; and 3) a coprocessor, with the cores 502 (A) through 502 (N) being a large number of general-purpose in-order cores. Thus, the processor 500 may be a general-purpose processor, a coprocessor, or a special-purpose processor, such as, for example, a network or communication processor, a compression engine, a graphics processor, a GPGPU (general-purpose graphics processing unit), a high-throughput many integrated core (MIC) coprocessor (including 30 or more cores), an embedded processor, or the like. The processor may be implemented on one or more chips. The processor 500 may be part of, and/or may be implemented on, one or more substrates using any of a number of process technologies, such as, for example, complementary metal oxide semiconductor (CMOS), bipolar CMOS (BiCMOS), P-type metal oxide semiconductor (PMOS), or N-type metal oxide semiconductor (NMOS).
The memory hierarchy includes one or more levels of cache unit circuitry 504 (A) through 504 (N) within cores 502 (A) through 502 (N), a set of one or more shared cache unit circuitry 506, and external memory (not shown) coupled to the set of integrated memory controller unit circuitry(s) 514. The set of one or more shared cache unit circuitry 506 may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, such as a last level cache (LLC), and/or combinations thereof. While in some examples interface network circuitry 512 (e.g., a ring interconnect) interfaces the dedicated logic 508 (e.g., integrated graphics logic), the set(s) of shared cache unit circuitry 506, and the system agent unit circuitry 510, alternative examples use any number of well-known techniques for interfacing such units. In some examples, coherency is maintained between one or more of the shared cache unit circuitry(s) 506 and cores 502 (A) through 502 (N). In some examples, interface controller unit circuitry 516 couples the cores 502 to one or more other devices 518, such as one or more I/O devices, storage, one or more communication devices (e.g., wireless networking, wired networking, etc.), and so forth.
In some examples, one or more of cores 502 (A) through 502 (N) are capable of multi-threading. The system agent unit circuitry 510 includes those components that coordinate and operate cores 502 (A) through 502 (N). The system agent unit circuitry 510 may include, for example, power control unit (PCU) circuitry and/or display unit circuitry (not shown). The PCU may be, or may include, the logic and components needed to regulate the power states of cores 502 (A) through 502 (N) and/or the dedicated logic 508 (e.g., integrated graphics logic). The display unit circuitry is used to drive one or more externally connected displays.
Cores 502 (a) through 502 (N) may be homogenous in terms of Instruction Set Architecture (ISA). Alternatively, cores 502 (A) through 502 (N) may be heterogeneous in ISA, i.e., a subset of cores 502 (A) through 502 (N) may be capable of executing an ISA, while other cores may be capable of executing only a subset of the ISA or another ISA.
Example core architectures - in-order and out-of-order core block diagrams
FIG. 6A is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to examples. FIG. 6B is a block diagram illustrating both an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples. The solid-lined boxes in figs. 6A-6B illustrate the in-order pipeline and in-order core, while the optional addition of the dashed-lined boxes illustrates the register renaming, out-of-order issue/execution pipeline and core. Given that the in-order aspect is a subset of the out-of-order aspect, the out-of-order aspect will be described.
In fig. 6A, processor pipeline 600 includes a fetch stage 602, an optional length decode stage 604, a decode stage 606, an optional allocate (Alloc) stage 608, an optional rename stage 610, a schedule (also referred to as a dispatch or issue) stage 612, an optional register read/memory read stage 614, an execute stage 616, a write back/memory write stage 618, an optional exception handling stage 622, and an optional commit stage 624. One or more operations may be performed in each of these processor pipeline stages. For example, during the fetch stage 602, one or more instructions are fetched from instruction memory, and during the decode stage 606, the one or more fetched instructions may be decoded, addresses using forwarded register ports (e.g., Load Store Unit (LSU) addresses) may be generated, and branch forwarding (e.g., immediate offset or Link Register (LR)) may be performed. In one example, the decode stage 606 and the register read/memory read stage 614 may be combined into one pipeline stage. In one example, during the execute stage 616, the decoded instructions may be executed, LSU address/data pipelining to an Advanced Microcontroller Bus (AMB) interface may be performed, multiply and add operations may be performed, arithmetic operations with branch results may be performed, and so forth.
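As a purely illustrative software model (not a hardware description), the stage sequence of pipeline 600 can be sketched as an ordered list of steps applied to an instruction record; the stage names mirror the reference numerals above, and the dictionary-based instruction record is an assumption of this sketch:

```python
# Illustrative model of the in-order traversal of processor pipeline 600.
# Each stage simply appends its name to the instruction's trace, in the
# order the stages appear in the text above.

PIPELINE_STAGES = [
    "fetch",          # 602: fetch instructions from instruction memory
    "length_decode",  # 604 (optional)
    "decode",         # 606: decode fetched instructions
    "allocate",       # 608 (optional Alloc stage)
    "rename",         # 610 (optional)
    "schedule",       # 612: schedule (dispatch or issue)
    "register_read",  # 614 (optional register read/memory read)
    "execute",        # 616: execute decoded instructions
    "write_back",     # 618: write back/memory write
    "exception",      # 622 (optional exception handling)
    "commit",         # 624 (optional commit)
]

def run_pipeline(instruction: dict) -> dict:
    """Pass an instruction record through every stage in program order."""
    for stage in PIPELINE_STAGES:
        instruction.setdefault("trace", []).append(stage)
    return instruction

insn = run_pipeline({"opcode": "mov"})
assert insn["trace"][0] == "fetch" and insn["trace"][-1] == "commit"
```

In an out-of-order implementation, the schedule through write-back stages would not proceed in this strict order for every instruction; this model only captures the logical stage sequence.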
By way of example, the example register renaming, out-of-order issue/execution architecture core of FIG. 6B may implement the pipeline 600 as follows: 1) the instruction fetch circuitry 638 performs the fetch stage 602 and the length decode stage 604; 2) the decode circuitry 640 performs the decode stage 606; 3) the rename/allocator unit circuitry 652 performs the allocate stage 608 and the rename stage 610; 4) the scheduler circuitry(s) 656 performs the schedule stage 612; 5) the physical register file circuitry(s) 658 and the memory unit circuitry 670 perform the register read/memory read stage 614, and the execution cluster(s) 660 perform the execute stage 616; 6) the memory unit circuitry 670 and the physical register file circuitry(s) 658 perform the write back/memory write stage 618; 7) various circuitry may be involved in the exception handling stage 622; and 8) the retirement unit circuitry 654 and the physical register file circuitry(s) 658 perform the commit stage 624.
Fig. 6B shows that processor core 690 includes front end unit circuitry 630 coupled to execution engine unit circuitry 650, and that both execution engine unit circuitry 650 and front end unit circuitry 630 are coupled to memory unit circuitry 670. The core 690 may be a reduced instruction set architecture computing (RISC) core, a complex instruction set architecture computing (CISC) core, a Very Long Instruction Word (VLIW) core, or a hybrid or alternative core type. As yet another option, core 690 may be a special-purpose core such as, for example, a network or communication core, a compression engine, a coprocessor core, a general purpose computing graphics processing unit (GPGPU) core, a graphics core, or the like.
The front-end unit circuitry 630 may include branch prediction circuitry 632, the branch prediction circuitry 632 coupled to instruction cache circuitry 634, the instruction cache circuitry 634 coupled to an instruction translation look-aside buffer (TLB) 636, the instruction translation look-aside buffer 636 coupled to instruction fetch circuitry 638, the instruction fetch circuitry 638 coupled to decode circuitry 640. In one example, instruction cache circuitry 634 is included in memory unit circuitry 670, rather than front-end circuitry 630. The decode circuitry 640 (or decoder) may decode the instruction and generate as output one or more micro-operations, micro-code entry points, micro-instructions, other instructions, or other control signals that are decoded from, or otherwise reflect, or are derived from the original instruction. The decoding circuitry 640 may also include address generation unit (AGU, not shown) circuitry. In one example, the AGU generates an LSU address using a forwarded register port, and may further perform branch forwarding (e.g., immediate offset branch forwarding, LR register branch forwarding, etc.). Decoding circuitry 640 may be implemented using a variety of different mechanisms. Examples of suitable mechanisms include, but are not limited to, look-up tables, hardware implementations, programmable Logic Arrays (PLAs), microcode read-only memories (ROMs), and the like. In one example, core 690 includes a microcode ROM (not shown) or other medium (e.g., in decode circuitry 640 or within front-end circuitry 630) that stores microcode for certain macro-instructions. In one example, the decode circuitry 640 includes micro-operations (micro-ops) or operation caches (not shown) to hold/cache decode operations, micro-tags, or micro-operations generated during decoding or other stages of the processor pipeline 600. The decode circuitry 640 may be coupled to rename/allocator unit circuitry 652 in the execution engine circuitry 650.
The execution engine circuitry 650 includes rename/allocator unit circuitry 652, the rename/allocator unit circuitry 652 being coupled to retirement unit circuitry 654 and a set of one or more scheduler circuitry 656. Scheduler circuitry(s) 656 represents any number of different schedulers, including reservation stations, a central instruction window, and the like. In some examples, scheduler circuitry(s) 656 may include arithmetic logic unit (ALU) scheduler/scheduling circuitry, ALU queues, address generation unit (AGU) scheduler/scheduling circuitry, AGU queues, and so forth. Scheduler circuitry(s) 656 is coupled to physical register file circuitry(s) 658. Each of the physical register file circuitry(s) 658 represents one or more physical register files, different ones of which store one or more different data types, such as scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point, status (e.g., an instruction pointer that is the address of the next instruction to be executed), and so forth. In one example, physical register file circuitry(s) 658 includes vector register unit circuitry, write mask register unit circuitry, and scalar register unit circuitry. These register units may provide architectural vector registers, vector mask registers, general-purpose registers, and so forth. Physical register file circuitry(s) 658 is overlapped by retirement unit circuitry 654 (also referred to as a retire queue or a retirement queue) to illustrate various ways in which register renaming and out-of-order execution may be implemented (e.g., using reorder buffer(s) (ROB(s)) and retirement register file(s), using future file(s), history buffer(s), and retirement register file(s), using register maps and a pool of registers, etc.). Retirement unit circuitry 654 and physical register file circuitry(s) 658 are coupled to execution cluster(s) 660.
Execution cluster(s) 660 include a set of one or more execution unit circuitry 662 and a set of one or more memory access circuitry 664. Execution unit circuitry(s) 662 may perform various arithmetic, logic, floating-point, or other types of operations (e.g., shifts, addition, subtraction, multiplication) on various types of data (e.g., scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point). While some examples may include a number of execution units or execution unit circuitry dedicated to particular functions or sets of functions, other examples may include only one execution unit circuitry or multiple execution units/execution unit circuitry that all perform all functions. Scheduler circuitry(s) 656, physical register file circuitry(s) 658, and execution cluster(s) 660 are shown as being possibly plural because certain examples create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating-point/packed integer/packed floating-point/vector integer/vector floating-point pipeline, and/or a memory access pipeline, each having its own scheduler circuitry, physical register file circuitry(s), and/or execution cluster; in the case of a separate memory access pipeline, certain examples are implemented in which only the execution cluster of that pipeline has the memory access unit circuitry(s) 664). It should also be understood that where separate pipelines are used, one or more of these pipelines may be out-of-order issue/execution and the rest in-order.
In some examples, the execution engine unit circuitry 650 may perform Load Store Unit (LSU) address/data pipelining to an Advanced Microcontroller Bus (AMB) interface (not shown), as well as address-phase and writeback operations, and data-phase loads, stores, and branches.
A set of memory access circuitry 664 is coupled to memory unit circuitry 670, the memory unit circuitry 670 including data TLB circuitry 672, the data TLB circuitry 672 being coupled to data cache circuitry 674, the data cache circuitry 674 being coupled to level 2 (L2) cache circuitry 676. In one example, the memory access circuitry 664 may include load unit circuitry, store address unit circuitry, and store data unit circuitry, each of which is coupled to the data TLB circuitry 672 in the memory unit circuitry 670. Instruction cache circuitry 634 is further coupled to the level 2 (L2) cache circuitry 676 in the memory unit circuitry 670. In one example, instruction cache 634 and data cache 674 are combined into a single instruction and data cache (not shown) in L2 cache circuitry 676, level 3 (L3) cache circuitry (not shown), and/or main memory. L2 cache circuitry 676 is coupled to one or more other levels of cache and ultimately to main memory.
The core 690 may support one or more instruction sets (e.g., the x86 instruction set architecture (optionally with some extensions that have been added with newer versions), the MIPS instruction set architecture, the ARM instruction set architecture (optionally with optional additional extensions, such as NEON)), including the instruction(s) described herein. In one example, core 690 includes logic to support a packed data instruction set architecture extension (e.g., AVX1, AVX2), thereby allowing the operations used by many multimedia applications to be performed using packed data.
Example execution unit circuitry(s)
Fig. 7 illustrates an example of execution unit circuitry(s), such as execution unit circuitry(s) 662 of fig. 6B. As shown, execution unit circuitry(s) 662 may include one or more ALU circuits 701, optional vector/single instruction multiple data (SIMD) circuits 703, load/store circuits 705, branch/jump circuits 707, and/or floating-point unit (FPU) circuits 709. The ALU circuits 701 perform integer arithmetic and/or Boolean operations. The vector/SIMD circuits 703 perform vector/SIMD operations on packed data (such as in SIMD/vector registers). The load/store circuits 705 execute load and store instructions to load data from memory into registers or to store data from registers to memory. The load/store circuits 705 may also generate addresses. The branch/jump circuits 707 cause a branch or jump to a memory address depending on the instruction. The FPU circuits 709 perform floating-point arithmetic. The width of execution unit circuitry(s) 662 varies depending on the example and may range, for example, from 16 bits to 1,024 bits. In some examples, two or more smaller execution units are logically combined to form a larger execution unit (e.g., two 128-bit execution units are logically combined to form a 256-bit execution unit).
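To illustrate the final point above, that two narrower execution units may be logically combined into one wider unit, the following sketch splits a 256-bit packed (SIMD-style) addition across two 128-bit halves. The 32-bit lane width and the pure-software packed add are assumptions for illustration, not a description of the circuitry:

```python
# Illustrative sketch: a 256-bit packed add realized by two logical
# 128-bit "execution units", each operating on independent lanes.

def packed_add(a: int, b: int, total_bits: int, lane_bits: int) -> int:
    """SIMD-style add: add corresponding lanes of a and b independently,
    discarding any carry out of each lane."""
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(total_bits // lane_bits):
        lane = ((a >> (i * lane_bits)) + (b >> (i * lane_bits))) & mask
        result |= lane << (i * lane_bits)
    return result

def add_256_via_two_128(a: int, b: int, lane_bits: int = 32) -> int:
    """Execute a 256-bit packed add using two logical 128-bit halves."""
    half_mask = (1 << 128) - 1
    lo = packed_add(a & half_mask, b & half_mask, 128, lane_bits)
    hi = packed_add(a >> 128, b >> 128, 128, lane_bits)
    return (hi << 128) | lo

# The split produces the same result as a single 256-bit packed add,
# because lane boundaries never cross the 128-bit split point.
a = 0x00000001000000020000000300000004_00000005000000060000000700000008
b = 0x00000010000000200000003000000040_00000050000000600000007000000080
assert add_256_via_two_128(a, b) == packed_add(a, b, 256, 32)
```

The equivalence holds because packed lanes carry no information across lane boundaries, so the split point can fall on any lane boundary without changing the result.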
Program code may be applied to input information to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this disclosure, a processing system includes any system having a processor, such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a microprocessor, or any combination thereof.
The program code may be implemented in a high level programming language or an object oriented programming language to communicate with a processing system. Program code may also be implemented in assembly or machine language, if desired. In fact, the mechanisms described herein are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
Examples of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of such implementation approaches. Examples may be implemented as a computer program or program code that is executed on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
One or more aspects of at least one example may be implemented by representative instructions stored on a machine-readable medium which represents various logic within a processor, which when read by a machine, cause the machine to fabricate logic to perform the techniques described herein. Such representations, referred to as "Intellectual Property (IP) cores," may be stored on tangible machine-readable media and supplied to various customers or manufacturing facilities for loading into the manufacturing machines that manufacture the logic or processor.
Such machine-readable storage media may include, but is not limited to, non-transitory tangible arrangements of articles manufactured or formed by a machine or device, including storage media such as hard disks, any other type of magnetic disks including floppy disks, optical disks, compact disc read only memories (CD-ROMs), compact rewritable discs (CD-RWs), and magneto-optical disks, semiconductor devices such as Read Only Memories (ROMs), random access memories (e.g., dynamic Random Access Memories (DRAMs), static Random Access Memories (SRAMs), erasable Programmable Read Only Memories (EPROMs), flash memories, electrically Erasable Programmable Read Only Memories (EEPROMs), phase Change Memories (PCMs)), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
Accordingly, examples also include non-transitory tangible machine-readable media containing instructions or containing design data, such as Hardware Description Language (HDL), which defines the structures, circuits, devices, processors, and/or system features described herein. Such examples may also be referred to as program products.
Emulation (including binary translation, code morphing, etc.)
In some cases, an instruction converter may be used to convert instructions from a source instruction set architecture to a target instruction set architecture. For example, the instruction converter may translate (e.g., using static binary translation, dynamic binary translation including dynamic compilation), morph, emulate, or otherwise convert an instruction into one or more other instructions to be processed by the core. The instruction converter may be implemented in software, hardware, firmware, or a combination thereof. The instruction converter may be on-processor, off-processor, or partially on-processor and partially off-processor.
FIG. 8 is a block diagram illustrating the use of a software instruction converter to convert binary instructions in a source ISA to binary instructions in a target ISA, according to examples. In the illustrated example, the instruction converter is a software instruction converter, although alternatively the instruction converter may be implemented in software, firmware, hardware, or various combinations thereof. FIG. 8 shows that a program in a high-level language 802 can be compiled using a first ISA compiler 804 to generate first ISA binary code 806 that may be natively executed by a processor with at least one first ISA core 816. The processor with at least one first ISA core 816 represents any processor that can perform substantially the same functions as an Intel processor with at least one first ISA core, in order to achieve substantially the same result as a processor with at least one first ISA core, by compatibly executing or otherwise processing (1) a substantial portion of the first ISA or (2) object code versions of applications or other software targeted to run on an Intel processor with at least one first ISA core. The first ISA compiler 804 represents a compiler operable to generate first ISA binary code 806 (e.g., object code), which first ISA binary code 806 can be executed, with or without additional linkage processing, on the processor with at least one first ISA core 816. Similarly, FIG. 8 shows that the program in the high-level language 802 can be compiled using an alternative ISA compiler 808 to generate alternative ISA binary code 810 that may be natively executed by a processor without a first ISA core 814. The instruction converter 812 is used to convert the first ISA binary code 806 into code that may be natively executed by the processor without a first ISA core 814.
The translated code is not necessarily identical to the alternate ISA binary 810, however, the translated code will complete the general operation and consist of instructions from the alternate ISA. Thus, instruction translator 812 represents software, firmware, hardware, or a combination thereof that, through emulation, simulation, or any other process, allows a processor or other electronic device that does not have a first ISA processor or core to execute first ISA binary code 806.
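As a toy illustration of such an instruction converter, each source-ISA instruction can be statically mapped to an equivalent sequence of target-ISA instructions. The mnemonics, the translation table, and the one-to-many expansion below are all invented for this sketch and do not correspond to any particular ISA pair:

```python
# Toy static binary translator: each source-ISA mnemonic maps to one or
# more target-ISA instruction templates. All mnemonics are hypothetical.

TRANSLATION_TABLE = {
    # source mnemonic -> equivalent target-ISA sequence
    "PUSH": ["SUB sp, sp, #8", "STR {0}, [sp]"],
    "POP":  ["LDR {0}, [sp]", "ADD sp, sp, #8"],
    "MOV":  ["MOV {0}, {1}"],
}

def convert(source_program):
    """Statically translate a source-ISA listing into target-ISA code.
    Instructions with no table entry would require interpretation or
    emulation in a real converter; here they raise KeyError."""
    target = []
    for line in source_program:
        mnemonic, *operands = line.split()
        for template in TRANSLATION_TABLE[mnemonic]:
            target.append(template.format(*operands))
    return target

out = convert(["PUSH r1", "MOV r2 r1", "POP r3"])
assert out[0] == "SUB sp, sp, #8" and len(out) == 5
```

As the text notes, the output need not be identical to code a native compiler would emit for the target ISA; it only needs to accomplish the same general operation using target-ISA instructions.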
References to "one example", "an example", "one embodiment", "an embodiment", etc., indicate that the example or embodiment described may include a particular feature, structure, or characteristic, but every example or embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same example or embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an example or embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other examples or embodiments whether or not explicitly described.
Moreover, in the various examples described above, unless specifically noted otherwise, disjunctive language such as the phrase "at least one of A, B, or C" or "A, B, and/or C" is intended to be understood to mean A, B, or C, or any combination thereof (i.e., A and B, A and C, B and C, and A, B and C). As used in this specification and the claims, and unless otherwise specified, the use of the ordinal adjectives "first," "second," "third," etc., to describe an element merely indicates that a particular instance of an element, or different instances of like elements, are being referred to, and is not intended to imply that the elements so described must be in a particular sequence, either temporally, spatially, in ranking, or in any other manner. Also, as used in the description of the embodiments, the "/" character between terms may mean that what is described may include or be implemented using, with, and/or according to the first term and/or the second term (and/or any other additional terms).
Moreover, the terms "bit," "flag," "field," "entry," "indicator," and the like may be used to describe any type or content of a storage location in a register, table, database, or other data structure, whether implemented in hardware or software, but are not meant to limit embodiments to any particular type of storage location or number of bits or other elements within any particular storage location. For example, the term "bit" may be used to refer to a bit location within a register and/or data stored or to be stored at that bit location. The term "clear" may be used to indicate that a logical value of zero is stored or otherwise caused to be stored in a storage location, and the term "set" may be used to indicate that a logical value of one, all ones, or some other specified value is stored in a storage location; however, these terms are not meant to limit embodiments to any particular logical convention, as any logical convention may be used within an embodiment.
The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the disclosure as set forth in the claims.
Claims (20)
1. A processor device for instruction prefix encoding, comprising:
an instruction decoder to decode a first instruction including a first prefix; and
cryptographic circuitry to perform a cryptographic operation on data, the cryptographic operation based at least in part on a relative enumeration in the first prefix and a pointer to the data.
2. The processor device of claim 1, wherein the instruction decoder is further to decode a second instruction comprising a second prefix based on a relative enumeration in the pointer.
3. The processor device of claim 2, wherein the data corresponds to a member of an object.
4. A processor device according to claim 3, wherein the first prefix is associated with the member.
5. A processor device according to claim 3, wherein the second prefix is associated with the object.
6. The processor device of claim 1, wherein the cryptographic operation is based at least on a tweak derived from the first prefix and the relative enumeration.
7. The processor device of claim 6, wherein the tweak is derived from a sum of the first prefix and the relative enumeration.
8. The processor device of claim 1, further comprising execution circuitry to perform one or more operations corresponding to the first instruction, the one or more operations comprising moving the data.
9. The processor device of claim 2, further comprising execution circuitry to perform one or more operations corresponding to the second instruction, the one or more operations comprising calculating an effective address of the data based at least in part on the pointer.
10. A method for instruction prefix encoding, comprising:
decoding, by an instruction decoder of a processing device, a first instruction including a first prefix; and
performing, by cryptographic circuitry of the processing device, a cryptographic operation on data, the cryptographic operation based at least in part on a relative enumeration in the first prefix and a pointer to the data.
11. The method of claim 10, further comprising decoding, by the instruction decoder, a second instruction, the second instruction including a second prefix based on a relative enumeration in the pointer.
12. The method of claim 11, wherein the data corresponds to a member of an object.
13. The method of claim 12, wherein the first prefix is associated with the member.
14. The method of claim 12, wherein the second prefix is associated with the object.
15. The method of claim 10, wherein the cryptographic operation is based at least on a tweak derived from the first prefix and the relative enumeration.
16. The method of claim 15, wherein the tweak is derived from a sum of the first prefix and the relative enumeration.
17. The method of claim 10, further comprising performing, by execution circuitry of the processing device, one or more operations corresponding to the first instruction, the one or more operations comprising moving the data.
18. The method of claim 11, further comprising performing, by execution circuitry of the processing device, one or more operations corresponding to the second instruction, the one or more operations including calculating an effective address of the data based at least in part on the pointer.
19. A non-transitory machine-readable medium storing at least one instruction that, when executed by a machine, causes the machine to perform a method for instruction prefix encoding, the method comprising:
decoding, by an instruction decoder of a processing device, a first instruction including a first prefix; and
performing, by cryptographic circuitry of the processing device, a cryptographic operation on data, the cryptographic operation based at least in part on a relative enumeration in the first prefix and a pointer to the data.
20. The non-transitory machine-readable medium of claim 19, wherein the method further comprises decoding, by the instruction decoder, a second instruction comprising a second prefix based on a relative enumeration in the pointer.
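As a reading aid only (code forms no part of the claims): claims 6-7 and 15-16 describe a tweak derived from the sum of the first prefix and the relative enumeration, which the cryptographic circuitry then uses together with the pointer to the data. The sketch below assumes a prefix encoding (relative enumeration in the low nibble), a 16-bit tweak width, and a toy XOR transform; all three are illustrative stand-ins, as the claims do not fix an encoding or cipher at this granularity.

```python
MASK16 = 0xFFFF  # assumed tweak width; not specified by the claims

def derive_tweak(first_prefix: int, relative_enumeration: int) -> int:
    # Claims 7 and 16: the tweak is derived from the sum of the first
    # prefix and the relative enumeration (here, truncated to 16 bits).
    return (first_prefix + relative_enumeration) & MASK16

def crypto_op(data: int, pointer: int, first_prefix: int) -> int:
    # Assumed encoding: the prefix byte carries its relative enumeration
    # in its low nibble (hypothetical layout for illustration).
    relative_enumeration = first_prefix & 0x0F
    tweak = derive_tweak(first_prefix, relative_enumeration)
    # Toy tweakable transform (XOR), standing in for the cryptographic
    # circuitry's operation; XOR makes the operation its own inverse.
    return data ^ tweak ^ (pointer & MASK16)
```

Because XOR is an involution, applying `crypto_op` twice with the same pointer and prefix recovers the original data, mimicking matched encrypt/decrypt behavior on a member of an object.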
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311850381.8A CN119760794A (en) | 2023-12-29 | 2023-12-29 | Instruction prefix encoding for cryptographic capability calculation data types |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311850381.8A CN119760794A (en) | 2023-12-29 | 2023-12-29 | Instruction prefix encoding for cryptographic capability calculation data types |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119760794A true CN119760794A (en) | 2025-04-04 |
Family
ID=95174002
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311850381.8A Pending CN119760794A (en) | 2023-12-29 | 2023-12-29 | Instruction prefix encoding for cryptographic capability calculation data types |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119760794A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114692130A (en) | Fine granularity stack protection using cryptographic computations | |
CN113050989A (en) | Apparatus, method and system for cryptographically associating instructions of usage restrictions with data | |
US12314460B2 (en) | Memory address bus protection for increased resilience against hardware replay attacks and memory access pattern leakage | |
US12216922B2 (en) | Updating encrypted security context in stack pointers for exception handling and tight bounding of on-stack arguments | |
CN114692177A (en) | Integrity protected access control mechanism | |
US12321467B2 (en) | Cryptographic computing isolation for multi-tenancy and secure software components | |
EP4020180A1 (en) | Isa accessible physical unclonable function | |
US20220100907A1 (en) | Cryptographic computing with context information for transient side channel security | |
US12022013B2 (en) | ISA accessible physical unclonable function | |
US11570010B2 (en) | ISA accessible physical unclonable function | |
US12047485B2 (en) | Time and frequency domain side-channel leakage suppression using integrated voltage regulator cascaded with runtime crypto arithmetic transformations | |
CN114694737A (en) | Low overhead memory integrity with error correction capability | |
CN118740368A (en) | Error detection in password replacement box operation | |
US20240427636A1 (en) | Apparatus and method for secure resource allocation | |
US20240104013A1 (en) | Deterministic adjacent overflow detection for slotted memory pointers | |
EP4202700A1 (en) | Transient side-channel aware architecture for cryptographic computing | |
US20220214881A1 (en) | Ratchet pointers to enforce byte-granular bounds checks on multiple views of an object | |
US20250148089A1 (en) | Instruction prefix encoding for cryptographic computing capability data types | |
CN119760794A (en) | Instruction prefix encoding for cryptographic capability calculation data types | |
EP4488826A1 (en) | Instruction tagging for intra-object memory tagging | |
US20250004879A1 (en) | Error correction with memory safety and compartmentalization | |
US20250005138A1 (en) | Explicit integrity check value initialization | |
US20240054080A1 (en) | Speculating object-granular key identifiers for memory safety | |
US20250200163A1 (en) | Apparatus and method for per-user secure access control with fine granularity | |
US20250211435A1 (en) | Configurable variable-word size xorshift random number generator |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||