
WO2017028909A1 - Shared physical registers and mapping table for architectural registers of multiple threads - Google Patents


Info

Publication number
WO2017028909A1
Authority
WO
WIPO (PCT)
Prior art keywords
registers
register
threads
architectural
recent usage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2015/068977
Other languages
French (fr)
Inventor
Simcha Gochman
Zuguang WU
Weiguang CAI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/EP2015/068977
Priority to CN201580082261.5A
Publication of WO2017028909A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30138Extension of register space, e.g. register cache
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • G06F9/384Register renaming
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

A system for handling a register accessing request, comprising an interface for receiving register accessing requests and a processing unit connected to the interface. The processing unit dynamically maps architectural registers to physical registers based on a criterion such as recent usage and/or access frequency of the architectural registers by multithreading (MT) threads. The processing unit also looks up a respective architectural register for register accessing requests for which a match is not found in the physical registers.

Description

REGISTER MAPPING FOR MULTI-THREADING
BACKGROUND
The present invention, in some embodiments thereof, relates to the implementation of multi-threading and, more specifically, but not exclusively, to architectural register management in multi-threading cores.
CPU cores, especially those that are targeted to server market segments, increasingly support multi-threading (MT). The demand for multi-threaded cores has been increasing at a high rate in all server market segments, especially in the context of Scale-out applications (e.g. Big Data).
There are currently three MT implementation schemes:
1. Fine Grain MT (FGMT) - Threads are interleaved on a clock by clock basis;
2. Simultaneous MT (SMT) - Threads run simultaneously sharing all machine resources; and
3. Coarse Grain MT (CGMT, also denoted Switch on Event MT or SoE MT) - A thread runs until it is blocked by an event (that typically results in a long latency stall). It is then replaced by the next thread waiting in the queue.
Current MT implementations include:
1. Larrabee by Intel (4-way FGMT);
2. Xeon Servers by Intel (2-way SMT); and
3. Intel's Itanium Montecito (2-way CGMT).
In MT, each thread carries over the entire architectural state of the machine. Each Architectural Register File Set (ARF) typically includes:
1. Integer Register File (e.g. ARMv8 employs 31 registers, each 64 bits wide);
2. Floating Point / SIMD Register File (e.g. ARMv8 employs 32 registers, each 128 bits wide); and
3. Status Register (e.g. ARMv8 employs roughly 6 registers, each 64 bits wide).
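As a rough illustration using the ARMv8 figures above, a single thread's architectural register state amounts to approximately 31*64 + 32*128 + 6*64 = 6464 bits, i.e. on the order of 800 bytes per thread.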
Supporting multiple threads on the same die multiplies this amount. The registers must be available and easily accessed. Current MT implementations use the following strategies for handling register files (RFs):
1) Duplicating the ARF for each thread. This is used for FGMT and SMT and in some cases also for CGMT (to avoid long switch times). Duplicating ARFs is very wasteful in terms of silicon area and energy consumption.
2) Holding a single register file set and copying back and forth (applicable only to CGMT). This approach is time consuming, makes the switch time fairly long and inefficient, and severely reduces performance.
SUMMARY
An object of the current invention is to improve multi-threading.
This object is obtained by the subject matter of the independent claims. The dependent claims protect further embodiments.
Embodiments presented herein map recently and/or frequently used registers of running threads (i.e. active threads) to physical registers. Registers of all the threads are saved in architectural registers, optionally in an SRAM. When a requested register is not mapped to a physical register, the content of the architectural register is stored in an allocated physical register, possibly replacing previously stored content (e.g. from a suspended thread). In this way, silicon area and energy consumption are reduced and switch time may be shortened.
According to a first aspect of some embodiments of the present invention there is provided a system for handling a register accessing request. The system includes an interface which receives register accessing requests and a processing unit. The processing unit dynamically maps a group of registers from multiple architectural registers to at least one of a multiplicity of physical registers based on at least one of recent usage and access frequency of each one of the architectural registers by multiple multithreading (MT) threads, and looks up a match for each one of the register accessing requests in the architectural registers when the match is not found in the physical registers.
In a first possible implementation form of the system according to the first aspect the MT threads submit the register accessing requests and are of a multithreading processor. In a second possible implementation form of the system, the register accessing requests are received via at least one pipeline engine.
In a third possible implementation form of the system, the architectural registers are stored in a static random access memory (SRAM).
In a fourth possible implementation form of the system, the system further includes a memory for storing an access frequency dataset. The processing unit updates the access frequency dataset with a frequency of access to respective registers and performs the mapping according to the access frequency dataset.
In a fifth possible implementation form of the system, the system further includes a memory for storing a recent usage dataset. The processing unit updates the recent usage dataset with the recent usage and performs the mapping according to the recent usage dataset.
In a second possible implementation form of the system according to the fifth implementation form of the first aspect, the recent usage dataset includes multiple records. Each of the records documents a recent usage of the architectural registers by each one of the MT threads.
In a third possible implementation form of the system according to the fifth implementation form of the first aspect, the recent usage dataset includes respective allocation states of architectural registers.
In a fourth possible implementation form of the system according to the fifth implementation form of the first aspect, the architectural registers map to an allocation of suspended and running threads of the multiple MT threads and the physical registers map to an allocation of running threads of the multiple MT threads.
In a fifth possible implementation form of the system according to the fifth implementation form of the first aspect, the processing unit updates the recent usage dataset when switching an allocation of any of the physical registers from one of the MT threads to another of the MT threads.
In a sixth possible implementation form of the system, the processing unit maps the architectural registers to the MT threads.
In a seventh possible implementation form of the system, the processing unit switches mapping of any of the architectural registers from one of the MT threads to another of the architectural registers.
In an eighth possible implementation form of the system, the processing unit sets a respective state of physical registers mapped to an active thread to available when the active thread is inactivated by a switch to a different thread.
According to a second aspect of some embodiments of the present invention there is provided a method for handling a register accessing request. The method includes:
i) receiving multiple register accessing requests;
ii) mapping dynamically a group of registers from multiple architectural registers to at least one of a multiplicity of physical registers based on at least one of recent usage and access frequency of each one of the architectural registers by multiple multithreading (MT) threads; and
iii) looking up a match for each one of the register accessing requests in the architectural registers when the requested register is not found in the mapping of the physical registers.
In a first possible implementation form of the method according to the second aspect the method further includes monitoring the at least one of recent usage and access frequency by recording the plurality of register accessing requests which are received via at least one pipeline engine.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
Fig. 1 is a simplified block diagram of a system for handling a register accessing request, according to embodiments of the invention;
Fig. 2 is a simplified illustration of a register mapping scheme, according to embodiments of the invention;
Fig. 3 is a simplified block diagram of a method for handling register accessing requests according to embodiments of the invention;
Fig. 4 is a simplified block diagram of a method for thread context switching according to embodiments of the invention; and
Fig. 5 is a simplified flowchart of a method for handling a register accessing request according to embodiments of the invention.
DETAILED DESCRIPTION
The present invention, in some embodiments thereof, relates to multithreading and, more specifically, but not exclusively, to architectural register management in multi-threading cores.
Embodiments of the invention utilize a register mapping scheme that dynamically maps the most recently and/or frequently used architectural registers to a smaller physical register file set, and fetches the registers' content on demand.
In some embodiments, when a new register access request is issued, the register mapping (also denoted herein the mapping table) is checked to see if the requested architectural register is mapped to a physical register. When the requested register is present in the PRF, the physical register is utilized for the register access.
When a requested register is not mapped to the PRF, one or more physical registers are written back to the ARF to make registers in the PRF available for storage of other architectural register values. The requested architectural register is written to a physical register, and access continues from the PRF.
The mapping table is maintained dynamically and updated as needed during assignment and reassignment of physical registers to architectural registers.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Some embodiments of the invention are based on a computing system that includes:
1) A memory (e.g. an SRAM) that stores architectural states of multiple MT threads;
2) Physical registers which store a physical register file (denoted herein the PRF); and
3) A register mapping that dynamically maps architectural registers to physical registers.
As used herein the terms "architectural register file" and "ARF" mean the dataset which includes the entire architectural state for all threads. The terms are not limited to a particular type of file, organization of data or memory element used for storing the ARF.
The memory storing the ARF has denser data storage than the physical registers but its access time is longer than the access time of the physical registers. A reasonably sized PRF enables quick access to some architectural register content without requiring a drastic increase in silicon area. As used herein the terms "physical register file" and "PRF" mean the dataset stored in the physical registers. The terms are not limited to a particular type of file or organization of data.
In some embodiments, the memory stores all the architectural states for all threads. This embodiment uses simple logic for fixed indexing but is more costly in area. In other embodiments, the memory only stores architectural states not stored in the PRF, resulting in a reduction in area with increased indexing complexity.
In some embodiments, each active thread has a predefined number of physical registers and cannot use physical registers allocated to other threads. In other embodiments, physical registers are dynamically allocated to all active threads.
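As a rough C sketch of the two allocation policies (the constants, names and the free-list mechanism below are illustrative assumptions rather than details taken from the embodiments):

    #include <stdint.h>

    #define N_ACTIVE_THREADS  2                      /* N - assumed value */
    #define J_REGS_PER_THREAD 16                     /* J - assumed value */
    #define PRF_SIZE (N_ACTIVE_THREADS * J_REGS_PER_THREAD)

    /* Policy 1: static partitioning - each active thread owns a fixed slice of the
     * PRF, so indexing is trivial but unused slots cannot help other threads. */
    static inline int static_prf_index(int thread, int slot)
    {
        return thread * J_REGS_PER_THREAD + slot;
    }

    /* Policy 2: dynamic allocation - any free physical register may serve any
     * active thread; a simple free list replaces the fixed indexing. */
    static int16_t free_list[PRF_SIZE];
    static int     free_top;

    static void init_free_list(void)
    {
        for (int i = 0; i < PRF_SIZE; i++)
            free_list[i] = (int16_t)i;
        free_top = PRF_SIZE;
    }

    static int dynamic_prf_alloc(void)
    {
        return (free_top > 0) ? free_list[--free_top] : -1;   /* -1: run a replacement cycle */
    }

Static partitioning keeps the indexing logic trivial, while dynamic allocation lets a register-hungry thread use slots that other active threads are not using.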
In some embodiments, when a new register access request is issued its source and destination operands are looked up in the register mapping (also denoted herein the mapping table). When the mapping table shows that the requested register is present in the PRF, the physical register is utilized for the register access. When one or more requested registers are not mapped to the PRF, a replacement cycle occurs. In the replacement cycle one or more physical registers (for example the least recently used registers) are written back to the architectural registers. These physical registers are then available for storage of other architectural register values. After a warm-up period, all the selected architectural registers (e.g. recently used) will be cached in the PRF and execution will require relatively few replacement cycles. However, when the processing moves to a new phase that employs different architectural registers, a new warm-up period may occur. The mapping table is maintained dynamically and updated as needed during or after the replacement cycle.
As used herein the terms "register request" and "register access request" include requests for read and write operations to the register.
The register mapping described herein is particularly beneficial for core implementations that employ a large number of threads in order to exploit thread level parallelism (such as graphic accelerators, big data servers, etc.). A single core is able to support an increased number of threads by increasing thread level parallelism (TLP) without having the overhead of duplicating the entire architectural states of all threads or limiting operation to CGMT with long thread switch periods. As the number of threads per core increases, the potential benefit of the register mapping described herein increases.
Reference is now made to Fig. 1, which is a simplified block diagram of a system for handling a register accessing request, according to embodiments of the invention. System 100 includes interface 110 and processing unit 120.
Interface 110 receives register accessing requests. Optionally, the register accessing requests are submitted by multiple MT threads. Optionally, the register accessing requests are received via at least one pipeline engine.
Processing unit 120 dynamically maps a group of registers from architectural registers 150 to physical registers 140. Optionally the mapping is based on:
i) Access frequency by the MT threads ("frequently-used");
ii) Recent usage by the MT threads ("recently-used"); and/or
iii) A combination of access frequency and recent usage.
In response to a register access request, processing unit 120 determines from the mapping table whether the register value is stored in physical registers 140. When a match is not found in the physical registers 140, processing unit 120 looks up a match for the requested register in architectural registers 150.
Optionally, the architectural registers are stored in a static random access memory (SRAM).
In some embodiments, system 100 includes a memory which stores a recent usage dataset. Processing unit 120 updates the recent usage dataset with recent usage of each register, and performs the mapping, at least in part, according to the recent usage dataset. Additionally or alternately, the memory stores an access frequency dataset. Processing unit 120 updates the access frequency dataset with the access frequency of each register, and performs the mapping, at least in part, according to the access frequency dataset.
Optionally, the recent usage dataset comprises multiple records. Each record documents the recent usage of architectural registers by a respective thread.
Optionally, the recent usage dataset includes an allocation state of each architectural register. The allocation state indicates whether the architectural register is allocated to a physical register, in which case the architectural register value may be read from or written to the physical register, and optionally indicates the physical register to which the architectural register is allocated.
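A minimal C sketch of one way such bookkeeping records might be laid out (the field names, widths and the cycle-stamping scheme are assumptions for illustration only, not details from the embodiments):

    #include <stdint.h>
    #include <stdbool.h>

    #define N_ACTIVE_THREADS  2                      /* assumed */
    #define K_REGS_PER_THREAD 64                     /* assumed */

    /* One record per (active thread, architectural register): a recency stamp and a
     * saturating access counter, later consulted when choosing replacement victims,
     * plus the allocation state described above. */
    typedef struct {
        uint64_t last_access;                        /* recent usage: cycle of the last access   */
        uint32_t access_count;                       /* access frequency: saturating counter     */
        bool     allocated;                          /* allocation state: mapped into the PRF?   */
        uint16_t prf_index;                          /* which physical register, when allocated  */
    } UsageRecord;

    static UsageRecord usage[N_ACTIVE_THREADS][K_REGS_PER_THREAD];

    /* Called for every register accessing request observed at the pipeline engine. */
    static void note_access(int thread, int reg, uint64_t cycle)
    {
        UsageRecord *u = &usage[thread][reg];
        u->last_access = cycle;
        if (u->access_count != UINT32_MAX)
            u->access_count++;                       /* saturate rather than wrap around */
    }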
Optionally, architectural registers 150 are allocated to both suspended (i.e. inactive) and running (i.e. active) threads of the multiple MT threads, and physical registers 140 are allocated to running threads. Optionally, processing unit 120 updates the recent usage dataset when the allocation of an architectural register is switched from one MT thread to another thread. This may occur when a thread is terminated or added.
Optionally, processing unit 120 updates the recent usage dataset when the allocation of a physical register is switched from one MT thread to another thread. This may occur when a thread is inactive and the physical register is reallocated to an architectural register of a different thread.
Optionally, processing unit 120 maps the architectural registers to respective MT threads.
Optionally, processing unit 120 switches the mapping of an architectural register for a given MT thread to a different architectural register.
Reference is now made to Fig. 2, which is a simplified illustration of a register mapping scheme, according to embodiments of the invention. In Fig. 2:
i) N denotes a number of active threads;
ii) M denotes a total number of threads (active and inactive);
iii) K denotes a number of all registers per thread; and
iv) J denotes a number of registers per thread which are stored in PRF 130.
Thus, the total number of registers in the ARF is M*K, whereas the number of registers in the PRF is the smaller number N*J.
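For example (the numbers are purely illustrative and not taken from the embodiments), with M = 8 threads of which N = 2 are active, K = 69 architectural registers per thread and J = 16 cached registers per thread, the ARF holds 8*69 = 552 registers while the PRF holds only 2*16 = 32.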
For clarity, in the non-limiting embodiment of Fig. 2 the registers stored in the PRF are selected on the basis of access frequency ("frequently-used"). In other embodiments, the registers stored in the PRF are selected by a different criterion (e.g. recently-used) and register mapping, access and handling is performed in a substantially similar manner.
Mapping table 210 specifies whether the requested register is allocated in PRF 220 and also maintains other information used for finding candidates for replacement (e.g. the least frequently used register). In the embodiment of Fig. 2, mapping table 210 holds the following fields for each register of the active thread:
i) Valid - indicates whether the architectural register value is stored in the PRF;
ii) Index - maps the architectural register to a physical register;
iii) Dirty - indicates whether the value stored in the architectural register corresponds to the value of the mapped physical register; and
iv) Access frequency - may be used to select a physical register for overwrite when a requested architectural register is not in the PRF.
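A minimal C sketch of such a mapping-table entry is given below; the field widths, type names and table dimensions are illustrative assumptions, and only the four fields listed above are modelled:

    #include <stdint.h>
    #include <stdbool.h>

    /* One mapping-table entry per architectural register of an active thread,
     * following the fields of Fig. 2 (widths are assumptions). */
    typedef struct {
        bool     valid;        /* architectural register value is currently held in the PRF      */
        bool     dirty;        /* the PRF copy is newer than the copy kept in the ARF             */
        uint16_t index;        /* which physical register holds the value (meaningful when valid) */
        uint32_t access_freq;  /* counter used to pick a victim when a requested register misses  */
    } MapEntry;

    #define N_ACTIVE_THREADS  2                      /* N - assumed */
    #define K_REGS_PER_THREAD 64                     /* K - assumed */

    /* One entry per architectural register of each active thread. */
    static MapEntry mapping_table[N_ACTIVE_THREADS][K_REGS_PER_THREAD];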
In Fig. 2, pipeline engine 200 is running N active threads. Active threads issue register access requests for architectural registers. When a register request is received from pipeline engine 200, mapping table 210 is used to determine whether the register value may be accessed from the PRF 220 relatively quickly or must be obtained from ARF 230.
ARF 230 stores the architectural register files of all the active and inactive threads. Data may be transferred between ARF 230 and PRF 220 to keep the architectural and physical register values up to date as required for operation. The mapping table is updated accordingly.
In the case of a "register miss" (i.e. the requested architectural register is not in PRF 220) a "victim" physical register is reallocated for the requested architectural register and the content of the reallocated register is replaced. In some embodiments, inactive threads are the preferred providers of victim physical registers.
Optionally, the victim physical register is selected at least in part on data stored in the mapping table (e.g. access frequency and/or recent access).
Optionally, remapping of source and destination registers is done in the pipeline engine.
Reference is now made to Fig. 3, which is a simplified flowchart of a method for handling a register accessing request according to embodiments of the invention. In 310, register accessing requests are received. The register mapping is checked in 320 to determine whether the requested register is mapped to a physical register. In 330 a match is looked up in the architectural registers (i.e. ARF) for each requested register which is not mapped to a physical register. Optionally, in 340 the architectural register value is stored in a physical register.
Optionally, in 350 the requested register is accessed from the PRF.
Register mapping from architectural registers to physical registers (i.e. PRF) is performed dynamically in 360. The mapping may be based on recent usage of each architectural register by the MT threads and/or on access frequency of each architectural register by the MT threads. Optionally, the mapping is performed based on an alternate or additional mapping criterion. Optionally, register usage (physical and/or architectural) is monitored by recording register accessing requests which are received via at least one pipeline engine.
Reference is now made to Fig. 4, which is a simplified block diagram of a method for handling register accessing requests according to embodiments of the invention.
In 400, a register accessing request is issued by a pipeline engine. In 410 the mapping table is checked to determine whether the requested register is stored in the PRF (e.g. by checking the "valid" bit of the requested register).
When the requested register is stored in the PRF, register read or write access is performed in 420. For a write operation the write data is stored in the physical register mapped to the requested architectural register. For a read operation, the value stored in the physical register mapped to the requested architectural register is returned.
When the requested register is not stored in the PRF, the PRF is searched in 430 to find an available physical register to store the requested architectural register's data. When an available physical register is not found, in 450 a victim physical register is selected and its content stored back to the ARF, thereby creating an available physical register.
In 460 it is determined whether the access is a read request or not. When the access is a read request, in 470 the requested register value is copied from the ARF to the available register in the PRF. The read or write operation is then performed in 420, as described above.
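The following C sketch restates the flow of Fig. 4 for a single active thread. All names (prf, arf, MapEntry, get_physical_register), the sizes and the choice of lowest access frequency as the victim criterion are assumptions used only for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    #define K_REGS_PER_THREAD 64                     /* K - assumed        */
    #define PRF_SIZE          16                     /* N*J - assumed, < K */

    typedef struct {                                 /* same fields as the mapping-table sketch above */
        bool     valid, dirty;
        uint16_t index;
        uint32_t access_freq;
    } MapEntry;

    static MapEntry mapping_table[K_REGS_PER_THREAD]; /* one active thread shown for brevity  */
    static uint64_t prf[PRF_SIZE];                    /* physical register file               */
    static uint64_t arf[K_REGS_PER_THREAD];           /* architectural register file (SRAM)   */
    static int      prf_used;                         /* physical registers handed out so far */

    /* 430/450: return a free physical register, evicting a victim if none is free. */
    static uint16_t get_physical_register(void)
    {
        if (prf_used < PRF_SIZE)                      /* 430: an unused physical register exists */
            return (uint16_t)prf_used++;

        int victim = -1;                              /* 450: assumed policy - lowest access frequency */
        for (int r = 0; r < K_REGS_PER_THREAD; r++)
            if (mapping_table[r].valid &&
                (victim < 0 || mapping_table[r].access_freq < mapping_table[victim].access_freq))
                victim = r;
        if (mapping_table[victim].dirty)              /* write the victim back to the ARF if needed */
            arf[victim] = prf[mapping_table[victim].index];
        mapping_table[victim].valid = false;
        return mapping_table[victim].index;
    }

    /* 400-470: handle one register accessing request for architectural register `reg`. */
    static uint64_t access_register(int reg, bool is_write, uint64_t wdata)
    {
        MapEntry *e = &mapping_table[reg];

        if (!e->valid) {                              /* 410: the "valid" bit reports a PRF miss  */
            e->index = get_physical_register();
            if (!is_write)                            /* 460/470: only a read needs the old value */
                prf[e->index] = arf[reg];             /* copy the value from the ARF into the PRF */
            e->valid = true;
            e->dirty = false;
        }
        e->access_freq++;                             /* bookkeeping consulted by the victim policy */

        if (is_write) {                               /* 420: perform the access against the PRF    */
            prf[e->index] = wdata;
            e->dirty = true;
            return wdata;
        }
        return prf[e->index];
    }

On a write miss the old value is not fetched from the ARF in this sketch, since the physical register is about to be overwritten in full.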
Reference is now made to Fig. 5, which is a simplified block diagram of a method for thread context switching according to embodiments of the invention.
In 500, it is determined whether the thread switch is hardware or software. When the thread switch is a hardware switch, in 510 the pipeline engine switches the active thread to a different thread, temporarily blocking the previously active thread. In 520, the valid bits for the now active thread are set to zero. In 530, only registers marked as dirty in the mapping table are updated in the ARF.
When the thread switch is a software switch, in 540 the software switches the active thread to another thread, inactivating the previously active thread. In 550, all physical registers mapped to architectural registers for the currently active thread are read and written to memory (i.e. updated in the ARF). In 560, all valid bits in the mapping table are set to zero for the currently active thread. In 570 the previously active thread is deleted from the thread control register which specifies active threads.
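A C sketch of the two context-switch paths is shown below. It assumes a single active thread whose mapping table is simply taken over by the incoming thread; the structures, sizes and the thread_control bit encoding are illustrative assumptions rather than details from the embodiments.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_THREADS       8                      /* M - assumed */
    #define K_REGS_PER_THREAD 64                     /* K - assumed */
    #define PRF_SIZE          16                     /* assumed     */

    typedef struct {
        bool     valid, dirty;
        uint16_t index;
    } MapEntry;

    static MapEntry mapping_table[K_REGS_PER_THREAD];            /* mapping for the active thread       */
    static uint64_t prf[PRF_SIZE];                               /* physical register file              */
    static uint64_t arf[MAX_THREADS][K_REGS_PER_THREAD];         /* architectural state of all threads  */
    static uint32_t thread_control;                              /* one bit per active thread (assumed) */

    /* 510-530: hardware switch - the outgoing thread is only blocked, so it suffices to
     * write back dirty registers and clear the valid bits of the (now reused) table. */
    static void hardware_thread_switch(int old_thread)
    {
        for (int r = 0; r < K_REGS_PER_THREAD; r++) {
            MapEntry *e = &mapping_table[r];
            if (e->valid && e->dirty)                             /* 530: update only dirty registers */
                arf[old_thread][r] = prf[e->index];
            e->valid = false;                                     /* 520: clear the valid bits        */
            e->dirty = false;
        }
    }

    /* 540-570: software switch - the outgoing thread is suspended, so every mapped
     * register is written back and the thread is dropped from the thread control register. */
    static void software_thread_switch(int old_thread)
    {
        for (int r = 0; r < K_REGS_PER_THREAD; r++) {
            MapEntry *e = &mapping_table[r];
            if (e->valid)                                         /* 550: write back all mapped registers */
                arf[old_thread][r] = prf[e->index];
            e->valid = false;                                     /* 560: clear the valid bits            */
            e->dirty = false;
        }
        thread_control &= ~(1u << old_thread);                    /* 570: remove thread from active set   */
    }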
In summary, the embodiments presented above are useful for all MT implementations, including CGMT. The register mapping described herein significantly reduces CGMT overheads since ARFs are not duplicated per thread and recovery of architectural registers is done on demand. Retrieval of registers for a new thread may be performed during the thread switch time (i.e. while the machine front-end is fetching instructions from the new thread). Suspended threads naturally provide physical register victim candidates. Avoiding full ARF duplication results in a significant reduction in area (die size) and in energy consumption. The thread switch time is significantly shortened relative to a full ARF save and restore and may be performed primarily in the background of execution.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant multithreading implementations, register files, architectural registers, physical registers, register mapping implementations and register access operations will be developed and the scope of the terms multithreading, register file, architectural register, physical register, register mapping, register access and register access request is intended to include all such new technologies a priori.
The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". This term encompasses the terms "consisting of and "consisting essentially of.
The phrase "consisting essentially of means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method. As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims

1. A system for handling a register accessing request, comprising:
an interface adapted to receive a plurality of register accessing requests;
a processing unit, connected to said interface and adapted to:
map dynamically a group of registers from a plurality of architectural registers to at least one of a plurality of physical registers based on at least one of recent usage and access frequency of each one of said plurality of architectural registers by a plurality of multithreading (MT) threads; and
look up a match for each one of said register accessing requests in said plurality of architectural registers when said match is not found in said plurality of physical registers.
2. The system of claim 1, wherein said plurality of MT threads submit said plurality of register accessing requests and are of a multi-threading processor.
3. The system of any of the previous claims, wherein said plurality of register accessing requests are received via at least one pipeline engine.
4. The system of any of the previous claims, wherein said plurality of architectural registers are stored in a static random access memory (SRAM).
5. The system of any of the previous claims, further comprising a memory adapted to store an access frequency dataset; wherein said processing unit is adapted to update said access frequency dataset with a frequency of access to respective registers and to perform said mapping according to said access frequency dataset.
6. The system of any of the previous claims, further comprising a memory adapted to store a recent usage dataset; wherein said processing unit is adapted to update said recent usage dataset with said recent usage and to perform said mapping according to said recent usage dataset.
7. The system of claim 6, wherein said recent usage dataset comprises a plurality of records, each documenting a recent usage of each one of said plurality of MT threads to said plurality of architectural registers.
8. The system of any of claims 6-7, wherein said recent usage dataset comprises respective allocation states of said plurality of architectural registers.
9. The system of any of claims 6-8, wherein said plurality of architectural registers maps to an allocation of suspended and running threads of said plurality of MT threads and said plurality of physical registers maps to an allocation of running threads of said plurality of MT threads.
10. The system of any of claims 6-9, wherein said processing unit is adapted to update said recent usage dataset when switching an allocation of any of said plurality of physical registers from one of said plurality of MT threads to another of said plurality of MT threads.
11. The system of any of the previous claims, wherein said processing unit is adapted to map said plurality of architectural registers to said plurality of MT threads.
12. The system of any of the previous claims, wherein said processing unit is adapted to switch mapping of any of said plurality of architectural registers from one of said plurality of MT threads to another of said plurality of MT threads.
13. The system of any of the previous claims, wherein said processing unit is adapted to set a respective state of physical registers mapped to an active thread to available when said active thread is inactivated by a switch to a different thread.
14. A method for handling a register accessing request, comprising:
receiving a plurality of register accessing requests;
mapping dynamically a group of registers from a plurality of architectural registers to at least one of a plurality of physical registers based on at least one of recent usage and access frequency of each one of said plurality of architectural registers by a plurality of multithreading (MT) threads; and looking up a match for each one of said register accessing requests in said plurality of architectural registers when said requested register is not found in said mapping of said plurality of physical registers.
15. The method of claim 14, further comprising: monitoring said at least one of recent usage and access frequency by recording said plurality of register accessing requests which are received via at least one pipeline engine.
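By way of a further hedged illustration (again not an implementation of the claims), the short fragment below models the request flow described in claims 1, 5, 14 and 15 on top of the register cache sketched after the description: every register accessing request updates an access frequency dataset, the lookup is attempted in the physical registers first, and a miss falls back to the architectural registers held in the backing store. RequestHandler and handle_request are assumed names introduced for this example.

# Illustrative only; the Counter below is one possible realisation of an
# "access frequency dataset" and handle_request is an assumed entry point.
from collections import Counter

class RequestHandler:
    def __init__(self, register_cache):
        self.cache = register_cache        # e.g. the RegisterCache sketched earlier
        self.freq = Counter()              # access frequency dataset (cf. claim 5)

    def handle_request(self, thread, arch_reg):
        # Monitor usage by recording every received request (cf. claim 15).
        self.freq[(thread, arch_reg)] += 1
        # Physical registers are tried first; on a miss the cache recovers the
        # value from the architectural registers in the backing store (cf. claim 1).
        return self.cache.read(thread, arch_reg)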

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2015/068977 WO2017028909A1 (en) 2015-08-18 2015-08-18 Shared physical registers and mapping table for architectural registers of multiple threads
CN201580082261.5A CN107851006B (en) 2015-08-18 2015-08-18 Multithreaded register map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2015/068977 WO2017028909A1 (en) 2015-08-18 2015-08-18 Shared physical registers and mapping table for architectural registers of multiple threads

Publications (1)

Publication Number Publication Date
WO2017028909A1 true WO2017028909A1 (en) 2017-02-23

Family

ID=54007684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/068977 Ceased WO2017028909A1 (en) 2015-08-18 2015-08-18 Shared physical registers and mapping table for architectural registers of multiple threads

Country Status (2)

Country Link
CN (1) CN107851006B (en)
WO (1) WO2017028909A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210132985A1 (en) * 2019-10-30 2021-05-06 Advanced Micro Devices, Inc. Shadow latches in a shadow-latch configured register file for thread storage
CN112445616B (en) * 2020-11-25 2023-03-21 海光信息技术股份有限公司 Resource allocation method and device
CN113626205B (en) * 2021-09-03 2023-05-12 海光信息技术股份有限公司 Processor, physical register management method and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794214B (en) * 2009-02-04 2013-11-20 世意法(北京)半导体研发有限责任公司 Register renaming system using multi-block physical register mapping table and method thereof
US8479176B2 (en) * 2010-06-14 2013-07-02 Intel Corporation Register mapping techniques for efficient dynamic binary translation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138338A1 (en) * 2003-12-18 2005-06-23 Intel Corporation Register alias table cache
US8200949B1 (en) * 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
WO2011147727A1 (en) * 2010-05-27 2011-12-01 International Business Machines Corporation Improved register allocation for simultaneous multithreaded processors
US20120216004A1 (en) * 2011-02-23 2012-08-23 International Business Machines Corporation Thread transition management
US20130086364A1 (en) * 2011-10-03 2013-04-04 International Business Machines Corporation Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand Last-User Information
US20140122841A1 (en) * 2012-10-31 2014-05-01 International Business Machines Corporation Efficient usage of a register file mapper and first-level data register file

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11294683B2 (en) * 2020-03-30 2022-04-05 SiFive, Inc. Duplicate detection for register renaming
US11640301B2 (en) 2020-03-30 2023-05-02 SiFive, Inc. Duplicate detection for register renaming

Also Published As

Publication number Publication date
CN107851006B (en) 2020-12-04
CN107851006A (en) 2018-03-27

Legal Events

Code Description
121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15754162; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122: Ep: pct application non-entry in european phase (Ref document number: 15754162; Country of ref document: EP; Kind code of ref document: A1)