
CN119201234A - Instruction processing method, processor and electronic device - Google Patents


Info

Publication number
CN119201234A
CN119201234A (application CN202411266817.3A)
Authority
CN
China
Prior art keywords
instruction
cache
queue
unit
fetch
Prior art date
Legal status
Pending
Application number
CN202411266817.3A
Other languages
Chinese (zh)
Inventor
张克松
Current Assignee
Hygon Information Technology Co Ltd
Original Assignee
Hygon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hygon Information Technology Co Ltd filed Critical Hygon Information Technology Co Ltd
Priority to CN202411266817.3A
Publication of CN119201234A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract


At least one embodiment of the present disclosure provides an instruction processing method, a processor, and an electronic device. The instruction processing method includes: extracting an object instruction according to an instruction prefetch request, caching the extracted object instruction in an instruction cache queue; in response to instruction release information from a pipeline of a processor core, filling the object instruction from the instruction cache queue into an instruction cache unit, wherein the instruction cache unit is configured to be directly accessed by the processor core. The instruction processing method can reduce the pollution of the instruction cache unit by instructions prefetched based on branch prediction, thereby improving the performance of the processor.

Description

Instruction processing method, processor and electronic device
Technical Field
Embodiments of the present disclosure relate to an instruction processing method, a processor, and an electronic device.
Background
Processor cores in single-core and multi-core processors increase instruction-level parallelism (Instruction Level Parallelism, ILP) through pipelining.
FIG. 1 shows a schematic diagram of a pipeline of a processor core.
As shown in FIG. 1, the processor core includes a plurality of pipeline stages; the broken lines with arrows in the figure represent redirected instruction streams. For example, after program counters from various sources feed into the pipeline, the next Program Counter (PC) is selected through a multiplexer (Mux), and the instruction corresponding to that program counter goes through branch prediction (Branch Prediction), instruction fetch (Instruction Fetch), instruction decode (Decode), instruction dispatch and renaming (Dispatch and Rename), instruction execution (Execute), instruction retirement (Retire), and so on. Decoupling queues, typically first-in-first-out (FIFO) queues, are provided as needed between the pipeline stages. For example, a Branch Prediction (BP) FIFO queue is provided after the branch prediction unit to store branch prediction results, an instruction cache (Instruction Cache, IC) FIFO is provided after the instruction fetch unit to cache fetched instructions, a Decode (DE) FIFO is provided after the instruction decode unit to cache decoded instructions, and a Retire (RT) FIFO is provided after the instruction dispatch and rename unit to cache executed instructions waiting for release confirmation.
Meanwhile, the pipeline of the processor core further includes an instruction queue that, after dispatch and renaming, buffers instructions waiting for the execution units to execute them. To support high operating frequencies, each pipeline stage may in turn contain multiple sub-stages (clock cycles). Although each pipeline stage then performs only a limited amount of work, the clock period can be minimized, improving the performance of the CPU core by raising its operating frequency. Each pipeline stage may also accommodate more instructions to further improve the performance of the processor core, i.e., superscalar technology.
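The decoupled pipeline described above can be sketched as a chain of stages separated by FIFO queues. The following is a toy Python model for illustration only; the stage names, queue depths, and single-entry-per-clock behavior are simplifying assumptions, not details from the disclosure.

```python
from collections import deque

class Stage:
    """One pipeline stage, decoupled from its downstream neighbor by a FIFO.

    Hypothetical model: each clock tick moves at most one entry from this
    stage's input FIFO to the downstream FIFO, but only if there is room.
    """
    def __init__(self, out_fifo, depth=4):
        self.in_fifo = deque(maxlen=depth)   # decoupling FIFO (cf. BP/IC/DE/RT FIFOs)
        self.out_fifo = out_fifo

    def tick(self):
        # Advance one clock: consume one entry if the downstream FIFO has room.
        if self.in_fifo and len(self.out_fifo) < self.out_fifo.maxlen:
            self.out_fifo.append(self.in_fifo.popleft())

retired  = deque(maxlen=16)                  # instructions leaving the pipeline
dispatch = Stage(retired)
decode   = Stage(dispatch.in_fifo)
fetch    = Stage(decode.in_fifo)
bp       = Stage(fetch.in_fifo)

bp.in_fifo.extend(["pc0", "pc1", "pc2"])     # predicted program counters
for _ in range(6):                           # run six clocks
    for stage in (dispatch, decode, fetch, bp):   # drain back to front
        stage.tick()
# After six clocks, all three PCs have drained through the four stages.
```

The back-to-front tick order models how each stage makes room for its upstream neighbor within the same clock, which is why the FIFOs decouple stalls between stages.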
Disclosure of Invention
At least one embodiment of the present disclosure provides an instruction processing method including fetching an object instruction according to an instruction prefetch request, caching the fetched object instruction in an instruction cache queue, and filling the object instruction from the instruction cache queue into an instruction cache unit in response to instruction release information from a pipeline of a processor core, wherein the instruction cache unit is configured to be directly accessed by the processor core.
At least one embodiment of the present disclosure provides a processor including control logic, an instruction fetch unit, an instruction release unit, an instruction cache unit, and an instruction cache queue. The instruction fetching unit is configured to fetch an object instruction according to an instruction prefetch request, cache the fetched object instruction in the instruction cache queue, the instruction release unit is configured to provide instruction release information, and the control logic is configured to fill the object instruction from the instruction cache queue into the instruction cache unit in response to the instruction release information, wherein the instruction cache unit is configured to be directly accessed by the instruction fetching unit.
At least one embodiment of the present disclosure provides an electronic device comprising a processor according to any one of the embodiments of the present disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 shows a schematic diagram of a pipeline of a processor core.
FIG. 2 illustrates a schematic diagram of the instruction fetch portion of a processor core that involves branch prediction.
FIG. 3 illustrates a schematic diagram of a processor core in accordance with at least one embodiment of the present disclosure.
Fig. 4 illustrates an exemplary flow chart of an instruction processing method in accordance with at least one embodiment of the present disclosure.
FIG. 5 illustrates a flow diagram of a secondary cache backfilling operation in accordance with at least one embodiment of the present disclosure.
FIG. 6 illustrates an exemplary operational flow diagram of an instruction cache unit update in an instruction processing method in accordance with at least one embodiment of the present disclosure.
Fig. 7 illustrates an exemplary operational flow diagram of a snoop operation in an instruction processing method in accordance with at least one embodiment of the present disclosure.
Fig. 8 illustrates an exemplary flow chart of an instruction processing method in accordance with at least one further embodiment of the present disclosure.
Fig. 9 shows a schematic diagram of a processor front-end architecture according to one embodiment of the present disclosure.
FIG. 10 illustrates a schematic diagram of a relationship between various queues included in a processor core in at least one embodiment of the present disclosure.
Fig. 11 is a schematic diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the words "comprising," "comprises," and the like mean that the elements or items preceding the word encompass the elements or items listed after the word and their equivalents, without excluding other elements or items. The term "connected" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
In existing processor (also referred to herein as processor core or CPU core) architectures, both programs and data are stored in memory (e.g., DRAM), so programs contain a large number of memory read instructions (Load instructions). Because the operating frequency of the processor core is far higher than that of the memory, hundreds of processor-core clock cycles are required to obtain data from memory; this often leaves the processor core idle because dependent instructions cannot continue to run, causing a performance loss. High-performance processor cores therefore include multiple levels of caches (e.g., a level-one cache L1, a level-two cache L2, etc.) to reduce memory access latency and speed up the processor core. However, when reading data that has never been accessed, or data that has been evicted because of cache capacity limits, the processor core still needs to wait tens or even hundreds of clock cycles, which results in a performance loss.
Among the multiple cache levels, the level-one cache (L1 Cache) is typically integrated directly within the processor core and operated directly by it. The L1 cache is very fast but has a relatively small capacity, typically between a few KB and a few tens of KB, and is further divided into an instruction cache (IC, for storing instructions to be executed) and a data cache (DC, for storing data being processed); it holds the data and instructions most frequently accessed by the processor core, reducing the number of reads from slower caches or memory. The level-two cache (L2 Cache, or simply L2) may be integrated in the processor core or located outside it; it is slower than the L1 cache but still fast enough to respond quickly to the CPU's read and write requests, and its capacity is larger than that of the L1 cache. The L2 cache is generally not subdivided into instruction and data caches; it stores data and instructions that are not found in the L1 cache but have been used recently or are likely to be used again soon. The level-three cache (L3 Cache) is usually located in the CPU package, although it may be separate from the processor cores or shared among them in a multi-core processor. It is slower than the L2 cache but still much faster than main memory, and its capacity is larger, typically between a few MB and tens of MB; it further expands the cache capacity, storing data that is not found in the L2 cache but may be accessed again in the near future. In multi-core processors, the L3 cache is typically designed as a shared cache accessible by all cores, which reduces cache coherency issues.
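The lookup-and-backfill behavior of such a cache hierarchy can be sketched as follows. This is a toy Python model; the per-level latencies are illustrative assumptions, not figures from the disclosure.

```python
# Each level: (name, access latency in cycles, contents). DRAM's store of
# None means it always "hits". Latencies are invented for illustration.
LEVELS = [("L1", 4, {}), ("L2", 14, {}), ("L3", 40, {}), ("DRAM", 200, {})]

def read(addr):
    """Search each level in order; on a hit, backfill every faster level."""
    cycles = 0
    for i, (name, latency, store) in enumerate(LEVELS):
        cycles += latency
        if name == "DRAM" or addr in store:   # main memory always hits
            for _, _, upper in LEVELS[:i]:    # backfill the levels above
                upper[addr] = True
            return name, cycles

first  = read(0x40)   # cold miss: traverses L1, L2, L3, then DRAM
second = read(0x40)   # the line was backfilled, so L1 now hits
```

The model shows why eviction and pollution matter: every line occupying L1 that is never used pushes some other line down to a level whose access costs an order of magnitude more cycles.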
For example, each cache may include a cache miss (Miss) queue: when a read/write request or prefetch request misses in the cache, the request must be forwarded to the next-level cache or to memory, and the request and its corresponding attributes are held in the cache miss queue until the next-level cache or memory returns the data or instruction the request was aimed at.
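The miss queue described above can be sketched as a small table of outstanding requests. The field names below are hypothetical, chosen only to illustrate the allocate/merge/refill life cycle.

```python
class MissQueue:
    """Cache miss queue sketch: a missing request is parked with its
    attributes until the next-level cache or memory returns the line."""
    def __init__(self):
        self.pending = {}                        # line address -> attributes

    def allocate(self, line, attrs):
        # A second miss on the same line merges with the outstanding entry
        # rather than issuing a duplicate request downstream.
        self.pending.setdefault(line, attrs)

    def on_refill(self, line):
        # The next level returned the line: release and wake the waiter.
        return self.pending.pop(line, None)

mq = MissQueue()
mq.allocate(5, {"prefetch": True})
mq.allocate(5, {"prefetch": False})   # duplicate miss merges with the first
woken = mq.on_refill(5)               # attributes of the original request
```

Merging duplicate misses is what keeps a burst of fetches to the same line from multiplying traffic to the next level.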
In addition, a high-performance processor core not only includes a multi-level cache architecture to store recently accessed data, but also uses prefetchers to identify patterns in the processor's data and instruction accesses, so that data and instructions about to be accessed can be prefetched into a cache in advance. If instructions are prefetched, the operation is called instruction prefetching and the corresponding prefetcher is an instruction prefetcher; if data is prefetched, the corresponding prefetcher is a data prefetcher. The latter may be further subdivided by target cache level into L1 prefetchers (prefetching into the level-one cache), L2 prefetchers (prefetching into the level-two cache), LLC data prefetchers (prefetching into the last-level cache (Last Level Cache, LLC)), and so on.
Currently, there are various instruction prefetching methods, such as sequentially prefetching the next N cache lines (or the next one cache line), methods based on branch prediction, pattern-based instruction prefetching, prefetching based on the characteristics of the cache level in which the instruction resides, and so on. Some processors employ a branch prediction based instruction prefetching method (Fetch-Directed Instruction Prefetch, FDIP), which can issue instruction prefetch requests relatively accurately.
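The FDIP idea, prefetching along the predicted fetch path rather than sequentially, can be sketched as follows. This is an illustrative assumption of the mechanism, not the patent's concrete design; the line size and address values are invented.

```python
LINE = 64  # assumed cache-line size in bytes

def fdip_prefetch(predicted_pcs, icache_lines):
    """Fetch-directed instruction prefetch sketch: for each fetch address on
    the predicted path, issue a prefetch if its cache line misses in the
    instruction cache and has not already been requested."""
    requests = []
    for pc in predicted_pcs:
        line = pc // LINE
        if line not in icache_lines and line not in requests:
            requests.append(line)   # would be enqueued toward the L2 cache
    return requests

# The predicted path crosses four fetch addresses; line 2 is already cached.
reqs = fdip_prefetch([0x80, 0xC0, 0x100, 0x140], icache_lines={2})
```

The accuracy of the branch predictor directly bounds the accuracy of these requests, which is why a misprediction turns the same mechanism into a source of wrong-path cache pollution.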
FIG. 2 illustrates a schematic diagram of the instruction fetch portion of a processor core that involves branch prediction.
As shown in FIG. 2, the front end of the processor core includes an instruction fetch unit, a decode unit, an execution unit, an instruction release unit, a branch prediction unit, an instruction cache unit (the L1 instruction cache), control logic, and so on; other components are omitted. The control logic is placed wherever the corresponding control needs to be implemented, for example branch prediction address queue write control, branch prediction address queue release control, prefetch control, instruction fetch control, instruction write-back control, and miss instruction fetch control. The processor core may include a built-in secondary cache unit (L2 Cache) or be coupled to an external secondary cache unit.
The instruction fetch unit fetches instructions based on the branch prediction unit's output, obtains instruction data, and provides it to the decode unit; after decoding, the instructions are provided to the execution unit for execution, and execution information is provided to the instruction release unit. The instruction release unit generates instruction release information from the commit-related information of instructions, to control whether entries of the branch prediction address information queue are released.
As shown in FIG. 2, the branch prediction unit is coupled to the instruction release unit and the instruction fetch unit; it receives instruction release information from the instruction release unit and, after performing branch prediction, provides branch prediction information to the instruction fetch unit. For example, the branch prediction unit includes a branch prediction component and a branch prediction address information queue, and the control logic accordingly provides branch prediction address queue release control and branch prediction address queue write control. Within the branch prediction unit, the branch prediction component performs branch prediction, and the virtual address of each predicted fragment (i.e., instruction fragment) obtained by branch prediction is stored in the branch prediction address information queue under branch prediction address queue write control. The physical address obtained after address translation may be stored in the queue at the same time. The branch prediction address information is written into the branch prediction address information queue and is also passed to the instruction fetch unit as fetch information.
As shown in FIG. 2, the instruction fetch unit is coupled to the instruction cache unit and includes a fetch information queue, a fetch request queue, etc., and accordingly the control logic provides fetch control, prefetch control, miss fetch control, etc.
After receiving branch prediction address information from the branch prediction unit, the instruction fetch unit fills the fetch information into the fetch information queue, and may perform a prefetch operation under prefetch control while doing so. In addition, requests to the secondary cache unit (L2 Cache), whether issued by an instruction prefetch operation or caused by a miss (Miss) in the instruction cache unit during normal instruction fetch, are stored in the fetch request queue while the secondary cache, a lower-level cache, or memory returns the required instruction data; fetch requests caused by instruction cache misses are handled under miss fetch control.
When the secondary cache unit obtains instruction data and the data is backfilled into the instruction cache unit under instruction write-back control, the information stored in the fetch request queue wakes up the fetch information queue, so that the fetch operation corresponding to the entry at the head of the fetch information queue can obtain the instruction data directly from the instruction backfill bus. Otherwise, if the currently backfilled instruction data does not correspond to the head of the fetch information queue, the data is backfilled into the instruction cache unit and later read out of the instruction cache unit again to complete the corresponding fetch operation.
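The two backfill outcomes just described, forwarding from the backfill bus versus parking in the instruction cache, can be sketched as a single decision. The function and field names below are illustrative, not taken from the patent.

```python
from collections import deque

def on_backfill(line, fetch_info_queue, icache):
    """Backfill handling sketch: if the backfilled line serves the request at
    the head of the fetch information queue, it is taken straight from the
    backfill bus; otherwise it is written into the instruction cache to be
    read out again later."""
    if fetch_info_queue and fetch_info_queue[0] == line:
        fetch_info_queue.popleft()           # the woken entry consumes the bus data
        return "forwarded_from_bus"
    icache.add(line)                         # park the line for a later read
    return "written_to_icache"

queue, icache = deque([7, 9]), set()
r1 = on_backfill(7, queue, icache)   # matches the head entry
r2 = on_backfill(11, queue, icache)  # does not match the head (9)
```

The second path is the expensive one the disclosure targets: every line that bypasses the bus must later be re-read from the instruction cache's storage array, costing extra cycles and power.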
In an architecture implementing the branch prediction based instruction prefetching method described above, instruction data is either taken directly from the instruction backfill bus of the secondary cache unit or read from the storage array (i.e., SRAM array) of the cache itself. In theory, the best case is taking instruction data directly from the backfill bus of the secondary cache unit: this reduces the power consumed by accessing the storage array and lets the instruction data be read earlier, improving fetch efficiency. In practice, however, because of branch prediction a large portion of the instruction data is written into the cache early, so the cache has to be read repeatedly in order to fetch the instruction data. This both increases the power consumption of read operations and reduces the fetch efficiency of the CPU core. Moreover, branch mispredictions occur frequently in processors, and instruction prefetching based on branch prediction can hardly avoid prefetching down the wrong path of a misprediction, which pollutes the instruction cache and causes more cache query misses. Furthermore, aggressive prefetch operations can pollute the instruction cache even more severely.
At least one embodiment of the present disclosure provides a processor, an instruction processing method, and an electronic device.
A processor in accordance with at least one embodiment of the present disclosure includes control logic, an instruction fetch unit, an instruction release unit, an instruction cache unit L1, and an instruction cache queue L0. The instruction fetch unit is configured to fetch an object instruction according to an instruction prefetch request and cache the fetched object instruction in the instruction cache queue L0; the instruction release unit is configured to provide instruction release information; and the control logic is configured to fill the object instruction from the instruction cache queue L0 into the instruction cache unit L1 in response to the instruction release information.
Accordingly, an instruction processing method according to at least one embodiment of the present disclosure includes fetching an object instruction according to an instruction prefetch request, buffering the fetched object instruction in an instruction cache queue L0, and filling the object instruction from the instruction cache queue L0 into an instruction cache unit L1 in response to instruction release information from a processor pipeline.
The term "object instruction" is used herein to refer to any instruction that is the object of the description. The instruction cache unit L1 is configured to be directly accessed by the processor core, while the secondary cache, correspondingly, is not directly accessed by the processor core. For example, if a fetch by the processor core misses in the instruction cache unit, the instruction cache unit requests the required instruction data from the secondary cache and returns it to the processor core once obtained; that is, the processor core accesses the secondary cache through the instruction cache unit.
The above-described embodiments of the present disclosure add an instruction cache queue (which may be understood here as a "zero-level (L0) instruction cache") alongside the instruction cache unit in a processor (or processor core); in at least one example of an embodiment, the time at which instructions are backfilled into the instruction cache unit can further be controlled, improving fetch efficiency and avoiding pollution of the instruction cache unit by prefetched instructions. Because the instruction cache queue is added at the front end of the processor core, the control logic of the processor is extended accordingly. This control logic, which may be centralized or distributed, implements the various functions of the processor core in embodiments of the present disclosure.
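The core idea, prefetched lines park in the L0 queue and are promoted into L1 only once the pipeline confirms the instruction, can be sketched as follows. This is a behavioral Python model under stated assumptions; the class and method names are illustrative, and the depth is an invented toy value.

```python
from collections import deque

class FrontEnd:
    """Sketch of the disclosed scheme: backfilled lines land in an L0
    instruction cache queue and are filled into the L1 instruction cache
    only on release information from the pipeline, so wrong-path prefetches
    never pollute L1."""
    def __init__(self, l0_depth=8):
        self.l0 = deque(maxlen=l0_depth)  # instruction cache queue (L0)
        self.l1 = set()                   # instruction cache unit (L1)

    def backfill_from_l2(self, line):
        self.l0.append(line)              # prefetch result parks in L0

    def on_release(self, line):
        # Pipeline reported the instruction released: promote its line to L1.
        if line in self.l0:
            self.l0.remove(line)
            self.l1.add(line)

fe = FrontEnd()
for line in (1, 2, 3):                    # line 3 is a wrong-path prefetch
    fe.backfill_from_l2(line)
fe.on_release(1)
fe.on_release(2)                          # line 3 is never released
```

Because the wrong-path line is only ever evicted from the bounded L0 queue, the L1 instruction cache retains exactly the lines the committed instruction stream actually used.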
A processor according to embodiments of the present disclosure may include a single processor core or include multiple processor cores, for example, and may be provided in the form of a separately packaged processor, or integrated with other functional components in the form of a system on a chip (SOC), for example. For example, the processor core may employ microarchitecture as required by the x86, ARM, RISC-V, MIPS instruction set, as embodiments of the present disclosure are not limited in this respect.
The processor and instruction processing method of the embodiments of the present disclosure will be described below with reference to specific examples.
FIG. 3 illustrates a schematic diagram of a processor core in accordance with at least one embodiment of the present disclosure, primarily showing the front end of the processor core, which relates to the instruction fetch portion involving branch prediction.
As shown in FIG. 3, the processor core includes an instruction fetch unit, a decode unit, an execution unit, an instruction release unit, a branch prediction unit, an instruction cache queue, control logic, and so on; other components are omitted. The control logic is placed wherever the corresponding control needs to be implemented, for example branch prediction address queue write control, branch prediction address queue release control, prefetch control, instruction fetch control, instruction cache queue write-back control, instruction cache unit write-back control, and miss (Miss) instruction fetch control. The processor core may include a built-in secondary cache unit (or simply "secondary cache"), or be coupled to a secondary cache unit external to the processor core.
The instruction fetch unit fetches instructions based on the branch prediction unit's output, obtains instruction data, and provides it to the decode unit; after decoding and execution, instruction-end information is provided to the instruction release unit, which generates instruction release information from the commit-related information of instructions to control whether entries of the branch prediction address information queue are released.
The control logic may be implemented by a micro-program or hard-wired in embodiments of the present disclosure, which embodiments of the present disclosure are not limited to. The execution unit includes a plurality of different types of functional units to handle different types of operations, such as integer operations, floating point operations, vector operations, etc., for example, the execution unit may include an Arithmetic Logic Unit (ALU), a floating point operation unit (FPU), a vector execution unit (Vector Execution Unit), a load/store execution unit (LSU), a special function execution unit (Special Function Execution Unit), etc., as embodiments of the disclosure are not limited in this respect.
As shown in FIG. 3, the branch prediction unit is coupled to the instruction release unit and the instruction fetch unit; it receives instruction release information from the instruction release unit and, after performing branch prediction, provides branch prediction information to the instruction fetch unit. For example, the branch prediction unit includes a branch prediction component and a branch prediction address information queue, and the control logic accordingly provides branch prediction address queue release control and branch prediction address queue write control. Within the branch prediction unit, the branch prediction component performs branch prediction, and the virtual address of each predicted fragment (i.e., instruction fragment) obtained by branch prediction is stored in the branch prediction address information queue under branch prediction address queue write control. The physical address obtained after address translation may be stored in the queue at the same time. The branch prediction address information is written into the branch prediction address information queue and is also passed to the instruction fetch unit as fetch information.
As shown in fig. 3, the instruction fetch unit is coupled to the instruction cache unit and the instruction cache queue, and includes a fetch information queue, a fetch request queue, and the like, and accordingly the control logic provides fetch control, prefetch control, miss fetch control, and the like.
The instruction cache unit and the instruction cache queue are coupled to each other and both cache instruction data (not the application data processed during instruction execution). The instruction cache queue is further coupled to the secondary cache unit, and the instruction cache unit may also be coupled to the secondary cache unit as needed, so that both can receive backfilled instructions directly from the secondary cache unit. Accordingly, the control logic provides instruction cache unit write-back control for backfilling instruction data from the instruction cache queue into the instruction cache unit, and instruction cache queue write-back control for backfilling instruction data from the secondary cache unit into the instruction cache queue.
For example, in at least one embodiment of the present disclosure, the instruction cache unit and the secondary cache unit may be implemented in a conventional manner, and the instruction cache queue may be implemented (in both structure and control) like a conventional cache unit, for example using static random access memory (SRAM) with the cache line as the basic unit of data storage; the address mapping may be direct-mapped, fully associative, set-associative, and so on. Embodiments of the present disclosure do not limit the implementation of the instruction cache queue, the instruction cache unit, or the secondary cache unit.
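The address mapping mentioned above can be illustrated with a set-associative split of an address into a set index and a tag. The geometry below (64-byte lines, 4 sets) is a toy assumption for illustration, not a parameter from the disclosure.

```python
# Assumed toy geometry: 64-byte cache lines, 4 sets.
LINE_BYTES, NUM_SETS = 64, 4

def index_and_tag(addr):
    """Split an address into (set index, tag), as in set-associative mapping,
    one of the mapping styles the embodiment permits for these caches."""
    line = addr // LINE_BYTES        # drop the byte-within-line offset
    return line % NUM_SETS, line // NUM_SETS

split_a = index_and_tag(0x000)   # first line, set 0
split_b = index_and_tag(0x040)   # next line maps to the next set
split_c = index_and_tag(0x140)   # same set as 0x040, different tag
```

Direct mapping is the degenerate case with one way per set, and full associativity the case with a single set; the choice trades lookup cost against conflict misses.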
After receiving branch prediction address information from the branch prediction unit, the instruction fetch unit fills the fetch information into the fetch information queue, and may perform a prefetch operation under prefetch control while doing so. In addition, requests to the secondary cache unit (L2 Cache), whether issued by an instruction prefetch operation or caused by a miss (Miss) in the instruction cache unit during normal instruction fetch, are stored in the fetch request queue while the secondary cache, a lower-level cache, or memory returns the required instruction data; fetch requests caused by instruction cache misses are handled under miss fetch control.
When the secondary cache unit obtains instruction data and the data is backfilled into the instruction cache unit under instruction write-back control, the information stored in the fetch request queue wakes up the fetch information queue, so that the fetch operation corresponding to the entry at the head of the fetch information queue can obtain the instruction data directly from the instruction backfill bus. Otherwise, if the currently backfilled instruction data does not correspond to the head of the fetch information queue, the data is backfilled into the instruction cache unit and later read out of the instruction cache unit again to complete the corresponding fetch operation.
For example, an instruction processing method according to at least one embodiment of the present disclosure further includes generating an instruction prefetch request according to the branch prediction result. For example, a branch prediction component of a branch prediction unit of a processor generates an instruction prefetch request based on a branch prediction result.
For example, in a processor and an instruction processing method according to at least one embodiment of the present disclosure, fetching an object instruction according to an instruction prefetch request includes acquiring an instruction prefetch address according to the instruction prefetch request, transmitting an access request to a secondary cache L2 using the instruction prefetch address in response to an instruction prefetch operation according to the instruction prefetch address, and writing the object instruction corresponding to the instruction prefetch address to an instruction cache queue L0 in response to an instruction data backfilling operation of the secondary cache L2.
For example, in a processor and an instruction processing method according to at least one embodiment of the present disclosure, fetching an object instruction according to an instruction prefetch request further includes determining whether to perform an instruction prefetch operation according to a prefetch algorithm.
For example, an instruction processing method according to at least one embodiment of the present disclosure further includes obtaining an object instruction according to a fetch request to the instruction cache unit L1 or the instruction cache queue L0, then sending the object instruction to the decode unit, and dispatching it to an execution unit. For example, after receiving the object instruction, the decode unit of the processor decodes it and provides the decoding result to the corresponding execution unit for execution. The decoding result includes micro instructions, which are dispatched by a dispatch unit (not shown) to an execution unit for execution.
For example, in a processor and an instruction processing method according to at least one embodiment of the present disclosure, obtaining an object instruction according to a fetch request to an instruction cache unit L1 or an instruction cache queue L0 includes writing the fetch request to a fetch information queue, and querying the instruction cache unit L1 or the instruction cache queue L0 according to the fetch information queue to obtain the object instruction.
The above-described processor and instruction processing method are described below with reference to specific embodiments.
Fig. 4 illustrates an exemplary flow chart of an instruction processing method in accordance with at least one embodiment of the present disclosure.
As shown in FIG. 4, first, (1) the branch prediction unit receives instruction release information from the instruction release unit, generates a branch prediction result according to a branch prediction algorithm, obtains a new instruction address, and stores the new instruction address into the branch prediction address information queue.
And (2) determine, based on the new instruction address, whether to perform a prefetch operation according to an instruction prefetch algorithm. If so, an instruction Miss queue entry is allocated to the prefetch request, the instruction prefetch address corresponding to the prefetch request is stored into the instruction Miss queue, and a fetch request operation is then initiated to the second-level cache to wait for its response.
And (3) storing the fetch information corresponding to the new instruction address into a fetch information queue. The fetch information at the head of the fetch information queue is the oldest fetch information.
And (4) reading the instruction fetch information at the head of the instruction fetch information queue, and judging whether the current instruction fetch operation occupies the instruction Miss queue.
(5) Based on the result of (4), if the instruction Miss queue is not occupied, the instruction cache queue will be looked up.
(6) Based on the result of the step (4), if the instruction Miss queue is occupied, continuing to judge whether the instruction data to be extracted by the current instruction fetching operation is in a backfilling process or a backfilled state of the secondary cache.
(7) Based on the result of (6), if the instruction data to be extracted by the current fetching operation is not in the backfilling process or the backfilling state of the secondary cache, the current fetching operation is stopped, and the fetching information is not read out from the fetching information queue.
(8) Based on the result of (6), if the instruction data to be extracted by the current instruction fetching operation is in the backfilling process or the backfilled state of the secondary cache, judging whether the instruction data is in the backfilled state.
Further, (9) based on the result of (8), if the currently required instruction data is in the backfilled state, the instruction cache queue is queried, the entry of instruction information corresponding to the fetch address is obtained in the instruction cache queue by indexing, and the required instruction data is read from the instruction cache queue. Otherwise, (10) based on the result of (8), if the currently required instruction data is not yet in the backfilled state, the required instruction data is obtained directly from the backfill bus running from the second-level cache L2 to the instruction cache unit.
(11) In either of cases (9) and (10), the instruction fetch is determined to be successful and this fetch operation is completed.
Further, in the above operation, (12) based on the result of (4), if the current fetch operation does not occupy the instruction Miss queue, the instruction cache queue is searched; if the lookup of the instruction cache queue hits, i.e., no query miss (Miss) occurs, the fetch is successful, which means that this fetch operation is completed.
Otherwise, (13) based on the result of (4), if the lookup of the instruction cache queue misses, i.e., a query miss (Miss) occurs, the instruction cache unit L1 is further searched; if the lookup of the instruction cache unit L1 hits, the fetch operation is completed. Otherwise, (14) based on the result of (13), a memory access request is sent to the instruction Miss queue.
(15) In either case of (2) or (14) above, it is necessary to read the memory request from the instruction Miss queue and send the memory request to the secondary cache.
In the above operation, if a memory access request is sent to the instruction Miss queue and a fetch operation is then initiated to the second-level cache, then after the response of the second-level cache is received, the queues, the instruction cache unit, the instruction cache queue, and so on need to be updated according to that response.
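The fetch-side decision of steps (4) through (11) can be sketched as follows. This is an illustrative software model only, not the patent's hardware implementation; the function name, field names, and the simplified state encoding ("backfilling"/"backfilled") are assumptions made for illustration.

```python
# Illustrative sketch of the fetch decision flow in steps (4)-(11).
# All names and the simplified state model are hypothetical.

def handle_fetch(entry, miss_queue):
    """Decide where the head-of-queue fetch gets its instruction data."""
    if not entry["occupies_miss_queue"]:              # steps (4)-(5): no Miss queue entry,
        return "lookup_instruction_cache_queue"       # look up the L0 instruction cache queue
    state = miss_queue.get(entry["addr"])             # step (6): check the L2 backfill state
    if state not in ("backfilling", "backfilled"):    # step (7): data not on its way yet,
        return "stall"                                # stall and keep the head entry queued
    if state == "backfilled":                         # steps (8)-(9): already written to L0,
        return "read_from_instruction_cache_queue"    # read it out of the L0 queue
    return "read_from_backfill_bus"                   # step (10): grab it off the backfill bus
```

For example, a fetch whose Miss queue entry is still waiting on L2 returns `"stall"`, matching step (7), where the fetch information is not popped from the fetch information queue.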
For example, in a processor and instruction processing method according to at least one embodiment of the present disclosure, sending an access request to the second-level cache L2 using the instruction prefetch address includes updating an instruction Miss queue of the instruction cache unit using the instruction prefetch address and then sending the access request to the second-level cache L2. Writing the object instruction corresponding to the instruction prefetch address into the instruction cache queue L0 in response to the instruction data backfill operation of the second-level cache L2 includes: waking up the instruction Miss queue in response to a backfill request of the second-level cache L2 for the object instruction, and determining whether to write the object instruction corresponding to the instruction prefetch address into the instruction cache queue L0 according to the record information corresponding to the object instruction in the instruction Miss queue.
For example, in a processor and an instruction processing method according to at least one embodiment of the present disclosure, the record information corresponding to the object instruction in the instruction Miss queue includes whether the entry corresponding to the object instruction is valid (Valid) and whether the request is cacheable or non-cacheable (Non-cacheable).
FIG. 5 illustrates a flow diagram of a secondary cache backfilling operation in accordance with at least one embodiment of the present disclosure.
As shown in FIG. 5, when the second-level cache backfills a piece of requested instruction data to the instruction cache queue, (1) the entry information corresponding to the instruction data to be backfilled is first checked in the instruction Miss queue, and the entry in the instruction Miss queue is woken up to start the data backfill process.
Then, (2) check whether the corresponding entry in the instruction Miss queue is valid (VALID). If the entry is not valid at this time (VALID value is 0), this piece of instruction data backfilled from the second-level cache L2 is not backfilled into the instruction cache queue.
Otherwise, (3) if the entry is valid at this time (VALID value is 1), the cache attribute of the fetch request is further checked. If the attribute is Non-Cacheable, this piece of instruction data backfilled from the second-level cache L2 is not backfilled into the instruction cache queue; otherwise, the instruction data is backfilled into the instruction cache queue.
In addition, (4) after the corresponding entry in the instruction Miss queue is woken up in (1) above, the corresponding entry in the fetch information queue may be woken up.
And (5) judge whether the woken fetch entry is located at the head of the fetch information queue; if so, the instruction data is obtained directly from the backfill bus of the second-level cache; otherwise, the subsequent indexing/query operation is performed.
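The validity and cacheability checks of steps (2) and (3) above can be sketched as a simple predicate. This is an illustrative model; the dictionary field names are hypothetical stand-ins for the VALID bit and the Non-Cacheable attribute of a Miss queue entry.

```python
# Illustrative sketch of the FIG. 5 backfill checks; field names are hypothetical.

def should_backfill_l0(miss_entry):
    """Return True if data backfilled from L2 should be written into the
    instruction cache queue (L0), per steps (2)-(3) of FIG. 5."""
    if not miss_entry["valid"]:          # step (2): entry invalid (VALID == 0), drop the data
        return False
    if miss_entry["non_cacheable"]:      # step (3): non-cacheable fetch, do not cache it
        return False
    return True                          # valid and cacheable: backfill into the L0 queue
```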
As described above, the processor of the embodiment of the present disclosure adds an instruction cache queue, which may be regarded as a level 0 cache, to the processor shown in fig. 2, for example, and accordingly, two cache systems of the instruction cache queue and the instruction cache unit need to be maintained.
FIG. 6 illustrates an exemplary operational flow diagram of an instruction cache unit update in an instruction processing method in accordance with at least one embodiment of the present disclosure.
As shown in FIG. 6, first, (1) the instruction release unit releases a number of branch prediction entries to the branch prediction unit, judges whether the number of released branch prediction entries is 0, if so, continues to monitor the instruction release unit, otherwise, starts the update operation of the instruction cache unit.
(2) After the update operation is started, determining the corresponding entry of the branch prediction entry released by the release unit to the branch prediction unit in the branch prediction address information queue through indexing or searching.
(3) On the basis of (2), for the determined branch prediction entry to be released, determining the physical position of the instruction information corresponding to the instruction fetch request, which needs to be backfilled into the instruction cache unit, according to the physical address.
Thereafter, (4) waking up a corresponding entry of the branch prediction entry in the instruction Miss queue based on (2) based on the branch prediction entry to be released.
(5) The "backfill process flag bit" in the corresponding entry in the instruction Miss queue is set high (i.e., 1, in the backfill process).
Thereafter, (6) search the instruction Miss queue for an entry whose "backfilled" and "valid" flag bits are 1 and whose "shared" flag bit is 0 (to reduce shared fetch requests, the "shared" flag is required to be 0 here).
(7) According to the result of (6), if such an entry exists, an entry of the instruction Miss queue to be backfilled into the instruction cache unit is determined, and the "backfill process" flag bit of the entry is set low (i.e., 0, the backfill process is completed).
And (8) determine the way (WAY) information to be backfilled into the instruction cache unit according to the set (SET) index of the instruction cache unit stored in the instruction Miss queue entry.
And (9) backfilling the corresponding instruction data in the instruction cache queue to the designated position in the instruction cache unit.
And (10) releasing the corresponding entry of the instruction Miss queue.
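Steps (6) through (10) of this update flow amount to selecting an eligible Miss queue entry and clearing its in-process flag. The sketch below is illustrative only; the entry fields are assumed names for the flag bits described above, and the real hardware would also carry the set/way information used in steps (8)-(9).

```python
# Illustrative sketch of steps (6)-(10) of the instruction cache unit update
# flow in FIG. 6; the entry field names are assumptions for illustration.

def pick_backfill_entry(miss_queue):
    """Find an entry ready to backfill from the L0 queue into the instruction
    cache unit: "backfilled" and "valid" are 1, "shared" is 0 (step (6))."""
    for entry in miss_queue:
        if entry["backfilled"] and entry["valid"] and not entry["shared"]:
            entry["backfill_in_process"] = False   # step (7): set the flag low (done)
            return entry                           # caller backfills using set/way info
    return None                                    # nothing eligible this cycle
```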
In the processor of the embodiments of the present disclosure, since there is an instruction cache queue, the snoop (SNOOP) management control scheme may need to be updated. Snoop operations are used primarily in cache coherency protocols to ensure coherency of data in multiprocessor or multi-cache systems. The snoop mechanism allows a cache controller to monitor communications between other caches or memory controllers, so that it can detect modifications to shared data and update its own cached copy as needed. This ensures that all processors see the most recent data and avoids problems caused by inconsistency.
The MESI protocol is a common cache coherency protocol whose name derives from the four possible states of a cache line: Modified, Exclusive, Shared, and Invalid. These states help manage the state of cache blocks across different processor caches to ensure data coherency.
For example, according to the MESI protocol, when a cache attempts to modify shared data, it sends a "Modify" request on the bus; other cache controllers, upon snooping the request, check whether they hold a copy of the data. If another cache does hold a copy, it marks that copy as Invalid, thereby ensuring that only the cache requesting the modification can modify the data.
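The write-invalidation behavior just described can be sketched in a few lines. This is a minimal model of the textbook MESI transition for a "Modify" broadcast, not the patent's specific coherence logic; caches are modeled as dictionaries from address to state letter.

```python
# Minimal sketch of MESI write-invalidation; a textbook model, not the
# patent's implementation. States: "M", "E", "S", "I".

def handle_modify_request(caches, writer, addr):
    """The writer broadcasts a 'Modify' request for addr; every other cache
    holding a copy invalidates it, and the writer's copy becomes Modified."""
    for name, cache in caches.items():
        if name != writer and addr in cache:
            cache[addr] = "I"            # snooping caches mark their copy Invalid
    caches[writer][addr] = "M"           # only the requester may now modify the line
```

For example, two caches sharing a line in state S end up with the writer in M and the other in I after one broadcast.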
For example, an instruction processing method according to at least one embodiment of the present disclosure further includes maintaining cache coherency between the instruction cache unit L1 and the instruction cache queue L0.
For example, in a processor and instruction processing method in accordance with at least one embodiment of the present disclosure, maintaining cache coherency between instruction cache unit L1 and instruction cache queue L0 includes maintaining cache coherency by a snoop operation issued by secondary cache L2. For example, the control logic of the secondary cache includes performing snoop operations to maintain cache coherency.
For example, in a processor and an instruction processing method according to at least one embodiment of the present disclosure, maintaining cache coherency through snoop operations issued by the second-level cache L2 includes: in response to a snoop operation hitting a first target entry of the instruction cache queue L0, clearing the valid bit in the first target entry and determining whether the cleared first target entry is performing a backfill operation to the instruction cache unit; and in response to a snoop operation hitting a second target entry of the instruction cache unit L1, clearing the valid bit in the second target entry.
Fig. 7 illustrates an exemplary operational flow diagram of a snoop operation in an instruction processing method in accordance with at least one embodiment of the present disclosure.
As shown in fig. 7, (1) a SNOOP (SNOOP) operation request is sent from the level two cache L2.
Then, (2) based on (1), it is determined whether the branch prediction address information queue is hit. If an entry in the branch prediction address information queue is hit, indicating that the cache line to be snooped is already in flight in the processor pipeline, the valid bit (VALID) of the entry is cleared, and the other instructions in the processor pipeline following that cache line are flushed (FLUSH).
(3) Based on (1), it is determined whether a cache line in the instruction cache queue is hit.
(4) On the basis of the step (3), if the cache line in the instruction cache queue is hit, the VALID bit (VALID) of the cache line in the instruction cache queue is cleared, whether the cache line in the instruction cache queue is backfilling the instruction cache unit is further judged, if the backfilling operation is being carried out, the backfilling bus is refreshed, and if not, the operation is abandoned.
(5) On the basis of (3), if the cache line in the instruction cache queue is not hit, the VALID bit (VALID) clearing operation of the present cache line is abandoned.
(6) Based on the step (1), judging whether a cache line in the instruction cache unit is hit, if the cache line in the instruction cache unit is hit, resetting the valid bit corresponding to the cache line in the instruction cache unit, otherwise, abandoning the monitoring request operation.
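The three snoop lookups of FIG. 7 can be sketched as one handler. This is an illustrative model; the three lookup targets are simplified to dictionaries, and the field names and action strings are assumptions, not the patent's signal names.

```python
# Illustrative sketch of the FIG. 7 snoop handling; structures and names
# are simplified assumptions.

def handle_snoop(addr, bp_addr_queue, l0_queue, l1_cache):
    """Apply one L2 snoop to the branch prediction address information queue,
    the instruction cache queue (L0), and the instruction cache unit (L1)."""
    actions = []
    entry = bp_addr_queue.get(addr)            # step (2): line already in the pipeline?
    if entry is not None:
        entry["valid"] = False                 # clear VALID, flush younger instructions
        actions.append("flush_pipeline_after_line")
    line = l0_queue.get(addr)                  # steps (3)-(4): hit in the L0 queue?
    if line is not None:
        line["valid"] = False                  # clear the line's VALID bit
        if line.get("backfilling_l1"):         # mid-backfill: refresh the backfill bus
            actions.append("flush_backfill_bus")
    if addr in l1_cache:                       # step (6): hit in the instruction cache unit?
        l1_cache[addr]["valid"] = False
        actions.append("invalidate_l1_line")
    return actions
```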
In at least one embodiment of the present disclosure, because of the added instruction cache queue, in some cases the fetch unit only needs to read the prefetched instruction into the instruction cache queue, while in other cases it needs to look up the instruction cache queue first and, on a miss, then look up the instruction cache unit. To better reduce the impact on timing, another embodiment of the present disclosure provides a modified instruction processing method, an exemplary flow chart of which is shown in FIG. 8.
Fig. 8 illustrates an exemplary flow chart of an instruction processing method in accordance with at least one further embodiment of the present disclosure. In a processor of at least one embodiment of the present disclosure, the control logic adjusts accordingly.
As shown in FIG. 8, first, (1) the branch prediction unit receives instruction release information from the instruction release unit, generates a branch prediction result according to a branch prediction algorithm, obtains a new instruction address, and stores the new instruction address into the branch prediction address information queue.
And (2) determine, based on the new instruction address, whether to perform a prefetch operation according to an instruction prefetch algorithm. If so, an instruction Miss queue entry is allocated to the prefetch request, the instruction prefetch address corresponding to the prefetch request is stored into the instruction Miss queue, and an instruction prefetch operation is then initiated to the second-level cache to wait for its response.
And (3) storing the fetch information corresponding to the new instruction address into a fetch information queue.
(4) Based on the step (2), if the prefetching operation is performed, the subsequent operations such as querying the instruction cache queue are abandoned.
(5) Based on (2), if no prefetch operation is performed, the instruction Miss queue is looked up based on the instruction address.
(6) Based on the result of (5), if the instruction Miss queue is not hit, the subsequent operations such as inquiring the instruction cache queue are abandoned, and if the instruction Miss queue is hit, the instruction Miss queue is marked to be occupied by the instruction fetching operation.
And (7) reading the instruction fetch information at the head of the instruction fetch information queue, and judging whether the current instruction fetch operation occupies the instruction Miss queue.
(8) Based on the result of (7), if the instruction Miss queue is not occupied, the instruction cache unit will be looked up.
(9) Based on the result of the step (7), if the instruction Miss queue is occupied, continuing to judge whether the instruction data to be extracted by the current instruction fetching operation is in a backfilling process or a backfilled state of the secondary cache.
(10) Based on the result of (9), if the instruction data to be extracted by the current fetching operation is not in the backfilling process or the backfilling state of the secondary cache, the current fetching operation is stopped, and the fetching information is not read out from the fetching information queue.
(11) Based on the result of the step (9), if the instruction data to be extracted by the current instruction fetching operation is in the backfilling process or the backfilled state of the secondary cache, judging whether the instruction data is in the backfilled state or not.
Further, (12) based on the result of (11), if the currently required instruction data is in the backfilled state, querying an instruction cache queue, and obtaining an entry of instruction information corresponding to the instruction fetching address in the instruction cache queue by indexing, and reading the required instruction data from the instruction cache queue.
And (13) if the currently required instruction data is not in the backfilled state based on the result of (11), directly acquiring the instruction data required for reading from the backfilling bus of the second-level cache L2 to the instruction cache unit.
(14) In either of cases (12) and (13), the instruction fetch is determined to be successful and this fetch operation is completed.
Further, in the above operation, (15) based on the result of (8), the instruction cache unit is searched; if the lookup of the instruction cache unit hits, i.e., no query miss (Miss) occurs, the fetch is likewise successful and this fetch operation is completed.
And (16) if the lookup of the instruction cache unit misses, i.e., a query miss (Miss) occurs, a memory access request is sent to the instruction Miss queue; if the lookup of the instruction cache unit hits, the fetch is successful and the fetch operation is completed.
(17) In either case of (2) or (16) above, it is necessary to read the memory request from the instruction Miss queue and send the memory request to the secondary cache.
Similarly, in the above operation, if a memory access request is sent to the instruction Miss queue and a fetch operation is then initiated to the second-level cache, then after the response of the second-level cache is received, the queues, the instruction cache unit, the instruction cache queue, and so on need to be updated according to that response.
In this processing method, the instruction Miss queue is searched when the fetch request enters the fetch information queue, so that it is determined ahead of the fetch process whether the instruction is read from the instruction cache queue or the instruction data is read from the instruction cache unit. Since fetch requests enter the fetch information queue faster than they are read out of it, this lookup can be performed before a request enters the fetch information queue.
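The modification relative to FIG. 4 is that the Miss queue probe happens at enqueue time, per steps (4) through (6) of FIG. 8. The sketch below is illustrative; the field names and the membership-test model of the Miss queue are assumptions for illustration.

```python
# Illustrative sketch of the modified flow: the instruction Miss queue is
# probed when the fetch request ENTERS the fetch information queue
# (steps (4)-(6) of FIG. 8), so the later head-of-queue read already knows
# its data source. Names are hypothetical.

def on_enqueue_fetch(entry, prefetched, miss_queue):
    if prefetched:                                    # step (4): a prefetch was issued,
        entry["occupies_miss_queue"] = True           # skip the L0 queue lookup later
        return
    hit = entry["addr"] in miss_queue                 # step (5): probe the Miss queue
    entry["occupies_miss_queue"] = hit                # step (6): mark occupancy on a hit
```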
In at least one embodiment of the present disclosure, a processor (or processor core) translates each architecture instruction (instruction) into one or more micro-instructions (uops) within a micro-architecture, each micro-instruction performing only limited operations, which may ensure that each pipeline stage is short to increase processor core operating frequency. For example, a load may be translated into an address generation micro instruction and a memory read micro instruction, where a second micro instruction depends on the result of the first micro instruction, so that the second micro instruction begins execution only after the first micro instruction has completed execution. The microinstructions include a plurality of microarchitectural-related fields that are used to communicate related information between the pipeline stages.
Accordingly, in this embodiment, the instruction cache system at the front end of the pipeline of the processor core includes a micro instruction cache (OC) in addition to the instruction cache unit (IC) and the instruction cache queue, and in order to better increase the respective effective capacities of the instruction cache unit and the micro instruction cache, it is necessary to maintain a mutual exclusion (exclusive) relationship between the two.
Fig. 9 shows a schematic diagram of a processor front-end architecture according to one embodiment of the present disclosure. As shown in fig. 9, the front-end architecture 10 of the processor includes a branch prediction unit 101, a branch prediction address information queue 102, an instruction cache unit 103, a decode unit 104, a microinstruction processing module 105, a microinstruction cache unit 106, a microinstruction queue 107, and an issue unit 108.
For the front end, the corresponding finger extraction method can be as follows.
The branch prediction unit 101 sends prediction information to the branch prediction address information queue 102 for caching to await processing of the prediction information.
The processor core initially enables the instruction cache mode to process the prediction information. For example, first, based on the address information in the prediction information from the branch prediction address information queue 102, the instruction data requested by the prediction information is fetched from the instruction cache unit 103 and sent to the decode unit 104 for decoding. Here, the instruction data may be continuous binary data. The decode unit 104 may decode the fetched instruction data into corresponding micro instruction groups (each including one or more micro instructions) and send the micro instruction groups to the micro instruction queue 107 to await dispatch (not shown in FIG. 2).
The decode unit 104 also provides the decoded micro instruction groups to the micro instruction cache unit 106 for caching. At this point, a micro instruction register entry may be created in the micro instruction cache unit 106 for storing the micro instructions. One or more micro instructions in the micro instruction group are cached in the created micro instruction register entry; for example, one micro instruction register entry may store 8 micro instructions. When caching a micro instruction, the micro instruction cache unit 106 determines whether the micro instruction is already present in it. For example, when the micro instruction is already present in the micro instruction cache unit 106, the micro instruction cache unit 106 may give cache hit (build hit) information, and when the micro instruction is not present, the micro instruction cache unit 106 may give cache miss (build miss) information.
The processor core determines whether to enable the micro instruction cache fetch mode based on the information of the cache hit or cache miss provided by the micro instruction cache unit 106. In one embodiment, for example, when the micro instruction cache unit 106 gives cache hit information that several consecutive micro instruction groups exist in the micro instruction cache unit 106, the determination result is yes, the micro instruction cache fetch mode is enabled, and when the determination result is no, the prediction information is continuously processed in the instruction cache mode.
In response to enabling the micro instruction cache fetch mode, the prediction information in the branch prediction address information queue 102 is sent to a micro instruction cache fetch queue contained in the micro instruction processing module 105, and the sending of the prediction information to the instruction cache unit 103 is stopped.
The micro instruction cache unit 106 determines, according to the address information in the prediction information, whether the micro instruction group corresponding to the prediction information can be fetched from the micro instruction cache unit 106. For example, in response to failing to fetch the micro instruction group corresponding to the prediction information, the system falls back to the instruction cache mode and processes the prediction information of the current miss in that mode.
In response to being able to fetch the set of micro instructions corresponding to the prediction information, the fetched set of micro instructions is sent to the micro instruction queue 107 to await dispatch.
The micro instruction queue 107 sequentially directs groups of micro instructions from the instruction cache mode or micro instruction cache mode processing to the issue unit 108 for back-end execution, e.g., register renaming, execution, retirement (retire), etc.
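The mode switch described above (enable OC fetch mode only once several consecutive micro instruction groups hit in the micro instruction cache) can be sketched as a simple policy function. This is an assumption-laden illustration: the consecutive-hit threshold is an invented parameter, and the real enable/disable condition of the embodiment may differ.

```python
# Illustrative sketch of the fetch-mode switch around the micro instruction
# cache (OC); the consecutive-hit threshold is an assumed parameter.

def choose_fetch_mode(recent_oc_hits, threshold=3):
    """Enable micro instruction cache fetch mode only after `threshold`
    consecutive micro instruction groups hit (build hit) in the OC."""
    run = 0
    for hit in recent_oc_hits:           # True = build hit, False = build miss
        run = run + 1 if hit else 0      # a miss resets the consecutive-hit run
    return "oc_fetch" if run >= threshold else "instruction_cache"
```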
In this embodiment, at instruction release it is checked whether the corresponding cache line was fetched via the micro instruction fetch path; if so, the corresponding cache line in the instruction cache queue is not used to update the instruction cache unit.
For example, in a processor core of an embodiment of the present disclosure as shown in fig. 2, the processor core front end includes a plurality of queues, e.g., a branch prediction address information queue, a fetch request queue, an instruction cache queue. In at least one embodiment of the present disclosure, the size of the instruction cache queue and the size of the instruction fetch request queue are made the same, while the size of the instruction fetch request queue is the same as the size of the branch prediction address information queue.
FIG. 10 illustrates a schematic diagram of a relationship between various queues included in a processor core in at least one embodiment of the present disclosure.
In embodiments of the present disclosure, entries of a branch prediction address information queue are used to store branch prediction related data, e.g., including branch prediction address information, branch prediction instruction information, and the like.
The fetch information queue may be understood here as a subset of the branch prediction address information queue: in addition to being filled into the branch prediction address information queue, branch prediction information is also filled into the fetch information queue. For example, in at least one embodiment, the index of the branch prediction address information queue entry into which a branch prediction fragment is placed is also stored. As shown in FIG. 10, each entry of the fetch information queue includes a branch prediction address queue index number, a fetch physical address, a second-level cache (L2) request queue index number, an instruction cache queue backfill "done/in process" flag bit, a fetch request queue index, and the like.
The fetch request queue is used to store prefetch requests sent by the branch prediction unit, or instruction cache unit miss requests generated by the fetch unit. The related information allocated in the fetch request queue is also stored in the fetch information queue and is used to wake up the related fetch operations, which in turn can be woken up to read instruction information from the corresponding instruction cache queue. As shown in FIG. 10, each entry of the fetch request queue may include a valid bit (Valid), a branch prediction address queue tag, the way value of the corresponding instruction cache unit (IC), a flag bit indicating that the instruction cache unit is being backfilled, a fetch address, and so on.
As shown in fig. 10, the instruction cache queue includes a plurality of cache lines, each of which has a size of, for example, 32 bytes, and each of which stores 1 cache line with a minimum storage granularity of 1/4 cache line.
In this embodiment, the branch prediction address information queue has 64 entries, the fetch information queue has 16 entries, the fetch request queue has 64 entries, and the instruction cache queue has 64 cache lines, and at this time, the capacity (number of entries) of the branch prediction address information queue, the capacity (number of entries) of the fetch request queue, and the size (number of cache lines) of the instruction cache queue are identical to each other, which helps control the instruction backfill time.
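The queue layouts and matched sizes of FIG. 10 can be summarized as data structures. This is an illustrative layout only; the field names mirror the descriptions above but are assumptions, and only the 64/16/64 sizing is taken from the embodiment.

```python
# Illustrative data layout for the FIG. 10 queues; field names are
# assumptions, sizes follow the embodiment described above.
from dataclasses import dataclass

@dataclass
class FetchInfoEntry:                     # fetch information queue entry (16 entries)
    bp_addr_queue_index: int              # branch prediction address queue index number
    fetch_phys_addr: int                  # fetch physical address
    l2_request_queue_index: int           # second-level cache (L2) request queue index
    backfill_done: bool = False           # L0 backfill "done/in process" flag bit
    fetch_request_queue_index: int = 0

@dataclass
class FetchRequestEntry:                  # fetch request queue entry (64 entries)
    valid: bool                           # Valid bit
    bp_addr_queue_tag: int                # branch prediction address queue tag
    ic_way: int                           # way value of the instruction cache unit (IC)
    backfilling_ic: bool                  # flag: instruction cache unit being backfilled
    fetch_addr: int

BP_ADDR_QUEUE_ENTRIES = 64
FETCH_INFO_QUEUE_ENTRIES = 16
FETCH_REQUEST_QUEUE_ENTRIES = 64
INSTRUCTION_CACHE_QUEUE_LINES = 64        # matched sizes help control backfill time

assert BP_ADDR_QUEUE_ENTRIES == FETCH_REQUEST_QUEUE_ENTRIES == INSTRUCTION_CACHE_QUEUE_LINES
```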
Some embodiments of the present disclosure also provide an electronic device that includes the processor of any one of the above embodiments or is capable of executing the instruction processing method of any one of the above embodiments.
Fig. 11 is a schematic diagram of an electronic device according to at least one embodiment of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a notebook computer, a PDA (personal digital assistant), and a PAD (tablet computer), and fixed terminals such as a desktop computer.
The electronic device 1000 shown in fig. 11 is merely an example and should not be construed to limit the functionality or scope of use of the disclosed embodiments. For example, as shown in fig. 11, in some examples the electronic device 1000 includes a processor of any embodiment of the present disclosure, which can perform various suitable actions and processes, such as the instruction processing method of an embodiment of the present disclosure, according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data required for the operation of the computer system. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
For example, the following components may also be connected to the I/O interface 1005: input devices 1006 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 1007 such as a liquid crystal display (LCD), speaker, and vibrator; storage devices 1008 such as magnetic tape and hard disk; and communication devices 1009 such as network interface cards (e.g., LAN cards) and modems. The communication device 1009 may allow the electronic device 1000 to communicate wirelessly or by wire with other apparatuses to exchange data, performing communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable storage medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, may be mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage device 1008 as needed.
While fig. 11 illustrates an electronic device 1000 that includes various devices, it should be understood that not all of the illustrated devices are required to be implemented or included; more or fewer devices may alternatively be implemented or included.
For example, the electronic device 1000 may further include a peripheral interface (not shown) and the like. The peripheral interface may be any of various types of interfaces, such as a USB interface or a Lightning interface. The communication device 1009 may communicate via wireless communication with a network, such as the Internet, an intranet, and/or a wireless network such as a cellular telephone network, a wireless local area network (WLAN), and/or a metropolitan area network (MAN), and with other devices. The wireless communication may use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), WiMAX, protocols for email, instant messaging, and/or Short Message Service (SMS), or any other suitable communication protocol.
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference may be made to the general design.
(2) In the absence of conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other to arrive at new embodiments.
The foregoing describes merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the disclosure, which is defined by the appended claims.

Claims (23)

1. An instruction processing method, comprising:
extracting an object instruction according to an instruction prefetch request, and caching the extracted object instruction in an instruction cache queue; and
in response to instruction release information from a pipeline of a processor core, filling the object instruction from the instruction cache queue into an instruction cache unit,
wherein the instruction cache unit is configured to be directly accessed by the processor core.
2. The instruction processing method according to claim 1, further comprising:
generating the instruction prefetch request according to a branch prediction result.
3. The instruction processing method according to claim 2, wherein extracting the object instruction according to the instruction prefetch request comprises:
acquiring an instruction prefetch address according to the instruction prefetch request;
in response to performing an instruction prefetch operation according to the instruction prefetch address, sending an access request to a level-two cache using the instruction prefetch address; and
in response to an instruction data backfill operation of the level-two cache, writing the object instruction corresponding to the instruction prefetch address into the instruction cache queue.
4. The instruction processing method according to claim 3, wherein extracting the object instruction according to the instruction prefetch request further comprises:
determining whether to perform the instruction prefetch operation according to a prefetch algorithm.
5. The instruction processing method according to claim 3, wherein:
sending the access request to the level-two cache using the instruction prefetch address comprises: updating an instruction miss queue for the instruction cache unit using the instruction prefetch address, and then sending the access request to the level-two cache; and
writing the object instruction corresponding to the instruction prefetch address into the instruction cache queue in response to the instruction data backfill operation of the level-two cache comprises: in response to a backfill request of the level-two cache for the object instruction, waking up the instruction miss queue, and determining, according to record information corresponding to the object instruction in the instruction miss queue, to write the object instruction corresponding to the instruction prefetch address into the instruction cache queue.
6. The instruction processing method according to claim 5, wherein the record information corresponding to the object instruction in the instruction miss queue includes whether the entry corresponding to the object instruction is valid and whether it is cacheable.
7. The instruction processing method according to claim 3, wherein extracting the object instruction according to the instruction prefetch request further comprises:
after the instruction prefetch address is generated according to the branch prediction result, filling the instruction prefetch address into a branch prediction address information queue.
8. The instruction processing method according to claim 1, wherein, in response to the instruction release information from the processor pipeline, filling the object instruction from the instruction cache queue into the instruction cache unit comprises:
determining a released branch prediction entry according to instruction release information of an instruction release unit in the processor pipeline;
waking up a target entry in an instruction miss queue according to the released branch prediction entry;
backfilling, according to information recorded in the target entry of the instruction miss queue, the object instruction in the instruction cache queue corresponding to the target entry into the instruction cache unit; and
releasing the target entry in the instruction miss queue.
9. The instruction processing method according to claim 8, wherein backfilling the object instruction in the instruction cache queue corresponding to the target entry of the instruction miss queue into the instruction cache unit comprises:
determining, according to a physical address of the object instruction, the physical location to be backfilled into the instruction cache unit.
10. The instruction processing method according to claim 1, further comprising:
obtaining the object instruction from the instruction cache unit or the instruction cache queue according to an instruction fetch request, and then sending the object instruction to a decoding unit and dispatching it to an execution unit.
11. The instruction processing method according to claim 10, wherein obtaining the object instruction from the instruction cache unit or the instruction cache queue according to the instruction fetch request comprises:
writing the instruction fetch request into an instruction fetch information queue; and
querying the instruction cache unit or the instruction cache queue to obtain the object instruction according to the instruction fetch information queue.
12. The instruction processing method according to claim 11, wherein querying the instruction cache unit or the instruction cache queue to obtain the object instruction according to the instruction fetch information queue comprises:
reading instruction fetch information from the head of the instruction fetch information queue, and determining whether the current instruction fetch operation occupies an instruction miss queue;
in response to the current instruction fetch operation not occupying the instruction miss queue, querying the instruction cache queue; or
in response to the current instruction fetch operation occupying the instruction miss queue, determining a backfill operation for the level-two cache, and, in response to the backfill operation for the level-two cache being completed, querying the instruction cache queue to read the object instruction, or, in response to the backfill operation for the level-two cache not being completed, reading the object instruction from a backfill bus of the level-two cache.
13. The instruction processing method according to claim 12, wherein querying the instruction cache unit or the instruction cache queue to obtain the object instruction according to the instruction fetch information queue further comprises:
in response to the query of the instruction cache queue having no hit, querying the instruction cache unit.
14. The instruction processing method according to claim 11, wherein querying the instruction cache unit or the instruction cache queue to obtain the object instruction according to the instruction fetch information queue comprises:
in response to a hit when the instruction miss queue is queried with the current instruction fetch operation, marking the instruction miss queue as occupied;
in response to the current instruction fetch operation occupying the instruction miss queue, determining a backfill operation for the level-two cache, or, in response to the current instruction fetch operation not occupying the instruction miss queue, querying the instruction cache unit; and
in response to the backfill operation of the level-two cache being completed, querying the instruction cache queue to read the object instruction, or, in response to the backfill operation of the level-two cache not being completed, reading the object instruction from a backfill bus of the level-two cache.
15. The instruction processing method according to claim 1, further comprising:
maintaining cache coherence between the instruction cache unit and the instruction cache queue.
16. The instruction processing method according to claim 15, wherein maintaining the cache coherence between the instruction cache unit and the instruction cache queue comprises:
maintaining the cache coherence by issuing a snoop operation through the level-two cache.
17. The instruction processing method according to claim 16, wherein maintaining the cache coherence by issuing the snoop operation through the level-two cache comprises:
in response to the snoop operation hitting a first target entry of the instruction cache queue, clearing a valid bit in the first target entry, and determining whether the cleared first target entry is performing a backfill operation to the instruction cache unit; and
in response to the snoop operation hitting a second target entry of the instruction cache unit, clearing a valid bit in the second target entry.
18. A processor, comprising control logic, an instruction fetch unit, an instruction release unit, an instruction cache unit, and an instruction cache queue, wherein:
the instruction fetch unit is configured to extract an object instruction according to an instruction prefetch request, and to cache the extracted object instruction in the instruction cache queue;
the instruction release unit is configured to provide instruction release information; and
the control logic is configured to, in response to the instruction release information, fill the object instruction from the instruction cache queue into the instruction cache unit,
wherein the instruction cache unit is configured to be directly accessed by the instruction fetch unit.
19. The processor according to claim 18, further comprising a branch prediction unit, wherein the branch prediction unit is configured to generate the instruction prefetch request according to a branch prediction result; and
the instruction fetch unit is further configured to:
acquire an instruction prefetch address according to the instruction prefetch request;
in response to performing an instruction prefetch operation according to the instruction prefetch address, send an access request to a level-two cache using the instruction prefetch address; and
in response to an instruction data backfill operation of the level-two cache, write the object instruction corresponding to the instruction prefetch address into the instruction cache queue.
20. The processor according to claim 19, further comprising a branch prediction address information queue, wherein the branch prediction unit is further configured to, after generating the instruction prefetch address according to the branch prediction result, fill the instruction prefetch address into the branch prediction address information queue, and
the size of the branch prediction address information queue is the same as the size of the instruction cache queue.
21. The processor according to claim 18, further comprising an instruction miss queue, wherein the control logic is further configured to:
determine a released branch prediction entry according to the instruction release information;
wake up a target entry in the instruction miss queue according to the released branch prediction entry;
backfill, according to information recorded in the target entry of the instruction miss queue, the object instruction in the instruction cache queue corresponding to the target entry into the instruction cache unit; and
release the target entry in the instruction miss queue.
22. The processor according to claim 18, wherein the control logic is further configured to perform a snoop operation to maintain cache coherence between the instruction cache unit and the instruction cache queue.
23. An electronic device, comprising the processor according to any one of claims 18-22.
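The core flow recited in claims 1, 8, and 10 above, prefetching object instructions into a small instruction cache queue and backfilling the instruction cache unit only when the pipeline releases the corresponding branch prediction entry, can be sketched as a toy model. Everything below (names, data structures, the example instruction) is an illustrative assumption, not the patented implementation:

```python
# Toy model: object instructions are prefetched into an instruction cache queue
# and written into the instruction cache unit only in response to instruction
# release information from the processor pipeline.
icache_unit = {}   # physical address -> instruction (directly accessed by the core)
icache_queue = {}  # staging buffer filled by prefetch / L2 backfill
miss_queue = {}    # branch prediction entry id -> prefetch physical address

def prefetch(bp_entry: int, addr: int, instr: str) -> None:
    """Stage the object instruction (e.g. after an L2 backfill) in the queue."""
    icache_queue[addr] = instr
    miss_queue[bp_entry] = addr

def release(bp_entry: int) -> None:
    """Pipeline released a branch prediction entry: backfill the cache unit."""
    addr = miss_queue.pop(bp_entry)         # wake the matching miss-queue entry
    icache_unit[addr] = icache_queue[addr]  # backfill by physical address

def fetch(addr: int) -> str:
    """A fetch request probes the cache unit first, then the cache queue."""
    return icache_unit.get(addr) or icache_queue[addr]

prefetch(bp_entry=7, addr=0x1000, instr="add r1, r2")
assert fetch(0x1000) == "add r1, r2"  # served from the queue before release
release(bp_entry=7)
assert 0x1000 in icache_unit          # backfilled only after release
```

The design point this illustrates is that speculative prefetches never pollute the instruction cache unit: a line is promoted from the staging queue only once the pipeline confirms, via the release information, that the branch prediction entry was actually used.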
CN202411266817.3A 2024-09-09 2024-09-09 Instruction processing method, processor and electronic device Pending CN119201234A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411266817.3A CN119201234A (en) 2024-09-09 2024-09-09 Instruction processing method, processor and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411266817.3A CN119201234A (en) 2024-09-09 2024-09-09 Instruction processing method, processor and electronic device

Publications (1)

Publication Number Publication Date
CN119201234A true CN119201234A (en) 2024-12-27

Family

ID=94041231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411266817.3A Pending CN119201234A (en) 2024-09-09 2024-09-09 Instruction processing method, processor and electronic device

Country Status (1)

Country Link
CN (1) CN119201234A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119536810A (en) * 2025-01-22 2025-02-28 芯来智融半导体科技(上海)有限公司 Data processing method, device, equipment and medium of single-port RAM based on BHT
CN119536810B (en) * 2025-01-22 2025-04-11 芯来智融半导体科技(上海)有限公司 Data processing method, device, equipment and medium of single-port RAM based on BHT
CN120743353A (en) * 2025-09-04 2025-10-03 北京翼华云网科技有限公司 Processor instruction reading control method and system and electronic equipment
CN120743353B (en) * 2025-09-04 2025-12-02 北京翼华云网科技有限公司 Processor instruction reading control method and system and electronic equipment

Similar Documents

Publication Publication Date Title
US11816036B2 (en) Method and system for performing data movement operations with read snapshot and in place write update
US7213126B1 (en) Method and processor including logic for storing traces within a trace cache
CN108268385B (en) Optimized caching agent with integrated directory cache
CN115934170B (en) Pre-fetching method and device, pre-fetching training method and device, and storage medium
US8117389B2 (en) Design structure for performing cacheline polling utilizing store with reserve and load when reservation lost instructions
CN119201234A (en) Instruction processing method, processor and electronic device
CN108694057B (en) Apparatus and method for memory write back
KR20120024974A (en) Cache prefill on thread migration
TW201346556A (en) Coordinated prefetching in hierarchically cached processors
US11669454B2 (en) Hybrid directory and snoopy-based coherency to reduce directory update overhead in two-level memory
US20200104259A1 (en) System, method, and apparatus for snapshot prefetching to improve performance of snapshot operations
US10108548B2 (en) Processors and methods for cache sparing stores
US6711651B1 (en) Method and apparatus for history-based movement of shared-data in coherent cache memories of a multiprocessor system using push prefetching
US10705962B2 (en) Supporting adaptive shared cache management
US9009420B2 (en) Structure for performing cacheline polling utilizing a store and reserve instruction
JP2015527684A (en) System cache with sticky removal engine
CN118245218A (en) Cache management method, cache management device, processor and electronic device
CN116627506A (en) Microinstruction cache and operation method, processor core and instruction processing method
TWI531913B (en) Prefetch with request for ownership without data
US9983874B2 (en) Structure for a circuit function that implements a load when reservation lost instruction to perform cacheline polling
CN119440880B (en) Processor cache structure, processor and data cache method
CN115080464B (en) Data processing method and data processing device
KR20240067941A (en) Store representations of specific data patterns in spare directory entries
CN116627505A (en) Instruction cache and operation method, processor core and instruction processing method
CN114995884B (en) Instruction retirement unit, instruction execution unit, and related apparatus and methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination