
CN119883366A - Instruction acquisition method, processor, system on chip and computing device - Google Patents

Instruction acquisition method, processor, system on chip and computing device

Info

Publication number
CN119883366A
Authority
CN
China
Prior art keywords
instruction
current
storage unit
unit
jump
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411709265.9A
Other languages
Chinese (zh)
Inventor
郝帅
孙浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202411709265.9A
Publication of CN119883366A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3005Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30069Instruction skipping instructions, e.g. SKIP
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/321Program or instruction counter, e.g. incrementing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3804Instruction prefetching for branches, e.g. hedging, branch folding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Advance Control (AREA)

Abstract

The embodiments of this specification provide an instruction acquisition method, a processor, a system on chip, and a computing device. The instruction acquisition method is applied to an instruction fetch unit of the processor, where the processor further comprises a cache unit, a tightly coupled storage unit, and a program counter. The method comprises: receiving branch prediction information for a current jump instruction; determining, based on the branch prediction information, the current storage unit where the current jump instruction is located from among the cache unit and the tightly coupled storage unit; and fetching the current jump instruction from the current storage unit based on the offset of the current program count value recorded by the program counter. This reduces timing pressure and raises the main frequency of the processor, while effectively avoiding the extra dynamic power consumption caused by fetching the jump instruction from both branches simultaneously. It improves the performance and battery endurance of the processor when processing discontinuous instructions, achieves more efficient and energy-saving instruction acquisition without sacrificing processing speed, and is applicable to application scenarios with high requirements on real-time performance and product endurance.

Description

Instruction acquisition method, processor, system on chip and computing device
Technical Field
Embodiments of the present disclosure relate to the technical field of electronic hardware, and in particular, to an instruction acquisition method, a processor, a system on a chip, and a computing device.
Background
With the development of semiconductor technology and processor architecture, modern processors not only pursue higher performance but also need to meet the real-time and energy-efficiency requirements of specific application fields.
Currently, in high real-time scenarios, the processor needs to guarantee that the processing delay of critical instructions stays within a bounded range. If the processor fetches critical instructions from the cache unit, then whenever a critical instruction misses in the cache it must be pulled from memory outside the processor through the bus interface unit (Bus Interface Unit, abbreviated as BIU), and a stable delay cannot be guaranteed. To address this problem, an independent tightly coupled storage unit, which coexists with the cache unit, is added inside the processor; critical instructions can be placed in this storage unit, so that critical instructions can be acquired with stable delay.
However, if a critical instruction is a discontinuous instruction such as a jump instruction, which can cause an abrupt change in the direction of the instruction stream, the address must be compared in advance to determine whether the critical instruction of the current jump is stored in the cache unit or in the tightly coupled storage unit, so that it can be fetched from the corresponding storage unit. This compare-the-address-first, fetch-the-instruction-second approach introduces a larger delay, which lowers the main frequency of the processor and degrades its processing performance.
Disclosure of Invention
In view of this, the present embodiments provide an instruction fetch method. One or more embodiments of the present specification also relate to a processor, a system-on-a-chip, a computing device, a computer-readable storage medium, and a computer program product that address the technical deficiencies of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided an instruction fetching method applied to an instruction fetching unit of a processor, the processor further including a cache unit, a tightly coupled storage unit, and a program counter, including:
Receiving branch prediction information for a current jump instruction;
Determining a current storage unit where a current jump instruction is located from the cache unit and the tightly coupled storage unit based on branch prediction information;
and based on the offset of the current program count value recorded by the program counter, the current jump instruction is fetched from the current storage unit.
According to a second aspect of embodiments of the present disclosure, there is provided a processor including an instruction fetch unit, a cache unit, a tightly coupled storage unit, and a program counter;
the instruction fetch unit is used for receiving branch prediction information for the current jump instruction, determining the current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit based on the branch prediction information, and fetching the current jump instruction from the current storage unit based on the offset of the current program count value recorded by the program counter.
According to a third aspect of embodiments of the present specification, there is provided a system on a chip comprising:
a control unit and a plurality of on-chip components including a processor;
The control unit is configured to control and manage a plurality of on-chip components, and the processor is configured to execute a computer program/instruction that, when executed by the processor, implements the steps of the instruction acquisition method described above.
According to a fourth aspect of embodiments of the present specification, there is provided a computing device comprising:
memory and system on chip;
the memory is used for storing a computer program/instruction, and the system on chip is used for executing the computer program/instruction, and the computer program/instruction realizes the steps of the instruction acquisition method when being executed by the system on chip.
According to a fifth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing a computer program/instruction which, when executed by a processor, implements the steps of the instruction fetch method described above.
According to a sixth aspect of embodiments of the present specification, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the instruction fetch method described above.
In one embodiment of the specification, in the instruction fetching stage, based on branch prediction information for a current jump instruction, one of the current storage units where the current jump instruction is located is directly determined from the cache unit and the close-coupled storage unit, the current jump instruction is fetched from the current storage unit based on the offset of the current program count value recorded by the program counter, address comparison and instruction acquisition are not sequentially performed, time sequence pressure is reduced, the main frequency of the processor is improved, meanwhile, the problem that extra dynamic power consumption is brought by simultaneously acquiring the jump instruction by double branches is effectively avoided, the performance and product endurance of the processor when the discontinuous instruction is processed are improved, more efficient and energy-saving instruction acquisition is realized on the premise that the processing speed is not sacrificed, and the instruction fetching method is suitable for application scenes with higher requirements on instantaneity and product endurance.
Drawings
FIG. 1 is a pipeline stage schematic of an instruction fetch stage;
FIG. 2 is a flow chart of an instruction fetch method;
FIG. 3 is a flow chart of another method of instruction fetching;
FIG. 4 is a flow chart of a method of instruction fetching provided by one embodiment of the present description;
FIG. 5 is a flow chart of an instruction fetch method according to one embodiment of the present disclosure;
FIG. 6 is a second flow chart of an instruction fetch method according to one embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a method of instruction fetching according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a processor according to one embodiment of the present disclosure;
FIG. 9 is a block diagram of a system-on-chip provided in one embodiment of the present disclosure;
FIG. 10 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. However, this specification can be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from its substance; therefore, this specification is not limited by the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The term "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
Furthermore, it should be noted that, user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) according to one or more embodiments of the present disclosure are information and data authorized by a user or sufficiently authorized by each party, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions, and is provided with corresponding operation entries for the user to select authorization or denial.
First, terms related to one or more embodiments of the present specification will be explained.
A reduced instruction set (Reduced Instruction Set Computing, RISC for short) is a processor architecture design concept that emphasizes building a processor from a smaller number of simple instructions, in order to increase the execution efficiency and speed of the processor.
A reduced instruction set Processor (RISC Processor) is a Processor designed based on the RISC concept, and is characterized by having a shorter instruction pipeline, a fixed instruction format and a simple addressing mode, and being capable of realizing high-speed processing.
A complex instruction set (Complex Instruction Set Computing, CISC for short) is a processor architecture design concept that emphasizes building a processor from a larger number of complex instructions, in order to increase the functionality and flexibility of the processor.
Complex instruction set processors (CISC processors), processors designed based on the CISC concept, feature a large number of complex instructions, each of which can perform multiple operations.
The cache unit (Cache) is a high-speed, small-capacity random access storage unit located between the processor and main memory, used for temporarily storing recently and frequently accessed data and instructions, so as to reduce the time spent accessing main memory and improve the working efficiency of the processor. The cache unit is divided into an instruction cache (Instruction Cache, abbreviated as ICache) and a data cache (Data Cache, abbreviated as DCache) to optimize the instruction stream and the data stream respectively.
A tightly coupled memory unit (Tightly Coupled Memory, TCM for short) is a random access storage unit directly connected to the processor, used for storing critical data or instructions that require fast access; its access latency is lower than that of an ordinary cache. Tightly coupled memory units are divided into instruction tightly coupled memory units (Instruction Tightly Coupled Memory, ITCM) and data tightly coupled memory units (Data Tightly Coupled Memory, DTCM) to optimize the instruction stream and the data stream respectively.
A static random access memory unit (Static Random Access Memory, SRAM for short) is a random access storage unit characterized by retaining stored data stably without a refresh circuit; it has a fast access speed and is commonly used to implement caches.
The fetch unit (Instruction Fetch Unit, IFU) is a part of the processor, and is responsible for fetching the next instruction to be executed from the memory unit and sending it to the subsequent processing stage.
A program counter (PC) is a register storing the address of the next instruction, indicating where the instruction fetch unit should fetch the next instruction from.
The Load Store Unit (LSU) is responsible for handling data Load and Store operations, i.e., reading data from memory or writing data back to memory.
An Execution Unit (EXU) is responsible for executing instructions, including arithmetic logic operations, etc.
An arithmetic logic unit (Arithmetic Logic Unit, ALU for short) is the part of the processor that performs basic arithmetic operations (addition, subtraction, etc.) and logical operations (AND, OR, NOT, etc.).
A bus interface unit (Bus Interface Unit, BIU for short) is responsible for communication between the processor and external devices, including data transfer and address transmission.
The branch unit (BJU) is a component unit for processing branch instructions. It is responsible for analyzing branch conditions and predicting the direction of branches, so as to reduce pipeline stalls caused by waiting for branch instruction results and improve the execution efficiency of the processor.
The predictor is a component unit used for predicting the next instruction address. By analyzing historical data and current conditions, it predicts whether a branch instruction will jump to its target address, so as to reduce pipeline stalls caused by waiting for branch instruction results and improve the execution efficiency of the processor.
FIG. 1 shows a pipeline stage schematic of an instruction fetch stage, as shown in FIG. 1:
The instruction lifecycle, from fetch to execution, generally includes five stages: an instruction fetch stage (Instruction Fetch, IF), a decode stage (Instruction Decode, ID), an execute stage (Instruction Execute, EX), a memory access stage (Memory Access, MEM), and a write-back stage (Write Back, WB).
The instruction fetch stage, the first stage of the instruction cycle, is the process of fetching an instruction from memory into the processor.
The decode stage converts the fetched instruction into control signals, determining the specific operation of the instruction and the operands it requires.
The execute stage performs the specific operations, such as calculation and comparison, according to the decode result.
The memory access stage is the process that occurs when an instruction needs to access memory (to read or write data).
The write-back stage writes the execution result back into a register or memory, completing the execution of the instruction.
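The five stages described above can be summarized in a minimal sketch (a hypothetical Python illustration, not part of the claimed method):

```python
from enum import Enum

class Stage(Enum):
    """The five classic stages of an instruction's lifecycle."""
    IF = "fetch the instruction from memory into the processor"
    ID = "decode into control signals and required operands"
    EX = "execute the specific operation (calculation, comparison, ...)"
    MEM = "access memory to read or write data, if needed"
    WB = "write the execution result back to a register or memory"

def lifecycle():
    """Return the stages an instruction passes through, in order."""
    return [Stage.IF, Stage.ID, Stage.EX, Stage.MEM, Stage.WB]
```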
In the instruction fetch stage of the fetch unit, four sub-stages are typically distinguished: a program counter generation stage (Program Counter GENeration, PCGEN for short), an instruction fetch stage, a pre-decode stage (IP for short), and a cache stage (Instruction Buffer, IB for short).
The program counter generation stage contains the combinational logic that determines the current storage unit where the current jump instruction is located.
The instruction fetching stage is responsible for accessing a first-level Cache (L1) instruction Cache and a predictor record, and completing conversion from a virtual address to a physical address.
The pre-decode stage is responsible for instruction pre-processing and initiates branch jump requests recorded by the predictor.
The cache stage initiates branch jump requests that were missed by the predictor record, initiates indirect branches and jump requests for function-call returns, packages and buffers the instructions preprocessed by the pre-decode stage, and, after instruction fusion, issues at most two instructions to the decode stage at a time.
Because the program counter generation stage is combinational logic coupled to the other blocks and is not buffered by registers, it is sometimes not counted as a pipeline stage of the instruction fetch stage.
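As a rough illustration of the combinational logic in the program counter generation stage, the next program count value can be pictured as a priority selection over the jump sources listed above. The priority order used here is an assumption for illustration only:

```python
def next_pc(cur_pc, incr, retire_jump=None, branch_jump=None,
            ip_pred_jump=None, ib_pred_jump=None):
    """Hypothetical next-PC mux for the PC generation (PCGEN) stage.

    Each jump argument is a target address or None; sources are checked
    from (assumed) highest to lowest priority, falling through to
    sequential fetch (current PC + increment).
    """
    for target in (retire_jump, branch_jump, ip_pred_jump, ib_pred_jump):
        if target is not None:
            return target
    return cur_pc + incr
```

For example, with no jump source active the PC simply advances sequentially; when the branch unit reports a jump, its target wins over a lower-priority predictor jump.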
At present, fig. 2 shows a flow chart of an instruction fetch method, as shown in fig. 2:
The critical instructions are discontinuous instructions such as jump instructions, which may result in abrupt changes in instruction flow direction.
In the program counter generation stage, after the combinational logic receives the branch prediction information for the current jump instruction (instruction retirement unit jump, branch unit jump, pre-decode predictor jump, cache predictor jump, and program count value increment), an address comparison judgment is made in advance, and, depending on whether the tightly coupled storage unit is hit, an access request is initiated to the cache unit or to the tightly coupled storage unit according to the current program count value.
In the instruction fetch stage, the current jump instruction is fetched from the cache unit or the tightly coupled storage unit.
In the compare-address-first scheme shown in FIG. 2, comparing the address before fetching the instruction introduces a larger delay into the processing timing, lowers the main frequency of the processor, and degrades its processing performance.
For the above-mentioned timing delay problem, fig. 3 shows a flow chart of another instruction fetch method, as shown in fig. 3:
The critical instructions are discontinuous instructions such as jump instructions, which may result in abrupt changes in instruction flow direction.
In the program counter generation stage, after the combinational logic receives the branch prediction information for the current jump instruction (instruction retirement unit jump, branch unit jump, pre-decode predictor jump, cache predictor jump, and program count value increment), the current instruction is fetched from both the cache unit and the tightly coupled storage unit according to the current program count value of the program counter; at the same time, the target address is calculated and compared against the address ranges of the cache unit and the tightly coupled storage unit respectively, to determine which of the two is the valid current storage unit.
In the instruction fetch stage, only the current jump instruction from the valid current storage unit is retained.
In the fetch-both-then-select scheme shown in FIG. 3, although the timing pressure is smaller, the other branch is fetched in addition; even with a larger tightly coupled storage unit capacity, fetching from both units at the same time generates larger dynamic power consumption, which affects the battery endurance and heat dissipation of the processor.
It can be seen that the scheme of FIG. 2 compares the address first and then fetches the instruction, while the scheme of FIG. 3 fetches instructions while comparing addresses and then keeps one according to the comparison result. Neither scheme can, while ensuring stable instruction acquisition, both effectively reduce the delay caused by address comparison and avoid the extra dynamic power consumption, so the processing performance and battery endurance of the processor are reduced.
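The trade-off between the two prior schemes can be sketched numerically (a hypothetical cost model with unit-less times, not taken from the specification):

```python
def compare_then_fetch(t_compare, t_fetch):
    """FIG. 2 style: compare the address first, then fetch once (serial)."""
    latency = t_compare + t_fetch   # address compare sits on the critical path
    memories_accessed = 1           # only the selected unit toggles
    return latency, memories_accessed

def fetch_both_then_select(t_compare, t_fetch):
    """FIG. 3 style: fetch from both units while comparing in parallel."""
    latency = max(t_compare, t_fetch)  # compare overlaps the fetch
    memories_accessed = 2              # both units toggle -> extra dynamic power
    return latency, memories_accessed
```

The first scheme pays in latency, the second in dynamic power; the method of this specification aims to avoid both costs by resolving the target unit from the prediction information itself.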
In view of the foregoing, an instruction acquiring method is provided in the present specification, which relates to a processor, a system on a chip, a computing device, a computer-readable storage medium, and a computer program product, and the following embodiments are described in detail.
Referring to fig. 4, fig. 4 shows a flowchart of an instruction fetching method according to an embodiment of the present disclosure, where the method is applied to an instruction fetching unit of a processor, and the processor further includes a cache unit, a tightly coupled storage unit, and a program counter, and includes the following specific steps:
Step 402, receiving branch prediction information for a current jump instruction.
The processor is the core component in the computer system, responsible for executing instructions and handling data, and controlling the other parts of the computer system, such as the input/output devices, memory, and network interfaces. Processors perform a particular processing task by performing a complex series of operations, including fetching instructions from a storage medium, decoding the instructions, executing the instructions, accessing memory, and writing back results, etc. Processors include, but are not limited to, reduced instruction set processors and complex instruction set processors.
The instruction fetch unit is a component unit in the processor, which is responsible for fetching the next instruction to be executed from the storage unit. The instruction fetching unit is used for fetching the instruction from the storage unit in the instruction fetching stage and transmitting the instruction to the subsequent processing stages, such as a decoding stage of the decoding unit and an execution stage of the execution unit.
The caching unit is a storage unit used for temporarily storing data and instructions in the processor, and the data and instructions which are accessed frequently recently are stored in the caching unit, so that the number of times that the processor accesses the extra-core memory is reduced, the data access delay is reduced, and the working efficiency of the processor is improved. The cache elements are typically divided into multiple levels, each level differing in speed and capacity. In this embodiment, the cache unit may be an instruction cache unit. For example, in the three-level caches (L1, L2, L3), the L1 cache speed is the fastest but the capacity is the smallest, and the L3 cache speed is the slowest but the capacity is the largest.
The tightly coupled storage unit is a storage unit for temporarily storing critical instructions in the processor. The tightly coupled storage unit and the cache unit are storage units of the same level; compared with the cache unit, the instructions and data stored in the tightly coupled storage unit are typically latency-critical, such as interrupt handling instructions, real-time operating system kernel instructions, and communication protocol stack instructions. These critical instructions and data are very sensitive to delay and require a fast response. In the embodiments of the present disclosure, the tightly coupled storage unit may be an instruction tightly coupled storage unit.
The program counter is a register in the processor for storing the instruction address of the next instruction, and is used for guiding the instruction fetching unit to fetch the next instruction from the instruction address. The instruction address is a program count value, and the program count value is a storage unit address and points to a next instruction stored in a storage unit.
A jump instruction is a discontinuous instruction for changing the execution order of a program. Jump instructions enable the processor to skip certain instructions or jump to another location in the program to continue execution. They are commonly used to implement conditional branches, loops, and subroutine call and return logic. For example: an unconditional jump (JMP label) jumps to label regardless of any condition; a conditional jump jumps to the label only when its condition holds, e.g. when the zero flag (Z) is 1; a subroutine call (CALL function) jumps to function and saves the return address; a return (RET) returns from the subroutine and resumes execution at the instruction following the call.
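The program-counter behavior of these jump types can be sketched as follows (a hypothetical model assuming a 4-byte instruction size; the mnemonics are illustrative, not a specific ISA):

```python
def step_pc(pc, instr, zero_flag, call_stack):
    """Return the next program count value after one instruction.

    instr is a (mnemonic, target) pair; non-jump instructions simply
    advance the PC by the assumed 4-byte instruction size.
    """
    op, target = instr
    if op == "JMP":                 # unconditional jump
        return target
    if op == "JZ":                  # conditional: jump if the zero flag is set
        return target if zero_flag else pc + 4
    if op == "CALL":                # save the return address, then jump
        call_stack.append(pc + 4)
        return target
    if op == "RET":                 # resume at the saved return address
        return call_stack.pop()
    return pc + 4                   # any non-jump instruction
```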
The current jump instruction is a jump instruction currently being executed by the processor. The processor needs to parse the instruction and update the program counter value recorded by the program counter to point to the new instruction address. Execution of the current jump instruction may affect the control flow of the program, resulting in different execution paths.
The branch prediction information for the current jump instruction is information predicting whether the current instruction will jump. It is the prediction for the current jump instruction output by a branch prediction unit; it includes the prediction result (jump or no jump) and may further include the source branch prediction unit. Branch prediction information is typically represented as electrical signals transmitted between component units in the processor. The purpose of branch prediction is to reduce pipeline stalls caused by waiting for the result of an instruction branch, thereby improving the execution efficiency of the processor.
The branch prediction information for the current jump instruction is received, optionally by receiving the branch prediction information for the current jump instruction output by the branch prediction unit. The branch prediction unit is a component unit used for predicting jump instruction behavior in the processor, and includes, but is not limited to, an instruction retirement unit, a branch unit, a fetch predictor and a cache predictor.
Illustratively, a RISC-V processor includes an instruction fetch unit, an instruction cache unit, an instruction tightly coupled storage unit, and a program counter. N instructions are executed in the RISC-V processor in a pipelined manner. In the instruction fetch stage of the i-th instruction, the fetch unit receives the branch prediction information JumpSignal_i, output by the branch prediction unit for the current jump instruction Instruction_i; the prediction result is that a jump occurs, and the source is the branch unit.
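The shape of such a branch prediction signal can be sketched as a simple record. The field names and the target value below are assumptions for illustration, not the patent's actual signal encoding:

```python
from dataclasses import dataclass

@dataclass
class BranchPrediction:
    """Hypothetical contents of the branch prediction information."""
    taken: bool      # prediction result: jump or no jump
    source: str      # which branch prediction unit produced the prediction
    target: int = 0  # predicted target address when taken (assumed field)

# The example above: JumpSignal_i predicts a jump, sourced from the branch unit.
jump_signal_i = BranchPrediction(taken=True, source="branch unit", target=0x2000)
```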
In step 402, branch prediction information for a current jump instruction is received, providing an information basis for a subsequent determination of a current memory location in which the current jump instruction is located.
Step 404, determining a current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit based on the branch prediction information.
The current storage unit in which the current jump instruction is located is the storage unit predicted, according to the branch prediction information, to store the current jump instruction. It is the fetch target of the instruction fetch stage of the current jump instruction, and is either the cache unit or the tightly coupled storage unit, but never both. The current storage unit may store a valid current jump instruction or an invalid current jump instruction.
In an optional manner of determining, based on the branch prediction information, the current storage unit in which the current jump instruction is located from the cache unit and the tightly coupled storage unit: when the prediction result of the branch prediction information is that a jump occurs, the current storage unit in which the current jump instruction is located is determined from the cache unit and the tightly coupled storage unit.
For example, in the case that the prediction result of the branch prediction information JumpSignal_i is that a jump occurs, the current storage unit Storage_Unit_i in which the current jump instruction Instruction_i is located is determined, from the instruction cache unit and the instruction tightly coupled storage unit, to be the instruction tightly coupled storage unit.
Based on the branch prediction information, the current storage unit in which the current jump instruction is located is determined from the cache unit and the tightly coupled storage unit, providing the fetch target for the subsequent acquisition of the current jump instruction.
Step 406, fetching the current jump instruction from the current storage unit based on the offset of the current program count value recorded by the program counter.
The current program count value is the instruction address of the next instruction (current jump instruction in the embodiment of the present specification) updated by the program counter after the execution of the previous instruction is completed. For example, the current program counter has a value of 0x1000, indicating that the next instruction to be executed is at address 0x1000. If the processor executes a jump instruction JMP 0x1010, the program counter is updated to 0x1010, indicating that the next instruction to be executed is at address 0x1010.
The offset of the current program count value is the offset of the instruction address from the last instruction to the next instruction. This offset is typically fixed, depending on the architecture of the processor. In most processors, the instruction length is fixed, so the offset is also a fixed value. For example, for a 32-bit processor, each instruction is 4 bytes in length, and thus the offset is 4 bytes. For a 64-bit processor, each instruction may be 8 bytes or more in length.
An optional manner of fetching the current jump instruction from the current storage unit based on the offset of the current program count value recorded by the program counter is to determine the instruction address based on that offset, and fetch the instruction corresponding to the instruction address from the current storage unit as the current jump instruction.
For example, based on the offset of the current program count value 0x1000 recorded by the program counter, the instruction address 0x1000 + 4 = 0x1004 is determined, and the instruction corresponding to the instruction address (0x1004) is fetched from the current storage unit Storage_Unit_i, which is the instruction tightly coupled storage unit, as the current jump instruction Instruction_i.
In the embodiments of the present specification, in the instruction fetch stage, one current storage unit in which the current jump instruction is located is directly determined from the cache unit and the tightly coupled storage unit based on the branch prediction information for the current jump instruction, and the current jump instruction is fetched from that current storage unit based on the offset of the current program count value recorded by the program counter. Address comparison and instruction acquisition are no longer performed in sequence, which reduces timing pressure and increases the main frequency of the processor. At the same time, the extra dynamic power consumption of fetching the jump instruction from both branches simultaneously is effectively avoided, improving the performance and endurance of the processor when processing discontinuous instructions. More efficient and energy-saving instruction fetching is thus realized without sacrificing processing speed, which is suitable for application scenarios with high requirements on real-time performance and product endurance.
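The single-unit selection and offset-based fetch described above can be sketched as a small behavioral model. This is an illustrative Python sketch, not the patented hardware: all names, memory contents, and the 4-byte instruction length are hypothetical assumptions for a 32-bit core.

```python
# Hypothetical model of the fetch stage described above: exactly one storage
# unit is accessed, and the fetch address is the current program count value
# plus a fixed offset.

INSTRUCTION_BYTES = 4  # assumed fixed instruction length (32-bit core)

def next_fetch_address(current_pc: int) -> int:
    """Fetch address = current program count value + fixed offset."""
    return current_pc + INSTRUCTION_BYTES

# Toy contents of the two storage units (illustrative addresses only).
memories = {
    "cache_unit": {0x1004: "ADD x1, x2, x3"},
    "tcm_unit": {0x1004: "JAL x0, handler"},
}

def fetch(unit: str, addr: int) -> str:
    """Access only the selected unit, so no duplicate-fetch power is spent."""
    return memories[unit].get(addr, "<invalid>")

addr = next_fetch_address(0x1000)
print(hex(addr), fetch("tcm_unit", addr))  # 0x1004 JAL x0, handler
```

The key point the sketch mirrors is that `fetch` touches a single unit per cycle; the address comparison that would normally gate this access is deferred, as the embodiments below describe.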
Aiming at the problem that, in the selection scheme shown in figure 3, fetching instructions from both storage units simultaneously generates larger dynamic power consumption and affects the endurance and heat dissipation of the processor, scheme 3 can be optimized in a targeted manner: one of the cache unit and the tightly coupled storage unit is directly selected as the current storage unit, avoiding fetching instructions from both storage units at the same time and reducing dynamic power consumption.
Correspondingly, in an alternative embodiment of the present disclosure, step 404 includes the following specific steps:
Under the condition that the prediction result of the branch prediction information is that a jump occurs, determining, from the cache unit and the tightly coupled storage unit, that the last storage unit in which the last jump instruction was located is the current storage unit in which the current jump instruction is located.
The last jump instruction is the last executed jump instruction preceding the current jump instruction.
The last storage unit in which the last jump instruction was located is the storage unit storing the last jump instruction. It was the fetch target of the instruction fetch stage of the last jump instruction, and is either the cache unit or the tightly coupled storage unit, but never both. The last storage unit stores a valid last jump instruction.
For example, in the case that the prediction result of the branch prediction information JumpSignal_i is that a jump occurs, the last storage unit Storage_Unit_(i-1) in which the last jump instruction Instruction_(i-1) was located, here the instruction tightly coupled storage unit, is determined, from the instruction cache unit and the instruction tightly coupled storage unit, to be the current storage unit Storage_Unit_i in which the current jump instruction Instruction_i is located.
In the embodiments of the present specification, a retransmission scheme is adopted: the last storage unit in which the last jump instruction was located is directly determined to be the current storage unit in which the current jump instruction is located. This avoids fetching instructions from both storage units at the same time, reduces timing pressure, increases the main frequency of the processor, and improves the performance and endurance of the processor when processing discontinuous instructions.
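A minimal sketch of this retransmission scheme, with hypothetical unit names (the selection policy follows the paragraph above; everything else is an assumption):

```python
# Hypothetical sketch of the retransmission scheme: when a jump is predicted,
# reuse the storage unit that held the last jump instruction instead of
# querying both units or comparing address ranges.

CACHE, TCM = "cache_unit", "tightly_coupled_storage_unit"

class FetchSelector:
    def __init__(self, initial_unit: str = CACHE):
        self.last_unit = initial_unit  # unit that served the last jump

    def select(self, jump_predicted: bool) -> str:
        # Predicted jump: the last storage unit becomes the current one.
        # Sequential flow: default to the cache unit.
        return self.last_unit if jump_predicted else CACHE

    def record_hit(self, unit: str) -> None:
        # Called once the fetch is confirmed valid, to update history.
        self.last_unit = unit

selector = FetchSelector()
selector.record_hit(TCM)      # last jump was served by the TCM
print(selector.select(True))  # tightly_coupled_storage_unit
```

The design choice this models is trading one bit of history state for the elimination of the dual fetch on every predicted jump.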
In an optional embodiment of the present disclosure, before the last storage unit in which the last jump instruction was located is determined, from the cache unit and the tightly coupled storage unit, to be the current storage unit in which the current jump instruction is located when the prediction result of the branch prediction information is that a jump occurs, the method further includes the following specific steps:
receiving branch prediction information for a first jump instruction;
under the condition that the branch prediction information records that the first jump instruction jumps, determining that both the cache unit and the tightly coupled storage unit are primary storage units;
based on the offset of the current program count value recorded by the program counter, fetching a first primary jump instruction from the cache unit, and, based on the same offset, fetching a second primary jump instruction from the tightly coupled storage unit;
comparing the current program count value recorded by the program counter with the address ranges of the cache unit and the tightly coupled storage unit, and determining a valid primary storage unit from the cache unit and the tightly coupled storage unit;
retaining the first primary jump instruction or the second primary jump instruction from the valid primary storage unit as an initial jump instruction.
The first jump instruction is a jump instruction that is first executed by the processor.
The branch prediction information for the first jump instruction is information predicting whether the first instruction will jump. It is the prediction output by the branch prediction unit for the first jump instruction, includes a prediction result (jump or no jump), and may further include the source branch prediction unit. Branch prediction information is typically carried as electrical signals transmitted between component units in the processor.
A primary storage unit is a storage unit that may store the primary jump instruction; both the cache unit and the tightly coupled storage unit serve as primary storage units. When a jump instruction is executed for the first time, no history data (a last storage unit) is available for reference, so the processor needs to fetch the instruction from the cache unit and the tightly coupled storage unit at the same time to ensure integrity.
The first primary jump instruction is the primary jump instruction fetched from the cache unit. It may be a valid or an invalid primary jump instruction; its address needs to be verified in the subsequent comparison.
The second primary jump instruction is the primary jump instruction fetched from the tightly coupled storage unit. It may be a valid or an invalid primary jump instruction; its address needs to be verified in the subsequent comparison.
The address range of the cache unit is the address interval of the data and instructions stored in the cache unit. This address range is usually fixed and determined by the hardware design. The processor determines whether the first primary jump instruction fetched from the cache unit is a valid primary jump instruction by comparing the current program count value recorded by the program counter with the address range of the cache unit. For example, the address range of the cache unit is 0x1000 to 0x1FFF.
The address range of the tightly coupled storage unit is the address interval of the data and instructions stored in the tightly coupled storage unit. This address range is also usually fixed and determined by the hardware design. The processor determines whether the second primary jump instruction fetched from the tightly coupled storage unit is a valid primary jump instruction by comparing the current program count value recorded by the program counter with the address range of the tightly coupled storage unit. For example, the address range of the tightly coupled storage unit is 0x2000 to 0x2FFF.
The valid primary storage unit is the primary storage unit determined to store the valid primary jump instruction. When a jump instruction is executed for the first time, the processor fetches the instruction from the cache unit and the tightly coupled storage unit respectively, then compares the current program count value recorded by the program counter with the address ranges of the two storage units to determine which storage unit stores the valid jump instruction; the jump instruction from the valid primary storage unit is retained and subsequently executed.
An optional manner of fetching the first primary jump instruction from the cache unit based on the offset of the current program count value recorded by the program counter is to determine the instruction address based on that offset, and fetch the instruction corresponding to the instruction address from the cache unit as the first primary jump instruction.
An optional manner of fetching the second primary jump instruction from the tightly coupled storage unit based on the offset of the current program count value recorded by the program counter is to determine the instruction address based on that offset, and fetch the instruction corresponding to the instruction address from the tightly coupled storage unit as the second primary jump instruction.
Illustratively, in the instruction fetch stage of instruction 1, the fetch unit receives branch prediction information JumpSignal_1 for the first jump instruction Instruction_1 output by the branch prediction unit; the prediction result is that a jump occurs, and the source is the branch unit. Under the condition that the branch prediction information JumpSignal_1 records that the first jump instruction jumps, both the instruction cache unit and the instruction tightly coupled storage unit are determined to be the primary storage unit Storage_Unit_1. Based on the current program count value 0x0000 and the offset 4 recorded by the program counter, the instruction address 0x0000 + 4 = 0x0004 is determined. The instruction corresponding to the instruction address (0x0004) is fetched from the instruction cache unit as the first primary jump instruction Instruction_1.1, and the instruction corresponding to the instruction address (0x0004) is fetched from the instruction tightly coupled storage unit as the second primary jump instruction Instruction_1.2. The address range of the instruction cache unit is 0x0000 to 0x0FFF, and the address range of the instruction tightly coupled storage unit is 0x1000 to 0x1FFF. Comparing the target address 0x0004 with the address range 0x0000 to 0x0FFF of the instruction cache unit shows that the target address is within the address range of the instruction cache unit. Comparing the target address 0x0004 with the address range 0x1000 to 0x1FFF of the instruction tightly coupled storage unit shows that the target address is not within the address range of the instruction tightly coupled storage unit. The instruction cache unit is therefore determined to be the valid primary storage unit, and the first primary jump instruction Instruction_1.1 from the valid primary storage unit is retained as the initial jump instruction Instruction_1.
In the embodiments of the present specification, when a jump instruction is executed for the first time, the instruction is fetched from the cache unit and the tightly coupled storage unit respectively, and the current program count value recorded by the program counter is compared with the address ranges of the two storage units to determine the valid primary storage unit. This ensures that the valid initial jump instruction is fetched from the correct storage unit and that subsequent jumps avoid the extra dynamic power consumption of fetching from both storage units at the same time, which reduces timing pressure, increases the main frequency and energy efficiency of the processor, and improves the performance and endurance of the processor when processing discontinuous instructions.
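The first-jump flow above (dual fetch, then address-range comparison) can be sketched as follows; the address ranges mirror the example, and the function and memory names are hypothetical:

```python
# Hypothetical sketch of the first-jump case: with no history to reuse, fetch
# from both units, then keep only the instruction from the unit whose address
# range contains the computed instruction address.

CACHE_RANGE = range(0x0000, 0x1000)  # instruction cache unit, per the example
TCM_RANGE = range(0x1000, 0x2000)    # instruction tightly coupled storage unit

def first_jump_fetch(pc: int, cache_mem: dict, tcm_mem: dict, offset: int = 4):
    addr = pc + offset
    first_primary = cache_mem.get(addr)   # first primary jump instruction
    second_primary = tcm_mem.get(addr)    # second primary jump instruction
    if addr in CACHE_RANGE:               # cache unit is the valid primary unit
        return "cache_unit", first_primary
    if addr in TCM_RANGE:                 # TCM is the valid primary unit
        return "tcm_unit", second_primary
    raise ValueError(f"address {addr:#x} outside both units")

cache_mem = {0x0004: "Instruction_1.1"}
tcm_mem = {0x0004: "Instruction_1.2"}
unit, initial = first_jump_fetch(0x0000, cache_mem, tcm_mem)
print(unit, initial)  # cache_unit Instruction_1.1
```

Both dictionaries are read unconditionally, matching the dual access of the first jump; only the instruction from the valid primary storage unit is retained.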
Although targeted optimization can be performed on the basis of scheme 3, directly selecting one of the cache unit and the tightly coupled storage unit as the current storage unit, avoiding fetching instructions from both storage units at the same time and reducing dynamic power consumption, the instantaneous power consumption of the processor at a jump is still increased. If the source type of the branch prediction information is relatively stable, instruction fetching can instead be completed in a speculative manner.
Correspondingly, in an alternative embodiment of the present disclosure, step 404 includes the following specific steps:
Under the condition that the prediction result of the branch prediction information is that a jump occurs, determining, from the cache unit and the tightly coupled storage unit, the current storage unit in which the current jump instruction is located based on the source type of the branch prediction information.
The source type of the branch prediction information is the output source of the branch prediction information, i.e., the component unit that generated the information predicting whether the current instruction is a jump instruction. Different source types reflect different prediction mechanisms and prediction accuracies, and have different effects on the performance and power consumption of the processor. The source types of branch prediction information include, but are not limited to, the instruction retirement unit, the branch unit, the predictors (the pre-decode predictor and the cache predictor), and program count value increment.
It should be noted that, judging from typical usage of the tightly coupled storage unit, in common usage scenarios key instructions, such as the exception vector table and exception handling functions, are stored in the tightly coupled storage unit, and jumps into such scenarios are usually triggered by the instruction retirement unit. Normally executed code generally enters via sources such as the branch unit and the predictors, and its latency requirements are not high, so the cache unit is the default current storage unit.
For example, in the case that the prediction result of the branch prediction information JumpSignal_i is that a jump occurs, based on the source type of the branch prediction information JumpSignal_i being the instruction retirement unit, the current storage unit Storage_Unit_i in which the current jump instruction Instruction_i is located is determined, from the instruction cache unit and the instruction tightly coupled storage unit, to be the instruction tightly coupled storage unit.
In the embodiments of the present specification, when the prediction result of the branch prediction information is that a jump occurs, the current storage unit in which the current jump instruction is located is directly determined from the cache unit and the tightly coupled storage unit based on the source type of the branch prediction information. This effectively reduces the instantaneous power consumption of the processor, improves prediction accuracy and processing performance, avoids the extra dynamic power consumption of fetching the jump instruction from both branches at the same time while preserving performance, and improves the performance and endurance of the processor when processing discontinuous instructions.
In an optional embodiment of the present disclosure, in a case that a predicted result of the branch prediction information is that a jump occurs, determining, from the cache unit and the tightly coupled storage unit, a current storage unit in which the current jump instruction is located based on a source type of the branch prediction information, includes the following specific steps:
Under the condition that the prediction result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is the branch unit, determining the cache unit to be the current storage unit in which the current jump instruction is located.
The branch unit is a component unit in the processor used for predicting jump instructions; it is responsible for analyzing branch conditions and predicting the direction of branches so as to reduce pipeline stalls caused by waiting for jump instruction results, thereby improving the execution efficiency of the processor. The main functions of the branch unit include: branch condition analysis, i.e., analyzing the condition flags of the jump instruction (such as the zero flag Z, the negative flag N, and the overflow flag V) to determine whether the branch condition is met; branch prediction, i.e., predicting whether an instruction will jump according to historical data and current conditions, where common prediction algorithms include static prediction and dynamic prediction; branch instruction address calculation, i.e., calculating the instruction address of the jump instruction so that the fetch unit can fetch the next instruction from the correct address; and branch history, i.e., maintaining a branch history table (Branch History Table, BHT for short) to record the behavior of past jump instructions for dynamic prediction.
Illustratively, in the case that the prediction result of the branch prediction information JumpSignal_i is that a jump occurs, based on the source type of the branch prediction information JumpSignal_i being the branch unit, the current storage unit Storage_Unit_i in which the current jump instruction Instruction_i is located is determined, from the instruction cache unit and the instruction tightly coupled storage unit, to be the instruction cache unit.
In the embodiments of the present specification, when the prediction result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is the branch unit, the cache unit is directly determined to be the current storage unit in which the current jump instruction is located. Since the latency requirement of this path is not high, the instantaneous power consumption of the processor can be effectively reduced and prediction accuracy and processing performance improved.
In an optional embodiment of the present disclosure, in a case that a predicted result of the branch prediction information is that a jump occurs, determining, from the cache unit and the tightly coupled storage unit, a current storage unit in which the current jump instruction is located based on a source type of the branch prediction information, includes the following specific steps:
Under the condition that the prediction result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is the instruction retirement unit, determining the tightly coupled storage unit to be the current storage unit in which the current jump instruction is located.
The instruction retirement unit is a component unit in the processor responsible for confirming the results of instruction execution and updating state. It ensures that the results of executed instructions are properly committed to the state of the processor, including updating registers and memory state. After confirming that an instruction executed correctly, the instruction retirement unit may generate branch prediction information, particularly for jump instructions, indicating whether a jump has occurred. The main functions of the instruction retirement unit include: confirming instruction execution results, i.e., ensuring that the instruction has executed correctly and that no exceptions or errors have occurred; updating state, i.e., updating the register and memory state to reflect the execution result of the instruction; generating branch prediction information, i.e., for jump instructions, generating branch prediction information indicating whether a jump has occurred; and exception handling, i.e., handling abnormal situations that may occur during instruction execution.
Illustratively, in the case that the prediction result of the branch prediction information JumpSignal_i is that a jump occurs, based on the source type of the branch prediction information JumpSignal_i being the instruction retirement unit, the current storage unit Storage_Unit_i in which the current jump instruction Instruction_i is located is determined, from the instruction cache unit and the instruction tightly coupled storage unit, to be the instruction tightly coupled storage unit.
In the embodiments of the present specification, when the prediction result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is the instruction retirement unit, the tightly coupled storage unit is directly determined to be the current storage unit in which the current jump instruction is located. This reduces the time needed to enter an exception, accelerates the processing flow, effectively reduces the instantaneous power consumption of the processor, and improves prediction accuracy and processing performance.
In an optional embodiment of the present disclosure, in a case that a predicted result of the branch prediction information is that a jump occurs, determining, from the cache unit and the tightly coupled storage unit, a current storage unit in which the current jump instruction is located based on a source type of the branch prediction information, includes the following specific steps:
Under the condition that the prediction result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is a predictor, determining, from the cache unit and the tightly coupled storage unit, the last storage unit in which the last jump instruction was located as the current storage unit in which the current jump instruction is located.
The predictor is a component unit in the processor for predicting the address of the next instruction. By analyzing historical data and current conditions, the predictor predicts whether a jump instruction will jump and its target address, so as to reduce pipeline stalls caused by waiting for the jump instruction result and improve the execution efficiency of the processor. The main functions of the predictor include: static prediction, i.e., prediction based on fixed rules, such as always predicting that a jump instruction will (or will not) jump; dynamic prediction, i.e., prediction based on historical data, where common dynamic prediction algorithms include local prediction (considering only the historical behavior of the current branch), global prediction (considering the historical behavior of all branches), and hybrid prediction (combining local and global prediction); a branch target buffer (Branch Target Buffer, BTB for short), which stores the target addresses of known jump instructions to speed up branch target address calculation; and a branch history table, which records the behavior of past jump instructions to support dynamic prediction.
In an optional embodiment of the present specification, the predictor is at least one of a pre-decode predictor and a cache predictor.
A pre-decode predictor (Instruction Pre-decode Predictor) is a component unit in the processor that predicts the address of the next instruction during the instruction pre-decode stage. By analyzing historical data and current conditions, it predicts whether a jump instruction will jump and its target address, thereby reducing pipeline stalls caused by waiting for the jump instruction result and improving the execution efficiency of the processor.
A cache predictor (Instruction Cache Predictor) is a component unit in the processor for predicting cache access patterns. By analyzing historical data and current conditions, it predicts whether the next instruction or data will hit in the cache, thereby optimizing cache utilization, reducing cache misses, and improving the execution efficiency of the processor.
Illustratively, in the case that the prediction result of the branch prediction information JumpSignal_i is that a jump occurs, based on the source type of the branch prediction information JumpSignal_i being the pre-decode predictor, the last storage unit Storage_Unit_(i-1) in which the last jump instruction Instruction_(i-1) was located is determined, from the cache unit and the tightly coupled storage unit, to be the current storage unit Storage_Unit_i in which the current jump instruction Instruction_i is located.
In the embodiments of the present specification, when the prediction result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is a predictor, the last storage unit in which the last jump instruction was located is directly determined, from the cache unit and the tightly coupled storage unit, to be the current storage unit in which the current jump instruction is located. Since the latency requirement of this path is not high, the instantaneous power consumption of the processor can be effectively reduced and prediction accuracy and processing performance improved.
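The three source-type rules above can be combined into a single selection function. The Python sketch below is a hedged behavioral model, not the patented circuit; the source-name strings and the default rule for other sources are assumptions:

```python
# Hypothetical sketch combining the source-type embodiments above:
#   instruction retirement unit -> tightly coupled storage unit (exception path)
#   branch unit                 -> cache unit (normal code path)
#   predictor                   -> reuse the last jump's storage unit

CACHE, TCM = "cache_unit", "tightly_coupled_storage_unit"

def select_by_source(source: str, last_unit: str) -> str:
    if source == "instruction_retirement_unit":
        return TCM        # exception vector table / handlers live in the TCM
    if source == "branch_unit":
        return CACHE      # latency requirement is not high on this path
    if source in ("pre_decode_predictor", "cache_predictor"):
        return last_unit  # retransmission: reuse the last storage unit
    return CACHE          # assumed default (e.g. program count value increment)

print(select_by_source("instruction_retirement_unit", CACHE))
# tightly_coupled_storage_unit
```

Because the dispatch depends only on the source type and one bit of history, it can be evaluated speculatively in the program counter generation stage, before any address comparison.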
Since no address comparison is performed in steps 402 to 406, the current storage unit may be determined erroneously and an invalid instruction obtained. Checking and correction can therefore be performed in the instruction fetch stage: if an error occurs, the offset of the current program count value is retransmitted, the current storage unit is updated, and the current jump instruction is fetched again.
In an alternative embodiment of the present disclosure, following step 406, the following specific steps are further included:
comparing the current program count value recorded by the program counter with the address range of the current storage unit to determine whether the current program count value falls within the address range;
if not, updating the other storage unit of the cache unit and the tightly coupled storage unit to be the current storage unit in which the current jump instruction is located;
based on the offset of the current program count value, fetching the current jump instruction from the updated current storage unit.
The updated current storage unit is another storage unit updated after address comparison in the instruction fetching stage. This update process ensures the correctness and stability of instruction fetching, avoiding instruction fetch failure due to misprediction.
Illustratively, the fetch unit determines that the current storage unit is the instruction tightly coupled storage unit. The current program count value recorded by the program counter, 0x2000, is compared with the address range of the current storage unit, where the address range of the instruction tightly coupled storage unit is 0x1000 to 0x1FFF. The comparison shows that the current program count value 0x2000 is not within the address range of the instruction tightly coupled storage unit, so the earlier prediction was wrong, and the instruction cache unit is updated to be the current storage unit Storage_Unit_i. The current jump instruction Instruction_i is then fetched from the instruction cache unit based on the offset of the current program count value 0x2000.
In the embodiment of the specification, after address comparison is performed in the instruction fetching stage, the other storage unit is updated to be the current storage unit where the current jump instruction is located, so that the accuracy and stability of instruction acquisition are ensured, the instruction acquisition failure caused by misprediction is avoided, and the high efficiency and stability under the application scene with high real-time performance and energy efficiency requirements are ensured.
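The check-and-correct flow above can be sketched as follows; the address ranges and instruction contents mirror the example, and all names are hypothetical:

```python
# Hypothetical sketch of the check-and-correct step: after the speculative
# selection, compare the program count value against the selected unit's
# address range; on a mismatch, switch to the other unit and refetch.

UNITS = {
    "tcm_unit": (range(0x1000, 0x2000), {0x1800: "NOP"}),
    "cache_unit": (range(0x2000, 0x3000), {0x2000: "JAL x0, target"}),
}

def fetch_with_check(pc: int, predicted_unit: str):
    addr_range, mem = UNITS[predicted_unit]
    if pc in addr_range:  # speculation was correct; keep the fetched instruction
        return predicted_unit, mem[pc]
    # Misprediction: update the other storage unit to be the current one
    # and fetch the current jump instruction again.
    other = next(u for u in UNITS if u != predicted_unit)
    other_range, other_mem = UNITS[other]
    assert pc in other_range, "address falls in neither unit"
    return other, other_mem[pc]

print(fetch_with_check(0x2000, "tcm_unit"))  # ('cache_unit', 'JAL x0, target')
```

The misprediction path costs one extra fetch, but the common correct-prediction path still touches only a single storage unit.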
Referring to the above embodiment, fig. 5 shows one of the flow diagrams of an instruction fetching method according to the embodiment of the present disclosure, as shown in fig. 5:
Critical instructions are discontinuous instructions, such as jump instructions, which may cause abrupt changes in the direction of the instruction flow.
In the program counter generation stage, after the combinational logic receives the branch prediction information for the first jump instruction (a jump from the instruction retirement unit, a jump from the branch unit, a jump from the pre-decode predictor, a jump from the cache predictor, or an increment of the program count value), the current instruction is fetched from both the cache unit and the tightly coupled storage unit according to the current program count value of the program counter. The instructions are executed in sequence, the address comparison is performed, and depending on whether the tightly coupled storage unit is hit, an access request is initiated to either the cache unit or the tightly coupled storage unit according to the current program count value.
In the instruction fetch stage, the first jump instruction is fetched from the cache unit or the tightly coupled storage unit.
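The initial dual fetch and address comparison just described might be modeled as below. The memory contents, address map, and function names are invented for illustration and do not reflect the actual hardware.

```python
# Sketch of the initial dual fetch: probe both storage units, then keep the
# instruction from the unit whose address range contains the program count
# value. Memory contents here are invented placeholders.

ITCM = {0x1000 + i: f"itcm_insn_{i}" for i in range(0, 16, 4)}
ICACHE = {0x2000 + i: f"icache_insn_{i}" for i in range(0, 16, 4)}

def initial_fetch(pc):
    """Fetch from both units, then retain the instruction from the valid unit."""
    candidate_itcm = ITCM.get(pc)      # first primary jump instruction
    candidate_icache = ICACHE.get(pc)  # second primary jump instruction
    if 0x1000 <= pc <= 0x1FFF:         # address comparison selects the valid unit
        return "itcm", candidate_itcm
    return "icache", candidate_icache
```

Only the instruction from the valid primary storage unit is retained as the initial jump instruction; the other path's data is discarded.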
Thereafter, in the program counter generation stage, after the combinational logic receives the branch prediction information for the current jump instruction (a jump from the instruction retirement unit, a jump from the branch unit, a jump from the pre-decode predictor, a jump from the cache predictor, or an increment of the program count value), if the prediction result of the branch prediction information is that a jump occurs, the last storage unit where the last jump instruction was located is determined, from among the cache unit and the tightly coupled storage unit, to be the current storage unit where the current jump instruction is located; whether the tightly coupled storage unit is hit is judged, and an access request is initiated to either the cache unit or the tightly coupled storage unit according to the current program count value.
In the instruction fetch stage, the current jump instruction is fetched from the cache unit or the tightly coupled storage unit; the address comparison determines whether an error has occurred, and if so, the result is fed back.
In the retransmission scheme shown in fig. 5, the address comparison is taken off the critical path, relieving the timing pressure of the program counter generation stage and allowing the processor's clock frequency to be increased.
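One way to model the retransmission scheme is as a small state machine that remembers the last hitting storage unit, fetches speculatively from it, and resends the fetch only when the address check fails. This is a hedged sketch under an assumed two-unit memory map, not the hardware implementation; the names are invented.

```python
# Toy model of the retransmission scheme: remember the last hitting unit,
# fetch speculatively from it, and resend from the other unit on a mismatch.
# The memory map is invented for illustration.

UNITS = {
    "itcm":   range(0x1000, 0x2000),  # tightly coupled storage unit
    "icache": range(0x2000, 0x3000),  # cache unit
}

class RetransmitFetcher:
    def __init__(self):
        self.last_unit = None  # recorded source of the previous jump instruction

    def fetch(self, pc):
        """Return (unit fetched from, whether the request had to be resent)."""
        resent = False
        unit = self.last_unit
        if unit is None or pc not in UNITS[unit]:
            # First fetch, or address check failed: (re)send to the valid unit.
            resent = unit is not None
            unit = next(name for name, r in UNITS.items() if pc in r)
        self.last_unit = unit  # locality: the source rarely switches
        return unit, resent

f = RetransmitFetcher()
```

By the locality principle, resends are rare, so the extra cycle on a source switch costs little on average.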
Referring to the above embodiments, fig. 6 shows the second flow diagram of an instruction acquisition method according to an embodiment of the present disclosure. As shown in fig. 6:
Critical instructions are discontinuous instructions, such as jump instructions, which may cause abrupt changes in the direction of the instruction flow.
In the program counter generation stage, after the combinational logic receives the branch prediction information for the current jump instruction (a jump from the instruction retirement unit, a jump from the branch unit, a jump from the pre-decode predictor, a jump from the cache predictor, or an increment of the program count value), if the prediction result of the branch prediction information is that a jump occurs, the current storage unit where the current jump instruction is located is determined from the cache unit and the tightly coupled storage unit based on the source type of the branch prediction information (instruction retirement unit, branch unit, pre-decode predictor, or cache predictor); whether the tightly coupled storage unit is hit is judged, and an access request is initiated to either the cache unit or the tightly coupled storage unit.
In the instruction fetch stage, the current jump instruction is fetched from the cache unit or the tightly coupled storage unit; the address comparison determines whether an error has occurred, and if so, the instruction is resent.
In the speculative scheme shown in fig. 6, the branch path is selected by recording or inferring the source, so dynamic power consumption is greatly reduced with little or no performance loss, improving product battery life; this is the preferred scheme when the speculation success rate is high.
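The source-based selection underlying the speculative scheme can be sketched as a simple dispatch on the source type of the branch prediction information, in the spirit of claims 5-7. The source-type names and unit labels below are assumptions for illustration only.

```python
# Hedged sketch of selecting the current storage unit from the source type
# of the branch prediction information. Names are illustrative, not the
# actual hardware signal names.

def select_unit(source_type, last_unit):
    """Map the branch-prediction source to a storage unit."""
    if source_type == "branch_unit":
        return "icache"            # branch unit -> cache unit
    if source_type == "retire_unit":
        return "itcm"              # instruction retirement unit -> tightly coupled unit
    if source_type in ("predecode_predictor", "cache_predictor"):
        return last_unit           # predictor -> reuse the last jump's storage unit
    raise ValueError(f"unknown source type: {source_type}")
```

Because only the selected unit is accessed, the other unit stays idle for that fetch, which is where the dynamic power saving comes from.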
For the four schemes of fig. 2, fig. 3, fig. 5 and fig. 6, fig. 7 shows a comparison schematic diagram of the instruction acquisition method according to an embodiment of the present disclosure:
The advance address determination scheme shown in fig. 2, at the program counter generation stage:
Select the jump target, compare addresses, select the current storage unit, and send the access request.
In the instruction fetch stage:
Acquire the current jump instruction.
The address comparison is a source of timing pressure: its logic is more complex than the other parts, and stacking too much other logic in the same stage can affect the processor's clock frequency.
The post-issue selection scheme shown in fig. 3, at the program counter generation stage:
Select the jump target, compare addresses, and simultaneously issue access requests to both units as candidate current storage units.
In the instruction fetch stage:
Select the source according to the address comparison result.
The address comparison is a source of timing pressure: its logic is more complex than the other parts, and stacking too much other logic in the same stage can affect the processor's clock frequency. Meanwhile, issuing requests to both candidate storage units is a source of increased power consumption: the tightly coupled storage unit and the cache unit are accessed simultaneously, but only the instruction data from one of them is used while the other path is discarded, wasting power.
The retransmission scheme shown in fig. 5, at the program counter generation stage:
Select the jump target and compare addresses; for the first fetch, issue requests to both units simultaneously, and thereafter issue the access request according to the recorded comparison result.
In the instruction fetch stage:
Obtain the current jump instruction, record the comparison result, and select the source according to the address comparison result.
The address comparison remains a source of timing pressure, as above. For the first fetch, requests are still issued to both storage units simultaneously, so only one path of instruction data is used and the other is discarded, wasting power. Recording the comparison result relies on the locality principle: the source of jump instructions rarely switches; when it does switch, an error is detected and the fetch is resent to select the other branch path.
The speculative scheme illustrated in fig. 6, at the program counter generation stage:
Select the jump target, compare addresses, determine the current storage unit according to the source type, and after the first fetch issue the access request according to the comparison result.
In the instruction fetch stage:
Obtain the current jump instruction, record the comparison result, and select the source according to the address comparison result.
Determining the current storage unit according to the source type means that only one of the tightly coupled storage unit and the cache unit is accessed, saving power consumption. Recording the comparison result relies on the locality principle: the source of jump instructions rarely switches; when it does switch, an error is detected and the fetch is resent to select the other branch path.
In summary, the schemes of fig. 5 and fig. 6, by combining retransmission and speculative strategies, keep performance unchanged or only slightly reduced while greatly lowering the processor's dynamic power consumption, giving embedded devices and edge computing devices higher performance, longer product battery life, and lower heat dissipation requirements.
Corresponding to the above method embodiments, the present disclosure further provides a processor embodiment, and fig. 8 shows a schematic structural diagram of a processor provided in one embodiment of the present disclosure. As shown in fig. 8, the processor 800 includes an instruction fetch unit 810, a cache unit 820, a tightly coupled storage unit 830, and a program counter 840;
the instruction fetch unit 810 is configured to receive branch prediction information for a current jump instruction, determine a current storage unit where the current jump instruction is located from the cache unit 820 and the tightly coupled storage unit 830 based on the branch prediction information, and fetch the current jump instruction from the current storage unit based on an offset of a current program count value recorded by the program counter 840.
In the embodiments of this specification, the instruction fetch unit of the processor directly determines, in the instruction fetch stage, the current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit based on the branch prediction information for the current jump instruction, and fetches the current jump instruction from the current storage unit based on the offset of the current program count value recorded by the program counter, instead of performing address comparison and instruction fetching sequentially. This reduces timing pressure and raises the processor's clock frequency, while effectively avoiding the extra dynamic power consumption caused by fetching the jump instruction from both branches simultaneously. It improves the performance and product battery life of the processor when handling discontinuous instructions, and achieves more efficient and energy-saving instruction fetching without sacrificing processing speed, making the method suitable for application scenarios with high requirements on real-time performance and product battery life.
The above is a schematic solution of the processor of this embodiment. It should be noted that the technical solution of the processor and the technical solution of the instruction acquisition method belong to the same concept; for details of the technical solution of the processor not described here, refer to the description of the technical solution of the instruction acquisition method.
Corresponding to the above embodiments of the instruction fetch method, the present disclosure further provides an embodiment of a system-on-chip, and fig. 9 shows a schematic structural diagram of a system-on-chip provided in one embodiment of the present disclosure, where the system-on-chip 900 includes, but is not limited to, a control unit 910 and a plurality of on-chip components 920, and the plurality of on-chip components 920 includes a processor 9210.
Wherein the control unit 910 is configured to control and manage a plurality of on-chip components 920.
Wherein the processor 9210 is configured to execute computer programs/instructions that, when executed by the processor 9210, perform the steps of the instruction acquisition method described above.
The system-on-chip 900 is a chip-level hardware device that integrates the control unit 910 and the plurality of on-chip components 920, which typically include a central processing unit, a digital signal processor, the processor described above, an input/output (I/O) interface, and the like. The system on chip 900 can improve performance, reduce power consumption, and shrink physical size, and is suitable for various scenarios such as mobile devices, Internet of Things devices, embedded systems, and edge computing devices. The specific embodiments of the present application are not limited to a particular implementation of the system-on-chip.
The control unit 910 is a core component in the system on a chip 900, and is responsible for decoding instructions, controlling execution flow, and coordinating with the multiple on-chip components 920. The control unit 910 controls the workflow of managing the plurality of on-chip components 920 by transmitting control signals, ensuring that the respective on-chip components can properly complete tasks in a predetermined order and logic.
The plurality of on-chip components 920 are various functional hardware components in the system-on-chip 900 that cooperate together to accomplish a particular task or function. The plurality of on-chip components 920 may include, but are not limited to, a central processor, a digital signal processor, a processor, input-output interfaces, and the like.
The above is a schematic solution of the system on chip of this embodiment. It should be noted that the technical solution of the system on chip and the technical solution of the instruction acquisition method belong to the same concept; for details of the technical solution of the system on chip not described here, refer to the description of the technical solution of the instruction acquisition method.
Corresponding to the embodiments of the instruction fetch method and the system-on-chip described above, the present disclosure further provides an embodiment of a computing device, and fig. 10 shows a schematic structural diagram of a computing device provided in one embodiment of the present disclosure. The computing device 1000 includes, but is not limited to, a memory 1010 and a system on chip 1020.
Wherein the memory 1010 is used to store computer programs/instructions.
The system on chip 1020 is configured to execute a computer program/instruction that, when executed by the system on chip 1020, implements the steps of the instruction fetch method described above.
The computing device 1000 is an electronic apparatus having computing capabilities that can execute various software programs to perform particular functions or tasks. Computing device 1000 typically includes memory 1010 and hardware devices for performing tasks. In the present description embodiment, computing device 1000 includes, but is not limited to, a system on chip 1020 and memory 1010. The specific embodiments of the present application are not limited to a particular implementation of a computing device.
Memory 1010 is a component of computing device 1000 used to store data and computer programs/instructions, and is typically divided into two types, volatile and nonvolatile. Volatile memory (e.g., RAM) loses data after power failure, primarily for temporary storage of running programs and data, and non-volatile memory (e.g., ROM, flash memory) retains data after power failure for storage of firmware and persisted data.
The foregoing is a schematic solution of the computing device of this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the instruction acquisition method belong to the same concept; for details of the technical solution of the computing device not described here, refer to the description of the technical solution of the instruction acquisition method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer program/instructions comprise computer program code which may be in source code form, object code form, executable file or in some intermediate form, etc.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims (12)

1. An instruction acquisition method is applied to an instruction fetch unit of a processor, the processor further comprises a cache unit, a tightly coupled storage unit and a program counter, and the instruction acquisition method comprises the following steps:
Receiving branch prediction information for a current jump instruction;
Determining a current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit based on the branch prediction information;
and fetching the current jump instruction from the current storage unit based on an offset of the current program count value recorded by the program counter.
2. The method of claim 1, wherein determining a current memory location of the current jump instruction from the cache location and the tightly coupled memory location based on the branch prediction information comprises:
And under the condition that the prediction result of the branch prediction information is that the jump occurs, determining the last storage unit where the last jump instruction is located as the current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit.
3. The method of claim 2, wherein in the case that the predicted outcome of the branch prediction information is that a jump occurs, determining, from the cache unit and the tightly coupled storage unit, that a last storage unit in which a last jump instruction is located is a current storage unit in which the current jump instruction is located, further comprising:
receiving branch prediction information for a first jump instruction;
under the condition that the branch prediction information records that the primary jump instruction jumps, determining the cache unit and the tightly coupled storage unit as primary storage units;
fetching a first primary jump instruction from the cache unit based on the offset of the current program count value recorded by the program counter, and fetching a second primary jump instruction from the tightly coupled storage unit based on the offset of the current program count value recorded by the program counter;
comparing the current program count value recorded by the program counter with the address ranges of the cache unit and the tightly coupled storage unit, and determining an effective primary storage unit from the cache unit and the tightly coupled storage unit;
and retaining, as an initial jump instruction, the first primary jump instruction or the second primary jump instruction fetched from the effective primary storage unit.
4. The method of claim 1, wherein determining a current memory location of the current jump instruction from the cache location and the tightly coupled memory location based on the branch prediction information comprises:
And under the condition that the predicted result of the branch prediction information is that a jump occurs, determining a current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit based on the source type of the branch prediction information.
5. The method of claim 4, wherein, in the case that the predicted outcome of the branch prediction information is that a jump occurs, determining, based on the source type of the branch prediction information, a current storage unit in which the current jump instruction is located from the cache unit and the tightly coupled storage unit, comprises:
And determining the cache unit as a current storage unit where the current jump instruction is located under the condition that the predicted result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is a branch unit.
6. The method of claim 4, wherein, in the case that the predicted outcome of the branch prediction information is that a jump occurs, determining, based on the source type of the branch prediction information, a current storage unit in which the current jump instruction is located from the cache unit and the tightly coupled storage unit, comprises:
And determining the tightly coupled storage unit as the current storage unit where the current jump instruction is located under the condition that the predicted result of the branch prediction information is that the jump occurs and the source type of the branch prediction information is an instruction retirement unit.
7. The method of claim 4, wherein, in the case that the predicted outcome of the branch prediction information is that a jump occurs, determining, based on the source type of the branch prediction information, a current storage unit in which the current jump instruction is located from the cache unit and the tightly coupled storage unit, comprises:
And under the condition that the predicted result of the branch prediction information is that a jump occurs and the source type of the branch prediction information is a predictor, determining the last storage unit where the last jump instruction is located as the current storage unit where the current jump instruction is located from the cache unit and the tightly coupled storage unit.
8. The method of claim 7, wherein the predictor is at least one of a pre-decode predictor and a cache predictor.
9. The method of any of claims 1-8, further comprising, after the fetching the current jump instruction from the current memory location based on the offset of the current program count value recorded by the program counter:
Comparing the current program count value recorded by the program counter with the address range of the current storage unit to determine whether the current program count value falls within the address range;
if not, updating the other one of the cache unit and the tightly coupled storage unit to be the current storage unit where the current jump instruction is located;
and fetching the current jump instruction from the updated current storage unit based on the offset of the current program count value.
10. A processor, comprising an instruction fetch unit, a cache unit, a tightly coupled storage unit and a program counter;
The instruction fetching unit is configured to receive branch prediction information for a current jump instruction, determine, from the cache unit and the tightly coupled storage unit, a current storage unit in which the current jump instruction is located based on the branch prediction information, and fetch the current jump instruction from the current storage unit based on an offset of a current program count value recorded by the program counter.
11. A system on a chip, comprising:
a control unit and a plurality of on-chip components, the plurality of on-chip components including a processor;
The control unit is adapted to control and manage the plurality of on-chip components, and the processor is adapted to execute a computer program/instruction which, when executed by the processor, implements the steps of the method according to any one of claims 1 to 9.
12. A computing device, comprising:
memory and system on chip;
the memory is adapted to store a computer program/instruction for execution by the system on chip, and the computer program/instruction, when executed by the system on chip, implements the steps of the method according to any one of claims 1 to 9.
CN202411709265.9A 2024-11-26 2024-11-26 Instruction acquisition method, processor, system on chip and computing device Pending CN119883366A (en)

Publications (1)

Publication Number Publication Date
CN119883366A true CN119883366A (en) 2025-04-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination