CN119781827A - A loop execution method, system, program product, medium and device - Google Patents

Publication number
CN119781827A
Authority
CN
China
Prior art keywords
loop
immediate
instruction
branch instruction
loop body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411755705.4A
Other languages
Chinese (zh)
Inventor
闻军会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Bitmap Information Technology Co ltd
Original Assignee
Chongqing Bitmap Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Bitmap Information Technology Co ltd
Priority to CN202411755705.4A
Publication of CN119781827A

Landscapes

  • Advance Control (AREA)

Abstract

The present invention provides a loop body execution method, system, program product, medium and device. The method infers the loop body automatically, so it requires no modification to the instruction set or the compiler and places no additional requirements on programming, which widens its range of application. A loop buffer caches the loop body instructions, while the loop-variable accumulation instruction and the comparison instruction are removed from the body; this speeds up execution of the loop and avoids fetching instructions from memory on every iteration, which further improves efficiency and reduces power consumption. The method also checks whether the instruction at the destination address is in the failure buffer, whether the loop buffer is full, whether several immediate operations are performed on one of the two operands, and whether an immediate operation on one operand is followed by an immediate operation on the other operand. These checks ensure the correctness of the loop, reduce meaningless comparisons, and further increase execution efficiency.

Description

A loop body execution method, system, program product, medium and device
Technical Field
The present invention relates to the field of digital signal processing, and in particular to a loop body execution method, system, program product, medium and device.
Background
Artificial intelligence (AI) computation plays a significant and wide-ranging role. It can quickly identify, process and analyze large amounts of data, improving computational efficiency and accuracy. For example, in the financial industry, AI computation predicts stock prices by analyzing historical transaction records and market conditions, providing support for investment decisions. AI techniques also extract new insights from large data sets and create new capabilities through machine learning algorithms; for example, American Express uses AI to detect fraud in credit card transactions. AI computation is likewise widely used in fields such as medicine, education and transportation. In medicine, data analysis lets AI make accurate judgments about a patient's condition, diagnosis and treatment plan, improving patient survival rates.
When an application performs AI computation, it commonly has to process loops, especially for the large number of matrix and vector operations, which are implemented as element-by-element loops. Besides the data-operation instructions in the middle of the loop body, a loop generally has to accumulate (increment) the loop variable and test it at the end of the loop body. In a short loop, the overhead of accumulating the loop counter and testing whether the loop has ended has a large impact on performance. To speed up program execution, the branch instruction (the loop jump instruction) is often executed speculatively: the outcome of the instruction is predicted directly to decide whether to leave the loop. If the prediction is wrong, the pipeline is flushed and further performance is lost.
Some central processing units (CPUs) provide special instructions for loops. For example, the x86 instruction set provides instructions beginning with REP, together with a counter for the loop, to help speed up the execution of loop blocks. However, not all CPUs (for example, RISC-V) provide such instructions. Patent CN107450888B proposes a zero-overhead loop in an embedded digital signal processor, but it relies on the presence of loop instructions; when the instruction set has no such instruction, the method does not work. There is therefore a need for a more widely applicable and efficient method of accelerating loops in AI computation.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a loop body execution method, system, program product, medium and device, which mainly address the lack, in the prior art, of a widely applicable and efficient method of accelerating loops in AI computation.
The invention aims at realizing the following scheme:
In a first aspect, according to an embodiment of the invention, a loop body execution method is provided, comprising the following steps:
S1, in response to detecting a branch instruction, determining whether the jump destination address of the branch instruction is smaller than the program counter of the branch instruction; if so, recording the program counter of the branch instruction in a loop body end position register, recording the two operands of the branch instruction, and recording the comparison type of the jump condition of the branch instruction in a loop comparator;
S2, jumping to the destination address to fetch a new instruction, and determining whether an immediate operation is performed on one of the two operands; if so, storing the operand on which the immediate operation is performed in a loop data register, storing the other operand in a constant register, and storing the immediate of the operation in a step size register; if not, storing the new instruction in a loop buffer; then fetching the next instruction in sequence;
S3, determining whether the program counter of the next instruction is smaller than the program counter of the branch instruction; if so, repeating step S2;
S4, when the program counter of the next instruction equals the program counter of the branch instruction, fetching instructions directly from the loop buffer for execution while accumulating the operand in the loop data register, and determining with the loop comparator whether the accumulated operand satisfies the condition of the branch instruction.
According to an embodiment of the invention, the loop body execution method further comprises, in step S2, detecting whether the instruction at the destination address is in a failure buffer and, if so, skipping the remaining steps, wherein the failure buffer stores the program counter of a recently detected backward-jump instruction that is not a loop, a backward jump being a jump whose destination address is smaller than the program counter of the branch instruction.
According to an embodiment of the present invention, the types of comparison include greater than, less than, equal to, greater than or equal to, and less than or equal to.
According to an embodiment of the invention, the immediate operations include an immediate addition operation and an immediate subtraction operation.
According to an embodiment of the invention, the immediate operation is performed using an adder.
According to an embodiment of the present invention, when the immediate operation is an immediate subtraction operation, an addition operation is performed using the complement of the immediate.
According to an embodiment of the invention, the immediate may be any non-zero integer.
According to an embodiment of the present invention, the loop body execution method further includes, in the step S2, when the loop buffer is full and an address of a next instruction is smaller than a value in the loop body end position register, putting the value in the loop body end position register into a failure buffer, and skipping the remaining steps.
According to an embodiment of the present invention, the loop body execution method further includes, in the step S2, when performing a plurality of immediate operations on one of the two operands, placing the value in the loop body end position register into a failure buffer, and skipping the remaining steps.
According to an embodiment of the invention, the loop body execution method further comprises, in step S4, when an immediate operation is performed on the one operand and an immediate operation is then performed on the other operand, putting the value in the loop body end position register into the failure buffer and skipping the remaining steps.
In a second aspect, according to another embodiment of the invention, a loop body execution system based on branch instruction detection is provided, comprising: a loop controller that executes a program using the loop body execution method of the first aspect; a loop comparator for evaluating the loop termination condition; a loop adder for accumulating the loop variable; a loop body end position register for storing the program counter of the last branch instruction of the loop body; a loop buffer for temporarily storing the instructions of the loop body; a failure buffer for temporarily storing the program counter of a backward-jump branch instruction recently judged not to be a loop; a loop data register for storing the value of the loop accumulation variable; a constant register for storing the value of the loop threshold constant; and a step size register for storing the step size by which the loop variable is accumulated.
In a third aspect, according to a further embodiment of the present invention, there is provided a computer program product comprising computer program code which, when run on an electronic device, causes the electronic device to perform the method according to the first aspect.
In a fourth aspect, according to a further embodiment of the present invention, there is provided a computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method according to the first aspect.
In a fifth aspect, according to a further embodiment of the present invention, there is provided an electronic device comprising one or more processors, and a memory, wherein the memory is for storing executable instructions, the one or more processors being configured to implement the steps of the method according to the first aspect via execution of the executable instructions.
Compared with the prior art, the invention has the following advantages: the loop body is inferred automatically, so no modification to the instruction set or the compiler is needed and no additional requirement is placed on programming, which widens the range of application; the loop body instructions are cached in the loop buffer and the loop-variable accumulation instruction and the comparison instruction are removed, which speeds up execution of the loop and avoids fetching instructions from memory every time, further improving efficiency and reducing power consumption.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart of a loop body execution method based on branch instruction detection according to an embodiment of the invention;
FIG. 2 is a diagram of a loop body execution system based on branch instruction detection according to an embodiment of the present invention.
Detailed Description
Before proceeding with the following detailed description, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms "coupled," "connected," and derivatives thereof, refer to any direct or indirect communication or connection between two or more elements, whether or not those elements are in physical contact with one another. The terms "transmit," "receive," and "communicate," and derivatives thereof, encompass both direct and indirect communication. The terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation. The term "or" is inclusive, meaning and/or. The phrase "associated with" and its derivatives are intended to include, be included within, interconnect with, contain, be contained within, connect to, be in communication with, mate with, interleave, juxtapose, be proximate to, bind to, have an attribute of, have a relationship with, and the like. The term "controller" refers to any device, system, or portion thereof that controls at least one operation. Such a controller may be implemented in hardware, or in a combination of hardware and software and/or firmware. The functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. The phrase "at least one," when used with a list of items, means that different combinations of one or more of the listed items may be used, and that only one item in the list may be required. For example, "at least one of A, B, C" includes any of the following combinations: A; B; C; A and B; A and C; B and C; A and B and C.
Definitions for other specific words and phrases are provided throughout this patent document. Those of ordinary skill in the art should understand that in many, if not most instances, such definitions apply to prior as well as future uses of such defined words and phrases.
In this patent document, the application combinations of modules and the division levels of sub-modules are for illustration only, and the application combinations of modules and the division levels of sub-modules may have different manners without departing from the scope of the disclosure.
For the purpose of making the technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by way of specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order to more clearly illustrate the present invention, a review of the prior art loop instruction segment execution method is first made.
For example, consider a for loop that runs 100 times and adds the vectors b and c element by element:
for (i = 0; i < 100; i++) {
    a[i] = b[i] + c[i];
}
After compilation, the loop becomes a sequence of instructions; approximate assembly pseudocode is as follows:
X10 = 0;                  # corresponds to i = 0
X20 = 100;                # loop bound (number of iterations)
Loop:                     # start of the loop body
LOAD  X12, (B_ADDR)       # load b[i] into register X12
LOAD  X13, (C_ADDR)       # load c[i] into register X13
ADD   X11, X12, X13       # compute a[i] = b[i] + c[i]; the result goes into X11
STORE (A_ADDR), X11       # store the result back to memory
ADD   A_ADDR, 1           # advance the address of a
ADD   B_ADDR, 1           # advance the address of b
ADD   C_ADDR, 1           # advance the address of c
ADD   X10, 1              # i = i + 1, move on to the next iteration
BLT   X10, X20, Loop      # if i < 100, jump back to Loop for the next element
It can be seen that, apart from the a[i] = b[i] + c[i] computation in the middle, the for statement has to accumulate the variable i and test it at the end of the loop. As described in the background, in a short loop the overhead of accumulating the loop counter and testing whether the loop has ended has a large impact on performance. Furthermore, executing branch instructions speculatively may lose further performance when the speculation is wrong. The prior art therefore lacks a widely applicable and efficient method of accelerating loops in AI computation.
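To put a rough number on this overhead, the sketch below counts the instructions in the pseudocode loop above. The tuple encoding and the `is_loop_overhead` helper are illustrative assumptions and do not come from the patent; the count also ignores cycle-level effects such as pipeline flushes.

```python
# Loop body from the assembly pseudocode above, one tuple per instruction.
# The tuple layout is an assumed encoding, used only for counting.
LOOP_BODY = [
    ("LOAD", "X12", "(B_ADDR)"),
    ("LOAD", "X13", "(C_ADDR)"),
    ("ADD", "X11", "X12", "X13"),
    ("STORE", "(A_ADDR)", "X11"),
    ("ADD", "A_ADDR", 1),
    ("ADD", "B_ADDR", 1),
    ("ADD", "C_ADDR", 1),
    ("ADD", "X10", 1),               # i = i + 1
    ("BLT", "X10", "X20", "Loop"),   # loop test and backward jump
]


def is_loop_overhead(inst, loop_var="X10"):
    """Hypothetical helper: True for the loop-variable increment and the branch."""
    op = inst[0]
    if op == "BLT":
        return True
    return op == "ADD" and len(inst) == 3 and inst[1] == loop_var


ITERATIONS = 100
overhead_per_iter = sum(is_loop_overhead(inst) for inst in LOOP_BODY)
total = len(LOOP_BODY) * ITERATIONS
overhead = overhead_per_iter * ITERATIONS
print(f"{overhead} of {total} executed instructions are loop bookkeeping")
```

Under this encoding, two of the nine instructions in each iteration exist only to advance and test the loop variable, and these are the two that the method described below moves out of the instruction stream.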
To this end, according to an embodiment of the invention, a loop body execution method is provided, as shown in FIG. 1, comprising the following steps:
S1, in response to detecting a branch instruction, determining whether the jump destination address of the branch instruction is smaller than the program counter of the branch instruction; if so, recording the program counter of the branch instruction in a loop body end position register, recording the two operands of the branch instruction, and recording the comparison type of the jump condition of the branch instruction in a loop comparator;
S2, jumping to the destination address to fetch a new instruction, and determining whether an immediate operation is performed on one of the two operands; if so, storing the operand on which the immediate operation is performed in a loop data register, storing the other operand in a constant register, and storing the immediate of the operation in a step size register; if not, storing the new instruction in a loop buffer; then fetching the next instruction in sequence;
S3, determining whether the program counter of the next instruction is smaller than the program counter of the branch instruction; if so, repeating step S2;
S4, when the program counter of the next instruction equals the program counter of the branch instruction, fetching instructions directly from the loop buffer for execution while accumulating the operand in the loop data register, and determining with the loop comparator whether the accumulated operand satisfies the condition of the branch instruction.
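The steps S1 to S4 can be sketched in software as follows. This is a minimal model under stated assumptions, not the patented hardware: the tuple instruction encoding, the names `LoopState`, `infer_loop`, `execute` and `run_loop`, and the use of list indices as program counters are all hypothetical. The sketch also scans the loop body in one static pass and replays the whole loop from the buffer, whereas the method fills the loop buffer while the instructions are being fetched and executed.

```python
import operator
from dataclasses import dataclass, field

# Toy encoding, assumed for illustration only:
#   ("ADDI", reg, imm)             reg += imm
#   ("BLT", reg_a, reg_b, target)  jump to target if reg_a < reg_b
#   ("ADD", rd, rs1, rs2)          rd = rs1 + rs2
COMPARE = {"BLT": operator.lt, "BGE": operator.ge, "BNE": operator.ne}


@dataclass
class LoopState:
    end_pc: int                 # loop body end position register
    compare: str                # comparison type held by the loop comparator
    operands: tuple             # the two branch operands, in their original order
    data_reg: str = ""          # loop data register (the loop variable)
    const_reg: str = ""         # constant register (the loop bound)
    step: int = 0               # step size register
    buffer: list = field(default_factory=list)   # loop buffer


def infer_loop(program, branch_pc):
    """S1 to S3: return a LoopState if the branch closes a simple loop, else None."""
    op, a, b, target = program[branch_pc]
    if target >= branch_pc:                  # S1: only a backward jump can close a loop
        return None
    state = LoopState(end_pc=branch_pc, compare=op, operands=(a, b))
    for pc in range(target, branch_pc):      # S2/S3: walk until the branch is reached
        inst = program[pc]
        if inst[0] == "ADDI" and inst[1] in (a, b):
            state.data_reg = inst[1]
            state.const_reg = b if inst[1] == a else a
            state.step = inst[2]
        else:
            state.buffer.append(inst)        # ordinary body instruction
    return state if state.data_reg else None


def execute(inst, regs):
    """Toy executor for the single ordinary instruction used in the example."""
    op, rd, rs1, rs2 = inst
    if op != "ADD":
        raise ValueError(f"unsupported instruction: {op}")
    regs[rd] = regs[rs1] + regs[rs2]


def run_loop(state, regs):
    """S4: replay the loop buffer; the adder and comparator do the bookkeeping."""
    a, b = state.operands
    while True:
        for inst in state.buffer:
            execute(inst, regs)
        regs[state.data_reg] += state.step                  # loop adder
        if not COMPARE[state.compare](regs[a], regs[b]):    # loop comparator
            break


program = [
    ("ADD", "x5", "x5", "x10"),    # acc += i
    ("ADDI", "x10", 1),            # i += 1
    ("BLT", "x10", "x20", 0),      # if i < 100, jump back to pc 0
]
regs = {"x5": 0, "x10": 0, "x20": 100}
state = infer_loop(program, branch_pc=2)
run_loop(state, regs)
print(regs["x5"], regs["x10"])
```

In the example, only the `ADD` ends up in the loop buffer; the `ADDI` and the `BLT` are represented by the step size, the loop data register and the comparison type instead of being replayed as instructions.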
The advantages of this method are as follows: the loop body is inferred automatically, without any modification to the instruction set or the compiler and without additional requirements on programming, which widens the range of application; the loop body instructions are cached in the loop buffer and the loop-variable accumulation instruction and the comparison instruction are removed, which speeds up execution of the loop and avoids fetching instructions from memory every time, further improving efficiency and reducing power consumption.
In addition, to reduce meaningless comparisons and further increase execution efficiency, according to an embodiment of the invention the loop body execution method further comprises detecting whether the instruction at the destination address is in a failure buffer and, if so, skipping the remaining steps, wherein the failure buffer stores the program counter of a recently detected backward-jump instruction that is not a loop, a backward jump being a jump whose destination address is smaller than the program counter of the branch instruction.
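A minimal sketch of such a failure buffer follows. The class name, the capacity of four entries, the first-in-first-out replacement policy and the choice to key the lookup on the branch instruction's program counter are all assumptions made for illustration; the text only states that the buffer holds program counters of recently rejected backward jumps.

```python
from collections import deque


class FailureBuffer:
    """Remembers PCs of backward branches recently judged not to be simple loops."""

    def __init__(self, capacity=4):          # capacity is an assumed value
        self._pcs = deque(maxlen=capacity)   # oldest entry is evicted when full

    def record(self, branch_pc):
        if branch_pc not in self._pcs:
            self._pcs.append(branch_pc)

    def __contains__(self, branch_pc):
        return branch_pc in self._pcs


def should_try_inference(branch_pc, target_pc, failures):
    """Attempt loop inference only for backward jumps that were not rejected before."""
    is_backward = target_pc < branch_pc
    return is_backward and branch_pc not in failures


failures = FailureBuffer(capacity=2)
failures.record(0x40)
print(should_try_inference(0x40, 0x10, failures))   # known failure, skipped
print(should_try_inference(0x80, 0x10, failures))   # unknown backward jump, tried
```

A small buffer of this kind trades completeness for cost: an evicted branch is simply analyzed again the next time it is encountered.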
To make the loop body execution method more adaptable, according to an embodiment of the invention the comparison types include greater than, less than, equal to, greater than or equal to, and less than or equal to. Other ways of evaluating the loop termination condition may also be supported, depending on the actual use case.
According to an embodiment of the invention, the immediate operations include immediate addition and immediate subtraction. The immediate operation advances the loop variable to the next iteration. For generality, according to an embodiment of the invention, both immediate addition and immediate subtraction are performed with an adder: when the immediate operation is a subtraction, the addition is performed with the complement of the immediate.
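The following sketch shows how a single adder can serve both cases. It assumes a 32-bit register width and reads "complement" as the two's complement; neither detail is stated in the text, and the function names are hypothetical.

```python
WIDTH = 32                       # assumed register width
MASK = (1 << WIDTH) - 1


def twos_complement(imm):
    """Encoding of -imm in WIDTH bits (invert the bits, then add one)."""
    return (~imm + 1) & MASK


def loop_add(value, imm, subtract=False):
    """One adder for both operations: a subtraction adds the complement of imm."""
    operand = twos_complement(imm) if subtract else imm & MASK
    return (value + operand) & MASK


print(loop_add(10, 1))                   # immediate addition
print(loop_add(10, 3, subtract=True))    # immediate subtraction through the adder
```

Because the result is truncated to the register width, adding the complement wraps around exactly as a hardware subtraction would.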
Furthermore, to make the loop body execution method more adaptable, the immediate may be any non-zero integer. That is, the loop variable may be increased (or decreased) by 1 each time, by 2 each time, or by any other non-zero integer, chosen according to the actual use case.
In addition, to ensure that program execution does not exceed the limits of the hardware, according to an embodiment of the invention the loop body execution method further comprises, in step S2, when the loop buffer is full and the address of the next instruction is smaller than the value in the loop body end position register, putting the value in the loop body end position register into the failure buffer and skipping the remaining steps. If this happens, the program exceeds what the hardware can hold: the loop body may be too large, or the code may not be a loop at all.
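The capacity check can be sketched as below. The `LoopBuffer` class, its capacity of eight entries, the use of a plain set as the failure buffer and the decision to clear the loop buffer on failure are assumptions for illustration only.

```python
class LoopBuffer:
    """Fixed-capacity store for loop-body instructions (capacity is assumed)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.instructions = []

    def is_full(self):
        return len(self.instructions) >= self.capacity


def store_or_fail(buffer, inst, next_pc, end_pc, failures):
    """Store one body instruction; give up when the body cannot fit.

    Returns False, and records end_pc in the failure set, when the buffer is
    full while instructions still remain before the closing branch.
    """
    buffer.instructions.append(inst)
    if buffer.is_full() and next_pc < end_pc:
        failures.add(end_pc)             # remember this branch as a non-loop
        buffer.instructions.clear()      # assumed cleanup; not stated in the text
        return False
    return True


failures = set()
small = LoopBuffer(capacity=2)
first = store_or_fail(small, "inst0", next_pc=1, end_pc=5, failures=failures)
second = store_or_fail(small, "inst1", next_pc=2, end_pc=5, failures=failures)
print(first, second, failures)
```

When the buffer fills exactly as the closing branch is reached (`next_pc == end_pc`), the body still fits and the function reports success.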
In addition, to ensure the correctness of the loop, according to an embodiment of the invention the loop body execution method further comprises, in step S2, when several immediate operations are performed on one of the two operands, putting the value in the loop body end position register into the failure buffer and skipping the remaining steps. By the usual definition, such code does not satisfy the conditions of an ordinary loop.
To further ensure the correctness of the loop, according to an embodiment of the invention the method further comprises, in step S4, when an immediate operation is performed on the one operand and an immediate operation is then performed on the other operand, putting the value in the loop body end position register into the failure buffer and skipping the remaining steps. Likewise, this case does not satisfy the conditions of an ordinary loop.
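Both correctness checks can be sketched as a single scan over the loop body. The function name, the `ADDI`/`SUBI` mnemonics, the tuple encoding (opcode first, destination register second) and the returned reason strings are hypothetical, and rejecting a body with no immediate operation at all is an added assumption rather than something the text states.

```python
IMMEDIATE_OPS = ("ADDI", "SUBI")    # assumed mnemonics for immediate add/subtract


def rejection_reason(body, operand_a, operand_b):
    """Return None for an acceptable loop body, otherwise the reason to reject it.

    Each instruction is assumed to be a tuple of at least (opcode, destination).
    """
    updated = None
    for inst in body:
        op, reg = inst[0], inst[1]
        if op not in IMMEDIATE_OPS or reg not in (operand_a, operand_b):
            continue
        if updated is None:
            updated = reg
        elif reg == updated:
            return "multiple immediate operations on one operand"
        else:
            return "immediate operations on both operands"
    if updated is None:
        return "no immediate operation on either operand"   # assumed handling
    return None


good = [("LOAD", "x12", "b"), ("ADDI", "x10", 1)]
twice = [("ADDI", "x10", 1), ("ADDI", "x10", 1)]
both = [("ADDI", "x10", 1), ("SUBI", "x20", 1)]
for body in (good, twice, both):
    print(rejection_reason(body, "x10", "x20"))
```

On any non-`None` result, the caller would put the value of the loop body end position register into the failure buffer and skip the remaining steps, as the two paragraphs above describe.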
According to another embodiment of the present invention, as shown in FIG. 2, a loop body execution system based on branch instruction detection is provided, comprising: a loop controller (not shown) that executes a program using the loop body execution method described above; a loop comparator (loop cmp) for evaluating the loop termination condition (jump?); a loop adder (loop add) for accumulating the loop variable; a loop body end position register (not shown) for storing the program counter (loop pc) of the last branch instruction of the loop body; a loop buffer (loop buffer) for temporarily storing the instructions of the loop body; a failure buffer (fail cmp) for temporarily storing the program counter of a backward-jump branch instruction recently judged not to be a loop; a loop data register (acc_value) for storing the value of the loop accumulation variable; a constant register (const_value) for storing the value of the loop threshold constant; and a step size register (step) for storing the step size by which the loop variable is accumulated. It can be seen that the loop-variable accumulation and the loop test of the original loop body are carried out in the adder and the comparator, while the other instructions (inst 0-4) are placed in the loop buffer. With this arrangement, instructions no longer need to be fetched from memory, and the execution of the buffered instructions, the accumulation of the loop variable and the loop test can proceed at the same time, which improves execution efficiency.
According to yet another embodiment of the invention, a computer program product is provided, comprising computer program code which, when run on an electronic device, causes the electronic device to perform a loop body execution method as described above.
According to a further embodiment of the present invention, there is provided a computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the loop body execution method described above.
According to yet another embodiment of the present invention, there is provided an electronic device comprising one or more processors and memory, wherein the memory is for storing executable instructions, the one or more processors being configured to implement, via execution of the executable instructions, the steps of a loop body execution method as previously described.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A loop body execution method, comprising the following steps:
S1, in response to detecting a branch instruction, determining whether the jump destination address of the branch instruction is smaller than the program counter of the branch instruction; if so, recording the program counter of the branch instruction in a loop body end position register, recording the two operands of the branch instruction, and recording the comparison type of the jump condition of the branch instruction in a loop comparator;
S2, jumping to the destination address to fetch a new instruction, and determining whether an immediate operation is performed on one of the two operands; if so, storing the operand on which the immediate operation is performed in a loop data register, storing the other operand in a constant register, and storing the immediate of the operation in a step size register;
S3, determining whether the program counter of the next instruction is smaller than the program counter of the branch instruction; if so, repeating step S2;
S4, when the program counter of the next instruction equals the program counter of the branch instruction, fetching instructions directly from the loop buffer for execution while accumulating the operand in the loop data register, and determining with the loop comparator whether the accumulated operand satisfies the condition of the branch instruction.
2. The loop body execution method according to claim 1, further comprising detecting whether the instruction at the destination address is in a failure buffer and, if so, skipping the remaining steps, wherein the failure buffer stores the program counter of a recently detected backward-jump instruction that is not a loop, a backward jump being a jump whose destination address is smaller than the program counter of the branch instruction.
3. The loop body execution method according to claim 1, wherein the types of the comparisons include greater than, less than, equal to, greater than or equal to, and less than or equal to.
4. The method of claim 1, wherein the immediate operations include an immediate addition operation and an immediate subtraction operation.
5. The method of claim 4, wherein the immediate operation is performed using an adder.
6. The method according to claim 5, wherein when the immediate operation is an immediate subtraction operation, an addition operation is performed using a complement of the immediate.
7. The loop body execution method according to claim 1, wherein the immediate is any non-zero integer.
8. The method according to claim 1, further comprising placing the value in the loop body end position register into a failure buffer and skipping the remaining steps when the loop buffer is full and the address of the next instruction is smaller than the value in the loop body end position register in step S2.
9. The method according to claim 1, further comprising placing the value in the loop body end position register into a failure buffer and skipping the remaining steps when performing a plurality of immediate operations on one of the two operands in step S2.
10. The method according to claim 1, further comprising placing the value in the loop body end position register into a failure buffer and skipping the remaining steps when the immediate operation is performed on the one operand and then the immediate operation is performed on the other operand in S4.
11. A loop body execution system based on branch instruction detection, comprising:
a loop controller that executes a program using the loop body execution method according to any one of claims 1 to 10;
a loop comparator for evaluating the loop termination condition;
a loop adder for accumulating the loop variable;
a loop body end position register for storing the program counter of the last branch instruction of the loop body;
a loop buffer for temporarily storing the instructions of the loop body;
a failure buffer for temporarily storing the program counter of a backward-jump branch instruction recently judged not to be a loop;
a loop data register for storing the value of the loop accumulation variable;
a constant register for storing the value of the loop threshold constant; and
a step size register for storing the step size by which the loop variable is accumulated.
12. A computer program product, characterized in that the computer program product comprises computer program code which, when run on an electronic device, causes the electronic device to perform the method according to any of claims 1-10.
13. A computer readable storage medium, having stored thereon a computer program executable by a processor to implement the steps of the method of any one of claims 1 to 10.
14. An electronic device, comprising:
one or more processors; and
a memory, wherein the memory is for storing executable instructions;
the one or more processors are configured to implement the steps of the method of any one of claims 1-11 via execution of the executable instructions.
CN202411755705.4A 2024-12-03 2024-12-03 A loop execution method, system, program product, medium and device Pending CN119781827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411755705.4A CN119781827A (en) 2024-12-03 2024-12-03 A loop execution method, system, program product, medium and device

Publications (1)

Publication Number Publication Date
CN119781827A true CN119781827A (en) 2025-04-08

Family

ID=95231434

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination