
CN119201232B - Instruction processing device, system and method - Google Patents

Instruction processing device, system and method

Info

Publication number
CN119201232B
CN119201232B (application CN202411686759.XA)
Authority
CN
China
Prior art keywords
instruction
execution
unit
result
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411686759.XA
Other languages
Chinese (zh)
Other versions
CN119201232A (en)
Inventor
胡振波
彭剑英
蔡骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd
Original Assignee
Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd filed Critical Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd
Priority to CN202411686759.XA priority Critical patent/CN119201232B/en
Publication of CN119201232A publication Critical patent/CN119201232A/en
Application granted granted Critical
Publication of CN119201232B publication Critical patent/CN119201232B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 - Arrangements for executing specific machine instructions
    • G06F9/3004 - Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 - Prefetch instructions; cache control instructions
    • G06F9/30145 - Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 - Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 - Speculative instruction execution
    • G06F9/3848 - Speculative instruction execution using hybrid branch prediction, e.g. selection between prediction techniques
    • G06F9/3867 - Concurrent instruction execution using instruction pipelines
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

An embodiment of the application provides an instruction processing device, system and method. The device comprises an instruction fetch module, a decoding module and an execution module. The instruction fetch module is used for acquiring original instruction data and performing length splitting and pre-decoding on it to obtain target instruction data. The decoding module is used for reading the target instruction data from the instruction fetch module and fully decoding it to obtain instruction information. The execution module is used for acquiring the instruction information from the decoding module, determining the existence result of a data dependency according to the operand information, selecting an execution unit in the execution module according to the instruction type, the existence result and the operand information, and performing the corresponding operation in that execution unit to obtain an execution result. The device can allocate more execution resources to performance-critical instructions, distribute them across the pipeline stages to reduce pipeline bubbles, and spread complex instructions over different pipeline stages, improving the working efficiency of the processor.

Description

Instruction processing apparatus, system and method
Technical Field
The present application relates to the field of computer technology, and in particular, to an instruction processing apparatus, system, and method.
Background
With the continuous development of computer devices, the central processing unit (CPU) serves as the computation and control core of a computer device; its main function is to execute computer instructions and process data. As computational complexity keeps increasing, CPU performance becomes ever more important. A computer program running on a computer device is, in essence, a sequence of executed instructions, so studying how computer instructions are processed is particularly important for improving the working efficiency and performance of the CPU in a computer device.
Currently, the related art adopts a classical five-stage pipeline processor architecture, for example a MIPS (Microprocessor without Interlocked Pipelined Stages) processor, in which the pipeline comprises five instruction processing stages in sequence: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM) and write back (WB), each pipeline stage handling a different task in the execution of one instruction. However, in this solution the processing task of a single pipeline stage is heavy, so resource consumption is excessive, and the resulting electrical signals must travel a longer path, which limits the operating frequency of the processor and affects its performance. Moreover, critical tasks and complex tasks may occupy the execution unit, causing resource contention and pipeline bubbles, so the processing efficiency of the processor is low.
Disclosure of Invention
The embodiment of the application provides instruction processing equipment, a system and a method.
In a first aspect of an embodiment of the present application, there is provided an instruction processing apparatus including:
The instruction fetching module is used for acquiring original instruction data, and performing length segmentation and pre-decoding processing on the original instruction data to obtain target instruction data;
The decoding module is connected with the instruction fetching module and is used for reading the target instruction data from the instruction fetching module and performing full decoding processing on the target instruction data to obtain instruction information, wherein the instruction information comprises instruction type and operand information;
And the execution module is connected with the decoding module and is used for acquiring the instruction information from the decoding module, determining the existence result of the data dependency relationship according to the operand information, determining the execution units in the execution module according to the instruction type, the existence result and the operand information, and executing corresponding operation processing in the execution units to obtain an execution result, wherein each execution unit corresponds to one pipeline stage.
In an optional embodiment of the application, the execution module comprises a first execution unit, a second execution unit and a third execution unit, wherein the second execution unit is connected with the first execution unit and the third execution unit, and the execution result comprises a first execution result and a second execution result;
The first execution unit is used for, when the instruction type is a performance-critical instruction and the existence result indicates that no data dependency exists, performing the operation on the performance-critical instruction according to the operand information to obtain the first execution result, and for, when the instruction type is a memory access instruction, calculating the memory address of the memory access instruction according to the operand information;
The second execution unit is used for executing data loading or storing operation according to the memory address;
The third execution unit is used for performing the operation on the performance-critical instruction when it was not executed in the first execution unit and the existence result indicated a data dependency, and for, when the instruction type is a non-performance-critical instruction, performing the corresponding processing according to the operand information to obtain the second execution result.
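The division of labour among the three execution units described above can be sketched as a routing function. This is purely an illustration of the selection logic; the string names and function below are assumptions of this sketch, not signal or unit names used by the device.

```python
def select_execution_unit(instr_type, dependency_exists):
    """Route an instruction to an execution unit following the division of
    labour described above (illustrative names, not the device's own)."""
    if instr_type == "memory":
        # Address generation happens here; the second unit then loads/stores.
        return "first"
    if instr_type == "performance_critical" and not dependency_exists:
        # Execute early when no data dependency blocks the instruction.
        return "first"
    if instr_type == "performance_critical":
        # A dependency kept it out of the first unit; execute it later.
        return "third"
    # Non-performance-critical work (CSR access, multiply, divide, ...).
    return "third"
```

A later pipeline stage thus only ever sees work that could not have completed earlier, which is what spreads the load across stages.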
In an alternative embodiment of the application, the first execution result comprises a first operation result and a first prediction result, and the first execution unit comprises a first arithmetic logic unit, a first branch prediction unit and an address generation unit;
The first arithmetic logic unit is used for, when the instruction type of the target instruction data is a performance-critical instruction and the existence result indicates no data dependency, performing the operation on the performance-critical instruction according to the operand information to obtain the first operation result;
The first branch prediction unit is used for, when the instruction type of the target instruction data is a performance-critical instruction and the existence result indicates no data dependency, performing branch prediction on the performance-critical instruction according to the operand information to obtain the first prediction result;
The address generation unit is used for calculating the memory address of the memory access instruction according to its operand information when the instruction type of the target instruction data is a memory access instruction.
In an alternative embodiment of the application, the second execution unit comprises an address access unit, and the address access unit is connected with the address generation unit;
the address access unit is used for performing a data load operation or a data store operation according to the memory address and the operand information corresponding to the memory access instruction.
In an optional embodiment of the application, the second execution result comprises a second operation result and a second prediction result, and the third execution unit comprises a second arithmetic logic unit and a second branch prediction unit;
The second arithmetic logic unit is used for, when the performance-critical instruction was not executed in the first execution unit and the existence result indicated a data dependency, performing the logic operation on the performance-critical instruction according to the operand information to obtain the second operation result;
The second branch prediction unit is used for, when the performance-critical instruction was not executed in the first execution unit and the existence result indicated a data dependency, performing branch prediction on the performance-critical instruction according to the operand information to obtain the second prediction result.
In an optional embodiment of the application, the second execution result comprises a control result, a multiplication result and a division result, and the third execution unit further comprises a control and status register, a multiplication unit and a division unit;
The control and status register is used for performing control and status register reads and writes according to the operand information to obtain the control result;
The multiplication unit is used for executing multiplication processing according to the operand information to obtain the multiplication result;
The division unit is used for executing division processing according to the operand information to obtain the division result.
In an optional embodiment of the present application, the instruction fetch module includes an instruction fetch unit and a preprocessing unit, where the instruction fetch unit is connected with the preprocessing unit;
the instruction fetching unit is used for performing length segmentation processing on the original instruction data to obtain complete instruction data and sending the complete instruction data to the preprocessing unit;
The preprocessing unit is used for performing pre-decoding processing on the complete instruction data to obtain the target instruction data.
In a second aspect of the embodiment of the present application, there is provided an instruction processing system including the instruction processing apparatus provided in the above embodiment.
In a third aspect of the embodiment of the present application, there is provided an instruction processing method, including:
acquiring original instruction data, and performing length segmentation and pre-decoding processing on the original instruction data to obtain target instruction data;
Performing full decoding processing on the target instruction data to obtain instruction information, wherein the instruction information comprises instruction type and operand information;
Determining the existence result of the data dependency relationship according to the operand information, determining the execution unit in the execution module according to the instruction type, the existence result and the operand information, and executing corresponding operation processing in the execution unit to obtain an execution result, wherein each execution unit corresponds to one pipeline stage.
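The three method steps above can be chained in a purely illustrative sketch. The toy tuple format (opcode, source registers, destination register), the function name and the string labels are assumptions of this illustration, not the claimed method or any real instruction encoding.

```python
def process_instruction(raw, regfile, pending_dests):
    """Chain the three method steps on a toy instruction tuple
    (opcode, src1, src2, dst). All encodings are illustrative."""
    # Step 1: the raw tuple stands in for length-split, pre-decoded data.
    op, src1, src2, dst = raw
    # Step 2: full decoding yields the instruction type and operand information.
    instr_type = "performance_critical" if op == "add" else "other"
    x, y = regfile[src1], regfile[src2]
    # Step 3: determine the existence result of a data dependency, then execute.
    depends = src1 in pending_dests or src2 in pending_dests
    result = x + y if op == "add" else None
    return instr_type, depends, result
```

For example, an add reading registers 1 and 2 has no dependency when no earlier in-flight instruction writes those registers, and a dependency when one does.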
The instruction processing device provided by the embodiment of the application comprises an instruction fetch module, a decoding module and an execution module. The instruction fetch module is used for acquiring original instruction data and performing length splitting and pre-decoding on it to obtain target instruction data. The decoding module is used for reading the target instruction data from the instruction fetch module and fully decoding it to obtain instruction information, which comprises the instruction type and operand information. The execution module is used for acquiring the instruction information from the decoding module, determining the existence result of a data dependency according to the operand information, selecting an execution unit in the execution module according to the instruction type, the existence result and the operand information, and performing the corresponding operation in that execution unit to obtain an execution result.
Compared with the prior art, the instruction processing device provided by the application has the following advantages. On the one hand, the target instruction data is obtained by performing length splitting and pre-decoding on the original instruction data, and the decoding module fully decodes the target instruction data, so that data-guiding information is available when the subsequent execution module performs its operations. On the other hand, execution resources are allocated according to the instruction type and the existence result of the data dependency, and work is spread over the pipeline stages corresponding to the execution units, which reduces the pipeline bubbles caused by data dependencies, allows instructions to execute in a distributed manner across the pipeline stages, reduces blocking caused by data dependencies, maximizes resource utilization and improves the overall performance of the processor. Complex instruction tasks are distributed over different pipeline stages, avoiding single-stage overload, lightening the load of each stage and further improving the processing efficiency of the processor.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of a prior art classical five-stage pipeline according to one embodiment of the present application;
FIG. 2 is a schematic diagram of an instruction processing apparatus according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an instruction processing apparatus according to an embodiment of the present application;
FIG. 4 is a flow chart of an instruction processing method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an instruction processing apparatus according to an embodiment of the present application.
Reference numerals illustrate:
The system comprises an instruction fetch module-10, an instruction fetch unit-11, a preprocessing unit-12, a decoding module-20, an execution module-30, a first execution unit-31, a second execution unit-32, a third execution unit-33, a first arithmetic logic unit-311, a first branch prediction unit-312, an address generation unit-313, an address access unit-321, a second arithmetic logic unit-331, a second branch prediction unit-332, a control and status register-333, a multiplication unit-334 and a division unit-335.
Detailed Description
In the course of implementing the application, the inventors found that the processing task of a single pipeline stage is currently heavy, so that resource consumption is excessive, the operating frequency of the processor is limited and its performance is affected; furthermore, critical tasks and complex tasks may occupy the execution unit, and the resulting resource contention and pipeline bubbles make the processing efficiency of the processor low.
It will be appreciated that, referring to FIG. 1, a related-art CPU employs pipelining, for example a classical five-stage pipeline comprising five instruction processing units in sequence: IF, ID, EX, MEM and WB. The IF unit reads instructions from memory; the ID unit decodes them, i.e. identifies the instruction type; the EX unit performs the operation on the instruction to obtain an operation result; the MEM unit performs memory access, i.e. reads data from internal or external memory or writes the instruction result to memory; and the WB unit writes the result of instruction execution into the processor's register file for fast later access.
A processor architecture can be divided into different pipelines according to the pipeline classification. For a given processor the number of pipeline stages is fixed, and each stage needs a large amount of combinational logic to perform its operation. For the EX stage in particular, a single stage faces a large computational burden when complex or performance-critical instructions are encountered, so the single-stage processing task becomes too heavy; the excessive workload lengthens signal paths inside the processor and increases delay, which limits the operating frequency the processor can reach and thus affects overall performance. Moreover, in classical pipeline designs all operations rely on one shared execution unit, so if a long-running instruction is currently being processed, the other instructions waiting to execute must queue. In that case even a simple operation that could complete quickly is delayed by the more complex operation before it. Such waiting caused by resource contention is referred to as a "dependency", and it produces "bubbles" in the pipeline, i.e. periods during a pipeline cycle in which no useful work is done. These bubbles waste time windows that could otherwise be used to execute other instructions, reducing processor efficiency.
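The cost of these bubbles can be illustrated with a toy cycle count. The model below is an assumption of this illustration only (one shared unit that serializes everything versus two units that split long-latency and single-cycle work); it is not a timing model of the claimed device.

```python
def run_pipeline(latencies, shared_unit=True):
    """Total cycles to drain a sequence of instruction latencies.

    With one shared execution unit every instruction queues behind the
    one before it; with two units, single-cycle instructions no longer
    wait behind long-latency ones. Purely illustrative."""
    if shared_unit:
        return sum(latencies)              # everything serializes on one unit
    long_work = sum(l for l in latencies if l > 1)
    short_work = sum(l for l in latencies if l == 1)
    return max(long_work, short_work)      # the two units proceed in parallel
```

With latencies [4, 1, 1, 4, 1], the shared unit needs 11 cycles while the split units finish in 8: the three single-cycle instructions ride for free alongside the long ones instead of creating bubbles.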
Compared with the prior art, the instruction processing device, system and method provided by the embodiments of the application have the following advantages. On the one hand, the target instruction data is obtained by performing length splitting and pre-decoding on the original instruction data, and the decoding module fully decodes the target instruction data, so that data-guiding information is available when the subsequent execution module performs its operations. On the other hand, more execution resources are allocated to performance-critical instructions according to the instruction type and the existence result of the data dependency, and the performance-critical instructions are distributed over the pipeline stages corresponding to the execution units, which reduces the pipeline bubbles caused by data dependencies, allows instructions to execute in a distributed manner across pipeline stages, reduces blocking caused by data dependencies, and improves the overall performance of the processor. Complex instruction tasks are distributed over different pipeline stages, avoiding single-stage overload, lightening the load of each stage and further improving the processing efficiency of the processor.
The solution in the embodiment of the present application may be implemented in various computer languages, for example the object-oriented programming language Java, the scripting language JavaScript, or hardware description languages such as Verilog and SystemVerilog.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the application, not an exhaustive list of all embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an instruction processing apparatus according to an embodiment of the present application; the instruction processing apparatus includes an instruction fetch module 10, a decoding module 20 and an execution module 30. The instruction fetch module 10 is used for obtaining original instruction data and performing length splitting and pre-decoding on it to obtain target instruction data. The decoding module 20 is connected with the instruction fetch module 10 and is used for reading the target instruction data from the instruction fetch module 10 and fully decoding it to obtain instruction information, which includes an instruction type and operand information. The execution module 30 is connected with the decoding module 20 and is used for obtaining the instruction information from the decoding module 20, determining the existence result of a data dependency according to the operand information, selecting the execution unit in the execution module 30 according to the instruction type, the existence result and the operand information, and performing the corresponding operation in that execution unit to obtain an execution result; each execution unit corresponds to one pipeline stage.
It should be noted that the original instruction data is the instruction data to be processed. It may be an original instruction stream that has not been split, for example an entire instruction data stream containing a plurality of instruction data items, which may be of different instruction types or of the same instruction type.
Alternatively, the plurality of raw instruction data may be fetched from a memory or other memory unit during the fetching of the plurality of raw instruction data.
After the instruction fetch module 10 acquires the original instruction data, it may perform length splitting on it: specifically, the length information of the original instruction data is obtained according to a preset instruction format, and the original instruction data is split according to that length information to obtain complete instruction data. The preset instruction format is derived from the actual instruction data and can be customized according to actual requirements.
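Length splitting can be sketched as follows. The encoding is an assumption of this illustration: the document does not name an instruction set, so a RISC-V-style rule is borrowed in which the two lowest bits of the first 16-bit parcel are 0b11 for a 32-bit instruction and anything else for a 16-bit one.

```python
def split_instructions(raw_bytes):
    """Split a raw little-endian instruction stream into complete
    instructions, using a RISC-V-style length rule (illustrative only:
    the patent does not specify an encoding)."""
    out, i = [], 0
    while i + 1 < len(raw_bytes):
        parcel = raw_bytes[i] | (raw_bytes[i + 1] << 8)
        length = 4 if (parcel & 0b11) == 0b11 else 2
        if i + length > len(raw_bytes):
            break  # trailing fragment: wait for the next fetch
        out.append(bytes(raw_bytes[i:i + length]))
        i += length
    return out
```

The fetch unit would hand each element of the returned list, a complete instruction, to the preprocessing unit.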
After the complete instruction data is obtained, it can be pre-decoded to extract key features and obtain the target instruction data; the key features provide information for predicting branch jump instructions. A branch jump instruction may be a conditional branch instruction or an unconditional branch instruction; for example, the portion of the instruction's opcode that may indicate a branch jump, or a field associated with jump-address calculation, is identified. Based on these key features, whether the branch is taken, and possibly the jump target address, is predicted. The target instruction data is the instruction data obtained after the original instruction data has been length-split and pre-decoded, and it is complete instruction data.
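The key-feature extraction can be sketched as a pre-decoder that inspects only the opcode field. The opcode constants are hypothetical (RISC-V-style values chosen for illustration); the document does not specify an encoding.

```python
# Hypothetical opcode values, RISC-V-style, for illustration only.
BRANCH_OPCODE = 0b1100011  # conditional branch
JAL_OPCODE = 0b1101111     # unconditional jump

def predecode(inst):
    """Extract the key features later consumed by branch prediction:
    whether the instruction is a conditional branch or an unconditional
    jump, judged from the low opcode bits alone."""
    opcode = inst & 0x7F
    return {
        "is_branch": opcode == BRANCH_OPCODE,
        "is_jump": opcode == JAL_OPCODE,
    }
```

Because only a small field is examined, this check is cheap enough to run in the fetch stage, before full decoding.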
In this embodiment, the pre-decoding process is performed on the complete instruction data, which is helpful to improve the instruction pipeline efficiency of the processor and reduce the pipeline stall caused by branch prediction errors.
The decoding module 20 is connected with the instruction fetch module 10; after obtaining the target instruction data from the instruction fetch module 10, the decoding module 20 can perform comprehensive decoding on it to obtain the instruction information. Optionally, the full decoding process may include at least one of opcode resolution, register file reading, immediate processing, dependency detection and exception detection. Opcode resolution parses the opcode of the target instruction data to determine the instruction type. Register file reading reads the corresponding source operands from the register file according to the register numbers in the target instruction data. When an instruction contains an immediate, immediate processing may be performed, specifically extending or adjusting it to conform to the required operation format. Dependency detection checks whether the current instruction must wait for the result of a previous instruction, i.e. whether a data dependency exists; if so, a data-forwarding path may need to be set up or the execution of the instruction delayed. Exception detection checks whether the instruction is legal, whether an illegal operation exists, or whether other conditions exist that may cause an exception. In addition, for some complex instructions, special fields of the instruction, such as mode bits and privilege-level-related bits, need to be resolved.
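Those decode steps can be sketched on a toy instruction format. The format (8-bit opcode, two 4-bit source-register fields, 8-bit immediate) and the legality rule are assumptions of this illustration, not the device's actual format.

```python
def full_decode(inst, regfile):
    """Minimal sketch of the full-decode steps named above, under an
    assumed toy 32-bit format; real instruction formats differ."""
    opcode = (inst >> 24) & 0xFF          # opcode resolution
    rs1 = (inst >> 20) & 0xF              # register numbers
    rs2 = (inst >> 16) & 0xF
    imm = inst & 0xFF
    if imm & 0x80:                        # immediate processing: sign-extend
        imm -= 0x100
    legal = opcode in {0x01, 0x02, 0x03}  # exception detection (toy rule)
    return {
        "opcode": opcode,
        "operands": (regfile[rs1], regfile[rs2]),  # register file read
        "imm": imm,
        "legal": legal,
    }
```

The returned dictionary plays the role of the instruction information handed to the execution module: the type (via the opcode), the operand values, and whether an exception must be raised.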
In this embodiment, by setting the decoding module, the instruction can be quickly and accurately resolved, which is helpful for reducing delay in the pipeline and improving the overall performance of the processor.
The execution module 30 is connected with the decoding module 20 and may include a multi-stage pipeline, each stage of which contains an execution unit. After full decoding, each item of target instruction data yields an instruction type and operand information. The instruction type may include performance-critical instructions, memory access instructions and non-performance-critical instructions; performance-critical instructions may include arithmetic instructions, jump instructions and the like, while memory access instructions include load and store instructions. The number of target instruction data items may be three, four or any other number.
After obtaining the instruction type and the operand information from the decoding module, the execution module 30 determines, for each item of target instruction data, whether a data dependency exists according to the operand information, obtaining an existence result; the existence result may be either that the target instruction data has a data dependency or that it does not. A performance-critical instruction may pass through two execution points during execution, each corresponding to an execution unit, and different execution points may impose different conditions on when the operation may be performed; for example, there may be a first execution point and a second execution point.
It will be appreciated that a data dependency is a condition in which the result of one instruction is used by a subsequent instruction during execution of the program; this dependency affects the order of execution of the instructions and the scheduling of the pipeline.
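A minimal sketch of such a read-after-write dependency check follows; the function name and the representation of in-flight destination registers as a set are hypothetical, not taken from the patent.

```python
def has_raw_dependency(src_regs, in_flight_dests):
    """Return True when any source register of the current instruction is
    still owed a value by an earlier, not-yet-completed instruction."""
    # Register 0 is hardwired to zero in many ISAs and never hazards.
    return any(r != 0 and r in in_flight_dests for r in src_regs)
```

When this returns True, the pipeline must either forward the result or delay the instruction, as described above.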
When the instruction type of the target instruction data is a performance-critical instruction, the target instruction data is executed at the first execution point of the execution module 30 when the existence result indicates that no data dependency exists; and when the performance-critical instruction has not been executed at the first execution point and the existence result indicates that no data dependency exists, the target instruction data is executed at the second execution point of the execution module 30.
A memory access instruction may likewise involve two execution points, each corresponding to one execution unit; for example, the memory access instruction may first be executed at the first execution point and then at the second execution point. Distributing a complex instruction across different pipeline stages in this way balances the amount of work borne by each stage and improves the working efficiency of the processor.
When each execution point performs operation processing on the target instruction data, the operation corresponding to the instruction type is executed, so that an execution result is obtained. After the execution result is obtained, it can be written back to a physical register and retrieved by later instructions for further data processing.
The instruction processing device comprises an instruction fetching module, a decoding module, and an execution module. The instruction fetching module is used for acquiring original instruction data and performing length segmentation and pre-decoding processing on the original instruction data to obtain target instruction data. The decoding module is used for reading the target instruction data from the instruction fetching module and performing full decoding processing on it to obtain instruction information, where the instruction information includes an instruction type and operand information. The execution module is used for acquiring the instruction information from the decoding module, determining the existence result of a data dependency according to the operand information, determining an execution unit in the execution module according to the instruction type, the existence result, and the operand information, and executing the corresponding operation processing in that execution unit to obtain an execution result.
Compared with the prior art, the instruction processing device provided by the application has the following advantages. On the one hand, the target instruction data is obtained by performing length segmentation and pre-decoding processing on the original instruction data, and is fully decoded by the decoding module, so that guiding information is available when the subsequent execution module performs operation processing. On the other hand, more execution resources are allocated to the pipeline stages corresponding to the execution units according to the instruction type and the existence result of the data dependency, which reduces the pipeline bubbles caused by data dependencies; instructions are allowed to execute in a distributed manner across pipeline stages, reducing the stalls caused by data dependencies, maximizing resource utilization, and improving the overall performance of the processor. Moreover, complex instruction tasks are distributed across different pipeline stages, avoiding single-stage overload, lightening the load of each stage, and further improving the processing efficiency of the processor.
In an alternative embodiment of the present application, please continue to refer to fig. 2, the execution module 30 includes a first execution unit 31, a second execution unit 32, and a third execution unit 33, the second execution unit 32 is connected to the first execution unit 31 and the third execution unit 33, and the execution results include a first execution result and a second execution result.
The first execution unit 31 is configured to execute operation processing on the performance-critical instruction according to the operand information to obtain a first execution result when the instruction type is a performance-critical instruction and the existence result indicates that no data dependency exists, and to calculate the memory address of a memory access instruction according to the operand information when the instruction type is a memory access instruction;
the second execution unit 32 is configured to execute a data load or store operation according to the memory address;
the third execution unit 33 is configured to execute operation processing on the performance-critical instruction when the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists, and to execute corresponding processing according to the operand information when the instruction type is a non-performance-critical instruction, so as to obtain a second execution result.
It should be noted that a performance-critical instruction is a general instruction and may include general calculation instructions and jump instructions; the first execution unit is the first execution point of the performance-critical instruction, and the third execution unit is its second execution point. After the instruction type and operand information of the target instruction data are obtained, it can be judged whether the instruction type is a performance-critical instruction; when it is, whether a data dependency exists is judged according to the operand information and a preset rule. When no data dependency exists, the execution condition of the first execution point is satisfied, and the target instruction data is executed in the first execution unit to obtain a first execution result. When a data dependency does exist, the execution condition of the first execution point is not satisfied, and execution must continue to the second execution point; there it is judged again whether the data dependency still exists, and once it no longer exists, the execution condition of the second execution point is satisfied, and the target instruction data undergoes operation processing in the third execution unit to obtain a second execution result.
Taking the instruction fetching module as an IF module, the decoding module as a DE module, the first execution unit as EX0, the second execution unit as EX1, and the third execution unit as EX2 as an example: after the target instruction data is transmitted to the DE module through the IF module, full decoding is performed on it in the DE module to obtain the instruction information. For example, the current instruction type can be determined, as can the operand source, which may be data forwarding or a register file read; if data forwarding is required, each data forwarding source can be detected in a subsequent pipeline stage. When the target instruction data is a performance-critical instruction with no data dependency, it can be executed directly in the EX0 pipeline stage. If a data dependency exists, execution must continue to the next pipeline stage, EX2; if the performance-critical instruction was not executed in the EX0 pipeline stage and by then has no data dependency, it can be executed directly in the EX2 unit.
In this embodiment, since a performance-critical instruction has two execution opportunities, when the first execution point obtains execution resources but the instruction itself still has a data dependency, the pipeline is not stalled; instead, the instruction immediately yields the execution resources and continues to the next pipeline stage, that is, it enters the second execution point. Only if the data dependency is still not resolved when execution resources are obtained for the second time is the pipeline stalled, so the pipeline interlocks generated by data dependencies are greatly reduced in this architecture. Compared with the stalls caused by data dependencies and resource dependencies in a classical pipeline, the stalls in this architecture are significantly reduced, and the execution efficiency of the processor is higher.
When the instruction type is a memory access instruction, the memory address of the memory access instruction is calculated in the first execution unit EX0 according to the operand information, and the data load or store operation is executed in the second execution unit EX1 according to that memory address, so that the amount of work borne by each pipeline stage is balanced. Compared with a classical pipeline in which all memory operations are concentrated in a single MEM stage, avoiding concentrating all of these tasks in one unit can increase the operating frequency of the processor.
It will be appreciated that memory access instructions are the class of instructions in a processor used to interact with memory, and in some cases they are relatively time-consuming operations. They allow programs to read (load) data from memory or write (store) data to memory, and they are among the basic operations in the execution of a computer program. Optionally, the memory access instructions include:
a Load instruction, which reads data from memory and places it in a register, and a Store instruction, which writes data from a register into memory.
The non-performance-critical instructions may include multi-cycle instructions, system control class instructions such as ebreak/ecall/fence, and the like; for these instructions, corresponding processing may be executed in the third execution unit EX2, so as to obtain a second execution result.
In this embodiment, more execution resources are allocated to the performance-critical instructions, which are allowed to execute in a distributed manner in the pipeline so as to reduce the stalls caused by data dependencies, thereby improving the overall performance of the processor. The complex memory access instructions are split into several parts, so that the resulting sub-tasks are evenly distributed across different pipeline stages; compared with a classical architecture in which all execution logic is concentrated in one pipeline stage, the logic circuitry within each stage of this scheme is simpler, and the operating frequency is therefore higher.
In an alternative embodiment of the present application, please continue to refer to fig. 3, the first execution result includes a first operation result and a first prediction result, and the first execution unit 31 includes a first arithmetic logic unit 311, a first branch prediction unit 312, and an address generation unit 313.
The first arithmetic logic unit 311 is configured to perform operation processing on a performance-critical instruction having no data dependency according to the operand information to obtain a first operation result when the instruction type of the target instruction data is a performance-critical instruction and the existence result indicates that no data dependency exists; the first branch prediction unit 312 is configured to perform branch prediction processing on a performance-critical instruction having no data dependency according to the operand information to obtain a first prediction result when the instruction type of the target instruction data is a performance-critical instruction and the existence result indicates that no data dependency exists; and the address generation unit 313 is configured to calculate a memory address according to the operand information of a memory access instruction when the instruction type of the target instruction data is a memory access instruction.
It should be noted that the first arithmetic logic unit (Arithmetic Logic Unit, ALU) may be ALU1, which is mainly used for performing basic arithmetic and logic operations, such as addition, subtraction, AND, OR, and NOT. When a logic operation needs to be executed, the instruction type of the target instruction data is a performance-critical instruction, and no data dependency exists, operation processing is executed on the target instruction data according to the operand information, so as to obtain a first operation result.
The first branch prediction unit (Branch Jump Predictor, BJP) may be BJP1 and is responsible for predicting the outcome of branch instructions (e.g., conditional jumps, unconditional jumps, etc.) during program execution to reduce the delay associated with branches; it helps determine the execution path of a program in advance. When branch prediction needs to be executed, the instruction type of the target instruction data is a performance-critical instruction, and no data dependency exists, branch prediction processing is executed on the target instruction data according to the operand information, so as to obtain a first prediction result.
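The patent does not specify the prediction algorithm used inside a unit like BJP1. As one common possibility, a 2-bit saturating-counter predictor could be sketched as follows; this is purely illustrative, not the claimed design.

```python
class TwoBitPredictor:
    """A minimal 2-bit saturating-counter branch predictor: one plausible
    way a branch prediction unit might guess branch outcomes."""

    def __init__(self):
        self.counter = 1  # states 0..1 predict not-taken, 2..3 predict taken

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool):
        # Saturate at the ends so one mispredict does not flip a strong state.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
```

The hysteresis of the two-bit counter is what keeps a single anomalous outcome (for example, the final iteration of a loop) from immediately reversing the prediction.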
The address generation unit (Address Generation Unit, AGU) may be AGU1 and is configured to calculate the actual address of a memory access instruction and to handle complex addressing modes. When the instruction type of the target instruction data is a memory access instruction, the memory address is calculated according to the operand information of the memory access instruction.
In this embodiment, since the first arithmetic logic unit and the first branch prediction unit are provided, pipeline efficiency can be improved and the performance loss caused by branches can be reduced; by providing the address generation unit, part of the complex work is offloaded onto this pipeline stage, which speeds up memory access and improves the execution efficiency of load and store instructions.
In an alternative embodiment of the present application, as shown in fig. 3, the second execution unit 32 includes an address access unit 321, where the address access unit 321 is connected to the address generation unit 313, and the address access unit 321 is configured to execute a data loading operation or a data storing operation according to operand information corresponding to a memory address and a memory access instruction.
It can be understood that in the prior art the circuit frequency is limited by single-cycle execution; the complex task is therefore split into fragments for processing, that is, the memory access instruction is executed in stages across different pipeline stages, which reduces the task burden of any single pipeline stage.
The address access unit may be understood as AGU2. After the instruction type is identified as a memory access instruction and the address generation unit in the first execution unit has calculated the actual memory address, a data load or store operation can be performed in the address access unit according to the memory address and the operand information corresponding to the memory access instruction. When the memory access instruction is a load instruction, the data load operation is executed according to the memory address; when it is a store instruction, the data store operation is executed according to the memory address in the address access unit.
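The split between AGU1 (address calculation in EX0) and AGU2 (the actual access in EX1) can be sketched as two separate steps. The 12-bit displacement width and the dict-backed memory model below are assumptions chosen for illustration, not details from the patent.

```python
def agu1_compute_address(base: int, offset12: int) -> int:
    """EX0 stage: form the effective address from a base register value and
    a sign-extended 12-bit displacement; no memory is touched here."""
    if offset12 & 0x800:
        offset12 -= 0x1000
    return (base + offset12) & 0xFFFFFFFF  # wrap to a 32-bit address space


def agu2_access(memory: dict, addr: int, store_value=None):
    """EX1 stage: perform the actual load (store_value is None) or store
    at the address handed over from EX0."""
    if store_value is None:
        return memory.get(addr, 0)  # load: uninitialized locations read as 0
    memory[addr] = store_value      # store
```

Because EX0 only computes the address and EX1 only touches memory, neither stage carries the full latency of a combined address-generation-plus-access path, which is the frequency benefit described above.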
In this embodiment, the memory access instruction is processed in stages, which reduces the task burden of any single pipeline stage, makes better use of pipeline resources, and reduces the stalls caused by complex tasks, thereby improving the working efficiency and the overall performance of the processor.
In an alternative embodiment of the present application, please refer to fig. 3, the second execution result includes a second operation result and a second prediction result, and the third execution unit 33 includes a second arithmetic logic unit 331 and a second branch prediction unit 332.
The second arithmetic logic unit is configured to execute logic operation processing on a performance-critical instruction having no data dependency according to the operand information to obtain a second operation result when the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists; the second branch prediction unit is configured to execute branch prediction processing on a performance-critical instruction having no data dependency according to the operand information to obtain a second prediction result under the same conditions.
It should be noted that, taking the second arithmetic logic unit as ALU2 and the second branch prediction unit as BJP2 as an example, when the performance-critical instruction has not been executed in the first execution unit, has no data dependency, and is an arithmetic or logic instruction, logic operation processing is performed on it according to the operand information, so as to obtain a second operation result.
When the performance-critical instruction has not been executed in the first execution unit, has no data dependency, and is a branch instruction, branch prediction processing is executed on it according to the operand information, and a second prediction result is obtained.
In this embodiment, two execution opportunities (EX0 and EX2) are provided for performance-critical instructions. If a data dependency is encountered at the first execution point, the instruction does not stall the pipeline but continues to flow downward until the second execution point (EX2) satisfies the condition for execution; pipeline bubbles caused by data dependencies are thereby reduced, instruction execution proceeds efficiently, and instruction processing efficiency is improved.
In an alternative embodiment of the application, the second execution result further comprises a control result, a multiplication result, and a division result, and the third execution unit 33 further comprises a control and status register 333, a multiplication unit 334, and a division unit 335.
The control and status register 333 is configured to perform control and status register reading and writing according to operand information to obtain a control result, the multiplication unit is configured to perform multiplication according to operand information to obtain a multiplication result, and the division unit is configured to perform division according to operand information to obtain a division result.
It should be noted that the control and status registers (Control and Status Registers, CSR) are used to store status information of the processor and control signals, which may include, for example, interrupt enable control signals, exception handling address control signals, and the like. The state information may include an operating state of the processor, configuration information, and the like.
The control and status registers may include a program counter, a condition code register, an interrupt enable register, an abnormal program counter, an abnormal cause register, a privilege mode register, a performance monitoring register, a debug register, a system control register, and the like, where different registers have different corresponding functions.
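As an illustration of CSR read/write semantics, the sketch below models the register file as a dict and mirrors a RISC-V-style atomic read-and-write operation; the CSR address used in the test is hypothetical.

```python
def csrrw(csrs: dict, addr: int, new_value: int) -> int:
    """Atomically read a control/status register and install a new value,
    returning the old contents (RISC-V csrrw-like semantics, assumed)."""
    old = csrs.get(addr, 0)  # unimplemented CSRs read as 0 in this sketch
    csrs[addr] = new_value
    return old
```

The read-then-write pairing matters: software typically needs the old value (for example, the previous interrupt-enable state) at the same moment it installs the new one.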
The multiplication unit (Multiplier Unit, MUL) is configured to perform a multiplication operation according to operand information to obtain a multiplication result when the instruction type is a non-performance critical instruction.
The division Unit (DIV) is configured to perform a division operation according to the operand information when the instruction type is a non-performance critical instruction, so as to obtain a division result.
In this embodiment, the control and status registers are provided to manage and control the internal state of the processor so as to realize complex control logic, and the multiplication unit and the division unit are provided to execute non-performance-critical instructions quickly, thereby reducing data stalls, improving performance, and reducing hardware resource consumption.
In one embodiment, the application further provides a specific implementation of the structure of the instruction fetching module: the instruction fetching module 10 includes an instruction fetching unit 11 and a preprocessing unit 12, where the instruction fetching unit 11 is connected with the preprocessing unit 12.
The instruction fetching unit 11 is used for performing length segmentation processing on the original instruction data to obtain complete instruction data and sending the complete instruction data to the preprocessing unit, and the preprocessing unit 12 is used for performing pre-decoding processing on the complete instruction data to obtain target instruction data.
Specifically, after the instruction fetching unit 11 obtains the original instruction data, length segmentation processing may be performed on it: first, length information of the original instruction data is obtained according to a preset instruction format, and then the original instruction data is segmented according to that length information, so as to obtain complete instruction data.
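The length segmentation step can be sketched as follows, assuming a RISC-V-style preset format in which the two low bits of the first halfword distinguish 16-bit from 32-bit instructions; the patent leaves the concrete format open, so this encoding is an assumption.

```python
def split_instructions(byte_stream: bytes):
    """Split a raw little-endian fetch packet into complete instructions.
    Assumed rule: low two bits of the first halfword == 0b11 -> 32-bit
    instruction, anything else -> 16-bit (RISC-V C-extension style)."""
    out, i = [], 0
    while i + 2 <= len(byte_stream):
        half = int.from_bytes(byte_stream[i:i + 2], "little")
        size = 4 if (half & 0b11) == 0b11 else 2
        if i + size > len(byte_stream):
            break  # trailing fragment: wait for the next fetch packet
        out.append(byte_stream[i:i + size])
        i += size
    return out
```

A 32-bit instruction that straddles a packet boundary is deliberately held back, which is exactly the "complete instruction data" guarantee the segmentation step provides to later stages.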
After the complete instruction data is obtained, it may be subjected to pre-decoding processing by the preprocessing unit 12, with key features extracted to obtain the target instruction data; the key features are used to provide information for predicting branch jump instructions. A branch jump instruction may be a conditional branch instruction or an unconditional branch instruction; for example, the portion of the instruction's opcode that may represent a branch jump, or a field associated with jump address calculation, can be identified. Based on these key features, it is predicted whether the branch is taken and, where possible, the jump target address. The target instruction data refers to the instruction data obtained after the original instruction data has undergone pre-decoding processing; it is complete instruction data.
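The key-feature extraction for branch prediction can be as simple as checking the opcode field of each complete instruction. The opcodes below follow RISC-V conventions and are an illustrative assumption, since the patent does not name a specific encoding.

```python
def predecode_is_branch(word: int) -> bool:
    """Pre-decode check: does this 32-bit word look like a control-transfer
    instruction that the branch predictor should be told about?"""
    opcode = word & 0x7F
    return opcode in (0x63, 0x6F, 0x67)  # BRANCH, JAL, JALR (RISC-V-style)
```

Flagging these words during fetch lets the predictor act one or more cycles before full decoding, which is the early-branch-prediction benefit described below.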
According to the embodiment of the application, by providing the instruction fetching unit, length segmentation processing can be performed on the original instruction data to obtain complete instruction data, so that complete instructions can be supplied to subsequent pipeline stages. This improves pipeline efficiency, reduces the delay caused by waiting for a complete instruction, speeds up instruction acquisition and processing, and reduces pipeline bubbles; through early branch prediction, control-flow handling is optimized, and the demands of high-performance computation can be better supported while maintaining low power consumption and a small chip area.
On the other hand, the embodiment of the present application further provides an instruction processing method, referring to fig. 4, where the instruction processing method includes steps 201 to 203 as follows:
Step 201, obtaining original instruction data, and performing length segmentation and pre-decoding on the original instruction data to obtain target instruction data.
Step 202, performing full decoding processing on target instruction data to obtain instruction information, wherein the instruction information comprises instruction type and operand information.
Step 203, determining the existence result of the data dependency according to the operand information, determining the execution unit in the execution module according to the instruction type, the existence result and the operand information, and executing corresponding operation processing in the execution unit to obtain an execution result, wherein each execution unit corresponds to one pipeline stage.
The execution module comprises a first execution unit, a second execution unit and a third execution unit, wherein the second execution unit is connected with the first execution unit and the third execution unit, and the execution result comprises a first execution result and a second execution result.
Determining the existence result of a data dependency according to the operand information, determining the execution unit in the execution module according to the instruction type, the existence result, and the operand information, and executing the corresponding operation processing in the execution unit to obtain an execution result includes the following. When the instruction type is a performance-critical instruction and the existence result indicates that no data dependency exists, operation processing is executed on the performance-critical instruction according to the operand information in the first execution unit to obtain a first execution result; the performance-critical instruction includes at least one of a general calculation instruction and a jump instruction. When the instruction type is a memory access instruction, the memory address of the memory access instruction is calculated according to the operand information in the first execution unit, and a data load or store operation is executed according to that memory address in the second execution unit. When the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists, operation processing is executed on the performance-critical instruction in the third execution unit. When the instruction type is a non-performance-critical instruction, corresponding processing is executed according to the operand information in the third execution unit, so as to obtain a second execution result.
Specifically, referring to fig. 5, the processor includes an instruction fetching module (IF), a decoding module (DE), and an execution module (EX), organized as a six-stage pipeline. The first pipeline stage includes a first instruction fetching unit (IF1) and the second pipeline stage includes a second instruction fetching unit (IF2). The original instruction data is obtained through the IF1 unit and subjected to length segmentation processing to obtain complete instruction data, which is sent to the IF2 unit; the IF2 unit performs pre-decoding processing on the complete instruction data to obtain the target instruction data, which is sent to the decoding module.
The third pipeline stage comprises the decoding module (DE), where the target instruction data undergoes full decoding processing and the instruction type and operand information are obtained and transmitted to the execution module (EX). The execution module (EX) spans three pipeline stages, namely the fourth, fifth, and sixth stages: the fourth stage includes the first execution unit (EX0), the fifth stage includes the second execution unit (EX1), and the sixth stage includes the third execution unit (EX2).
The EX0 unit may include a first arithmetic logic unit (ALU1), a first branch prediction unit (BJP1), and an address generation unit (AGU1); the EX1 unit may include an address access unit (AGU2); and the EX2 unit may include a second arithmetic logic unit (ALU2), a second branch prediction unit (BJP2), a control and status register (CSR), a multiplication unit (MUL), and a division unit (DIV). When the instruction type is a performance-critical instruction, it is determined whether the target instruction data has a data dependency. When no data dependency exists and a logic operation is required, ALU1 in the EX0 unit performs operation processing to obtain a first operation result; when no data dependency exists and a branch prediction operation is required, BJP1 in the EX0 unit performs branch prediction processing to obtain a first prediction result. When the instruction type is a memory access instruction, the memory address is calculated in AGU1 in the EX0 unit, and then a data load or store operation is executed in the AGU2 unit according to that memory address.
If the performance-critical instruction was not executed in EX0 and has no data dependency, then when a logic operation is needed, ALU2 in the EX2 unit can execute operation processing to obtain a second operation result; and when branch prediction processing is needed, BJP2 in the EX2 unit can execute branch prediction processing to obtain a second prediction result. For non-performance-critical instructions, control processing can be performed by the CSR unit to obtain a control result, multiplication processing can be performed by the MUL unit to obtain a multiplication result, and division processing can be performed by the DIV unit to obtain a division result.
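The routing just described can be condensed into one dispatch function. The unit names follow the figure, while the function itself is a hypothetical single-issue simplification of the scheme, not a claimed implementation.

```python
def route(instr_type: str, dep_at_ex0: bool = False, dep_at_ex2: bool = False) -> str:
    """Decide where an instruction executes in the six-stage pipeline model:
    memory ops split across EX0/EX1, non-critical ops go to EX2, and
    performance-critical ops get two issue chances (EX0, then EX2)."""
    if instr_type == "memory":
        return "EX0.AGU1 then EX1.AGU2"  # address calc, then actual access
    if instr_type == "non_critical":
        return "EX2"                      # CSR / MUL / DIV side of EX2
    # performance-critical: first chance at EX0, second chance at EX2
    if not dep_at_ex0:
        return "EX0"
    if not dep_at_ex2:
        return "EX2"                      # dependency resolved while sliding down
    return "stall"                        # only now does the pipeline stall
```

Note that "stall" is reached only when the dependency survives both execution points, which matches the reduced-interlock behavior argued for above.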
Compared with the prior art, the instruction processing method of this embodiment obtains target instruction data by performing length segmentation and pre-decoding processing on the original instruction data, and performs full decoding processing on the target instruction data through the decoding module, so that guiding information is available when the subsequent execution stages perform operation processing. On the other hand, more execution resources are allocated according to the instruction type and the existence result of the data dependency, which reduces the pipeline bubbles caused by data dependencies; instructions are allowed to execute in a distributed manner across the pipeline stages, reducing the stalls caused by data dependencies, maximizing resource utilization, and improving the overall performance of the processor. Complex instruction tasks are distributed across different pipeline stages, avoiding single-stage overload, lightening the load of each stage, and further improving the processing efficiency of the processor.
On the other hand, the embodiment of the application also provides a processing system, which comprises the instruction processing equipment provided by the embodiment.
Compared with the prior art, the processing system of the application obtains target instruction data by performing length segmentation and pre-decoding processing on the original instruction data, and performs full decoding processing on the target instruction data through the decoding module, so that guiding information is available when the subsequent execution module performs operation processing. On the other hand, more execution resources are allocated to performance-critical instructions according to the instruction type and the existence result of the data dependency, and these instructions are dispatched to the pipeline stages corresponding to the execution units, which reduces the pipeline bubbles caused by data dependencies; instructions are allowed to execute in a distributed manner across the pipeline stages, reducing the stalls caused by data dependencies, maximizing resource utilization, and improving the overall performance of the processor. Complex instruction tasks are distributed across different pipeline stages, avoiding single-stage overload, lightening the load of each stage, and further improving the processing efficiency of the processor.
It should be understood that, although the steps in the flowchart are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially — they may be performed in turns or alternately with at least a portion of other steps, or of the sub-steps or stages of other steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. An instruction processing device, characterized in that the device comprises:
an instruction fetch module, configured to obtain original instruction data and to perform length segmentation and pre-decoding on the original instruction data to obtain target instruction data;
a decoding module, connected to the instruction fetch module and configured to read the target instruction data from the instruction fetch module and to fully decode the target instruction data to obtain instruction information, the instruction information including an instruction type and operand information; and
an execution module, connected to the decoding module and configured to obtain the instruction information from the decoding module, determine an existence result of a data dependency according to the operand information, determine an execution unit in the execution module according to the instruction type, the existence result and the operand information, and perform corresponding operation processing in the execution unit to obtain an execution result, each execution unit corresponding to one pipeline stage; wherein the execution module comprises a first execution unit, a second execution unit and a third execution unit; the second execution unit is connected to the first execution unit and the third execution unit; and the execution result includes a first execution result and a second execution result;
the first execution unit being configured to: when the instruction type is a performance-critical instruction and the existence result indicates that no data dependency exists, perform operation processing on the performance-critical instruction according to the operand information to obtain the first execution result; and when the instruction type is a memory-access instruction, calculate a memory address of the memory-access instruction according to the operand information; the performance-critical instruction including at least one of a general computation instruction and a jump instruction;
the second execution unit being configured to perform a data load or store operation according to the memory address; and
the third execution unit being configured to: when the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists, perform operation processing on the performance-critical instruction; and when the instruction type is a non-performance-critical instruction, perform corresponding processing according to the operand information to obtain the second execution result.

2. The device according to claim 1, wherein the first execution result includes a first operation result and a first prediction result, and the first execution unit comprises a first arithmetic logic unit, a first branch prediction unit and an address generation unit;
the first arithmetic logic unit is configured to: when the instruction type of the target instruction data is a performance-critical instruction and the existence result indicates that no data dependency exists, perform operation processing on the dependency-free performance-critical instruction according to the operand information to obtain the first operation result;
the first branch prediction unit is configured to: when the instruction type of the target instruction data is a performance-critical instruction and the existence result indicates that no data dependency exists, perform branch prediction on the dependency-free performance-critical instruction according to the operand information to obtain the first prediction result; and
the address generation unit is configured to: when the instruction type of the target instruction data is a memory-access instruction, calculate the memory address of the memory-access instruction according to the operand information of the memory-access instruction.

3. The device according to claim 2, wherein the second execution unit comprises an address access unit connected to the address generation unit; and the address access unit is configured to perform a data load operation or a data store operation according to the memory address and the operand information corresponding to the memory-access instruction.

4. The device according to claim 1, wherein the second execution result includes a second operation result and a second prediction result, and the third execution unit comprises a second arithmetic logic unit and a second branch prediction unit;
the second arithmetic logic unit is configured to: when the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists, perform logic operation processing on the dependency-free performance-critical instruction according to the operand information to obtain the second operation result; and
the second branch prediction unit is configured to: when the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists, perform branch prediction on the dependency-free performance-critical instruction according to the operand information to obtain the second prediction result.

5. The device according to claim 4, wherein the second execution result further includes a control result, a multiplication result and a division result, and the third execution unit further comprises a control and status register, a multiplication unit and a division unit;
the control and status register is configured to perform reads and writes of the control and status register according to the operand information to obtain the control result;
the multiplication unit is configured to perform multiplication according to the operand information to obtain the multiplication result; and
the division unit is configured to perform division according to the operand information to obtain the division result.

6. The device according to claim 1, wherein the instruction fetch module comprises an instruction fetch unit and a preprocessing unit, the instruction fetch unit being connected to the preprocessing unit;
the instruction fetch unit is configured to perform length segmentation on the original instruction data to obtain complete instruction data and to send the complete instruction data to the preprocessing unit; and
the preprocessing unit is configured to pre-decode the complete instruction data to obtain the target instruction data.

7. A processing system, characterized in that the processing system comprises the instruction processing device according to any one of claims 1 to 6.

8. An instruction processing method, applied to the instruction processing device according to any one of claims 1 to 6, the method comprising:
obtaining original instruction data, and performing length segmentation and pre-decoding on the original instruction data to obtain target instruction data;
fully decoding the target instruction data to obtain instruction information, the instruction information including an instruction type and operand information; and
determining an existence result of a data dependency according to the operand information, determining an execution unit in an execution module according to the instruction type, the existence result and the operand information, and performing corresponding operation processing in the execution unit to obtain an execution result, each execution unit corresponding to one pipeline stage; wherein the execution module comprises a first execution unit, a second execution unit and a third execution unit, and the execution result includes a first execution result and a second execution result; and wherein determining the execution unit and performing the corresponding operation processing in the execution unit comprises:
when the instruction type is a performance-critical instruction and the existence result indicates that no data dependency exists, performing operation processing on the performance-critical instruction in the first execution unit according to the operand information to obtain the first execution result, and when the instruction type is a memory-access instruction, calculating a memory address of the memory-access instruction in the first execution unit according to the operand information; the performance-critical instruction including at least one of a general computation instruction and a jump instruction;
performing, in the second execution unit, a data load or store operation according to the memory address; and
when the performance-critical instruction has not been executed in the first execution unit and the existence result indicates that no data dependency exists, performing operation processing on the performance-critical instruction in the third execution unit, and when the instruction type is a non-performance-critical instruction, performing corresponding processing in the third execution unit according to the operand information to obtain the second execution result.
CN202411686759.XA 2024-11-22 2024-11-22 Instruction processing device, system and method Active CN119201232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411686759.XA CN119201232B (en) 2024-11-22 2024-11-22 Instruction processing device, system and method


Publications (2)

Publication Number Publication Date
CN119201232A (en) 2024-12-27
CN119201232B (en) 2025-04-11

Family

ID=94053087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411686759.XA Active CN119201232B (en) 2024-11-22 2024-11-22 Instruction processing device, system and method

Country Status (1)

Country Link
CN (1) CN119201232B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119806653B (en) * 2025-03-12 2025-06-27 北京开源芯片研究院 Instruction stream generation method, device, electronic device and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN118467041A (en) * 2024-07-09 2024-08-09 芯来智融半导体科技(上海)有限公司 Instruction processing method and device for out-of-order multi-issue processor

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7487340B2 (en) * 2006-06-08 2009-02-03 International Business Machines Corporation Local and global branch prediction information storage
US11275590B2 (en) * 2015-08-26 2022-03-15 Huawei Technologies Co., Ltd. Device and processing architecture for resolving execution pipeline dependencies without requiring no operation instructions in the instruction memory
US20170083339A1 (en) * 2015-09-19 2017-03-23 Microsoft Technology Licensing, Llc Prefetching associated with predicated store instructions
US10776699B2 (en) * 2017-05-05 2020-09-15 Intel Corporation Optimized compute hardware for machine learning operations
CN117931294B (en) * 2024-03-22 2024-07-16 芯来智融半导体科技(上海)有限公司 Instruction processing apparatus and processing system


Also Published As

Publication number Publication date
CN119201232A (en) 2024-12-27

Similar Documents

Publication Publication Date Title
US12293193B2 (en) Advanced processor architecture
TWI654562B (en) Backtracking compatibility by algorithm matching, deactivating features, or limiting performance
KR100498482B1 (en) Simultaneous Multithreading processor providing for thread fetch based on instructions count weighted by the operation cycle count and method thereof
JP3851707B2 (en) Central processing unit of super scalar processor
US8762444B2 (en) Fast condition code generation for arithmetic logic unit
CN117193861B (en) Instruction processing method, apparatus, computer device and storage medium
CN119201232B (en) Instruction processing device, system and method
CN117931294B (en) Instruction processing apparatus and processing system
CN102799418B (en) Processor architecture and instruction execution method integrating sequence and VLIW (Very Long Instruction Word)
US20150039862A1 (en) Techniques for increasing instruction issue rate and reducing latency in an out-of-order processor
CN110825437B (en) Method and apparatus for processing data
CN117931293B (en) Instruction processing method, device, equipment and storage medium
JP2014182817A (en) Converting conditional short forward branches to computationally equivalent predicated instructions
CN112182999B (en) Three-stage pipeline CPU design method based on MIPS32 instruction system
KR20080087171A (en) Early conditional selection of an operand
US10133578B2 (en) System and method for an asynchronous processor with heterogeneous processors
US8966230B2 (en) Dynamic selection of execution stage
JP2004508607A (en) Apparatus and method for reducing register write traffic in a processor having an exception routine
EP3017363B1 (en) System and method for an asynchronous processor with pipelined arithmetic and logic unit
CN117008977B (en) Instruction execution method, system and computer equipment with variable execution period
CN117806712A (en) Instruction processing method, apparatus, computer device and storage medium
US20250306944A1 (en) Vector operation sequencing for exception handling
CN103235716B (en) A kind of for detecting the relevant device of pipeline data
US20070005941A1 (en) High performance architecture for a writeback stage
CA3225836A1 (en) Apparatus and method for energy-efficient and accelerated processing of an arithmetic operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant