
CN119536818A - Instruction scheduling decision method, device, equipment and medium - Google Patents


Info

Publication number
CN119536818A
CN119536818A
Authority
CN
China
Prior art keywords
instruction
dependency
decision
constraint
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510082413.7A
Other languages
Chinese (zh)
Other versions
CN119536818B (en)
Inventor
杨龚轶凡
朱国梁
闯小明
戴长乐
赵宇轩
杨添淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhonghao Xinying Hangzhou Technology Co ltd
Original Assignee
Zhonghao Xinying Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhonghao Xinying Hangzhou Technology Co ltd filed Critical Zhonghao Xinying Hangzhou Technology Co ltd
Priority to CN202510082413.7A priority Critical patent/CN119536818B/en
Publication of CN119536818A publication Critical patent/CN119536818A/en
Application granted granted Critical
Publication of CN119536818B publication Critical patent/CN119536818B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract


The present application relates to the field of computer technology and provides an instruction scheduling decision method. The method establishes a dependency graph based on a sequence of instructions to be executed; establishes constraint conditions, including dependency constraints, according to the dependencies and delays between instruction nodes; constructs multiple candidate decisions, each of which satisfies the constraint conditions and includes the execution state of all instructions over the entire execution cycle of the sequence; obtains the total consumption value of scheduling and executing the instructions according to each candidate decision, where each total consumption value includes a time consumption value and a resource consumption value; and takes the candidate decision with the smallest total consumption value as the instruction scheduling strategy. The method ensures the global optimality of scheduling, effectively avoids the negative impact a locally optimal solution may have on overall performance, and can thus significantly improve the performance and efficiency of a VLIW processor.

Description

Instruction scheduling decision method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for instruction scheduling decision.
Background
With the continued development of computer architecture, Very Long Instruction Word (VLIW) processors have found wide application in high-performance computing and embedded systems due to their efficient parallel computing capabilities. A VLIW processor greatly improves computational efficiency by packing multiple operations into one instruction word and executing them in parallel. However, instruction scheduling is one of the core challenges of the VLIW architecture. Unlike superscalar processors, VLIW processors shift instruction scheduling responsibility from hardware to the compiler, so the compiler must efficiently resolve dependencies between instructions and allocate hardware resources rationally to avoid conflicts. Conventional VLIW instruction scheduling strategies generally rely on heuristic algorithms, which can quickly generate a feasible scheduling scheme but often have difficulty guaranteeing global optimality; the problems of insufficient scheduling efficiency and parallelism are particularly prominent when facing complex dependencies and resource constraints.
In instruction scheduling of VLIW processors, a number of inter-related optimization problems need to be solved. First are dependencies between instructions, including data dependencies, control dependencies, and resource dependencies. Data dependencies require that certain instructions must be executed after execution of a predecessor instruction is complete, while control dependencies involve the order of execution of instructions in a branch structure. Furthermore, the number of hardware resources under VLIW architecture is limited, such as integer arithmetic units, floating point arithmetic units, and memory units, so that the compiler needs to balance the use of these resources during scheduling, avoiding resource conflicts. For example, if multiple instructions contend for the same type of functional unit at the same time, the compiler must make an appropriate schedule to maximize parallelism without incurring resource conflicts.
Therefore, to address problems such as insufficient exploitation of parallelism and unreasonable resource allocation in current VLIW instruction scheduling, an instruction scheduling scheme based on an exact optimization algorithm is needed to further improve the computing efficiency and performance of VLIW processors.
Disclosure of Invention
The embodiment of the application mainly aims to provide an instruction scheduling decision method, device, equipment and medium, and aims to solve at least one of the problems of insufficient parallelism utilization, unreasonable resource allocation and the like in the instruction scheduling of the current VLIW processor in the related technology.
In a first aspect, an embodiment of the present application provides an instruction scheduling decision method, where the method includes:
establishing a dependency graph based on an instruction sequence to be executed, wherein the dependency graph comprises a plurality of instruction nodes, edges between adjacent instruction nodes represent the dependency relationships between the corresponding instructions, and the weight values of the edges represent the time delays between the instruction nodes they connect;
establishing constraint conditions comprising dependency constraints according to the dependencies and delays among the instruction nodes, wherein the number of dependency constraints included in the constraint conditions is smaller than the number of dependency paths in the dependency graph;
constructing a plurality of candidate decisions, wherein each candidate decision conforms to the constraint conditions and comprises the execution state of all instructions over the whole execution period of the instruction sequence to be executed;
acquiring total consumption values of scheduling and executing the instructions according to the respective candidate decisions, wherein each total consumption value comprises a time consumption value and a resource consumption value when the instructions are scheduled and executed according to the corresponding candidate decision; and
taking the candidate decision with the smallest total consumption value as the instruction scheduling policy.
Optionally, the establishing a constraint condition including a dependency constraint according to the dependency and the delay between the instruction nodes includes:
traversing a target instruction node pair with a dependency relationship, and establishing a dependency relationship constraint based on the dependency path length between two instruction nodes in the target instruction node pair, wherein the dependency path length is obtained based on the delay of the two connected instruction nodes;
and obtaining the constraint conditions according to the dependency constraints of each target node pair.
Optionally, the establishing a dependency constraint based on a dependency path length between two instruction nodes in the target instruction node pair includes:
if a unique dependency path exists between two instruction nodes, establishing the dependency constraint according to the length of that dependency path;
if there are multiple dependency paths between two instruction nodes, obtaining the length of each dependency path; and
constructing the dependency constraint based on the longest of the multiple dependency paths.
Optionally, the constraint conditions further include at least one of:
a resource constraint, configured to constrain, at each time, the processing unit resources allocated to the instructions to not exceed the total amount of resources the system can provide;
an instruction scheduling constraint, configured to constrain each instruction to be scheduled exactly once;
a time constraint, configured to control the earliest and latest execution times of each instruction; and
an instruction issue constraint, configured to constrain the instructions that can be scheduled at the same time.
Optionally, the constructing a plurality of candidate decisions includes:
introducing a decision variable for the execution state of each instruction at each moment, wherein the decision variable is a binary variable, different values of which represent different execution states, the execution states comprising executed or not executed; and
traversing the feasible value combinations of the decision variables of each instruction at each moment to obtain the candidate decisions, wherein each value combination corresponds to one candidate decision and every candidate decision conforms to the constraint conditions.
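A hedged brute-force sketch of this enumeration on a toy instance follows. A real implementation would use an ILP or CP solver rather than exhaustive search; the horizon `T` and the dependency bounds below are illustrative assumptions drawn from the Fig. 2 numbers, and enumerating start times is equivalent to enumerating the feasible 0/1 assignments in which each instruction's binary row contains exactly one 1.

```python
from itertools import product

T = 12                                    # scheduling horizon in cycles (assumed)
instrs = ["load", "add", "store"]
dep_bounds = {("load", "add"): 7,         # time(add)   - time(load) >= 7
              ("add", "store"): 4,        # time(store) - time(add)  >= 4
              ("load", "store"): 11}      # time(store) - time(load) >= 11

def feasible(start):
    """Check every dependency constraint for one start-time assignment."""
    return all(start[d] - start[s] >= b for (s, d), b in dep_bounds.items())

# each surviving assignment corresponds to one candidate decision
candidates = []
for times in product(range(T), repeat=len(instrs)):
    start = dict(zip(instrs, times))
    if feasible(start):
        candidates.append(start)
```

With this tight horizon only one assignment survives (load at 0, add at 7, store at 11); a larger `T` yields more candidates, whose total consumption values can then be compared.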
Optionally, in at least one of the candidate decisions, instructions are scheduled to execute in parallel, and the specific content included in the constraint conditions is adjusted based on hardware resources and performance targets.
Optionally, the instruction scheduling policy is obtained from a plurality of candidate decisions by minimizing the following objective function:
O = l + β × C
wherein O is the total consumption value of a candidate decision; l is the total scheduling length of the candidate decision, representing the total time required for all instructions to complete execution when they are scheduled and executed based on that candidate decision; C is the maximum number of virtual registers that are live simultaneously, the virtual registers being used to guarantee the execution of each instruction; and β is a weight factor used to adjust the relative importance of l and C in the decision.
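A minimal sketch of evaluating this objective for two hypothetical candidates follows. The register live intervals are invented for illustration and are not taken from the patent; C is computed as the peak number of simultaneously live intervals.

```python
beta = 0.5   # weight factor balancing schedule length l against register pressure C

def objective(sched_len, live_intervals):
    """sched_len: total scheduling length l of the candidate.
    live_intervals: (start, end) cycle ranges during which each virtual
    register is live; C is the peak number of overlapping intervals."""
    peak = max(sum(1 for s, e in live_intervals if s <= t <= e)
               for t in range(sched_len + 1))
    return sched_len + beta * peak        # O = l + beta * C

candidates = {
    "A": (11, [(0, 7), (7, 11)]),             # longer schedule, low pressure
    "B": (10, [(0, 10), (2, 10), (4, 10)]),   # shorter schedule, high pressure
}
best = min(candidates, key=lambda k: objective(*candidates[k]))
```

With β = 0.5, candidate B wins (O = 11.5 versus 12.0 for A); a larger β penalizes register pressure more heavily and can flip the choice.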
In a second aspect, an embodiment of the present application provides an instruction scheduling decision device, which has a function of implementing an instruction scheduling decision method corresponding to the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.
In one embodiment, the instruction scheduling decision apparatus includes:
The first construction module is configured to establish a dependency graph, the dependency graph comprises a plurality of instruction nodes, edges between adjacent instruction nodes represent dependency relations among corresponding instructions, and weight values of the edges represent time delays among the instruction nodes connected with the edges;
The second construction module is configured to establish constraint conditions comprising dependency constraint according to the dependency relationship and the time delay among the instruction nodes, wherein the number of the dependency relationship constraint included in the constraint conditions is less than the number of the dependency paths in the dependency relationship graph;
the third construction module is configured to construct a plurality of candidate decisions, any candidate decision accords with the constraint condition, and each candidate decision comprises the execution state of all instructions in the whole execution cycle of the instruction sequence to be executed;
an acquisition module, configured to acquire the total consumption values of scheduling and executing instructions according to the respective candidate decisions, each total consumption value including a time consumption value and a resource consumption value when instructions are scheduled and executed according to the corresponding candidate decision; and
a decision module, configured to take the candidate decision with the smallest total consumption value as the instruction scheduling strategy.
In a third aspect, embodiments of the present application provide a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the instruction scheduling decision method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computing device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the instruction scheduling decision method of the first aspect when executing the computer program.
Compared with the prior art, the embodiment of the application provides a novel instruction scheduling decision method. Based on an instruction sequence to be executed, the method first constructs a dependency graph composed of a plurality of instruction nodes. Adjacent instruction nodes are connected by edges that represent the dependencies between the corresponding instructions, and the weight value of each edge reflects the time delay between the connected instruction nodes. Then, according to the dependencies among the instruction nodes and the corresponding delays, a series of constraint conditions containing dependency constraints is formulated. The number of dependency constraints contained in the constraint conditions is less than the total number of dependency paths in the dependency graph, thereby effectively simplifying the management of complex dependencies.
To obtain different scheduling possibilities, a number of candidate decision schemes are thus constructed. Each candidate decision strictly follows the dependency constraints described above and covers the execution state of all instructions of the sequence of instructions to be executed throughout the execution cycle.
Then, the total consumption value of each candidate decision in executing the instruction is evaluated. This total consumption value is an overall indicator that includes not only the time consumption value, i.e., the total time required for the instruction to schedule execution, but also the resource consumption value, i.e., the various resource costs required during execution.
Finally, by comparing the total consumption values of the candidate decisions, the candidate decision in which the total consumption value is the smallest is selected as the instruction scheduling policy. The strategy effectively ensures the instruction execution sequence and efficiency, and simultaneously reduces the time and resource cost in the execution process to the greatest extent.
In summary, the method can ensure global optimality of scheduling, effectively avoid negative influence of local optimal solution on overall performance, remarkably improve performance and efficiency of the VLIW processor, and meet requirements of modern computing tasks on efficient parallel computing.
Drawings
The objects, features and advantages of embodiments of the present application will become readily apparent from the detailed description of the embodiments of the present application read with reference to the accompanying drawings. Wherein:
FIG. 1 is a flow chart of an instruction scheduling decision method according to an embodiment of the present application;
FIG. 2 is a partial schematic diagram of a dependency graph according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an instruction scheduling decision apparatus according to an embodiment of the present application, and
FIG. 4 is a schematic diagram of a computing device according to an embodiment of the application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that some, but not all embodiments of the application are described. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The embodiment of the application provides a method, a device, equipment and a medium for instruction scheduling decision. The instruction scheduling decision method can be applied to terminal equipment, and the terminal equipment can be electronic equipment such as tablet computers, notebook computers, desktop computers, personal digital assistants, wearable equipment and the like. The terminal device may be a server or a server cluster.
Some embodiments of the application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a flow chart of an instruction scheduling decision method according to an embodiment of the application.
As shown in fig. 1, the instruction scheduling decision method includes steps S101 to S105.
And step S101, establishing a dependency graph based on the instruction sequence to be executed.
In the embodiment of the application, the instruction sequence to be executed can be a plurality of instructions which are constructed based on VLIW and wait to be scheduled for execution, and various dependency relationships can exist among the instructions. The dependency graph is built based on the dependencies among the instructions in the instruction sequence to be executed and can be a directed acyclic graph (Directed Acyclic Graph, DAG). As shown in fig. 2, the dependency graph comprises a plurality of instruction nodes, edges between adjacent instruction nodes represent the dependencies between the corresponding instructions, and the weight values of the edges represent the time delays between the connected instruction nodes. For example, in fig. 2, load, add, and store each represent an instruction node, each instruction node corresponds to an instruction, the edge between load and add represents the dependency between the two, the arrow represents the direction of the dependency, and the weight 7 represents a delay of 7 between the instruction load and the instruction add.
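The fig. 2 fragment described above can be sketched as a plain adjacency list. The dictionary layout below is an illustrative assumption; the method only requires nodes, directed edges, and delay weights.

```python
# adjacency list: instruction node -> list of (dependent node, delay) pairs,
# reproducing the Fig. 2 fragment
dep_graph = {
    "load":  [("add", 7), ("store", 7)],   # load -> add (7), load -> store (7)
    "add":   [("store", 4)],               # add -> store (4)
    "store": [],                           # no outgoing dependencies
}

def direct_successors(node):
    """Instructions that directly depend on `node`, with their edge delays."""
    return dep_graph.get(node, [])
```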
In a VLIW architecture, dependencies cover both data dependencies and control dependencies. Data dependency refers to the fact that the output of one instruction is the input of another instruction, and control dependency refers to control structures such as conditional branches or loops, where execution of one instruction depends on the outcome of the other instruction.
In the embodiment of the application, the delay between instruction nodes represents the execution delay or the preparation time between corresponding instructions, which is important for optimizing the execution sequence of the instructions and reducing the waiting time. Through global dependency analysis, a compiler or processor can identify which instructions can be executed in parallel (i.e., there are no direct dependencies between them) and which instructions must be executed in a particular order (due to the presence of dependencies).
In the embodiment of the application, the instruction sequence to be executed is constructed into the directed acyclic dependency graph, so that the relationship among the instructions can be analyzed based on the DAG graph, the efficient instruction packets can be generated, the instruction packets can be executed in parallel in a single long instruction word, and all the dependency relationships are ensured to be satisfied, thereby avoiding execution errors and performance degradation, enabling the processor to more effectively utilize the parallel processing capability, and improving the overall calculation efficiency.
Step S102, establishing constraint conditions comprising dependency constraint according to the dependency and delay among the instruction nodes.
In the embodiment of the application, each dependency constraint can be established directly based on the dependency and delay between each instruction node in the dependency graph, and each dependency constraint is used as a constraint condition. In particular, edges and weights between instruction nodes may be translated into individual dependency constraints, i.e., each feasible dependency path is translated into a dependency constraint. For example, referring to FIG. 2, based on the dependencies shown in FIG. 2, the 4 dependency constraints time(store) - time(load) >= 7, time(store) - time(add) >= 4, time(add) - time(load) >= 7, and time(store) - time(load) >= 11 may be derived, and the constraint conditions may be derived based thereon.
In a parallel processing environment of a VLIW architecture, processing dependencies between instructions is a complex and critical task. In the embodiment of the application, in order to optimize the execution sequence and efficiency of the instructions, constraint conditions can be constructed based on the dependency relationship and delay among instruction nodes. These constraints are intended to ensure that all necessary dependencies are satisfied while minimizing strict constraints on the execution order in order to better utilize the parallel processing capabilities of the processor.
First, the direct dependencies between instructions are determined from the dependency graph. In a DAG, each edge represents a dependency, and its weight represents the delay between instructions. However, when building the constraints, it is not necessary to translate every edge in the DAG into a dependency constraint. This is because some dependencies may be indirect or redundant, i.e., they can be deduced from other direct dependencies or equivalently replaced by them.
Thus, a more efficient approach may be employed to identify key direct dependencies or dependencies sufficient to replace other dependencies through global dependency analysis and translate these into dependency constraints. These dependency constraints ensure that all necessary dependencies are satisfied while allowing more instructions to execute in parallel without direct dependencies.
In a preferred embodiment of the application, the number of dependency constraints included in the constraint conditions is less than the number of dependency paths in the dependency graph. Referring to fig. 2, which is a schematic diagram illustrating part of an instruction dependency graph according to an embodiment of the present application, not all dependencies in a dependency graph are useful for the optimal instruction scheduling decision, and a large number of redundant dependencies may increase the time needed to obtain the optimal decision. For example, as can be seen from fig. 2, the path length from the instruction load through add to store is 11, which is greater than the length of the path from load directly to store, so a time(store) - time(load) >= 7 constraint need not be created when the dependency constraints are established. That is, when the dependency constraints are established based on fig. 2, the number of resulting dependency constraints (e.g., 3) is less than the number of all dependency paths in fig. 2, i.e., 4.
In the embodiment of the application, the dependency paths in the dependency graph represent paths between two instructions with a dependency relationship, and the dependency relationship is not only a direct dependency relationship but also an indirect dependency relationship, so in the dependency graph shown in fig. 2, there are 4 dependency paths in total, that is, in addition to 3 edges representing the direct dependency relationship, there are dependency paths from an instruction load to a store through an add.
It will be appreciated that while redundant constraints increase the complexity of obtaining optimal scheduling decisions, appropriate redundant constraints can also accelerate the process, since each additional constraint further restricts the solution space that would otherwise have to be searched. Thus, in the example shown in fig. 2 above, a time(store) - time(load) >= 11 dependency constraint may be created on the basis of the time(store) - time(add) >= 4 and time(add) - time(load) >= 7 dependency constraints, which speeds up constraint propagation over the solution space when solving for the optimal scheduling decision.
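Under the assumption that the pruning rule keeps, for each dependent pair of nodes, only the bound given by the longest dependency path between them, the reduction from 4 dependency paths to 3 constraints can be sketched as:

```python
def longest_path_lengths(graph):
    """For every reachable (src, dst) pair in the DAG, return the length of
    the longest dependency path, i.e. the one bound that subsumes all weaker
    bounds between the same pair of instruction nodes."""
    memo = {}

    def dfs(node):
        if node in memo:
            return memo[node]
        dist = {}                               # dst -> longest distance from node
        for succ, delay in graph.get(node, []):
            dist[succ] = max(dist.get(succ, 0), delay)
            for dst, d in dfs(succ).items():
                dist[dst] = max(dist.get(dst, 0), delay + d)
        memo[node] = dist
        return dist

    return {(src, dst): d for src in graph for dst, d in dfs(src).items()}

# Fig. 2: load -> add (7), add -> store (4), load -> store (7)
graph = {"load": [("add", 7), ("store", 7)], "add": [("store", 4)], "store": []}
bounds = longest_path_lengths(graph)
constraints = [f"time({d}) - time({s}) >= {b}" for (s, d), b in bounds.items()]
```

Only three constraints remain, because the direct load-to-store bound of 7 is subsumed by the bound of 11 through add, matching the example in the text.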
In an alternative embodiment of the present application, in order to obtain dependency constraints less than the number of dependency paths based on the dependency graph, step S102 may include the following steps S1021 to S1022:
step S1021, traversing a target instruction node pair with a dependency relationship, and establishing a dependency relationship constraint based on the dependency path length between two instruction nodes in the target instruction node pair.
In the embodiment of the application, in order to establish the dependency constraints, all instructions with dependency relationships need to be acquired. Therefore, the dependency graph is first traversed to acquire all pairs of instruction nodes with dependency relationships, namely the target instruction node pairs. After each target instruction node pair in the dependency graph is obtained, each dependency constraint can be established based on the dependency path length between the nodes of the pair. It will be appreciated that the dependency path length between a target instruction node pair may be derived based on the delays of the two connected instruction nodes (i.e., the two instruction nodes comprised by the target instruction node pair). For example, referring to fig. 2, if one target instruction node pair is load and add, the dependency path length is 7, and the dependency constraint established accordingly is time(add) - time(load) >= 7.
It will be appreciated that the dependency paths between instruction nodes are conveniently constructed based on the dependencies and delays between instructions; therefore, in some embodiments, the target node pairs and dependency path lengths may also be obtained based on the dependencies and delays between instructions in order to establish the dependency constraints. In particular, some basic data structures may be defined to store the individual instruction nodes, their dependencies, and latency information. Each instruction node may contain a unique identifier, the time required for the instruction to execute, the other instructions that depend on it, and delay information. All instruction nodes and their dependencies are represented using an adjacency table, where each instruction node is indexed by its ID and stores its dependent instruction nodes and latency information.
The dependency path length between two instruction nodes is then calculated, which typically involves finding all paths from the source node to the destination node and computing the total delay along each. This can be done with a depth-first or breadth-first search algorithm. Considering that there may be multiple paths between two instruction nodes, generally only the shortest or longest path is of interest, i.e., the dependency constraint is built only from the longest path or the shortest path.
Finally, according to the calculated path length, a corresponding dependency constraint can be established. For example, if the shortest path length between two nodes exceeds a certain threshold, a constraint may be set to ensure that the two nodes cannot execute at the same time or to adjust their execution order.
In the embodiment of the application, based on the dependency path length between two instruction nodes in a target instruction node pair, a dependency relationship constraint is established, and the method comprises the following steps of:
And step I, if a unique dependency path exists between two instruction nodes, establishing dependency relation constraint according to the length of the dependency path.
In the embodiment of the application, when processing the dependencies between instruction nodes, if a dependency path exists between two instruction nodes, a dependency constraint needs to be established according to the length of the dependency path. If there is only one unique dependency path between two instruction nodes, the dependency constraint is established directly according to the length of that path. This means that the instruction corresponding to the destination instruction node cannot execute until the instruction corresponding to the source instruction node completes, and the waiting time is at least equal to the length of this path (i.e., the sum of all delays on the path). For example, referring to fig. 2, there is only one dependency path between the instruction node load and the instruction node add, and its length is 7, so a dependency constraint time(add) - time(load) >= 7 between the two instructions may be established.
Step II: if a plurality of dependency paths exist between the two instruction nodes, acquire the lengths of the dependency paths.
If there are multiple dependency paths between two instruction nodes, the lengths of all of them need to be considered and the dependency constraint established based on the longest path, because the longest path represents the longest time the destination instruction node must wait before it can execute.
Specifically, all dependency paths between the two instruction nodes are first determined. Then, for each path, the sum of all delays on the path is calculated. The longest of all the paths is found and the dependency constraint established from it, ensuring that the destination instruction node executes only after the source instruction node completes, with a wait at least equal to the length of the longest path. For example, referring to fig. 2, there are two dependency paths between the instruction node load and the instruction node store: one runs from load directly to store, and the other runs from load through the instruction node add to store. The direct path has length 7 and the path through add has length 11, so a dependency constraint time(store) − time(load) ≥ 11 between the two instructions can be established.
In order to implement these steps, a graph algorithm (such as breadth first search BFS or depth first search DFS) may be used to traverse the dependency relationship between instruction nodes, and calculate the path length, so as to obtain the longest path length, which is not described in detail herein.
Step III: construct the dependency constraint based on the longest of the plurality of dependency paths.
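As an illustrative sketch (not part of the claims), the per-pair procedure of steps I-III can be expressed as a depth-first enumeration of dependency paths. The node names and edge delays below are inferred from the fig. 2 example: the path lengths 7 and 11 given above imply edges load→add = 7, add→store = 4, and a direct load→store edge of 7.

```python
def longest_path_length(edges, src, dst):
    """Length of the longest dependency path from src to dst in a DAG.

    edges: dict mapping a node to a list of (successor, delay) pairs.
    Returns None if no path exists. Enumerates every path with an
    explicit-stack DFS, which is fine for small graphs.
    """
    best = None
    stack = [(src, 0)]
    while stack:
        node, dist = stack.pop()
        if node == dst:
            best = dist if best is None else max(best, dist)
            continue
        for succ, delay in edges.get(node, []):
            stack.append((succ, dist + delay))
    return best

# Assumed dependency graph for the fig. 2 example:
# load -> add (7), add -> store (4), and a direct load -> store (7).
edges = {
    "load": [("add", 7), ("store", 7)],
    "add": [("store", 4)],
}
# Unique path load -> add gives: time(add) - time(load) >= 7.
# Two paths load -> store; the longer one (via add) is 7 + 4 = 11,
# giving: time(store) - time(load) >= 11.
```

Enumerating all paths is exponential in the worst case; for large DAGs a single longest-path pass over a topological order would be preferable, but the per-pair result is the same.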
In the embodiment of the application, a manner of constructing a dependency constraint based on a dependency graph is exemplarily given by steps I-III, but not limited thereto. Those skilled in the art can also select other ways to construct each dependency constraint based on the dependency between each instruction in the sequence to be executed according to the actual situation.
Step S1022, obtaining the constraint conditions according to the dependency constraint of each target node pair.
In the embodiment of the application, after the dependency constraints for each target node pair are obtained, they can be integrated to form a complete dependency constraint set, i.e. the constraint conditions. These constraints are typically used to ensure that the dependencies between instruction nodes are properly maintained during execution: the instruction corresponding to a depended-upon instruction node is executed first, and only then the instructions corresponding to the instruction nodes that depend on it.
The following is the step of obtaining constraint conditions according to dependency constraints:
For each target node pair (A, B), it is checked whether a dependency path from A to B exists. If one does, it is determined whether it is the unique dependency path or one of several. For a unique dependency path, the total time required for all nodes on the path to execute is calculated. If there are multiple dependency paths, the length of the longest one is selected as the dependency length.
For each node pair (A, B) with a dependency, a constraint is formed indicating that B may begin execution only after A has completed, and that the wait is at least equal to the dependency length.
All the formed dependency constraints are integrated into one constraint set. Each constraint in the set must be explicit and executable, i.e. there must be no cyclic dependencies or unsatisfiable dependencies. In practical applications, it may be necessary to verify the validity of the constraint set to ensure that it can properly guide the execution order of the nodes.
This can be verified by simulating the execution process or by using a dedicated scheduling algorithm. Once the constraint set is verified as valid, it can be applied to actual execution. During execution, the scheduler (in very-long-instruction-word processing, the compiler) or execution engine follows these constraints to ensure that the dependencies between instructions are properly maintained.
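The acyclicity check mentioned above can be sketched with Kahn's topological-sort algorithm; this is an illustrative Python sketch, with each constraint pair (a, b) read as "b must start after a":

```python
from collections import defaultdict, deque

def has_cycle(constraints, nodes):
    """Detect cyclic dependencies in a constraint set via Kahn's algorithm.

    constraints: iterable of (a, b) pairs meaning "b must start after a".
    Returns True if the constraints contain a cycle, i.e. are unsatisfiable.
    """
    indegree = {n: 0 for n in nodes}
    succs = defaultdict(list)
    for a, b in constraints:
        succs[a].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in succs[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    # If topological order covers every node, no cycle exists.
    return seen != len(nodes)
```

If `has_cycle` returns True, the constraint set is unsatisfiable and must be rejected before scheduling proceeds.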
It will be appreciated that reliable execution of instructions requires not only the resolution of dependencies between instructions but also suitable computing resources, execution time, and the like. Thus, in an embodiment of the present application, the constraints further include at least one of:
A resource constraint configured to constrain, at each time, the processing unit resources allocated to each instruction to not exceed the total amount of resources that the system can provide;
instruction scheduling constraints configured to constrain each instruction to be scheduled, and scheduled exactly once;
a time constraint configured to control earliest and latest execution times of each instruction;
instruction issue constraints configured to constrain which instructions can be scheduled at the same time.
Together, these constraints ensure efficient utilization of system resources, the proper execution order of instructions, and the satisfaction of timing requirements. The following is a detailed explanation of each:
The resource constraint ensures that at any given time, the processing unit (e.g. CPU core, memory, IO device, etc.) resources allocated to the instructions do not exceed the total amount the system can provide. This may be implemented by a resource manager or scheduler that tracks available and allocated resources and performs resource reclamation or reallocation when necessary. At each moment, the processing-unit resources allocated to the instructions cannot exceed the total available. Register resources, in particular, are not strictly limited, i.e. not forbidden from exceeding the physical register upper bound; instead, register pressure is included in the optimization target of the objective function (used to compute and solve the optimal instruction scheduling policy), and the register usage of the final scheduling result is kept within the upper bound by inserting, during subsequent scheduling, instructions that spill excess register data to memory.
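A minimal sketch of such a per-time-step resource check (illustrative only; the per-instruction demand values and the two-unit total are hypothetical):

```python
def respects_resource_limit(schedule, usage, total):
    """Check that, at every time step, the summed processing-unit demand of
    the instructions executing then does not exceed the system total.

    schedule: dict instruction -> set of time steps at which it executes.
    usage: dict instruction -> processing-unit demand per time step.
    """
    demand = {}
    for instr, times in schedule.items():
        for t in times:
            demand[t] = demand.get(t, 0) + usage[instr]
    return all(d <= total for d in demand.values())

# Hypothetical demand: each instruction needs one processing unit per
# time step it executes; the system provides two units in total.
schedule = {"load": {1, 2}, "add": {2}}
usage = {"load": 1, "add": 1}
```

Raising load's demand to two units would make time step 2 require three units and violate the constraint.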
The instruction scheduling constraint ensures that each instruction is scheduled exactly once, avoiding the resource waste or data inconsistency that repeated execution would cause. Specifically, the scheduler may maintain an instruction state table that records the current state of each instruction (e.g. unscheduled, scheduled, executing, completed, etc.). When dispatching a new instruction, the scheduler checks whether the instruction has already been scheduled and decides accordingly whether to add it to the execution queue.
The time constraint is to control the earliest and latest execution times of each instruction to meet specific timing requirements. In particular, the time constraint is typically implemented by a time stamp or a time window. Each instruction is given an earliest start time and a latest completion time. The scheduler may take these time constraints into account when scheduling instruction execution to ensure that the instructions execute within a specified time frame.
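A minimal sketch of a time-window check (illustrative; the window values below are hypothetical, chosen to match the load/add/store example):

```python
def respects_time_constraints(starts, durations, windows):
    """Check that every instruction starts no earlier than its earliest
    start time and finishes no later than its latest completion time.

    starts: dict instruction -> scheduled start time.
    durations: dict instruction -> execution delay in time steps.
    windows: dict instruction -> (earliest_start, latest_finish).
    """
    return all(
        windows[i][0] <= starts[i] and starts[i] + durations[i] <= windows[i][1]
        for i in starts
    )

# Hypothetical windows for the load/add/store example schedule.
starts = {"load": 1, "add": 8, "store": 12}
durations = {"load": 7, "add": 4, "store": 1}
windows = {"load": (1, 8), "add": (8, 12), "store": (12, 13)}
```

Delaying load to start at time 2 would push its finish to 9, past its latest-finish bound of 8, and the check would fail.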
The instruction issue constraint governs which instructions may be scheduled at the same time, ensuring compatibility between them (e.g. that they do not occupy the same resources simultaneously or generate data conflicts). In particular, instruction issue constraints are typically determined by the dependencies between instructions and their resource usage. The scheduler checks the dependencies between instructions before issuing them and ensures that instructions issued simultaneously violate no resource usage restriction or data consistency requirement.
These constraints play a critical role in instruction execution. Together they ensure stability, reliability and performance of the system. In practical applications, these constraints may be adjusted and optimized according to specific hardware platforms, software architectures, and application requirements. For example, in high performance computing or real-time systems, time constraints and instruction emission constraints may be more stringent, while in resource constrained embedded systems, resource constraints may be more critical.
By carefully selecting these key dependencies and translating them into constraints, the correctness and efficiency of instruction execution can be ensured. At the same time, since the number of dependency constraints is less than the number of dependency paths in the DAG, more flexibility is provided for parallel execution of instructions.
Ultimately, these constraints will be used to guide the scheduling and execution of instructions to ensure that all dependencies are satisfied while maximizing the efficiency of parallel processing.
S103, constructing a plurality of candidate decisions.
In the embodiment of the application, every candidate decision conforms to the constraint conditions, and each candidate decision comprises the execution state of all instructions over the whole execution cycle of the instruction sequence to be executed. Specifically, the execution state of each instruction at each time may be determined and output as a candidate decision. For example, referring to fig. 2, assuming the sequence of instructions to be executed is load, add, and store, a candidate decision may be to execute the instruction load at times 1-7 (other instructions not executing), the instruction add at times 8-11 (other instructions not executing), and the instruction store at time 12 (other instructions not executing).
For more convenient construction of candidate decisions, in an embodiment of the present application, step S103 may include the following steps S1031 to S1032:
step S1031, introducing a decision variable for the execution state of each instruction at each time.
Wherein the decision variable is a binary variable, and different values represent different execution states, the execution states including executing or not executing. For example, the binary variable may take the value 0 or 1, or other possible values, without limitation, and different values may denote different execution states, e.g. 1 may denote executing and 0 not executing.
In the embodiment of the application, decision variables are introduced into the planning, scheduling, or optimization problem in order to describe the behavior of the system precisely and quantitatively, so that systematic and precise methods (such as linear programming, integer programming, and the like) can be used to solve for the optimal solution. Specifically, a binary (0-1) decision variable may be introduced for each instruction at each time to indicate whether the instruction is executed: if the instruction is executed at a given time, the variable takes the value 1; otherwise it takes the value 0.
Assuming a time range T = {t1, t2, ..., tn} and a set of instructions I = {i1, i2, ..., im}, a binary decision variable x_{i,t} can be defined for the execution state of instruction i at time t: x_{i,t} = 1 if instruction i is executed at time t, and x_{i,t} = 0 if it is not.
These decision variables help express various constraints and goals regarding instruction execution when constructing candidate decisions. For example, if an instruction is required to execute only within a specific time period, a corresponding constraint may be added; if the number of executions of an instruction is to be minimized or maximized, a related term may be added to the objective.
By reasonably setting these decision variables and associated constraints and targets, algorithms such as path planning can be utilized to find a solution that satisfies all constraints while optimizing the targets, i.e., an optimal instruction scheduling strategy.
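As an illustrative sketch, the decision variables x_{i,t} for the load/add/store example and the candidate decision given earlier (load at times 1-7, add at 8-11, store at 12) can be laid out as a 0-1 grid:

```python
# Binary decision variables x[i][t]: 1 if instruction i executes at time t.
instructions = ["load", "add", "store"]
times = range(1, 13)  # t = 1 .. 12

x = {i: {t: 0 for t in times} for i in instructions}

# Candidate decision from the running example: load at times 1-7,
# add at times 8-11, store at time 12.
for t in range(1, 8):
    x["load"][t] = 1
for t in range(8, 12):
    x["add"][t] = 1
x["store"][12] = 1

def start(i):
    """First time step at which instruction i is scheduled."""
    return min(t for t in times if x[i][t] == 1)
```

For this assignment the dependency constraints of the example hold: time(add) − time(load) = 7 ≥ 7 and time(store) − time(load) = 11 ≥ 11.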
S1032, obtaining candidate decisions by traversing feasible value combinations of decision variables of each instruction at each moment.
Each value combination corresponds to one candidate decision, and any candidate decision accords with the constraint condition.
In the embodiment of the application, the feasible value combinations of the decision variables of each instruction at each moment are traversed to obtain the candidate decision, and the method involves exhausting all possible execution states and ensures that each value combination accords with constraint conditions.
Specifically, all instructions and time points are defined first, and a two-dimensional matrix or similar structure is built in which each element represents a binary decision variable x_{i,t}; then all dependency constraints and other constraint conditions are defined explicitly. For example, some instructions must execute before others, or some instructions may not execute concurrently, etc. For each decision variable, both possible values (0 or 1) are tried, generating all possible state combinations. For each generated combination, it is checked whether all constraints are met; only combinations that meet them can be considered candidate decisions. From all candidate decisions meeting the constraint conditions, an optimal solution is then selected according to an optimization target (such as minimizing cost or maximizing resource utilization).
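The exhaustive traversal described above can be sketched on a deliberately tiny example: two instructions, four time steps, and a stand-in dependency constraint requiring add to start at least two steps after load (a single-slot model is assumed, so each instruction occupies exactly one time step; these parameters are illustrative, not the fig. 2 delays):

```python
from itertools import product

instructions = ["load", "add"]
times = [1, 2, 3, 4]
cells = [(i, t) for i in instructions for t in times]

def feasible(x):
    """Constraint check for the tiny example: each instruction is
    scheduled exactly once, and add starts at least 2 steps after load."""
    if any(sum(x[i, t] for t in times) != 1 for i in instructions):
        return False
    t_load = next(t for t in times if x["load", t])
    t_add = next(t for t in times if x["add", t])
    return t_add - t_load >= 2

# Enumerate every 0-1 assignment of the decision variables (2^8 here)
# and keep only those that satisfy all constraints.
candidates = []
for values in product((0, 1), repeat=len(cells)):
    x = dict(zip(cells, values))
    if feasible(x):
        candidates.append(x)
```

The feasible start-time pairs are (1, 3), (1, 4), and (2, 4), so three candidate decisions survive; for realistic instruction counts this brute force explodes, which is why integer-programming solvers are used instead.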
In the embodiment of the present application, a way of constructing a candidate decision is exemplarily provided through steps S1031 to S1032, but is not limited thereto. For example, in the embodiment of the application, the candidate decision of the instruction sequence to be executed can be constructed through a 0-1 integer linear programming model.
It can be understood that, in order to make the scheduled execution of instructions optimal in resources and time and to improve processor resource utilization as much as possible, in the embodiment of the present application at least one candidate decision schedules instructions to execute in parallel. That is, when the instruction sequence to be executed contains instructions that can run in parallel, the candidate decisions must include a scheduling scheme that executes them in parallel, so as to maximize the efficiency of instruction execution. In addition, it will be appreciated that executing instructions consumes computing resources, i.e. instructions are limited by the computing resources they depend on, and the scheduling scheme must pursue the optimization objective within those limits; the specific content of the constraints is therefore adjusted according to the hardware resources and performance objectives. For example, if resources are limited, the resource constraint must be set to the maximum resource consumption the system can bear.
S104, obtaining the total consumption value of scheduling execution of the instruction according to each candidate decision.
Wherein, any total consumption value comprises a time consumption value and a resource consumption value when the instruction is scheduled to be executed according to the corresponding candidate decision.
In the embodiment of the application, in order to obtain the total consumption value of scheduling execution of the instruction according to each candidate decision, the time consumption value and the resource consumption value under each candidate decision need to be calculated respectively.
Specifically, the total consumption value for scheduling execution of instructions according to each candidate decision may be obtained by the following formula:
O = l + β × C
Wherein O is the total consumption value of a candidate decision; l is the total schedule length of the candidate decision, representing the total time required for all instructions to complete execution when scheduled according to that decision; C is the maximum number of simultaneously live virtual registers, the virtual registers being used to guarantee the execution of each instruction; and β is a weight factor used to adjust the relative importance of l and C in the decision.
Based on the above formula, an objective function for solving the optimal instruction scheduling policy can be obtained, and an instruction scheduling policy may be obtained from the plurality of candidate decisions by minimizing this objective function. Specifically, once a mathematical model containing decision variables, an objective function, and constraints has been built, an appropriate solver (e.g. CPLEX, Gurobi, etc.) can be used to find the optimal solution. These tools can efficiently handle complex integer programming problems and provide a feasible scheduling solution.
Specifically, the time consumption value is the total schedule length l of a candidate decision, representing the total time required for all instructions to complete execution when scheduled according to that decision. First, all possible instruction scheduling decisions (i.e. the candidate decisions) may be listed. Each candidate decision is then simulated, recording the total time it takes from the start of execution to the completion of the instructions. This may be done with a dedicated simulator, or by measuring the time consumption under the different candidate decisions on actual hardware, so as to find the optimal solution.
The resource consumption value is the maximum number of simultaneously live virtual registers used to guarantee the execution of each instruction; in addition, the resource consumption value may also cover CPU time, memory use, IO operations, and the like. To calculate the resource consumption value, a system monitoring tool (such as the /usr/bin/time command under Linux) may be used to collect resource usage data while executing the instruction schedule, and the resource consumption value under each candidate decision is then calculated from the collected data.
Finally, the time consumption value and the resource consumption value are weighted and summed to obtain the total consumption value of each candidate decision. This value may be used to compare the merits of different candidate decisions to select the optimal instruction scheduling decision.
In the optimization problem, the weight factor β is used to adjust the relative importance between different objectives or constraints. The value of β depends on the specific business requirements and priorities. For example, if time is the most critical factor (e.g. in a real-time system), a smaller β value may be selected so that the time consumption value l(D) occupies the more important position in the decision. If resource efficiency matters more (e.g. in energy-limited environments), a larger β value may be selected so that the resource consumption C(D) becomes the major consideration.
If a balance is required between time and resources, β can be adjusted according to the actual situation, ensuring that both are properly valued. In practical applications, the choice of the weight factor β may not be fixed, but dynamically adjusted. For example, in some phases, time may be the most important, while in other phases, resource efficiency may be more critical. Therefore, the value of beta can be timely adjusted according to the running state of the system or the change of external conditions so as to adapt to different requirements.
In the embodiment of the present application, it is assumed that there are two candidate decisions D1 and D2 whose time consumption and resource consumption are shown in table 1:
TABLE 1
Candidate decision | Time consumption l(D) | Resource consumption C(D)
D1 | 10 | 5
D2 | 8 | 7
If β = 0.6 is set and the weighted form f(D) = (1−β)⋅l(D) + β⋅C(D) is used, the total consumption values are respectively:
f(D1)=(1−0.6)⋅10+0.6⋅5=4+3=7
f(D2)=(1−0.6)⋅8+0.6⋅7=3.2+4.2=7.4
In this case, D2 is better in time, but because β biases the total toward resource consumption, the total consumption value of D1 is slightly lower, and D1 may therefore be chosen as the better decision.
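The arithmetic of this worked example can be checked in a few lines, using the normalized weighted form f(D) = (1−β)⋅l + β⋅C that the example applies:

```python
def total_consumption(l, C, beta):
    """Normalized weighted total consumption from the worked example:
    f(D) = (1 - beta) * l + beta * C."""
    return (1 - beta) * l + beta * C

f_d1 = total_consumption(l=10, C=5, beta=0.6)  # candidate D1 -> 7.0
f_d2 = total_consumption(l=8, C=7, beta=0.6)   # candidate D2 -> 7.4
best = min([("D1", f_d1), ("D2", f_d2)], key=lambda p: p[1])
```

As in the text, D1 wins on the combined metric even though D2 finishes earlier.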
S105, taking the candidate decision with the minimum total consumption value as the instruction scheduling policy.
In the embodiment of the application, the possible value combinations of the decision variables of the different instructions across all candidate decisions can be traversed directly to obtain the total consumption value of each candidate decision, so that the optimal candidate decision is obtained as the instruction scheduling policy. More efficient algorithms and techniques can also be used to improve the solving efficiency, for example a commercial solver (such as CPLEX or Gurobi) or an open-source solver (such as OR-Tools) to handle complex constraint conditions.
By the method, each candidate decision can be systematically evaluated, and the final instruction scheduling strategy with the minimum total consumption value is selected. This approach not only takes into account time and resource consumption, but also allows for dynamic adjustment of the balance between the two by the weight factor β, thereby better accommodating different traffic demands and priorities.
The embodiment of the application provides a novel instruction scheduling decision method. Based on an instruction sequence to be executed, a dependency graph composed of a plurality of instruction nodes is first constructed. Between these nodes, adjacent instructions represent their dependencies by edges, and the weight value of each edge specifically reflects the time delay between the connected instruction nodes. Then, according to the dependencies among the instruction nodes and the corresponding time delays, a series of constraint conditions containing dependency constraints is formulated. The number of dependency constraints contained in the constraint conditions is less than the total number of dependency paths in the dependency graph, realizing effective simplification and management of complex dependencies.
In order to obtain different scheduling possibilities, embodiments of the present application construct a plurality of candidate decision schemes. Each candidate decision strictly follows the dependency constraints described above and covers the execution state of all instructions of the sequence of instructions to be executed throughout the execution cycle.
Then, the total consumption value of each candidate decision in executing the instruction is evaluated. This total consumption value is an overall indicator that includes not only the time consumption value, i.e., the total time required for the instruction to schedule execution, but also the resource consumption value, i.e., the various resource costs required during execution.
Finally, by comparing the total consumption values of the candidate decisions, the candidate decision in which the total consumption value is the smallest is selected as the instruction scheduling policy. The strategy effectively ensures the instruction execution sequence and efficiency, and simultaneously reduces the time and resource cost in the execution process to the greatest extent.
The present exemplary embodiment proposes an instruction scheduling decision apparatus, and referring to fig. 3, fig. 3 shows a schematic structural diagram of an instruction scheduling decision apparatus, which is applicable to a server in a scenario where an instruction scheduling decision method needs to be determined.
The instruction scheduling decision device in the embodiment of the present application can implement the functions corresponding to the instruction scheduling decision method executed in the embodiment corresponding to fig. 1. The function realized by the instruction scheduling decision device can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.
In the embodiment of the present application, please refer to fig. 3, the instruction scheduling decision apparatus 1 includes a first construction module 11, a second construction module 12, a third construction module 13, an obtaining module 14, and a decision module 15, wherein:
A first building block 11 configured to build a dependency graph;
In order to precisely maintain the correct order of instruction execution, the first building block 11 builds a dependency graph (a directed acyclic graph, DAG) of the instruction sequence, in which each node represents a specific instruction and each edge symbolizes a dependency between instructions. The weight of each edge is given the practical meaning of the execution delay between instructions.
A second building block 12 configured to build constraint conditions including dependency constraints according to dependencies and delays between the respective instruction nodes;
in an embodiment of the present application, to optimize the execution order and efficiency of instructions, constraints may be constructed based on dependencies and delays between instruction nodes.
The second construction module 12 establishes constraint conditions including dependency constraints according to the dependency relationships and delays between the instruction nodes, wherein the number of the dependency relationship constraints included in the constraint conditions is smaller than the number of the dependency paths in the dependency graph.
These constraints are intended to ensure that all necessary dependencies are satisfied while minimizing strict constraints on the execution order in order to better utilize the parallel processing capabilities.
A third construction module 13 configured to construct a plurality of candidate decisions;
In the embodiment of the present application, in order to construct a plurality of candidate decisions, and ensure that each candidate decision accords with a dependency constraint and includes an execution state of all instructions in an entire execution cycle, a third construction module 13 constructs a plurality of candidate decisions, any candidate decision accords with the dependency constraint, and each candidate decision includes an execution state of all instructions in an entire execution cycle of an instruction sequence to be executed. This approach ensures not only that all instructions are considered, but also that the scheduling scheme is valid and viable.
An acquisition module 14 configured to acquire a total consumption value for scheduling execution of instructions in accordance with the respective candidate decisions.
In the embodiment of the application, in order to obtain the total consumption value of scheduling execution of the instruction according to each candidate decision and ensure that each total consumption value comprises a time consumption value and a resource consumption value, the time consumption value and the resource consumption value under each candidate decision need to be calculated respectively. The obtaining module 14 obtains total consumption values of scheduling execution of the instructions according to each candidate decision, wherein any total consumption value comprises a time consumption value and a resource consumption value when scheduling execution of the instructions according to the corresponding candidate decision.
A decision module 15 configured to take the candidate decision with the smallest total consumption value as an instruction scheduling policy.
In order to optimize the system performance and reduce the execution time and resource usage, the embodiment of the present application uses the candidate decision with the smallest total consumption value as the instruction scheduling policy as a reasonable decision method, and the decision module 15 uses the candidate decision with the smallest total consumption value as the instruction scheduling policy. The strategy effectively ensures the instruction execution sequence and efficiency, and simultaneously reduces the time and resource cost in the execution process to the greatest extent.
The embodiment of the application provides an instruction scheduling decision device, which is based on an instruction sequence to be executed, and firstly constructs a dependency graph through a first construction module 11, wherein the graph is composed of a plurality of instruction nodes. Between these nodes, adjacent instructions represent their dependencies by edges, and the weight value of each edge reflects in particular the time delay between the nodes of the connected instructions. Based on the dependency relationships between the instruction nodes and their corresponding time delays, the second building block 12 builds a series of constraints including the dependency relationship constraints. The number of dependency relation constraints contained in the constraint conditions is less than the total number of dependency paths in the dependency relation graph, so that effective simplification and management of complex dependency relations are realized.
To obtain different scheduling possibilities, the third construction module 13 constructs a plurality of candidate decision schemes. Each candidate decision strictly follows the constraints described above and covers the execution state of all instructions of the sequence of instructions to be executed throughout the execution cycle.
Then, the total consumption value of each candidate decision in executing the instruction is evaluated. The obtaining module 14 is configured to obtain a total consumption value, which is a comprehensive index, of scheduling execution of the instruction according to each candidate decision, and includes not only a time consumption value, i.e. a total time required for scheduling execution of the instruction, but also a resource consumption value, i.e. various resource costs required in the execution process.
Finally, by comparing the total consumption values of the individual candidate decisions, the decision module 15 selects the candidate decision, where the total consumption value is the smallest, as our instruction scheduling policy. The strategy effectively ensures the instruction execution sequence and efficiency, and simultaneously reduces the time and resource cost in the execution process to the greatest extent.
In summary, the device can ensure global optimality of scheduling, and effectively avoid negative effects of local optimal solutions on overall performance, so that performance and efficiency of the VLIW processor can be remarkably improved, and requirements of modern computing tasks on efficient parallel computing are met.
In order to achieve the above objective, the present application further provides a computer device 2, please refer to fig. 4, fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present application, in fig. 4, the computer device 2 includes a plurality of computer devices, the components of the instruction scheduling decision device 1 may be dispersed in different computer devices 2, and the computer device 2 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a rack server (including a server cluster formed by a plurality of servers) for executing a program, etc. The computer device 2 of the present embodiment includes at least, but is not limited to, a memory 21, a processor 23, a network interface 22, and an instruction scheduling decision apparatus 1 (refer to fig. 4) that can be communicatively connected to each other through a system bus. It should be noted that fig. 4 only shows a computer device 2 having components, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk provided on the computer device 2, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like. Of course, the memory 21 may also comprise both an internal storage unit and an external storage device of the computer device 2. In this embodiment, the memory 21 is typically used for storing an operating system and various types of application software installed on the computer device 2, such as the program code of the instruction scheduling decision method of the present application. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
The processor 23 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 23 is typically used to control the overall operation of the computer device 2, e.g. to perform control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to execute the program code or process the data stored in the memory 21, for example, to execute the instruction scheduling decision apparatus 1.
The network interface 22 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 to an external terminal through a network, establishing a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an Intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
In this embodiment, the instruction scheduling decision apparatus 1 stored in the memory 21 may be further divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 23 in this embodiment) to complete the present application.
To achieve the above object, the present application also provides a computer-readable storage medium, which includes a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an application store, etc., on which a computer program is stored that performs the corresponding functions when executed by the processor 23. The computer-readable storage medium of the present embodiment is used for storing the instruction scheduling decision apparatus 1, which, when executed by the processor 23, implements the instruction scheduling decision method of the embodiments of the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The foregoing describes the technical solutions provided by the embodiments of the present application in detail. Specific examples are used herein to explain the principles and implementations of the embodiments of the present application, and the description of the above embodiments is only intended to help understand the methods and core ideas of the embodiments. Meanwhile, those skilled in the art may, according to the ideas of the embodiments of the present application, make changes to the specific implementations and application scope; therefore, the disclosure should not be interpreted as limiting the embodiments of the present application.

Claims (11)

1. An instruction scheduling decision method, comprising:
establishing a dependency graph based on an instruction sequence to be executed, wherein the dependency graph comprises a plurality of instruction nodes, edges between adjacent instruction nodes represent the dependency relationship between corresponding instructions, and weight values of the edges represent time delays between the instruction nodes connected with the edges;
Establishing constraint conditions comprising dependency constraint according to the dependency and delay among all instruction nodes, wherein the number of the dependency constraint included in the constraint conditions is smaller than the number of the dependency paths in the dependency graph;
Constructing a plurality of candidate decisions, wherein any candidate decision accords with the constraint condition, and each candidate decision comprises the execution state of all instructions in the whole execution period of the instruction sequence to be executed;
Acquiring total consumption values of scheduling and executing the instructions according to each candidate decision, wherein any total consumption value comprises a time consumption value and a resource consumption value when scheduling and executing the instructions according to the corresponding candidate decision;
and taking the candidate decision with the minimum total consumption value as an instruction scheduling strategy.
2. The method of claim 1, wherein the establishing a constraint comprising a dependency constraint according to the dependencies and delays between the respective instruction nodes comprises:
traversing a target instruction node pair with a dependency relationship, and establishing a dependency relationship constraint based on the dependency path length between two instruction nodes in the target instruction node pair, wherein the dependency path length is obtained based on the delay of the two connected instruction nodes;
And obtaining the constraint conditions according to the dependency constraint of each target node pair.
3. The method of claim 2, wherein the establishing a dependency constraint based on a dependency path length between two instruction nodes in a target instruction node pair comprises:
If a unique dependency path exists between the two instruction nodes, establishing the dependency constraint according to the length of the dependency path;
If a plurality of dependency paths exist between the two instruction nodes, obtaining the length of each dependency path, and
Constructing the dependency constraint based on the longest dependency path among the plurality of dependency paths.
4. The method of claim 1, wherein the constraint conditions further comprise at least one of the following:
A resource constraint configured to constrain, at each time, the processing unit resources allocated to each instruction to not exceed the total amount of resources that the system can provide;
instruction scheduling constraints configured to constrain each instruction to be scheduled once and only once;
a time constraint configured to control earliest and latest execution times of each instruction, and
Instruction issue constraints configured to constrain the instructions that can be scheduled at the same moment.
5. The method of claim 1, wherein the constructing a plurality of candidate decisions comprises:
Introducing a decision variable for the execution state of each instruction at each moment, wherein the decision variable is a binary variable, different values represent different execution states, and the execution states comprise execution or non-execution;
and traversing the feasible value combinations of the decision variables of each instruction at each moment to obtain candidate decisions, wherein each value combination corresponds to one candidate decision, and any candidate decision accords with the constraint condition.
6. The method of claim 1, wherein in at least one of the candidate decisions, instructions are scheduled to execute in parallel; and
The specific content of the constraint conditions is adjusted based on hardware resources and performance goals.
7. The method of claim 1, wherein the instruction scheduling policy is obtained from a plurality of candidate decisions by minimizing the following objective function:
O = l + β × C
Wherein O is the total consumption value of a candidate decision; l is the total scheduling length of the candidate decision, representing the total time required for all instructions to complete execution when they are scheduled based on the candidate decision; C is the maximum number of virtual registers that are simultaneously live, the virtual registers being used to guarantee the execution of each instruction; and β is a weight factor used to adjust the relative importance of l and C in the decision.
8. An instruction scheduling decision apparatus comprising:
The first construction module is configured to establish a dependency graph, the dependency graph comprises a plurality of instruction nodes, edges between adjacent instruction nodes represent dependency relations among corresponding instructions, and weight values of the edges represent time delays among the instruction nodes connected with the edges;
The second construction module is configured to establish constraint conditions comprising dependency constraint according to the dependency relationship and the time delay among the instruction nodes, wherein the number of the dependency relationship constraint included in the constraint conditions is less than the number of the dependency paths in the dependency relationship graph;
the third construction module is configured to construct a plurality of candidate decisions, any candidate decision accords with the constraint condition, and each candidate decision comprises the execution state of all instructions in the whole execution cycle of the instruction sequence to be executed;
an acquisition module configured to acquire total consumption values of scheduled execution of instructions according to respective candidate decisions, each total consumption value including a time consumption value and a resource consumption value when scheduled execution of instructions according to corresponding candidate decisions, and
And the decision module is configured to take the candidate decision with the minimum total consumption value as an instruction scheduling strategy.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-7 when the computer program is executed.
10. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-7.
11. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
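Claims 2 and 3 build one dependency constraint per dependent node pair, based on the longest delay-weighted dependency path between the two nodes. A minimal sketch of that longest-path computation on a dependency DAG might look like the following; the graph, node names, and delay values are invented for illustration and are not taken from the patent.

```python
# Hypothetical sketch of claims 2-3: for a pair of dependent instruction
# nodes, the dependency constraint uses the longest dependency path, where
# each edge weight is the delay between the two instruction nodes it connects.

def longest_path(edges, src, dst):
    """edges: dict node -> list of (successor, delay) pairs on a DAG.
    Returns the longest delay-weighted path length from src to dst,
    or None if no dependency path exists between them."""
    best = None

    def dfs(node, length):
        nonlocal best
        if node == dst:
            best = length if best is None else max(best, length)
            return
        for succ, delay in edges.get(node, []):
            dfs(succ, length + delay)

    dfs(src, 0)
    return best

# Two dependency paths from a to d: a->b->d (2+1=3) and a->c->d (1+4=5);
# the constraint would be built from the longer one, of length 5.
dag = {"a": [("b", 2), ("c", 1)], "b": [("d", 1)], "c": [("d", 4)]}
```

Because only the longest path per node pair enters the constraint set, the number of dependency constraints stays below the number of dependency paths in the graph, as claim 1 requires.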
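Claim 5 enumerates candidate decisions over binary decision variables, keeping only the value combinations that satisfy the constraint conditions. A toy sketch under invented assumptions (two instructions, three time slots, and a single dependency with delay 1; none of these numbers come from the patent) could be:

```python
# Hypothetical sketch of claim 5: a binary decision variable x[i][t] marks
# whether instruction i issues at time t; candidate decisions are the
# variable assignments that satisfy the constraints (here: each instruction
# issues exactly once, and a dependency i0 -> i1 with delay 1 is respected).
from itertools import product

INSTRS, TIMES = ["i0", "i1"], [0, 1, 2]

def feasible(x):
    # instruction scheduling constraint: each instruction issues exactly once
    if any(sum(x[i][t] for t in TIMES) != 1 for i in INSTRS):
        return False
    t0 = next(t for t in TIMES if x["i0"][t])
    t1 = next(t for t in TIMES if x["i1"][t])
    return t1 >= t0 + 1            # dependency constraint with delay 1

candidates = []
for bits in product([0, 1], repeat=len(INSTRS) * len(TIMES)):
    x = {i: {t: bits[k * len(TIMES) + t] for t in TIMES}
         for k, i in enumerate(INSTRS)}
    if feasible(x):
        candidates.append(x)
```

Each surviving assignment is one candidate decision recording the execution state of every instruction at every moment; in this toy setting the feasible issue-time pairs are (0, 1), (0, 2), and (1, 2).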
CN202510082413.7A 2025-01-20 2025-01-20 Instruction scheduling decision method, device, equipment and medium Active CN119536818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510082413.7A CN119536818B (en) 2025-01-20 2025-01-20 Instruction scheduling decision method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510082413.7A CN119536818B (en) 2025-01-20 2025-01-20 Instruction scheduling decision method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN119536818A true CN119536818A (en) 2025-02-28
CN119536818B CN119536818B (en) 2025-08-12

Family

ID=94712208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510082413.7A Active CN119536818B (en) 2025-01-20 2025-01-20 Instruction scheduling decision method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN119536818B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119883653A (en) * 2025-03-26 2025-04-25 山东云海国创云计算装备产业创新中心有限公司 Memory allocation method, system, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001350632A (en) * 2000-06-06 2001-12-21 Toshiba Corp Instruction arrangement method and storage medium storing compiler
US6385757B1 (en) * 1999-08-20 2002-05-07 Hewlett-Packard Company Auto design of VLIW processors
US20030196197A1 (en) * 2002-04-12 2003-10-16 Chen Fu Methods and systems for integrated scheduling and resource management for a compiler
CN111756812A (en) * 2020-05-29 2020-10-09 华南理工大学 An energy-aware edge-cloud collaborative dynamic offload scheduling method
US20210157638A1 (en) * 2019-11-22 2021-05-27 Huawei Technologies Co., Ltd. Method and apparatus for functional unit assignment
CN116321234A (en) * 2023-03-28 2023-06-23 陕西师范大学 Minimal Time Overhead Resource Allocation Method and Related Devices with Task Coordination



Also Published As

Publication number Publication date
CN119536818B (en) 2025-08-12

Similar Documents

Publication Publication Date Title
Ananthanarayanan et al. GRASS: Trimming stragglers in approximation analytics
CN104636204B (en) A kind of method for scheduling task and device
US20210304066A1 (en) Partitioning for an execution pipeline
US10719903B2 (en) On-the fly scheduling of execution of dynamic hardware behaviors
Naghshnejad et al. A hybrid scheduling platform: a runtime prediction reliability aware scheduling platform to improve HPC scheduling performance: M. Naghshnejad, M. Singhal
CN119536818B (en) Instruction scheduling decision method, device, equipment and medium
Klusáček et al. Real-life experience with major reconfiguration of job scheduling system
Wang et al. Design and implementation of an analytical framework for interference aware job scheduling on apache spark platform
Ohmura et al. Toward building a digital twin of job scheduling and power management on an HPC system
N’takpé et al. Don’t hurry be happy: A deadline-based backfilling approach
CN119225933A (en) A process scheduling method, device, equipment, medium and computer program product
Wang et al. A study on heuristic task scheduling optimizing task deadline violations in heterogeneous computational environments
Tang et al. A network load perception based task scheduler for parallel distributed data processing systems
CN108139929B (en) Task scheduling apparatus and method for scheduling multiple tasks
Jeon et al. Loop Pipelining in Hardware-Software Partitioning.
CN120295732A (en) Task scheduling method and device based on resource optimization
Li et al. Dynamic data replacement and adaptive scheduling policies in spark
Kopanski et al. Plan-based job scheduling for supercomputers with shared burst buffers
Karimian-Aliabadi et al. Scalable performance modeling and evaluation of mapreduce applications
Reder et al. Interference-aware memory allocation for real-time multi-core systems
Neelima et al. Communication and computation optimization of concurrent kernels using kernel coalesce on a GPU
Chen et al. Joint optimization of request scheduling and container prewarming in serverless computing
Zhang et al. Optimization of uncertain dependent task mapping on heterogeneous computing platforms
Ayyadi et al. Addressing cost and resource variability for big data task scheduling in heterogeneous cloud environments: A. Ayyadi, A. Jahani
Gonthier et al. Data-Driven Locality-Aware Batch Scheduling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant