CN119201833A

CN119201833A - Computational graph execution method, device, equipment, storage medium and program product

Info

Publication number: CN119201833A
Application number: CN202411127726.1A
Authority: CN
Inventors: 尹首一; 谷江源; 陈博颖; 韩慧明
Original assignee: Shanghai Tsinghua International Innovation Center; Tsinghua University
Current assignee: Shanghai Tsinghua International Innovation Center; Tsinghua University
Priority date: 2024-08-16
Filing date: 2024-08-16
Publication date: 2024-12-27

Abstract

The present application relates to a method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product for executing a computational graph. The method is applied to a computer device that includes a reconfigurable processor and a memory. The reconfigurable processor includes multiple computing units, and the memory includes multiple storage units. The method includes the following steps. First, obtain a preset computational graph that includes multiple nodes and the edge information corresponding to each node. Next, for each node, determine a target computing unit and a target storage unit according to the node's node information and edge information. Finally, generate the operation instruction corresponding to each node according to its target computing unit and target storage unit, and run each operation instruction on its target computing unit. The method can reduce memory access overhead.

Description

Computational graph execution method, apparatus, device, storage medium, and program product

Technical Field

The present application relates to the field of memory technology, and in particular, to a method, an apparatus, a computer device, a computer readable storage medium, and a computer program product for executing a computation graph.

Background

A reconfigurable processor is a flexible, programmable processor architecture whose hardware resources can be configured to suit different application requirements and algorithm characteristics. In a reconfigurable processor, memory and data access directly affect both the execution efficiency of algorithms and overall system performance.

In the conventional technology, memory access efficiency is improved by optimizing the memory structure. However, in large-scale data processing scenarios, a reconfigurable processor still incurs large memory access overhead when it executes a computational graph.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a computational graph execution method, apparatus, computer device, computer-readable storage medium, and computer program product that can reduce memory access overhead.

In a first aspect, the present application provides a computational graph execution method. The method is applied to a computer device comprising a reconfigurable processor, which includes a plurality of computing units, and a memory, which includes a plurality of storage units. The method comprises:

acquiring a preset computational graph, wherein the computational graph comprises a plurality of nodes and edge information corresponding to each node;

for each node, determining a target computing unit and a target storage unit corresponding to the node according to the node information and the edge information corresponding to the node;

and generating operation instructions corresponding to the nodes according to the target computing units and the target storage units, and running the operation instructions on the target computing units.

In one embodiment, the node information includes a computation requirement corresponding to the node, and determining the target computing unit and the target storage unit corresponding to the node according to the node information and the edge information includes:

determining the target computing unit from the computing units according to the computation requirement; and

determining the target storage unit from the storage units according to the edge information.

In one embodiment, determining the target storage unit from the storage units according to the edge information includes:

determining candidate storage units from the storage units according to the edge information;

determining the access type of the node according to the historical access count of the node; and

determining the target storage unit from the candidate storage units according to the access type.

In one embodiment, the method further comprises:

determining, according to the edge information, a plurality of associated nodes having an association relationship from the nodes; and

if the storage locations of the target storage units corresponding to the associated nodes do not meet a preset condition, re-determining, from the storage units and according to the edge information, associated storage units for the associated nodes that do meet the preset condition, wherein the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold.

In one embodiment, running the operation instructions on the target computing units includes:

acquiring the execution priority of each node, and determining the execution order of the nodes according to the execution priority and the computation requirement; and

running the operation instructions on the target computing units in the execution order.

In one embodiment, the operation instructions include computation operation instructions and data transmission instructions, and generating the operation instructions according to the target computing unit and the target storage unit includes:

generating the computation operation instruction according to the computation requirement of the target computing unit; and

generating the data transmission instruction according to the edge information and the storage location of the target storage unit.

In a second aspect, the present application further provides a computational graph execution apparatus. The apparatus is applied to a computer device comprising a reconfigurable processor, which includes a plurality of computing units, and a memory, which includes a plurality of storage units. The apparatus comprises:

an acquisition module, configured to acquire a preset computational graph, wherein the computational graph comprises a plurality of nodes and edge information corresponding to each node;

a first determining module, configured to determine, for each node, a target computing unit and a target storage unit corresponding to the node according to the node information and the edge information corresponding to the node; and

an operation module, configured to generate operation instructions corresponding to the nodes according to the target computing units and the target storage units, and to run the operation instructions on the target computing units.

In a third aspect, the present application further provides a computer device comprising a memory and a processor. The memory stores a computer program, and the processor implements the steps of the method of the first aspect when it executes the computer program.

In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

In a fifth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

The computational graph execution method, apparatus, computer device, computer-readable storage medium, and computer program product described above work as follows. The computer device acquires a preset computational graph comprising a plurality of nodes and the edge information corresponding to each node. For each node, it determines a target computing unit and a target storage unit according to the node's node information and edge information. It then generates the corresponding operation instruction from the target computing unit and the target storage unit, and runs each operation instruction on its target computing unit. The target computing unit is chosen according to the node information, so it satisfies the node's requirements. The target storage unit is chosen according to the edge information, so it satisfies the data dependencies that the edges express. As a result, while the operation instructions run, the storage units accessed during data transmission can be identified quickly and accurately. This reduces the number of memory accesses and therefore the memory access overhead.

Drawings

To illustrate the embodiments of the present application, or the technical solutions in the related art, more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a diagram of an application environment of a computational graph execution method in one embodiment;

FIG. 2 is a flowchart of a computational graph execution method in one embodiment;

FIG. 3 is a flowchart of a computational graph execution method in another embodiment;

FIG. 4 is a flowchart of a computational graph execution method in another embodiment;

FIG. 5 is a flowchart of S203 in another embodiment;

FIG. 6 is a flowchart of S203 in another embodiment;

FIG. 7 is a structural block diagram of a computational graph execution apparatus in one embodiment; and

FIG. 8 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

In the conventional art, there are several approaches to optimizing processor and memory performance.

The first is optimizing the memory hierarchy, for example through data cache optimization, locality optimization, and memory prefetch optimization.

The second is direct memory access (DMA), which enables high-speed data transfer between host memory and peripherals, reduces CPU load, and improves system parallelism. In reconfigurable processors, DMA is often used to exchange data with external memory.

The third is the design of on-chip memory and memory controllers to improve the efficiency and bandwidth of memory access. Examples include optimizing caches, local memory, and memory channels, and supporting high-bandwidth data transfer and parallel computation.

The fourth is compiler optimization. The compiler plays a vital role in a reconfigurable processor, and its optimizations can significantly affect program performance and execution efficiency. These optimizations include instruction scheduling, data reordering, loop unrolling, and vectorization, all of which improve parallelism and data locality.

The fifth is new memory technology. Non-volatile memory, 3D-stacked memory, storage-class memory, and similar technologies continue to emerge. They matter greatly for memory and data access in reconfigurable processors because they can improve memory capacity, bandwidth, and energy efficiency.

However, conventional memory access techniques fall short on reconfigurable hardware when the computing scenario is large-scale and data must be processed in parallel. They cannot meet the processor's performance requirements, and data transmission usually introduces extra latency and energy consumption, especially when data is exchanged with external memory.

Memory access scheduling is also a complex problem. A scheduling algorithm must account for data dependencies, load balancing, hardware resource allocation, and other factors. Existing algorithms often fail to make full use of hardware resources, which causes performance loss and wasted resources.

Furthermore, the programming model of a reconfigurable processor offers no means of directly controlling memory and data management, so developers cannot effectively customize and optimize it. At the same time, the level of abstraction offered by classical programming languages is too high to meet the fine-grained memory and data management requirements of some application scenarios.

Therefore, in the conventional technology, a reconfigurable processor incurs high memory access overhead when it executes a computational graph in large-scale data processing scenarios. On this basis, the present application provides a computational graph execution method that reduces memory access overhead.

The computational graph execution method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. FIG. 1 includes a computer device having a reconfigurable processor 102 and a memory 104. The reconfigurable processor 102 includes a plurality of computing units, and the memory 104 includes a plurality of storage units.

The computer device acquires a preset computational graph comprising a plurality of nodes and the edge information corresponding to each node. For each node, it determines a target computing unit and a target storage unit according to the node information and edge information corresponding to the node. It then generates the operation instruction corresponding to the node according to the target computing unit and the target storage unit, and runs each operation instruction on its target computing unit.

The memory 104 stores the data that the reconfigurable processor 102 needs to process, and may include external memory and on-chip memory.

In an exemplary embodiment, as shown in FIG. 2, a computational graph execution method is provided. Taking its application to the computer device in FIG. 1 as an example, the method includes the following steps:

S201, acquiring a preset computational graph, wherein the computational graph comprises a plurality of nodes and edge information corresponding to each node.

A computational graph is a data structure that represents a computational process. It is a graph made up of nodes and edges. Nodes represent the basic operations or computational units of a computing task, such as addition, multiplication, and logical operations. Each node typically has input and output ports for receiving and transmitting data. Edges represent how data is transmitted in the computing task: an edge connects an output port of one node to an input port of another node and represents a data flow path. In this embodiment, the nodes that each edge connects are taken as the edge information.
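To make this structure concrete, here is a minimal Python sketch of a computational graph with ports and edges, assuming a simple in-memory representation. The classes `Node`, `Edge`, and `ComputationalGraph` and the method `edge_info` are hypothetical illustrations. They are not the data structures of the application or of its DSL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A basic operation (e.g. add, mul) with named input and output ports."""
    name: str
    op: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Edge:
    """A data flow path from an output port of one node to an input port of another."""
    src: str
    src_port: str
    dst: str
    dst_port: str


@dataclass
class ComputationalGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def connect(self, src: str, src_port: str, dst: str, dst_port: str) -> None:
        self.edges.append(Edge(src, src_port, dst, dst_port))

    def edge_info(self, name: str) -> list[tuple[str, str]]:
        """Edge information of a node: the (source, destination) nodes of every edge touching it."""
        return [(e.src, e.dst) for e in self.edges if name in (e.src, e.dst)]


# Example: d = (a + b) * c, where the adder's output port feeds the multiplier.
g = ComputationalGraph()
g.add_node(Node("add", "add", inputs=["x", "y"], outputs=["z"]))
g.add_node(Node("mul", "mul", inputs=["x", "y"], outputs=["z"]))
g.connect("add", "z", "mul", "x")
print(g.edge_info("mul"))  # [('add', 'mul')]
```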

The data flow of a computational graph describes the data dependencies in the computing task: the result computed by one node may serve as input data to another node, which forms a data flow path. A dependency is a relationship between associated nodes. For example, if the result of node A affects the computation of node B, then node B depends on the data of node A. Dependencies may be divided into two kinds. Control dependencies represent control flow relationships in the computing task, and data dependencies represent data transfer relationships in the computing task.
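Data dependencies constrain execution order: a node can run only after every node whose result it consumes. The following is a minimal sketch using Kahn's topological sort, a standard technique that the application does not itself name. It assumes the dependencies are given as hypothetical (producer, consumer) pairs, and the function name `dependency_order` is illustrative.

```python
from collections import defaultdict, deque


def dependency_order(nodes, edges):
    """Return an order in which every producer precedes its consumers.

    nodes: iterable of node names.
    edges: (producer, consumer) pairs, meaning the consumer depends on the producer's result.
    Raises ValueError if the dependencies form a cycle.
    """
    indegree = {n: 0 for n in nodes}
    consumers = defaultdict(list)
    for producer, consumer in edges:
        consumers[producer].append(consumer)
        indegree[consumer] += 1

    ready = deque(n for n, d in indegree.items() if d == 0)  # nodes with no pending inputs
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in consumers[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; no valid execution order")
    return order


# Node B depends on node A, and node C depends on both A and B.
print(dependency_order(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")]))  # ['A', 'B', 'C']
```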

It should be noted that a reconfigurable processor executing a computational graph faces two obstacles. First, the programming model offers no means of directly controlling memory and data management, so developers cannot effectively customize and optimize it. Second, the level of abstraction offered by classical programming languages is too high to meet the fine-grained memory and data management requirements of some application scenarios. To address this, the embodiments of the present application program the computational graph in a domain-specific language (DSL), and the resulting DSL program is deployed heterogeneously on the reconfigurable processor. Accordingly, in this embodiment the preset computational graph is a computational graph expressed in the DSL.

In this embodiment, the computer device may obtain in advance, from a computational graph database, the computational graph to be run. The obtained computational graph includes a plurality of nodes and the edges connecting them, so acquiring it yields the nodes and the edge information corresponding to each node.

S202, for each node, determining a target computing unit and a target storage unit corresponding to the node according to the node information and edge information corresponding to the node.

The node information includes each node's data, data type, operation type, input nodes, output nodes, and similar information. The edge information includes the nodes that each edge connects and the data flow path between the node that outputs the data and the node that receives it as input.

A computing unit is a component of the processor that performs arithmetic or logical operations. A storage unit is a region of storage space in the memory that supports reading and writing data. Before executing the computational graph, the computer device needs to map the graph onto its hardware: the nodes of the computational graph are mapped to computing units in the processor, and the data to be processed is mapped to storage units in the memory. Therefore, in this embodiment, the target computing unit is a computing unit that satisfies the node information of the node, and the target storage unit is a storage unit that satisfies the edge information.

In this embodiment, the computer device may select, from the plurality of computing units, a computing unit that satisfies the node information of the node as the target computing unit. It may likewise select, from the plurality of storage units, a storage unit that satisfies the edge information as the target storage unit. One target computing unit may correspond to a plurality of target storage units, which may store the input data and output data of that target computing unit.

S203, according to the target computing units and the target storage units, generating operation instructions corresponding to the nodes, and running the operation instructions through the target computing units.

An operation instruction is a program instruction that directs a target computing unit to perform a data processing operation. Because the target computing unit is determined for a particular node, the operation instruction generated for that target computing unit is the operation instruction corresponding to the node.

In this embodiment, the computer device may generate the operation instructions according to the data processing type and processing requirement of each target computing unit. It then invokes the compiler to compile them so that each target computing unit executes its operation instructions. If the node information indicates that a node can be processed in parallel, the target computing unit of that node may execute its operation instruction in parallel with other target computing units whose nodes can also be processed in parallel.

As an alternative embodiment, if a plurality of nodes correspond to one target computing unit, the generated operation instruction may be used as the operation instruction corresponding to the plurality of nodes.
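The following is a minimal sketch of instruction generation in S203. It assumes each node has already been mapped to a target computing unit and to its input and output storage units. The `OpInstruction` record, its fields, and the function names are hypothetical. Compilation and parallel dispatch are not modeled. Following the alternative embodiment, nodes that map to the same target computing unit are merged into one instruction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OpInstruction:
    unit: str                # target computing unit that runs this instruction
    nodes: tuple[str, ...]   # graph nodes covered by the instruction
    reads: tuple[str, ...]   # target storage units holding input data
    writes: tuple[str, ...]  # target storage units receiving output data


def _add_unique(target: list[str], items) -> None:
    """Append items to target, skipping any already present (keeps first-seen order)."""
    for item in items:
        if item not in target:
            target.append(item)


def generate_instructions(mapping: dict) -> list[OpInstruction]:
    """mapping: node -> (computing unit, input storage units, output storage units).

    Nodes mapped to the same computing unit are merged into one instruction.
    """
    grouped: dict[str, tuple[list, list, list]] = {}
    for node, (unit, reads, writes) in mapping.items():
        nodes, unit_reads, unit_writes = grouped.setdefault(unit, ([], [], []))
        nodes.append(node)
        _add_unique(unit_reads, reads)
        _add_unique(unit_writes, writes)
    return [
        OpInstruction(unit, tuple(n), tuple(r), tuple(w))
        for unit, (n, r, w) in grouped.items()
    ]


mapping = {
    "add": ("PE0", ["S0", "S1"], ["S2"]),
    "mul": ("PE1", ["S2", "S3"], ["S4"]),
    "neg": ("PE0", ["S2"], ["S5"]),
}
for instr in generate_instructions(mapping):
    print(instr)
```

Here "add" and "neg" share PE0 and become a single instruction. A fuller implementation would also emit separate data transmission instructions from the edge information.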

In the above computational graph execution method, the target computing unit of each node is determined from the node information, so it satisfies the node's requirements. The target storage unit is determined from the edge information, so it satisfies the data dependencies that the edges express. When the operation instructions run, the storage units accessed during data transmission can therefore be identified quickly and accurately, which reduces both the number of memory accesses and the memory access overhead.

When determining the target computing unit and the target storage unit, the computer device may determine the target computing unit according to the node's computation requirement and the target storage unit according to the edge information. In an exemplary embodiment, the node information includes the node's computation requirement, and S202 may specifically include the following:

In one possible embodiment, the target computing unit is determined from the computing units according to the computation requirement.

The computation requirement refers to the computing resources needed to run the program that processes a node's data. Examples include CPU resources, memory resources, network resources such as bandwidth, hard disk resources, and parallelism requirements. Computing units in the processor may differ in processing performance. A computing unit that meets the node's computation requirement must therefore be selected from the processor's computing units as the node's target computing unit.

In this embodiment, the computer device may screen the computing units according to each node's computation requirement and determine a computing unit that satisfies the requirement as the target computing unit.
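The following is a minimal sketch of this screening step. It assumes a computation requirement can be reduced to an operation type plus two numeric resources. The fields `ops`, `lanes`, and `local_mem_kb` are hypothetical. When several units qualify, the sketch picks the least capable one to keep stronger units free. That tie-break is an assumption: the application only requires that the chosen unit satisfy the requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    name: str
    ops: frozenset       # operation types the unit supports (hypothetical)
    lanes: int           # parallel lanes available (hypothetical)
    local_mem_kb: int    # local scratch memory in KiB (hypothetical)


@dataclass(frozen=True)
class Requirement:
    op: str
    lanes: int
    local_mem_kb: int


def select_compute_unit(req, units):
    """Return the smallest unit that meets every part of the requirement, or None."""
    fits = [
        u for u in units
        if req.op in u.ops and u.lanes >= req.lanes and u.local_mem_kb >= req.local_mem_kb
    ]
    return min(fits, key=lambda u: (u.lanes, u.local_mem_kb), default=None)


units = [
    ComputeUnit("PE0", frozenset({"add", "mul"}), lanes=4, local_mem_kb=16),
    ComputeUnit("PE1", frozenset({"add", "mul", "mac"}), lanes=16, local_mem_kb=64),
]
print(select_compute_unit(Requirement("mul", lanes=8, local_mem_kb=8), units).name)  # PE1
```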

In another possible embodiment, the target storage unit is determined from the storage units according to the edge information.

The edge information represents the data dependencies between nodes in the computational graph, that is, the paths along which data flows between nodes. Transmitting data along such a path involves reading it from one storage unit and writing it to another, so the target storage unit can be determined from the edge information.

In this embodiment, the computer device may determine the data dependencies between nodes according to the edge information. From these dependencies it determines the storage units for the input data and output data of each node, and takes those storage units as the target storage units.
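One way to derive storage from the edge information is sketched below. It is a minimal sketch, assuming each producer's result gets one storage unit and every consumer of that result reads from the same unit, so the dependency is honored without copies. The function name, the storage-unit names, and the first-free allocation policy are hypothetical. Final outputs of sink nodes are not handled.

```python
def assign_storage_from_edges(edges, free_units):
    """Assign storage units along data flow edges.

    edges: (producer, consumer) pairs taken from the edge information.
    free_units: storage-unit names available for allocation.
    Returns (out_unit, inputs): the unit each producer writes, and the units each consumer reads.
    """
    pool = list(free_units)
    out_unit = {}
    inputs = {}
    for producer, consumer in edges:
        if producer not in out_unit:
            if not pool:
                raise RuntimeError("not enough storage units")
            out_unit[producer] = pool.pop(0)  # first free unit holds the producer's result
        # The consumer reads the producer's result from the same unit.
        inputs.setdefault(consumer, []).append(out_unit[producer])
    return out_unit, inputs


out_unit, inputs = assign_storage_from_edges([("A", "B"), ("A", "C"), ("B", "C")], ["M0", "M1", "M2"])
print(out_unit)  # {'A': 'M0', 'B': 'M1'}
print(inputs)    # {'B': ['M0'], 'C': ['M0', 'M1']}
```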

As shown in FIG. 3, the step of determining a target storage unit from the storage units according to the edge information may include:

S301, determining candidate storage units from the storage units according to the edge information.

A candidate storage unit is a storage unit that can meet the storage requirements of a node's input data and output data. For example, suppose the input data of node A is 100 bytes, and storage units 1, 2, and 3 have capacities of 256 bytes, 20 bytes, and 120 bytes respectively. Storage units 1 and 3 can then both serve as candidate storage units for node A.

In this embodiment, the computer device may determine the data dependencies between nodes according to the edge information. It then takes as candidate storage units those storage units that both satisfy these dependencies and meet the data storage requirement.
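The capacity example above can be checked with a short sketch. The `reserved` parameter is a hypothetical stand-in for the dependency check: it lists units that still hold data another node depends on and therefore must not be overwritten. The function name is illustrative.

```python
def candidate_storage_units(required_bytes, capacities, reserved=frozenset()):
    """Storage units that can hold `required_bytes` and are not reserved.

    capacities: storage unit -> capacity in bytes.
    reserved: units still holding data another node depends on (hypothetical
    stand-in for the data dependency check driven by the edge information).
    """
    return [
        unit for unit, cap in capacities.items()
        if cap >= required_bytes and unit not in reserved
    ]


# The example from the text: node A needs 100 bytes.
capacities = {"storage unit 1": 256, "storage unit 2": 20, "storage unit 3": 120}
print(candidate_storage_units(100, capacities))  # ['storage unit 1', 'storage unit 3']
```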

S302, determining the access type of the node according to the historical access count of the node.

The historical access count of a node is the number of times the node was accessed during a historical time period. The count indicates whether the node is accessed frequently, and this determines the node's access type, which may be frequent access or infrequent access.

In this embodiment, an access count threshold may be preset. If a node's historical access count is greater than or equal to the threshold, its access type is frequent access. If the count is less than the threshold, its access type is infrequent access.
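The threshold rule can be sketched as follows. The access log, the threshold value, and the names `AccessType` and `classify_access` are hypothetical. `collections.Counter` tallies how many times each node appears in the log.

```python
from collections import Counter
from enum import Enum


class AccessType(Enum):
    FREQUENT = "frequent access"
    INFREQUENT = "infrequent access"


def classify_access(history_count: int, threshold: int) -> AccessType:
    """Counts at or above the threshold are frequent; anything below is infrequent."""
    return AccessType.FREQUENT if history_count >= threshold else AccessType.INFREQUENT


# Hypothetical access log for one historical time period: one entry per node access.
access_log = ["A", "B", "A", "C", "A"]
counts = Counter(access_log)  # A: 3, B: 1, C: 1
types = {node: classify_access(counts[node], threshold=2) for node in ("A", "B", "C")}
print({node: t.value for node, t in types.items()})
# {'A': 'frequent access', 'B': 'infrequent access', 'C': 'infrequent access'}
```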

S303, determining a target storage unit from the candidate storage units according to the access type.

Storage units built on different storage media have different access speeds. In the embodiments of the present application, the data of each node can be stored in a storage unit whose access speed matches the node's access frequency. This improves overall access efficiency and the utilization of storage resources.

The memory may be divided into tiers with different access speeds, such as on-chip memory and external memory. On-chip memory is memory integrated directly into an integrated circuit chip, typically within the CPU, such as the level 1 (L1) cache and the level 2 (L2) cache. It is used to increase data processing speed and efficiency. On-chip memory includes static random access memory (SRAM) and dynamic random access memory (DRAM). SRAM is faster and is therefore typically used for caches. DRAM is cheaper and has larger capacity and is therefore typically used for main memory.

External memory is memory other than main memory and the CPU cache, and it retains data after power-off. Common examples are hard disks, floppy disks, optical discs, and USB flash drives. Its access speed is lower than that of on-chip memory.

Optionally, in this embodiment, if the access type of a node is frequent access, its data may be stored in a storage unit with a higher access speed, so a faster storage unit among the candidate storage units is determined as the target storage unit. If the access type is infrequent access, its data may be stored in a storage unit with a lower access speed, so a slower storage unit among the candidate storage units is determined as the target storage unit.
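This selection rule can be sketched as follows. The per-unit latencies are hypothetical cycle counts. Choosing the single fastest candidate for frequent access and the single slowest for infrequent access is one reading of "faster" and "slower"; it is not the only policy consistent with the text.

```python
def pick_target_unit(candidates, latency, frequent):
    """Choose the fastest candidate for frequently accessed nodes, the slowest otherwise.

    candidates: candidate storage-unit names (output of S301).
    latency: storage unit -> access latency; lower means faster.
    """
    if not candidates:
        raise ValueError("no candidate storage unit for this node")
    choose = min if frequent else max
    return choose(candidates, key=lambda unit: latency[unit])


# Hypothetical latencies in cycles for an on-chip cache, on-chip SRAM, and external DRAM.
latency = {"L1": 1, "SRAM": 4, "DRAM": 100}
candidates = ["L1", "SRAM", "DRAM"]
print(pick_target_unit(candidates, latency, frequent=True))   # L1
print(pick_target_unit(candidates, latency, frequent=False))  # DRAM
```

Sending infrequently accessed data to the slowest candidate keeps the fast tiers free for the data that benefits most from them.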

In this embodiment, the computer device determines candidate storage units from the storage units according to the edge information. It then determines the access type of each node from its historical access count and selects the target storage unit from the candidates according to that access type. Data of nodes with different access types is thus stored in matching storage units: data of frequently accessed nodes goes to faster storage units, and data of infrequently accessed nodes goes to slower ones. This speeds up data reads and writes while the operation instructions run, improves the utilization of memory resources, and maximizes the performance and efficiency of the computer system.

In this embodiment, the computer device determines the target computing unit from the computing units according to the computation requirement, and the target storage unit from the storage units according to the edge information. The target computing unit therefore meets the node's computation requirement, and the target storage unit satisfies the data dependencies expressed by the edge information. As a result, while the operation instructions run, data access is faster and memory access latency and data transmission time are reduced. Both the computing resources in the processor and the storage units in the memory are used effectively.

After the target computing units and target storage units are determined, a data reorganization method may further be used to improve data access speed. This method stores associated data in storage units that are physically close to one another. In an exemplary embodiment, as shown in FIG. 4, the method includes:

S401, determining, according to the edge information, a plurality of associated nodes having an association relationship from the nodes.

Associated nodes are nodes that have a data dependency on one another. For example, if node A processes data that comes from node B, then nodes A and B are associated nodes. Likewise, if node A outputs its result to node C, then nodes A and C are associated nodes.

In this embodiment, the computer device may determine the data dependency represented by each piece of edge information and then determine the associated nodes of each node from these dependencies. For example, if the edge information shows that node A has a data dependency with node B and with node C, nodes B and C are determined as associated nodes of node A.
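Deriving associated nodes from edge information amounts to building an undirected adjacency map. A minimal sketch, assuming each edge is given as a hypothetical `(producer, consumer)` pair of node names:

```python
from collections import defaultdict


def associated_nodes(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each node to the nodes it shares a data dependency with.

    An edge (src, dst) means dst consumes data produced by src; the
    association is symmetric, so it is recorded in both directions.
    """
    assoc: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        assoc[src].add(dst)
        assoc[dst].add(src)
    return dict(assoc)
```

For the example above, the edges `("B", "A")` (A consumes B's data) and `("A", "C")` (A outputs to C) give node A the associated nodes B and C.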

S402, if the storage locations of the target storage units corresponding to the associated nodes do not satisfy a preset condition, re-determining, from the storage units and according to the edge information, associated storage units for the associated nodes that do satisfy the preset condition, where the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold.
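If "the difference between the storage addresses" is read as every pairwise difference (one possible interpretation; the text does not fix it), then, writing a(n) for the storage address of the target storage unit of node n, A for the set of associated nodes, and T for the preset threshold, the preset condition can be stated as:

```latex
\max_{n_i, n_j \in \mathcal{A}} \lvert a(n_i) - a(n_j) \rvert
  = \max_{n \in \mathcal{A}} a(n) - \min_{n \in \mathcal{A}} a(n) < T
```

The equality holds because the largest pairwise gap is always the one between the highest and the lowest address, so the condition can be checked in a single pass over the addresses.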

It will be appreciated that when a computing unit of the computer device executes an operation instruction, it needs to read data from storage unit 1, perform the calculation, and write the result to storage unit 2. The closer storage units 1 and 2 are physically, the shorter the data transfer takes; the farther apart they are, the longer it takes. The storage addresses of the target storage units corresponding to the associated nodes can therefore be compared to judge whether those units are physically close. In this embodiment, the preset condition is that the difference between these storage addresses is smaller than a preset threshold; if the associated nodes do not satisfy it, their associated storage units are re-determined.

The associated storage units of the associated nodes are storage units that both respect the data dependencies described by the edge information and satisfy the preset condition.

In this embodiment, the computer device may obtain the storage address of the target storage unit corresponding to each associated node from its storage location, calculate the difference between the storage addresses, and compare it with the preset threshold. If the difference is smaller than the threshold, the storage locations satisfy the preset condition. Otherwise, they do not, and the computer device re-determines, from the storage units and according to the edge information, storage units that respect the edge information, and takes them as the associated storage units if they also satisfy the preset condition.
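The check and a contiguous re-determination can be sketched as below. This assumes, hypothetically, a flat linear address space, a known free region given by `free_start` and `free_end`, and a known data size for each associated node; none of these details come from the source, and contiguous packing is only one way to obtain associated storage units that satisfy the condition.

```python
def addresses_close(addresses: list[int], threshold: int) -> bool:
    """Preset condition: the largest pairwise address gap is below the threshold."""
    return max(addresses) - min(addresses) < threshold


def reassign_contiguous(sizes: list[int], free_start: int, free_end: int,
                        threshold: int) -> list[int]:
    """Re-determine addresses for associated nodes by packing their data back to
    back inside the free region [free_start, free_end)."""
    addresses, cursor = [], free_start
    for size in sizes:
        if cursor + size > free_end:
            raise MemoryError("free region is too small for the associated nodes")
        addresses.append(cursor)
        cursor += size
    if not addresses_close(addresses, threshold):
        raise ValueError("contiguous placement still violates the preset threshold")
    return addresses
```

For instance, addresses 0x0000 and 0x8000 fail a threshold of 0x1000, while repacking two 256-byte blocks into the region [0x2000, 0x3000) yields addresses 0x2000 and 0x2100, which pass.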

Optionally, the associated storage units may be contiguous or non-contiguous storage units that respect the edge information. Using contiguous storage units improves the locality and continuity of data access and reduces random and fragmented memory access.

In this embodiment, the computer device determines associated nodes from the nodes according to the edge information. When the storage locations of their target storage units do not satisfy the preset condition (that the difference between the storage addresses is smaller than the preset threshold), it re-determines associated storage units that satisfy both the edge information and the preset condition. This shortens the time needed to access the storage units and improves both data access efficiency and the speed at which the operation instructions run.

When each operation instruction is run by its target computing unit, the computer device may further adjust the execution order of the operation instructions according to the execution priority of each node. In an exemplary embodiment, as shown in fig. 5, S203 includes:

S501, acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement.

The execution priority is a parameter indicating how early a job is granted system resources; for example, when a time-sharing operating system handles multiple jobs, higher-priority jobs are served before lower-priority ones. In this embodiment, the execution order of the nodes may be adjusted according to their execution priorities to improve the overall efficiency of executing the computation graph.

In this embodiment, the computer device may read the execution priority from the node information of each node and check whether its computing resources can meet the computing requirements of the high-priority nodes. If they can, the high-priority nodes are moved earlier in the execution order and the low-priority nodes are moved later.
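One way to realize this ordering is a priority-aware topological sort: among the nodes whose inputs are ready, the highest-priority node runs first. This is a sketch under stated assumptions: the edge dependencies are assumed to constrain the order, priorities are hypothetical integers where a larger value means more urgent, and the resource check against the computing requirement is omitted.

```python
import heapq


def execution_order(priority: dict[str, int], edges: list[tuple[str, str]]) -> list[str]:
    """Topological order that always runs the highest-priority ready node next."""
    indegree = {node: 0 for node in priority}
    successors: dict[str, list[str]] = {node: [] for node in priority}
    for src, dst in edges:  # edge (src, dst): dst depends on src's output
        successors[src].append(dst)
        indegree[dst] += 1
    # heapq is a min-heap, so priorities are negated to pop the largest first;
    # the node name breaks ties deterministically.
    ready = [(-priority[n], n) for n, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (-priority[nxt], nxt))
    if len(order) != len(priority):
        raise ValueError("computation graph contains a cycle")
    return order
```

With priorities A=1, B=5, C=3 and a single edge from A to C, B is scheduled first; C has a higher priority than A but must wait for A's output, so the order is B, A, C.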

S502, running each operation instruction through each target computing unit according to the execution sequence.

In this embodiment, after determining the execution sequence of each node, the computer device may sequentially control the target computing units corresponding to each node to execute each operation instruction according to the execution sequence.

In this embodiment, the computer device obtains the execution priority of each node, determines the execution sequence of the nodes according to the priority and the computing requirement, and runs each operation instruction through its target computing unit in that sequence.

When each operation instruction is run by its target computing unit, the computer device may generate a computing operation instruction according to the computing requirement. In an exemplary embodiment, as shown in fig. 6, S203 includes:

S601, generating a computing operation instruction according to the computing requirement of the target computing unit.

A computing operation instruction directs a computing unit to perform a data operation, such as addition, multiplication, logical AND or logical OR. Because the target computing units have different computing requirements, their computing operation instructions generally differ as well.

In this embodiment, the computer device may generate, for each target computing unit, a computing operation instruction corresponding to that unit's computing requirement. For example, if the computing requirement of a target computing unit is an addition, its computing operation instruction instructs it to perform the addition.
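A minimal sketch of generating a computing operation instruction from a computing requirement follows. The opcode table `OPCODES` and the `ComputeInstr` layout are hypothetical; an actual reconfigurable processor would use its own configuration format.

```python
from dataclasses import dataclass

# Hypothetical opcode table; a real reconfigurable processor defines its own encoding.
OPCODES = {"add": 0x01, "mul": 0x02, "and": 0x03, "or": 0x04}


@dataclass(frozen=True)
class ComputeInstr:
    unit: int    # target computing unit that executes the instruction
    opcode: int  # encoded operation taken from the computing requirement


def compute_instruction(unit_id: int, requirement: str) -> ComputeInstr:
    """Translate a node's computing requirement into a compute instruction."""
    if requirement not in OPCODES:
        raise ValueError(f"unsupported computing requirement: {requirement!r}")
    return ComputeInstr(unit=unit_id, opcode=OPCODES[requirement])
```

For example, `compute_instruction(3, "add")` builds an instruction telling computing unit 3 to perform an addition.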

S602, generating a data transmission instruction according to the edge information and the storage location of the target storage unit.

A data transmission instruction directs the target computing unit to read data from and/or write data to the target storage unit.

In this embodiment, the computer device may determine the flow path of data during execution of the operation instructions according to the edge information, and then generate the data transmission instruction for each target storage unit according to its storage location and the flow path. For example, if data flows from target storage unit 1 to target storage unit 2, the data transmission instruction may instruct the target computing unit to read the data from target storage unit 1, process it, and write the result to target storage unit 2.
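The flow-path example above can be sketched as one load per input followed by a store of the result. `TransferInstr`, its fields, and the use of one start address per storage unit are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferInstr:
    unit: int     # target computing unit that performs the access
    op: str       # "load" reads from storage, "store" writes to storage
    address: int  # start address of the target storage unit


def transfer_instructions(unit_id: int, input_addrs: list[int],
                          output_addr: int) -> list[TransferInstr]:
    """Emit one load per input storage unit, then a store of the result."""
    loads = [TransferInstr(unit_id, "load", addr) for addr in input_addrs]
    return loads + [TransferInstr(unit_id, "store", output_addr)]
```

With target storage unit 1 at 0x1000 and target storage unit 2 at 0x2000, `transfer_instructions(0, [0x1000], 0x2000)` yields a load from 0x1000 followed by a store to 0x2000. A node with several producers simply receives several loads.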

In this embodiment, because the computing operation instruction is generated from the computing requirement of the target computing unit and the data transmission instruction is generated from the edge information and the storage location of the target storage unit, the target computing unit can execute the operation instructions efficiently and accurately, which improves the efficiency of running them.

To help those skilled in the art understand, the computation graph execution method provided by the present application is described in detail below. The method is used in a computer device that includes a reconfigurable processor and a memory, where the reconfigurable processor includes a plurality of computing units and the memory includes a plurality of storage units. The method may include:

S1, acquiring a preset computation graph.

S2, determining a target computing unit from the computing units according to the computing requirements of the nodes.

S3, determining candidate storage units from the storage units according to the edge information.

S4, determining the access type of each node according to its historical access count, and determining the target storage unit from the candidate storage units according to the access type.

S5, determining, according to the edge information, a plurality of associated nodes having an association relationship from the nodes.

S6, if the storage locations of the target storage units corresponding to the associated nodes do not satisfy the preset condition, re-determining the associated storage units of the associated nodes from the storage units according to the edge information.

S7, generating a computing operation instruction according to the computing requirement of the target computing unit.

S8, generating a data transmission instruction according to the edge information and the storage location of the target storage unit.

S9, acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement.

S10, running each operation instruction through each target computing unit according to the execution sequence.

It should be noted that, for the description in the above S1-S10, reference may be made to the description related to the above embodiment, and the effects thereof are similar, which is not repeated here.

It should be understood that although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application further provides a computation graph execution apparatus for implementing the computation graph execution method described above. The solution implemented by the apparatus is similar to that described for the method, so for the specific limitations of the apparatus embodiments below, reference may be made to the limitations of the method above; they are not repeated here.

In one embodiment, as shown in fig. 7, a computation graph execution apparatus is provided, including an acquisition module 701, a first determination module 702, and an operation module 703, where:

the acquisition module 701 is configured to acquire a preset computation graph, where the computation graph includes a plurality of nodes and edge information corresponding to each node;

the first determination module 702 is configured to determine, for each node, a target computing unit and a target storage unit corresponding to the node according to the node information and edge information corresponding to the node;

the operation module 703 is configured to generate an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and to run each operation instruction through each target computing unit.

The computation graph execution apparatus provided in this embodiment can perform the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.

In one embodiment, the node information includes a computing requirement corresponding to the node, and the first determination module 702 includes a first determining unit and a second determining unit, where:

the first determining unit is used for determining a target computing unit from all computing units according to the computing requirements;

the second determining unit is configured to determine the target storage unit from the storage units according to the edge information.

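In one embodiment, the second determining unit is specifically configured to:

determine candidate storage units from the storage units according to the edge information;

determine the access type of the node according to the historical access count of the node;

and determine the target storage unit from the candidate storage units according to the access type.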

In one embodiment, the apparatus further comprises:

a second determination module, configured to determine, according to the edge information, a plurality of associated nodes having an association relationship from the nodes;

and a third determination module, configured to re-determine, from the storage units and according to the edge information, the associated storage units corresponding to the associated nodes if the storage locations of their target storage units do not satisfy a preset condition, where the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold.

In one embodiment, the operation module 703 includes:

a third determining unit, configured to acquire the execution priority of each node and determine the execution sequence of the nodes according to the priority and the computing requirement;

and the running unit is used for running each operation instruction through each target calculation unit according to the execution sequence.

In one embodiment, the operation instruction includes a computing operation instruction and a data transmission instruction, and the operation module 703 includes:

a first generation unit, configured to generate a computing operation instruction according to the computing requirement of the target computing unit;

and a second generation unit, configured to generate a data transmission instruction according to the edge information and the storage location of the target storage unit.

Each module in the above computation graph execution apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.

In one exemplary embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the node data and edge data of the computation graph. The input/output interface of the computer device exchanges information between the processor and external devices. The communication interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements a computation graph execution method.

It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

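In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, performs the following steps:

acquiring a preset computation graph, where the computation graph includes a plurality of nodes and edge information corresponding to each node;

for each node, determining a target computing unit and a target storage unit corresponding to the node according to the node information and edge information corresponding to the node;

and generating an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and running each operation instruction through each target computing unit.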

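In one embodiment, the processor, when executing the computer program, further performs the following steps:

determining the target computing unit from the computing units according to the computing requirement;

and determining the target storage unit from the storage units according to the edge information.

In one embodiment, the processor, when executing the computer program, further performs the following steps:

determining candidate storage units from the storage units according to the edge information;

determining the access type of the node according to the historical access count of the node;

and determining the target storage unit from the candidate storage units according to the access type.

In one embodiment, the processor, when executing the computer program, further performs the following steps:

determining, according to the edge information, a plurality of associated nodes having an association relationship from the nodes;

and if the storage locations of the target storage units corresponding to the associated nodes do not satisfy a preset condition, re-determining, from the storage units and according to the edge information, associated storage units that satisfy the preset condition, where the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold.

In one embodiment, the processor, when executing the computer program, further performs the following steps:

acquiring the execution priority of each node, and determining the execution sequence of the nodes according to the priority and the computing requirement;

and running each operation instruction through each target computing unit according to the execution sequence.

In one embodiment, the processor, when executing the computer program, further performs the following steps:

generating a computing operation instruction according to the computing requirement of the target computing unit;

and generating a data transmission instruction according to the edge information and the storage location of the target storage unit.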

In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, performs the steps of the method embodiments described above.

In one embodiment, a computer program product is provided, including a computer program which, when executed by a processor, performs the steps of the method embodiments described above.

Those skilled in the art will appreciate that all or part of the procedures of the above method embodiments may be implemented by a computer program stored in a non-volatile computer-readable storage medium which, when executed, may include the procedures of the above method embodiments. Any reference to a memory, database, or other medium in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database; the non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or quantum-computing-based data processing logic units.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these features are described; however, any combination of them should be considered within the scope of this specification as long as it contains no contradiction.

The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims

1. A computation graph execution method, characterized in that it is used in a computer device, the computer device includes a reconfigurable processor and a memory, the reconfigurable processor includes a plurality of computing units, the memory includes a plurality of storage units, and the method includes:

Obtaining a preset computational graph, the computational graph comprising a plurality of nodes and edge information corresponding to each of the nodes;

For each of the nodes, determining a target computing unit and a target storage unit corresponding to the node according to the node information corresponding to the node and the edge information;

According to the target computing unit and the target storage unit, generating an operation instruction corresponding to the node, and executing each of the operation instructions through each of the target computing units.

2. The method according to claim 1, wherein the node information includes a computing requirement corresponding to the node, and the determining the target computing unit and the target storage unit corresponding to the node according to the node information corresponding to the node and the edge information comprises:

According to the computing requirement, determining the target computing unit from the computing units;

Determining the target storage unit from the storage units according to the edge information.

3. The method according to claim 2, characterized in that the determining the target storage unit from the storage units according to the edge information comprises:

Determining candidate storage units from the storage units according to the edge information;

Determining an access type of the node according to a historical access count of the node;

Determining the target storage unit from the candidate storage units according to the access type.

4. The method according to claim 2 or 3, characterized in that the method further comprises:

According to each of the edge information, a plurality of associated nodes having an associated relationship are determined from each of the nodes;

If the storage locations of the target storage units corresponding to the associated nodes do not satisfy a preset condition, re-determining, from the storage units and based on the edge information, associated storage units corresponding to the associated nodes that satisfy the preset condition, wherein the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is less than a preset threshold.

5. The method according to claim 2, wherein the step of executing each of the operation instructions through each of the target computing units comprises:

Obtaining the execution priority of each of the nodes, and determining the execution order of each of the nodes according to the priority and the computing requirement;

According to the execution order, each of the operation instructions is executed by each of the target computing units.

6. The method according to claim 1, wherein the operation instruction comprises a computing operation instruction and a data transmission instruction, and the generating the operation instruction according to the target computing unit and the target storage unit comprises:

Generating the computing operation instruction according to the computing requirement of the target computing unit;

Generating the data transmission instruction according to the edge information and the storage location of the target storage unit.

7. A computation graph execution apparatus, characterized in that it is used in a computer device, the computer device includes a reconfigurable processor and a memory, the reconfigurable processor includes a plurality of computing units, the memory includes a plurality of storage units, and the apparatus includes:

An acquisition module is used to acquire a preset computational graph, wherein the computational graph includes a plurality of nodes and edge information corresponding to each of the nodes;

A first determination module is used to determine, for each of the nodes, a target computing unit and a target storage unit corresponding to the node according to the node information corresponding to the node and the edge information;

An operation module, configured to generate an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and to execute each of the operation instructions through each of the target computing units.

8. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.

9. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 6 are implemented.

10. A computer program product, comprising a computer program, characterized in that when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 6 are implemented.