
CN119201833A - Computational graph execution method, device, equipment, storage medium and program product - Google Patents

Info

Publication number
CN119201833A
Authority
CN
China
Prior art keywords
node
target
computing
nodes
storage unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411127726.1A
Other languages
Chinese (zh)
Inventor
尹首一
谷江源
陈博颖
韩慧明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tsinghua International Innovation Center
Tsinghua University
Original Assignee
Shanghai Tsinghua International Innovation Center
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tsinghua International Innovation Center, Tsinghua University filed Critical Shanghai Tsinghua International Innovation Center
Priority to CN202411127726.1A priority Critical patent/CN119201833A/en
Publication of CN119201833A publication Critical patent/CN119201833A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a computational graph execution method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product. The method is applied to a computer device that includes a reconfigurable processor and a memory, where the reconfigurable processor includes a plurality of computing units and the memory includes a plurality of storage units. The method includes: obtaining a preset computational graph that includes a plurality of nodes and edge information corresponding to each node; for each node, determining a target computing unit and a target storage unit corresponding to the node according to the node information of the node and the edge information; and generating an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and running each operation instruction through the corresponding target computing unit. The method can reduce memory access overhead.

Description

Computational graph execution method, apparatus, device, storage medium, and program product
Technical Field
The present application relates to the field of memory technology, and in particular, to a method, an apparatus, a computer device, a computer readable storage medium, and a computer program product for executing a computation graph.
Background
The reconfigurable processor is a flexible and programmable processor architecture, and hardware resources can be flexibly configured according to different application requirements and algorithm characteristics. In reconfigurable processors, memory and data access directly affect the execution efficiency of the algorithm and the performance of the system.
In conventional technology, memory access efficiency is improved by optimizing the structure of the memory. However, in large-scale data processing scenarios, a reconfigurable processor still incurs high memory access overhead when executing a computational graph.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a computational graph execution method, apparatus, computer device, computer-readable storage medium, and computer program product that can reduce access overhead of a memory.
In a first aspect, the present application provides a computational graph execution method. The method is applied to a computer device comprising a reconfigurable processor and a memory, the reconfigurable processor including a plurality of computing units and the memory including a plurality of storage units, the method comprising:
acquiring a preset computational graph, wherein the computational graph comprises a plurality of nodes and edge information corresponding to each node;
for each node, determining a target computing unit and a target storage unit corresponding to the node according to node information and edge information corresponding to the node;
and generating operation instructions corresponding to the nodes according to the target computing units and the target storage units, and running the operation instructions through the target computing units.
In one embodiment, the node information includes a computing requirement corresponding to the node, and determining the target computing unit and the target storage unit corresponding to the node according to the node information and the edge information corresponding to the node includes:
determining the target computing unit from the computing units according to the computing requirement;
and determining the target storage unit from the storage units according to the edge information.
In one embodiment, determining the target storage unit from the storage units according to the edge information includes:
determining candidate storage units from the storage units according to the edge information;
determining the access type of the node according to the historical access count of the node;
and determining the target storage unit from the candidate storage units according to the access type.
In one embodiment, the method further comprises:
determining, from the nodes and according to the edge information, a plurality of associated nodes that have an association relationship;
and if the storage locations of the target storage units corresponding to the associated nodes do not satisfy a preset condition, re-determining, from the storage units and according to the edge information, associated storage units that correspond to the associated nodes and satisfy the preset condition, wherein the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold.
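The preset condition in this embodiment can be illustrated with a minimal Python sketch. It assumes that each storage unit is identified by a single integer address and that re-determination simply searches the free addresses for a group that is close enough; the function names `addresses_are_close` and `reassign_if_scattered` are hypothetical, and the application does not specify how the associated storage units are searched for.

```python
from itertools import combinations


def addresses_are_close(addresses, threshold):
    """Preset condition: every pair of addresses differs by less than threshold."""
    return all(abs(a - b) < threshold for a, b in combinations(addresses, 2))


def reassign_if_scattered(placement, associated_nodes, free_addresses, threshold):
    """Return a placement in which the associated nodes satisfy the condition.

    placement maps node name -> storage address. If the current addresses
    already satisfy the condition, the placement is returned unchanged.
    """
    current = [placement[node] for node in associated_nodes]
    if addresses_are_close(current, threshold):
        return placement

    # Assumed search strategy: slide a window over the sorted free addresses.
    # In a sorted window the largest pairwise gap is last minus first.
    free = sorted(free_addresses)
    k = len(associated_nodes)
    for i in range(len(free) - k + 1):
        window = free[i:i + k]
        if window[-1] - window[0] < threshold:
            updated = dict(placement)
            updated.update(zip(associated_nodes, window))
            return updated
    return placement  # no suitable group found; keep the original placement


placement = {"A": 0, "B": 1000}
new_placement = reassign_if_scattered(
    placement, ["A", "B"], free_addresses=[0, 64, 128, 1000], threshold=100
)
```

Keeping the data of associated nodes at nearby addresses is what the preset threshold enforces; the window search is only one simple way to find such a group.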
In one embodiment, running each of the operation instructions through each of the target computing units includes:
acquiring the execution priority of each node, and determining the execution order of the nodes according to the priorities and the computing requirements;
and running each operation instruction through each target computing unit according to the execution order.
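One possible way to derive such an execution order is sketched below in Python. The application does not state how priority and computing requirement are combined, so the rule used here (higher priority first, then lower computing cost as a tie-breaker) is an assumption, as are the field names `priority` and `compute_cost`. The sketch also ignores data dependencies between nodes.

```python
def execution_order(nodes):
    """Order nodes by descending priority, then by ascending computing cost.

    Each node is a dict with hypothetical keys "name", "priority",
    and "compute_cost".
    """
    ranked = sorted(nodes, key=lambda n: (-n["priority"], n["compute_cost"]))
    return [n["name"] for n in ranked]


nodes = [
    {"name": "A", "priority": 1, "compute_cost": 5},
    {"name": "B", "priority": 3, "compute_cost": 8},
    {"name": "C", "priority": 3, "compute_cost": 2},
]
order = execution_order(nodes)
```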
In one embodiment, the operation instruction includes a computation instruction and a data transfer instruction, and generating the operation instruction according to the target computing unit and the target storage unit includes:
generating the computation instruction according to the computing requirement of the target computing unit;
and generating the data transfer instruction according to the edge information and the storage location of the target storage unit.
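The split into a computation instruction and data transfer instructions might be sketched as follows. This is an illustration only: the instruction classes, their fields, and the `generate_instructions` helper are hypothetical and do not reflect an actual instruction format of any reconfigurable processor. Edge information is assumed to be a list of (producer, consumer) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeInstruction:
    unit: str  # target computing unit that runs the operation
    op: str    # operation required by the node


@dataclass(frozen=True)
class TransferInstruction:
    src_address: int  # storage location holding the producer's output
    dst_unit: str     # computing unit that consumes the data


def generate_instructions(node, op, target_unit, edges, storage_address):
    """Build one computation instruction plus one transfer per incoming edge."""
    compute = ComputeInstruction(unit=target_unit, op=op)
    transfers = [
        TransferInstruction(src_address=storage_address[src], dst_unit=target_unit)
        for src, dst in edges
        if dst == node
    ]
    return compute, transfers


compute, transfers = generate_instructions(
    node="C",
    op="add",
    target_unit="PE1",
    edges=[("A", "C"), ("B", "C"), ("C", "D")],
    storage_address={"A": 0x00, "B": 0x40, "C": 0x80},
)
```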
In a second aspect, the application also provides a computational graph execution apparatus. The apparatus is applied to a computer device comprising a reconfigurable processor and a memory, the reconfigurable processor including a plurality of computing units and the memory including a plurality of storage units, the apparatus comprising:
an acquisition module, configured to acquire a preset computational graph, wherein the computational graph comprises a plurality of nodes and edge information corresponding to each node;
a first determining module, configured to determine, for each node, a target computing unit and a target storage unit corresponding to the node according to node information and the edge information corresponding to the node;
and an operation module, configured to generate operation instructions corresponding to the nodes according to the target computing units and the target storage units, and to run the operation instructions through the target computing units.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor, and the processor implements the steps of the method of the first aspect when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
According to the above computational graph execution method, apparatus, computer device, computer-readable storage medium, and computer program product, the computer device acquires a preset computational graph that comprises a plurality of nodes and edge information corresponding to each node. For each node, the computer device determines a target computing unit and a target storage unit according to the node information and the edge information corresponding to the node. It then generates the operation instructions corresponding to the node according to the target computing unit and the target storage unit, and runs each operation instruction through the corresponding target computing unit. Because the target computing unit is determined according to the node information, it can satisfy the node information of each node. Because the target storage unit is determined according to the edge information, it can satisfy the data dependency expressed by the edge information. As a result, while the operation instructions run, the storage unit to be accessed during data transfer can be determined quickly and accurately, which reduces the number of memory accesses and therefore the memory access overhead.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other related drawings from them without inventive effort.
FIG. 1 is an application environment diagram of a computational graph execution method in one embodiment;
FIG. 2 is a flowchart of a computational graph execution method in one embodiment;
FIG. 3 is a flowchart of a computational graph execution method in another embodiment;
FIG. 4 is a flowchart of a computational graph execution method in another embodiment;
FIG. 5 is a flowchart of S203 in another embodiment;
FIG. 6 is a flowchart of S203 in another embodiment;
FIG. 7 is a block diagram showing the structure of a computational graph execution apparatus in one embodiment;
FIG. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In conventional technology, there are several ways to optimize the performance of the processor and the memory. The first is to optimize the memory hierarchy, for example through data cache optimization, locality optimization, and memory prefetch optimization. The second is direct memory access (DMA), which provides high-speed data transfer between host memory and peripherals, reduces the load on the CPU, and improves system parallelism. In reconfigurable processors, DMA is often used to exchange data with external memory. The third is the design of on-chip memory and memory controllers that improve the efficiency and bandwidth of memory access, for example by optimizing caches, local memory, and memory channels and by supporting high-bandwidth data transfer and parallel computation. The fourth is compiler optimization. The compiler plays a vital role in a reconfigurable processor, and compiler optimization techniques can significantly affect program performance and execution efficiency. These techniques may include instruction scheduling, data reordering, loop unrolling, and vectorization, which improve the parallelism and data locality of a program. The fifth is new memory technology: technologies such as non-volatile memory, 3D stacked memory, and storage-class memory continue to emerge. These technologies are significant for memory and data access in reconfigurable processors and can improve memory capacity, bandwidth, and energy efficiency.
However, in the hardware architecture of a reconfigurable processor, when the computing scenario is large-scale and data must be processed in parallel, conventional memory access technology cannot meet the high performance requirements of the processor. In addition, data transfer, especially data exchange with external memory, usually introduces extra delay and energy consumption. Memory access scheduling is also a complex problem: a scheduling algorithm must consider multiple factors such as data dependencies, load balancing, and hardware resource allocation. Existing memory access scheduling algorithms often cannot fully utilize hardware resources, which results in performance loss and wasted resources. Furthermore, the memory and data management methods of reconfigurable processors lack a means of direct control under the programming model, so developers cannot effectively customize and optimize the memory and data management of the processor. At the same time, the level of abstraction provided by classical programming languages is too high to meet the fine-grained memory and data management requirements of certain application scenarios.
Therefore, in conventional technology, a reconfigurable processor incurs high memory access overhead when executing a computational graph in large-scale data processing scenarios. On this basis, the application provides a computational graph execution method that reduces memory access overhead.
The computational graph execution method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1. FIG. 1 includes a computer device having a reconfigurable processor 102 and a memory 104, wherein the reconfigurable processor 102 includes a plurality of computing units and the memory 104 includes a plurality of storage units. The computer device can acquire a preset computational graph that comprises a plurality of nodes and edge information corresponding to each node. For each node, the computer device determines a target computing unit and a target storage unit corresponding to the node according to the node information and the edge information corresponding to the node, generates operation instructions corresponding to the node according to the target computing unit and the target storage unit, and runs each operation instruction through the corresponding target computing unit. The memory 104 is used for storing data that the reconfigurable processor 102 needs to process. The memory 104 may include external memory and on-chip memory.
In an exemplary embodiment, as shown in FIG. 2, a computational graph execution method is provided. The method is described by taking its application to the computer device in FIG. 1 as an example, and includes the following steps:
S201, acquiring a preset computational graph, wherein the computational graph comprises a plurality of nodes and edge information corresponding to each node.
The computational graph is a data structure that represents a computation process. It is a graph made up of nodes and edges. Each node represents a basic operation or computing unit in a computing task, such as addition, multiplication, or a logical operation, and typically has input and output ports for receiving and transmitting data. Each edge represents a data transfer relationship in the computing task: edges connect the input and output ports of different nodes and represent data flow paths. In this embodiment, the nodes that each edge connects are taken as the edge information.
The data flow of the computational graph describes the data dependencies in the computing task: the result computed by one node may serve as the input data of another node, forming a data flow path. A data dependency is a dependency between associated nodes. For example, if the result of node A affects the computation of node B, then node B depends on the data of node A. Dependencies may be divided into control dependencies, which represent control-flow dependencies in the computing task, and data dependencies, which represent data-transfer dependencies in the computing task.
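As an illustration of the structure described above, the following minimal Python sketch stores edge information as (producer, consumer) pairs and derives each node's data dependencies from it. The node names, operation labels, and the `build_dependencies` helper are hypothetical and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical graph: each node is a basic operation.
nodes = {"A": "load", "B": "load", "C": "add", "D": "store"}

# Edge information: the pair of nodes that each edge connects,
# written as (producer, consumer).
edges = [("A", "C"), ("B", "C"), ("C", "D")]


def build_dependencies(edges):
    """Map each consumer node to the set of nodes whose output it reads."""
    deps = defaultdict(set)
    for src, dst in edges:
        deps[dst].add(src)
    return dict(deps)


dependencies = build_dependencies(edges)
```

Here node C depends on the data of nodes A and B, and node D depends on node C, which mirrors the "node B depends on node A" example in the text.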
It should be noted that two problems arise when a reconfigurable processor executes a computational graph: developers cannot effectively customize and optimize the memory and data management of the processor because a means of direct control under the programming model is lacking, and the level of abstraction provided by classical programming languages is too high to meet the fine-grained memory and data management requirements of some application scenarios. To address these problems, the embodiments of the present application use a domain-specific language (DSL) to program the computational graph, so that the DSL program can be programmed heterogeneously on the reconfigurable processor. Therefore, in this embodiment, the preset computational graph is the computational graph corresponding to the DSL.
In this embodiment, the computer device may obtain in advance, from a computational graph database, the computational graph to be run. The obtained computational graph includes a plurality of nodes and the edges that connect them, so obtaining the graph also yields the nodes it contains and the edge information corresponding to each node.
S202, for each node, determining a target computing unit and a target storage unit corresponding to the node according to node information and edge information corresponding to the node.
The node information refers to information corresponding to each node, such as its data, data type, operation type, input nodes, and output nodes. The edge information refers to the nodes that each edge connects, together with the data flow path between the node that outputs the data and the node that receives it.
A computing unit is a component in the processor that performs arithmetic or logical operations. A storage unit is a storage space in the memory that stores data and supports reading and writing. It will be appreciated that before executing the computational graph, the computer device needs to map the graph onto the device, that is, map the nodes of the graph to computing units in the processor and map the data to be processed to storage units. Accordingly, in this embodiment, the target computing unit is a computing unit that can satisfy the node information of the node, and the target storage unit is a storage unit that can satisfy the edge information.
In this embodiment, the computer device may, based on the node information corresponding to the node, determine from the plurality of computing units a computing unit that satisfies the node information as the target computing unit, and determine from the plurality of storage units a storage unit that satisfies the edge information as the target storage unit. It is understood that one target computing unit may correspond to a plurality of target storage units, which may be used to store the input data and output data of that target computing unit.
S203, generating operation instructions corresponding to the nodes according to the target computing units and the target storage units, and running the operation instructions through the target computing units.
An operation instruction is a computer program that instructs a target computing unit to perform a data processing operation. Since the target computing unit is determined according to the node, the operation instruction generated according to the target computing unit is the operation instruction corresponding to that node.
In this embodiment, the computer device may generate the operation instructions according to the data processing type and processing requirement of each target computing unit, and then control the compiler to compile the operation instructions so that each target computing unit can run them. It will be appreciated that, if the node information indicates that a node can be processed in parallel, the target computing unit corresponding to that node may run its operation instruction in parallel with other target computing units whose nodes can also be processed in parallel.
As an alternative embodiment, if a plurality of nodes correspond to one target computing unit, the generated operation instruction may be used as the operation instruction corresponding to all of those nodes.
In the above computational graph execution method, the computer device acquires a preset computational graph that comprises a plurality of nodes and edge information corresponding to each node. For each node, it determines a target computing unit and a target storage unit according to the node information and the edge information corresponding to the node, generates an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and runs each operation instruction through the corresponding target computing unit. Because the target computing unit is determined according to the node information, it can satisfy the node information of each node. Because the target storage unit is determined according to the edge information, it can satisfy the data dependency expressed by the edge information. As a result, while the operation instructions run, the storage unit to be accessed during data transfer can be determined quickly and accurately, which reduces the number of memory accesses and therefore the memory access overhead.
When determining the target computing unit and the target storage unit, the computer device may determine the target computing unit according to the computing requirement corresponding to the node, and determine the target storage unit according to the edge information. In an exemplary embodiment, the node information includes a computing requirement corresponding to the node, and the specific process of S202 may include:
In one possible embodiment, the target computing unit is determined from the computing units according to the computing requirement.
The computing requirement refers to the computing resources needed to run the corresponding computer program when processing the data of each node, for example CPU resources, memory resources, network resources such as bandwidth, hard disk resources, and parallelism requirements. It should be noted that different computing units in the processor may differ in processing performance. Therefore, a computing unit that can meet the computing requirement of the node needs to be selected from the plurality of computing units in the processor as the corresponding target computing unit.
In this embodiment, the computer device may screen the computing units according to the computing requirement of each node and determine a computing unit that satisfies the requirement as the target computing unit.
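The screening step can be sketched in Python as a simple filter. The description of a computing unit used here (a set of supported operations and a number of parallel lanes) and the names `select_target_unit`, `ops`, and `lanes` are assumptions made for illustration; the application does not define how a computing requirement or a computing unit's capability is encoded.

```python
def select_target_unit(requirement, units):
    """Return the name of the first unit that satisfies the requirement, or None."""
    for unit in units:
        supports_op = requirement["op"] in unit["ops"]
        enough_lanes = unit["lanes"] >= requirement["lanes"]
        if supports_op and enough_lanes:
            return unit["name"]
    return None


# Hypothetical computing units with different capabilities.
units = [
    {"name": "PE0", "ops": {"add"}, "lanes": 1},
    {"name": "PE1", "ops": {"add", "mul"}, "lanes": 4},
]

target = select_target_unit({"op": "mul", "lanes": 2}, units)
```

Returning the first match keeps the sketch short; a real mapper would likely also weigh load balancing, which the background section names as a scheduling factor.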
In another possible embodiment, the target storage unit is determined from the storage units according to the edge information.
The edge information can represent the data dependencies between nodes in the computational graph, that is, the path along which data flows between nodes. During data transfer, this path determines which storage unit the data is read from and which storage unit it is written to. The target storage unit can therefore be determined from the edge information.
In this embodiment, the computer device may determine the data dependencies between the nodes according to the edge information, then determine the storage units corresponding to the input data and output data of each node according to those dependencies, and take the corresponding storage units as the target storage units.
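A minimal sketch of this idea, assuming that each node writes its output to one storage unit and that a consumer reads its inputs from the units its producers wrote to. The bank names and the `assign_storage` helper are hypothetical.

```python
def assign_storage(edges, output_unit):
    """For each node, record where its inputs are read from and where its output goes.

    edges is a list of (producer, consumer) pairs; output_unit maps each node
    to the storage unit that holds its output.
    """
    plan = {node: {"inputs": [], "output": unit} for node, unit in output_unit.items()}
    for src, dst in edges:
        # The consumer reads from the unit that the producer wrote to.
        plan[dst]["inputs"].append(output_unit[src])
    return plan


edges = [("A", "C"), ("B", "C")]
output_unit = {"A": "bank0", "B": "bank1", "C": "bank2"}
plan = assign_storage(edges, output_unit)
```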
As shown in FIG. 3, the above process of determining the target storage unit from the storage units according to the edge information may include:
S301, determining candidate storage units from the storage units according to the edge information.
A candidate storage unit is a storage unit that can meet the storage requirements of the input data and output data of a node. For example, if the input data of node A occupies 100 bytes, and the storage capacities of storage unit 1, storage unit 2, and storage unit 3 are 256 bytes, 20 bytes, and 120 bytes respectively, then storage unit 1 and storage unit 3 can both serve as candidate storage units for node A.
In this embodiment, the computer device may determine the data dependencies between the nodes according to the edge information, and then determine the storage units that satisfy both the data dependencies and the data storage requirements as the candidate storage units.
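The capacity example above can be reproduced with a short Python sketch. It checks only the storage requirement and leaves out the data dependency check; the `candidate_units` helper is a hypothetical name.

```python
def candidate_units(data_size, capacities):
    """Return the storage units whose capacity (in bytes) can hold the data."""
    return [unit for unit, capacity in capacities.items() if capacity >= data_size]


# Storage units 1, 2, and 3 with capacities of 256, 20, and 120 bytes.
capacities = {1: 256, 2: 20, 3: 120}

# Node A has 100 bytes of input data.
candidates = candidate_units(100, capacities)
```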
S302, determining the access type of the node according to the historical access count of the node.
The historical access count of a node is the number of times the node was accessed during a historical time period. The count indicates whether the node is accessed frequently, and thus determines its access type, which may be frequent access or infrequent access.
In this embodiment, an access count threshold may be preset. If the historical access count of a node is greater than or equal to the threshold, the access type of the node is determined to be frequent access; if the count is less than the threshold, the access type is determined to be infrequent access.
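The threshold rule can be written as a one-line Python function. The string labels and the function name `classify_access` are illustrative choices, not terms from the application.

```python
def classify_access(history_count, threshold):
    """Classify a node as frequently or infrequently accessed."""
    return "frequent" if history_count >= threshold else "infrequent"
```

With a threshold of 10, a count of exactly 10 is classified as frequent, because the comparison is "greater than or equal to".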
S303, determining a target storage unit from the candidate storage units according to the access type.
It will be appreciated that storage units built on different storage media have different access speeds. In the embodiment of the application, in order to improve the overall access efficiency and the utilization of storage resources, the data corresponding to the nodes can be stored in storage units with different access speeds according to the access frequency of the nodes. It should be noted that the memory may be divided into memories with different access speeds, such as external memory and on-chip memory. On-chip memory refers to memory integrated directly within an integrated circuit chip. Such memory is typically located within a CPU, for example a first-level cache (Level 1 Cache) and a second-level cache (Level 2 Cache), and is used to increase data processing speed and efficiency. Static random access memory is typically used for caches because of its higher speed, whereas dynamic random access memory is typically used for main memory because of its lower cost and larger capacity. External memory refers to storage other than the main memory of the computer and the CPU caches; it retains data after power-off. Common external memories include hard disks, floppy disks, optical disks, USB flash drives and the like, and the access speed of external memory is lower than that of on-chip memory.
Optionally, in this embodiment, if the access type of the node is frequent access, the data corresponding to the node may be stored in a storage unit with a faster access speed, so a storage unit with a faster access speed among the candidate storage units is determined as the target storage unit. If the access type of the node is infrequent access, the data corresponding to the node may be stored in a storage unit with a slower access speed, so a storage unit with a slower access speed among the candidate storage units is determined as the target storage unit.
In this embodiment, the computer device determines the candidate storage units from the storage units according to the side information, determines the access type of the node according to the historical access times of the node, and determines the target storage unit from the candidate storage units according to the access type. The data corresponding to nodes with different access types can thus be stored in matching target storage units, which improves the degree of matching between the target storage unit and the data corresponding to the node. For example, the data of frequently accessed nodes is stored in a storage unit with a higher access speed, and the data of infrequently accessed nodes is stored in a storage unit with a lower access speed. This improves the reading and writing speed of data while the operation instructions run, improves the utilization of memory resources, and thereby improves the performance and efficiency of the computer system.
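As a rough sketch of this selection, the code below picks the fastest candidate for a frequently accessed node and the slowest candidate otherwise. The `StorageUnit` type, the latency field used as a speed measure, and the choice of the extreme candidate (rather than any "faster" or "slower" one) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageUnit:
    name: str
    access_latency_ns: float  # hypothetical speed metric; lower means faster


def select_target_storage(candidates: List[StorageUnit], access_type: str) -> StorageUnit:
    """Pick the target storage unit from the candidates according to the access type."""
    if not candidates:
        raise ValueError("no candidate storage units")
    if access_type == "frequent":
        # Frequently accessed data goes to the fastest candidate.
        return min(candidates, key=lambda unit: unit.access_latency_ns)
    # Infrequently accessed data goes to the slowest candidate.
    return max(candidates, key=lambda unit: unit.access_latency_ns)


# Hypothetical candidates that already satisfy the side information.
candidates = [StorageUnit("sram_bank_0", 2.0), StorageUnit("dram_bank_0", 60.0)]
hot = select_target_storage(candidates, "frequent")
cold = select_target_storage(candidates, "infrequent")
print(hot.name, cold.name)
```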
In this embodiment, the computer device determines, according to the calculation requirement, the target calculation unit from each calculation unit, and determines, according to the side information, the target storage unit from each storage unit, so that the determined target calculation unit can meet the calculation requirement of the node, and the determined target storage unit can meet the data dependency relationship corresponding to the side information, so that in the process of running the calculation instruction, the data access speed can be improved, the delay of the memory access and the data transmission time can be reduced, and further, the computing resources inside the processor and the storage units in the memory can be effectively utilized.
After the target computing unit and the target storage unit are determined, a data reorganization method may additionally be adopted to further improve the data access speed, so that data having an association relationship is stored in storage units that are physically close to one another. In an exemplary embodiment, as shown in fig. 4, the above method includes:
S401, according to each side information, a plurality of association nodes with association relation are determined from each node.
The association node refers to a plurality of nodes with data dependency relationship, for example, if the processing data of the node A is derived from the node B, the node A and the node B are association nodes with association relationship, or if the calculation result of the node A is output to the node C, the node A and the node C are association nodes with association relationship.
In this embodiment, the computer device may determine, according to each side information, a data dependency relationship corresponding to each side information, and then determine, according to the data dependency relationship, an association node corresponding to each node. For example, if the computer device determines that node a has a data dependency relationship with node B and node C, respectively, based on the side information, node B and node C may be determined as associated nodes of node a.
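One way to picture this step is to treat each piece of side information as a (producer, consumer) pair and to collect, for every node, the nodes it shares a data dependency with. The pair representation and the function name `find_associated_nodes` are assumptions for this sketch; the application does not specify how side information is encoded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_associated_nodes(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return, for each node, the set of nodes it has a data dependency with.

    Each edge is a (producer, consumer) pair. The association is symmetric:
    a node is associated both with the nodes it reads from and the nodes it feeds.
    """
    associated: Dict[str, Set[str]] = defaultdict(set)
    for producer, consumer in edges:
        associated[producer].add(consumer)
        associated[consumer].add(producer)
    return dict(associated)


# Node A takes its input from node B and sends its result to node C.
edges = [("B", "A"), ("A", "C")]
associated = find_associated_nodes(edges)
print(associated["A"])
```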
S402, if the storage positions of the target storage units corresponding to the associated nodes do not meet the preset conditions, the associated storage units corresponding to the associated nodes are redetermined from the storage units according to the side information, and the associated storage units meet the preset conditions, wherein the preset conditions are that the difference value between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold value.
It will be appreciated that, when a computing unit in the computer device executes an operation instruction, it reads data from storage unit 1, computes a result, and writes the result into storage unit 2. If storage unit 1 and storage unit 2 are physically close, the data transmission takes relatively little time; if they are physically far apart, the data transmission takes relatively long. The storage addresses of the target storage units corresponding to the associated nodes can therefore be compared to determine whether those target storage units are physically close. In this embodiment, the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold; if the target storage units of the associated nodes do not meet the preset condition, the associated storage units corresponding to the associated nodes may be redetermined.
The associated storage units corresponding to the associated nodes are storage units which meet the data dependency relationship corresponding to the side information and meet the preset condition.
In this embodiment, the computer device may obtain the storage addresses from the storage locations of the target storage units corresponding to the associated nodes, calculate the difference between the storage addresses, and compare the difference with the preset threshold. If the difference is smaller than the preset threshold, the storage locations of the target storage units corresponding to the associated nodes satisfy the preset condition. If the difference is greater than or equal to the preset threshold, the storage locations do not satisfy the preset condition; in that case, the computer device may redetermine, from the storage units and according to the side information, storage units that satisfy the side information, and determine the redetermined storage units as the associated storage units if they also satisfy the preset condition.
Optionally, the associated storage units may be contiguous storage units satisfying the side information, or may be non-contiguous storage units satisfying the side information. If the associated storage units are contiguous storage units satisfying the side information, the locality and continuity of data access can be optimized, and the randomness and fragmentation of memory access can be reduced.
In this embodiment, the computer device determines, according to the side information, a plurality of associated nodes having an association relationship from the nodes. When the storage locations of the target storage units corresponding to the associated nodes do not satisfy the preset condition, that is, when the difference between their storage addresses is not smaller than the preset threshold, the computer device redetermines the associated storage units corresponding to the associated nodes from the storage units. The resulting associated storage units satisfy both the side information and the preset condition, which shortens the time needed to access the storage units and improves the data access efficiency and the running speed of the operation instructions.
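The check and the re-determination can be sketched as below. The sketch assumes that storage addresses are plain integers, that `free_addresses` lists distinct addresses of storage units already known to satisfy the side information, and that a sliding window over the sorted addresses is an acceptable search strategy; none of these details come from the application, and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def satisfies_preset_condition(addresses: List[int], threshold: int) -> bool:
    """True if every pair of addresses differs by less than the preset threshold."""
    return all(abs(a - b) < threshold for a, b in combinations(addresses, 2))


def redetermine_storage(
    nodes: List[str],
    placement: Dict[str, int],
    free_addresses: List[int],
    threshold: int,
) -> Dict[str, int]:
    """Return a placement in which the associated nodes are close together, if possible."""
    current = [placement[node] for node in nodes]
    if satisfies_preset_condition(current, threshold):
        return dict(placement)  # already close enough, nothing to do

    free = sorted(free_addresses)
    count = len(nodes)
    # Slide a window over the sorted addresses; in a sorted window the largest
    # pairwise difference is the difference between its last and first element.
    for start in range(len(free) - count + 1):
        window = free[start:start + count]
        if window[-1] - window[0] < threshold:
            updated = dict(placement)
            updated.update(zip(nodes, window))
            return updated
    return dict(placement)  # no closer placement found; keep the original


placement = {"A": 0x0000, "B": 0x8000}
new_placement = redetermine_storage(
    ["A", "B"], placement, free_addresses=[0x9000, 0x1000, 0x1040], threshold=0x100
)
print({node: hex(addr) for node, addr in new_placement.items()})
```

Here the original addresses differ by 0x8000, which is not below the threshold of 0x100, so both nodes are moved to the neighbouring addresses 0x1000 and 0x1040.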
In the above scenario in which each of the operation instructions is executed by each of the target computing units, the computer device may further adjust an execution order of the operation instructions according to an execution priority of each of the nodes. In an exemplary embodiment, as shown in fig. 5, S203 described above includes:
S501, acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement.
The execution priority is a parameter that determines the order in which job programs receive system resources when a time-sharing operating system of a computer processes a plurality of job programs: jobs with a high priority are served first, and jobs with a low priority are served afterwards. In this embodiment, in order to improve the overall efficiency of executing the computation graph, the execution order of each node may be adjusted according to the execution priority of each node.
In this embodiment, the computer device may acquire the execution priority from the node information of each node, and then determine whether the computing resource in the computer device may meet the computing requirement of the high priority, and if so, may advance the execution order of the nodes of the high priority and retard the execution order of the nodes of the low priority.
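A simplified sketch of this ordering is given below. It assumes that a larger number means a higher priority, that the calculation requirement can be expressed as a count of computing units, and that nodes whose requirement cannot currently be met are moved to the end. Data dependencies between nodes are deliberately ignored here; a real scheduler would also have to respect them. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    priority: int        # assumption: larger value means higher priority
    required_units: int  # hypothetical measure of the calculation requirement


def determine_execution_order(nodes: List[Node], available_units: int) -> List[str]:
    """Order nodes by priority, deferring nodes whose requirement cannot be met."""
    runnable = [node for node in nodes if node.required_units <= available_units]
    deferred = [node for node in nodes if node.required_units > available_units]
    # sorted() is stable, so nodes with equal priority keep their original order.
    runnable = sorted(runnable, key=lambda node: node.priority, reverse=True)
    return [node.name for node in runnable + deferred]


nodes = [Node("A", priority=1, required_units=2),
         Node("B", priority=3, required_units=2),
         Node("C", priority=2, required_units=8)]
order = determine_execution_order(nodes, available_units=4)
print(order)
```

Node B is advanced ahead of node A because of its higher priority, while node C is deferred because its requirement of 8 units exceeds the 4 units available.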
S502, running each operation instruction through each target computing unit according to the execution sequence.
In this embodiment, after determining the execution sequence of each node, the computer device may sequentially control the target computing units corresponding to each node to execute each operation instruction according to the execution sequence.
In this embodiment, the computer device obtains the execution priority of each node, and can determine the execution sequence of each node according to the priority and the calculation requirement, so that each operation instruction can be run by each target calculation unit according to the execution sequence.
In the scenario of executing each of the operation instructions by each target computing unit, the computer device may generate a computing operation instruction according to a computing requirement, and according to an exemplary embodiment, as shown in fig. 6, S203 includes:
S601, generating a calculation operation instruction according to the calculation requirement of the target calculation unit.
The calculation operation instruction refers to a computer program that instructs the computing unit to perform a data operation; for example, the calculation operation instruction may specify an addition operation, a multiplication operation, a logical AND operation, a logical OR operation, or the like. It will be appreciated that the calculation requirements of the target computing units may differ, and the calculation operation instructions generated for the target computing units differ accordingly.
In this embodiment, the computer device may generate the calculation operation instruction corresponding to each target calculation unit according to the calculation requirement corresponding to each target calculation unit. For example, if the computing requirement of the target computing unit is to perform an addition operation, the computing operation instruction may instruct the target computing unit to perform the addition operation.
S602, generating a data transmission instruction according to the side information and the storage position of the target storage unit.
Wherein the data transfer instruction is a computer program that instructs the target computing unit to read data from the target storage unit and/or write data to the target storage unit.
In this embodiment, the computer device may determine a flow path of data during operation of the operation instruction according to the side information, and then generate the data transmission instruction corresponding to each target storage unit according to the storage location and the flow path of the target storage unit. For example, if the data flow path is the target storage unit 1-the target storage unit 2, the data transfer instruction may instruct the target calculation unit to acquire data from the target storage unit 1 and, after performing data processing, instruct the target calculation unit to write the data processing result in the target storage unit 2.
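The example above (read from target storage unit 1, process, write to target storage unit 2) can be sketched as a short instruction sequence. The `Instruction` record, the three instruction kinds, and the unit names are hypothetical; the application does not define an instruction format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instruction:
    kind: str     # "load", "compute" or "store" (hypothetical instruction kinds)
    unit: str     # target computing unit that executes the instruction
    operand: str  # storage unit name for load/store, operation name for compute


def generate_instructions(
    compute_unit: str, operation: str, flow_path: Tuple[str, str]
) -> List[Instruction]:
    """Build the data transmission and calculation operation instructions for one node."""
    source, destination = flow_path
    return [
        Instruction("load", compute_unit, source),        # data transmission: read input
        Instruction("compute", compute_unit, operation),  # calculation operation
        Instruction("store", compute_unit, destination),  # data transmission: write result
    ]


program = generate_instructions("pe_0", "add", ("storage_1", "storage_2"))
for instruction in program:
    print(instruction)
```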
In this embodiment, since the calculation operation instruction is generated according to the calculation requirement of the target calculation unit, and the data transmission instruction is generated according to the side information and the storage location of the target storage unit, the target calculation unit can efficiently and accurately execute the calculation instruction according to the generated calculation operation instruction and the data transmission instruction, thereby improving the efficiency of running the calculation instruction.
For the convenience of understanding of those skilled in the art, the following describes in detail a method for executing a computational graph provided by the present application, where the method is used in a computer device, and the device includes a reconfigurable processor and a memory, where the reconfigurable processor includes a plurality of computing units, and the memory includes a plurality of storage units, where the method may include:
S1, acquiring a preset calculation chart.
S2, determining a target computing unit from the computing units according to the computing requirements of the nodes.
S3, determining candidate storage units from the storage units according to the side information.
S4, determining the access type of the node according to the historical access times of the node, and determining the target storage unit from the candidate storage units according to the access type.
S5, according to the side information, a plurality of association nodes with association relations are determined from the nodes.
And S6, if the storage positions of the target storage units corresponding to the associated nodes do not meet the preset conditions, the associated storage units corresponding to the associated nodes are redetermined from the storage units according to the side information.
S7, generating a calculation operation instruction according to the calculation requirement of the target calculation unit.
S8, generating a data transmission instruction according to the side information and the storage position of the target storage unit.
S9, acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement.
S10, running each operation instruction through each target computing unit according to the execution sequence.
It should be noted that, for the description in the above S1-S10, reference may be made to the description related to the above embodiment, and the effects thereof are similar, which is not repeated here.
It should be understood that, although the steps in the flowcharts related to the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of the steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a calculation map execution device for realizing the above-mentioned calculation map execution method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the apparatus for executing a calculation map provided below may refer to the limitation of the method for executing a calculation map hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 7, there is provided a calculation map execution apparatus, including an acquisition module 701, a first determination module 702, and an operation module 703, wherein:
The acquiring module 701 is configured to acquire a preset calculation graph, where the calculation graph includes a plurality of nodes and side information corresponding to each node;
a first determining module 702, configured to determine, for each node, a target computing unit and a target storage unit corresponding to the node according to node information and side information corresponding to the node;
the operation module 703 is configured to generate an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and operate each operation instruction through each target computing unit.
The calculation map executing device provided in this embodiment may execute the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
In one embodiment, the node information includes a computing requirement corresponding to the node, and the first determining module 702 includes a first determining unit and a second determining unit, where:
the first determining unit is used for determining a target computing unit from all computing units according to the computing requirements;
And a second determining unit configured to determine a target storage unit from the storage units based on the side information.
The calculation map executing device provided in this embodiment may execute the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
In one embodiment, the second determining unit is specifically configured to:
determining candidate storage units from the storage units according to the side information;
determining the access type of the node according to the historical access times of the node;
And determining a target storage unit from the candidate storage units according to the access type.
The calculation map executing device provided in this embodiment may execute the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
In one embodiment, the apparatus further comprises:
The second determining module is used for determining a plurality of association nodes with association relations from the nodes according to the side information;
And the third determining module is used for re-determining the associated storage unit corresponding to each associated node from the storage units according to the side information if the storage position of the target storage unit corresponding to each associated node does not meet the preset condition, wherein the preset condition is that the difference value between the storage addresses of the target storage units corresponding to each associated node is smaller than the preset threshold value.
The calculation map executing device provided in this embodiment may execute the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
In one embodiment, the operation module 703 includes:
The third determining unit is used for obtaining the execution priority of each node and determining the execution sequence of each node according to the priority and the calculation requirement;
and the running unit is used for running each operation instruction through each target calculation unit according to the execution sequence.
The calculation map executing device provided in this embodiment may execute the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
In one embodiment, the operation instruction includes a calculation operation instruction and a data transmission instruction, and the operation module 703 includes:
the first generation unit is used for generating a calculation operation instruction according to the calculation requirement of the target calculation unit;
and the second generation unit is used for generating a data transmission instruction according to the side information and the storage position of the target storage unit.
The calculation map executing device provided in this embodiment may execute the above method embodiment, and its implementation principle and technical effects are similar, and will not be described herein.
The respective modules in the above-described calculation map execution apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 8. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing node data and edge data corresponding to the calculation map. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a computational graph execution method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
Acquiring a preset calculation graph, wherein the calculation graph comprises a plurality of nodes and side information corresponding to each node;
For each node, determining a target computing unit and a target storage unit corresponding to the node according to node information and side information corresponding to the node;
According to the target calculation unit and the target storage unit, generating operation instructions corresponding to the nodes, and running each operation instruction through each target calculation unit.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a target computing unit from the computing units according to the computing requirements;
And determining a target storage unit from the storage units according to the side information.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining candidate storage units from the storage units according to the side information;
determining the access type of the node according to the historical access times of the node;
And determining a target storage unit from the candidate storage units according to the access type.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a plurality of association nodes with association relations from the nodes according to the side information;
if the storage positions of the target storage units corresponding to the associated nodes do not meet the preset conditions, the associated storage units corresponding to the associated nodes are redetermined from the storage units according to the side information, and the associated storage units meet the preset conditions, wherein the preset conditions are that the difference value between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold value.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement;
and running each operation instruction through each target computing unit according to the execution sequence.
In one embodiment, the processor when executing the computer program further performs the steps of:
generating a calculation operation instruction according to the calculation requirement of the target calculation unit;
and generating a data transmission instruction according to the side information and the storage position of the target storage unit.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
Acquiring a preset calculation graph, wherein the calculation graph comprises a plurality of nodes and side information corresponding to each node;
For each node, determining a target computing unit and a target storage unit corresponding to the node according to node information and side information corresponding to the node;
According to the target calculation unit and the target storage unit, generating operation instructions corresponding to the nodes, and running each operation instruction through each target calculation unit.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a target computing unit from the computing units according to the computing requirements;
And determining a target storage unit from the storage units according to the side information.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining candidate storage units from the storage units according to the side information;
determining the access type of the node according to the historical access times of the node;
And determining a target storage unit from the candidate storage units according to the access type.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a plurality of association nodes with association relations from the nodes according to the side information;
if the storage positions of the target storage units corresponding to the associated nodes do not meet the preset conditions, the associated storage units corresponding to the associated nodes are redetermined from the storage units according to the side information, and the associated storage units meet the preset conditions, wherein the preset conditions are that the difference value between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold value.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement;
and running each operation instruction through each target computing unit according to the execution sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
generating a calculation operation instruction according to the calculation requirement of the target calculation unit;
and generating a data transmission instruction according to the side information and the storage position of the target storage unit.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
Acquiring a preset calculation graph, wherein the calculation graph comprises a plurality of nodes and side information corresponding to each node;
For each node, determining a target computing unit and a target storage unit corresponding to the node according to node information and side information corresponding to the node;
According to the target calculation unit and the target storage unit, generating operation instructions corresponding to the nodes, and running each operation instruction through each target calculation unit.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a target computing unit from the computing units according to the computing requirements;
And determining a target storage unit from the storage units according to the side information.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining candidate storage units from the storage units according to the side information;
determining the access type of the node according to the historical access times of the node;
and determining a target storage unit from the candidate storage units according to the access type.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a plurality of association nodes with association relations from the nodes according to the side information;
if the storage positions of the target storage units corresponding to the associated nodes do not meet the preset conditions, the associated storage units corresponding to the associated nodes are redetermined from the storage units according to the side information, and the associated storage units meet the preset conditions, wherein the preset conditions are that the difference value between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold value.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the execution priority of each node, and determining the execution sequence of each node according to the priority and the calculation requirement;
and running each operation instruction through each target computing unit according to the execution sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
generating a calculation operation instruction according to the calculation requirement of the target calculation unit;
and generating a data transmission instruction according to the edge information and the storage location of the target storage unit.
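The two kinds of instruction can be sketched as plain records. The instruction formats below are assumptions made for illustration, since the text does not define an encoding: a calculation operation instruction names a computing unit and an operation, and a data transmission instruction copies a number of bytes from a producer's address into the node's target storage location. Each incoming edge is modeled as a hypothetical `(source_address, num_bytes)` pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeInstr:
    unit_id: int  # target computing unit
    op: str       # operation derived from the computing requirement


@dataclass(frozen=True)
class TransferInstr:
    src_addr: int
    dst_addr: int
    num_bytes: int


def generate_instructions(op, unit_id, in_edges, dst_base):
    """Build one compute instruction and one transfer per incoming edge.

    Inputs are packed back to back starting at `dst_base`, the storage
    location of the node's target storage unit.
    """
    transfers = []
    offset = 0
    for src_addr, num_bytes in in_edges:
        transfers.append(TransferInstr(src_addr, dst_base + offset, num_bytes))
        offset += num_bytes
    return ComputeInstr(unit_id, op), transfers
```

Packing inputs contiguously is a choice made for this sketch; an implementation with alignment constraints would round each offset up accordingly.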
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may carry out the steps of the method embodiments described above. Any reference to memory, a database, or another medium in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this description.
The foregoing embodiments represent only a few implementations of the application. They are described specifically and in detail, but this should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (10)

1. A computational graph execution method, characterized in that it is applied to a computer device, the device comprising a reconfigurable processor and a memory, the reconfigurable processor comprising a plurality of computing units and the memory comprising a plurality of storage units, the method comprising:
obtaining a preset computational graph, the computational graph comprising a plurality of nodes and edge information corresponding to each of the nodes;
for each of the nodes, determining a target computing unit and a target storage unit corresponding to the node according to node information corresponding to the node and the edge information;
generating an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and running each of the operation instructions through each of the target computing units.
2. The method according to claim 1, characterized in that the node information comprises a computing requirement corresponding to the node, and the determining a target computing unit and a target storage unit corresponding to the node according to the node information corresponding to the node and the edge information comprises:
determining the target computing unit from the computing units according to the computing requirement;
determining the target storage unit from the storage units according to the edge information.
3. The method according to claim 2, characterized in that the determining the target storage unit from the storage units according to the edge information comprises:
determining candidate storage units from the storage units according to the edge information;
determining an access type of the node according to a historical access count of the node;
determining the target storage unit from the candidate storage units according to the access type.
4. The method according to claim 2 or 3, characterized in that the method further comprises:
determining, from the nodes and according to the edge information, a plurality of associated nodes that have an association relationship;
if the storage locations of the target storage units corresponding to the associated nodes do not satisfy a preset condition, re-determining, from the storage units and according to the edge information, associated storage units corresponding to the associated nodes, the associated storage units satisfying the preset condition, wherein the preset condition is that the difference between the storage addresses of the target storage units corresponding to the associated nodes is smaller than a preset threshold.
5. The method according to claim 2, characterized in that the running each of the operation instructions through each of the target computing units comprises:
obtaining an execution priority of each of the nodes, and determining an execution order of the nodes according to the priority and the computing requirement;
running each of the operation instructions through each of the target computing units according to the execution order.
6. The method according to claim 1, characterized in that the operation instruction comprises a calculation operation instruction and a data transmission instruction, and the generating an operation instruction according to the target computing unit and the target storage unit comprises:
generating the calculation operation instruction according to the computing requirement of the target computing unit;
generating the data transmission instruction according to the edge information and the storage location of the target storage unit.
7. A computational graph execution apparatus, characterized in that it is applied to a computer device, the device comprising a reconfigurable processor and a memory, the reconfigurable processor comprising a plurality of computing units and the memory comprising a plurality of storage units, the apparatus comprising:
an acquisition module, configured to obtain a preset computational graph, the computational graph comprising a plurality of nodes and edge information corresponding to each of the nodes;
a first determination module, configured to determine, for each of the nodes, a target computing unit and a target storage unit corresponding to the node according to node information corresponding to the node and the edge information;
a running module, configured to generate an operation instruction corresponding to the node according to the target computing unit and the target storage unit, and to run each of the operation instructions through each of the target computing units.
8. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202411127726.1A 2024-08-16 2024-08-16 Computational graph execution method, device, equipment, storage medium and program product Pending CN119201833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411127726.1A CN119201833A (en) 2024-08-16 2024-08-16 Computational graph execution method, device, equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411127726.1A CN119201833A (en) 2024-08-16 2024-08-16 Computational graph execution method, device, equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN119201833A true CN119201833A (en) 2024-12-27

Family

ID=94074669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411127726.1A Pending CN119201833A (en) 2024-08-16 2024-08-16 Computational graph execution method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN119201833A (en)

Similar Documents

Publication Publication Date Title
US11436400B2 (en) Optimization method for graph processing based on heterogeneous FPGA data streams
TWI848007B (en) Neural processing unit
US20220147795A1 (en) Neural network tiling method, prediction method, and related apparatus
CN111488205A (en) Scheduling method and scheduling system for heterogeneous hardware architecture
US10114795B2 (en) Processor in non-volatile storage memory
US12511173B2 (en) Cloud-based framework for analysis using accelerators
CN117271136A (en) Data processing methods, devices, equipment and storage media
KR20210103393A (en) System and method for managing conversion of low-locality data into high-locality data
CN115061825B (en) Heterogeneous computing system and method for private computing, private data and federal learning
CN115205092A (en) Graphical execution of dynamic batch components using access request response
KR20210081663A (en) Interconnect device, operation method of interconnect device, and artificial intelligence(ai) accelerator system
CN109213587B (en) Multi-Stream parallel DAG graph task mapping strategy under GPU platform
CN118519768A (en) Method, device, equipment and storage medium for data overflow to shared cache
US11705207B2 (en) Processor in non-volatile storage memory
WO2022078400A1 (en) Device and method for processing multi-dimensional data, and computer program product
CN119783812B (en) Optimization Methods for Parallel Training and Inference Adaptation of Next-Generation Heterogeneous Supercomputing Large Models
CN115168284B (en) Deep learning-oriented coarse granularity reconfigurable array system and computing method
CN117827419A (en) A computing method based on multiple bare chips and related equipment
CN120196412A (en) Affinity-prioritized resource intelligent mapping method, device, medium and product
CN119201833A (en) Computational graph execution method, device, equipment, storage medium and program product
US20240281366A1 (en) Processing circuit and computation scheduling method of artificial intelligence model
CN118193443A (en) Data loading method for processor, computing device and medium
KR102723995B1 (en) System and method for efficiently converting low-locality data into high-locality data
CN110032446A (en) A kind of method and device applied to storage allocation space in embedded system
CN111488216B (en) Data processing method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination