
CN113449856A - Control flow graph processing method and related equipment - Google Patents


Info

Publication number: CN113449856A
Application number: CN202010230463.2A
Authority: CN (China)
Prior art keywords: linear execution sequence, semantics
Other languages: Chinese (zh)
Other versions: CN113449856B (granted publication)
Inventors: 旷佩玉, 卫露宁, 陈飞
Assignee (current and original): Huawei Technologies Co., Ltd.
Legal status: Granted; active (the listed status is an assumption, not a legal conclusion)
Prosecution events: application filed by Huawei Technologies Co., Ltd.; priority to CN202010230463.2A; publication of CN113449856A; application granted; publication of CN113449856B

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • G06N 3/065: Analogue means
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G06F 8/43: Checking; contextual analysis
    • G06F 8/436: Semantic checking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Neurology (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Embodiments of the present application disclose a control flow graph processing method and related equipment. Source code is compiled through a deep learning framework to obtain a control flow graph; the control flow graph is then cut and recompiled to obtain multiple device-supported linear execution sequences (with no control dependency among them). Finally, each linear execution sequence is executed in simulation on the deep learning framework; during the simulated execution, the semantics of each linear execution sequence are identified based on the semantic rules of the source code, and operators are inserted between the linear execution sequences according to those semantics, so that the isolated linear execution sequences are connected into one computation graph that the device can directly recognize. This computation graph expresses a complete neural network structure. The processing method avoids manually analyzing the internal logical relationships of the source code to determine the neural network structure the source code expresses; it is less error-prone, convenient, and quick, and improves the processing efficiency of the device.

Description

Control flow graph processing method and related equipment
Technical Field
The present application relates to the field of deep learning, and in particular, to a method and related device for processing a control flow graph.
Background
With the rapid development of artificial intelligence, deep learning is widely applied in areas such as image recognition, speech recognition, and autonomous driving. To meet the growing requirements of neural networks, deep learning frameworks have emerged. With a deep learning framework, researchers need only focus on the structure of the deep learning network: by writing simple source code (for example, in the Python language), they can build a complex deep learning network structure and thereby realize model inference and training on hardware.
There are many existing deep learning frameworks (such as TensorFlow, PyTorch, and MXNet). To quickly apply deep learning to a new problem, choosing a deep learning framework is indispensable, and different frameworks support different capabilities. Most existing deep learning frameworks write the script of a training network in source code such as the Python language, and as neural network structures become more and more complex, the source-code semantics that a deep learning framework must support become correspondingly complex.
However, with most current deep learning frameworks, the complete neural network structure expressed by the whole source code can only be determined by manually analyzing the logical relationships inside the source code. This processing method has low performance, is error-prone, and is time-consuming.
Disclosure of Invention
Embodiments of the present application provide a control flow graph processing method and related equipment, which insert at least one operator (such as a control operator or an assignment operator) between a plurality of linear execution sequences so as to connect them into a computation graph that the device can directly recognize. The method is not error-prone, and is convenient and quick.
Based on this, the embodiment of the present application provides the following technical solutions:
In a first aspect, the present application first provides a method for processing a control flow graph, which may be used in the field of artificial intelligence. The method includes the following. First, once the user decides which neural network to use (e.g., a convolutional neural network), the source code, which is the structure of the neural network that the user expresses by writing a simple high-level language (e.g., a Python script), can be input through the deep learning framework. The input source code is compiled by the deep learning framework to obtain the control flow graph corresponding to the source code. After the deep learning framework obtains this control flow graph, the graph is cut at the positions that include nonlinear semantics, yielding multiple subgraphs with no control dependency among them. A subgraph is a graph whose node set and edge set are respectively subsets of the node set and edge set of some graph; a subgraph may also be called a node series, that is, a sequence of nodes whose execution result is obtained by executing them in order on the input. The subgraphs obtained after cutting are recompiled to obtain multiple linear execution sequences. The resulting sequences are isolated from one another (the dependency relations between them are no longer expressed) and cannot by themselves be executed on the deep learning framework, so each linear execution sequence is then simulated through the deep learning framework in order to identify the semantics expressed by the source code corresponding to each sequence, and at least one operator is inserted between the linear execution sequences according to those semantics. An operator, in the embodiments of the present application, is a device-supported mapping from one function space to another. During the simulated execution of the linear execution sequences, the deep learning framework can identify their semantics based on the semantic rules of the source code; that is, by simulating the execution process, the framework comes to understand which part of the program logic described by the source code each cut-off linear execution sequence corresponds to. Once the type of the source code is determined, its semantic rules are determined. Specifically, a control operator expresses the meaning of jumping from one linear execution sequence to another according to the control semantics in the semantic rules; an assignment operator is an instruction to assign the address of the output data of one linear execution sequence to the address of the input data of another. In this way, the individual isolated linear execution sequences are connected into a computation graph that can be directly recognized by the device (e.g., an AI chip).
In the above embodiment of the present application, source code is compiled through a deep learning framework to obtain a control flow graph; the control flow graph is then cut and recompiled to obtain multiple device-supported linear execution sequences (with no control dependency among them); finally, each linear execution sequence is simulated on the deep learning framework, and operators are inserted between the linear execution sequences according to the semantics of each, so that the sequences are connected into a computation graph that the device can directly recognize, a computation graph representing a complete neural network structure.
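The overall flow of the first aspect can be summarized in the following Python-style sketch. All names and data shapes here are assumptions standing in for framework internals, not any framework's real API:

```python
# Minimal illustrative sketch; steps 1-3 (compile source to a control flow
# graph, cut it, recompile the subgraphs) are elided and their result is
# taken as input here.
def build_computation_graph(sequences):
    for seq in sequences:                          # step 4: simulated execution
        if seq["semantics"] in ("conditional", "loop"):
            seq["ops"].append("control_operator")  # jump between sequences
        if seq["feeds"]:                           # output used by another sequence
            seq["ops"].append("assign_operator")   # pass output address to its input
    return sequences                               # step 5: connected computation graph

# Example: a condition sequence followed by two downstream sequences.
graph = build_computation_graph([
    {"ops": ["cond"],  "semantics": "conditional", "feeds": []},
    {"ops": ["relu"],  "semantics": "linear",      "feeds": [2]},
    {"ops": ["dense"], "semantics": "linear",      "feeds": []},
])
```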
With reference to the first aspect of the embodiments of the present application, in a first implementation manner of the first aspect: if there is a data dependency between one linear execution sequence and another, for example if the output data of linear execution sequence A1 is the input data of linear execution sequence A2, an assignment operator (also referred to as an assign operator) needs to be inserted after A1. The assignment operator is used to assign the address of the output data of A1 to the address of the input data of A2. Both A1 and A2 belong to the above-mentioned plurality of linear execution sequences.
This embodiment of the application explains how at least two linear execution sequences are connected by an assignment operator, and the approach is operable.
With reference to the first aspect of the embodiments of the present application and the first implementation manner of the first aspect, in a second implementation manner of the first aspect: if the deep learning framework determines, according to the semantic rules of the source code, that the linear execution sequence whose simulated execution has just completed includes control semantics, a control operator (which may be referred to as a first control operator) must be inserted after that sequence. The control operator is used to jump from the just-simulated sequence to one or more other linear execution sequences; the framework determines which sequences to jump to from the pointers of the respective linear execution sequences.
It should be noted that, in the embodiments of the present application, inserting control operators and assignment operators between linear execution sequences according to the semantics of the source code involves no conflict: a control operator may be inserted only after a linear execution sequence that includes control semantics, and if that sequence's output data is also needed as the input data of other sequences, an assignment operator may be inserted after it as well; if the current sequence does not include control semantics but its output data is needed as the input data of other sequences, only an assignment operator needs to be inserted after it. The specifics are not limited here.
This embodiment of the application explains how to connect at least two linear execution sequences through a control operator, providing another type of operator that can be inserted, and hence flexibility.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect: after the deep learning framework simulates the current linear execution sequence (here an execution sequence with conditional-judgment semantics, e.g., an if statement), a control operator corresponding to the conditional-judgment semantics is inserted after the current sequence, and execution then jumps to the several other linear execution sequences (each pointing to one of the branches) that are connected to the current sequence through the control operator.
With reference to the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect: after the deep learning framework simulates the current linear execution sequence (here an execution sequence with loop semantics, e.g., a while statement), a control operator corresponding to the loop semantics is inserted after the current sequence. Execution then jumps to the loop structure (i.e., a second linear execution sequence) that is connected to the current sequence through the control operator and points back to it, and a jump operator (also referred to as a second control operator, or an active operator) is inserted after the second linear execution sequence. The active operator is used to execute the loop-semantics sequence and the loop structure cyclically until the output data of the loop structure meets a preset value; when the preset value satisfies the condition for no longer looping, the cyclic execution stops. The first linear execution sequence and the loop structure form a loop body. A loop structure is a program structure, set in a program, for repeatedly executing a certain function; it decides according to a condition in the loop body (i.e., the first linear execution sequence) whether to continue executing that function or to exit the loop.
This embodiment of the application specifically explains how to insert the control operator when a linear execution sequence with control semantics (such as an if statement or a while statement) is encountered; considering practical application situations, the method is operable.
With reference to the first aspect of the embodiments of the present application and the first through fourth implementation manners of the first aspect, in a fifth implementation manner of the first aspect: simulating each linear execution sequence in order through the deep learning framework and inserting at least one operator between the sequences according to their semantics may be implemented in several ways, including but not limited to the following. a. Insert the operators uniformly after all the linear execution sequences have been simulated: after the deep learning framework obtains the linear execution sequences by recompiling the subgraphs, it first simulates all of them, and only after all have been simulated does it judge, according to the semantics of each sequence, whether an operator needs to be inserted after it and of what type. b. Each time one linear execution sequence has been simulated, determine whether an operator needs to be inserted after it and of what type: after obtaining the linear execution sequences, the framework judges after each sequence's simulated execution whether an operator must be inserted; if so, it determines the semantics of the current sequence according to the semantic rules of the source code and thereby the type of operator to insert after it.
This embodiment of the application specifically sets forth the ordering between the deep learning framework's simulated execution of the linear execution sequences and the insertion of operators: all sequences may be fully simulated before the operators are inserted, or operators may be inserted while the simulation proceeds, which provides flexibility.
With reference to the first aspect of the embodiments of the present application and the first through fifth implementation manners of the first aspect, in a sixth implementation manner of the first aspect: the specific way in which the deep learning framework inserts at least one operator between the linear execution sequences according to the semantics of each sequence may be to first generate an instruction set according to those semantics; the instruction set comprises a plurality of instructions, each of which instructs the deep learning framework to insert at least one operator between the linear execution sequences.
In this embodiment of the application, operator insertion is directed by the instructions in an instruction set, which is realizable.
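A minimal sketch of such an instruction set follows; the encoding below is an assumption, since the patent does not specify a concrete layout:

```python
from dataclasses import dataclass, field

@dataclass
class InsertInstruction:
    kind: str             # "control" or "assign"
    after_sequence: int   # graph number of the sequence to insert behind
    targets: list = field(default_factory=list)  # jump / assignment targets

def apply_instructions(sequences, instructions):
    """Each instruction directs the framework to insert one operator."""
    for ins in instructions:
        sequences[ins.after_sequence].append((ins.kind, ins.targets))

seqs = [["cond"], ["true_branch"], ["false_branch"]]
apply_instructions(seqs, [InsertInstruction("control", 0, [1, 2])])
assert seqs[0] == ["cond", ("control", [1, 2])]  # a switch to both branches
```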
With reference to the first aspect of the present application and the first through sixth implementation manners of the first aspect, in a seventh implementation manner of the first aspect: the deep learning framework may be located at a first node of the device, and the first node may send the obtained computation graph through the deep learning framework to a second node located in the same device, or to a second node located in a different device. Specifically, the computation graph may be fetched from the deep learning framework when the second node needs to execute data, or it may be sent to the second node after the deep learning framework has connected the plurality of linear execution sequences into one computation graph; the manner in which the computation graph reaches the second node is not specifically limited here. When the second node is located in a different device, the second node may be a device in its own right, e.g., a separate server.
This embodiment of the application sets forth several ways of handling the obtained computation graph, which provides flexibility.
With reference to the first aspect of the present application and the first through seventh implementation manners of the first aspect, in an eighth implementation manner of the first aspect: the deep learning framework may specifically be any one of MindSpore, TensorFlow, PyTorch, MXNet, Caffe, or Theano.
With reference to the first aspect of the present application and the first through eighth implementation manners of the first aspect, in a ninth implementation manner of the first aspect: the device that obtains the computation graph may be an AI chip. An AI chip, also called an AI accelerator or compute card, is a module for handling the large volume of computing tasks in artificial intelligence applications. For example, the AI chip described in the embodiments of the present application may be a GPU, FPGA, ASIC, or the like without its own CPU (other, non-computing tasks being handled by a host CPU), an AI chip of the Ascend series (e.g., Ascend 910), or an AI chip with an independent AI CPU; the type of AI chip is not limited here.
A second aspect of the embodiments of the present application provides a deep learning framework, which has a function of implementing the method of the first aspect or any one of the possible implementation manners of the first aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
A third aspect of the present embodiment provides an execution device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to call the program stored in the memory to execute the method of the first aspect or any one of the possible implementation manners of the first aspect of the present embodiment.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of embodiments of the present application provides a computer program, which, when run on a computer, causes the computer to perform the method of the first aspect or any one of the possible implementation manners of the first aspect.
Drawings
FIG. 1 is a schematic structural diagram of the artificial intelligence main framework provided by an embodiment of the present application;
fig. 2 is a schematic diagram of a processing method of a control flow graph according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a control flow graph cut into multiple subgraphs according to an embodiment of the present application;
FIG. 4 is a diagram illustrating how multiple linear execution sequences are concatenated into a large execution sequence according to an embodiment of the present application;
FIG. 5 is a diagram illustrating how a plurality of linear execution sequences are connected into a large execution sequence when the control semantics are if statements in the embodiment of the present application;
FIG. 6 is a diagram of a plurality of linear execution sequences concatenated into a large execution sequence according to an embodiment of the present application;
FIG. 7 is a diagram illustrating how a plurality of linear execution sequences are connected into a large execution sequence when the control semantics are while statements in the embodiment of the present application;
FIG. 8 is another illustration of a plurality of linear execution sequences concatenated into one large execution sequence in an embodiment of the present application;
FIG. 9 is another illustration of a plurality of linear execution sequences concatenated into a large execution sequence in an embodiment of the present application;
FIG. 10 is a schematic diagram of a deep learning framework in an embodiment of the present application;
fig. 11 is a schematic diagram of an execution device in the embodiment of the present application.
Detailed Description
The embodiments of the present application provide a control flow graph processing method and related equipment, which insert corresponding operators (such as control operators and assignment operators) between a plurality of linear execution sequences and, through these operators, connect the sequences into a computation graph that the device can directly recognize (for example, source code written in a high-level language is compiled through the deep learning framework into an assembly language supported by the device), so that the device does not need to fetch each linear execution sequence repeatedly.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of an artificial intelligence system is described first. Referring to fig. 1, which shows a schematic structural diagram of the artificial intelligence main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects a series of processes starting from data acquisition; in general terms, these are intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom". The IT value chain reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through a base platform. Communication with the outside is realized through sensors. Computing power is provided by intelligent chips, i.e., the aforementioned AI chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, and FPGA), which in the embodiments of the present application also include AI chips of the Ascend series (e.g., Ascend 910). The base platform includes related platform guarantees and support such as a distributed computing framework and networking, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data are provided to the intelligent chips in the distributed computing system supplied by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, turning intelligent information decision-making into products and realizing practical applications. The application fields mainly include: intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, safe cities, and the like.
The method for processing the control flow graph in the embodiments of the present application can be applied to various fields of artificial intelligence (for example, image processing, computer vision, and semantic analysis). Specifically, with reference to fig. 1, it belongs to the data processing described in "(3) Data processing" above: the method is applied in a deep learning framework, and the control flow graph is processed by that framework into a computation graph recognizable by the intelligent chip (for example, a control flow graph written in a high-level language is compiled into a computation graph in a low-level language supported by the intelligent chip).
Since the control flow graph processing method in the embodiments of the present application is applied in a deep learning framework, the deep learning framework involved is introduced before the embodiments themselves. The processing flow of a deep learning framework generally includes: converting the data to be processed into data the framework can process, where the data format may take various forms such as tensors. For example, the data to be processed may be image data from the image processing field (e.g., captured by a camera in a mobile terminal), video data from the computer vision field (e.g., captured by a monitoring system), or acquired text data (e.g., text entered by a user through a terminal device); this is not limited here. The required operations are then applied to the tensors, the model is developed and trained through automatic differentiation, and the output results are obtained for testing.

Because early deep learning frameworks implemented most of this processing flow in high-level languages (such as Java, Python, and Lua), and a high-level language consumes more CPU cycles than a low-level language even for the simplest operation (let alone a deep neural network with a complex structure), slow execution became a natural defect of the high-level-language approach. There are two solutions to this problem. 1) The first is to emulate a conventional compiler: convert the high-level language into a low-level language (e.g., the C language) and then compile and execute on that basis, just as a conventional compiler compiles a high-level language into the assembly language of a specific platform to achieve efficient execution. A high-level language is a language independent of the machine, process, or object; it is programming closer to natural language and mathematical formulas, basically detached from the machine's hardware system, and is written in a way people understand more easily, the program so written being called the source program. A low-level language is a programming language or instruction code that the machine can directly recognize, understand, and accept without translation, each instruction code having a corresponding circuit inside the computer to complete it. To realize the conversion, the implementation code of each tensor operation is added to the C-language conversion part in advance, and the compiler then integrates the C implementations of these tensor operations at the compilation stage. 2) The second is to use a scripting language for front-end modeling and a low-level language such as C++ for back-end execution. This means the interaction between the high-level and low-level languages happens inside the deep learning framework, so the front end needs no modification and no whole-program compilation is required (only partial compilation, by modifying compilation parameters), and the overall speed is faster.
The embodiments of the present application adopt a deep learning framework of the second kind: the user first builds the model of the neural network by writing source code, and the deep learning framework then compiles the source code into an assembly language that the back-end device (such as an AI chip) can recognize, which improves the overall speed of the framework. It should be noted that the deep learning framework to which the control flow graph processing method of the embodiments applies may be any framework meeting the above requirements, for example any one of MindSpore, TensorFlow, PyTorch, MXNet, Caffe, or Theano.
Next, the method for processing a control flow graph provided in the embodiments of the present application is described; please refer to fig. 2:
201. Compile the source code to obtain the control flow graph corresponding to the source code.
First, once the user decides which neural network to use (e.g., a convolutional neural network), the source code, which is the structure of the neural network expressed by writing a simple high-level language (e.g., a Python script), can be input through the deep learning framework, and the input source code is compiled by the framework to obtain the corresponding control flow graph. What a control flow graph is should be explained here: a control flow graph (also called a control flowchart) is an abstraction of the branch-and-jump relations in a program and describes all possible execution paths of the program. Its nodes are sets of statements, and an edge from A to B indicates that statement B can execute directly after statement A.
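For illustration only (this fragment does not come from the patent), consider a tiny script whose statement blocks become CFG nodes:

```python
# Hypothetical network script; conv/relu/dense are toy stand-in functions.
def conv(x): return x * 2
def relu(y): return max(y, 0)
def dense(y): return y + 1

def forward(x, use_branch):
    y = conv(x)       # block A
    if use_branch:    # block A ends with a conditional jump
        y = relu(y)   # block B (taken branch)
    else:
        y = -y        # block C (fall-through branch)
    return dense(y)   # block D: both branches rejoin here

# Control flow graph of forward():
#   A -> B and A -> C (the branch); B -> D and C -> D (the rejoin)
```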
202. Cut the control flow graph to obtain multiple subgraphs.
After the deep learning framework obtains the control flow graph corresponding to the source code, the graph is cut at the positions that include nonlinear semantics, yielding multiple subgraphs with no control dependency. What a subgraph is must also be explained first: a subgraph is a graph whose node set and edge set are respectively subsets of the node set and edge set of some graph; it may also be called a node series, that is, a sequence of nodes whose execution result is obtained by executing them in order on the input. Taking fig. 3 as an example: suppose fig. 3 shows a control flow graph in which each circle represents a node (i.e., a basic block) and arrows connect the nodes (six nodes, 1 through 6, are illustrated). The control flow graph can be cut into 3 subgraphs, namely subgraph 1, subgraph 2, and subgraph 3 in fig. 3, where subgraph 1 consists of nodes 1, 2, and 3 and the data flow among them, subgraph 2 consists of nodes 1 and 4 and the data flow between them, and subgraph 3 consists of nodes 1, 5, and 6 and the data flow among them.
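The cut of fig. 3 can be sketched as follows; the adjacency-list representation is an assumption, and the node numbers follow fig. 3:

```python
# Control flow graph of fig. 3 as an adjacency list (nodes 1-6);
# node 1 is the control node with three outgoing branches.
cfg = {1: [2, 4, 5], 2: [3], 3: [], 4: [], 5: [6], 6: []}

def collect_branch(cfg, start):
    """Gather the nodes reachable along one branch of the control node."""
    seq, stack = [], [start]
    while stack:
        node = stack.pop()
        seq.append(node)
        stack.extend(cfg[node])
    return seq

# One subgraph per outgoing branch of node 1:
subgraphs = [[1] + collect_branch(cfg, succ) for succ in cfg[1]]
assert subgraphs == [[1, 2, 3], [1, 4], [1, 5, 6]]  # subgraphs 1, 2, 3 of fig. 3
```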
203. Compile the multiple subgraphs to obtain multiple linear execution sequences.
The subgraphs obtained after cutting are recompiled to obtain multiple linear execution sequences. Because the subgraphs are produced by directly cutting the control flow graph, they resemble the control flow graph: at this point they can be recognized only by the deep learning framework, and a device such as an AI chip cannot directly recognize them, so each subgraph must be recompiled by the framework into a device-recognizable linear execution sequence. That is, a subgraph differs from a linear execution sequence in that the linear execution sequence is a logical structure compiled in a machine language the device can recognize, while a subgraph is a logical structure compiled in a machine language that only the deep learning framework, not the device, can recognize. The sequence formed by arranging all nodes of such a linear structure in the order of their execution is called a linear execution sequence; these linear execution sequences have no control dependence among them, only data dependence.
204. Simulate execution of the multiple linear execution sequences in order, and insert at least one operator between the linear execution sequences according to the semantics of each.
After the subgraphs obtained by cutting are recompiled into multiple linear execution sequences, the resulting sequences are isolated from one another (their dependency relations are no longer expressed) and cannot by themselves be executed on the deep learning framework. Each linear execution sequence is therefore further simulated in order through the deep learning framework; according to the semantic rules of the source code, the semantics expressed by the source code corresponding to each sequence are identified during the simulated execution, and at least one operator is inserted between the sequences according to those semantics. Specifically, a control operator refers to the operation of jumping from one linear execution sequence to another according to the meaning expressed by the control semantics in the semantic rules; an assignment operator is the operation of assigning the address of the output data of one linear execution sequence to the address of the input data of another.
It should be noted that the deep learning framework's simulated execution of each linear execution sequence serves to apply the semantic rules of the source code; on the basis of these rules, the semantics contained in each linear execution sequence can be further identified, and the simulated execution does not change the structure of the sequences.
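A minimal sketch of this identification step, assuming a toy keyword-based rule set (a real framework's semantic rules are far richer):

```python
# Toy semantic rules for a Python-like source language; purely illustrative.
SEMANTIC_RULES = {"conditional": ("if",), "loop": ("while", "for")}

def identify_semantics(sequence):
    """Classify one linear execution sequence without modifying it."""
    for stmt in sequence:              # sequence: list of statement keywords
        for kind, keywords in SEMANTIC_RULES.items():
            if stmt in keywords:
                return kind
    return "linear"                    # no control semantics found

assert identify_semantics(["conv", "if"]) == "conditional"
assert identify_semantics(["conv", "relu"]) == "linear"
```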
It should be noted here that, in some embodiments of the present application, simulating each linear execution sequence in order through the deep learning framework and inserting at least one operator between the sequences according to the semantics of each may be implemented in several ways, including but not limited to the following two (a sketch contrasting them follows item b):
a. after all the linear execution sequences are simulated and executed, operators are uniformly inserted between the linear execution sequences.
After the deep learning framework obtains a plurality of linear execution sequences by recompiling a plurality of subgraphs, firstly, all the obtained linear execution sequences are simulated and executed, and after all the linear execution sequences are simulated and executed, whether operators need to be inserted after each linear execution sequence and what types of operators need to be inserted are judged according to the semantics of each linear execution sequence.
b. Each time the simulation has executed a linear execution sequence, it is determined whether and what type of operator needs to be inserted after the linear execution sequence.
After the deep learning framework obtains a plurality of linear execution sequences by recompiling a plurality of subgraphs, whether operators need to be inserted after the linear execution sequences is judged sequentially after each linear execution sequence is executed in a simulation mode, and if the operators need to be inserted after the linear execution sequences, which type of operators need to be inserted after the linear execution sequences is determined according to the semantics of the current linear execution sequences.
It should be noted that, in some embodiments of the present application, a specific processing manner of the deep learning framework to insert at least one operator between the plurality of linear execution sequences according to the semantics of each linear execution sequence may be: firstly, an instruction set is generated according to the semantics of each linear execution sequence, the instruction set comprises a plurality of instructions, and each instruction in the plurality of instructions is used for instructing the deep learning framework to insert at least one operator between the plurality of linear execution sequences.
It should be further noted that, in some embodiments of the present application, according to semantic rules of the source code, semantics included in each linear execution sequence may be different, and then an operator type to be inserted after each linear execution sequence is also different, including but not limited to the following ways:
A. Insert a control operator according to the semantic rules of the source code.
Take the above-mentioned case in which, each time one linear execution sequence has been simulated, it is determined whether an operator needs to be inserted after it and of what type. If the deep learning framework determines, according to the semantic rules of the source code, that the sequence whose simulated execution has just completed includes control semantics, a control operator (which may be called a first control operator) needs to be inserted; the control operator is used to jump from the just-simulated sequence to one or more other linear execution sequences, and the framework determines which sequences to jump to from the pointers of the respective linear execution sequences.
It should be noted here that although the control operator is always inserted after a linear execution sequence that includes control semantics, the specific way it is inserted differs slightly for different control semantics. The following takes control semantics of if statements and while statements respectively as examples:
a. When the control semantics are conditional-judgment semantics.
Taking an if statement with conditional-judgment semantics as an example: after the deep learning framework has simulated the current linear execution sequence (an execution sequence whose control semantics are the if statement), a control operator corresponding to the if statement is inserted after the current sequence, and execution then jumps to the several other linear execution sequences (each pointing to one of the branches) that are connected to the current sequence through the control operator.
For ease of understanding, an example is given. As shown in fig. 4, the deep learning framework first traverses all paths of the control flow graph compiled from the source code. Whenever a control node is encountered, the graph is cut there, segmenting the control flow graph into subgraphs (also referred to as node series), and the subgraphs are compiled again into linear execution sequences that the device can execute. At the same time, a graph number and an ID number are generated for each linear execution sequence: the graph number is used by the deep learning framework to identify each executable sequence, and the ID number is used by the device to identify each executable sequence. While cutting and compiling the whole control flow graph, the framework also generates an instruction set, each instruction of which directs the framework to insert the corresponding control operator as it simulates the linear execution sequences in order. When the control semantics are an if statement, instructions are generated so that the linear execution sequence of the correct branch (e.g., the true graph in fig. 4) and the linear execution sequence of the erroneous branch (e.g., the false graph in fig. 4) can be executed after the conditional sequence (instructions may also be generated so that the erroneous-branch sequence and the correct-branch sequence execute sequentially, or even simultaneously, which is not limited here). Specifically, the process of simulated execution may be as follows. First the cond node is simulated (a cond node is compiled into the control flow graph whenever a conditional statement such as < … … > exists). The condition is first set to true and the true graph is executed, yielding the graph number of the true graph, and a switch instruction is returned; the cond is then set to false and the false graph is executed, yielding the graph number of the false graph. With the cond node as the connection point, the graph number of the cond linear execution sequence, the graph number of the true graph, and the graph number of the false graph are stored. After the false graph has executed, a switch-ending instruction is executed, an interface is called, and a control operator (the switch operator) is inserted. Because the if statement is one of the statements with conditional-judgment semantics, the conditional judgment generally has two judged branches (the true graph and the false graph), so the inserted switch operator connects the cond linear execution sequence with the true graph and the false graph according to the judgment result. Similarly, following this execution principle, all linear execution sequences with control semantics are finally connected by switch operators into one large, multi-stream concatenated execution sequence.
To further facilitate understanding, another example: as shown in fig. 5, a control flow graph whose control semantics are an if statement is cut into 4 subgraphs, namely the switch subgraph, the true-branch subgraph, the false-branch subgraph, and the after subgraph, and the 4 subgraphs are recompiled into 4 linearly executable sequences, namely the test_if graph, true graph, false graph, and after graph shown in fig. 5. After each subgraph is recompiled, an instruction set for simulating the linear execution sequences is generated, and each sequence is simulated in order according to the instruction set. Taking fig. 5 as an example, and assuming the after graph is fused into the true graph, the generated switch instructions are: switch(cond)=true -> exec true-graph -> return output -> switch return (the switch process returns to the original function call stack) -> switch(cond)=false -> call false-graph + exec false-graph -> set switch graph (connect the three linear execution sequences of the current cond according to the current cond). The deep learning framework then executes the switch instructions in this order; when a call-graph instruction is executed, an input (the address of the input data) is set for the current graph and an assignment operator (also called an assign operator) is inserted after the input node. After the false graph finishes executing, the obtained cond graph, true graph, and false graph are connected. Once all the linear execution sequences have been simulated, the final connected large execution sequence (i.e., the computation graph) that can execute on the device (e.g., an AI chip) is obtained, which may be as shown in fig. 6. At this point the large execution sequence can be directly recognized by the device, and the device identifies the execution order among the sequences by the ID numbers of the respective linear execution sequences.
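The connected structure of fig. 5 and fig. 6 can be sketched as plain data; the field names and graph numbers are assumptions for illustration:

```python
# Sketch of the if connection: the switch operator stores the graph numbers
# of the cond sequence and of both branches; the layout is an assumption.
test_if = {"id": 0, "name": "test_if graph"}   # the cond sequence
true_g  = {"id": 1, "name": "true graph"}
false_g = {"id": 2, "name": "false graph"}
after_g = {"id": 3, "name": "after graph"}

switch_op = {"type": "switch",
             "cond_graph": test_if["id"],
             "on_true": true_g["id"],    # jump target when cond is true
             "on_false": false_g["id"]}  # jump target when cond is false
# Both branches fall through to the after graph, yielding one large
# execution sequence the device recognizes by the sequences' ID numbers.
```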
b. When the control semantics are loop semantics.
Taking a while statement with loop semantics as an example: after the deep learning framework has simulated the current linear execution sequence (an execution sequence whose control semantics are the while statement), a control operator corresponding to the while statement is inserted after that sequence. Execution then jumps, through the control operator, to the other linear execution sequence with loop semantics connected to the current one, and a jump operator (also called a second control operator, or an active operator) is inserted after that loop-semantics sequence. The active operator is used to execute the while-condition sequence and the loop-semantics sequence cyclically until the output data of the loop-semantics sequence meets a preset value; when the preset value satisfies the condition for no longer looping, the cyclic execution stops.
For ease of understanding, an example is given. As shown in fig. 7, a control flow graph whose control semantics are a while statement is compiled into 4 subgraphs. The while conditional-judgment statement generates a switch node; the subgraph containing the switch node is recompiled into the header graph, the subgraph containing the while body statement nodes is recompiled into the body graph, and the subgraph containing the after statement nodes is recompiled into the after graph, in the same flow as for the if statement. After graph cutting, all subgraphs are compiled into isolated linear execution sequences that can execute on the device, namely the test_while graph, header graph, body graph, and after graph shown in fig. 7. After each subgraph is recompiled, an instruction set for simulating the linear execution sequences is generated, and the sequences are simulated in order according to it. The if and while cases return to the original switch call stack with different switch instructions: if returns with a plain return, whereas while must resume at the header graph after the body graph. Therefore, in the switch-instruction execution function an active operator is inserted to connect the header graph and the body graph, and another active operator is inserted after the last node of the body graph, achieving the loop effect.
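The while connection of fig. 7 can be sketched as plain data; the field names are illustrative assumptions:

```python
# Sketch of the while connection: header graph holds the loop condition,
# and an active operator after the body graph jumps back to the header.
header_g = {"id": 1, "name": "header graph"}  # holds the while condition
body_g   = {"id": 2, "name": "body graph"}    # the loop body
after_g  = {"id": 3, "name": "after graph"}   # statements after the loop

switch_op = {"type": "switch",
             "cond_graph": header_g["id"],
             "on_true": body_g["id"],    # condition holds: run the body
             "on_false": after_g["id"]}  # condition fails: exit the loop

# The active operator is appended after the body graph's last node so that
# execution jumps back to the header graph and re-evaluates the condition.
active_op = {"type": "active", "target": header_g["id"]}
```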
It should be noted that, in some embodiments of the present application, if there is a nesting relationship and if statements or while statements are encountered within that nesting, compilation and execution proceed as described above, and all isolated linear execution sequences are connected into one large execution sequence, as shown in fig. 8. This large execution sequence can be directly recognized by the device, and the device identifies the execution order among the sequences by the ID numbers of the respective linear execution sequences.
At present, the user determines the complete neural network structure expressed by the whole source code by manually analyzing the logical relationships inside the source code. For example, to realize the control semantics of a while statement, taking MindSpore as the deep learning framework, the while statement as originally written under MindSpore is as follows:
[Code listing, reproduced only as an image in the original publication: the while statement as originally written under MindSpore]
With the control flow graph processing method provided herein, by contrast, the user can write directly according to the semantic rules of source code under mindspore; the if statement and the while statement written in this way are respectively as follows:
[Code listing, Figure BDA0002429133450000112: the if and while statements written directly under the semantic rules of mindspore; embedded as an image in the original publication.]
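Since the original listings survive only as images, the following is a minimal, hypothetical sketch of what such directly written control flow looks like in MindSpore style. The class names, operations, and values are assumptions for illustration, not the patent's actual listings.

import mindspore.nn as nn

class IfNet(nn.Cell):
    # conditional judgment semantics: the if/else is compiled into a
    # switch node with true/false branch subgraphs
    def construct(self, x, y):
        if x > y:
            out = x + y
        else:
            out = x - y
        return out

class WhileNet(nn.Cell):
    # loop semantics: the while is compiled into header/body/after graphs
    def construct(self, x, n):
        while n > 0:
            x = x * 2
            n = n - 1
        return x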
B. Inserting an assignment operator according to the semantic rules of the source code.
Still taking the case where, each time a linear execution sequence is simulated, it is determined whether an operator needs to be inserted after it and of what type: if there is a data dependency between one linear execution sequence and another, for example, if the output data of linear execution sequence A1 is the input data of linear execution sequence A2, an assignment operator (also called an assign operator) needs to be inserted after A1. The assignment operator assigns the address of A1's output data to the address of A2's input data. Specifically, when the control flow graph is split, the data among the split linear subgraphs is connected through parameters, and the output of one subgraph may be the input of other subgraphs. When an exec instruction is executed in simulation, an assignment operator is inserted at the output to connect the data dependency between subgraphs. The specific flow is as follows: during simulated execution, where a main graph calls a subgraph, the output node of the previous graph is paired with the parameter node in the subgraph, set_input is called to insert an assign operator after the node of the previous graph, and the output node is assigned to the parameter node of the called subgraph, completing the connection between the graphs. After the subgraph connection is completed, an output is set for the global execution result; if a switch graph is executed last, an assign operator is added after the last node of the true graph and the false graph to assign the value of the output data to the output. The finally connected large execution sequence is shown in fig. 9; this large execution sequence can be directly identified by the device, and the device identifies the execution order among the sequences by the ID numbers of the respective linear execution sequences.
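The following is a minimal sketch of the assign insertion just described. The Sequence, Node, and AssignOperator types and the set_input signature are assumptions for illustration; only the behavior, assigning the previous graph's output address to the called subgraph's parameter, follows the text above.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str

@dataclass
class Sequence:
    name: str
    output: Node = None      # output node of this graph
    parameter: Node = None   # parameter node consuming another graph's output
    trailing_ops: list = field(default_factory=list)

@dataclass
class AssignOperator:
    src: Node  # address of the previous graph's output data
    dst: Node  # address of the called subgraph's input data

def set_input(prev_seq, next_seq):
    # pair the previous graph's output node with the subgraph's parameter
    # node, then insert an assign operator after the previous graph's node,
    # assigning the output address to the input address
    op = AssignOperator(src=prev_seq.output, dst=next_seq.parameter)
    prev_seq.trailing_ops.append(op)
    return op

a1 = Sequence("A1", output=Node("a1_out"))
a2 = Sequence("A2", parameter=Node("a2_param"))
set_input(a1, a2)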
It should be noted that, in the embodiments of the present application, the control operator and the assignment operator are inserted between the linear execution sequences according to the semantics of the source code without conflict. That is, a control operator may be inserted only after a linear execution sequence that includes control semantics; if the output data of that sequence is also required as the input data of other linear execution sequences, an assignment operator may additionally be inserted after it. If the current linear execution sequence does not include control semantics but its output data is needed as the input data of other linear execution sequences, then only an assignment operator needs to be inserted after it. The specific details are not limited here.
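As a sketch of these non-conflicting rules, the helper below decides which operators to append after a sequence; the attribute names are illustrative assumptions, not the framework's actual fields.

from types import SimpleNamespace

def operators_to_insert(seq):
    # non-conflicting rules: a control operator only for sequences with
    # control semantics, an assign operator only when the sequence's
    # output feeds another sequence's input
    ops = []
    if seq.has_control_semantics:
        ops.append("control operator")
    if seq.feeds_other_sequence:
        ops.append("assign operator")
    return ops

plain = SimpleNamespace(has_control_semantics=False, feeds_other_sequence=True)
branching = SimpleNamespace(has_control_semantics=True, feeds_other_sequence=True)
print(operators_to_insert(plain))      # ['assign operator']
print(operators_to_insert(branching))  # ['control operator', 'assign operator']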
205. Obtaining a computation graph formed by connecting the plurality of linear execution sequences.
The individual isolated linear execution sequences are connected in the manner described above into a computation graph that can be directly recognized by the device (e.g., an AI chip).
It should be noted that the obtained computation graph may be sent to the device. Specifically, the device may fetch the computation graph from the deep learning framework when it needs to execute data, or the deep learning framework may send the computation graph to the device after connecting the plurality of linear execution sequences into one computation graph; the manner in which the computation graph is sent to the device is not limited here.
In the above embodiment of the present application, source code is compiled through a deep learning framework to obtain a control flow graph; the control flow graph is then split and recompiled to obtain a plurality of device-supported linear execution sequences with no control dependencies between them; finally, each linear execution sequence is simulated on the deep learning framework, the semantics corresponding to each sequence are identified during the simulated execution, and operators are inserted between the sequences according to those semantics, so that the sequences are connected into a computation graph that the device can directly identify. The computation graph represents a complete neural network structure. This processing method avoids manually analyzing the internal logic of the source code to determine the neural network structure expressed by the whole source code; it is less error-prone, more convenient, and improves the processing efficiency of the device.
On the basis of the embodiment corresponding to fig. 2, in order to better implement the above solution of the embodiments of the present application, a deep learning framework for implementing it is provided below. Referring to fig. 10, fig. 10 is a schematic structural diagram of a deep learning framework according to an embodiment of the present application. The deep learning framework includes a first compiling module 1001, a segmentation module 1002, a second compiling module 1003, and a simulation execution module 1004. The first compiling module 1001 is used to compile source code to obtain a control flow graph corresponding to the source code; the segmentation module 1002 is configured to split the control flow graph to obtain a plurality of split subgraphs; the second compiling module 1003 is configured to recompile the plurality of subgraphs to obtain a plurality of linear execution sequences, the plurality of linear execution sequences being execution sequences without control dependencies; and the simulation execution module 1004 is configured to sequentially simulate execution of the plurality of linear execution sequences, identify the semantics corresponding to the plurality of linear execution sequences during the simulated execution, and insert at least one operator between the plurality of linear execution sequences according to the semantics so as to connect them into a computation graph.
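A minimal sketch of how the four modules might compose into the pipeline of fig. 10 follows; the class and method names are hypothetical, and each module object is assumed to expose one method mirroring one of the method steps.

class DeepLearningFramework:
    def __init__(self, first_compiler, splitter, second_compiler, simulator):
        self.first_compiler = first_compiler      # first compiling module 1001
        self.splitter = splitter                  # segmentation module 1002
        self.second_compiler = second_compiler    # second compiling module 1003
        self.simulator = simulator                # simulation execution module 1004

    def build_computation_graph(self, source_code):
        cfg = self.first_compiler.compile(source_code)         # control flow graph
        subgraphs = self.splitter.split(cfg)                   # split subgraphs
        sequences = self.second_compiler.recompile(subgraphs)  # linear execution sequences
        # simulate each sequence in order, identify its semantics, insert
        # operators, and return the connected computation graph
        return self.simulator.simulate_and_connect(sequences)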
In one possible design, the simulation execution module 1004 is specifically configured to: insert an assignment operator after a first linear execution sequence, the assignment operator being used to assign the address of the output data of the first linear execution sequence to the address of the input data of a second linear execution sequence, where the first linear execution sequence and the second linear execution sequence both belong to the plurality of linear execution sequences.
In one possible design, the simulation execution module 1004 is further specifically configured to: insert a first control operator after a first linear execution sequence, where the first linear execution sequence is a linear execution sequence including control semantics among the plurality of linear execution sequences, the first control operator is used to jump from the first linear execution sequence to a second linear execution sequence, and the second linear execution sequence is one of the plurality of linear execution sequences other than the first linear execution sequence.
In one possible design, the control semantics are conditional judgment semantics (e.g., an if statement with conditional judgment semantics), and the simulation execution module 1004 is further specifically configured to: jump to a plurality of second linear execution sequences through the first control operator, the plurality of second linear execution sequences being branches of a plurality of judgment results corresponding to the conditional judgment semantics.
In one possible design, the control semantics are loop semantics (e.g., a while statement with loop semantics), and after inserting a first control operator after the first linear execution sequence, the simulation execution module 1004 is further specifically configured to: jump to the second linear execution sequence connected to the first linear execution sequence through the first control operator, where the second linear execution sequence is a loop structure corresponding to the first linear execution sequence; and insert a second control operator after the second linear execution sequence, the second control operator being used to cyclically execute the first linear execution sequence and the second linear execution sequence until the output data of the second linear execution sequence meets a preset value.
In one possible design, the simulation execution module 1004 is further specifically configured to: insert the at least one operator between the plurality of linear execution sequences according to their corresponding semantics after all of the plurality of linear execution sequences have been simulated in sequence; or, simulate execution of a target linear execution sequence, the target linear execution sequence being the one of the plurality of linear execution sequences currently being simulated, and, when the simulation of the target linear execution sequence is complete, insert after it an operator corresponding to its semantics.
In one possible design, the simulation execution module 1004 is further specifically configured to: generate an instruction set according to the semantics corresponding to the plurality of linear execution sequences, the instruction set including a plurality of instructions used to instruct insertion of at least one operator between the plurality of linear execution sequences.
In one possible design, the deep learning framework is located at a first node, and the simulation execution module 1004 is further specifically configured to: send the computation graph to a second node, where the first node and the second node may be located in the same device or in different devices; the specifics are not limited here.
In one possible design, the deep learning framework includes: mindspore, tensorflow, pytorch, mxnet, caffe, or theano.
It should be noted that the information interaction and execution process between the modules/units in the deep learning framework provided in fig. 10 are based on the same concept as the method embodiment corresponding to fig. 2 of the present application; for specific contents, refer to the description in the foregoing method embodiments, which is not repeated here.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an execution device provided in an embodiment of the present application. For convenience of description, only the portions related to the embodiment are shown; for undisclosed implementation details, please refer to the method portion of the embodiments of the present application. The execution device 1100 may be deployed with the corresponding modules of the deep learning framework described in the embodiment corresponding to fig. 10, and is configured to implement the functions of that deep learning framework. Specifically, the execution device 1100 is implemented by one or more servers and may vary considerably depending on configuration or performance; it may include one or more processors 1122, a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing an application program 1142 or data 1144. The processor 1122 may be, for example, a central processing unit (CPU), an embedded microcontroller, or an AI processor; the type of the processor 1122 is not limited here. The memory 1132 and the storage medium 1130 may be transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), and each module may include a series of instruction operations on the execution device. Still further, the processor 1122 may be configured to communicate with the storage medium 1130 to execute, on the execution device 1100, the series of instruction operations in the storage medium 1130. For example, the processor 1122 may invoke the series of instruction operations in the storage medium 1130 to perform the following steps: compiling source code to obtain a control flow graph corresponding to the source code; splitting the obtained control flow graph to obtain a plurality of split subgraphs; recompiling the obtained subgraphs to obtain a plurality of linear execution sequences; simulating execution of the linear execution sequences in sequence; and inserting at least one operator between the linear execution sequences according to their corresponding semantics to connect them into a computation graph.
The execution device 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
In the embodiments of the present application, the steps performed by the deep learning framework in the foregoing processing methods for the control flow graph may be executed by the processor invoking the relevant code stored on the storage medium, based on the structure of the execution device shown in fig. 11; details are not repeated here.
It should be noted that the above-described apparatus embodiments are merely illustrative; the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, which may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure implementing the same function may take various forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software implementation is usually preferable. Based on such understanding, the technical solutions of the present application may be embodied substantially in the form of a software product. The software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a training device or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

Claims (22)

1. A method for processing a control flow graph, applied to a deep learning framework, comprising: compiling source code through the deep learning framework to obtain a control flow graph corresponding to the source code; splitting the control flow graph to obtain a plurality of split subgraphs; recompiling the plurality of subgraphs to obtain a plurality of linear execution sequences, the plurality of linear execution sequences being a plurality of execution sequences without control dependencies; and simulating execution of the plurality of linear execution sequences in sequence through the deep learning framework, identifying semantics corresponding to the plurality of linear execution sequences during the simulated execution, and inserting at least one operator between the plurality of linear execution sequences according to the semantics, so as to connect the plurality of linear execution sequences into one computation graph.
2. The method according to claim 1, wherein inserting at least one operator between the plurality of linear execution sequences according to the semantics comprises: inserting an assignment operator after a first linear execution sequence, the assignment operator being used to assign the address of the output data of the first linear execution sequence to the address of the input data of a second linear execution sequence, both the first linear execution sequence and the second linear execution sequence belonging to the plurality of linear execution sequences.
3. The method according to any one of claims 1-2, wherein inserting at least one operator between the plurality of linear execution sequences according to the semantics comprises: inserting a first control operator after a first linear execution sequence, the first linear execution sequence being a linear execution sequence including control semantics among the plurality of linear execution sequences, the first control operator being used to jump from the first linear execution sequence to a second linear execution sequence, the second linear execution sequence being one of the plurality of linear execution sequences other than the first linear execution sequence.
4. The method according to claim 3, wherein the control semantics are conditional judgment semantics, and after inserting the first control operator after the first linear execution sequence, the method further comprises: jumping to a plurality of second linear execution sequences through the first control operator, the plurality of second linear execution sequences being branches of a plurality of judgment results corresponding to the conditional judgment semantics.
5. The method according to claim 3, wherein the control semantics are loop semantics, and after inserting the first control operator after the first linear execution sequence, the method further comprises: jumping to the second linear execution sequence connected to the first linear execution sequence through the first control operator, the second linear execution sequence being a loop structure corresponding to the first linear execution sequence; and inserting a second control operator after the second linear execution sequence, the second control operator being used to cyclically execute the first linear execution sequence and the second linear execution sequence until the output data of the second linear execution sequence meets a preset value.
6. The method according to any one of claims 1-5, wherein simulating execution of the plurality of linear execution sequences in sequence through the deep learning framework and inserting at least one operator between the plurality of linear execution sequences according to the semantics comprises: inserting the at least one operator between the plurality of linear execution sequences according to the semantics after all of the plurality of linear execution sequences have been simulated in sequence through the deep learning framework; or, simulating execution of a target linear execution sequence through the deep learning framework, the target linear execution sequence being the one of the plurality of linear execution sequences currently being simulated, and, when the simulation of the target linear execution sequence is complete, inserting after the target linear execution sequence an operator corresponding to the semantics of the target linear execution sequence.
7. The method according to any one of claims 1-6, wherein inserting at least one operator between the plurality of linear execution sequences according to the semantics comprises: generating an instruction set according to the semantics, the instruction set including a plurality of instructions used to instruct insertion of at least one operator between the plurality of linear execution sequences.
8. The method according to any one of claims 1-7, wherein the deep learning framework is located at a first node, and the method further comprises: sending the computation graph to a second node, the first node and the second node being located in the same device or in different devices.
9. The method according to any one of claims 1-8, wherein the deep learning framework includes: mindspore, tensorflow, pytorch, mxnet, caffe, or theano.
10. The method according to any one of claims 8-9, wherein the device includes: an AI chip.
11. A deep learning framework, comprising: a first compiling module for compiling source code to obtain a control flow graph corresponding to the source code; a segmentation module for splitting the control flow graph to obtain a plurality of split subgraphs; a second compiling module for recompiling the plurality of subgraphs to obtain a plurality of linear execution sequences, the plurality of linear execution sequences being a plurality of execution sequences without control dependencies; and a simulation execution module for simulating execution of the plurality of linear execution sequences in sequence, identifying semantics corresponding to the plurality of linear execution sequences during the simulated execution, and inserting at least one operator between the plurality of linear execution sequences according to the semantics, so as to connect the plurality of linear execution sequences into one computation graph.
12. The framework according to claim 11, wherein the simulation execution module is specifically configured to: insert an assignment operator after a first linear execution sequence, the assignment operator being used to assign the address of the output data of the first linear execution sequence to the address of the input data of a second linear execution sequence, both the first linear execution sequence and the second linear execution sequence belonging to the plurality of linear execution sequences.
13. The framework according to any one of claims 11-12, wherein the simulation execution module is further configured to: insert a first control operator after a first linear execution sequence, the first linear execution sequence being a linear execution sequence including control semantics among the plurality of linear execution sequences, the first control operator being used to jump from the first linear execution sequence to a second linear execution sequence, the second linear execution sequence being one of the plurality of linear execution sequences other than the first linear execution sequence.
14. The framework according to claim 13, wherein the control semantics are conditional judgment semantics, and after inserting the first control operator after the first linear execution sequence, the simulation execution module is further configured to: jump to a plurality of second linear execution sequences through the first control operator, the plurality of second linear execution sequences being branches of a plurality of judgment results corresponding to the conditional judgment semantics.
15. The framework according to claim 13, wherein the control semantics are loop semantics, and after inserting the first control operator after the first linear execution sequence, the simulation execution module is further configured to: jump to the second linear execution sequence connected to the first linear execution sequence through the first control operator, the second linear execution sequence being a loop structure corresponding to the first linear execution sequence; and insert a second control operator after the second linear execution sequence, the second control operator being used to cyclically execute the first linear execution sequence and the second linear execution sequence until the output data of the second linear execution sequence meets a preset value.
16. The framework according to any one of claims 11-15, wherein the simulation execution module is further configured to: insert the at least one operator between the plurality of linear execution sequences according to the semantics after all of the plurality of linear execution sequences have been simulated in sequence; or, simulate execution of a target linear execution sequence, the target linear execution sequence being the one of the plurality of linear execution sequences currently being simulated, and, when the simulation of the target linear execution sequence is complete, insert after the target linear execution sequence an operator corresponding to the semantics of the target linear execution sequence.
17. The framework according to any one of claims 11-16, wherein the simulation execution module is further configured to: generate an instruction set according to the semantics, the instruction set including a plurality of instructions used to instruct insertion of at least one operator between the plurality of linear execution sequences.
18. The framework according to any one of claims 11-17, wherein the deep learning framework is located at a first node, and the simulation execution module is further configured to: send the computation graph to a second node, the first node and the second node being located in the same device or in different devices.
19. The framework according to any one of claims 11-18, wherein the deep learning framework includes: mindspore, tensorflow, pytorch, mxnet, caffe, or theano.
20. An execution device, comprising a processor and a memory, the processor being coupled to the memory, wherein: the memory is used to store a program; and the processor is used to execute the program in the memory, causing the execution device to perform the method according to any one of claims 1-10.
21. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1-10.
22. A computer program product comprising instructions which, when run on a computer, causes the computer to perform the method according to any one of claims 1-10.
GR01 Patent grant