Disclosure of Invention
The embodiments of the present application provide a method for processing a control flow graph and related equipment, which insert at least one operator (such as a control operator or an assignment operator) between a plurality of linear execution sequences so as to connect those sequences into a computation graph that a device can directly recognize. The method is not prone to error, and is convenient and fast.
Based on this, the embodiment of the present application provides the following technical solutions:
In a first aspect, the present application first provides a method for processing a control flow graph, which may be used in the field of artificial intelligence. The method includes: first, once the user decides which neural network to use (e.g., a convolutional neural network), the source code, which the user forms by writing a simple high-level language (e.g., a Python script) to express the structure of that neural network, can be input through the deep learning framework. The input source code is compiled by the deep learning framework to obtain a control flow graph corresponding to the source code. After the deep learning framework obtains the control flow graph corresponding to the source code, the control flow graph is split at the positions that include nonlinear semantics, so that a plurality of subgraphs are obtained, none of which has any control dependency. A subgraph refers to a graph whose node set and edge set are respectively subsets of the node set and the edge set of some graph; a subgraph may also be called a node series, i.e., a node sequence whose execution result can be obtained by executing the nodes in order from the input. The subgraphs obtained after splitting are recompiled to obtain a plurality of linear execution sequences; these sequences have no control dependency among them but do carry data dependencies, and they cannot be executed independently on the deep learning framework. Each linear execution sequence is then simulated and executed by the deep learning framework, so that the semantics represented by the source code corresponding to each linear execution sequence are identified, and at least one operator is inserted between the linear execution sequences according to the semantics of each sequence. An operator in the embodiments of the present application refers to a device-supported mapping from one function space to another function space. During the simulated execution of the linear execution sequences, the deep learning framework can identify their semantics based on the semantic rules of the source code; that is, by simulating the execution process the framework understands which part of the program logic described by the source code each of the split-off linear execution sequences corresponds to. Once the type of the source code is determined, its semantic rules are determined. Specifically, a control operator expresses the control semantics in the semantic rules and indicates a jump from one linear execution sequence to another; an assignment operator is an instruction that assigns the address of the output data of one linear execution sequence to the address of the input data of another linear execution sequence. In this way, the individually isolated linear execution sequences are connected into a computation graph that the device (e.g., an AI chip) can directly recognize.
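For orientation, the overall flow just described can be summarized in a minimal, runnable Python sketch. All function names and data structures below (compile_source, split_at_control_nodes, connect, and the string-based node model) are illustrative assumptions for exposition, not the actual API of any deep learning framework.

```python
# Minimal runnable sketch of the flow described above. All names and data
# structures are illustrative assumptions, not a real framework's API.

def compile_source(source: str) -> list[str]:
    """Pretend-compile source code into control-flow-graph nodes."""
    return [line.strip() for line in source.splitlines() if line.strip()]

def split_at_control_nodes(nodes: list[str]) -> list[list[str]]:
    """Split the node list wherever nonlinear (control) semantics appear."""
    subgraphs, current = [], []
    for node in nodes:
        current.append(node)
        if node.startswith(("if", "while")):   # control node ends a subgraph
            subgraphs.append(current)
            current = []
    if current:
        subgraphs.append(current)
    return subgraphs

def connect(sequences: list[list[str]]) -> list[str]:
    """Insert operators between sequences to form one computation graph."""
    graph: list[str] = []
    for i, seq in enumerate(sequences):
        graph.extend(seq)
        if i < len(sequences) - 1:             # only between sequences
            op = ("<control-op>" if seq[-1].startswith(("if", "while"))
                  else "<assign-op>")
            graph.append(op)
    return graph

source = "x = load()\nwhile x < 10:\nx = step(x)\nstore(x)"
print(connect(split_at_control_nodes(compile_source(source))))
```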
In the above embodiment of the present application, source code is compiled by the deep learning framework to obtain a control flow graph; the control flow graph is then split and the resulting subgraphs are recompiled to obtain a plurality of device-supported linear execution sequences (with no control dependency among them); finally, each linear execution sequence is simulated and executed on the deep learning framework, and operators are inserted between the linear execution sequences according to their semantics, so that the linear execution sequences are connected into a computation graph that the device can directly recognize, the computation graph representing a complete neural network structure.
With reference to the first aspect of the embodiment of the present application, in a first implementation manner of the first aspect of the embodiment of the present application, if there is a data dependency between one linear execution sequence and another, for example, if the output data of linear execution sequence a1 are the input data of linear execution sequence a2, an assignment operator (also referred to as an assign operator) needs to be inserted after linear execution sequence a1; the assignment operator is used to assign the address of the output data of linear execution sequence a1 to the address of the input data of linear execution sequence a2. Both linear execution sequence a1 and linear execution sequence a2 belong to the above-mentioned plurality of linear execution sequences.
This embodiment of the present application explains how at least two linear execution sequences are connected by an assignment operator, which is operable in practice.
With reference to the first aspect of the embodiment of the present application and the first implementation manner of the first aspect of the embodiment of the present application, in a second implementation manner of the first aspect of the embodiment of the present application, if the deep learning framework determines, according to the semantic rules of the source code, that the linear execution sequence whose simulated execution has just completed includes control semantics, a control operator (which may be referred to as a first control operator) needs to be inserted after that linear execution sequence. The control operator is used to jump from the linear execution sequence whose simulated execution has just completed to one or more other linear execution sequences; which other sequence or sequences to jump to is determined by the pointers of the respective linear execution sequences.
It should be noted that, in the embodiments of the present application, the control operator and the assignment operator are inserted between the linear execution sequences according to the semantics of the source code without conflict. That is, a control operator may be inserted only after a linear execution sequence that includes control semantics, and if the output data of that sequence must also serve as the input data of other sequences, an assignment operator may additionally be inserted after it; if the current linear execution sequence does not include control semantics but its output data must serve as the input data of other sequences, only an assignment operator needs to be inserted after it. The specific details are not limited here.
This embodiment of the present application explains how at least two linear execution sequences are connected by a control operator, providing another type of inserted operator and hence flexibility.
With reference to the second implementation manner of the first aspect of the present embodiment, in a third implementation manner of the first aspect of the present embodiment, after the deep learning framework has simulated and executed the current linear execution sequence (an execution sequence having conditional judgment semantics, e.g., an if statement), a control operator corresponding to the conditional judgment semantics is inserted after the current linear execution sequence, and execution then jumps to a plurality of other linear execution sequences that point to the several branches and are connected to the current linear execution sequence through the control operator.
With reference to the second implementation manner of the first aspect of the embodiment of the present application, in a fourth implementation manner of the first aspect of the embodiment of the present application, after the deep learning framework has simulated and executed the current linear execution sequence (an execution sequence with loop semantics, e.g., a while statement), a control operator corresponding to the loop semantics is inserted after the current linear execution sequence; execution then jumps to a loop structure (i.e., a second linear execution sequence) that is connected to the current linear execution sequence through the control operator and points back to it, and a jump operator (also referred to as a second control operator or an active operator) is inserted after the second linear execution sequence. The active operator is used to execute the sequence with loop semantics and the loop structure in a loop until the output data of the loop structure meets a preset value; once the preset value indicates that the loop should no longer execute, loop execution stops. The current (first) linear execution sequence and the loop structure form a loop body; the loop structure is a program structure, set in the program, for a function that needs to be executed repeatedly, and whether to continue executing that function or exit the loop is judged according to the condition in the loop body (i.e., the first linear execution sequence).
This embodiment of the present application specifically explains how the control operator is inserted when a linear execution sequence with control semantics (such as an if statement or a while statement) is encountered; the method accounts for practical application situations and is operable.
With reference to the first aspect of the embodiment of the present application and the first to fourth implementation manners of the first aspect, in a fifth implementation manner of the first aspect of the embodiment of the present application, simulating and executing each linear execution sequence in order through the deep learning framework and inserting at least one operator between the linear execution sequences according to their semantics may be implemented in multiple ways, including but not limited to the following: a. After all the linear execution sequences have been simulated and executed, operators are inserted between them in one unified pass: after the deep learning framework obtains the plurality of linear execution sequences by recompiling the plurality of subgraphs, it first simulates and executes all of them, and only then judges, according to the semantics of each sequence, whether an operator needs to be inserted after it and of which type. b. Each time one linear execution sequence has been simulated and executed, it is determined whether an operator needs to be inserted after it and, if so, of what type: after obtaining the plurality of linear execution sequences, the framework judges, sequence by sequence as each one finishes simulated execution, whether an operator needs to be inserted after it; if so, the semantics of the current sequence are determined according to the semantic rules of the source code, and the type of operator to insert after the current sequence is determined accordingly.
In this embodiment of the present application, the order between the framework's simulated execution of the linear execution sequences and the insertion of operators is specifically set forth: either all linear execution sequences may be fully simulated before the operators are inserted, or operators may be inserted while the sequences are being simulated, which provides flexibility.
With reference to the first aspect of the embodiment of the present application and the first to fifth implementation manners of the first aspect, in a sixth implementation manner of the first aspect of the embodiment of the present application, a specific way for the deep learning framework to insert at least one operator between the plurality of linear execution sequences according to the semantics of each linear execution sequence may be: first, an instruction set is generated according to the semantics of each linear execution sequence; the instruction set comprises a plurality of instructions, and each of these instructions is used to instruct the deep learning framework to insert at least one operator between the plurality of linear execution sequences.
In this embodiment of the present application, operator insertion is indicated by instructions in an instruction set, which makes the method realizable.
With reference to the first aspect of the present application and the first to sixth implementation manners of the first aspect, in a seventh implementation manner of the first aspect of the present application, the deep learning framework may be located at a first node of a device. The first node may send the obtained computation graph, through the deep learning framework, to a second node located in the same device or to a second node located in a different device. Specifically, the computation graph may be fetched from the deep learning framework when the second node needs to execute the data, or it may be sent to the second node right after the deep learning framework connects the plurality of linear execution sequences into one computation graph; the manner in which the computation graph reaches the second node is not specifically limited here. When the second node is located in a different device, the second node may be that device itself, e.g., a separate server.
This embodiment of the present application sets forth several ways of handling the obtained computation graph, which provides flexibility.
With reference to the first aspect of the present embodiment and the first to seventh implementation manners of the first aspect, in an eighth implementation manner of the first aspect of the present embodiment, the deep learning framework may specifically be any one of MindSpore, TensorFlow, PyTorch, MXNet, Caffe, or Theano.
With reference to the first aspect of the present application and the first to eighth implementation manners of the first aspect, in a ninth implementation manner of the first aspect of the present application, the device that obtains the computation graph may be an AI chip. An AI chip is also called an AI accelerator or a compute card, i.e., a module dedicated to handling the large number of computation tasks in artificial intelligence applications. For example, the AI chip described in the embodiments of the present application may be a GPU, an FPGA, or an ASIC without its own CPU (other, non-computation tasks being handled by a host CPU), an AI chip of the Ascend series (e.g., Ascend 910), or an AI chip with an independent AI CPU; the type of the AI chip is not limited here.
A second aspect of the embodiments of the present application provides a deep learning framework, which has the function of implementing the method of the first aspect or any one of the possible implementation manners of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
A third aspect of the present embodiment provides an execution device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to call the program stored in the memory to execute the method of the first aspect or any one of the possible implementation manners of the first aspect of the present embodiment.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of embodiments of the present application provides a computer program, which, when run on a computer, causes the computer to perform the method of the first aspect or any one of the possible implementation manners of the first aspect.
Detailed Description
The embodiments of the present application provide a method for processing a control flow graph and related equipment, which insert corresponding operators (such as control operators and assignment operators) between a plurality of linear execution sequences and connect those sequences, through the operators, into a computation graph that the device can directly recognize (e.g., source code written in a high-level language is compiled, through the deep learning framework, into an assembly-level form supported by the device), so that the device does not need to fetch each linear execution sequence repeatedly.
Embodiments of the present application are described below with reference to the accompanying drawings. As those skilled in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of an artificial intelligence system is described first. Referring to fig. 1, which shows a schematic structural diagram of an artificial intelligence main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the sequence of processes from data acquisition onward: in general, intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported through a base platform. Communication with the outside is carried out through sensors; the computing power is provided by smart chips, i.e., the aforementioned AI chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs), which in the embodiments of the present application also include the Ascend series of AI chips (e.g., Ascend 910); the base platform includes related platform guarantees and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data are provided to the smart chips in the distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Inference refers to the process of simulating human intelligent inference in a computer or intelligent system: the machine uses formalized information to reason about and solve problems according to an inference control strategy; typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information has been reasoned about, and generally provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing described above, some general capabilities may further be formed on the basis of its results, such as algorithms or a general system, e.g., translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, commercialize intelligent information decision making, and realize practical applications. The application fields mainly include: intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, safe city, and the like.
The method for processing a control flow graph in the embodiments of the present application can be applied to various fields of artificial intelligence (for example, image processing, computer vision, and semantic analysis). Specifically, with reference to fig. 1, it belongs to a specific data processing manner under "(3) Data processing" above: the method is applied to a deep learning framework, and the control flow graph is processed by that framework into a computation graph recognizable by the smart chip (e.g., a control flow graph written in a high-level language is compiled into a computation graph in a low-level language supported by the smart chip).
Since the method for processing a control flow graph in the embodiments of the present application is applied to a deep learning framework, the deep learning framework involved in the present application is introduced before the embodiments themselves. The processing flow of a deep learning framework generally includes: converting the data to be processed into data the framework can handle, the data format possibly taking various forms such as tensors. For example, the data to be processed may be image data from the image processing field (e.g., captured by a camera in a mobile terminal), video data from the computer vision field (e.g., captured by a monitoring system), or acquired text data (e.g., text information input by a user through a terminal device); this is not limited here. Required operations are then applied to the tensors, the model is developed and trained through automatic differentiation, and output results are obtained for testing. Early deep learning frameworks implemented most of this processing flow in high-level languages (such as Java, Python, or Lua). Even for the simplest operation, a high-level language consumes more CPU cycles than a low-level language, and a deep neural network with a complex structure makes this worse, so slow execution became a natural defect of the high-level-language approach. There are two solutions to this problem. 1) The first imitates a conventional compiler: the high-level language is converted into a low-level language (e.g., the C language) and then compiled and executed on that basis, just as a conventional compiler compiles a high-level language into the assembly language of a specific platform to run efficiently. A high-level language is a language independent of any machine, process, or object; it is closer to natural language and mathematical formulas, is basically detached from the hardware system of the machine, and is written in a way people understand more easily; a program so written is called a source program. A low-level language is a programming language or instruction code that the machine can directly recognize, understand, and accept without translation, each instruction code having a corresponding circuit inside the computer to carry it out. To realize this conversion, the implementation code of each tensor operation is added to the C-language conversion part in advance, and the compiler then integrates the C-implemented tensor operations at the compilation stage. 2) The second method uses a scripting language for front-end modeling and a low-level language such as C++ for back-end computation. The interaction between the high-level and low-level languages then happens inside the deep learning framework, so the front end needs no modification and no whole-program compilation is required (only partial compilation by changing compilation parameters), and the overall speed is faster.
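The second approach can be illustrated with a small, self-contained Python sketch in which NumPy's compiled C kernels stand in for a C++ back end; the layer function below is a hypothetical example, not code from any particular framework.

```python
# Illustrative sketch of the second approach: modeling is written in a
# scripting language, while the heavy tensor arithmetic runs in compiled,
# low-level code (NumPy's C kernels stand in for a C++ backend here).
import numpy as np

def dense_layer(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    # This Python line is only front-end glue: the matrix multiply itself
    # executes inside NumPy's compiled backend, not in the interpreter.
    return np.maximum(x @ w + b, 0.0)   # linear layer followed by ReLU

x = np.random.rand(4, 8).astype(np.float32)
w = np.random.rand(8, 2).astype(np.float32)
b = np.zeros(2, dtype=np.float32)
print(dense_layer(x, w, b).shape)       # (4, 2)
```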
The embodiments of the present application adopt a deep learning framework of the second kind: the user first builds a neural network model by writing source code, and the deep learning framework then compiles the source code into an assembly-level form that the back-end device (such as an AI chip) can recognize, which improves the overall speed of the framework. It should be noted that the deep learning framework to which the method for processing a control flow graph in the embodiments of the present application applies may be any framework meeting the above requirements, for example, any one of MindSpore, TensorFlow, PyTorch, MXNet, Caffe, or Theano.
Next, a method for processing a control flow graph provided in the embodiment of the present application is described, please refer to fig. 2:
201. Compile the source code to obtain a control flow graph corresponding to the source code.
First, once the user decides which neural network to use (e.g., a convolutional neural network), the source code, which the user forms by writing a simple high-level language (e.g., a Python script) to express the structure of that neural network, can be input through the deep learning framework. The input source code is compiled by the deep learning framework to obtain the corresponding control flow graph. It should be explained what a control flow graph is: a control flow graph is an abstraction of the branch and jump relations in a program and describes all possible execution paths of the program; its nodes are statement sets, and an edge from A to B indicates that statement B may be executed immediately after statement A.
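To make the definition concrete, the following hypothetical Python sketch builds the control flow graph of a five-block toy program as an adjacency dictionary and enumerates all of its execution paths; the block numbering and dictionary representation are assumptions for illustration only.

```python
# Toy control flow graph: an edge A -> B means block B may execute
# immediately after block A. The program being modeled is:
#   block 1: x = f()
#   block 2: if x > 0:        (branch)
#   block 3:     y = g(x)     (true branch)
#   block 4: else: y = h(x)   (false branch)
#   block 5: z = k(y)         (join)
cfg_edges = {
    1: [2],        # the condition is evaluated after block 1
    2: [3, 4],     # the branch may fall to either successor
    3: [5],        # both branches rejoin at block 5
    4: [5],
}

def paths(node, acc=(1,)):
    """Enumerate every root-to-exit path, i.e. every possible execution."""
    succs = cfg_edges.get(node, [])
    if not succs:
        yield acc
    for s in succs:
        yield from paths(s, acc + (s,))

print(list(paths(1)))   # [(1, 2, 3, 5), (1, 2, 4, 5)]
```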
202. Split the control flow graph to obtain a plurality of subgraphs.
After the deep learning framework obtains the control flow graph corresponding to the source code, the control flow graph is split at the positions that include nonlinear semantics, so that a plurality of subgraphs are obtained, none of which has any control dependency. What is called a subgraph must also be explained first: a subgraph is a graph whose node set and edge set are respectively subsets of the node set and the edge set of some graph; it may also be called a node series, i.e., a node sequence whose execution result can be obtained by executing the nodes in order from the input. Taking fig. 3 as an example: assuming fig. 3 shows a control flow graph in which each circle represents a node (i.e., a basic block) and arrows connect the nodes (six nodes, nodes 1-6, are illustrated in fig. 3), the control flow graph may be divided into three subgraphs, shown as subgraph 1, subgraph 2, and subgraph 3 in fig. 3. Subgraph 1 consists of node 1, node 2, node 3, and the data flow among these three nodes; subgraph 2 consists of node 1, node 4, and the data flow between these two nodes; subgraph 3 consists of node 1, node 5, node 6, and the data flow among these three nodes.
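The subgraph definition can be sketched directly in Python; the node and edge sets below loosely mirror the shape of the fig. 3 example, though the exact edges are assumed for illustration.

```python
# Sketch: a subgraph keeps a subset of the parent graph's nodes and a
# subset of its edges. Node/edge choices loosely mirror the fig. 3 example.
graph_nodes = {1, 2, 3, 4, 5, 6}
graph_edges = {(1, 2), (2, 3), (1, 4), (1, 5), (5, 6)}

def subgraph(nodes, edges, keep):
    """Return the subgraph induced by the kept nodes."""
    assert keep <= nodes                      # node set must be a subset
    kept_edges = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return keep, kept_edges                   # edge set is also a subset

print(subgraph(graph_nodes, graph_edges, {1, 2, 3}))  # subgraph 1
print(subgraph(graph_nodes, graph_edges, {1, 4}))     # subgraph 2
print(subgraph(graph_nodes, graph_edges, {1, 5, 6}))  # subgraph 3
```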
203. Recompile the plurality of subgraphs to obtain a plurality of linear execution sequences.
The subgraphs obtained after splitting are recompiled to obtain a plurality of linear execution sequences. Because the subgraphs are obtained by directly splitting the control flow graph, they are similar to the control flow graph: at this point they can be recognized only by the deep learning framework, and a device such as an AI chip cannot directly recognize them. Each subgraph therefore needs to be recompiled by the deep learning framework to obtain a plurality of linear execution sequences that the device can recognize. That is, a subgraph differs from a linear execution sequence in that the linear execution sequence is a logical structure compiled into a machine language the device can recognize, whereas the subgraph is a logical structure that only the deep learning framework can recognize and the device cannot. A sequence formed by arranging all the nodes of such a linear structure in their execution order is called a linear execution sequence; these linear execution sequences have no control dependency among them, only data dependencies.
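Arranging a subgraph's nodes in execution order is essentially a topological sort, as the following sketch using Python's standard-library graphlib shows; the node names are assumptions for illustration.

```python
# Sketch: a linear execution sequence is the subgraph's nodes arranged in
# execution order, i.e. a topological order of its dependency edges.
from graphlib import TopologicalSorter

# One subgraph, expressed as {node: predecessors it depends on}.
subgraph = {"load_x": [], "mul": ["load_x"], "add": ["mul"]}

linear_sequence = list(TopologicalSorter(subgraph).static_order())
print(linear_sequence)   # ['load_x', 'mul', 'add'], executable front to back
```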
204. Simulate and execute the plurality of linear execution sequences in order, and insert at least one operator between the linear execution sequences according to the semantics of each linear execution sequence.
After the subgraphs obtained by splitting are recompiled into a plurality of linear execution sequences, these sequences have no control dependency among them but do carry data dependencies, and they cannot be executed independently on the deep learning framework. Each linear execution sequence can therefore be further simulated and executed in order by the deep learning framework; according to the semantic rules of the source code, the semantics represented by the source code corresponding to each sequence are identified during the simulated execution, and at least one operator is inserted between the sequences according to those semantics. Specifically, a control operator expresses the control semantics in the semantic rules and denotes a jump from one linear execution sequence to another; an assignment operator denotes assigning the address of the output data of one linear execution sequence to the address of the input data of another.
It should be noted that the framework's simulated execution of each linear execution sequence serves to identify the semantic rules of the source code; based on these rules, the semantics contained in each linear execution sequence can be further identified, and the simulated execution does not change the structure of the sequence.
It should be noted here that, in some embodiments of the present application, simulating and executing each linear execution sequence in order through the deep learning framework and inserting at least one operator between the linear execution sequences according to their semantics may be implemented in various ways, including but not limited to the following:
a. After all the linear execution sequences have been simulated and executed, operators are inserted between them in one unified pass.
After the deep learning framework obtains the plurality of linear execution sequences by recompiling the plurality of subgraphs, it first simulates and executes all of the obtained sequences; only after all of them have been simulated does it judge, according to the semantics of each sequence, whether an operator needs to be inserted after it and which type of operator is needed.
b. Each time one linear execution sequence has been simulated and executed, it is determined whether an operator needs to be inserted after it and, if so, of what type.
After the deep learning framework obtains a plurality of linear execution sequences by recompiling a plurality of subgraphs, whether operators need to be inserted after the linear execution sequences is judged sequentially after each linear execution sequence is executed in a simulation mode, and if the operators need to be inserted after the linear execution sequences, which type of operators need to be inserted after the linear execution sequences is determined according to the semantics of the current linear execution sequences.
It should be noted that, in some embodiments of the present application, a specific way for the deep learning framework to insert at least one operator between the plurality of linear execution sequences according to the semantics of each linear execution sequence may be: first, an instruction set is generated according to the semantics of each linear execution sequence; the instruction set comprises a plurality of instructions, and each of these instructions is used to instruct the deep learning framework to insert at least one operator between the plurality of linear execution sequences.
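A minimal sketch of such an instruction set, with an assumed dictionary format (the real instruction encoding is not specified here), might look as follows.

```python
# Sketch: the framework emits an instruction set, and each instruction
# indicates where an operator is to be inserted. The dict-based format is
# an illustrative assumption.
instructions = [
    {"op": "insert", "operator": "switch", "after": "test_if"},
    {"op": "insert", "operator": "assign", "after": "true_graph"},
    {"op": "insert", "operator": "assign", "after": "false_graph"},
]

placed = {}
for ins in instructions:
    if ins["op"] == "insert":
        # Record which operator follows which linear execution sequence.
        placed.setdefault(ins["after"], []).append(ins["operator"])

print(placed)   # {'test_if': ['switch'], 'true_graph': ['assign'], ...}
```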
It should be further noted that, in some embodiments of the present application, according to the semantic rules of the source code, the semantics contained in each linear execution sequence may differ, and the type of operator to insert after each sequence then differs as well, including but not limited to the following ways:
A. Insert a control operator according to the semantic rules of the source code.
Take the case above in which, each time a linear execution sequence has been simulated and executed, it is determined whether and which type of operator needs to be inserted after it. If the deep learning framework determines, according to the semantic rules of the source code, that the sequence whose simulated execution has just completed includes control semantics, a control operator (which may be referred to as a first control operator) needs to be inserted. The control operator is used to jump from that sequence to one or more other linear execution sequences; which other sequence or sequences to jump to is determined by the pointers of the respective linear execution sequences.
It should be noted here that, although a control operator is always inserted after a linear execution sequence that includes control semantics, the specific way the inserted operator executes differs slightly with the control semantics. The following takes if statements and while statements as respective examples:
a. When the control semantics are conditional judgment semantics.
Taking an if statement with conditional judgment semantics as an example: after the deep learning framework has simulated and executed the current linear execution sequence (whose control semantics are those of the if statement), a control operator corresponding to the if statement is inserted after the current sequence, and execution then jumps to the several other linear execution sequences that point to the respective branches and are connected to the current sequence through the control operator.
For ease of understanding, the following examples are given for illustration: as shown in fig. 4, first, the deep learning framework traverses all paths of the entire control flow graph compiled from a source code, and when a control node is encountered, the deep learning framework is segmented to segment the control flow graph into sub-graphs (also referred to as a node series), and the sub-graphs are compiled again to obtain linear execution sequences that can be executed by a device, and simultaneously, a graph number and an ID number are generated for each linear execution sequence, where the graph number is used for the deep learning framework to identify each linear executable sequence, and the ID number is used for the device to identify each linear executable sequence, and the deep learning framework generates an instruction set while segmenting and compiling the entire control flow graph, and each instruction in the instruction set is used for the deep learning framework to insert a corresponding control operator when each linear execution sequence is simulated in sequence, and when a control semantic is an if statement, generates a linear execution sequence that can execute correct branches after being executed after the linear execution sequence (e.g., fig. 4) True graph in (1) and a linear execution sequence of an erroneous branch (e.g., false graph in fig. 4) (which may also be an instruction that generates a linear execution sequence of an erroneous branch and a linear execution sequence of a correct branch that can be executed sequentially, even simultaneously, and is not limited herein). Specifically, the process of simulation execution may be: firstly simulating and executing a cond node, firstly setting a true execution true graph to obtain a graph number of the true graph, returning a switch instruction, executing the cond node (the cond node represents that the graph number is compiled in a control flow graph when a conditional statement such as < … … > exists), setting the cond node to be a false execution false graph to obtain a graph number of the false graph, using the cond node as a connection, storing the graph number of a linear execution sequence of the cond corresponding to the cond node, the graph number of the true graph and the graph number of the true graph, executing a switch ending instruction after the false graph is executed, calling an interface, inserting a control operator (such as the switch operator), wherein the conditional judgment semantic generally has two branches of judgment results (such as the true graph and the false graph) because the if statement belongs to one of statements with conditional judgment semantic, and the switch operator includes the linear execution sequence of the joined c-c series of the execution nodes, and then the switch operator includes the linear judgment result of the linear execution true graph and the linear execution node after the linear execution sequence of the cond is inserted into the linear execution sequence, and the linear execution result of the switch graph number of the join c series of the join nodes, true graph, false graph. Similarly, according to the execution principle described above, all linear execution sequences with control semantics are connected into a large execution sequence with multi-stream concatenation finally through a switch operator.
To further facilitate understanding, another example: as shown in fig. 5, a control flow graph whose control semantics come from an if statement is compiled into four subgraphs, i.e., a switch subgraph, a true-branch subgraph, a false-branch subgraph, and an after subgraph, and the four subgraphs are recompiled into four linear executable sequences, i.e., the test_if graph, true graph, false graph, and after graph shown in fig. 5. After each subgraph is recompiled, an instruction set for simulating and executing the linear execution sequences is generated, and each sequence is simulated and executed in order according to it. Taking fig. 5 as an example and assuming the after graph is fused into the true graph, the generated switch instructions are: switch(cond)=true -> exec true-graph -> return output -> switch return (the switch process returns to the original function call stack) -> switch(cond)=false -> call false-graph + exec false-graph -> set switch graph (according to the current cond, connect the three linear execution sequences belonging to it). The deep learning framework then executes the switch instructions in this order; when the false graph is called, an input (the address of the input data) is set for the current graph and an assignment operator (also called an assign operator) is inserted after the input node. After the false graph finishes executing, the obtained cond graph and true graph are connected, and once the simulated execution of all linear execution sequences is complete, the final large execution sequence (i.e., the computation graph) formed by connecting the linear execution sequences, executable on the device (e.g., an AI chip), is obtained. This large execution sequence may be as shown in fig. 6; at this point it can be directly recognized by the device, and the device identifies the execution order among the sequences by the ID numbers of the respective linear execution sequences.
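The connection established by the switch operator can be sketched as follows; the dictionary structure, graph numbers, and function name are assumptions chosen to mirror the test_if/true/false naming of fig. 5, not the framework's internal representation.

```python
# Sketch: a switch operator inserted after the condition sequence records
# the graph numbers of the condition, true, and false sequences, so the
# device can jump to the branch selected at run time. Illustrative only.

def make_switch(cond_seq, true_seq, false_seq):
    return {"op": "switch",
            "cond": cond_seq["graph_no"],
            "true": true_seq["graph_no"],
            "false": false_seq["graph_no"]}

test_if = {"graph_no": 0, "nodes": ["cond"]}
true_g  = {"graph_no": 1, "nodes": ["t1", "t2"]}
false_g = {"graph_no": 2, "nodes": ["f1"]}

# Insert the switch operator after the condition sequence.
test_if["nodes"].append(make_switch(test_if, true_g, false_g))
print(test_if["nodes"][-1])
# {'op': 'switch', 'cond': 0, 'true': 1, 'false': 2}
```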
b. When the control semantics are loop semantics.
Taking a while statement with loop semantics as an example: after the deep learning framework has simulated and executed the current linear execution sequence (whose control semantics are those of the while statement), a control operator corresponding to the while statement is inserted after the current sequence; execution then jumps to the other linear execution sequence with loop semantics that is connected to the current sequence through the control operator, and a jump operator (also called a second control operator or an active operator) is inserted after that loop-semantics sequence. The active operator is used to execute the while-semantics sequence and the loop-semantics sequence in a loop until the output data of the loop-semantics sequence meets a preset value; once the preset value indicates that the loop should no longer execute, loop execution stops.
For ease of understanding, an example: as shown in fig. 7, a control flow graph whose control semantics come from a while statement is compiled into four subgraphs during compilation. The while conditional-judgment statement generates a switch node; the subgraph in which the switch node is located is recompiled into the header graph, the subgraph containing the while body statement nodes is recompiled into the body graph, and the subgraph containing the after statement nodes is recompiled into the after graph. As in the if-statement flow, after graph splitting all subgraphs are compiled into isolated linear execution sequences that can be executed on the device, i.e., the test_while graph, header graph, body graph, and after graph shown in fig. 7. After each subgraph is recompiled, an instruction set for simulating and executing the linear execution sequences is generated, and each sequence is simulated and executed in order according to it. Taking fig. 7 as an example, the switch instructions generated for if and while differ in how they return to the original call stack during simulated execution: the if case returns with return, whereas the while case needs to resume the header graph after the body graph. An active operator is therefore inserted in the switch-instruction execution function to connect the header graph and the body graph, and an active operator is inserted at the last node of the body graph, achieving the loop effect.
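The loop connection can likewise be sketched with a toy interpreter; the header/body/after naming follows fig. 7, while the operator dictionaries and the interpreter itself are assumptions for illustration.

```python
# Sketch: a switch operator after the header (condition) sequence chooses
# between body and after; an active operator at the end of the body jumps
# back to the header, producing the loop. Illustrative structures only.
header = {"id": "header", "nodes": ["i < 3"]}
body   = {"id": "body",   "nodes": ["i = i + 1"]}
after  = {"id": "after",  "nodes": ["use(i)"]}

header["nodes"].append({"op": "switch", "true": "body", "false": "after"})
body["nodes"].append({"op": "active", "target": "header"})   # loop back

# Toy interpreter for the connected sequences.
i, at = 0, "header"
while at != "after":
    if at == "header":
        at = "body" if i < 3 else "after"   # effect of the switch operator
    else:                                   # at == "body"
        i += 1                              # body node
        at = "header"                       # effect of the active operator
print(i)   # 3
```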
It should be noted that, in some embodiments of the present application, if there is a nesting relationship and if statements or while statements are encountered within it, compilation and execution proceed as described above, and all isolated linear execution sequences are finally connected into one large execution sequence, which can be directly recognized by the device as shown in fig. 8; the device identifies the execution order among the sequences by the ID numbers of the respective linear execution sequences.
At present, the user determines the complete neural network structure expressed by the whole source code by manually analyzing the logical relations inside the source code. For example, to realize the control semantics of a while statement, taking MindSpore as the deep learning framework, the while statement formerly had to be written under MindSpore with the control logic expanded by hand, as illustrated hypothetically below.
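The original MindSpore listing is not reproduced in this text; purely as a hypothetical illustration of what manually expressed loop control looks like, consider the following sketch (all names invented).

```python
# Hypothetical illustration only (the original listing is not reproduced):
# without native while support, the user expresses the loop control by hand,
# e.g. with an explicit bounded iteration and a manual condition check.
def manual_while(x, limit, step, max_iters=100):
    for _ in range(max_iters):   # bounded stand-in for the loop
        if not (x < limit):      # condition, written out by hand
            break
        x = step(x)              # loop body, written out by hand
    return x

print(manual_while(0, 3, lambda v: v + 1))   # 3
```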
by the method for processing the control flow graph, a user can directly write according to semantic rules of source codes under mindspore, and written if statements and while statements are respectively as follows:
B. Insert an assignment operator according to the semantic rules of the source code.
Still taking the case in which, each time a linear execution sequence has been simulated and executed, it is determined whether and which type of operator needs to be inserted after it: if there is a data dependency between one linear execution sequence and another, for example if the output data of linear execution sequence a1 are the input data of linear execution sequence a2, an assignment operator (also called an assign operator) needs to be inserted after sequence a1; the assignment operator assigns the address of the output data of sequence a1 to the address of the input data of sequence a2. Specifically, when the control flow graph is split, the data flowing between the resulting subgraphs is connected through parameter nodes, and the output of one subgraph may be the input of other subgraphs; when an exec instruction is executed in simulation, an assignment operator is inserted at the output, connecting the data dependency between subgraphs. The specific flow is as follows: during simulated execution, where the main graph calls a subgraph, the output node of the previous graph and the parameter node in the subgraph are set together, set_input is called to insert an assign operator after the node of the previous graph, and the output node is assigned to the parameter node of the called subgraph, completing the connection between the graphs. After the subgraph connection is complete, an output is set for the global execution result; if a switch graph is executed last, an assign operator is added at the last node of both the true graph and the false graph, assigning the value of the output data to that output. The finally connected large execution sequence is shown in fig. 9; it can be directly recognized by the device, and the device identifies the execution order among the sequences by the ID numbers of the respective linear execution sequences.
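The effect of the assign operator can be sketched with a toy buffer model; the Buffer class and assign function are assumptions standing in for device memory addresses and the framework's inserted operator.

```python
# Sketch: the assign operator binds the address of one sequence's output
# to the address of another sequence's input. Buffer is a toy stand-in
# for a device memory address holding a tensor.
class Buffer:
    def __init__(self, value=None):
        self.value = value

def assign(dst: Buffer, src: Buffer) -> None:
    # The inserted assign operator passes the producer's output into the
    # consumer's input slot (here by copying the reference).
    dst.value = src.value

a1_output = Buffer(value=[1.0, 2.0])   # produced by linear sequence a1
a2_input  = Buffer()                   # parameter node of sequence a2

assign(a2_input, a1_output)            # operator inserted after a1
print(a2_input.value)                  # [1.0, 2.0]
```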
It should be noted that, in the embodiments of the present application, the control operator and the assignment operator are inserted between the linear execution sequences according to the semantics of the source code without conflict. That is, a control operator may be inserted only after a linear execution sequence that includes control semantics, and if the output data of that sequence must also serve as the input data of other sequences, an assignment operator may additionally be inserted after it; if the current linear execution sequence does not include control semantics but its output data must serve as the input data of other sequences, only an assignment operator needs to be inserted after it. The specific details are not limited here.
205. Obtain the computation graph formed by connecting the plurality of linear execution sequences.
The individually isolated linear execution sequences are connected, in the manner described above, into a computation graph that the device (e.g., an AI chip) can directly recognize.
It should be noted that the obtained computation graph may be sent to the device. Specifically, the computation graph may be fetched from the deep learning framework when the device needs to execute the data, or it may be sent to the device right after the framework connects the plurality of linear execution sequences into one computation graph; the manner in which the computation graph reaches the device is not limited here.
In the above embodiment of the present application, source code is compiled through the deep learning framework to obtain a control flow graph; the control flow graph is then split and the subgraphs recompiled to obtain a plurality of device-supported linear execution sequences (with no control dependency among them); finally, each linear execution sequence is simulated and executed on the framework, the semantics corresponding to each sequence are identified during the simulated execution, and operators are inserted between the sequences according to those semantics, so that the sequences are connected into a computation graph the device can directly recognize, the computation graph representing a complete neural network structure. This processing method avoids manually analyzing the internal logical relations of the source code to determine the neural network structure the source code expresses; it is not prone to error, is convenient and fast, and improves the processing efficiency of the device.
On the basis of the embodiment corresponding to fig. 2, in order to better implement the above solution of the embodiments of the present application, a deep learning framework for implementing the solution is provided below. Referring to fig. 10, fig. 10 is a schematic structural diagram of a deep learning framework according to an embodiment of the present application. The deep learning framework includes: a first compiling module 1001, a segmentation module 1002, a second compiling module 1003, and a simulation execution module 1004. The first compiling module 1001 is configured to compile source code to obtain a control flow graph corresponding to the source code; the segmentation module 1002 is configured to split the control flow graph to obtain a plurality of subgraphs; the second compiling module 1003 is configured to recompile the plurality of subgraphs to obtain a plurality of linear execution sequences, the plurality of linear execution sequences being execution sequences without control dependency; the simulation execution module 1004 is configured to simulate and execute the plurality of linear execution sequences in order, identify the semantics corresponding to the sequences during the simulated execution, and insert at least one operator between the sequences according to those semantics so as to connect the plurality of linear execution sequences into a computation graph.
In one possible design, the simulation execution module 1004 is specifically configured to: insert an assignment operator after a first linear execution sequence, the assignment operator being used to assign the address of the output data of the first linear execution sequence to the address of the input data of a second linear execution sequence, where the first linear execution sequence and the second linear execution sequence both belong to the plurality of linear execution sequences.
In one possible design, the simulation execution module 1004 is further specifically configured to: insert a first control operator after a first linear execution sequence, where the first linear execution sequence is a linear execution sequence including control semantics among the plurality of linear execution sequences, the first control operator is used to jump from the first linear execution sequence to a second linear execution sequence, and the second linear execution sequence is one of the plurality of linear execution sequences other than the first linear execution sequence.
In one possible design, the control semantics are conditional judgment semantics (e.g., an if statement with conditional judgment semantics), and the simulation execution module 1004 is further specifically configured to: jump, through the first control operator, to a plurality of second linear execution sequences, the plurality of second linear execution sequences being the branches of the several judgment results corresponding to the conditional judgment semantics.
In one possible design, the control semantics are loop semantics (e.g., a while statement with loop semantics), and after inserting the first control operator after the first linear execution sequence, the simulation execution module 1004 is further specifically configured to: jump, through the first control operator, to the second linear execution sequence connected to the first linear execution sequence, the second linear execution sequence being the loop structure corresponding to the first linear execution sequence; and insert a second control operator after the second linear execution sequence, the second control operator being used to execute the first linear execution sequence and the second linear execution sequence in a loop until the output data of the second linear execution sequence meets a preset value.
In one possible design, the simulation execution module 1004 is further specifically configured to: after all of the plurality of linear execution sequences have been simulated and executed in sequence, insert the at least one operator among the plurality of linear execution sequences according to the semantics corresponding to the plurality of linear execution sequences; or, while a target linear execution sequence is being simulated and executed, insert an operator corresponding to the semantics of the target linear execution sequence after the target linear execution sequence, where the target linear execution sequence is the linear execution sequence among the plurality of linear execution sequences that is currently being simulated and executed.
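The two insertion timings can be contrasted with a minimal sketch; simulate() and make_operator() are hypothetical placeholders standing in for semantic identification and operator construction:

```python
def simulate(seq):
    return "semantics_of_" + seq   # identify the sequence's semantics

def make_operator(sem):
    return "op_for_" + sem         # build the operator to insert

def insert_after_all(sequences):
    # Option 1: simulate every sequence first, then insert all operators.
    semantics = [simulate(s) for s in sequences]
    graph = []
    for seq, sem in zip(sequences, semantics):
        graph += [seq, make_operator(sem)]
    return graph

def insert_eagerly(sequences):
    # Option 2: insert each operator as soon as its sequence is simulated.
    graph = []
    for seq in sequences:
        graph += [seq, make_operator(simulate(seq))]
    return graph
```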
In one possible design, the simulation execution module 1004 is further specifically configured to: generate an instruction set according to the semantics corresponding to the plurality of linear execution sequences, where the instruction set includes a plurality of instructions used to indicate that the at least one operator is inserted between the plurality of linear execution sequences.
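As a minimal sketch of such an instruction set (the dict-based instruction format and the semantics labels are illustrative assumptions), each instruction can simply record which kind of operator to insert between which pair of adjacent sequences:

```python
def build_instruction_set(sequences, semantics):
    """Hypothetical instruction format: one insertion directive per gap."""
    instructions = []
    for i in range(len(sequences) - 1):
        kind = "control" if semantics[i] in ("loop", "conditional") else "assign"
        instructions.append({"insert": kind,
                             "after": sequences[i],
                             "before": sequences[i + 1]})
    return instructions

print(build_instruction_set(["seq1", "seq2", "seq3"],
                            ["conditional", "linear", "linear"]))
```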
In a possible design, if the deep learning framework is located at a first node, the simulation execution module 1004 is further specifically configured to send the computation graph to a second node, where the first node and the second node may be located in the same device or in different devices, which is not limited herein.
In one possible design, the deep learning framework includes: MindSpore, TensorFlow, PyTorch, MXNet, Caffe, or Theano.
It should be noted that the information interaction, execution processes, and the like between the modules/units of the deep learning framework provided in fig. 10 are based on the same concept as the method embodiment corresponding to fig. 2 in the present application; for specific contents, reference may be made to the description of the foregoing method embodiment, and details are not described herein again.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an execution device provided in an embodiment of the present application. For convenience of description, only the portions related to the embodiment of the present application are shown; for specific details that are not disclosed, reference may be made to the method portion of the embodiments of the present application. The execution device 1100 may be deployed with the corresponding modules of the deep learning framework described in the embodiment corresponding to fig. 10, and is configured to implement the functions of the deep learning framework in that embodiment. Specifically, the execution device 1100 is implemented by one or more servers and may vary considerably depending on configuration or performance; it may include one or more processors 1122, a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing an application program 1142 or data 1144. The processor 1122 may be a central processing unit (CPU), an embedded microcontroller, an AI processor, or the like; the type of the processor 1122 is not limited herein. The memory 1132 and the storage medium 1130 may be transient storage or persistent storage. The program stored in the storage medium 1130 may include one or more modules (not shown), and each module may include a series of instruction operations on the execution device. Still further, the processor 1122 may be configured to communicate with the storage medium 1130 and execute, on the execution device 1100, the series of instruction operations in the storage medium 1130. For example, the processor 1122 may invoke the series of instruction operations in the storage medium 1130 to perform the following steps: compiling a source code to obtain a control flow graph corresponding to the source code; segmenting the obtained control flow graph to obtain a plurality of segmented subgraphs; recompiling the obtained subgraphs to obtain a plurality of linear execution sequences; and sequentially simulating and executing the plurality of linear execution sequences, and inserting at least one operator between the plurality of linear execution sequences according to the semantics corresponding to the plurality of linear execution sequences, so as to connect the plurality of linear execution sequences into a computation graph.
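For illustration only, these steps can be condensed into a single call chain; the sketch below is self-contained, and every function body is a hypothetical placeholder rather than the claimed implementation:

```python
def compile_source(src):            # source code -> control flow graph
    return {"cfg": src}

def segment(cfg):                   # control flow graph -> subgraphs
    return [cfg]

def recompile(subgraphs):           # subgraphs -> linear execution sequences
    return ["seq_%d" % i for i, _ in enumerate(subgraphs)]

def simulate_and_connect(seqs):     # simulate, identify semantics, insert operators
    graph = []
    for s in seqs:
        graph += [s, "op_after_" + s]
    return graph

computation_graph = simulate_and_connect(
    recompile(segment(compile_source("y = x + 1"))))
print(computation_graph)            # -> ['seq_0', 'op_after_seq_0']
```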
The execution device 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
In this embodiment of the present application, the steps performed by the deep learning framework in the foregoing processing methods of the control flow graph may be implemented, based on the structure of the execution device shown in fig. 11, by the processor invoking the relevant code stored in the storage medium; details are not described herein again.
It should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationships between modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may be various, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, implementation by a software program is preferable in most cases. Based on such an understanding, the technical solutions of the present application may be embodied substantially in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to perform the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used for the implementation, the implementation may take the form of a computer program product, in whole or in part.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.