Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an industrial robot debugging method based on the combination of natural language and computer vision.
The general idea of the method of the invention is:
To address the shortcomings of traditional neural-network-based debugging methods, the application adds the environmental features around the robot to the neural network and, at the same time, converts the natural language describing the robot code into semantic features that are fed into the network to strengthen robustness. The application is mainly divided into five parts: 1) the natural language description is passed through a word2vec network and a linear layer to generate semantic information (Semantic Information) representing the language semantics, and a linear network then produces semantic vectors of a specified dimension; 2) the features of the photographed industrial robot environment are extracted by a three-dimensional cyclic convolutional neural network (3D-RCNN) model; 3) the features obtained in 2) are input into a long-short-term memory network (LSTM) encoder to generate intermediate contexts, the GRU network is initialized with the semantic vector generated in 1), the intermediate contexts are used as the input of the GRU network, and the GRU output is passed through a recurrent neural network (RNN) to obtain API recommendations; 4) the text embedding generated from the natural language description by the word2vec network is fed sequentially into a long-short-term memory network encoder and a long-short-term memory network decoder to output an Abstract Syntax Tree (AST) construction action sequence, and the API recommendations and the construction action sequence are combined to generate the industrial robot debugging code; 5) the debugging code of step 4) is added to the task module of the robot program editor, and the user finally processes the debugging code to complete the debugging of the industrial robot. The specific implementation steps are as follows:
Step (1), generating semantic information;
Step (2), environmental feature extraction;
Step (3), API recommendation generation;
Step (4), target code programming of the industrial robot;
Step (5), completing robot debugging.
It is a further object of the present invention to provide a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method described above.
The invention has the beneficial effects of solving the problems of low robot debugging efficiency and the long period required to bring robot programs online in actual industrial environments. The main innovations of the industrial robot debugging method based on the combination of natural language and computer vision provided by the invention are that: 1) the debugging of the robot is completed with a neural-network-based method; 2) image feature extraction based on computer vision is combined with target code generation based on natural language; 3) image features in the industrial environment are extracted with a three-dimensional cyclic convolution network; 4) API recommendations are generated with a neural network enriched with semantic information; and 5) the debugging work of the robot is completed by combining an encoder-decoder network with the API recommendations.
The invention does not require extensive robot programming knowledge: the user can accomplish a task simply by entering the natural language description from which the robot code is generated, which provides a feasible debugging method for inexperienced novices and serves as an auxiliary tool for code engineers. The invention uses an advanced neural-network-based robot debugging technique and incorporates the industrial field environment into the neural network; considering that existing neural networks do not incorporate the environmental characteristics and actual scenes of the industrial environment, this addresses the high time consumption and low accuracy of robot debugging in industrial environments. The invention can effectively improve the development efficiency of robot debugging and reduce the deployment time of robot production lines in an industrial environment.
Detailed Description
The invention will be further described with reference to specific examples.
An industrial robot debugging method based on the combination of natural language and computer vision, as shown in fig. 1, comprises the following steps:
Step (1), generating semantic information
1-1, the natural language description describing the robot action code is input into a word2vec network to generate text embeddings, specifically as follows:
Generating a text embedding vector matrix E = {e_i | i = 1, 2, …, n} ∈ R^(L×C) from a natural language instruction X = {x_i | i = 1, 2, …, n} through a word2vec network;
E=word2vec(X) (1)
wherein x_i represents the i-th natural language word, e_i represents the i-th text embedding vector, word2vec() represents the word2vec network function, L is the number of text embeddings, and C is the embedding dimension;
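Purely by way of illustration, the following sketch shows how the text embedding matrix E of equation (1) could be produced. It assumes the gensim Word2Vec implementation; the toy corpus, the instruction and the dimension C are illustrative assumptions and not part of the claimed method.

```python
# Illustrative sketch of step 1-1 (equation (1)), assuming gensim's Word2Vec.
import numpy as np
from gensim.models import Word2Vec

C = 64  # embedding dimension (assumed value)

# Toy training corpus of natural language instructions describing robot actions.
corpus = [
    ["move", "the", "gripper", "to", "the", "home", "position"],
    ["pick", "up", "the", "workpiece", "and", "place", "it", "on", "the", "conveyor"],
]
w2v = Word2Vec(sentences=corpus, vector_size=C, window=3, min_count=1, epochs=50)

# Natural language instruction X = {x_1, ..., x_n}.
X = ["move", "the", "gripper", "to", "the", "conveyor"]

# Text embedding matrix E in R^(L x C): one row e_i per word x_i.
E = np.stack([w2v.wv[x] for x in X])
print(E.shape)  # (L, C) = (6, 64)
```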
1-2, using the text embedding vector matrix generated in step 1-1, semantic information is generated through two cascaded linear layers A and then input into a linear layer B, which outputs semantic vectors of a specified dimension, specifically as follows:
1-2-1 converting the text embedded vector matrix E generated in the step 1-1 into a K-dimensional vector representation I, wherein the K dimension is L multiplied by C;
K=L×C (2)
1-2-2 using vector representation I as input to two cascaded linear layers A to obtain semantic information S;
S = W_2σ(W_1I + b_1) + b_2 (3)
wherein W_1, W_2, b_1, b_2 are the trainable weights and biases of the linear functions of the two cascaded linear layers A, respectively, and σ is the ReLU activation function;
1-2-3 semantic information S is converted into semantic vectors of specified dimensions using a linear layer B.
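The following sketch illustrates steps 1-2-1 to 1-2-3 (equations (2)-(3)). It assumes PyTorch; the hidden sizes and the semantic-vector dimension D are illustrative assumptions, since the patent does not fix them.

```python
# Illustrative sketch of the cascaded linear layers A and linear layer B.
import torch
import torch.nn as nn

L, C, D = 6, 64, 128          # number of embeddings, embedding dim, semantic-vector dim (assumed)
K = L * C                     # equation (2): flattened dimension

E = torch.randn(L, C)         # stand-in for the word2vec embedding matrix of step 1-1
I = E.reshape(K)              # step 1-2-1: K-dimensional vector representation I

# Two cascaded linear layers A (equation (3)): S = W_2 * ReLU(W_1 * I + b_1) + b_2.
linear_A = nn.Sequential(nn.Linear(K, 256), nn.ReLU(), nn.Linear(256, 256))
S = linear_A(I)               # semantic information S

# Linear layer B: map S to a semantic vector of the specified dimension D.
linear_B = nn.Linear(256, D)
semantic_vector = linear_B(S)
print(semantic_vector.shape)  # torch.Size([128])
```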
Step (2) environmental feature extraction
The method takes image data of the industrial environment in which the industrial robot is located as input to the three-dimensional cyclic convolution network, which outputs the visual features of the environment, as follows:
Inputting an industrial robot site environment image Q into a three-dimensional cyclic convolution network to generate an environment visual feature f;
f=3D-RCNN(Q) (4)
where f is the visual feature representation of the image environment and 3D-RCNN() is the three-dimensional cyclic convolution network function.
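The patent does not detail the internal architecture of the 3D-RCNN. The sketch below merely illustrates extracting a visual feature f from an environment clip Q (equation (4)), using torchvision's r3d_18 3D ResNet purely as a stand-in feature extractor; the input sizes are assumed.

```python
# Illustrative sketch of step (2) with a stand-in 3D feature extractor.
import torch
from torchvision.models.video import r3d_18

model = r3d_18(weights=None)
model.fc = torch.nn.Identity()   # strip the classifier to expose 512-d features

# Q: a clip of the industrial robot's site environment,
# shape (batch, channels, frames, height, width).
Q = torch.randn(1, 3, 8, 112, 112)
with torch.no_grad():
    f = model(Q)                 # environment visual feature f
print(f.shape)                   # torch.Size([1, 512])
```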
Step (3), API recommendation generation
3-1, Inputting the environment visual characteristics generated in the step (2) into a long-short-term memory network encoder for encoding to generate an intermediate semantic vector, wherein the method comprises the following steps of:
taking the environment visual characteristic f as input of a long-short-term memory network to generate a hidden state vector g, and then converting the hidden state vector g into an intermediate semantic vector;
g_t = LSTM(f_t, g_{t-1}) (5)
wherein g_t is the hidden state vector at time t, g_{t-1} is the hidden state vector at time t-1, f_t is the environmental visual feature at time t, and LSTM() is the long-short-term memory network function;
3-2, initializing the GRU network with the semantic vector generated in step (1), converting the intermediate semantic vector generated in step 3-1 into a hidden state vector, and then inputting the hidden state vector into the initialized GRU network to generate the intermediate vector, specifically as follows:
3-2-1 initializing a GRU network according to the semantic vector generated in the step (1);
3-2-2 converting the intermediate semantic vector generated in the step 3-1 into a hidden state vector r, and using the hidden state vector r as input of the GRU network to generate an intermediate vector k;
k_t = GRU(r_t, k_{t-1}) (6)
where k_t is the intermediate vector at time t, GRU() is the GRU network function, k_{t-1} is the intermediate vector at time t-1, and r_t is the hidden state vector at time t;
3-3 generating an API recommendation list by using the intermediate vector k generated in the step 3-2 through a cyclic neural network, wherein the method comprises the following steps:
Taking the intermediate vector generated in the step 3-2 as the input of a cyclic neural network (RNN), and then obtaining the probability distribution of the API recommendation list through softmax layer normalization;
P=softmax(RNN(k)) (7)
where P is the probability distribution of the API recommendation list, softmax() is the normalized exponential function, and RNN() is the recurrent neural network function; the network is trained with the cross-entropy loss
Loss = -Σ_i y_i log(τ_i) (8)
where τ_i is the predicted probability of the i-th API recommendation output by the recurrent neural network, y_i is the ground-truth API recommendation, and Loss is the loss rate;
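The following sketch illustrates the data flow of step (3) (equations (5)-(8)) in PyTorch. The conversions between hidden state vectors and intermediate semantic vectors are simplified to identity mappings, and the API vocabulary size and all layer dimensions are illustrative assumptions.

```python
# Illustrative sketch of step (3): LSTM encoder -> GRU (initialized with the
# semantic vector) -> RNN -> softmax over an assumed API vocabulary.
import torch
import torch.nn as nn

T, F_DIM, H, D, N_API = 8, 512, 128, 128, 50   # time steps, feature dim, hidden dim, semantic dim, #APIs

f_seq = torch.randn(1, T, F_DIM)               # environment visual features f_1..f_T from step (2)
semantic_vector = torch.randn(1, D)            # semantic vector from step (1)

# 3-1: LSTM encoder over the visual features (equation (5)).
lstm_enc = nn.LSTM(F_DIM, H, batch_first=True)
g, _ = lstm_enc(f_seq)                         # hidden state vectors g_t

# 3-2: GRU whose initial state is the semantic vector (equation (6)).
gru = nn.GRU(H, D, batch_first=True)
k0 = semantic_vector.unsqueeze(0)              # initial GRU state
k, _ = gru(g, k0)                              # intermediate vectors k_t

# 3-3: RNN followed by softmax over the API vocabulary (equation (7)).
rnn = nn.RNN(D, D, batch_first=True)
out, _ = rnn(k)
api_head = nn.Linear(D, N_API)
logits = api_head(out[:, -1])
P = torch.softmax(logits, dim=-1)              # probability distribution of the API recommendation list

# Training would minimize the cross-entropy loss of equation (8), e.g.:
y = torch.tensor([3])                          # ground-truth API index (illustrative)
loss = nn.functional.cross_entropy(logits, y)
```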
Step (4), target code programming of the industrial robot
4-1, converting the natural language description describing the robot action code into a text embedding vector E by using the word2vec network according to formula (1);
4-2, using the text embedding vector E from step 4-1 as input to a long-short-term memory network encoder, and then feeding the encoder output into a long-short-term memory network decoder to generate the construction action sequence of the AST tree, specifically as follows:
4-2-1 taking the text embedded vector E as an input of a long and short term memory network encoder (LSTM-Encoder) to generate a hidden state vector h, and then converting the hidden state vector into an intermediate semantic vector;
h_t = LSTM_e(e_t, h_{t-1}) (9)
wherein h_t is the hidden state vector at time t, LSTM_e() represents the long-short-term memory network encoder function, e_t represents the embedding vector at time t, and h_{t-1} represents the hidden state vector at time t-1;
4-2-2 converting the intermediate semantic vector into a hidden state vector θ, and taking the hidden state vector θ as an input of a long-short-term memory network Decoder (LSTM-Decoder);
θ_t = LSTM_d([a_{t-1} : θ̃_{t-1} : β_t], θ_{t-1}) (10)
θ̃_t = tanh(W_c[h_t : θ_t]) (11)
wherein θ_t is the hidden state at time t, θ_{t-1} is the hidden state at time t-1, LSTM_d() represents the long-short-term memory network decoder function, [:] represents vector concatenation, a_{t-1} represents the AST tree construction action at time t-1, θ̃_{t-1} is the attention vector of the hidden state at time t-1, β_t is a vector containing the parent boundary-field information in the derivation process, θ̃_t is the attention vector of the hidden state at time t, h_t is the context vector, W_c is the connection layer function, and tanh() is the hyperbolic tangent function;
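As an illustration of equations (10)-(11), the sketch below computes one decoder step and its attention vector. It assumes PyTorch; the embedding of the previous action a_{t-1}, the boundary-field vector β_t and the context vector h_t are random placeholders with assumed dimensions.

```python
# Illustrative sketch of one LSTM-Decoder step (eq. (10)) and its attention vector (eq. (11)).
import torch
import torch.nn as nn

A_DIM, H = 64, 128                      # action-embedding dim and hidden dim (assumed)

decoder = nn.LSTMCell(A_DIM + H + H, H) # LSTM_d over the concatenation [a_{t-1} : attn_{t-1} : beta_t]
W_c = nn.Linear(2 * H, H, bias=False)   # connection layer of equation (11)

a_prev = torch.randn(1, A_DIM)          # embedding of the construction action a_{t-1}
attn_prev = torch.zeros(1, H)           # attention vector of the hidden state at time t-1
beta_t = torch.randn(1, H)              # parent boundary-field information vector
h_context = torch.randn(1, H)           # context vector h_t from the encoder

state_prev = (torch.zeros(1, H), torch.zeros(1, H))
theta_t, cell_t = decoder(torch.cat([a_prev, attn_prev, beta_t], dim=-1), state_prev)  # eq. (10)
attn_t = torch.tanh(W_c(torch.cat([h_context, theta_t], dim=-1)))                      # eq. (11)
```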
4-2-3, generating the construction action sequence of the AST tree, specifically:
p(a_t = ApplyConstr[c] | a_{<t}, x) = softmax(a_c^T W θ̃_t) (12)
wherein ApplyConstr[c] is one of the action types among the AST tree (abstract syntax tree) construction actions, which applies a construction operation c to a boundary field of the same type as c and can be used to fill a node; p(a_t = ApplyConstr[c] | a_{<t}, x) represents the probability of the action ApplyConstr[c] given the construction action information before time t and the natural language description; a_t is the AST tree construction action at time t, a_{<t} represents the AST tree construction action information before time t, x is a natural language word, softmax() is the normalized exponential function, a_c is the AST tree construction action of construction operation c, T denotes the vector transpose, W is the connection layer function, and θ̃_t is the attention vector of the hidden state at time t;
p(a_t = GenToken[v] | a_{<t}, x) = p(gen | a_t, x)p(v | gen, a_t, x) + p(copy | a_t, x)p(v | copy, a_t, x) (13)
wherein GenToken[v] is another action type among the AST tree construction actions, which fills an AST tree boundary field with a token v; p(a_t = GenToken[v] | a_{<t}, x) represents the probability of the action GenToken[v] given the construction action information before time t and the natural language description; a_t is the AST tree construction action at time t, a_{<t} represents the AST tree construction action information before time t, x is a natural language word, gen is the generation operation, and copy is the copy operation;
Finally, all construction actions of the AST tree are obtained through formulas (12)-(13), yielding the construction action sequence of the AST tree, as shown in figure 2;
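The sketch below illustrates how the action probabilities of equations (12)-(13) could be computed; the construction-operation embeddings, the number of operations and the gen/copy probabilities are illustrative placeholders, not values prescribed by the method.

```python
# Illustrative sketch of equations (12)-(13): scoring AST construction actions.
import torch
import torch.nn as nn

H, N_CONSTR = 128, 20                     # hidden dim and number of construction operations (assumed)
W = nn.Linear(H, H, bias=False)           # connection layer W
a_c = torch.randn(N_CONSTR, H)            # embeddings standing in for construction operations c
attn_t = torch.randn(1, H)                # attention vector of the hidden state at time t

scores = a_c @ W(attn_t).squeeze(0)       # a_c^T * W * attn_t for every operation c
p_apply = torch.softmax(scores, dim=-1)   # p(a_t = ApplyConstr[c] | a_<t, x), equation (12)

# Equation (13): GenToken[v] mixes a generation and a copy distribution.
p_gen, p_copy = 0.7, 0.3                  # p(gen|a_t,x), p(copy|a_t,x) (illustrative values)
p_v_gen, p_v_copy = 0.05, 0.40            # p(v|gen,a_t,x), p(v|copy,a_t,x) (illustrative values)
p_gentoken_v = p_gen * p_v_gen + p_copy * p_v_copy
```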
4-2-4, combining the API recommendation list generated in the step (3) and the construction action sequence of the AST tree generated in the step 4-2-3 to output a final robot code, wherein the method specifically comprises the following steps:
sorting according to the probability distribution of the API recommendation list generated in step (3) to obtain the API recommendation with the maximum probability, embedding that API into the construction action sequence of the AST tree generated in step 4-2-3 to obtain the optimized construction action sequence, generating an Abstract Syntax Tree (AST) from the optimized construction action sequence, and converting the abstract syntax tree into the debugging code of the industrial robot in the current industrial environment through a conversion function, as shown in figure 3;
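As a simplified illustration of step 4-2-4, the snippet below selects the API recommendation with the maximum probability and embeds it into a toy construction action sequence; the API names, probabilities and action strings are hypothetical placeholders and do not represent the claimed conversion function.

```python
# Illustrative sketch of step 4-2-4: combine the API recommendation with the action sequence.
import torch

api_vocab = ["MoveJ", "MoveL", "SetDO", "WaitTime"]          # assumed API vocabulary
P = torch.tensor([0.1, 0.6, 0.2, 0.1])                       # probability distribution from step (3)
best_api = api_vocab[int(torch.argmax(P))]                   # API recommendation with maximum probability

action_sequence = ["ApplyConstr[Call]", "GenToken[<api>]", "GenToken[target_pose]"]
optimized_sequence = [a.replace("<api>", best_api) for a in action_sequence]
print(optimized_sequence)  # optimized AST construction action sequence
```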
Step (5), completing robot debugging
And inputting the debugging codes into a task module of the robot program editor to finish the debugging of the industrial robot.