Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an industrial robot debugging method based on the combination of natural language and computer vision.
The general idea of the method of the invention is:
To address the shortcomings of traditional neural-network-based debugging methods, the application adds the environmental features around the robot to the neural network and, at the same time, converts the natural language describing the robot code into semantic features that are fed into the network to strengthen robustness. The application is mainly divided into five parts: 1) the natural language description is passed through a word2vec network and a linear layer to generate semantic information (Semantic Information) representing the language semantics, and a linear network then produces semantic vectors of a specified dimension; 2) the features of the photographed industrial robot environment are extracted by a three-dimensional cyclic convolutional neural network (3D-RCNN) model; 3) the features obtained in 2) are input into a long-short-term memory network (LSTM) encoder to generate intermediate contexts, the GRU network is initialized with the semantic vector generated in 1), the intermediate contexts are used as the input of the GRU network, and the GRU output is passed through a recurrent neural network (RNN) to obtain API recommendations; 4) the text embedding generated from the natural language description by the word2vec network is fed sequentially into a long-short-term memory network encoder and a long-short-term memory network decoder to output an Abstract Syntax Tree (AST) construction action sequence, and the API recommendations and the construction action sequence are combined to generate the industrial robot debugging code; 5) the debugging code of step 4) is added to the task module of the robot program editor, and the user finally processes the debugging code to complete the debugging of the industrial robot. The specific implementation steps are as follows:
Step (1), generating semantic information;
Step (2), environmental feature extraction;
Step (3), API recommendation generation;
Step (4), target code programming of the industrial robot;
Step (5), completing robot debugging.
It is a further object of the present invention to provide a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method described above.
The invention has the beneficial effects of solving the problems of low robot debugging efficiency and the long period required to bring robot programs online in actual industrial environments. The main innovations of the industrial robot debugging method based on the combination of natural language and computer vision provided by the invention are that: 1) the debugging of the robot is completed with a neural-network-based method; 2) image feature extraction based on computer vision is combined with target code generation based on natural language; 3) image features in the industrial environment are extracted with a three-dimensional cyclic convolution network; 4) API recommendations are generated with a neural network enriched with semantic information; and 5) the debugging work of the robot is completed by combining an encoder-decoder network with the API recommendations.
The invention does not require extensive robot programming knowledge: the user can accomplish a task simply by entering the natural language description from which the robot code is generated, which provides a feasible debugging method for inexperienced novices and serves as an auxiliary tool for code engineers. The invention uses an advanced neural-network-based robot debugging technique and incorporates the industrial field environment into the neural network; considering that existing neural networks do not incorporate the environmental characteristics and actual scenes of the industrial environment, this addresses the high time consumption and low accuracy of robot debugging in industrial environments. The invention can effectively improve the development efficiency of robot debugging and reduce the deployment time of robot production lines in an industrial environment.
Detailed Description
The invention will be further described with reference to specific examples.
An industrial robot debugging method based on the combination of natural language and computer vision, as shown in fig. 1, comprises the following steps:
Step (1), generating semantic information
1-1, the natural language description describing the robot action code is input into a word2vec network to generate text embeddings, specifically as follows:
Generating a text embedding vector matrix E = {e_i | i = 1, 2, …, n} ∈ R^(L×C) from a natural language instruction X = {x_i | i = 1, 2, …, n} through a word2vec network;
E=word2vec(X) (1)
wherein x_i represents the i-th natural language word, e_i represents the i-th text embedding vector, word2vec() represents the word2vec network function, L is the number of text embeddings, and C is the embedding dimension;
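Purely by way of illustration, the following sketch shows how the text embedding matrix E of equation (1) could be produced. It assumes the gensim Word2Vec implementation; the toy corpus, the instruction and the dimension C are illustrative assumptions and not part of the claimed method.

```python
# Illustrative sketch of step 1-1 (equation (1)), assuming gensim's Word2Vec.
import numpy as np
from gensim.models import Word2Vec

C = 64  # embedding dimension (assumed value)

# Toy training corpus of natural language instructions describing robot actions.
corpus = [
    ["move", "the", "gripper", "to", "the", "home", "position"],
    ["pick", "up", "the", "workpiece", "and", "place", "it", "on", "the", "conveyor"],
]
w2v = Word2Vec(sentences=corpus, vector_size=C, window=3, min_count=1, epochs=50)

# Natural language instruction X = {x_1, ..., x_n}.
X = ["move", "the", "gripper", "to", "the", "conveyor"]

# Text embedding matrix E in R^(L x C): one row e_i per word x_i.
E = np.stack([w2v.wv[x] for x in X])
print(E.shape)  # (L, C) = (6, 64)
```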
1-2, using the text embedding vector matrix generated in step 1-1, semantic information is generated through two cascaded linear layers A and then input into a linear layer B, which outputs semantic vectors of a specified dimension, specifically as follows:
1-2-1 converting the text embedded vector matrix E generated in the step 1-1 into a K-dimensional vector representation I, wherein the K dimension is L multiplied by C;
K=L×C (2)
1-2-2 using vector representation I as input to two cascaded linear layers A to obtain semantic information S;
S = W_2σ(W_1I + b_1) + b_2 (3)
wherein W_1, W_2, b_1, b_2 are the trainable weights and biases of the linear functions of the two cascaded linear layers A, respectively, and σ is the ReLU activation function;
1-2-3 semantic information S is converted into semantic vectors of specified dimensions using a linear layer B.
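The following sketch illustrates steps 1-2-1 to 1-2-3 (equations (2)-(3)). It assumes PyTorch; the hidden sizes and the semantic-vector dimension D are illustrative assumptions, since the patent does not fix them.

```python
# Illustrative sketch of the cascaded linear layers A and linear layer B.
import torch
import torch.nn as nn

L, C, D = 6, 64, 128          # number of embeddings, embedding dim, semantic-vector dim (assumed)
K = L * C                     # equation (2): flattened dimension

E = torch.randn(L, C)         # stand-in for the word2vec embedding matrix of step 1-1
I = E.reshape(K)              # step 1-2-1: K-dimensional vector representation I

# Two cascaded linear layers A (equation (3)): S = W_2 * ReLU(W_1 * I + b_1) + b_2.
linear_A = nn.Sequential(nn.Linear(K, 256), nn.ReLU(), nn.Linear(256, 256))
S = linear_A(I)               # semantic information S

# Linear layer B: map S to a semantic vector of the specified dimension D.
linear_B = nn.Linear(256, D)
semantic_vector = linear_B(S)
print(semantic_vector.shape)  # torch.Size([128])
```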
Step (2) environmental feature extraction
The method takes image data of the industrial environment in which the industrial robot is located as input to the three-dimensional cyclic convolution network, which outputs the visual features of the environment, as follows:
Inputting an industrial robot site environment image Q into a three-dimensional cyclic convolution network to generate an environment visual feature f;
f=3D-RCNN(Q) (4)
where f is the visual feature representation of the image environment and 3D-RCNN() is the three-dimensional cyclic convolution network function.
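The patent does not detail the internal architecture of the 3D-RCNN. The sketch below merely illustrates extracting a visual feature f from an environment clip Q (equation (4)), using torchvision's r3d_18 3D ResNet purely as a stand-in feature extractor; the input sizes are assumed.

```python
# Illustrative sketch of step (2) with a stand-in 3D feature extractor.
import torch
from torchvision.models.video import r3d_18

model = r3d_18(weights=None)
model.fc = torch.nn.Identity()   # strip the classifier to expose 512-d features

# Q: a clip of the industrial robot's site environment,
# shape (batch, channels, frames, height, width).
Q = torch.randn(1, 3, 8, 112, 112)
with torch.no_grad():
    f = model(Q)                 # environment visual feature f
print(f.shape)                   # torch.Size([1, 512])
```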
Step (3), API recommendation generation
3-1, Inputting the environment visual characteristics generated in the step (2) into a long-short-term memory network encoder for encoding to generate an intermediate semantic vector, wherein the method comprises the following steps of:
taking the environment visual characteristic f as input of a long-short-term memory network to generate a hidden state vector g, and then converting the hidden state vector g into an intermediate semantic vector;
g_t = LSTM(f_t, g_{t-1}) (5)
wherein g_t is the hidden state vector at time t, g_{t-1} is the hidden state vector at time t-1, f_t is the environmental visual feature at time t, and LSTM() is the long-short-term memory network function;
3-2, initializing the GRU network with the semantic vector generated in step (1), converting the intermediate semantic vector generated in step 3-1 into a hidden state vector, and then inputting the hidden state vector into the initialized GRU network to generate the intermediate vector, specifically as follows:
3-2-1 initializing a GRU network according to the semantic vector generated in the step (1);
3-2-2 converting the intermediate semantic vector generated in the step 3-1 into a hidden state vector r, and using the hidden state vector r as input of the GRU network to generate an intermediate vector k;
k_t = GRU(r_t, k_{t-1}) (6)
where k_t is the intermediate vector at time t, GRU() is the GRU network function, k_{t-1} is the intermediate vector at time t-1, and r_t is the hidden state vector at time t;
3-3 generating an API recommendation list by using the intermediate vector k generated in the step 3-2 through a cyclic neural network, wherein the method comprises the following steps:
Taking the intermediate vector generated in the step 3-2 as the input of a cyclic neural network (RNN), and then obtaining the probability distribution of the API recommendation list through softmax layer normalization;
P=softmax(RNN(k)) (7)
where P is the probability distribution of the API recommendation list, softmax() is the normalized exponential function, and RNN() is the recurrent neural network function; the network is trained with the cross-entropy loss
Loss = -Σ_i y_i log(τ_i) (8)
where τ_i is the predicted probability of the i-th API recommendation output by the recurrent neural network, y_i is the ground-truth API recommendation, and Loss is the loss rate;
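The following sketch illustrates the data flow of step (3) (equations (5)-(8)) in PyTorch. The conversions between hidden state vectors and intermediate semantic vectors are simplified to identity mappings, and the API vocabulary size and all layer dimensions are illustrative assumptions.

```python
# Illustrative sketch of step (3): LSTM encoder -> GRU (initialized with the
# semantic vector) -> RNN -> softmax over an assumed API vocabulary.
import torch
import torch.nn as nn

T, F_DIM, H, D, N_API = 8, 512, 128, 128, 50   # time steps, feature dim, hidden dim, semantic dim, #APIs

f_seq = torch.randn(1, T, F_DIM)               # environment visual features f_1..f_T from step (2)
semantic_vector = torch.randn(1, D)            # semantic vector from step (1)

# 3-1: LSTM encoder over the visual features (equation (5)).
lstm_enc = nn.LSTM(F_DIM, H, batch_first=True)
g, _ = lstm_enc(f_seq)                         # hidden state vectors g_t

# 3-2: GRU whose initial state is the semantic vector (equation (6)).
gru = nn.GRU(H, D, batch_first=True)
k0 = semantic_vector.unsqueeze(0)              # initial GRU state
k, _ = gru(g, k0)                              # intermediate vectors k_t

# 3-3: RNN followed by softmax over the API vocabulary (equation (7)).
rnn = nn.RNN(D, D, batch_first=True)
out, _ = rnn(k)
api_head = nn.Linear(D, N_API)
logits = api_head(out[:, -1])
P = torch.softmax(logits, dim=-1)              # probability distribution of the API recommendation list

# Training would minimize the cross-entropy loss of equation (8), e.g.:
y = torch.tensor([3])                          # ground-truth API index (illustrative)
loss = nn.functional.cross_entropy(logits, y)
```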
Step (4), target code programming of the industrial robot
4-1, converting the natural language description describing the robot action code into a text embedding vector E by using the word2vec network according to formula (1);
4-2, using the text embedding vector E from step 4-1 as input to a long-short-term memory network encoder, and then feeding the encoder output into a long-short-term memory network decoder to generate the construction action sequence of the AST tree, specifically as follows:
4-2-1 taking the text embedded vector E as an input of a long and short term memory network encoder (LSTM-Encoder) to generate a hidden state vector h, and then converting the hidden state vector into an intermediate semantic vector;
h_t = LSTM_e(e_t, h_{t-1}) (9)
wherein h_t is the hidden state vector at time t, LSTM_e() represents the long-short-term memory network encoder function, e_t represents the embedding vector at time t, and h_{t-1} represents the hidden state vector at time t-1;
4-2-2 converting the intermediate semantic vector into a hidden state vector θ, and taking the hidden state vector θ as an input of a long-short-term memory network Decoder (LSTM-Decoder);
θ_t = LSTM_d([a_{t-1} : θ̃_{t-1} : β_t], θ_{t-1}) (10)
θ̃_t = tanh(W_c[h_t : θ_t]) (11)
wherein θ_t is the hidden state at time t, θ_{t-1} is the hidden state at time t-1, LSTM_d() represents the long-short-term memory network decoder function, [:] represents vector concatenation, a_{t-1} represents the AST tree construction action at time t-1, θ̃_{t-1} is the attention vector of the hidden state at time t-1, β_t is a vector containing the parent boundary-field information in the derivation process, θ̃_t is the attention vector of the hidden state at time t, h_t is the context vector, W_c is the connection layer function, and tanh() is the hyperbolic tangent function;
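As an illustration of equations (10)-(11), the sketch below computes one decoder step and its attention vector. It assumes PyTorch; the embedding of the previous action a_{t-1}, the boundary-field vector β_t and the context vector h_t are random placeholders with assumed dimensions.

```python
# Illustrative sketch of one LSTM-Decoder step (eq. (10)) and its attention vector (eq. (11)).
import torch
import torch.nn as nn

A_DIM, H = 64, 128                      # action-embedding dim and hidden dim (assumed)

decoder = nn.LSTMCell(A_DIM + H + H, H) # LSTM_d over the concatenation [a_{t-1} : attn_{t-1} : beta_t]
W_c = nn.Linear(2 * H, H, bias=False)   # connection layer of equation (11)

a_prev = torch.randn(1, A_DIM)          # embedding of the construction action a_{t-1}
attn_prev = torch.zeros(1, H)           # attention vector of the hidden state at time t-1
beta_t = torch.randn(1, H)              # parent boundary-field information vector
h_context = torch.randn(1, H)           # context vector h_t from the encoder

state_prev = (torch.zeros(1, H), torch.zeros(1, H))
theta_t, cell_t = decoder(torch.cat([a_prev, attn_prev, beta_t], dim=-1), state_prev)  # eq. (10)
attn_t = torch.tanh(W_c(torch.cat([h_context, theta_t], dim=-1)))                      # eq. (11)
```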
4-2-3, generating the construction action sequence of the AST tree, specifically:
p(a_t = ApplyConstr[c] | a_{<t}, x) = softmax(a_c^T W θ̃_t) (12)
wherein ApplyConstr[c] is one of the action types among the AST tree (abstract syntax tree) construction actions, which applies a construction operation c to a boundary field of the same type as c and can be used to fill a node; p(a_t = ApplyConstr[c] | a_{<t}, x) represents the probability of the action ApplyConstr[c] given the construction action information before time t and the natural language description; a_t is the AST tree construction action at time t, a_{<t} represents the AST tree construction action information before time t, x is a natural language word, softmax() is the normalized exponential function, a_c is the AST tree construction action of construction operation c, T denotes the vector transpose, W is the connection layer function, and θ̃_t is the attention vector of the hidden state at time t;
p(a_t = GenToken[v] | a_{<t}, x) = p(gen | a_t, x)p(v | gen, a_t, x) + p(copy | a_t, x)p(v | copy, a_t, x) (13)
wherein GenToken[v] is another action type among the AST tree construction actions, which fills an AST tree boundary field with a token v; p(a_t = GenToken[v] | a_{<t}, x) represents the probability of the action GenToken[v] given the construction action information before time t and the natural language description; a_t is the AST tree construction action at time t, a_{<t} represents the AST tree construction action information before time t, x is a natural language word, gen is the generation operation, and copy is the copy operation;
Finally, all construction actions of the AST tree are obtained through formulas (12)-(13), yielding the construction action sequence of the AST tree, as shown in figure 2;
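The sketch below illustrates how the action probabilities of equations (12)-(13) could be computed; the construction-operation embeddings, the number of operations and the gen/copy probabilities are illustrative placeholders, not values prescribed by the method.

```python
# Illustrative sketch of equations (12)-(13): scoring AST construction actions.
import torch
import torch.nn as nn

H, N_CONSTR = 128, 20                     # hidden dim and number of construction operations (assumed)
W = nn.Linear(H, H, bias=False)           # connection layer W
a_c = torch.randn(N_CONSTR, H)            # embeddings standing in for construction operations c
attn_t = torch.randn(1, H)                # attention vector of the hidden state at time t

scores = a_c @ W(attn_t).squeeze(0)       # a_c^T * W * attn_t for every operation c
p_apply = torch.softmax(scores, dim=-1)   # p(a_t = ApplyConstr[c] | a_<t, x), equation (12)

# Equation (13): GenToken[v] mixes a generation and a copy distribution.
p_gen, p_copy = 0.7, 0.3                  # p(gen|a_t,x), p(copy|a_t,x) (illustrative values)
p_v_gen, p_v_copy = 0.05, 0.40            # p(v|gen,a_t,x), p(v|copy,a_t,x) (illustrative values)
p_gentoken_v = p_gen * p_v_gen + p_copy * p_v_copy
```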
4-2-4, combining the API recommendation list generated in the step (3) and the construction action sequence of the AST tree generated in the step 4-2-3 to output a final robot code, wherein the method specifically comprises the following steps:
sorting according to the probability distribution of the API recommendation list generated in step (3) to obtain the API recommendation with the maximum probability, embedding that API into the construction action sequence of the AST tree generated in step 4-2-3 to obtain the optimized construction action sequence, generating an Abstract Syntax Tree (AST) from the optimized construction action sequence, and converting the abstract syntax tree into the debugging code of the industrial robot in the current industrial environment through a conversion function, as shown in figure 3;
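As a simplified illustration of step 4-2-4, the snippet below selects the API recommendation with the maximum probability and embeds it into a toy construction action sequence; the API names, probabilities and action strings are hypothetical placeholders and do not represent the claimed conversion function.

```python
# Illustrative sketch of step 4-2-4: combine the API recommendation with the action sequence.
import torch

api_vocab = ["MoveJ", "MoveL", "SetDO", "WaitTime"]          # assumed API vocabulary
P = torch.tensor([0.1, 0.6, 0.2, 0.1])                       # probability distribution from step (3)
best_api = api_vocab[int(torch.argmax(P))]                   # API recommendation with maximum probability

action_sequence = ["ApplyConstr[Call]", "GenToken[<api>]", "GenToken[target_pose]"]
optimized_sequence = [a.replace("<api>", best_api) for a in action_sequence]
print(optimized_sequence)  # optimized AST construction action sequence
```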
Step (5), completing robot debugging
And inputting the debugging codes into a task module of the robot program editor to finish the debugging of the industrial robot.