Plant disease and severity identification method based on multi-label strategy
Technical Field
The invention relates to the technical field of image recognition, in particular to a plant disease and severity recognition method based on a multi-label strategy.
Background
Crop diseases such as powdery mildew, early blight, and late blight are among the major challenges facing global grain production. By damaging leaf health, these diseases impair photosynthesis, hinder crop growth, and can even cause plants to rot. Automatic diagnosis enables early detection of disease, which is of great significance for crop production. Crop disease detection has therefore become one of the key problems to be solved in agricultural intelligence.
Traditionally, disease identification is carried out by manual observation and depends on the observer's own experience. Manual identification is not only time-consuming and labor-intensive, but its accuracy also cannot be guaranteed. To solve this problem, researchers have long sought methods that can identify crop diseases automatically. With the development of computer vision technology, the crop disease detection task has gained new solutions: applying computer vision to disease detection greatly improves detection speed and recognition accuracy while reducing manual intervention and cost. Classical techniques such as support vector machines, self-organizing feature maps, and nearest-neighbor classifiers achieve good recognition performance on plant diseases. Crop disease identification based on computer vision mainly adopts machine learning; compared with manual observation, machine learning allows crop disease features to be constructed explicitly and disease detection to be automated. However, classical machine learning methods demand substantial disease expertise, require hand-crafted features, yield models with poor robustness, and achieve low recognition accuracy in complex environments, which limits their application to intelligent disease recognition. With the continuous improvement of computing performance and the continuous progress of deep learning algorithms, deep learning has gradually replaced traditional machine learning in the disease detection field and become the mainstream.
A deep learning model can automatically learn its parameters and the disease features from large amounts of data, yielding better accuracy and better robustness in complex environments. Convolutional neural networks (CNNs) and Transformer-based architectures are the most widely used. For example, R. Amanda trained an Inception v3 network and achieved good results in cassava disease detection; Qiaokang Liang proposed PD2SE-Net to perform multi-task classification of disease diagnosis and severity diagnosis; and Fengyi Wang identified cucumber diseases with an improved Swin Transformer.
However, careful analysis of these works shows that they either merge the crop, disease, and severity labels into a combined crop-disease-severity label for the classification task, or use a multi-branch model to classify crop, disease, and severity separately; both are single-label classification approaches. The former increases the number of classes and the classification difficulty; the latter classifies crop, disease, and severity through separate branches, which increases model parameters and training difficulty.
Disclosure of Invention
The invention provides a plant disease and severity identification method based on a multi-label strategy, which aims to solve the technical problem that the classification difficulty is increased when the existing classification method adopts a single label for classification.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
the invention provides a plant disease and severity identification method based on a multi-label strategy, which comprises the following steps:
S1, collecting a plurality of crop disease pictures to form a crop disease data set, and dividing the crop disease data set into a training set, a verification set, and a test set; labeling all crop disease pictures in the training set and the verification set to form multi-labels;
s2, constructing an LDI-NET network, wherein the LDI-NET network comprises a multi-label feature extraction module, an information fusion communication module, a multi-scale information fusion module and a multi-label prediction decoding module which are connected in sequence;
S3, randomly selecting a batch of multi-labeled crop disease pictures from the training set and inputting them into the multi-label feature extraction module; the multi-label feature extraction module extracts features from the multi-labeled crop disease pictures to obtain features carrying disease spatial information; the features carrying disease spatial information are input into the information fusion communication module for modeling to obtain disease semantic features; the disease semantic features and the features carrying disease spatial information are fused by the multi-scale information fusion module and input into the multi-label prediction decoding module to obtain a multi-label classification result;
S4, establishing a total loss function and calculating the multi-label loss value for the multi-label classification result; repeating steps S3 to S4 until the LDI-NET network converges; verifying with the verification set and selecting the set of weights with the highest verification accuracy as the weights of the LDI-NET network to obtain a trained LDI-NET network;
S5, testing the trained LDI-NET network by using the test set.
Further, the multi-label feature extraction module comprises an image block embedding module, a plurality of Transformer modules, and a convolution layer connected in sequence;
the information fusion communication module comprises a first multi-head self-attention mechanism and a first multi-layer perceptron connected in sequence from left to right; the input and output ends of the first multi-head self-attention mechanism and of the first multi-layer perceptron are each equipped with normalization and residual connections;
the multi-label predictive decoding module comprises a third multi-head self-attention mechanism, a second multi-head self-attention mechanism and a second multi-layer perceptron which are sequentially connected from bottom to top.
Further, the number of Transformer modules in the multi-label feature extraction module is 12, and the 12 Transformer modules are connected in sequence.
Further, the first multi-layer perceptron is formed by stacking a linear layer, a GELU activation function, and Dropout.
Further, the step S3 specifically includes the following steps:
S31, randomly selecting a batch of multi-labeled crop disease pictures from the training set and inputting them into the multi-label feature extraction module;
S32, the image block embedding module in the multi-label feature extraction module divides a multi-labeled crop disease picture into image blocks in pixel space, superimposes a trainable parameter on the image blocks as a position code, flattens the position-coded image blocks, inputs them into the Transformer modules for global feature extraction, and inputs the extracted features into the convolution layer for local feature extraction to obtain features carrying disease spatial information;
S33, inputting the features carrying disease spatial information into the information fusion communication module; a first query matrix Q_1, first key matrix K_1, and first value matrix V_1 are obtained through the learnable parameter matrices W_Q1, W_K1, W_V1; the similarity between the first query matrix Q_1 and the first key matrix K_1 is computed and combined through the first multi-head self-attention mechanism, and finally the disease semantic features are obtained through the transformation of the first multi-layer perceptron;
S34, inputting the features carrying disease spatial information obtained in S32 and the disease semantic features obtained in S33 into the multi-scale information fusion module for fusion to obtain a fusion result X_2;
S35, inputting the fusion result X_2 into the multi-label prediction decoding module and combining it with the initialization feature encoding X_3 inside the multi-label prediction decoding module to obtain a multi-label classification result.
Further, S32 is expressed by the following formula:

Output = Conv2d(TB(…TB(Conv2d(image))…))

wherein Output represents the features containing disease spatial information, Conv2d(·) represents the convolution operation of a convolution layer, TB represents a Transformer module, and image represents the input crop disease picture.
Further, the formulas of S33 are specifically as follows:

Q_1 = X_1 W_Q1
K_1 = X_1 W_K1
V_1 = X_1 W_V1
Attention(Q_1, K_1, V_1) = softmax(Q_1 K_1^T / √d_k) V_1
head_i = Attention(Q_1 W_i^Q1, K_1 W_i^K1, V_1 W_i^V1)
Multihead_1 = Concat(head_1, head_2, …, head_n) W_O

wherein X_1 represents the input sequence, i.e., the features with disease spatial information; W_Q1, W_K1, W_V1 are the learnable parameter matrices of the first query matrix Q_1, the first key matrix K_1, and the first value matrix V_1, respectively; Attention(·) represents a single-head attention mechanism; softmax(·) represents the normalization operation; K_1^T represents the transpose of the first key matrix K_1; d_k represents the dimension of the first key matrix K_1; Concat(·) represents a concatenation operation; head_i and head_n represent the i-th and n-th self-attention heads, respectively; W_O represents a learnable mapping matrix; Multihead_1 represents the first multi-head self-attention mechanism; W_i^Q1, W_i^K1, and W_i^V1 represent the i-th learnable parameter matrices of the first query matrix Q_1, the first key matrix K_1, and the first value matrix V_1, respectively.
Further, the step S35 specifically includes the following steps:
S351, using the adaptive initialization feature encoding X_3 stored in the multi-label prediction decoding module as the input of the third multi-head self-attention mechanism; a third query matrix Q_3, third key matrix K_3, and third value matrix V_3 are obtained through the learnable matrices W_Q3, W_K3, W_V3; the similarity between the third query matrix Q_3 and the third key matrix K_3 is computed and combined through the third multi-head self-attention mechanism to obtain its output result Multihead_3;
S352, the output result Multihead_3 of the third multi-head self-attention mechanism is then input into formula (1) to obtain the second query matrix Q_2; formula (1) is specifically as follows:

Q_2 = Multihead_3 W_Q2 (1)

wherein W_Q2 represents the learnable parameter matrix of the second query matrix Q_2;
S353, the fusion result X_2 obtained in S34 is input into formulas (2) and (3) to obtain the second key matrix K_2 and second value matrix V_2; formulas (2) and (3) are specifically as follows:
K_2 = X_2 W_K2 (2)
V_2 = X_2 W_V2 (3)

wherein W_K2 represents the learnable parameter matrix of the second key matrix K_2, and W_V2 represents the learnable parameter matrix of the second value matrix V_2;

inputting the second query matrix Q_2, second key matrix K_2, and second value matrix V_2 into formulas (4) to (6) to obtain the output result of the second multi-head self-attention mechanism:

Attention(Q_2, K_2, V_2) = softmax(Q_2 K_2^T / √d_k) V_2 (4)
head_i = Attention(Q_2 W_i^Q2, K_2 W_i^K2, V_2 W_i^V2) (5)
Multihead_2 = Concat(head_1, head_2, …, head_n) W_O (6)

wherein Multihead_2 represents the second multi-head self-attention mechanism; K_2^T represents the transpose of the second key matrix K_2; W_i^Q2, W_i^K2, and W_i^V2 represent the i-th learnable parameter matrices of the second query matrix Q_2, the second key matrix K_2, and the second value matrix V_2, respectively;
S354, inputting the output result of the second multi-head self-attention mechanism into the second multi-layer perceptron to obtain the multi-label classification result.
Further, S351 is expressed by the following formulas:

Q_3 = X_3 W_Q3
K_3 = X_3 W_K3
V_3 = X_3 W_V3
Attention(Q_3, K_3, V_3) = softmax(Q_3 K_3^T / √d_k) V_3
head_i = Attention(Q_3 W_i^Q3, K_3 W_i^K3, V_3 W_i^V3)
Multihead_3 = Concat(head_1, head_2, …, head_n) W_O

wherein W_Q3, W_K3, W_V3 represent the learnable parameter matrices of the third query matrix Q_3, the third key matrix K_3, and the third value matrix V_3, respectively; W_i^Q3, W_i^K3, and W_i^V3 represent the i-th learnable parameter matrices of the third query matrix Q_3, the third key matrix K_3, and the third value matrix V_3, respectively; Multihead_3 represents the third multi-head self-attention mechanism; K_3^T represents the transpose of the third key matrix K_3.
The invention has the beneficial effects that:
1. the invention provides an LDI-NET network (leaf disease identification multi-label classification network) capable of being trained end to end, which can synchronously identify the type, disease and severity of plant leaves, and is more convenient and practical;
2. the LDI-NET network is classified by utilizing multiple labels, and the multi-label classification is different from the single-label classification, so that the complexity of classification can be reduced and model branches are not increased; the multi-label classification mode combines the advantages of small classification complexity of the multi-task network and no extra branch of the single-task network;
3. the LDI-NET network is an end-to-end trainable network, and required characteristics can be automatically extracted through learning in training;
4. the LDI-NET network adopts a CNN (convolution layer) to extract local features and Transformer modules to extract global features, giving it stronger feature extraction capability;
5. in order to enhance global information extraction capability and reduce invalid information interference, a residual error connection structure is designed in an information fusion communication module of an LDI-NET network, low-dimensional characteristics with disease space information and high-dimensional disease semantic characteristics are connected, and the characteristic expression capability of the module is further improved.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
fig. 2 is a block diagram of a multi-tag feature extraction module.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many other different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Referring to fig. 1, an embodiment of the present application provides a method for identifying plant disease and severity based on a multi-label strategy, comprising the following steps:
s1, collecting a plurality of crop disease pictures to form a crop disease data set, and dividing the crop disease data set into a training set, a verification set and a test set; labeling all crop disease pictures in the training set and the verification set to form multiple labels;
s2, constructing an LDI-NET network, wherein the LDI-NET network comprises a multi-label feature extraction module, an information fusion communication module, a multi-scale information fusion module and a multi-label prediction decoding module which are connected in sequence;
S3, randomly selecting a batch of multi-labeled crop disease pictures from the training set and inputting them into the multi-label feature extraction module; the multi-label feature extraction module performs feature extraction on the multi-labeled crop disease pictures to obtain low-dimensional features carrying disease spatial information; the low-dimensional features carrying disease spatial information are input into the information fusion communication module for modeling to obtain high-dimensional disease semantic features. This process promotes information exchange among the disease features, attends more closely to the interconnections among the three feature types of crop, disease, and severity, and improves the multi-label feature extraction module's ability to recognize disease features;
the multi-scale information fusion module fuses the high-dimensional disease semantic features with the low-dimensional features carrying disease spatial information and inputs the result into the multi-label prediction decoding module to obtain a multi-label classification result;
S4, establishing a cross-entropy loss function and calculating the multi-label loss value for the multi-label classification result; repeating steps S3 to S4 until the LDI-NET network converges; verifying with the verification set and selecting the set of weights with the highest verification accuracy as the weights of the LDI-NET network to obtain a trained LDI-NET network;
S5, testing the trained LDI-NET network by using the test set.
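The loss of step S4 can be sketched under one assumption consistent with the multi-label setting: the total training loss is the sum of one cross-entropy term per label group (crop, disease, severity). The per-group decomposition, class counts, and logit values below are illustrative assumptions, not details given in the patent.

```python
# Sketch of a plausible total loss for S4: one softmax cross-entropy term per
# label group, summed. Uses numpy for the log-sum-exp arithmetic.
import numpy as np

def softmax_cross_entropy(logits: np.ndarray, target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])

def total_loss(crop_logits, disease_logits, severity_logits,
               crop_y, disease_y, severity_y) -> float:
    # Sum of per-group cross-entropies (assumed decomposition).
    return (softmax_cross_entropy(crop_logits, crop_y)
            + softmax_cross_entropy(disease_logits, disease_y)
            + softmax_cross_entropy(severity_logits, severity_y))

loss = total_loss(np.array([2.0, 0.1, 0.1]),        # 3 crop classes (toy)
                  np.array([0.1, 3.0, 0.1, 0.1]),   # 4 disease classes (toy)
                  np.array([0.1, 0.1, 2.5]),        # 3 severity levels (toy)
                  0, 1, 2)                          # ground-truth indices
```

Because each group's correct class already carries the largest logit in this toy example, the summed loss is small but strictly positive.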
Referring to fig. 2, in some embodiments, the multi-label feature extraction module includes an image block embedding module, 12 Transformer modules, and a convolution layer connected in sequence;
the information fusion communication module comprises a first multi-head self-attention (MSA) mechanism and a first multi-layer perceptron (MLP) connected in sequence from left to right; the input and output ends of the first multi-head self-attention mechanism and of the first multi-layer perceptron are each equipped with normalization (Norm) and residual connections; adding normalization and residual connections effectively prevents degradation of the information fusion communication module during training and accelerates convergence of the LDI-NET network.
In order to enhance global information extraction capability and reduce invalid information interference, a residual error connection structure is designed in an information fusion communication module of an LDI-NET network, low-dimensional characteristics and high-dimensional disease semantic characteristics with disease space information are connected, and the characteristic expression capability of the information fusion communication module is further improved.
The multi-label predictive decoding module comprises a third multi-head self-attention mechanism, a second multi-head self-attention mechanism and a second multi-layer perceptron which are sequentially connected from bottom to top.
In some embodiments, the first multi-layer perceptron (MLP) is formed by stacking a linear layer, a GELU activation function, and Dropout (during training, neural network units are temporarily dropped from the network with a certain probability). A purely attention-based network can suffer from rank collapse when training the crop disease classification model; the MLP controls the convergence behavior by increasing the Lipschitz constant, preventing the information expression capability of the attention network from decaying doubly exponentially with depth and thereby preventing degradation of the information fusion communication module. In addition, the MLP further transforms the extracted crop disease label features through its fully connected layers, increasing the expression capability of the information fusion communication module and making the crop disease multi-label classification results more accurate.
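A minimal numpy sketch of this Linear → GELU → Dropout stack follows. The layer sizes and the dropout rate are illustrative assumptions; Dropout is active only in training mode, matching the description above, and inverted-dropout scaling keeps the expected activation unchanged.

```python
# Minimal sketch of the MLP block: linear layer + GELU + Dropout.
import math

import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF (via erf).
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def mlp(x: np.ndarray, w: np.ndarray, b: np.ndarray,
        p_drop: float = 0.1, training: bool = False) -> np.ndarray:
    h = gelu(x @ w + b)                 # linear layer followed by GELU
    if training:                        # Dropout only during training
        mask = (np.random.rand(*h.shape) >= p_drop) / (1.0 - p_drop)
        h = h * mask                    # inverted dropout keeps E[h] unchanged
    return h

x = np.ones((2, 4))                     # 2 tokens, width 4 (toy shapes)
w, b = np.eye(4), np.zeros(4)           # identity weights for a checkable output
out = mlp(x, w, b)                      # eval mode: deterministic
```

With identity weights every entry of `out` equals GELU(1) ≈ 0.8413, which makes the block easy to verify by hand.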
In some embodiments, the step S3 specifically includes the following steps:
S31, randomly selecting a batch of multi-labeled crop disease pictures from the training set and inputting them into the multi-label feature extraction module;
S32, the image block embedding module in the multi-label feature extraction module divides a multi-labeled crop disease picture into image blocks in pixel space; to preserve the positional information of the original image, a trainable parameter is superimposed on the image blocks as a position code; the position-coded image blocks are flattened into sequence data that the Transformer modules can readily process, input into the Transformer modules for global feature extraction, and the extracted features are then input into the convolution layer for local feature extraction to obtain low-dimensional features carrying disease spatial information;
In the multi-label feature extraction module, the CNN (convolution layer) captures local features, but, limited by its receptive field size, a CNN falls short in global information interaction and efficient feature representation. The self-attention mechanism in a Transformer module exploits the global relationships among spatial pixels to extract long-range information and achieve efficient feature representation, but it in turn falls short in local feature extraction. To address these problems, the invention combines the Transformer module with a CNN in the multi-label feature extraction module, which unites the advantages of both to extract local and global features efficiently, acquiring richer disease semantic information and facilitating multi-label disease classification.
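The patch-embedding part of S32 can be sketched as follows. The image size, patch size, and zero-initialized positional encoding are illustrative assumptions; in the real module the positional encoding is a trainable parameter and the embedding is typically implemented as a strided convolution.

```python
# Sketch of S32's embedding step: split the image into fixed-size patches,
# flatten each patch into a token, and superimpose a positional encoding.
import numpy as np

def patch_embed(image: np.ndarray, patch: int) -> np.ndarray:
    """Divide an HxWxC image into non-overlapping patches, one token each."""
    h, w, c = image.shape
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tokens.append(image[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(tokens)                 # (num_patches, patch*patch*c)

image = np.random.rand(32, 32, 3)           # toy 32x32 RGB picture
tokens = patch_embed(image, patch=8)        # 16 patches, each of dimension 192
pos_embed = np.zeros_like(tokens)           # trainable parameter in the model
sequence = tokens + pos_embed               # sequence fed to Transformer blocks
```

A 32×32 image with 8×8 patches yields a 16-token sequence, which is the flattened form the Transformer modules consume.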
S33, inputting the features carrying disease spatial information into the information fusion communication module; a first query matrix Q_1, first key matrix K_1, and first value matrix V_1 are obtained through the learnable parameter matrices W_Q1, W_K1, W_V1; the similarity between the first query matrix Q_1 and the first key matrix K_1 is computed and combined through the first multi-head self-attention mechanism, and finally the disease semantic features are obtained through the transformation of the first multi-layer perceptron;
The information fusion communication module encodes the input crop disease features through hidden information matrices and mines the relations among the feature information of different crop disease types. This helps promote the exchange of information between features and characterizes the features better.
In the output result of the multi-label feature extraction module, the relationships among the three types of information (crop feature information, disease feature information, and severity feature information) cannot be fully expressed. The information fusion communication module therefore models the features containing disease spatial information by passing the input features through the MSA and the MLP. The MSA promotes information exchange among the disease information blocks and attends to the interconnections among the three feature types of crop, disease, and severity, improving the multi-label classification model's ability to recognize disease features. The MLP then optimizes the model, preventing degradation of the information fusion communication module while training the disease multi-label classification model; at the same time, the fully connected layers of the MLP further transform the crop disease features so that the model better fits the multi-label disease data, improving the feature expression capability of the information fusion communication module.
S34, inputting the low-dimensional features carrying disease spatial information obtained in S32 and the high-dimensional disease semantic features obtained in S33 into the multi-scale information fusion module for fusion to obtain a fusion result X_2;
S35, inputting the fusion result X_2 into the multi-label prediction decoding module and combining it with the initialization feature encoding X_3 inside the multi-label prediction decoding module to obtain a multi-label classification result.
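The patent does not spell out the fusion operator used in S34, so the sketch below is purely an assumption of one common choice: concatenating the low-dimensional spatial features with the high-dimensional semantic features along the feature axis and projecting back to the model width with a learnable matrix. The shapes are illustrative.

```python
# Hypothetical sketch of the S34 fusion step: concatenate the two feature
# streams and project back to the model width. The operator is an assumption.
import numpy as np

def fuse(x_spatial: np.ndarray, x_semantic: np.ndarray,
         w_proj: np.ndarray) -> np.ndarray:
    """Concatenate along the feature axis, then apply a learnable projection."""
    return np.concatenate([x_spatial, x_semantic], axis=-1) @ w_proj

x_spatial = np.random.rand(16, 64)    # low-dim features with spatial information
x_semantic = np.random.rand(16, 64)   # high-dim disease semantic features
w_proj = np.random.rand(128, 64)      # learnable projection (illustrative)
x2 = fuse(x_spatial, x_semantic, w_proj)   # fusion result X_2
```

Element-wise addition or a gated sum would be equally plausible readings; the essential point is that X_2 carries information from both scales.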
In some embodiments, S32 is expressed by the following formula:

Output = Conv2d(TB(…TB(Conv2d(image))…))

wherein Output represents the features containing disease spatial information, Conv2d(·) represents the convolution operation of a convolution layer, TB represents a Transformer module, and image represents the input crop disease picture.
In some embodiments, the formulas of S33 are specifically as follows:

Q_1 = X_1 W_Q1
K_1 = X_1 W_K1
V_1 = X_1 W_V1
Attention(Q_1, K_1, V_1) = softmax(Q_1 K_1^T / √d_k) V_1
head_i = Attention(Q_1 W_i^Q1, K_1 W_i^K1, V_1 W_i^V1)
Multihead_1 = Concat(head_1, head_2, …, head_n) W_O

wherein X_1 represents the input sequence, i.e., the low-dimensional features with disease spatial information; W_Q1, W_K1, W_V1 are the learnable parameter matrices of the first query matrix Q_1, the first key matrix K_1, and the first value matrix V_1, respectively; Attention(·) represents a single-head attention mechanism; softmax(·) represents the normalization operation; K_1^T represents the transpose of the first key matrix K_1; d_k represents the dimension of the first key matrix K_1; Concat(·) represents a concatenation operation; head_i and head_n represent the i-th and n-th self-attention heads, respectively; W_O represents a learnable mapping matrix; Multihead represents multi-head attention. To improve the performance of the self-attention mechanism, the multiple self-attention heads head_i are spliced with Concat, where i identifies a head within the multi-head attention, and the first multi-head self-attention mechanism Multihead_1 is obtained through the learnable mapping matrix W_O. W_i^Q1, W_i^K1, and W_i^V1 represent the i-th learnable parameter matrices of the first query matrix Q_1, the first key matrix K_1, and the first value matrix V_1, respectively.
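The formulas above can be checked with a small numpy implementation of scaled dot-product multi-head self-attention. The sequence length, model width, head count, and random weight initialization are illustrative; the structure (per-head similarity softmax(Q K^T / √d_k) V, then Concat and W_O) follows the equations.

```python
# Numpy sketch of the multi-head self-attention defined by the S33 formulas.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(x: np.ndarray, n_heads: int, rng) -> np.ndarray:
    seq_len, d_model = x.shape
    d_k = d_model // n_heads
    # Learnable matrices W_Q, W_K, W_V, W_O (random stand-ins here).
    W_Q, W_K, W_V, W_O = (rng.standard_normal((d_model, d_model)) * 0.1
                          for _ in range(4))
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V
    heads = []
    for i in range(n_heads):
        s = slice(i * d_k, (i + 1) * d_k)      # i-th head's slice of Q, K, V
        attn = softmax(Q[:, s] @ K[:, s].T / np.sqrt(d_k))  # similarity
        heads.append(attn @ V[:, s])
    return np.concatenate(heads, axis=-1) @ W_O             # Concat + W_O

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))        # 6 tokens, model width 8 (toy shapes)
out = multi_head_self_attention(x, n_heads=2, rng=rng)
```

Slicing one weight set per head is equivalent to giving each head its own W_i^Q, W_i^K, W_i^V, which is how the per-head matrices in the formulas are usually realized.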
In some embodiments, the step S35 specifically includes the following steps:
S351, using the adaptive initialization feature encoding X_3 stored in the multi-label prediction decoding module as the input of the third multi-head self-attention mechanism; a third query matrix Q_3, third key matrix K_3, and third value matrix V_3 are obtained through the learnable parameter matrices W_Q3, W_K3, W_V3; the similarity between the third query matrix Q_3 and the third key matrix K_3 is computed and combined through the third multi-head self-attention mechanism to obtain its output result Multihead_3. The formulas are as follows:

Q_3 = X_3 W_Q3
K_3 = X_3 W_K3
V_3 = X_3 W_V3
Attention(Q_3, K_3, V_3) = softmax(Q_3 K_3^T / √d_k) V_3
head_i = Attention(Q_3 W_i^Q3, K_3 W_i^K3, V_3 W_i^V3)
Multihead_3 = Concat(head_1, head_2, …, head_n) W_O

wherein W_Q3, W_K3, W_V3 represent the learnable parameter matrices of the third query matrix Q_3, the third key matrix K_3, and the third value matrix V_3, respectively; W_i^Q3, W_i^K3, and W_i^V3 represent the i-th learnable parameter matrices of the third query matrix Q_3, the third key matrix K_3, and the third value matrix V_3, respectively; Multihead_3 represents the third multi-head self-attention mechanism; K_3^T represents the transpose of the third key matrix K_3;
S352, the output result Multihead_3 of the third multi-head self-attention mechanism is then input into formula (1) to obtain the second query matrix Q_2; formula (1) is specifically as follows:

Q_2 = Multihead_3 W_Q2 (1)

wherein W_Q2 represents the learnable parameter matrix of the second query matrix Q_2;
S353, the fusion result X_2 obtained in S34 is input into formulas (2) and (3) to obtain the second key matrix K_2 and second value matrix V_2; formulas (2) and (3) are specifically as follows:

K_2 = X_2 W_K2 (2)
V_2 = X_2 W_V2 (3)

wherein W_K2 represents the learnable parameter matrix of the second key matrix K_2, and W_V2 represents the learnable parameter matrix of the second value matrix V_2;

inputting the second query matrix Q_2, second key matrix K_2, and second value matrix V_2 into formulas (4) to (6) to obtain the output result of the second multi-head self-attention mechanism:

Attention(Q_2, K_2, V_2) = softmax(Q_2 K_2^T / √d_k) V_2 (4)
head_i = Attention(Q_2 W_i^Q2, K_2 W_i^K2, V_2 W_i^V2) (5)
Multihead_2 = Concat(head_1, head_2, …, head_n) W_O (6)

wherein Multihead_2 represents the second multi-head self-attention mechanism; K_2^T represents the transpose of the second key matrix K_2; W_i^Q2, W_i^K2, and W_i^V2 represent the i-th learnable parameter matrices of the second query matrix Q_2, the second key matrix K_2, and the second value matrix V_2, respectively;
S354, inputting the output result of the second multi-head self-attention mechanism into the second multi-layer perceptron to obtain the multi-label classification result.
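Steps S351 to S353 amount to cross-attention: the queries come from the label-side encoding X_3 (after the third self-attention), while the keys and values come from the fusion result X_2, so each label token attends over the fused image features. The single-head numpy sketch below uses illustrative shapes and, for brevity, feeds X_3 directly into the query projection, omitting the third self-attention stage.

```python
# Numpy sketch of the decoding cross-attention: label queries against fused
# image features. Shapes and random weights are illustrative assumptions.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
d = 32
x3 = rng.standard_normal((3, d))       # one query token per label group (toy)
x2 = rng.standard_normal((16, d))      # fusion result X_2: 16 feature tokens
W_Q2, W_K2, W_V2 = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

Q2 = x3 @ W_Q2                         # queries from the label-side encoding
K2, V2 = x2 @ W_K2, x2 @ W_V2          # keys/values from the fused features
attn = softmax(Q2 @ K2.T / np.sqrt(d)) # each label attends over all 16 tokens
decoded = attn @ V2                    # (3, d): input to the second MLP
```

Each row of `attn` is a distribution over the 16 fused-feature tokens, so every label query pools a different weighted view of the image features before classification.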
The LDI-NET network performs classification with multiple labels; unlike single-label classification, multi-label classification reduces classification complexity without increasing the number of model branches. This multi-label classification scheme combines the low classification complexity of a multi-task network with the absence of extra branches of a single-task network.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Moreover, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the embodiments, and when the technical solutions are contradictory or cannot be implemented, it should be considered that the combination of the technical solutions does not exist, and is not within the scope of protection claimed by the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.