
CN120580439B - Real-time detection method for disease severity of various fruit leaves - Google Patents

Real-time detection method for disease severity of various fruit leaves

Info

Publication number
CN120580439B
CN120580439B · CN202511080429.0A
Authority
CN
China
Prior art keywords
feature
module
disease
model
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511080429.0A
Other languages
Chinese (zh)
Other versions
CN120580439A (en)
Inventor
曹丽英
姜冬辉
李树龙
王银鹏
曹建堃
崔儒翰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Agricultural University
Original Assignee
Jilin Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Jilin Agricultural University
Priority to CN202511080429.0A
Publication of CN120580439A
Application granted
Publication of CN120580439B
Status: Active
Anticipated expiration

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A real-time detection method for the disease severity of various fruit leaves, belonging to the technical field of neural network target detection. The method solves the technical problem of how to evaluate the disease severity of fruit leaves more accurately. The method comprises constructing a real-time detection model of the leaf lesion position, the model comprising an encoder and a decoder: the encoder is built around a Transformer module, with an expanded feature module arranged in the feature extraction stage and a feature weighted fusion module arranged in the feature fusion stage; the decoder is built around a multi-layer perceptron. Compared with existing segmentation methods, the method provided by the invention achieves a superior segmentation effect on different fruit leaf datasets.

Description

Real-time detection method for disease severity of various fruit leaves
Technical Field
The invention belongs to the technical field of neural network target detection, and particularly relates to a real-time detection method for disease severity of various fruit leaves.
Background
Fruit is rich in vitamins and other nutrient elements required by the human body. However, due to environmental and weather influences, diseases on fruit tree leaves occur frequently, greatly weakening photosynthesis, reducing fruit tree yield and fruit quality, and bringing serious economic losses to fruit farmers.
Therefore, accurately analyzing the disease condition of fruit trees and carrying out prevention work in time can effectively reduce growers' losses. The conventional method for diagnosing crop diseases is to observe symptoms and spots with the naked eye and consult experts, which is difficult and labor-intensive for large orchards with a wide variety of crops, and also increases growers' economic costs.
Therefore, research on fruit tree disease detection is of great importance to agriculture. Image segmentation assigns a label to each pixel, which helps to understand and analyze the disease condition more deeply. By segmenting the disease, the accuracy of disease diagnosis can be improved and disease trends can be tracked and monitored in real time. This effectively helps growers and plant-protection personnel know the health condition of crops in time, formulate effective control measures, prevent disease transmission, reduce the use of pesticides, and ensure food safety.
In early smart agriculture, traditional machine learning methods identified and located target leaves and lesions by extracting characteristic information such as color, morphology or texture information from images and setting specific thresholds. With the continuous development of deep learning, researchers gradually apply deep learning to the agricultural field and achieve good results.
However, there are still many difficulties in segmenting pathological images of different fruit tree leaves in truly complex environments. On the leaf side: (1) curling and folding of the leaf cause shadows and further increase the difficulty of segmentation; (2) the particular shapes of leaf edges make edge feature extraction challenging; (3) alternating overlap of leaves hinders global feature extraction. On the lesion side: (1) strong-to-dark changes in illumination intensity blur lesion boundaries, and lesion colors similar to leaf colors make segmentation inaccurate; (2) in areas with small, dense lesions, spots are easily missed or segmented as stuck together; (3) reflections under illumination in rainy environments blur lesion features and make them difficult to extract.
Therefore, how to efficiently overcome the problems of diverse leaf and lesion characteristics, occlusion and overlap in different environments and realize accurate segmentation, so as to evaluate the disease severity of fruit leaves more accurately, is a current technical problem in the field.
Disclosure of Invention
In order to solve the technical problems, the invention provides a real-time detection method for the disease severity of various fruit leaves.
The method comprises the following steps:
S1, data acquisition: acquiring lesion images of various fruit leaves and preprocessing the lesion images;
S2, constructing a data set: dividing the preprocessed fruit leaf lesion images into a training set, a validation set and a test set;
S3, constructing a model: constructing a leaf lesion position real-time detection model comprising an encoder and a decoder, wherein the encoder is built around a Transformer module, with an expanded feature module arranged in the feature extraction stage and a feature weighted fusion module arranged in the feature fusion stage;
S4, training the model: training the leaf lesion position real-time detection model with the constructed data set and adjusting model parameters until the model meets the detection requirements;
S5, detecting the lesion position of the fruit leaf to be detected in real time with the trained leaf lesion position real-time detection model, and estimating the disease severity of the fruit leaf from the ratio of the lesion area to the total leaf area.
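As a concrete illustration of step S5, the sketch below estimates severity from a predicted segmentation mask. This is a minimal sketch, not the patented implementation; the class indices (0 = background, 1 = leaf, 2 = lesion) are assumptions for illustration only.

```python
import numpy as np

def estimate_severity(pred_mask: np.ndarray, leaf_id: int = 1, lesion_id: int = 2) -> float:
    """Severity = lesion area / total leaf area, per step S5 (class ids assumed)."""
    lesion_px = np.count_nonzero(pred_mask == lesion_id)
    leaf_px = np.count_nonzero(pred_mask == leaf_id) + lesion_px  # lesion pixels lie on the leaf
    return lesion_px / leaf_px if leaf_px else 0.0
```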
The preprocessing specifically consists of uniformly resizing the fruit leaf lesion images to the same size, manually labeling the categories, and performing random data enhancement operations on the fruit leaf lesion images.
Further, the specific structure of the encoder is as follows: after the feature map is input, the data is divided into two paths; one path passes through the expanded feature module, and the other sequentially passes through the overlapped patch embedding module and the Transformer module. The two feature maps generated by the two paths are input into the feature weighted fusion module for feature fusion, and the fused features pass sequentially through three Transformer modules before being input into the decoder.
The specific structure of the expanded feature module is as follows: after the feature map is input, a first 1×1 convolution module generates new feature channels; a dilated convolution module then enlarges the receptive field; a second 1×1 convolution module compresses and integrates the information to output a feature map containing the disease; finally, the input feature map is fused with the disease-containing feature map. A SiLU activation function layer and a batch normalization layer are introduced between the first 1×1 convolution module and the dilated convolution module, and likewise between the dilated convolution module and the second 1×1 convolution module.
In the Transformer module, the input feature map first passes through a depth multi-scale attention mechanism module that extracts contextual semantic features and local features; a Mix-FFN module then enhances the feature expression capability of the Transformer module; finally, a patch merging layer downsamples the input feature map, gradually expanding the receptive field and integrating multi-scale information.
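For reference, the sketch below shows a Mix-FFN block in the style used by SegFormer encoders (a linear expansion, a 3×3 depthwise convolution that injects positional information, GELU, and a linear projection back). It is a minimal sketch: the class name, expansion factor and layer widths are assumptions, since the patent does not disclose its exact Mix-FFN configuration.

```python
import torch
import torch.nn as nn

class MixFFN(nn.Module):
    """SegFormer-style Mix-FFN sketch; expansion factor is an assumption."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        # Depthwise 3x3 conv supplies positional information without explicit encodings
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) token sequence; h, w: spatial size with N == h * w
        x = self.fc1(x)
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)
        x = self.dwconv(x).flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))
```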
The depth multi-scale attention mechanism module comprises a depth multi-scale module, a multi-head attention mechanism module and a local feature capture module. The multi-head attention mechanism module captures the texture and morphological features of different diseases and outputs features with global context, while the depth multi-scale module captures local features at different scales by convolving the features with convolution kernels of different sizes.
Further, the data processing performed in the multi-head attention mechanism module specifically includes:
A query matrix $Q$, a key matrix $K$ and a value matrix $V$ are generated by linear transformation. The attention weights are obtained by

$$\mathrm{Attention}(Q,K,V)=\mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$$

where $\mathrm{Softmax}$ denotes the softmax function and $d_k$ denotes the dimension of an attention head. The output features with global context are then obtained by

$$X_{\mathrm{out}}=\mathrm{Dropout}\left(\mathrm{Linear}\left(\mathrm{Attention}(Q,K,V)\right)\right)$$

where $\mathrm{Dropout}$ denotes random deletion of neurons, $\mathrm{Linear}$ denotes projection back to the original dimension through a linear layer, $T$ denotes the transpose operation, $B$ denotes the batch size, $N$ denotes the sequence length, $C$ denotes the number of channels, and $K^{T}$ is the transposed matrix of $K$.
Further, the data processing performed in the depth multi-scale module specifically includes:
The input feature $X$ is convolved with kernels of different sizes $k_i$, and the outputs of these convolution operations are collected into a multi-scale convolution list

$$[F_1,F_2,\dots,F_n],\qquad F_i=\mathrm{Conv}_{k_i}(X)$$

where $F_i$ denotes the output of convolving the input feature $X$ with a kernel of size $k_i$. The outputs of all convolution kernels are added to obtain the aggregate feature $U=\sum_{i=1}^{n}F_i$. The aggregate feature $U$ is compressed into a vector $s$ by global average pooling; $s$ is mapped to a smaller dimension $z$ through a fully connected layer, and $z$ is mapped through another fully connected layer back to the same dimension as $s$, thereby generating a weight matrix $W$. Using each weight $w_i$ of $W$, the features output by convolution kernels of different sizes are weighted and combined to obtain a combined feature map $V=\sum_{i=1}^{n}w_i\cdot F_i$, which is output after passing through a fully connected layer.
Further, the data processing performed in the feature weighted fusion module specifically includes:
The two input feature maps $F_1$ and $F_2$ are spliced in the channel dimension to generate a joint feature map, which is processed by a 1×1 convolution operation to generate a weight map. After the nonlinear transformation of the $\mathrm{Sigmoid}$ activation function, which ensures weight values between 0 and 1, the weighted weight map $W$ is obtained:

$$W=\mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times1}([F_1;F_2])\right)$$

The feature maps $F_1$ and $F_2$ are fused by weighted summation to obtain the fused feature

$$F_{\mathrm{fused}}=\mathrm{BN}\left(W\odot F_1+(1-W)\odot F_2\right)$$

where $\mathrm{BN}$ denotes the batch normalization operation. The fused feature $F_{\mathrm{fused}}$ is then passed sequentially through a 1×1 convolution operation and a batch normalization operation to obtain the final output feature map.
Further, the decoder is composed of two multi-layer perceptron modules connected in sequence.
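A minimal sketch of such a decoder head is shown below. The channel widths and activation are assumptions, and the actual decoder would also have to fuse and upsample the multi-scale encoder features before these MLP stages.

```python
import torch.nn as nn

# Hedged sketch: two sequential MLP (linear) stages, the first refining the
# fused encoder features and the second mapping to the output classes.
decoder = nn.Sequential(
    nn.Linear(256, 256),  # first multi-layer perceptron stage (width assumed)
    nn.GELU(),
    nn.Linear(256, 3),    # second stage maps to the 3 segmentation classes
)
```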
The method has the beneficial effects that:
A leaf lesion position real-time detection model is constructed with a single-layer parallel fusion architecture that extracts dense feature representations at different scales along two paths, remarkably improving the extraction of lesion features of different forms and reducing missed lesions. The depth multi-scale attention mechanism module further extracts multi-level features by introducing the depth multi-scale module and realizes global modeling using the position information it provides, while keeping computational complexity low.
The expanded feature module and the SegFormer module are also innovatively designed to work cooperatively, extracting global and local features so as to achieve more refined segmentation of leaf and lesion edges and effectively extract tiny spots. The feature weighted fusion module fuses shallow features containing detail and edge information with deep features rich in semantic information through an adaptive weighting strategy, realizing accurate feature recovery.
Experimental results show that compared with the existing segmentation method, the method provided by the invention has more excellent segmentation effect on different fruit leaf data sets, and proves that the leaf lesion position real-time detection model has stronger generalization capability and robustness, thereby providing effective technical support for pathological image analysis of various fruit leaves. Notably, the leaf lesion location real-time detection model is superior to most existing models in Params and FLOPs, and the inference speed at the local server and edge devices meets the actual agricultural needs.
Researchers verified the performance of the model in experiments covering 6 fruits and 7 disease leaf datasets, and performed generalization tests on two grape diseases from the PLANT VILLAGE dataset, further verifying the model's strong robustness and generalization capability. Future research will focus on further improving the accuracy of the model and reducing computational cost, while conducting extensive experiments on more crop disease leaf datasets to promote the practical application of the method in smart agriculture.
Drawings
FIG. 1 is a schematic diagram of manual labeling of raw data in an embodiment of the present invention;
FIG. 2 is a manual labeling schematic diagram of brown spot and black rot of grapes in an embodiment of the invention;
FIG. 3 is a schematic diagram of the result of enhancing a dataset picture according to an embodiment of the present invention;
FIG. 4 is a diagram showing a real-time detection model of a lesion position of a blade according to an embodiment of the present invention;
FIG. 5 is a block diagram of an EFM extended features module in accordance with an embodiment of the present invention;
FIG. 6 is a block diagram of a DM-Attention depth multiscale Attention mechanism in an embodiment of the present invention;
FIG. 7 is a block diagram of a FWFM feature weighted fusion module in an embodiment of the invention;
FIG. 8 is a graph comparing the disease segmentation visualization results of different models for apple rust, apple alternaria leaf spot and plum red spot in an embodiment of the present invention;
FIG. 9 is a graph comparing the disease segmentation visualization results of different models for grape white rot and pear scab in an embodiment of the invention;
Fig. 10 is a graph comparing the disease segmentation visualization results for mango brown spot and pomegranate Cercospora leaf spot in an embodiment of the present invention;
FIG. 11 is a graph showing the comparison of the results of visualization of disease segmentation of grape brown spot and grape black rot in the examples of the present invention;
FIG. 12 is a visual thermodynamic diagram of a single area of attention for six fruit leaves and seven lesions in an embodiment of the present invention;
FIG. 13 is a visual thermodynamic diagram of six fruit leaves and seven lesions in an embodiment of the present invention;
FIG. 14 is a graph showing the evaluation results of the severity of disease of five fruits with different severity according to the embodiment of the present invention;
fig. 15 is a main interface diagram of a fruit tree leaf disease segmentation system in an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Example 1
The embodiment provides a real-time detection method for disease severity of various fruit leaves, which is characterized by comprising the following steps:
S1, data acquisition: acquiring lesion images of various fruit leaves and preprocessing the lesion images;
S2, constructing a data set: dividing the preprocessed fruit leaf lesion images into a training set, a validation set and a test set;
S3, constructing a model: constructing a leaf lesion position real-time detection model comprising an encoder and a decoder, wherein the encoder is built around a Transformer module, with an expanded feature module arranged in the feature extraction stage and a feature weighted fusion module arranged in the feature fusion stage;
S4, training the model: training the leaf lesion position real-time detection model with the constructed data set and adjusting model parameters until the model meets the detection requirements;
S5, detecting the lesion position of the fruit leaf to be detected in real time with the trained leaf lesion position real-time detection model, and estimating the disease severity of the fruit leaf from the ratio of the lesion area to the total leaf area.
Example 2
This example is a further limitation of example 1 and further describes step S1. In total, 1417 image samples covering 7 diseases of 6 fruits were collected in this example, including images of 4 fruits in a real environment and images of 2 fruits in an experimental environment. The real-environment data were acquired at the teaching and scientific research base of Jilin Agricultural University from July to September 2024, during the fruiting period, in two sessions (noon and evening), shot at a fixed distance of 5-10 cm and including image data captured after rain. To enhance the generalization of the model, disease images of 2 crops, mango and pomegranate, were selected from the public crop disease and pest identification dataset PLANT VILLAGE. In addition, grape brown spot and black rot were selected as separate validation sets to further evaluate the generalization and robustness of the model.
Because the pixel sizes of the collected images were inconsistent, all images were uniformly resized to 512×512 pixels to ensure data consistency. The raw data were manually labeled with the professional semantic segmentation labeling software Labelme for model learning and training. The labeled dataset covers 7 diseases of 6 fruit crops, 13 categories in total, and is stored in JSON format. The labeling examples are shown in FIG. 1, where (a)-(f) represent disease samples of different fruits, and the colors of different categories correspond to the sub-categories in the task, clearly marking the distinctions between categories.
To verify the generalization ability of the model, brown spot and black rot of grapes in the PLANT VILLAGE dataset were selected as the validation dataset. All images were uniformly resized to 256×256 pixels and individually labeled. The labels of each image are stored in JSON format, and the classification is shown in FIG. 2, where (a) is grape brown spot and (b) is grape black rot, with four colors representing different classes. Although the label colors in this dataset are similar to some colors in FIG. 1, the two are independent of each other because the datasets come from different sources, so no interference or influence is caused.
During training, if the training dataset is small, a deep learning network easily overfits: the model performs well on training data but poorly on unseen data, reducing the generalization ability of the network. To solve this problem, random data enhancement operations were performed on the collected image data, such as rotation, cropping, contrast adjustment, and adding 10-20 black mask blocks. This not only effectively increases the diversity of the training data, but also minimizes the problem of unbalanced sample distribution, thereby improving the adaptability and robustness of the model in different scenes. The enhancement results are shown in FIG. 3, where (a) is image flipping, (b) is image contrast adjustment, (c) is image cropping, and (d) is adding black mask blocks.
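A minimal torchvision sketch of this augmentation pipeline is given below; all parameter values are illustrative, and for segmentation the geometric transforms would have to be applied jointly to the image and its mask.

```python
import torchvision.transforms as T

# Hedged sketch of the random augmentations described above; parameters assumed.
augment = T.Compose([
    T.RandomRotation(degrees=30),                         # rotation
    T.RandomResizedCrop(size=512, scale=(0.8, 1.0)),      # cropping
    T.ColorJitter(contrast=0.3),                          # contrast adjustment
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(0.01, 0.03), value=0),  # black mask blocks
])
```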
Example 3
This example is a further limitation of example 1. Step S3 will be further described.
Aiming at the characteristics of various fruit leaf diseases, the invention provides a SegFormer-based leaf disease position real-time detection model (referred to as "Disease-Seg"), which combines the advantages of CNN and Transformer in feature representation and aims to accurately identify different types of leaf diseases. FIG. 4 shows the overall structural design of Disease-Seg. The model consists of an encoder and a decoder; the encoder adopts a single-layer parallel fusion encoder structure that combines the advantages of the traditional CNN and the Transformer, realizing effective fusion of local features and global context information without increasing network depth while minimizing missed features.
In detail, the feature maps generated by the two different encoders are at progressively reduced fractions of the original image resolution, and the output of each stage has the {16, 32, 64, 160, 256} channel sizes. The EFM expanded feature module enlarges the receptive field through dilated convolution, increasing the perception of local information without increasing the number of model parameters.
The FWFM feature weighted fusion module (Feature Weighted Fusion Module) performs adaptive weighted fusion, avoiding the redundant information brought by simple splicing and enhancing the effectiveness of feature expression. Meanwhile, a DM-Attention depth multi-scale attention mechanism is proposed, combining the traditional attention mechanism's strength in extracting global context information with multi-scale depthwise separable convolution and pointwise convolution; by superimposing depthwise separable convolutions of different scales, the parameter count and computational cost of the model are effectively reduced while the extraction of multi-scale features is significantly improved.
In the decoder section, the multi-scale feature maps extracted from the encoder are downsampled at a fixed rate, fused, and upsampled to the original size, producing the segmentation result, where the number of output channels equals the number of classes, 3 in this embodiment.
Example 4
This example is a further limitation of example 3 and further describes the EFM expanded feature module. Aiming at the problem of small spots on fruit tree leaves and feature blurring caused by uneven illumination, an improved EFM expanded feature module is proposed, improving the accuracy of disease area detection through fine network design. The module captures and extracts disease features at different scales through a series of designed convolution operations and is particularly suitable for early small lesions on fruit tree leaves. The structure of the EFM expanded feature module is shown in FIG. 5.
First, the module applies a 1×1 convolution to the input feature map, mapping it to a new channel dimension. This operation generates new feature channels by weighted summation over all input channels at each spatial position while keeping the spatial dimensions unchanged, avoiding excessive parameters while maintaining the richness of feature information.
Then, a dilated convolution (Dilated Conv) with dilation rate 2 is introduced; by inserting gaps into the convolution kernel, the receptive field is significantly enlarged without adding extra parameters, helping the model effectively capture a large range of local features.
In the network structure, the SiLU (Sigmoid-Weighted Linear Unit) activation function is introduced after each convolution layer to enhance the nonlinear transformation capability. Its output increases smoothly from the region near 0, which makes the network more sensitive to subtle changes at leaf edges or disease boundaries and effectively improves segmentation accuracy. In particular, SiLU makes the model's segmentation of disease areas finer and more accurate when processing lesion edges.
In addition, Batch Normalization (BN) layers are added to the network to standardize the feature maps. This helps reduce statistical differences between channels, preventing overfitting and improving the robustness of the model under illumination changes or lesions on different parts of the leaf.
Finally, the module compresses and integrates the disease information along the channel dimension through another 1×1 convolution, generating an output feature map containing the disease and providing high-quality input for feature fusion and prediction in subsequent modules.
$$Y_1=W_1*X+b_1$$

where $X$ is the input tensor, $W_1$ is the 1×1 convolution kernel, and $b_1$ is the bias. A Batch Normalization operation is then applied to $Y_1$:

$$Y_2=\mathrm{BN}(Y_1)$$

Next, the SiLU activation function is applied:

$$Y_3=\mathrm{SiLU}(Y_2)=Y_2\cdot\sigma(Y_2)$$

Then, the dilated convolution operation is carried out:

$$Y_4=W_d\ast_{r=2}Y_3+b_d$$

where $W_d$ is the dilated convolution kernel and $b_d$ is the bias. Batch Normalization and the SiLU activation function are applied again:

$$Y_5=\mathrm{SiLU}(\mathrm{BN}(Y_4))$$

Finally, a 1×1 convolution operation is performed to compress and integrate the feature information:

$$Y_{\mathrm{out}}=W_2*Y_5+b_2$$
This design enhances the model's ability to capture disease features through multi-level convolutions and activation functions, and can effectively improve segmentation accuracy and robustness, particularly when processing lesion details.
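Putting the steps above together, the sketch below is one plausible PyTorch rendering of the EFM. The intermediate channel width, the 3×3 kernel of the dilated convolution, and rendering the final fusion as a residual addition are assumptions consistent with the description, not the patented implementation.

```python
import torch
import torch.nn as nn

class EFM(nn.Module):
    """Hedged EFM sketch: 1x1 conv -> BN -> SiLU -> dilated conv (rate 2)
    -> BN -> SiLU -> 1x1 conv, fused back onto the input."""
    def __init__(self, c_in: int, c_mid: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_mid, kernel_size=1),                       # new feature channels
            nn.BatchNorm2d(c_mid),
            nn.SiLU(),
            nn.Conv2d(c_mid, c_mid, kernel_size=3, padding=2, dilation=2),  # enlarged receptive field
            nn.BatchNorm2d(c_mid),
            nn.SiLU(),
            nn.Conv2d(c_mid, c_in, kernel_size=1),                       # compress and integrate
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fusion of input with the disease feature map, rendered as addition (assumption)
        return x + self.body(x)
```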
Example 5
This embodiment is a further definition of embodiment 3, further illustrating the DM-Attention depth multiscale Attention mechanism module.
When addressing the identification of lesion edges and detail features, the attention mechanism in SegFormer employs a reduction ratio R to shorten the sequence and thereby reduce computational complexity. However, different reduction ratios can cause feature loss, especially in the capture of detail and edge information in the image.
To overcome this problem, a new approach named the DM-Attention depth multi-scale attention mechanism is proposed. Its core idea is to process the input channels independently using depthwise separable convolution, effectively suppressing the computational complexity of the model.
On this basis, multi-scale features are gradually extracted through the DMSM depth multi-scale module (Deep Multi-scale Module), finally obtaining a richer and more diverse feature representation while keeping the number of parameters low. With this design, DM-Attention not only reduces computation and memory consumption but also maintains efficient capture of detail and edge information, particularly in tasks such as agricultural disease segmentation that require fine feature expression. The structure is shown in FIG. 6.
First, the input features are processed by a multi-head attention mechanism. The mechanism generates a query matrix $Q$, a key matrix $K$ and a value matrix $V$ by linear transformation and attends to the disease area from different angles through multiple heads.

This design enhances the detail extraction capability of the model in fruit tree leaf disease detection, effectively capturing the texture and morphological characteristics of different diseases, and is especially suitable for irregular lesion shapes on leaves. Specifically, the model first maps the input feature $X$ into the query matrix $Q$ through a linear layer (Linear); $Q$ is then rearranged so that each attention head (head) focuses on different feature channels, as shown below:

$$Q=\mathrm{Permute}\left(\mathrm{Reshape}\left(\mathrm{Linear}(X),\,(B,N,h,C/h)\right)\right)$$

where $\mathrm{Permute}$ denotes the transpose operation and $\mathrm{Reshape}$ denotes the mapping operation; the three operations together map the input $X$ into the query matrix $Q$. $B$ denotes the batch size, $N$ the sequence length, $C$ the number of channels, and $h$ the number of heads.

The key matrix $K$ and value matrix $V$ are generated in the same way as the query matrix: two independent matrices are obtained through a linear layer and then split:

$$K,V=\mathrm{Split}\left(\mathrm{Linear}(X)\right)$$

where $\mathrm{Split}$ denotes the partition operation. The query matrix $Q$ and key matrix $K$ are combined by dot product, scaled by a scaling factor, and passed through the $\mathrm{Softmax}$ function to obtain the attention weights:

$$A=\mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)$$

where $d_k$ denotes the dimension of an attention head. The value matrix $V$ is weighted with the attention weight matrix $A$, and the result is projected back to the original dimension through a linear layer, finally yielding the output features with global context $X_{\mathrm{out}}$:

$$X_{\mathrm{out}}=\mathrm{Dropout}\left(\mathrm{Linear}(A\cdot V)\right)$$

where $\mathrm{Dropout}$ denotes random deletion of neurons, $\mathrm{Linear}$ denotes the projection back to the original dimension through a linear layer, and $T$ denotes the transpose operation.
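The sketch below renders this multi-head attention step in PyTorch. The head count and dropout rate are illustrative assumptions, and the embedding dimension is assumed divisible by the number of heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHSA(nn.Module):
    """Hedged sketch of the multi-head self-attention step described above."""
    def __init__(self, dim: int, heads: int = 4, p_drop: float = 0.1):
        super().__init__()
        self.heads, self.d_k = heads, dim // heads
        self.q = nn.Linear(dim, dim)        # maps X to the query matrix Q
        self.kv = nn.Linear(dim, dim * 2)   # one linear layer, then split into K and V
        self.proj = nn.Linear(dim, dim)     # projection back to the original dimension
        self.drop = nn.Dropout(p_drop)      # random deletion of neurons

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape                   # (batch B, sequence N, channels C)
        q = self.q(x).reshape(b, n, self.heads, self.d_k).permute(0, 2, 1, 3)
        k, v = self.kv(x).reshape(b, n, 2, self.heads, self.d_k).permute(2, 0, 3, 1, 4)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.drop(self.proj(out))
```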
Next, to further enhance feature expression, DM-Attention introduces the DMSM depth multi-scale module, which convolves the features with convolution kernels of different sizes (1×1, 3×3, 5×5, 7×7) to capture local features at different scales. The outputs of the multiple convolution operations are collected into a multi-scale convolution list:

$$[F_1,F_2,\dots,F_n],\qquad F_i=\mathrm{Conv}_{k_i}(X)$$

The resulting features of different scales are then added to produce the aggregate feature $U$, i.e. the superposition of all convolution outputs, representing the overall multi-scale information:

$$U=\sum_{i=1}^{n}F_i$$

where $n$ denotes the length of the multi-scale convolution list.

$U$ is compressed into a vector $s$ by global average pooling, yielding the global features of the input:

$$s_c=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}U_c(i,j)$$

where $H$ and $W$ are the height and width of the feature map, respectively.

Through a fully connected layer $\mathrm{FC}_1$, $s$ is mapped to a smaller dimension $z$, reducing the computational burden, speeding up inference, and guiding the subsequent weighting process:

$$z=\mathrm{FC}_1(s)$$

Through another fully connected layer $\mathrm{FC}_2$, $z$ is mapped back to the same dimension as $s$, generating the weight matrix $W$:

$$W=\mathrm{Softmax}\left(\mathrm{FC}_2(z)\right)$$

Each weight $w_i$ of the weight matrix $W$ is used to weight and combine the features output by convolution kernels of different sizes, obtaining the combined feature map $V$:

$$V=\sum_{i=1}^{n}w_i\cdot F_i$$

Finally, the combined feature map $V$ is output after passing through a fully connected layer. DM-Attention further enhances the adaptability of the model to different disease forms, textures and structures, improving the overall performance of fruit tree disease detection: disease areas are detected more accurately and fine disease features on the leaf are captured.
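A plausible PyTorch sketch of the DMSM step follows. The depthwise grouping, the reduction ratio of the first fully connected layer, and the use of a 1×1 convolution as the final "fully connected layer" are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DMSM(nn.Module):
    """Hedged DMSM sketch: multi-scale depthwise convs, summed into U,
    squeezed by global average pooling, re-weighted via two FC layers."""
    def __init__(self, dim: int, kernel_sizes=(1, 3, 5, 7), reduction: int = 4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim) for k in kernel_sizes
        )
        self.fc1 = nn.Linear(dim, dim // reduction)                # s -> smaller z
        self.fc2 = nn.Linear(dim // reduction, dim * len(kernel_sizes))
        self.out = nn.Conv2d(dim, dim, 1)                          # final FC as 1x1 conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [conv(x) for conv in self.convs]                   # multi-scale list [F_i]
        u = torch.stack(feats).sum(0)                              # aggregate feature U
        s = u.mean(dim=(2, 3))                                     # global average pooling
        w = self.fc2(F.relu(self.fc1(s)))                          # weight logits
        b = s.shape[0]
        w = F.softmax(w.view(b, len(feats), -1), dim=1)            # weights per branch
        v = sum(w[:, i].view(b, -1, 1, 1) * f for i, f in enumerate(feats))
        return self.out(v)
```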
Example 6
This example is a further limitation of example 3. The FWFM feature weighted fusion module is further described. In a single layer parallel encoder architecture, the FWFM feature weighted fusion module is intended to efficiently integrate features from both encoders in an adaptive manner, since different encoders focus on different aspects of the extracted features. The detailed structure is shown in fig. 7.
The two input feature maps $F_1$ and $F_2$ are spliced in the channel dimension to generate a joint feature map containing all the information from both encoders. The joint feature map is processed with a 1×1 convolution operation to generate a weight map used to weight and fuse the input features in the subsequent step. After the nonlinear transformation of the $\mathrm{Sigmoid}$ activation function, which ensures weight values between 0 and 1, the weighted weight map $W$ is obtained:

$$W=\mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times1}([F_1;F_2])\right)$$

Specifically, each element of the weight map reflects the importance of the corresponding location in the final fused feature map. The feature maps $F_1$ and $F_2$ are fused by weighted summation to obtain the fused feature:

$$F_{\mathrm{fused}}=\mathrm{BN}\left(W\odot F_1+(1-W)\odot F_2\right)$$

where $\mathrm{BN}$ denotes the batch normalization operation. The fused feature $F_{\mathrm{fused}}$ then passes sequentially through a 1×1 convolution operation and a batch normalization operation to obtain the final output feature map. This weighted fusion strategy enables the module to flexibly adjust the fusion proportion of the two feature maps according to their contributions, improving the expressive power of the final fused feature map and enhancing the model's ability to represent complex features.
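The sketch below is one plausible PyTorch rendering of the FWFM. Assigning $W$ to $F_1$ and $(1-W)$ to $F_2$, and placing batch normalization directly after the weighted sum, are assumptions consistent with the formulas above.

```python
import torch
import torch.nn as nn

class FWFM(nn.Module):
    """Hedged FWFM sketch: concat -> 1x1 conv + sigmoid weight map ->
    weighted sum + BN -> 1x1 conv + BN."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Sequential(nn.Conv2d(dim * 2, dim, 1), nn.Sigmoid())
        self.bn = nn.BatchNorm2d(dim)
        self.refine = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.BatchNorm2d(dim))

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        w = self.weight(torch.cat([f1, f2], dim=1))  # per-location fusion weights in (0, 1)
        fused = self.bn(w * f1 + (1 - w) * f2)       # weighted summation + batch norm
        return self.refine(fused)                    # final 1x1 conv + batch norm
```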
Through the weighted fusion mechanism, the FWFM module not only can effectively combine the feature graphs of two different sources, but also can capture more context information on a plurality of layers, and finally optimize the performance of the model, especially in complex tasks needing diversified feature extraction.
Example 7
This embodiment presents the test results of the leaf lesion position real-time detection model on the experimental platform to further illustrate the superiority of its detection performance.
1. Experiment platform and evaluation index:
To verify the effectiveness of the proposed Disease-Seg model, all experiments were carried out under identical software and hardware environments, specifically: an NVIDIA TU102 [GeForce RTX 2080 Ti Rev. A] GPU paired with an 11th-generation Intel Core i5-11400F processor (2.60 GHz, 12 cores). The optimizer is the AdamW optimization algorithm with weight decay, the learning-rate schedule is cosine decay, and the momentum is set to 0.9 with a fixed weight-decay coefficient; the batch size is 16 and the training period is set to 300 epochs.
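The sketch below reproduces this training configuration in PyTorch. The base learning rate and the weight-decay value are not given in the source and appear here only as placeholders; the stated momentum of 0.9 is mapped to AdamW's beta1.

```python
import torch

model = torch.nn.Conv2d(3, 3, 1)  # stand-in for the Disease-Seg network
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=6e-5,             # placeholder: base learning rate not stated in the source
    betas=(0.9, 0.999),  # beta1 = 0.9 corresponds to the stated momentum
    weight_decay=1e-2,   # placeholder: decay coefficient not stated in the source
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):  # training period of 300 epochs
    ...                   # one pass over the training set with batch size 16
    scheduler.step()
```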
To verify the robustness of the model, 6 evaluation indices were used to evaluate the proposed model: cross-over ratio (IoU), average cross-over ratio (mIoU), average pixel precision (mPA), accuracy (Acc), FLOPs (Floating Point Operations), and parameters (Params). These metrics evaluate the model from various aspects.
IoU is used for evaluating the coincidence degree of the predicted segmentation result and the real result of the model on a single category, and is a key index for measuring the segmentation precision of the single category.
MIoU is used for evaluating the prediction effect of each category in the segmentation task, and the overall performance of the model can be effectively reflected by measuring the overlapping degree of the prediction region and the real region.
The mPA is used to evaluate the proportion of correctly classified pixels in each class. It can measure the pixel level prediction effect of each class.
Acc evaluates the classification correctness of the model as a whole, i.e., the proportion of all predictively correct pixels to the total pixels.
Typical indicators used in linear regression analysis include R² and RMSE. R² (coefficient of determination) measures the goodness of fit between the model's predicted and actual values; the closer to 1, the more accurate the model. RMSE (root mean square error) represents the average deviation between predicted and actual values; the smaller the value, the better.
FLOPs is used to evaluate the computational complexity of the model by measuring the amount of computation in the inference process, ultimately judging whether the model is suitable for deployment on resource-constrained devices. Params is the sum of the parameter counts of all layers and measures the complexity and size of the model.
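For concreteness, the sketch below computes IoU and mIoU from predicted and ground-truth label maps in the standard way; it illustrates the metric definitions and is not code from the source.

```python
import numpy as np

def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> list:
    """Per-class intersection-over-union between two label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:                        # skip classes absent from both maps
            ious.append(inter / union)
    return ious

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """mIoU: mean of the per-class IoU values."""
    return float(np.mean(iou_per_class(pred, gt, num_classes)))
```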
2. Experimental results
This section compares and analyzes the performance of the proposed Disease-Seg model against several currently advanced and classical semantic segmentation techniques: PSPNet, HRNetV2, U-Net, DeepLabV3+, SegFormer, SegNeXt, DANet, OCRNet and UPerNet, some of which come from the open-source semantic segmentation framework MMSegmentation. These techniques represent three architectural paradigms: CNN, pure Transformer, and fused CNN-Transformer architectures. The experiments focus in particular on the clarity of the boundary between the leaf and the lesion area and the accuracy of small-target lesion segmentation.
(1) Effect comparison under different models
To better verify the performance of the Disease-Seg model in real scenes, as shown in Tables 1 and 2, single-class subdivision is performed on samples such as apple, grape and plum in the dataset, and the segmentation performance of each model on each specific class is counted in detail, with bold indicating the best value.
Table 1 shows the segmentation results for apple rust, apple alternaria leaf spot, grape white rot and plum red spot in an outdoor real environment. In the target lesion segmentation task, the proposed Disease-Seg model performs best on all 7 types of lesions, and in the leaf segmentation task its accuracy is stable at about 98%. For example, in the segmentation of apple alternaria leaf spot, the IoU values of the Disease-Seg model are 33%, 44% and 38% higher than OCRNet, UPerNet and PSPNet, respectively, and even 14% higher than SegNeXt, the best-performing comparison method.
In the segmentation tasks for the remaining three lesions, the Disease-Seg model also stands out, with an average segmentation accuracy 13% higher than SegFormer and U-Net, which perform best in these categories. In conclusion, the Disease-Seg model excels in real complex environments and can effectively overcome interference factors such as leaf folding, serrated edges, and light and shadow changes, achieving accurate segmentation of leaves and lesions.
Table 1:
As shown in Table 2, the Disease-Seg model is equally excellent in the segmentation tasks of indoor pomegranate Cercospora leaf spot, indoor mango brown spot and outdoor pear scab.
For example, in the segmentation of the Botrytis cinerea lesions, models such as HRNetV2, UPerNet and OCRNet are significantly less accurate than the method herein in both leaf and lesion segmentation. In the mango disease category, although SegFormer performs best among the comparison models, its IoU values for leaf and lesion segmentation are still 1% and 16% lower, respectively, than the method herein. The small-target segmentation performance of UPerNet is also inferior on pomegranate, with leaf segmentation IoU 6% lower and lesion segmentation IoU 15% lower.
PSPNet performs worst in all comparison experiments; although its pyramid pooling module has advantages in capturing global background features, its applicability is poor in agricultural disease segmentation tasks that must consider both local and global features. Further statistical analysis shows that the method herein has higher accuracy and robustness in small-target segmentation tasks, providing valuable reference and practical guidance for agricultural disease segmentation research.
Table 2:
To more fully verify the comprehensive performance of the method herein under different scenarios, CNN-based methods, Transformer-based methods, and classical and efficient models based on CNN-Transformer fusion are compared. Table 3 shows the performance of the Disease-Seg model and the other comparison methods under evaluation indicators such as mIoU, Acc, Params, FLOPs and FPS, where bold and underlined values represent the highest and second-highest values, respectively.
CNN-based methods benefit from shared weights, local connections and inductive bias, enabling them to learn visual representations at a faster inference rate. However, CNNs lack the ability to capture long-range dependencies and show poor segmentation performance for small lesions against complex backgrounds. For example, the mIoU of DeepLabV3+, HRNetV2 and U-Net is 78.54%, 83.45% and 82.61%, respectively, which is 11.78%, 6.87% and 7.71% lower than the Disease-Seg model. Nevertheless, PSPNet's parameter count is 2.41M lower than the Disease-Seg model, showing some lightweight advantage.
Transformer-based methods compute the correlation of each position in the feature sequence with all other positions through a self-attention mechanism, dynamically adjusting the feature representation and capturing global context information. However, because the weight distribution relies on similarity over the global scope, Transformer methods may ignore fine-grained structures or texture features in the absence of explicit neighborhood modeling.
For example, the mIoU of UPerNet is 7.79% and 7.45% higher, respectively, than SegFormer and SegNeXt, which use lightweight decoders. Furthermore, the Disease-Seg model is 1.8 times faster than SegFormer in FPS, though only 0.7 times the speed of SegNeXt. The experimental results show that the Disease-Seg model can effectively capture local fine-grained semantic information and has strong representation capability in disease segmentation tasks with high FPS requirements.
Methods based on Transformer-CNN fusion aim to make global and local features complementary. However, failing to balance accuracy gains against parameter scale can leave model performance and efficiency unbalanced. The Disease-Seg model achieves an excellent trade-off between accuracy and computation. By contrast, the mIoU of OCRNet and UPerNet is 18.60% and 25.67% lower than the model herein, respectively, while their Params and FLOPs are higher by 7.3M and 37.17G, and by 36.0M and 184.75G, respectively.
Although the Disease-Seg model is not optimal on the Params, FLOPs and FPS indicators, its overall performance is still the best while meeting the computing demands of cloud-server and edge-device deployment. In conclusion, the experimental results fully demonstrate the effectiveness and applicability of the Disease-Seg model in agricultural fruit leaf disease image segmentation, providing important technical support and theoretical reference for precise disease management in smart agriculture.
Table 3:
A detailed comparative analysis of the disease segmentation visualization results of the different models for apple rust (A), apple alternaria leaf spot (B) and plum red spot (C) is shown in FIG. 8, where the white boxed areas mark the differences between models. The results show that the segmentation effect of PSPNet is unsatisfactory, mainly because its multi-scale pyramid pooling structure loses detail information during downsampling, affecting segmentation accuracy.
For example, in the apple rust (A) segmentation results, the white boxes mark the critical region; although SegFormer, U-Net and HRNetV2 show some capability on smaller lesion segmentation, compared with the label they still miss parts of the target region and cannot achieve complete segmentation. In the apple alternaria leaf spot (B) segmentation results, SegNeXt segments the leaf and lesions relatively accurately, close to the label map, but within the white box it misjudges lesions on overlapping leaves.
Other models such as OCRNet, DeepLabV3+ and DANet cannot effectively distinguish leaf edges from overlapping areas, mainly because they do not adequately extract the edge features of leaves that are similar in color and overlap each other. In the plum red spot (C) segmentation results, although the background is complex and the colors are similar, most comparison models can locate and segment the lesions accurately; within the white boxed area, however, models such as OCRNet, DeepLabV3+ and HRNetV2 over-segment at the edges.
In summary, under the complex backgrounds, similar colors and lesions of different sizes in the real environments involved in (A), (B) and (C), the Disease-Seg model shows remarkable advantages over the other comparison models: it can accurately locate the target area and finely segment the lesions, significantly reducing missed target regions and lost edge details, fully demonstrating the superiority and practical application potential of the method.
In the segmentation results of grape white rot (D) and pear scab (E) shown in FIG. 9, the leaves are affected by factors such as folding, jagged edges, morphological irregularity and background interference, leaving existing deep learning models at a certain disadvantage in segmentation accuracy.
Taking the grape white rot (D) segmentation results as an example, although the illumination conditions are good, models such as DeepLabV3+, OCRNet, SegFormer and DANet show certain defects in segmenting the whole leaf contour, mainly under the influence of complex leaf shape, folding and background noise, and cannot achieve accurate segmentation. Especially at the leaf notches, U-Net cannot effectively suppress background noise, leaving the segmentation of the central part incomplete. Furthermore, HRNetV2 fails to accurately delineate the serrated edge of the grape leaf.
In lesion segmentation, most models miss lesions because the lesions on grape leaves are irregular in shape and densely distributed, especially in the white boxed parts of the grape white rot (D) results. For example, DeepLabV3+, U-Net and OCRNet fail to identify all lesions.
SegNeXt, although it does not miss the lesions, is far inferior to the Disease-Seg model in the fine characterization of lesion morphology. Likewise, in the pear scab (E) segmentation results, the sharp shape of pear leaf edges makes precise leaf segmentation challenging. Comparison models such as OCRNet, DANet and U-Net fail to effectively extract the details of the pear leaf edges, resulting in blurred segmentation boundaries.
In addition, most models fail to accurately extract the small lesions in the white boxes; in particular, DeepLabV3+ and HRNetV2 fail to identify these local features. In contrast, the Disease-Seg model can accurately extract the fine lesion features, benefiting from its single-layer parallel fusion architecture, which effectively extracts global context features while fusing local details, realizing accurate segmentation of leaves and lesions. In short, the Disease-Seg model shows superiority in handling complex leaf morphology and lesion distribution, excels particularly at capturing details and suppressing complex backgrounds, and significantly improves segmentation accuracy.
The segmentation results for mango brown spot (F) and pomegranate Cercospora leaf spot (G) in a laboratory environment are shown in FIG. 10; the major challenges arise from noise interference in the images and the irregular morphology of the lesions. The folding of mango leaves, the tiny lesions of varied form, the diverse shapes of pomegranate leaves, and the similarity between lesion and background colors all make accurate segmentation very difficult.
As shown by the white boxed part of the mango brown spot (F) segmentation results, models such as DeepLabV3+, U-Net and OCRNet only extract the difference between lesions and leaves but cannot effectively extract the shape features of the lesions. The Disease-Seg model not only accurately segments the pixel areas of leaf and lesions but also finely captures the morphological characteristics of the lesions, showing clear advantages over the other comparison models. The white boxed part of the pomegranate Cercospora leaf spot (G) segmentation results clearly contrasts the differences between models.
Because the lesions of pomegranate Cercospora leaf spot resemble the background, models such as OCRNet, SegFormer and U-Net cannot effectively suppress background noise, leading to omissions in the lesion area. Especially in the lesion area on the right side of the white box, these models cannot accurately distinguish lesions from the background, producing a large range of missed detections. DeepLabV3+ also misjudges the upper-left corner of the white box, further affecting the segmentation result. In contrast, the Disease-Seg model can effectively suppress noise and accurately distinguish lesions from the background, significantly improving segmentation accuracy and excelling in detail capture and local feature extraction.
In general, the Disease-Seg model shows obvious advantages in the task of dividing the Disease spots of mangoes and pomegranates through effective noise suppression and precise extraction of morphological characteristics, and particularly has higher robustness and precision when processing complex backgrounds and tiny Disease spots.
(2) Generalization experiment
To verify the overall generalization of the model, it is compared with classical and advanced methods on the PLANT VILLAGE dataset, including CNN-based methods, Transformer-based methods, and CNN-Transformer fusion methods. Image resolutions different from the training set are also used to verify the overall performance of the model at different resolutions. The results are shown in Table 4, where bold and underlined values represent the highest and second-highest values, respectively.
The results show that the Disease-Seg model has significant advantages over the other models. Specifically, HRNetV2 performs best overall among the comparison models, but its mIoU and Acc are 1.96% and 0.12% lower than the Disease-Seg model, respectively, while its Params and FLOPs are 24.76M and 18.70G higher. U-Net and DeepLabV3+, as better-performing comparison models, still leave room for improvement in segmentation accuracy, with mIoU and Acc respectively 4.65% and 0.86%, and 3.48% and 0.81%, lower than the Disease-Seg model.
In addition, both models lag behind the Disease-Seg model to some extent in parameter scale and inference speed. Although PSPNet stands out on the Params and FLOPs efficiency indicators, this is mainly due to the adjustability of its pyramid pooling module, which uses smaller pooling scales to effectively reduce computational cost and increase processing speed while maintaining performance. However, PSPNet underperforms in segmenting small targets; especially in agricultural disease image segmentation, the pyramid pooling module's focus on global features limits its capture of small-target details. By comparison, SegNeXt, a Transformer fusion architecture, has mIoU and Acc 28.84% and 1.16% lower, respectively, than the Disease-Seg model.
OCRNet and UPerNet, as CNN-Transformer fusion architectures, are not balanced in comprehensive segmentation accuracy and model efficiency. The experimental results show that the Disease-Seg model can effectively compensate for the loss of detail information while keeping computational cost low, segmenting small disease targets more accurately. In conclusion, the Disease-Seg model excels on all evaluation indicators, demonstrating its comprehensive advantages in complex disease segmentation tasks.
Table 4:
To demonstrate more intuitively the superiority of the method herein on the PLANT VILLAGE dataset, visualization maps of the segmentation results for grape brown spot and grape black rot are presented. The detailed comparative analysis of the different models on grape brown spot (H) and grape black rot (I) is shown in FIG. 11. In the grape brown spot (H) segmentation results, as shown in the lower-left corner of the white dashed box, DeepLabV3+, U-Net and HRNetV2 fail to accurately extract the detail features where two lesions adhere. In contrast, the Disease-Seg model can effectively compensate for the loss of fine-grained information caused by fusing features of different resolutions. In the grape black rot (I) segmentation results, as shown in the center of the white dashed box, the Disease-Seg model characterizes the lesion shape more finely than SegFormer and SegNeXt. The Disease-Seg model is particularly good at handling fine leaf lesions, showing excellent performance in the edge texture of tiny lesions and the segmentation accuracy of detail information.
(3) Attention comparison
To verify that the DM-Attention depth multi-scale attention mechanism can effectively cope with complex situations in agricultural scenes, several strong current attention mechanisms (SE, CBAM, CoT, SK, Triplet, and Global Context) were substituted in for comparison, fully demonstrating the superiority of DM-Attention. As shown in Table 5, the baseline model achieves an mIoU of 81.70%, an mPA of 87.90%, and an Acc of 99.09%. After DM-Attention is introduced, performance improves markedly: mIoU reaches 85.98%, mPA rises to 91.31%, and Acc to 99.11%.
These three indicators rise by 4.28%, 3.41%, and 0.02% over the baseline, respectively. Among the compared mechanisms, CBAM ranks second after DM-Attention, with an mIoU of 84.52% and an mPA of 88.83%, indicating good feature modeling capability. SE performs slightly below CBAM, with an mIoU of 82.57% and an mPA of 87.57%. By contrast, CoT, SK, and Triplet perform poorly, showing deficiencies in modeling complex features; Triplet is the lowest of the three, with an mIoU of 69.39% and an mPA of 76.87%.
Global Context performs worst, with an mIoU of 67.51% and an mPA of 75.46%, probably because it over-emphasizes global features and thus captures local information insufficiently. The comprehensive comparison shows that DM-Attention holds a clear performance advantage in complex agricultural scenes. Its strong results not only verify its effectiveness in modeling complex features but also demonstrate its outstanding ability to handle the interaction of local and global features, providing powerful support for agricultural disease detection and segmentation tasks.
Table 5:
To further verify the effectiveness of the proposed DM-Attention, its segmentation performance was analyzed against other classical and efficient attention mechanisms from the perspective of region-of-interest validity, using the Grad-CAM method. Specifically, panels (a) to (f) of Figs. 12 and 13 show attention heatmaps for six diseased fruit leaves and seven lesion types, respectively.
Heatmaps of the six diseased fruit leaves are shown in Fig. 12 (a)-(f), starting with the disease attention areas of pomegranate and mango leaves. Compared with the SE, SK, and Triplet methods, DM-Attention effectively focuses on the outline of the whole leaf and clearly distinguishes lesions from the leaf edge region. Panel (b) compares the Global Context, CBAM, and CoT methods; the results show that DM-Attention accurately distinguishes background features resembling the lesion area and focuses precisely on the disease target region. Panels (c) and (d) show disease identification on plum and pear leaves in a complex outdoor environment.
There, the leaf edges are unclear due to occlusion, background clutter, and other factors. Compared with SE, SK, Triplet, and Global Context, DM-Attention accurately captures the leaf edge features. Although CBAM also notices the leaf edge, it fails to effectively separate lesions from the critical leaf edge region, leading to poor recognition of the target area. Panel (e) shows the lesion area of a grape leaf, where the leaf is folded, its edge serrated, and the background more complex. Both CBAM and DM-Attention focus effectively on the target area, but DM-Attention exhibits greater robustness when handling the complex background.
Panel (f) shows the lesion areas of an apple leaf bearing both Alternaria leaf spot and rust. Compared with the other attention mechanisms, DM-Attention attends more accurately to all feature regions of the leaf and effectively distinguishes the lesion areas of the two diseases, performing clearly better than the other compared methods. Further analysis shows that the stacked depthwise separable convolutions in the DM-Attention module play an important role in enhancing the model's feature extraction capability: they effectively reduce computational complexity, retain rich detail information, and improve the model's ability to identify different disease areas.
The results show that DM-Attention strengthens the model's local feature extraction through the introduced depthwise separable convolutions, restores inter-channel connections through pointwise convolution, and thereby improves the expressiveness of the feature representation. Meanwhile, under spatial dimensionality reduction, convolution operations fuse features of different scales, enhancing the model's robustness to complex backgrounds and occlusion. By weighting the convolution outputs, the model captures the boundaries between lesions and leaves more accurately, avoiding the boundary blurring that conventional convolution methods suffer during lesion segmentation.
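As a concrete illustration of this description, a minimal PyTorch sketch of such a depthwise-separable local branch with a weighted (gated) output follows; the kernel size and the sigmoid-gated residual form are assumptions for illustration, not taken from the patent:

```python
import torch
import torch.nn as nn

class GatedLocalBranch(nn.Module):
    """Depthwise conv extracts per-channel spatial detail; a pointwise
    (1x1) conv restores inter-channel connections; a sigmoid gate weights
    the convolution output before residual fusion with the input."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                  nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.pointwise(self.depthwise(x))
        return x + self.gate(local) * local  # weighted residual fusion
```

Compared with a standard 3×3 convolution, the depthwise-plus-pointwise split reduces multiply-accumulate cost roughly by a factor of the channel count, which is consistent with the efficiency argument made above.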
As shown in Fig. 13 (a)-(f), attention heatmaps of seven lesions are presented, clearly showing how differently the attention mechanisms behave when segmenting the disease areas.
Panels (a)-(f) show, in order, lesion heatmaps for guava, mango, plum, pear, grape, and apple. Panels (a) and (b) show samples captured in a laboratory environment, whose edge features are blurred because the lesion color resembles the background. Traditional attention mechanisms such as SE, SK, CoT, and Triplet fail to focus on the target lesions in this case and are instead distracted by the background. CBAM locates the lesion position more accurately, but its perception of edge detail remains insufficient. In contrast, the proposed DM-Attention not only captures the spatial position of the lesion accurately but also effectively extracts its edge features, improving segmentation precision.
In panels (c), (d), and (e), the samples were collected in a real outdoor environment and are significantly affected by illumination and weather changes. Under these conditions, Global Context fails to focus noticeably on the plum lesion features and performs weaker than the other methods. CoT and Global Context cannot effectively handle the external interference, while SE and SK, lacking local information perception, fail to attend to the tiny lesions on the pear leaves. In addition, the lesions on the grape leaf are prone to background interference; here CoT, Triplet, and Global Context over-focus on non-target areas, yielding unsatisfactory segmentation results.
DM-Attention shows notable advantages when handling complex backgrounds and illumination changes: it both locates the target lesions accurately and effectively separates target from non-target areas. Panel (f) shows apple leaf rust and Alternaria leaf spot occurring together, which places higher demands on the model's multi-class pixel-level discrimination. Region (1) marks the rust area: Triplet fails to attend effectively to the target lesions, while CoT, Global Context, and SK are disturbed by the background and deviate significantly from the target area. Region (2) marks the Alternaria leaf spot area: the DM-Attention presented herein focuses on the target disease area more accurately than the other methods, exhibiting excellent attention distribution.
In summary, DM-Attention shows significant advantages in both laboratory and real, complex outdoor environments. It effectively perceives the positions of target lesions and extracts their edge features, and it segments multi-class co-occurring diseases with high robustness and accuracy. The experimental results show that DM-Attention clearly outperforms traditional attention mechanisms across disease types and scenes, providing a more accurate and reliable solution for agricultural disease detection tasks.
(4) Ablation experiments
Six sets of ablation experiments were designed to verify the effectiveness of the proposed Disease-Seg model. Using a controlled-variable design, the EFM, DM-Attention, and FWFM modules were introduced step by step to examine their influence on the adaptability of the Transformer architecture and the optimization of detail features; the results are shown in Table 6. Test1 serves as the baseline model. After DM-Attention is introduced in Test2, segmentation accuracy improves markedly over Test1, with mIoU, mPA, and Acc rising by 4.28%, 3.41%, and 0.02%, respectively. After EFM is introduced in Test3, Params and FLOPs change only slightly while accuracy improves, indicating that the EFM module effectively extracts local and global context features.
Test4 combines the EFM and DM-Attention modules, further improving segmentation accuracy: compared with the baseline, mIoU, mPA, and Acc rise by 6.23%, 3.97%, and 0.35%, respectively. The results show that expanding the local receptive field and fusing local and global features play a key role in handling fruit tree leaf diseases in complex agricultural scenes. Test5 additionally introduces the FWFM module on top of Test3; its weighted fusion strategy effectively reduces feature loss and improves model performance.
Test6 is the final Disease-Seg architecture: its gains over the baseline model in mIoU, mPA, and Acc are 8.62%, 5.75%, and 0.43%, respectively, while Params and FLOPs increase by only 1.06M and 2.59G. In conclusion, by fusing CNN and Transformer architectures, Disease-Seg successfully handles fruit tree leaf diseases in real, complex agricultural environments, achieving high segmentation accuracy while striking a good balance between parameter count and computational complexity. The experimental results verify the effectiveness of the proposed model and show that, compared with any single module, the complete model improves markedly in accuracy and performance.
Table 6:
(5) Deployment experiments
To verify the practical applicability of the proposed model, inference time was evaluated on the resource-constrained mobile device Jetson Nano, comparing Disease-Seg with other models. As the data in Table 7 show, the proposed Disease-Seg achieves a fast inference speed on the Jetson Nano platform. Although PSPNet is the fastest in inference speed, its accuracy is comparatively low.
It should be noted that although models such as DeepLabV3+, U-Net, DANet, PSPNet, and UPerNet perform well on many semantic segmentation tasks, their structures are complex and their computation and memory demands high, so they infer slowly on the mobile platform. In contrast, Disease-Seg runs efficiently on low-compute platforms such as the Jetson Nano by optimizing its computation and memory requirements.
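For reference, a rough latency benchmark of the kind used in such edge-device tests can be sketched as follows; the input resolution, warm-up count, and run count are assumptions for illustration, not values from the patent:

```python
import time
import torch

def measure_latency(model: torch.nn.Module,
                    input_size=(1, 3, 512, 512),
                    warmup: int = 10, runs: int = 100,
                    device: str = "cuda") -> float:
    """Average single-image inference time in milliseconds."""
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    sync = torch.cuda.synchronize if device.startswith("cuda") else (lambda: None)
    with torch.no_grad():
        for _ in range(warmup):   # warm-up stabilises clocks and caches
            model(x)
        sync()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        sync()                    # wait for all queued GPU work to finish
    ms = (time.perf_counter() - start) / runs * 1000.0
    print(f"mean inference time: {ms:.1f} ms ({1000.0 / ms:.1f} FPS)")
    return ms
```

Synchronizing before and after the timed loop matters on CUDA devices such as the Jetson Nano, since kernel launches are asynchronous and unsynchronized timing understates the true latency.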
Overall, by combining preprocessing and data augmentation, a dataset suited to training the Disease-Seg model was successfully created, and the model's superiority in the semantic segmentation task was verified. Disease-Seg not only surpasses the other models in accuracy but also achieves real-time inference on a compute-constrained mobile platform, offering significant convenience for practical applications.
Table 7:
(6) Disease severity assessment experiment
To elucidate the disease severity assessment process more clearly, five fruit leaves of varying severity are listed, as shown in Fig. 14. Disease coverage, a key index of how far the disease has spread, intuitively reflects its progression and thus provides an effective basis for severity assessment; it also helps growers perform precise disease management and control. The disease coverage rate is calculated as follows:
P = S_d / (S_d + S_h) × 100%, wherein P represents the disease coverage rate, S_d represents the area of the lesion regions of the fruit leaf, and S_h represents the area of the non-diseased regions of the fruit leaf.
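As a sketch of how this index can be computed from the model's segmentation mask (the class indices below are assumptions for illustration):

```python
import numpy as np

# Assumed label convention: 0 = background, 1 = healthy leaf, 2 = lesion.
def disease_coverage(mask: np.ndarray, lesion_id: int = 2,
                     leaf_id: int = 1) -> float:
    """Coverage = lesion area / (lesion area + healthy leaf area)."""
    lesion = np.count_nonzero(mask == lesion_id)    # S_d in pixels
    healthy = np.count_nonzero(mask == leaf_id)     # S_h in pixels
    total = lesion + healthy                        # total leaf area
    return lesion / total if total else 0.0

# e.g. a returned value of 0.18 means lesions cover 18% of the leaf surface
```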
To test the practical usability of the Disease-Seg model for fruit leaf disease segmentation, a fruit tree leaf disease segmentation system was built; its main interface is shown in Fig. 15.
In the invention, the main challenge is how to efficiently overcome problems such as the varied leaf and lesion characteristics and occlusion overlap in different environments, and to achieve accurate segmentation that meets the practical demands of agricultural production. Compared with mainstream methods of various architectures, the proposed Disease-Seg model shows the best performance in the segmentation and diagnosis of fruit tree leaf diseases, and it can therefore provide an effective solution for fruit tree leaves and lesions in different scenes. Meanwhile, the Disease-Seg model reaches 69 FPS, showing that it can process images rapidly; tested on edge devices, its inference time is 49 ms, meeting real-time processing requirements while maintaining accuracy.

Claims (6)

1. A method for real-time detection of disease severity in a plurality of fruit leaves, said method comprising the steps of:
S1, data acquisition, namely acquiring lesion images of various fruit leaves and preprocessing the lesion images;
S2, constructing a data set, namely dividing the preprocessed fruit leaf lesion image into a training set, a verification set and a test set;
S3, constructing a model, namely constructing a real-time detection model of the leaf lesion positions, wherein the model comprises an encoder and a decoder, the encoder is constructed with a Transformer module as its main body, an expanding feature module is arranged in the feature extraction stage, and a feature weighted fusion module is arranged in the feature fusion stage;
After the feature map is input, the data are divided into two paths: one passes through the expanding feature module, while the other sequentially passes through the overlapped patch embedding module and a Transformer module; the two feature maps produced by the two paths are input into the feature weighted fusion module for feature fusion, and the fused features pass sequentially through three further Transformer modules before being input into the decoder (a structural sketch of this two-path stage is given after this claim);
The expanding feature module is specifically structured as follows: after the feature map is input, a first 1×1 convolution module generates new feature channels, a dilated convolution module then expands the receptive field, and a second 1×1 convolution module compresses and integrates the information to output a disease-aware feature map; the input feature map is then fused with this disease-aware feature map;
The Transformer module is specifically structured as follows: after the feature map is input, contextual semantic features and local features are extracted by the depth multi-scale attention mechanism module, and the feature expression capability of the Transformer module is then enhanced by the Mix-FFN module;
The depth multi-scale attention mechanism module is specifically structured as follows: the input feature map first passes through a multi-head attention mechanism module, which captures the texture and morphological features of different diseases and outputs features with global context;
S4, training a model, namely training the real-time leaf lesion position detection model with the constructed dataset and adjusting the model parameters until the model meets the detection requirements;
S5, detecting the lesion positions of the fruit leaf to be examined in real time with the trained real-time leaf lesion position detection model, and estimating the disease severity of the fruit leaf from the ratio of the lesion area to the total area of the fruit leaf.
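For illustration only, the two-path stage and the expanding feature module of claim 1 can be sketched in PyTorch as follows; the channel expansion ratio, dilation rate, and the residual-addition form of the final fusion are assumptions, since the claim does not fix them:

```python
import torch
import torch.nn as nn

class ExpandingFeatureModule(nn.Module):
    """EFM sketch: a 1x1 conv widens the channels, a dilated conv enlarges
    the receptive field, a second 1x1 conv compresses the result, and the
    disease-aware output is fused with the input by residual addition."""
    def __init__(self, channels: int, expand: int = 2, dilation: int = 2):
        super().__init__()
        mid = channels * expand
        self.widen = nn.Conv2d(channels, mid, kernel_size=1)
        self.dilated = nn.Conv2d(mid, mid, kernel_size=3,
                                 padding=dilation, dilation=dilation)
        self.compress = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        disease = self.compress(self.dilated(self.widen(x)))
        return x + disease  # fuse input with the disease-aware feature map

class TwoPathStage(nn.Module):
    """One path through the EFM, the other through overlapped patch
    embedding plus a Transformer block; outputs are merged by the
    feature weighted fusion module (see claim 5)."""
    def __init__(self, efm, patch_embed, transformer, fwfm):
        super().__init__()
        self.efm = efm
        self.patch_embed = patch_embed
        self.transformer = transformer
        self.fwfm = fwfm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_path = self.efm(x)
        global_path = self.transformer(self.patch_embed(x))
        return self.fwfm(local_path, global_path)
```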
2. The method for real-time detection of disease severity of a plurality of fruit leaves according to claim 1, wherein the preprocessing specifically comprises uniformly resizing the fruit leaf lesion images to the same size, manually labeling their categories, and applying random data augmentation operations to the fruit leaf lesion images.
3. The method for real-time detection of disease severity for a plurality of fruit blades according to claim 2, wherein the data processing performed in the multi-headed attention mechanism module is specifically:
Generating a query matrix Q, a key matrix K and a value matrix V from the input feature of shape (B, N, C) by linear transformation, wherein B denotes the batch size, N denotes the sequence length and C denotes the number of channels; obtaining the attention weights by Attention(Q, K, V) = Softmax(Q·Kᵀ/√d_head)·V, wherein Softmax(·) denotes the softmax function, d_head denotes the dimension of each attention head, ᵀ denotes the transpose operation and Kᵀ denotes the transposed matrix of K; and then obtaining the output features with global context by Output = Linear(Dropout(Attention(Q, K, V))), wherein Dropout(·) denotes the random neuron-dropping operation and Linear(·) denotes the projection back to the original dimension through a linear layer.
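A minimal PyTorch sketch of this computation follows; the head count, dropout rate, and the fused QKV projection are illustrative choices rather than claim requirements:

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Standard multi-head self-attention matching the formulas above."""
    def __init__(self, dim: int, heads: int = 8, drop: float = 0.1):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.d_head = heads, dim // heads
        self.qkv = nn.Linear(dim, dim * 3)   # produces Q, K, V jointly
        self.proj = nn.Linear(dim, dim)      # projection back to dim C
        self.drop = nn.Dropout(drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        qkv = (self.qkv(x)
               .reshape(B, N, 3, self.heads, self.d_head)
               .permute(2, 0, 3, 1, 4))      # (3, B, heads, N, d_head)
        q, k, v = qkv[0], qkv[1], qkv[2]
        attn = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        attn = attn.softmax(dim=-1)          # Softmax(Q·K^T / sqrt(d_head))
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(self.drop(out))     # Linear(Dropout(Attention))
```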
4. The method for real-time detection of disease severity of a plurality of fruit leaves according to claim 3, characterized in that the data processing performed in the depth multi-scale module is specifically:
Convolving the input feature X with convolution kernels of different sizes, the outputs of these convolution operations being gathered into a multi-scale convolution list [U₁, U₂, …, Uₙ], wherein Uᵢ = Conv_{kᵢ}(X) denotes the output of convolving the input feature X with a kernel of size kᵢ; adding the outputs of all convolution kernels to obtain the aggregated feature U = Σᵢ Uᵢ; compressing the aggregated feature U into a vector S by global average pooling; mapping S to a smaller dimension Z through a fully connected layer; mapping Z through another fully connected layer to the same dimension as S, thereby generating a weight matrix W; and using the weights in W to perform a weighted combination of the features output by the convolution kernels of different sizes, obtaining a combined feature map, which is output after passing through a fully connected layer.
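This multi-scale weighting can be sketched as follows; the kernel sizes, the reduction ratio r, the softmax normalization of the branch weights, and the use of a 1×1 convolution as the final per-pixel fully connected layer are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MultiScaleSelect(nn.Module):
    """Parallel convolutions of different kernel sizes are summed,
    squeezed by global average pooling, and re-weighted per branch."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7), r: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2)
             for k in kernel_sizes])
        self.fc1 = nn.Linear(channels, channels // r)                  # S -> Z
        self.fc2 = nn.Linear(channels // r, channels * len(kernel_sizes))
        self.out_proj = nn.Conv2d(channels, channels, kernel_size=1)   # final FC

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        us = [b(x) for b in self.branches]            # U_1 .. U_n
        u = torch.stack(us).sum(dim=0)                # aggregated feature U
        s = u.mean(dim=(2, 3))                        # global avg pool -> (B, C)
        z = torch.relu(self.fc1(s))
        w = self.fc2(z).reshape(x.size(0), len(us), -1)   # (B, n, C)
        w = w.softmax(dim=1)                          # per-branch weights
        out = sum(w[:, i].unsqueeze(-1).unsqueeze(-1) * us[i]
                  for i in range(len(us)))            # weighted combination
        return self.out_proj(out)
```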
5. The method for real-time detection of disease severity of a plurality of fruit blades according to claim 4, wherein the data processing performed in the feature weighted fusion module is specifically:
Splicing the two input feature maps F₁ and F₂ in the channel dimension to generate a joint feature map; processing the joint feature map with a 1×1 convolution operation to generate a weight map; passing the generated weight map through the Sigmoid activation function so that the weight values lie between 0 and 1, obtaining the weighted weight map W; fusing the feature maps F₁ and F₂ by weighted summation to obtain the fused feature F_fused = W ⊙ F₁ + (1 − W) ⊙ F₂, wherein ⊙ denotes element-wise multiplication and BN(·) denotes the batch normalization operation; and sequentially applying a 1×1 convolution operation and a batch normalization operation to the fused feature F_fused to obtain the final output feature map.
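A minimal sketch of this fusion follows; the complementary (1 − W) weighting of the second input is an assumption consistent with the weight map lying in (0, 1):

```python
import torch
import torch.nn as nn

class FeatureWeightedFusion(nn.Module):
    """Concatenate the two feature maps, derive a sigmoid weight map with
    a 1x1 conv, blend the inputs as W*F1 + (1-W)*F2, then apply
    1x1 conv + batch norm."""
    def __init__(self, channels: int):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=1),
            nn.Sigmoid())
        self.out = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels))

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        w = self.weight(torch.cat([f1, f2], dim=1))  # weight map in (0, 1)
        fused = w * f1 + (1 - w) * f2                # weighted summation
        return self.out(fused)
```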
6. The method for real-time detection of disease severity of a plurality of fruit leaves according to claim 5, wherein said decoder consists of two sequentially connected multi-layer perceptron modules.