Intelligent segmentation method for damage in aero-engine hole detection images based on a context encoding network
Technical Field
The invention belongs to the technical field of aero-engine hole detection (borescope inspection), and in particular relates to an intelligent segmentation method, based on a context encoding network, for damage in aero-engine hole detection images. It is an engineering application, in the field of flaw detection, of a deep neural network structure and a data set preprocessing method.
Background
As the core component of an aircraft, the engine has a significant impact on flight safety. During operation, the internal temperature and pressure are high, so damage such as cracks and burn-through often occurs in the internal structure. If such damage is not found in time, the safety of civil aviation flight is seriously threatened. Civil aviation companies therefore use various inspection methods to discover potential safety hazards in the engine structure in time.
Engine hole detection is one of the most important of these methods. A technician extends a borescope camera into the engine, captures pictures and videos of the interior, searches them for cracks, burn-through, and other damage, and finally compiles an inspection report that guides further maintenance and repair. However, the procedure is time-consuming and labor-intensive: inspecting one engine takes tens of hours, and the accuracy is limited by the subjective judgment of the inspector. With economic development and accelerating urbanization in China, the number of domestic and international air routes has grown rapidly in recent years. Traditional hole detection, with its limited efficiency and precision and high labor cost, can no longer meet the current growing demand for engine inspection.
Disclosure of Invention
The invention aims to provide an intelligent segmentation method for damage in aircraft engine hole detection images that achieves higher precision and speed while occupying less memory and fewer processor resources. The design of the technical scheme is as follows:
(1) collecting aero-engine hole detection image samples, labeling each sample, constructing an aero-engine hole detection image semantic segmentation data set, and dividing it into a training set, a validation set, and a test set in a certain proportion (such as 8:1:1);
(2) building a deep convolutional neural network, wherein the deep convolutional neural network consists of three parts, the first part is a feature extraction sub-network, the second part is a multi-scale context information extraction sub-network, and the third part is a feature expansion sub-network;
(3) preprocessing an aeroengine hole detection image to be detected;
(4) training a deep convolutional neural network by using the data set in the step (1), evaluating the network performance by using a performance evaluation function, and storing convolutional neural network parameters which reach preset indexes and have the best performance;
(5) inputting the image processed in the step (3) into a feature extraction sub-network for feature extraction to obtain a high-level feature vector capable of representing the input image;
(6) inputting the high-level feature vector obtained in the step (5) into a multi-scale context information extraction sub-network;
(7) inputting the feature vector obtained in the step (6) into a feature expansion sub-network to obtain a feature vector with the same spatial size as the input image in the step (5);
(8) generating a prediction label image from the feature vector obtained in step (7).
The multi-scale context information extraction sub-network consists of two parts:
(1) a dilated convolution module. The dimensions of the module's input and output feature vectors are identical, and five parallel paths lead from the input to the output.
The first path applies a 3x3 convolution with dilation rate 1;
the second path applies, in sequence, a 3x3 convolution with dilation rate 3 and a 1x1 convolution with dilation rate 1;
the third path applies, in sequence, a 3x3 convolution with dilation rate 1, a 3x3 convolution with dilation rate 3, and a 1x1 convolution with dilation rate 1;
the fourth path applies, in sequence, a 3x3 convolution with dilation rate 1, a 3x3 convolution with dilation rate 3, a 3x3 convolution with dilation rate 5, and a 1x1 convolution with dilation rate 1;
the fifth path is an identity mapping that passes the input through unchanged;
all convolutions in the five paths use a stride of 1.
(2) a multi-scale pooling module. The spatial dimensions of the module's input and output feature vectors are the same, and the output feature vector has more channels than the input (in this scheme the difference in channel count is 4). Five pooling operations are applied to the input feature vector in parallel; their pooling windows are 1/1, 1/2, 1/3, 1/4, and 1/7 of the spatial size of the input, and the stride of each pooling operation equals its window size. The pooled feature vectors are each upsampled so that their spatial dimensions are restored to those of the input, and they are then stacked with the input feature vector along the channel dimension.
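As a sanity check on the multi-scale design, the receptive field of each path of the dilated convolution module can be computed: for a chain of stride-1 convolutions, each k x k kernel with dilation d enlarges the receptive field by (k - 1) * d. The following pure-Python sketch is illustrative only, not part of the patented scheme:

```python
# Receptive-field check for the five parallel paths (stride-1 chains).
# Each k x k convolution with dilation d grows the receptive field by (k-1)*d.

def receptive_field(layers):
    """layers: sequence of (kernel_size, dilation) pairs applied in order."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

paths = {
    "path1": [(3, 1)],                          # 3x3, dilation 1
    "path2": [(3, 3), (1, 1)],                  # 3x3 d=3, then 1x1
    "path3": [(3, 1), (3, 3), (1, 1)],
    "path4": [(3, 1), (3, 3), (3, 5), (1, 1)],
    "path5": [],                                # identity mapping
}
rfs = {name: receptive_field(p) for name, p in paths.items()}
# receptive fields 3, 7, 9, 19, 1: several context scales captured in parallel
```

The widely spaced receptive fields (3 up to 19 pixels) are what lets the module aggregate context at several scales without reducing the spatial resolution.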
The feature extraction sub-network comprises a plurality of convolution blocks. Each of the first two convolution blocks comprises two convolution layers using the rectified linear unit (ReLU) activation function and a max pooling layer; each subsequent convolution block comprises three convolution layers using the ReLU activation function and a max pooling layer.
The feature expansion sub-network comprises a plurality of convolution blocks. Each convolution block comprises an upsampling operation and a stacking operation: the feature vector obtained by upsampling is stacked, along the channel dimension, with the output of the corresponding-level convolution block in the feature extraction sub-network, and is then passed through two convolution layers using the rectified linear unit (ReLU) activation function. The final layer of the sub-network is a convolution layer with 1x1 kernels and a number of output channels equal to the number of damage categories plus one, paired with a softmax activation function.
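The upsample-and-stack step of an expansion block can be sketched in NumPy. This is a simplified illustration with nearest-neighbor upsampling and hypothetical array shapes; the two ReLU convolution layers that follow the stacking are omitted:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map:
    repeat every pixel factor x factor times."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def expand_block(deep, skip):
    """One expansion step: upsample the deeper feature map 2x and stack it,
    along the channel axis, with the matching encoder output `skip`."""
    return np.concatenate([upsample_nearest(deep, 2), skip], axis=0)

deep = np.zeros((8, 16, 16))    # hypothetical decoder-side input
skip = np.zeros((4, 32, 32))    # hypothetical encoder output at the same level
out = expand_block(deep, skip)  # shape (12, 32, 32)
```

The channel-wise stacking is what reinjects the encoder's high-resolution detail before the subsequent convolutions refine it.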
The data preprocessing comprises various affine transformations; brightness, saturation, and contrast adjustment; overall linear and nonlinear transformations of images with dark brightness; histogram equalization of unevenly exposed images; and image fusion using the mixup method.
The deep neural network is trained by dividing the training set into a plurality of batches and feeding each batch into the network; the network output and the corresponding label images are then fed into a Dice loss function based on the Dice coefficient:

L_dice = 1 - (2 * Σ p·q) / (Σ p + Σ q)

where p denotes the predicted class probabilities of all pixels in all the aero-engine hole detection images in each batch, and q denotes the true classes of those pixels in the corresponding label images.

An l2 regularization term is added to the loss function:

(λ / 2m) * Σ_{l=1}^{L} ||W^(l)||²

The objective function after adding the l2 regularization term is:

J = L_dice + (λ / 2m) * Σ_{l=1}^{L} ||W^(l)||²

where J denotes the objective function, L_dice the Dice loss, m the number of all pixels in all the aero-engine hole detection images in each batch, λ the l2 regularization hyper-parameter, L the number of convolution layers in the deep neural network model, and W^(l) the weights of the l-th convolution layer.
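A NumPy sketch of such an objective, assuming the standard Dice formulation 1 - 2Σpq/(Σp + Σq) and an l2 penalty of (λ/2m)Σ||W||²; the function names and the small epsilon guard are illustrative, not the patented implementation:

```python
import numpy as np

def dice_loss(p, q, eps=1e-7):
    """Dice loss: p = predicted class probabilities, q = one-hot true classes
    (same shape); eps guards against an empty denominator."""
    return 1.0 - (2.0 * np.sum(p * q) + eps) / (np.sum(p) + np.sum(q) + eps)

def objective(p, q, weights, lam, m):
    """Dice loss plus the l2 penalty (lam / 2m) * sum of squared conv weights."""
    l2 = (lam / (2.0 * m)) * sum(np.sum(w ** 2) for w in weights)
    return dice_loss(p, q) + l2
```

A perfect prediction drives the Dice term to zero while the l2 term still penalizes large weights, which is the intended regularizing effect.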
The gradient of each model parameter in the deep neural network model is calculated from the objective function by back propagation, and the value of each parameter is adjusted according to the calculated gradients using an optimization method.
the performance evaluation function includes, but is not limited to, three performance evaluation indexes, namely, a pixel accuracy PA, an average coincidence ratio MIOU, and a frequency weighted coincidence ratio FWIOU. In the prior art, there are many types of performance evaluation functions, and the above three types are selected in the technical scheme.
PA = Σ_i p_ii / Σ_i Σ_j p_ij

MIoU = (1/k) Σ_i [ p_ii / (Σ_j p_ij + Σ_j p_ji - p_ii) ]

FWIoU = (1 / Σ_i Σ_j p_ij) Σ_i [ (Σ_j p_ij) * p_ii / (Σ_j p_ij + Σ_j p_ji - p_ii) ]

In the three formulas, k denotes the number of pixel categories in the aero-engine hole detection images (the number of damage categories plus one); p_ii denotes the number of pixels whose predicted class (the class with the maximum predicted probability) is i and whose true class in the corresponding label image is also i; p_ij denotes the number of pixels whose true class is i and whose predicted class is j; and p_ji denotes the number of pixels whose true class is j and whose predicted class is i.
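These indexes can be computed from a class confusion matrix. The following NumPy sketch is a hypothetical helper that follows the definitions of p_ii, p_ij, and p_ji above (IoU is averaged only over classes that actually occur):

```python
import numpy as np

def segmentation_metrics(pred, true, k):
    """pred, true: integer label arrays of equal shape with values 0..k-1.
    Returns (PA, MIoU, FWIoU) computed from the confusion matrix."""
    conf = np.zeros((k, k), dtype=np.int64)
    for i in range(k):          # conf[i, j]: true class i, predicted class j
        for j in range(k):
            conf[i, j] = np.sum((true == i) & (pred == j))
    diag = np.diag(conf)        # p_ii
    rows = conf.sum(axis=1)     # sum_j p_ij: pixels per true class
    cols = conf.sum(axis=0)     # sum_j p_ji: pixels per predicted class
    union = rows + cols - diag
    present = union > 0
    iou = diag[present] / union[present]
    pa = diag.sum() / conf.sum()
    miou = iou.mean()
    fwiou = np.sum(rows[present] / conf.sum() * iou)
    return pa, miou, fwiou
```

FWIoU weights each class's IoU by its pixel frequency, so rare damage classes influence it less than they influence MIoU.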
The invention has the beneficial effects that:
the technical scheme has higher precision and speed, and occupies less memory and processor resources.
Drawings
FIG. 1 is a schematic flow diagram of an embodiment of the method.
Detailed Description
As shown in FIG. 1, two embodiments of the method are described below:
example 1
This embodiment is divided into two stages, a training stage and a use stage. Note that the damage categories below include damage types such as cracks and burn-through, and also a no-damage category, i.e., the background.
The training phase is divided into the following steps:
Step (1.1): acquiring aero-engine hole detection image samples. The acquired samples cover all positions of the engine and include images containing one or more types of damage at the same time as well as images without damage; the acquired images may be in any color mode and have one or more channels.
Step (1.2): image preprocessing. The images obtained in step (1.1) are converted to the same storage format to facilitate uniform processing, and then cleaned to remove abnormal shots; for example, if two or more images are heavily blurred and poorly focused, only one is kept. Images with dark overall brightness are selected and their pixel values redistributed by histogram equalization, so that the number of pixels at each brightness level in each color channel becomes approximately equal.
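The histogram equalization step can be sketched for one channel as follows (NumPy, classic CDF remapping; illustrative only, not the patented implementation):

```python
import numpy as np

def equalize_channel(img):
    """Histogram equalization of one uint8 channel: remap intensities so the
    cumulative distribution of pixel values becomes approximately uniform."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first nonzero CDF value
    scale = max(cdf[-1] - cdf_min, 1)  # avoid dividing by zero on flat images
    lut = np.clip(np.round((cdf - cdf_min) / scale * 255), 0, 255)
    return lut.astype(np.uint8)[img]
```

On a dark image the mapping stretches the occupied low intensities across the full 0..255 range, which is the brightening effect the preprocessing relies on.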
Step (1.3): image labeling. All images obtained in step (1.2) are labeled one by one using any image labeling tool (such as labelme). The total number N of damage categories is determined before labeling, and each damage category is assigned a unique class label value from 1 to N. When labeling an image, all pixels in non-damage areas are labeled 0, and all pixels in each damage area are labeled with the class label value of that damage category. A label image is generated by the method provided by the labeling tool; the file names under which the label image and the original image are stored should correspond.
Step (1.4): dividing the data set. An original image and its corresponding label image are regarded as the minimum unit of division, and all minimum units are divided into a training set, a validation set, and a test set in a certain proportion (such as 8:1:1).
Step (1.5): building the deep neural network using any deep learning framework. The deep neural network comprises three parts: a feature extraction sub-network, a multi-scale context information extraction sub-network, and a feature expansion sub-network.
The feature extraction sub-network comprises a plurality of convolution blocks. Each of the first two convolution blocks comprises two convolution layers using the rectified linear unit (ReLU) activation function and a max pooling layer; each subsequent convolution block comprises three convolution layers using the ReLU activation function and a max pooling layer.
The multi-scale context information extraction sub-network consists of two parts: (1) a dilated convolution module. The dimensions of the module's input and output feature vectors are identical, and five parallel paths lead from the input to the output. The first path applies a 3x3 convolution with dilation rate 1; the second path applies, in sequence, a 3x3 convolution with dilation rate 3 and a 1x1 convolution with dilation rate 1; the third path applies, in sequence, a 3x3 convolution with dilation rate 1, a 3x3 convolution with dilation rate 3, and a 1x1 convolution with dilation rate 1; the fourth path applies, in sequence, a 3x3 convolution with dilation rate 1, a 3x3 convolution with dilation rate 3, a 3x3 convolution with dilation rate 5, and a 1x1 convolution with dilation rate 1; the fifth path is an identity mapping that passes the input through unchanged. All convolutions in the five paths use a stride of 1.
(2) a multi-scale pooling module. The spatial dimensions of the module's input and output feature vectors are the same, and the output feature vector has more channels than the input (in this scheme the difference in channel count is 4). Five pooling operations are applied to the input feature vector in parallel; their pooling windows are 1/1, 1/2, 1/3, 1/4, and 1/7 of the spatial size of the input, and the stride of each pooling operation equals its window size. The pooled feature vectors are each upsampled so that their spatial dimensions are restored to those of the input, and they are then stacked with the input feature vector along the channel dimension.
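The parallel pooling, upsampling, and stacking just described can be sketched in NumPy. This simplified version keeps every branch at full channel width (so the output here has six times the input channels, whereas the module in this scheme reduces the branch channels), and it assumes the spatial size is divisible by each pooling grid:

```python
import numpy as np

def avg_pool_grid(x, g):
    """Average-pool each channel of x (C, H, W) onto a g x g grid;
    H and W are assumed divisible by g (window = stride = H/g)."""
    c, h, w = x.shape
    return x.reshape(c, g, h // g, g, w // g).mean(axis=(2, 4))

def nearest_upsample(x, fh, fw):
    """Nearest-neighbor upsampling by integer factors fh, fw."""
    return x.repeat(fh, axis=1).repeat(fw, axis=2)

def multiscale_pool(x, grids=(1, 2, 3, 4, 7)):
    """Pool onto each grid, restore the spatial size, then stack all
    branches with the input along the channel axis."""
    c, h, w = x.shape
    branches = [nearest_upsample(avg_pool_grid(x, g), h // g, w // g)
                for g in grids]
    return np.concatenate([x] + branches, axis=0)

x = np.random.rand(2, 84, 84)   # 84 is divisible by 1, 2, 3, 4, and 7
out = multiscale_pool(x)        # shape (12, 84, 84)
```

The grid-1 branch reduces to global average pooling, so the stacked output mixes global context with progressively finer regional statistics.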
The feature expansion sub-network comprises a plurality of convolution blocks. Each convolution block comprises an upsampling operation and a stacking operation: the feature vector obtained by upsampling is stacked, along the channel dimension, with the output of the corresponding-level convolution block in the feature extraction sub-network, and is then passed through two convolution layers using the rectified linear unit (ReLU) activation function. The final layer of the sub-network is a convolution layer with 1x1 kernels and a number of output channels equal to the number of damage categories plus one, paired with a softmax activation function.
The feature extraction sub-network may include three or more convolution blocks; further convolution blocks may be appended one after another, provided that the length and width of the feature vector output by each block remain at least two. The feature expansion sub-network comprises the same number of convolution blocks as the feature extraction sub-network. The upsampling operation in the feature expansion sub-network may be bilinear interpolation, nearest-neighbor interpolation, or transposed convolution.
The number of convolution blocks in the feature extraction and feature expansion sub-networks is a hyper-parameter; it is positively correlated with the number of images in the data set, the number of damage categories, and the difficulty of detecting the damage in the images.
Step (1.6): training the deep neural network. All images in the training set divided in step (1.4) are split into a plurality of batches, each containing N samples in total; data augmentation is applied to the images of each batch and their corresponding label images, and the label images are then one-hot encoded.
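The one-hot encoding of a label image can be sketched as follows (NumPy, illustrative only):

```python
import numpy as np

def onehot(label, num_classes):
    """label: (H, W) integer label image with values 0..num_classes-1.
    Returns an (H, W, num_classes) one-hot encoded array."""
    return np.eye(num_classes, dtype=np.float32)[label]

lab = np.array([[0, 1], [2, 0]])  # hypothetical 2x2 label image, 3 classes
enc = onehot(lab, 3)              # enc[0, 1] is the one-hot vector for class 1
```

Each pixel's integer label becomes a probability-like vector, matching the per-class channels that the softmax output layer produces.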
All samples of one batch are fed into the deep neural network built in step (1.5) to obtain the output feature vectors. The output feature vectors and the one-hot encoded label images of the batch are then fed into the loss function to obtain an error; the gradients of the trainable parameters of each layer in the network are computed; and the parameters are updated by an optimizer with a set learning rate.
When all batches have passed through the above process, one training round is completed. After each round, all images in the validation set are divided into batches of M samples each, the label image in each sample is one-hot encoded, and all samples of one batch are fed into the deep neural network built in step (1.5) to obtain the output feature vectors.
The output feature vectors and the one-hot encoded label images of the batch are then fed into the loss function and the performance evaluation function to obtain an error and performance indexes, which are stored in arrays. All batches in the validation set are processed in this way. The means of the error array and the performance index array are then computed, and the model and the parameters with the best performance are saved. A maximum number of training rounds is preset, and training stops once it is reached. An automatic learning rate decay strategy is used during training.
The data augmentation includes random shuffling of the samples, various random affine transformations, random brightness, saturation, and contrast adjustment within a range (for example, 1 ± 0.4), and mixup image fusion. Note that brightness, saturation, and contrast adjustment are applied to the original image alone, while the other operations must be applied to the original image and the label image simultaneously; in practice, the same random seed is set for the random transformations, ensuring that the same random operation is applied to the original image and its corresponding label image in each sample.
The mixup image fusion method operates as follows: first, N random numbers λ (N being the total number of samples in each batch) are drawn from a Beta(α, β) distribution with α = β = 1 (other values of α may be used); then all samples of the current batch are cloned, the samples in the clone are randomly shuffled, and finally each pair of samples is fused according to the following formula.
x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j

In the above formulas, λ is one of the random numbers described above; (x_i, y_i), i = 1, 2, …, N, is a sample in the current batch; (x_j, y_j), j = 1, 2, …, N, is a sample in the shuffled clone of the batch; and (x̃, ỹ) is the new sample generated by fusion.
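The mixup fusion can be sketched in NumPy as follows (array shapes are hypothetical; the labels are assumed one-hot encoded, so they blend the same way as the images):

```python
import numpy as np

def mixup_batch(xs, ys, alpha=1.0, rng=None):
    """mixup for one batch: blend every sample with a partner drawn from a
    shuffled copy of the batch, using Beta(alpha, alpha) weights.
    xs: (N, H, W, C) images; ys: (N, H, W, K) one-hot label images."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(xs)
    lam = rng.beta(alpha, alpha, size=n).reshape(n, 1, 1, 1)
    perm = rng.permutation(n)          # the shuffled clone of the batch
    x_new = lam * xs + (1.0 - lam) * xs[perm]
    y_new = lam * ys + (1.0 - lam) * ys[perm]
    return x_new, y_new
```

Blending the one-hot label images with the same λ keeps each fused pixel's class vector consistent with its fused appearance.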
The testing stage is divided into the following steps:
and (2.1) loading the best-performance network and parameters stored in the step (1.6) and loading the parameters into the network.
Step (2.2): all images in the test set are divided into batches of M samples each, the label image in each sample is one-hot encoded, and all samples of one batch are fed into the deep neural network built in step (1.5) to obtain the output feature vectors. The output feature vectors and the one-hot encoded label images of the batch are then fed into the loss function and the performance evaluation function to obtain an error and performance indexes, which are stored in arrays. All batches in the test set are processed in this way.
The means of the error array and the performance index array are computed, and it is judged whether the performance indexes of the network reach the preset standard. If they do, the method is ready for use; if not, the process returns to step (1.5), the hyper-parameters are adjusted, and the above process is repeated until the performance indexes on the test set meet the standard.
Example 2
This embodiment is divided into two stages, a training stage and a use stage. Note that the damage categories below include damage types such as cracks and burn-through, and also a no-damage category, i.e., the background.
The training phase is divided into the following steps:
Step (1.1): acquiring aero-engine hole detection image samples. The acquired samples cover all positions of the engine and include images containing one or more types of damage at the same time as well as images without damage; the acquired images may be in any color mode and have one or more channels.
Step (1.2): image preprocessing. The images obtained in step (1.1) are converted to the same storage format to facilitate uniform processing, and then cleaned to remove abnormal shots; for example, if two or more images are heavily blurred and poorly focused, only one is kept. Images with dark overall brightness are selected and their pixel values redistributed by histogram equalization, so that the number of pixels at each brightness level in each color channel becomes approximately equal.
Step (1.3): image labeling. All images obtained in step (1.2) are labeled one by one using any image labeling tool (such as labelme). The total number N of damage categories is determined before labeling, and each damage category is assigned a unique class label value from 1 to N. When labeling an image, all pixels in non-damage areas are labeled 0, and all pixels in each damage area are labeled with the class label value of that damage category. A label image is generated by the method provided by the labeling tool; the file names under which the label image and the original image are stored should correspond.
Step (1.4): dividing the data set. An original image and its corresponding label image are regarded as the minimum unit of division, and all minimum units are divided into a training set, a validation set, and a test set in a certain proportion (such as 8:1:1).
Step (1.5): building the deep neural network using any deep learning framework. The deep neural network comprises three parts: a feature extraction sub-network, a multi-scale context information extraction sub-network, and a feature expansion sub-network.
The feature extraction sub-network comprises a plurality of convolution blocks. Each of the first two convolution blocks comprises two convolution layers using the rectified linear unit (ReLU) activation function, each followed by a batch normalization layer, and ends with a max pooling layer; each subsequent convolution block comprises three convolution layers using the ReLU activation function, each followed by a batch normalization layer, and ends with a max pooling layer.
The multi-scale context information extraction sub-network consists of two parts: (1) a dilated convolution module. The dimensions of the module's input and output feature vectors are identical, and five parallel paths lead from the input to the output. The first path applies a 3x3 convolution with dilation rate 1; the second path applies, in sequence, a 3x3 convolution with dilation rate 3 and a 1x1 convolution with dilation rate 1; the third path applies, in sequence, a 3x3 convolution with dilation rate 1, a 3x3 convolution with dilation rate 3, and a 1x1 convolution with dilation rate 1; the fourth path applies, in sequence, a 3x3 convolution with dilation rate 1, a 3x3 convolution with dilation rate 3, a 3x3 convolution with dilation rate 5, and a 1x1 convolution with dilation rate 1; the fifth path is an identity mapping that passes the input through unchanged. All convolutions in the five paths use a stride of 1.
(2) a multi-scale pooling module. The spatial dimensions of the module's input and output feature vectors are the same, and the output feature vector has more channels than the input (in this scheme the difference in channel count is 4). Five pooling operations are applied to the input feature vector in parallel; their pooling windows are 1/1, 1/2, 1/3, 1/4, and 1/7 of the spatial size of the input, and the stride of each pooling operation equals its window size. The pooled feature vectors are each upsampled so that their spatial dimensions are restored to those of the input, and they are then stacked with the input feature vector along the channel dimension.
The feature expansion sub-network comprises a plurality of convolution blocks. Each convolution block comprises an upsampling operation and a stacking operation: the feature vector obtained by upsampling is stacked, along the channel dimension, with the output of the corresponding-level convolution block in the feature extraction sub-network, and is then passed through two convolution layers using the rectified linear unit (ReLU) activation function, each followed by a batch normalization layer. The last layer of the expansion sub-network is a convolution layer with 1x1 kernels and a number of output channels equal to the number of damage categories plus one, paired with a softmax activation function.
The feature extraction sub-network may include three or more convolution blocks; further convolution blocks may be appended one after another, provided that the length and width of the feature vector output by each block remain at least two. The feature expansion sub-network comprises the same number of convolution blocks as the feature extraction sub-network. The upsampling operation in the feature expansion sub-network may be bilinear interpolation, nearest-neighbor interpolation, or transposed convolution.
The number of convolution blocks in the feature extraction and feature expansion sub-networks is a hyper-parameter; it is positively correlated with the number of images in the data set, the number of damage categories, and the difficulty of detecting the damage in the images.
Step (1.6): training the deep neural network. All images in the training set divided in step (1.4) are split into a plurality of batches, each containing N samples in total; data augmentation is applied to the images of each batch and their corresponding label images, and the label images are then one-hot encoded. All samples of one batch are fed into the deep neural network built in step (1.5) to obtain the output feature vectors. The output feature vectors and the one-hot encoded label images of the batch are then fed into the loss function to obtain an error; the gradients of the trainable parameters of each layer are computed; and the parameters are updated by an optimizer with a set learning rate. When all batches have passed through the above process, one training round is completed. After each round, all images in the validation set are divided into batches of M samples each, the label image in each sample is one-hot encoded, and all samples of one batch are fed into the network to obtain the output feature vectors. These and the one-hot encoded label images of the batch are fed into the loss function and the performance evaluation function to obtain an error and performance indexes, which are stored in arrays. All batches in the validation set are processed in this way. The means of the error array and the performance index array are then computed, and the model and the parameters with the best performance are saved. A maximum number of training rounds is preset, and training stops once it is reached. An automatic learning rate decay strategy is used during training.
The data augmentation includes random shuffling of the samples, various random affine transformations, random brightness, saturation, and contrast adjustment within a range (for example, 1 ± 0.4), and mixup image fusion. Note that brightness, saturation, and contrast adjustment are applied to the original image alone, while the other operations must be applied to the original image and the label image simultaneously; in practice, the same random seed is set for the random transformations, ensuring that the same random operation is applied to the original image and its corresponding label image in each sample.
The mixup image fusion method operates as follows: first, N random numbers λ (N being the total number of samples in each batch) are drawn from a Beta(α, β) distribution with α = β = 1 (other values of α may be used); then all samples of the current batch are cloned, the samples in the clone are randomly shuffled, and finally each pair of samples is fused according to the following formula.
x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j

In the above formulas, λ is one of the random numbers described above; (x_i, y_i), i = 1, 2, …, N, is a sample in the current batch; (x_j, y_j), j = 1, 2, …, N, is a sample in the shuffled clone of the batch; and (x̃, ỹ) is the new sample generated by fusion.
The testing stage is divided into the following steps:
and (2.1) loading the best-performance network and parameters stored in the step (1.6) and loading the parameters into the network.
Step (2.2): all images in the test set are divided into batches of M samples each, the label image in each sample is one-hot encoded, and all samples of one batch are fed into the deep neural network built in step (1.5) to obtain the output feature vectors. The output feature vectors and the one-hot encoded label images of the batch are then fed into the loss function and the performance evaluation function to obtain an error and performance indexes, which are stored in arrays. All batches in the test set are processed in this way. The means of the error array and the performance index array are computed, and it is judged whether the performance indexes of the network reach the preset standard. If they do, the method is ready for use; if not, the process returns to step (1.5), the hyper-parameters are adjusted, and the process is repeated until the performance indexes on the test set meet the standard.
Matters not described in detail in the present invention are known in the prior art.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.