Disclosure of Invention
The invention aims to provide an image denoising method based on a wide convolutional neural network, solving the problems of low denoising performance and long training time in conventional neural-network image denoising methods.
The technical scheme adopted by the invention is an image denoising method based on a wide convolutional neural network, characterized by comprising the following steps:
step 1, constructing a network WCNN;
step 2, training a network WCNN;
step 2.1, setting a data set comprising a training set, a verification set and a test set;
step 2.2, setting parameters for training the WCNN;
step 2.3, setting a training platform for the network WCNN.
The present invention is also characterized in that,
the network WCNN constructed in step 1 includes 10 subnets: ResNet1, ResNet2, ResNet3, ResNet4, ResNet5, ResNet6, UNet1, UNet2, UNet3 and DenseNet1. Ten wavelet sub-bands, namely HH1, LH1, HL1, HH2, LH2, HL2, HH3, LH3, HL3 and LL3, are obtained by a three-level wavelet decomposition of the image, and each of the 10 subnets is responsible for learning the feature map of one of these ten wavelet sub-bands of the image.
The specific steps of each subnet in the network WCNN constructed in step 1 are as follows:
step 1.1, firstly, the six subnets ResNet1, ResNet2, ResNet3, ResNet4, ResNet5 and ResNet6 are designed; these six subnets are respectively responsible for training the fine sub-bands HH1, LH1, HL1, HH2, LH2 and HL2 obtained from the first and second wavelet decomposition levels; each adopts a ResNet structure, using residual learning to estimate the noise directly and estimating the denoised wavelet sub-band through a skip connection; the three subnets ResNet1, ResNet2 and ResNet3 consist of 6 standard convolutional layers, while ResNet4, ResNet5 and ResNet6 consist of 8 standard convolutional layers;
step 1.2, then the three subnets UNet1, UNet2 and UNet3 are designed; these three subnets are responsible for training the fine sub-bands HH3, LH3 and HL3 obtained from the third wavelet decomposition level; each adopts a UNet structure with 6 convolutional layers in total, 4 of which combine a dilated convolution with a standard convolution operation;
step 1.3, a DenseNet subnet is designed; it is responsible for training the coarse sub-band LL3 obtained from the third wavelet decomposition level, adopts a DenseNet structure, and consists of 4 dense blocks each containing 3 convolutional layers;
step 1.4, designing a loss function of each subnet;
and step 1.5, when the loss function of each sub-band reaches its optimal value, performing an inverse wavelet transform on the ten wavelet sub-bands processed by the sub-networks to obtain a clean image with clear details.
The step 1.4 is specifically as follows:
step 1.4.1, the loss function of the wavelet-transform coarse sub-band adopts the mean square error metric MSE_l given in equation (1);
wherein x(i, j) and y(i, j) represent the wavelet coefficient values of the estimated image and of the corresponding clean image, respectively, and c, w and h represent the channel, width and height of the input sub-band pair;
step 1.4.2, the loss function of a wavelet-transform fine sub-band is calculated by introducing a weight factor δ and an adjustment factor β into the mean-square-error metric of equation (1); the fine sub-band loss function MSE_h is given in equation (2),
wherein the weight factor δ is calculated by equation (3);
here, ave denotes the average value of the wavelet coefficients of each fine sub-band, calculated by equation (4), and the adjustment factor β is calculated by equation (5),
where σ represents the noise intensity.
In step 1.4, as the noise level increases, the amplitude of the noise in a sub-band increases and may exceed the average of the sub-band coefficients; to prevent these noise coefficients that are larger than the average from being enhanced, the adjustment factor β intervenes; if the variance σ of the noise level is above 45, a sub-band coefficient whose value is not less than 1.2 times the average value is considered an image detail coefficient and is given a weight of δ = 1.1, thereby suppressing the coefficients below 1.2 times the average, which are considered to represent noise information; the ave of each fine sub-band is different and is closely related to the noise coefficients and feature coefficients of that sub-band.
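The bodies of equations (1)-(5) are not reproduced in this text. The following is a plausible reconstruction from the surrounding definitions: equation (1) is the standard mean square error and (4) a coefficient average, while the exact piecewise forms of (3) and (5), including the suppression weight δ₀ and the high-noise value β₀, are assumptions and are marked as such.

```latex
% Plausible reconstruction; (1) and (4) follow standard definitions,
% the piecewise forms of (3) and (5) are assumptions based on the prose.
\begin{align}
\mathrm{MSE}_l &= \frac{1}{c\,w\,h}\sum_{i=1}^{w}\sum_{j=1}^{h}\bigl(x(i,j)-y(i,j)\bigr)^{2} \tag{1}\\
\mathrm{MSE}_h &= \frac{1}{c\,w\,h}\sum_{i=1}^{w}\sum_{j=1}^{h}\delta\,\beta\,\bigl(x(i,j)-y(i,j)\bigr)^{2} \tag{2}\\
\delta &= \begin{cases}1.1, & |y(i,j)| \ge 1.2\,\mathrm{ave}\\ \delta_{0}<1, & \text{otherwise (suppression; exact value not stated)}\end{cases} \tag{3}\\
\mathrm{ave} &= \frac{1}{w\,h}\sum_{i=1}^{w}\sum_{j=1}^{h}\bigl|y(i,j)\bigr| \tag{4}\\
\beta &= \begin{cases}1, & \sigma \le 45\\ \beta_{0}, & \sigma > 45 \text{ (intervenes at high noise; exact form not stated)}\end{cases} \tag{5}
\end{align}
```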
The training set in step 2.1 consists of 800 images of the data set DIV2K, 200 images of the data set BSD, and 4744 images of the data set WED.
The validation set in step 2.1 consists of the images in the dataset RNI5 and 300 images from the dataset DIV2K.
The test set in step 2.1 consists of the images in the dataset CSet8 and the images in Set12.
The size of the images in the training set in step 2.2 is set to 256 × 256. Gaussian noise with a specific noise level, i.e., σ = 5, 15, 25, 35, 45, 55, 65, or 75, is added to the clean images, generating 256 × 8000 image pairs. Two network models are trained: WCNN1, trained with noise images of low noise intensity (σ ≤ 45), and WCNN2, trained with noise images of high noise level (45 < σ ≤ 75). During testing, when the noise-intensity variance of the test noise image is not greater than 45, the network WCNN1 is used for denoising; if the noise-intensity variance of the test noise image is greater than 45, the network WCNN2 is used.
Step 2.3, the WCNN network is built on the TensorFlow framework and updated with the Adam optimizer; the activation function is ReLU; the learning rate of all subnets is initially set to 9 × 10⁻⁴ and is reduced by one third after every 16 epochs; an NVIDIA RTX 2080Ti is used to train the WCNN network.
The invention has the beneficial effects that:
1. The invention places a CNN on each sub-band of the image at different scales and in different directions, which helps fully learn the image characteristics and details of each scale and direction, thereby suppressing speckle noise while maintaining high image resolution.
2. Each sub-network constituting the network WCNN has its own structure and loss function, ensuring that each wavelet sub-band of the noise image is most similar to the corresponding sub-band of the clean image after network training.
3. Each wavelet sub-band concentrates image features of a specific scale and a specific direction; therefore, subnets with a small number of convolutional layers or a simple structure are sufficient to capture and learn the image features contained in each wavelet sub-band, and can effectively suppress the noise in each sub-band.
4. The sub-networks used for training the wavelet sub-bands are independent of each other and can run in parallel on a single computer or on several computers, thereby shortening the network training time.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to an image denoising method based on a wide convolutional neural network. When an image is denoised, it is first converted by wavelet decomposition into a plurality of wavelet sub-bands. Each wavelet sub-band is denoised by its own CNN, which learns the feature mapping of that small, direction- and scale-specific sub-band and suppresses the noise coefficients it contains. In this way, a plurality of independent CNNs with different structures are placed on the wavelet sub-bands of the image, the image detail features of a given direction and scale in each sub-band are captured, and noise within a given intensity range is removed with a single set of learned parameters, so that the best balance is obtained between image denoising performance and network training time.
The invention relates to an image denoising method based on a wide convolution neural network, which is implemented by the following steps:
step 1, constructing a network WCNN;
the network WCNN constructed in step 1 includes 10 subnets, as shown in fig. 1, namely ResNet1, ResNet2, ResNet3, ResNet4, ResNet5, ResNet6, UNet1, UNet2, UNet3 and DenseNet1. Ten wavelet sub-bands, namely HH1, LH1, HL1, HH2, LH2, HL2, HH3, LH3, HL3 and LL3, are obtained by a three-level wavelet decomposition of the image, and each of the 10 subnets is responsible for learning the feature mapping of one of these ten wavelet sub-bands;
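For illustration, a minimal sketch of the three-level decomposition that produces these ten sub-bands is given below. It assumes the PyWavelets library and the Haar wavelet used later in the examples; mapping pywt's (horizontal, vertical, diagonal) detail tuples to the HL/LH/HH names is one common convention, not mandated by the invention.

```python
# A minimal sketch of the three-level wavelet decomposition feeding the ten
# WCNN subnets, assuming PyWavelets and the Haar wavelet.
import numpy as np
import pywt

image = np.random.rand(256, 256).astype(np.float32)   # stand-in for a noisy image

# wavedec2 returns [LL3, (HL3, LH3, HH3), (HL2, LH2, HH2), (HL1, LH1, HH1)]
coeffs = pywt.wavedec2(image, wavelet="haar", level=3)
subbands = {"LL3": coeffs[0]}
for lvl, (HL, LH, HH) in zip((3, 2, 1), coeffs[1:]):
    subbands.update({f"HL{lvl}": HL, f"LH{lvl}": LH, f"HH{lvl}": HH})
assert len(subbands) == 10                             # one sub-band per subnet

# Step 1.5: after each subnet denoises its sub-band, the clean image is
# recovered with the inverse transform.
restored = pywt.waverec2(coeffs, wavelet="haar")
```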
the specific steps of each subnet in the network WCNN constructed in step 1 are as follows:
step 1.1, firstly, the six subnets ResNet1, ResNet2, ResNet3, ResNet4, ResNet5 and ResNet6 are designed; these six subnets are respectively responsible for training the fine sub-bands HH1, LH1, HL1, HH2, LH2 and HL2 obtained from the first and second wavelet decomposition levels; each adopts a ResNet structure, using residual learning to estimate the noise directly and estimating the denoised wavelet sub-band through a skip connection; the three subnets ResNet1, ResNet2 and ResNet3 consist of 6 standard convolutional layers, while ResNet4, ResNet5 and ResNet6 consist of 8 standard convolutional layers, as shown in FIG. 2;
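A minimal sketch of such a residual subnet, assuming the Keras API, follows. The patent fixes only the layer counts (6 for ResNet1-3, 8 for ResNet4-6); the filter count (64) and the 3 × 3 kernels are assumptions.

```python
# A minimal sketch, assuming Keras, of one residual subnet that estimates the
# noise in a sub-band and subtracts it via a skip connection.
import tensorflow as tf
from tensorflow.keras import layers

def build_resnet_subnet(num_layers, filters=64):
    noisy = tf.keras.Input(shape=(None, None, 1))   # one noisy wavelet sub-band
    x = noisy
    for _ in range(num_layers - 1):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    noise = layers.Conv2D(1, 3, padding="same")(x)  # residual learning: estimated noise
    denoised = layers.Subtract()([noisy, noise])    # skip connection yields the denoised sub-band
    return tf.keras.Model(noisy, denoised)

resnet1 = build_resnet_subnet(6)  # HH1/LH1/HL1 subnets use 6 layers
resnet4 = build_resnet_subnet(8)  # HH2/LH2/HL2 subnets use 8 layers
```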
step 1.2, then the three subnets UNet1, UNet2 and UNet3 are designed; these three subnets are responsible for training the fine sub-bands HH3, LH3 and HL3 obtained from the third wavelet decomposition level; each adopts a UNet structure with 6 convolutional layers in total, 4 of which combine a dilated convolution with a standard convolution operation, as shown in FIG. 3;
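A minimal sketch of one such subnet, again assuming Keras, is given below. How the dilated and standard convolutions are combined is not specified in the text, so summing their outputs is an assumption, and the down/up-sampling path of a full UNet is omitted.

```python
# A minimal sketch, assuming Keras, of a UNet1-3 style subnet: 6 convolutional
# layers, 4 of which pair a dilated with a standard convolution.
import tensorflow as tf
from tensorflow.keras import layers

def dilated_plus_standard(x, filters=64):
    d = layers.Conv2D(filters, 3, padding="same", dilation_rate=2)(x)  # dilated convolution
    s = layers.Conv2D(filters, 3, padding="same")(x)                   # standard convolution
    return layers.ReLU()(layers.Add()([d, s]))                         # assumed combination: sum

def build_unet_subnet(filters=64):
    inp = tf.keras.Input(shape=(None, None, 1))
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(inp)  # layer 1
    for _ in range(4):                     # layers 2-5: the mixed dilated/standard layers
        x = dilated_plus_standard(x, filters)
    out = layers.Conv2D(1, 3, padding="same")(x)                           # layer 6
    return tf.keras.Model(inp, out)
```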
step 1.3, a DenseNet subnet is designed; it is responsible for training the coarse sub-band LL3 obtained from the third wavelet decomposition level, adopts a DenseNet structure, and consists of 4 dense blocks each containing 3 convolutional layers, as shown in fig. 4;
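A minimal sketch of this dense-block design, assuming Keras, follows; the growth rate of 32 filters is an assumption.

```python
# A minimal sketch, assuming Keras, of the DenseNet1 subnet for the LL3
# coarse sub-band: 4 dense blocks of 3 convolutional layers each, with dense
# (concatenative) connectivity inside a block.
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, growth=32):
    for _ in range(3):                      # 3 convolutional layers per block
        y = layers.Conv2D(growth, 3, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, y])    # each layer sees all previous feature maps
    return x

def build_densenet_subnet():
    inp = tf.keras.Input(shape=(None, None, 1))
    x = inp
    for _ in range(4):                      # 4 dense blocks
        x = dense_block(x)
    out = layers.Conv2D(1, 3, padding="same")(x)
    return tf.keras.Model(inp, out)
```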
step 1.4, a loss function of each subnet is designed.
Step 1.4.1, the loss function of the wavelet-transform coarse sub-band adopts the mean square error metric MSE_l given in equation (1);
wherein x(i, j) and y(i, j) represent the wavelet coefficient values of the estimated image and of the corresponding clean image, respectively, and c, w and h represent the channel, width and height of the input sub-band pair;
step 1.4.2, the loss function of a wavelet-transform fine sub-band is calculated by introducing a weight factor δ and an adjustment factor β into the mean-square-error metric of equation (1); the fine sub-band loss function MSE_h is given in equation (2),
wherein the weight factor δ is calculated by equation (3);
here, ave denotes the average value of the wavelet coefficients of each fine sub-band, calculated by equation (4), and the adjustment factor β is calculated by equation (5),
where σ represents the noise intensity;
as the noise level increases, the amplitude of the noise in a sub-band increases and may exceed the average of the sub-band coefficients; to prevent these noise coefficients that are larger than the average from being enhanced, the adjustment factor β intervenes; if the variance σ of the noise level is above 45, a sub-band coefficient whose value is not less than 1.2 times the average value is considered an image detail coefficient and is given a weight of δ = 1.1, thereby suppressing the coefficients below 1.2 times the average, which are considered to represent noise information; it is noted that the ave of each fine sub-band is different and is closely related to the noise coefficients and feature coefficients of that sub-band.
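A minimal TensorFlow sketch of this fine-sub-band loss is given below. Only the 1.2 × ave threshold and δ = 1.1 come from the description above; the suppression weight (0.9) and the scalar treatment of β are assumptions.

```python
# A minimal sketch of the fine-sub-band loss; the suppression weight 0.9 and
# the beta value 0.8 are assumptions, since equations (2)-(5) are not
# reproduced in the text.
import tensorflow as tf

def fine_subband_loss(y_true, y_pred, sigma):
    ave = tf.reduce_mean(tf.abs(y_true))          # eq. (4): mean coefficient magnitude
    detail = tf.abs(y_true) >= 1.2 * ave          # detail vs. noise coefficients
    delta = tf.where(detail, 1.1, 0.9)            # eq. (3): boost details, damp noise
    beta = 1.0 if sigma <= 45 else 0.8            # eq. (5): intervenes at high noise
    return beta * tf.reduce_mean(delta * tf.square(y_pred - y_true))
```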
Step 1.5, when the loss function of each sub-band reaches its optimal value, an inverse wavelet transform is performed on the ten wavelet sub-bands processed by the sub-networks to obtain a clean image with clear details.
Each subnet of the WCNN network in step 1 independently trains one wavelet sub-band of the image, and the number of convolutional layers in each subnet is small, so the WCNN network has 10 independent feature extraction and learning channels. Thus, the performance of the WCNN network in suppressing noise and capturing image features is improved by extending the network width rather than deepening the network depth. Because the subnets of the WCNN are independent, they can be trained in parallel on several computers (see the sketch below), which markedly shortens the training time of the WCNN without affecting network performance. Moreover, the loss function of each subnet is matched to the characteristics of the wavelet sub-band coefficients it trains, which helps reconcile the conflict between preserving image details and removing noise.
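As a sketch of this parallelism claim: since the ten subnets share no parameters, they can be dispatched as independent jobs (here as processes on one machine; across machines, a job scheduler would play the same role). The train_subnet helper is a hypothetical placeholder for each subnet's own build/train loop.

```python
# A sketch of training the ten independent subnets in parallel.
from concurrent.futures import ProcessPoolExecutor

SUBNETS = ["ResNet1", "ResNet2", "ResNet3", "ResNet4", "ResNet5", "ResNet6",
           "UNet1", "UNet2", "UNet3", "DenseNet1"]

def train_subnet(name):
    # ... build the subnet for its sub-band and optimize its own loss ...
    return f"{name}: trained"

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as pool:
        for result in pool.map(train_subnet, SUBNETS):
            print(result)
```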
Step 2, training a network WCNN;
and 2.1, setting a data set comprising a training set, a verification set and a test set.
The training set consists of 800 images of the data set DIV2K, 200 images of the data set BSD, and 4744 images of the data set WED;
the validation set consists of the images in the dataset RNI5 and 300 images from the dataset DIV2K;
and the test set consists of the images in the dataset CSet8 and the images in Set12.
step 2.2, setting parameters for training the WCNN;
the size of the images in the training set is set to 256 × 256. Gaussian noise with a specific noise level, i.e., σ = 5, 15, 25, 35, 45, 55, 65, or 75, is added to the clean images, generating 256 × 8000 image pairs. Two network models are trained: WCNN1, trained with noise images of low noise intensity (σ ≤ 45), and WCNN2, trained with noise images of high noise level (45 < σ ≤ 75);
during testing, when the noise-intensity variance of the test noise image is not greater than 45, the network WCNN1 is used for denoising; if the noise-intensity variance of the test noise image is greater than 45, the network WCNN2 is used for denoising;
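The noise superposition and this test-time selection rule can be sketched as follows; make_pairs and pick_model are hypothetical helper names, and the wcnn1/wcnn2 arguments stand in for the two trained models.

```python
# A sketch of the training-pair generation and model-selection rule of step 2.2.
import numpy as np

NOISE_LEVELS = [5, 15, 25, 35, 45, 55, 65, 75]

def make_pairs(clean_images):
    pairs = []
    for img in clean_images:                 # img: float array, 256 x 256
        for sigma in NOISE_LEVELS:
            noisy = img + np.random.normal(0.0, sigma, img.shape)  # additive Gaussian noise
            pairs.append((noisy, img, sigma))
    return pairs

def pick_model(sigma, wcnn1, wcnn2):
    return wcnn1 if sigma <= 45 else wcnn2   # WCNN1 for low noise, WCNN2 for high
```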
step 2.3, setting a training platform of the network WCNN;
the WCNN network is built on the TensorFlow framework and updated with the Adam optimizer; the activation function is set to ReLU; the learning rate of all subnets is initially set to 9 × 10⁻⁴ and is reduced by one third after every 16 epochs; training the WCNN network on an NVIDIA RTX 2080Ti takes about 7 hours.
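A minimal sketch of this training setup follows; the Keras LearningRateScheduler is one assumed way to realize the stated decay, and the model and loss are placeholders.

```python
# A minimal sketch of the stated setup: Adam, initial learning rate 9e-4,
# reduced by one third every 16 epochs.
import tensorflow as tf

def lr_schedule(epoch, lr):
    # every 16 epochs, keep two thirds of the current rate
    return lr * (2.0 / 3.0) if epoch > 0 and epoch % 16 == 0 else lr

optimizer = tf.keras.optimizers.Adam(learning_rate=9e-4)
scheduler = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# model.compile(optimizer=optimizer, loss=...)
# model.fit(train_pairs, epochs=..., callbacks=[scheduler])
```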
Examples
The invention provides an image denoising method based on a wide convolutional neural network that improves denoising performance while reducing training time; the performance of the WCNN is therefore tested and verified through experiments. First, the benefit of training each subnet independently is examined; second, the influence of the different sub-networks composing the WCNN on image denoising quality is studied; third, the impact of the loss function on WCNN performance is investigated.
The effect of these basic components on WCNN performance is demonstrated by an ablation study comprising three experiments. Finally, five representative denoising methods are selected as comparison baselines for a comprehensive analysis of the denoising effect of the WCNN method: a wavelet-based CNN denoising method (MWCNN [1]), three CNN-based methods (DnCNN [2], UDNet [3] and FFDNet [4]), and a representative conventional method (BM3D [5]).
First, ablation experiments
1. Study of independent subnet training
This part verifies that training each subnet independently not only shortens training time but also preserves denoising quality. One advantage of the WCNN is that it can be divided into several sub-networks that learn the feature mappings of their sub-bands in parallel on different computers, after which the trained sub-bands are integrated by an inverse Haar wavelet transform to obtain a clear, clean image; the WCNN trained in this way is denoted WCNN-1. The plain WCNN is the model in which all wavelet sub-band feature mappings are learned on a single computer. Here, the WCNN is compared with the BM3D, DnCNN, MWCNN, FFDNet and UDNet denoising baseline methods, and Table 1 shows the performance index results of these methods.
TABLE 1 Comparison of run times and PSNR (dB)/SSIM/IFC indexes of the different methods
Table 1 shows the GPU run time and the PSNR/SSIM/IFC values when the WCNN and the comparison methods are applied to 200 grayscale images from the DIV2K dataset with σ = 25 noise added. Both WCNN-1 and WCNN achieve the best performance at relatively low execution time compared with the state-of-the-art denoising methods. The training and execution time of WCNN is slightly greater than that of WCNN-1, because each subnet of WCNN-1 learns its feature map in parallel on multiple computers, while the subnets of WCNN all run on one computer. Owing to the multi-scale, multi-directional wavelet decomposition, the wavelet sub-bands not only exhibit common directional characteristics, such as the larger coefficients of horizontal edges in the LH sub-band, but are also small in size, and no sub-band loses any detail features through cropping. Thus, a deep CNN with many convolutional layers is not needed to capture the characteristics of these sub-bands; each subnet has its own loss function, and the parameters of each subnet can be controlled and adjusted to ensure that every estimated sub-band is very similar to the corresponding sub-band of a clean image. This ensures that the WCNN obtains higher PSNR/SSIM values at a relatively fast speed than the other comparison methods, even when the WCNN runs on one computer.
2. Subnet structure study
As shown in figs. 5(a)-5(c), the subnet structure has a positive impact on WCNN performance. WCNN-2 denotes a variant of the WCNN in which all subnets are designed with the ResNet structure, i.e., the structure shown in fig. 2, with ten convolutional layers per subnet. The performance of MWCNN, in which the image undergoes multi-level wavelet decomposition and all resulting wavelet sub-bands are fed into a single CNN for feature learning and training, is used as the baseline. Figs. 5(a)-5(c) compare the proposed WCNN, WCNN-2 and MWCNN in terms of PSNR/SSIM/IFC; the data are averages over 200 denoised images. The comparison shows that the performance of the WCNN is significantly better than that of MWCNN, while the performance of WCNN-2 is slightly higher than that of MWCNN. This demonstrates that the strategy of training different sub-bands with different CNN structures significantly improves the denoising performance of the WCNN.
3. Study of loss function
The following experiment was performed: every subnet uses the same loss function, namely equation (1), to learn its feature mapping; this variant of the WCNN is denoted WCNN-3. UDNet is taken as the comparison baseline because it can process images with a range of noise levels using a single network. The three networks WCNN-3, WCNN and UDNet were trained on 1000 images from the BSD and WED datasets with noise of variance σ = 45 added. The networks were then tested on 200 images: 40 images each with noise variance σ = 5, 15, 25, 35 and 45.
Figs. 6(a)-6(c) show the probability distributions of the PSNR, SSIM and IFC gains over these 200 images. The white histogram represents the distribution of the index gains of WCNN-3 relative to the baseline UDNet; for part of the 200 images the index values are lower than those of UDNet, falling in the left half of the abscissa around 0. The black histogram represents the distribution of the index gains of WCNN relative to UDNet; these index values are substantially higher than those of UDNet, with almost all values in the right half of the abscissa around 0. The PSNR/SSIM/IFC gains in figs. 6(a)-6(c) show that the performance of WCNN far exceeds that of UDNet, while WCNN-3 performs slightly worse than UDNet, indicating that the use of different loss functions significantly improves WCNN performance when the WCNN handles a range of noise levels with one trained set of parameters.
Second, performance comparison with the baseline methods
To fully verify the performance of the WCNN, the quality of the WCNN and the other methods was investigated on images with σ = 5, 15, 25, 35, 45, 55, 65 or 75 noise. In addition, WCNN+, another variant of the WCNN in which the number of convolutional layers in each subnet is increased to 15, is compared. Table 2 reports the three evaluation values PSNR, SSIM and IFC obtained by these methods, i.e., the averages over 20 grayscale images and over 20 color images. The visual quality of the denoised images, the grayscale image Parrot and the color image Comic, is shown in figs. 7(a)-7(g) and figs. 8(a)-8(g), respectively; a target region of interest (ROI), enlarged by bicubic interpolation (×2), is displayed in the corner for comparing the detail features of the denoised images.
TABLE 2 PSNR (dB)/SSIM/IFC index obtained by different methods
In Table 2 the WCNN gives the best numerical results; the method not only has the highest average PSNR, but also relatively high SSIM and IFC values. A high PSNR indicates that the denoised image is closest to the original clean image, and high SSIM and IFC values indicate that the method recovers edge and texture details. As shown in figs. 7(a)-7(g) and figs. 8(a)-8(g), the visual quality obtained by the WCNN is excellent, with only minor artifacts at certain edges. Furthermore, the PSNR/SSIM/IFC evaluation results improve when more convolutional layers are used in each subnet of the WCNN. The method of the invention is superior to the current state-of-the-art denoising methods.
The invention is an image denoising method based on a wide convolutional neural network that improves image denoising performance by having each sub-network train and learn a wavelet sub-band in parallel, so that the WCNN network extends the width of the network rather than its depth. Each subnet can run on a different computer, which shortens the network training time. Each subnet captures image features and noise of a specific scale and a specific direction, so each subnet has a simple structure and requires few convolutional layers. Each subnet has a loss function suited to itself, which ensures that each trained noise sub-band is most similar to the corresponding clean-image sub-band. In calculating the loss function of the fine sub-bands, the influence of the image feature coefficients is enhanced, so that more image detail features are preserved in the denoised image.