
CN104866900B - A kind of deconvolution neural network training method - Google Patents


Info

Publication number
CN104866900B
CN104866900B (application CN201510046974.8A)
Authority
CN
China
Prior art keywords
training
layer
filter
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510046974.8A
Other languages
Chinese (zh)
Other versions
CN104866900A (en)
Inventor
施云惠
张轶昀
丁文鹏
尹宝才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201510046974.8A priority Critical patent/CN104866900B/en
Publication of CN104866900A publication Critical patent/CN104866900A/en
Application granted granted Critical
Publication of CN104866900B publication Critical patent/CN104866900B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses a deconvolution neural network training method which can effectively extract image features, helps improve classification accuracy, improves the training convergence efficiency and convergence accuracy of the deconvolution neural network, and reduces the training cost of the deconvolution neural network in practical applications; it can also be applied to solving other optimization problems based on convolution operations. The method includes a training phase and a reconstruction phase. The training phase includes the steps of: (1) preprocessing the training images; (2) dividing the training images into batches; (3) setting the network training parameters of the training images; (4) starting the first-layer training. The reconstruction phase includes the steps of: (5) preprocessing the images to be reconstructed; (6) setting the network training parameters of the images to be reconstructed; (7) inputting the images to be reconstructed batch by batch until the reconstruction of all batches is completed.

Description

Deconvolution neural network training method
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a deconvolution neural network training method.
Background
Deep learning is currently one of the research hotspots in the field of artificial intelligence. Based on the idea of neural networks, deep learning aims to simulate the layered perception mode of the human brain: by constructing a multilayer perceptron (MLP), low-level features are combined into abstract high-level representations capable of characterizing attribute categories or features, which makes such models among the most effective currently available for extracting complex image features.
Classical deep learning models mainly include Deep Belief Networks (DBNs), multi-layer sparse autoencoders (AE), and Convolutional Neural Networks (CNNs). In general, these models extract features from an input image through an encoder, greedily transforming the image layer by layer from the bottom up into a feature space; conversely, a decoder reconstructs from the feature space top-down through the network. Among them, DBNs and AE are unsupervised network models that can hierarchically learn rich image features from the bottom up and provide gains for high-level applications such as pattern recognition. However, since the encoder usually uses simple non-linear functions, it may severely limit the extraction of potential features, so that the model learns suboptimal features, which in turn restricts the achievable accuracy in recognition applications. In addition, their training complexity and time overhead are enormous.
The CNN model is a supervised learning model and is regarded as the first algorithm to effectively train a multilayer network model. It constructs a multi-layer network from a trainable set of convolution filters and corresponding pooling/downsampling operations. Like DBNs and AE, the model trains the network from the bottom up; it can only perform feature extraction and related high-level applications such as pattern recognition, and cannot perform image reconstruction or the corresponding low-level applications.
Zeiler et al. first proposed the deconvolutional network model (DN) in 2010, an unsupervised deep learning model. It extends convolutional sparse coding to multiple layers and uses a network structure similar to CNNs, so the model can not only handle high-level applications such as feature extraction but also realize low-level applications such as image reconstruction, overcoming the drawback that traditional sparse coding ignores the translation invariance of natural images. In particular, the convolutional sparse coding model can be regarded as a special case of a single-layer deconvolution neural network model. Through the learned representations of the multilayer structure, the model can effectively extract image features at different scales and has obtained many satisfactory results in high-level applications such as recognition and classification. The Chenyangti method shows excellent performance in DN-based image denoising, demonstrating the potential of DN in low-level applications.
However, solving the DN model involves a large number of convolution operations and a series of optimization problems. Zeiler et al. decompose the overall optimization problem into two sub-problems: fixing the feature maps to solve for the filters, and fixing the filters to solve for the feature maps. In practice, the two sub-problems are optimized alternately until the whole model converges. Both sub-problems were originally solved with the Conjugate Gradient (CG) method, whose computational complexity is high and computation slow. Zeiler later introduced the FISTA algorithm to solve the feature-map sub-problem and added pooling/downsampling between layers to improve the overall speed of the model; however, because of the heavy use of convolution and the complex multi-layer structure, solving the objective function still has high complexity and slow convergence. Besides optimizing the training algorithm, mainstream DN optimization methods also include GPU computation and multithreaded programming.
Because the convolutional sparse coding model is a special case of the single-layer deconvolution neural network model, optimization algorithms for convolutional sparse coding can be regarded as special cases of the optimization of a single-layer deconvolution neural network model. Bristow et al. improve the convergence efficiency of the model by transforming the convolution operation into the Fourier domain and using fast Fourier transforms to accelerate convolution. Rigamonti et al. propose a convolutional sparse coding model with a separable-filter penalty, which improves the speed of convolution by training separable filters to reduce its complexity. However, extending such methods to a multi-layer deconvolution neural network model still requires considerable effort and improvement.
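The Fourier-domain speedup mentioned above rests on the convolution theorem: a spatial convolution equals a pointwise product of suitably zero-padded FFTs. A small sketch of this equivalence (the array sizes are arbitrary assumptions for illustration):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 16))   # a feature map
f = rng.standard_normal((5, 5))     # a filter

# Direct 2-D convolution ("full" mode, as used in sparse-coding reconstructions).
direct = convolve2d(z, f, mode="full")

# Same result via the FFT: pad both signals to the full output size,
# multiply pointwise in the Fourier domain, and transform back.
shape = (z.shape[0] + f.shape[0] - 1, z.shape[1] + f.shape[1] - 1)
fourier = np.real(np.fft.ifft2(np.fft.fft2(z, shape) * np.fft.fft2(f, shape)))

assert np.allclose(direct, fourier)
```

For large filters the FFT route replaces the O(n²·m²) spatial sum with O(n² log n) transforms, which is the gain Bristow et al. exploit.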
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a deconvolution neural network training method that can effectively extract image features, helps improve classification accuracy, improves the training convergence efficiency and convergence accuracy of the deconvolution neural network, reduces the training cost of the deconvolution neural network in practical applications, and can also be applied to solving other optimization problems based on convolution operations.
The technical solution of the invention is as follows: the training method of the deconvolution neural network comprises a training stage and a reconstruction stage, wherein the training stage comprises the following steps:
(1) preprocessing the training images: selecting the training-set images, processing them into grayscale images, and unifying the length and width in pixels;
(2) batch-setting the training images: dividing the training images into batches according to the application of the trained network;
(3) setting the network training parameters of the training images, including the number of network layers, the filter size of each layer, the number of feature maps per layer, the FISTA reconstruction step count and step size, the total number of epochs, and the feature-map sparsity control parameter;
(4) starting the first-layer training: initializing the first-layer feature maps and first-layer filters with random numbers and inputting a batch of training images; first solving the feature-map sub-problem using the FISTA method, then solving the filter sub-problem, in which each computed residual is used to update only one filter and the gradient step for each filter update comes from the freshly recomputed reconstruction residual; repeating the two steps of solving the feature-map sub-problem and the filter sub-problem until the reconstruction residual converges; inputting the second batch of training images, initializing their feature maps with random numbers while reusing the filters obtained from the previous batch, and repeating until all batches of training images have been trained, which constitutes one epoch; then inputting the first batch of images again to start the second epoch, until the total number of epochs set in step (3) is completed;
the reconstruction phase comprises the following steps:
(5) preprocessing the images to be reconstructed: processing them into grayscale images and unifying the length and width in pixels;
(6) setting the network training parameters of the images to be reconstructed, including the FISTA reconstruction step size, the number of reconstruction iterations, and the feature-map sparsity control parameter;
(7) inputting the images to be reconstructed in batches until the reconstruction of all batches is completed: inputting the images to be reconstructed batch by batch, and solving the feature-map sub-problem for each layer using the filters previously obtained from training until convergence, obtaining the reconstructed images and the feature maps of each layer.
By improving the solution of the filter sub-problem so that each computed residual is used to update only one filter and the gradient step for each filter update comes from the freshly recomputed reconstruction residual, the method can effectively extract image features, helps improve classification accuracy, improves the training convergence efficiency and convergence accuracy of the deconvolution neural network, reduces the training cost of the deconvolution neural network in practical applications, and can also be applied to solving other optimization problems based on convolution operations.
Drawings
Fig. 1 shows a logical structure of a deconvolution neural network model.
FIG. 2 is a comparison of objective data of SFU algorithm and traditional CG algorithm in training experiment; in fig. 2, (a) (b) shows the training results of the first layer, and (c) (d) shows the training results of the second layer.
FIG. 3 is a comparison of objective data of SFU algorithm and traditional CG algorithm in reconstruction experiments; in fig. 3, (a) (b) shows the training results of the first layer, and (c) (d) shows the training results of the second layer.
FIG. 4 is a flow chart of a method of deconvolution neural network training in accordance with the present invention.
Detailed Description
As shown in fig. 4, the training method of the deconvolution neural network includes a training phase and a reconstruction phase, and the training phase includes the following steps:
(1) preprocessing the training images: selecting the training-set images, processing them into grayscale images, and unifying the length and width in pixels;
(2) batch-setting the training images: dividing the training images into batches according to the application of the trained network;
(3) setting the network training parameters of the training images, including the number of network layers, the filter size of each layer, the number of feature maps per layer, the FISTA reconstruction step count and step size, the total number of epochs, and the feature-map sparsity control parameter;
(4) starting the first-layer training: initializing the first-layer feature maps and first-layer filters with random numbers and inputting a batch of training images; first solving the feature-map sub-problem using the FISTA method, then solving the filter sub-problem, in which each computed residual is used to update only one filter and the gradient step for each filter update comes from the freshly recomputed reconstruction residual; repeating the two steps of solving the feature-map sub-problem and the filter sub-problem until the reconstruction residual converges; inputting the second batch of training images, initializing their feature maps with random numbers while reusing the filters obtained from the previous batch, and repeating until all batches of training images have been trained, which constitutes one epoch; then inputting the first batch of images again to start the second epoch, until the total number of epochs set in step (3) is completed;
the reconstruction phase comprises the following steps:
(5) preprocessing the images to be reconstructed: processing them into grayscale images and unifying the length and width in pixels;
(6) setting the network training parameters of the images to be reconstructed, including the FISTA reconstruction step size, the number of reconstruction iterations, and the feature-map sparsity control parameter;
(7) inputting the images to be reconstructed in batches until the reconstruction of all batches is completed: inputting the images to be reconstructed batch by batch, and solving the feature-map sub-problem for each layer using the filters previously obtained from training until convergence, obtaining the reconstructed images and the feature maps of each layer.
By improving the solution of the filter sub-problem so that each computed residual is used to update only one filter and the gradient step for each filter update comes from the freshly recomputed reconstruction residual, the method can effectively extract image features, helps improve classification accuracy, improves the training convergence efficiency and convergence accuracy of the deconvolution neural network, reduces the training cost of the deconvolution neural network in practical applications, and can also be applied to solving other optimization problems based on convolution operations.
Preferably, in step (2), images of the same type with high similarity are used as the same training batch.
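The preprocessing in step (1) — grayscale conversion and unifying the pixel dimensions — can be sketched as follows. The luma weights, the center-crop/zero-pad policy, and the zero-mean step are illustrative assumptions, not specified by the patent (which only requires grayscale images with unified length and width; the Caltech-101 experiment below does mention zero padding):

```python
import numpy as np

def preprocess(img_rgb, size=200):
    """Grayscale an H x W x 3 RGB array and center-crop/zero-pad to size x size."""
    # Grayscale via ITU-R BT.601 luma weights (an assumed choice).
    gray = np.asarray(img_rgb, dtype=float) @ np.array([0.299, 0.587, 0.114])
    out = np.zeros((size, size))
    h, w = gray.shape
    ch, cw = min(h, size), min(w, size)          # region that fits
    top, left = (h - ch) // 2, (w - cw) // 2     # crop offsets in the source
    ot, ol = (size - ch) // 2, (size - cw) // 2  # pad offsets in the target
    out[ot:ot + ch, ol:ol + cw] = gray[top:top + ch, left:left + cw]
    # Zero-mean the image, a common step before sparse coding (assumed).
    return out - out.mean()

# Hypothetical batch of 4 randomly sized images unified to 200 x 200.
rng = np.random.default_rng(0)
batch = np.stack([preprocess(rng.random((240, 180, 3))) for _ in range(4)])
```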
Preferably, in step (4) the filter sub-problem is solved by formula (1):

u = f_k^(t-1) − (1/L)·∇g(f_k^(t-1))    (1)

where u = f_k is the filter currently to be updated, f_k^(t-1) is the value obtained in the last iteration for the filter currently to be updated, L > 0 is the Lipschitz gradient upper-bound constant, and g(f_k) is the total cost function expressed with the current filter to be updated as the variable.
Preferably, the following steps are further included between the steps (4) and (5):
(a) starting the second-layer training: keeping the first-layer filters obtained from the first-layer training, initialize the second-layer filters and feature maps with random numbers; input the first batch of training images into the network and optimize the filters and feature maps until convergence; input the next batch of training images, initialize their feature maps with random numbers, and train until convergence; repeat until all batches of training images have been trained;
(b) starting the nth-layer training, where n is an integer greater than 2: keeping the filters obtained from the first n−1 layers, initialize the nth-layer filters and feature maps with random numbers; input the first batch of training images into the network and optimize the filters and feature maps until convergence; input the next batch of training images, initialize their feature maps with random numbers, and train until convergence; repeat until all batches of training images have been trained.
Preferably, in steps (a) and (b), the filters and feature maps are optimized by formula (2):

(f_k^l)^(t) = (f_k^l)^(t-1) − (1/L_l)·∇g_l((f_k^l)^(t-1))    (2)

where f_k^l is the filter currently to be updated at layer l, (f_k^l)^(t-1) is the value of that filter obtained in the last iteration, L_l is the Lipschitz gradient upper-bound constant of layer l, and g_l is the layer-l total cost function with the current filter as its single variable.
The present invention is specifically illustrated below:
in order to improve the training convergence efficiency and the convergence precision of the deconvolution neural network and reduce the training cost of the deconvolution neural network in practical application, the invention provides a deconvolution neural network training algorithm based on a sequential update filter strategy. The algorithm can be applied to solving other optimization problems based on convolution operation, such as a convolution sparse coding model.
The classical deconvolution neural network model objective function can be recursively expressed as:

C_l(y) = (β/2)·||ŷ_l − y||_2^2 + Σ_{i=1..K_l} ||z_i^l||_1,  with  ŷ_l = R_l·z_l,  R_l = F_1·F_2·…·F_l

where y is the input image, ŷ_l is the image reconstructed through the network from the layer-l feature maps, z_i^l is the ith feature map of layer l, K_l is the number of feature maps of layer l, β is a reconstruction weight parameter, ||·||_1 is the ℓ1 norm, z_l denotes the set of layer-l feature maps written as a single vector, R_l is the reconstruction operator that restores the layer-l feature map set z_l to a reconstructed image, and F_l is the operation matrix containing the layer-l convolution and summation operations, defined as follows:

(F_l·z_l)_i = Σ_{j=1..K_l} f_{i,j}^l * z_j^l

where f^l denotes a filter of layer l, z^l a feature map of layer l, * the two-dimensional convolution operator, and f_{i,j}^l the layer-l filter with index (i, j), one of the filters used to reconstruct feature map z_i^{l-1} of the previous layer. For a trained deconvolution neural network, all filters in the network are shared by all input images, while each input image has its own unique set of top-layer feature maps after feature extraction by the network.
The logical structure of the deconvolution neural network model is shown in fig. 1, where ⊕ represents the two-dimensional convolution operator.
Traditional deconvolution neural network training methods are typically based on gradient descent. These methods use the same reconstruction residual to calculate the optimal step size for all filters and update all filters in one batch, thereby introducing additional error.
The main work of the invention, based on the above model, is to improve the traditional batch filter-update algorithm into a greedy, sequential, one-by-one filter update: each computed residual is used to update only one filter, and the gradient step for each filter update comes from the freshly recomputed reconstruction residual. To make this update strategy computable, the invention applies the Taylor expansion to separate the filters to be updated one by one, thereby carrying out the update calculation.
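A minimal single-layer sketch of this sequential filter update (SFU). The β-weighted squared reconstruction error follows the objective above; the ℓ1-based Lipschitz bound, the array sizes, and the synthetic data are illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def recon(filters, maps):
    # Layer-1 reconstruction: sum of per-map 2-D convolutions.
    return sum(convolve2d(z, f, mode="valid") for f, z in zip(filters, maps))

def sfu_sweep(y, filters, maps, beta=1.0):
    """One sequential-filter-update sweep over a single layer.

    Each filter is updated with a residual recomputed from the *current*
    filter set, instead of one shared batch residual for all filters.
    """
    for k in range(len(filters)):
        resid = recon(filters, maps) - y          # fresh residual per filter
        # Gradient of (beta/2)*||recon - y||^2 w.r.t. filters[k]:
        # cross-correlation of feature map k with the residual, axes flipped.
        grad = beta * correlate2d(maps[k], resid, mode="valid")[::-1, ::-1]
        # Conservative Lipschitz bound via Young's inequality:
        # ||z * f||_2 <= ||z||_1 * ||f||_2, hence L <= beta * ||z||_1^2.
        L = beta * np.abs(maps[k]).sum() ** 2
        filters[k] -= grad / L                    # 1/L gradient step
    return filters

# Tiny synthetic demo: perturb the true filters, then reduce the error.
rng = np.random.default_rng(1)
true_f = [rng.standard_normal((3, 3)) for _ in range(2)]
maps = [rng.standard_normal((10, 10)) for _ in range(2)]
y = recon(true_f, maps)
filters = [f + 0.5 * rng.standard_normal((3, 3)) for f in true_f]
e0 = ((recon(filters, maps) - y) ** 2).sum()
for _ in range(20):
    filters = sfu_sweep(y, filters, maps)
e1 = ((recon(filters, maps) - y) ** 2).sum()
assert e1 < e0
```

The conservative bound makes each 1/L step small but guarantees monotone descent; the patent's point is that recomputing the residual before every single-filter step removes the error a shared batch residual introduces.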
1. Method for solving characteristic diagram subproblems
The solution of the deconvolution neural network model consists of alternately optimizing two sub-problems: the filter sub-problem and the feature-map sub-problem. The invention improves only the solution method of the filter sub-problem; the feature-map sub-problem continues to use the classical solution scheme, i.e. the FISTA method. To facilitate understanding of the computational method of the invention and to enable reproduction of its experiments, the application of the FISTA method to solving the feature-map sub-problem is briefly described here.
The feature-map sub-problem fixes the current filter set and optimally solves for the feature maps. When solved layer by layer, the problem degenerates to a convolutional sparse coding problem. First, the reconstructed image ŷ_l is calculated using the model objective function above; the feature map set is then updated by the following shrinkage step:

z_l^(t) = h_{α_l}( z_l^(t-1) − α_l·P_l·ε_l )

where y is the input image, P_l is the operator that passes the residual from the first layer to layer l, ε_l = ŷ_l − y is the layer-l residual, α_l is the ISTA gradient step, and h is the soft-threshold (shrinkage) operator.
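A minimal one-filter sketch of this update, assuming the standard ISTA soft-threshold step (the multi-layer residual-passing operator P_l is omitted; filter, sizes, and parameters are illustrative assumptions):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def soft(x, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_step(y, f, z, alpha, lam):
    """One ISTA update of feature map z for a single fixed filter f.

    Minimises (1/2)*||f * z - y||^2 + lam*||z||_1 with gradient step alpha.
    (FISTA adds a momentum term on top of this same update.)
    """
    resid = convolve2d(z, f, mode="valid") - y
    # Gradient of the data term w.r.t. z: correlate the residual back
    # through the filter ("full" mode restores the feature-map size).
    grad = correlate2d(resid, f, mode="full")
    return soft(z - alpha * grad, alpha * lam)

# Tiny demo: recover a sparse map from its filtered image.
rng = np.random.default_rng(0)
f = rng.standard_normal((3, 3))
f /= np.linalg.norm(f)
z_true = soft(rng.standard_normal((8, 8)), 0.5)       # sparse ground truth
y = convolve2d(z_true, f, mode="valid")

obj = lambda z: 0.5 * ((convolve2d(z, f, mode="valid") - y) ** 2).sum() \
    + 0.1 * np.abs(z).sum()
z = np.zeros((8, 8))
o0 = obj(z)
for _ in range(50):
    z = ista_step(y, f, z, alpha=0.1, lam=0.1)        # alpha <= 1/L
assert obj(z) < o0
```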
2. Solving method of filter subproblem
The filter sub-problem fixes the feature maps and optimally solves for the filter set. For convenience of explanation and understanding, the invention first describes the training solution for a single-layer deconvolution neural network and then extends it to the multi-layer network.
2.1 Single-layer deconvolution neural network model solution
As shown by the objective function above, the reconstruction term Σ_{k=1..K_1} f_k * z_k is decomposed into f_k * z_k + Σ_{j≠k} f_j * z_j, so the single-layer objective function can be reduced to the following form:

C_1(y) = (β/2)·||f_k * z_k + γ_1 − y||_2^2 + Σ_i ||z_i||_1

where γ_1 = Σ_{j≠k} f_j * z_j, and the data term can then be rewritten in the following form:

g(f_k) = (β/2)·||f_k * z_k + γ_1 − y||_2^2

For the first-layer filter set {f_1, …, f_{K_1}}, the subset of K_1 − 1 elements {f_j : j ≠ k} is fixed and known; only the kth filter f_k is an unknown quantity. When the feature maps are fixed and only f_k is updated, γ_1 is a constant, and the function g(f_k) is a continuous convex function; it can therefore be approximated by a Taylor expansion with a Lipschitz remainder. Expanding g at the point f_k^(t-1) yields the following equation:

g(f_k) ≈ g(f_k^(t-1)) + ⟨f_k − f_k^(t-1), ∇g(f_k^(t-1))⟩ + (L/2)·||f_k − f_k^(t-1)||_2^2

In each iteration the value of the expansion approximates the true value of the function g at that point, with L > 0 the Lipschitz gradient upper-bound constant. Minimising this expansion and simplifying gives the following update:

f_k^(t) = f_k^(t-1) − (1/L)·∇g(f_k^(t-1))

Using u = f_k to replace the argument, the formula is rewritten as u = v − (1/L)·∇g(v), where v = f_k^(t-1) is a known quantity, and u can be calculated directly from v.
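A small numerical check of this step, using a toy quadratic cost as a hypothetical stand-in for g (A and b are arbitrary stand-ins for the convolution with a fixed feature map and the target): the point u = v − ∇g(v)/L minimises the Taylor majorizer, and taking that step never increases g:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic cost g(u) = 0.5 * ||A u - b||^2 standing in for g(f_k).
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
g = lambda u: 0.5 * np.sum((A @ u - b) ** 2)
grad = lambda u: A.T @ (A @ u - b)

L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of grad g
v = rng.standard_normal(5)              # current iterate f_k^(t-1)

# Quadratic majorizer from the Taylor expansion of g at v.
q = lambda u: g(v) + grad(v) @ (u - v) + 0.5 * L * np.sum((u - v) ** 2)

u_star = v - grad(v) / L                # closed-form minimiser of q

# u_star minimises the majorizer, and since g <= q everywhere (L is an
# upper bound on the curvature), the step cannot increase g.
for _ in range(100):
    assert q(u_star) <= q(u_star + 0.1 * rng.standard_normal(5)) + 1e-12
assert g(u_star) <= g(v) + 1e-12
```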
2.2 Multi-layer deconvolution neural network model solution
According to the multi-layer deconvolution neural network model objective function shown above, the objective function for solving the layer-l network can be expressed as:

C_l = (β_l/2)·Σ_{i=1..K_{l-1}} ||Σ_{j=1..K_l} f_{i,j}^l * z_j^l − z_i^{l-1}||_2^2 + Σ_j ||z_j^l||_1

where K_l is the number of feature maps of layer l, z_j^l is the jth feature map of layer l, f_{i,j}^l is the layer-l filter with index (i, j), used to reconstruct feature map z_i^{l-1} of the previous layer, and β_l is a regularization parameter.

Extending the first-layer solution to layer l, the layer-l filters f_k^l are separated from the overall objective function for sequential updating, so the above equation can be rewritten as:

g_l(f_k^l) = (β_l/2)·||f_k^l * z_k^l + γ_l − Z_l||_2^2

where γ_l collects the terms of the fixed filters, and Z_l is a constant when the feature maps are fixed and f_k^l is being updated. Substituting this formula recursively back into the multi-layer objective function gives the multi-layer total objective function.

By analogy with the first-layer method, a Taylor expansion of g_l at (f_k^l)^(t-1) and simplification give:

(f_k^l)^(t) = (f_k^l)^(t-1) − (1/L_l)·∇g_l((f_k^l)^(t-1))    (2)

Equation (2) can be solved by multiple iterative optimizations using the same method as for the first layer.
To verify the effectiveness of the proposed algorithm, two types of experiments were conducted. Experiment one covers training and reconstruction. It uses the face94 face dataset, on which a two-layer deconvolution neural network model DN is trained with the traditional conjugate gradient (CG) method and with the proposed SFU method, denoted DN-CG and DN-SFU respectively. The accuracy and convergence of the two algorithms during the first- and second-layer training stages and the reconstruction stage are compared via the reconstruction mean squared error (MSE) and the reconstruction peak signal-to-noise ratio (PSNR). The deconvolution neural network is configured with 9 feature maps in the first layer with filter size 11 × 11, and 4 feature maps in the second layer with filter size 21 × 21. The training images are preprocessed into grayscale 200 × 200 images; 10 images are taken arbitrarily as the training set and another 10 different images as the test set.
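The MSE and PSNR used in these comparisons can be computed as follows (the peak value of 255 for 8-bit grayscale images is an assumption; the patent does not state it):

```python
import numpy as np

def mse(x, y):
    """Reconstruction mean squared error between two images."""
    return float(np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def psnr(x, y, peak=255.0):
    """Reconstruction peak signal-to-noise ratio in dB (peak = max pixel value)."""
    m = mse(x, y)
    return float("inf") if m == 0.0 else 10.0 * np.log10(peak ** 2 / m)

# Hypothetical example: two flat 4 x 4 images differing by 10 gray levels.
original = np.zeros((4, 4))
reconstructed = np.full((4, 4), 10.0)
```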
For clarity, the notation in Table 1 is used below.
TABLE 1 Experimental notation
Symbol   Model   Algorithm
DN-SFU   DN      SFU
DN-CG    DN      CG
Here DN-SFU denotes training the deconvolution neural network with the algorithm proposed herein, and DN-CG denotes training it with the traditional conjugate gradient method.
In fig. 2, (a)(b) show the training results of the first layer and (c)(d) the training results of the second layer. In (a) and (c), the ordinate is the mean squared error (MSE) of the reconstructed image and the abscissa is the total number of iterations, showing how the reconstruction MSE changes with the number of iterations during training. In (b) and (d), the ordinate is the reconstruction MSE and the abscissa is the training time, showing how the reconstruction MSE changes over time during training.
The objective experimental data show that when the DN-SFU algorithm trains the deconvolution neural network, the convergence speed of the first-layer network is significantly improved and the convergence accuracy is improved at the same time. For the second layer, the DN-SFU algorithm does not obviously improve the convergence speed but does improve the convergence accuracy. Because the training of the second-layer network starts from the convergence point of the first-layer network, the algorithm still obtains a large overall training-time gain when the two-layer network is trained as a whole.
The results in fig. 3 come from reconstruction experiments on the test images using the filter sets trained with the DN-CG and DN-SFU algorithms respectively, where (a)(b) are the first-layer results and (c)(d) the second-layer results. In (a) and (c), the ordinate is the peak signal-to-noise ratio (PSNR) between the reconstructed image and the original image and the abscissa is the number of reconstruction ISTA iterations, showing how the reconstruction PSNR changes with the number of iterations. In (b) and (d), the ordinate is the PSNR of the reconstructed image and the abscissa is the reconstruction time, showing how the reconstruction PSNR changes over time. The reconstruction experiments use test-set data that does not overlap the training set, together with the filter set at each algorithm's convergence point from training.
The experimental results show that the filter set trained with the proposed algorithm yields a sizeable quality gain in the reconstruction of test images. Subjectively, the test images reconstructed with the filter set trained by the DN-SFU algorithm are clearly better than those obtained with the traditional conjugate gradient algorithm.
The second experiment is based on the Caltech-101 dataset, which consists of natural images divided into 101 classes and is one of the common datasets for evaluating algorithm performance in the field of pattern recognition. The usual recognition-test parameters are used: 30 images per class are taken as training images, giving 3060 images in the training set, with the remaining images as the test set. All images of the dataset are preprocessed into grayscale and scaled to 150 × 150 pixels with zero padding. The deconvolution network model in this experiment has a first layer of 15 feature maps with filters of size 7 × 7, and a second layer of 4 feature maps with filters of size 21 × 21. After the network is trained, feature maps are extracted from the test images through the network, and the first-layer feature maps are used as the input to the classifier. The classifier adopts the general SPM [13] classification method: the feature vectors of all first-layer feature maps of one image are extracted separately through SPM, summed and averaged to obtain the final unique feature vector of that image, which is used as the final classification input to the support vector machine. This experimental method is consistent with the method described in reference [7]. The recognition results are shown in table 2.
TABLE 2
Table 2 shows that the filter set trained with the method of the invention can effectively extract image features and helps improve classification accuracy. For fairness, the methods listed here all use a single feature for recognition; multi-feature comprehensive recognition methods perform even better on this dataset.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; all simple modifications, equivalent variations, and refinements made to the above embodiment in accordance with the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (2)

1. A deconvolution neural network training method, characterized in that it comprises a training phase and a reconstruction phase, the training phase comprising the following steps:

(1) Preprocess the training images: select the training-set images, convert them to grayscale, and unify their width and height in pixels;

(2) Batch the training images: divide the training images into batches according to the application for which the network is trained;

(3) Set the network training parameters of the training images, including the number of network layers, the filter size of each layer, the number of feature maps in each layer, the FISTA reconstruction step count and step size, the total number of epochs, and the feature-map sparsity control parameter;

(4) Begin first-layer training: initialize the first-layer feature maps and first-layer filters with random numbers and input a batch of training images; first solve the feature-map subproblem, computing the feature maps with the FISTA method, then solve the filter subproblem, in which each computation of the current residual is used to update only one filter and every gradient step used to update a filter comes from the freshly recomputed reconstruction residual; repeat these two steps (solving the feature-map subproblem and solving the filter subproblem) until the reconstruction residual converges; input the second batch of training images, initialize their feature maps with random numbers while retaining the filters obtained from the previous batch, and so on until all batches of training images have been trained, which constitutes one epoch; then input the first batch again to begin the second epoch, until the total number of epochs set in step (3) is completed;

the reconstruction phase comprising the following steps:

(5) Preprocess the images to be reconstructed: convert them to grayscale and unify their width and height in pixels;

(6) Set the network training parameters of the images to be reconstructed, including the FISTA reconstruction step size, the number of reconstruction iterations, and the feature-map sparsity control parameter;

(7) Input the images to be reconstructed batch by batch until all batches have been reconstructed: input the batches in sequence; the network solves the feature-map subproblem to convergence using the per-layer filters previously trained on the training images, yielding the reconstructed images and the feature maps of each layer;

wherein in step (2), similar images of the same class are placed in the same training batch;

and in step (4), the filter subproblem is solved via formula (1):

$$\min_u \; \tfrac{1}{2}\,\|u - v\|_2^2 \qquad (1)$$

where $u = f_k$ is the filter currently being updated, $v = \hat{f}_k - \tfrac{L}{2}\nabla g(\hat{f}_k)$, $\hat{f}_k$ is the value of that filter obtained in the previous iteration, $L > 0$ is the upper-bound (Lipschitz) constant of the gradient, and $g(f_k)$ is the total cost function expressed with the current filter as its variable.

2. The deconvolution neural network training method according to claim 1, characterized in that the following steps are further included between steps (4) and (5):

(a) Begin second-layer training: using the first-layer filters obtained from first-layer training, initialize the second-layer filters and feature maps with random numbers; input the first batch of training images into the network and optimize the filters and feature maps until convergence; input the next batch of training images, initialize its feature maps with random numbers, and train to convergence; repeat until all batches of training images have been trained;

(b) Begin nth-layer training, n being an integer greater than 2: using the filters obtained from training the first n−1 layers, initialize the nth-layer filters and feature maps with random numbers; input the first batch of training images into the network and optimize the filters and feature maps until convergence; input the next batch of training images, initialize its feature maps with random numbers, and train to convergence; repeat until all batches of training images have been trained;

wherein in steps (a) and (b), the filters and feature maps are optimized via formula (2):

$$\min_{f_{k,h}^{l}} \; \tfrac{1}{2}\,\Big\| f_{k,h}^{l} - \Big(\hat{f}_{k,h}^{l} - \tfrac{L^{l}}{2}\nabla g^{l}\big(\hat{f}_{k,h}^{l}\big)\Big) \Big\|_2^2 \qquad (2)$$

where $f_{k,h}^{l}$ is the layer-$l$ filter currently being updated, $\hat{f}_{k,h}^{l}$ is the value of that filter obtained in the previous iteration, $L^{l}$ is the upper-bound (Lipschitz) constant of the layer-$l$ gradient, and $g^{l}$ is the layer-$l$ total cost function with $f_{k,h}^{l}$ as its single variable.
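The alternating minimization in the claims above (solve the feature-map subproblem with FISTA, then take one gradient step per filter against a freshly recomputed residual, as in formula (1)) can be sketched for a single layer and a single image as follows. This is a minimal sketch assuming an ℓ1 sparsity penalty on the feature maps and SciPy "same"-mode 2-D convolution; the step sizes, iteration counts, and penalty weight are illustrative assumptions, not the patent's settings.

```python
import numpy as np
from scipy.signal import convolve2d

def reconstruct(filters, zmaps):
    """y_hat = sum_k conv2d_same(z_k, f_k): the deconvolution-network model."""
    return sum(convolve2d(z, f, mode="same") for f, z in zip(filters, zmaps))

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1 (the FISTA shrinkage step)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_feature_maps(y, filters, lam=0.01, steps=60, lr=0.05):
    """Feature-map subproblem: min_z 0.5||y - sum_k f_k*z_k||^2 + lam*sum_k||z_k||_1,
    solved with FISTA (gradient step, shrinkage, momentum extrapolation)."""
    z = [np.zeros_like(y) for _ in filters]      # extrapolated iterate
    z_prev = [zi.copy() for zi in z]             # previous shrinkage result
    t = 1.0
    for _ in range(steps):
        r = reconstruct(filters, z) - y          # reconstruction residual
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # gradient of the data term w.r.t. z_k is the residual correlated with f_k
        z_new = [soft_threshold(zk - lr * convolve2d(r, fk[::-1, ::-1], mode="same"),
                                lr * lam)
                 for zk, fk in zip(z, filters)]
        z = [zn + ((t - 1.0) / t_next) * (zn - zp) for zn, zp in zip(z_new, z_prev)]
        z_prev, t = z_new, t_next
    return z_prev

def filter_gradient(r, z, s):
    """d/df of 0.5||r||^2 for one filter: correlate the residual with z,
    restricted to the s x s filter support (s odd)."""
    c = s // 2
    zp = np.pad(z, c)
    H, W = r.shape
    return np.array([[np.sum(r * zp[2*c - a:2*c - a + H, 2*c - b:2*c - b + W])
                      for b in range(s)] for a in range(s)])

def update_filters(y, filters, zmaps, lr=1e-3):
    """Filter subproblem as in formula (1): one gradient step per filter,
    recomputing the reconstruction residual before each single-filter update."""
    fs = [f.copy() for f in filters]
    for k, zk in enumerate(zmaps):
        r = reconstruct(fs, zmaps) - y           # fresh residual every time
        fs[k] -= lr * filter_gradient(r, zk, fs[k].shape[0])
    return fs
```

A full implementation would alternate these two subproblems until the residual converges, loop over batches and epochs as in step (4), and stack layers as in claim 2.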
CN201510046974.8A 2015-01-29 2015-01-29 A kind of deconvolution neural network training method Active CN104866900B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510046974.8A CN104866900B (en) 2015-01-29 2015-01-29 A kind of deconvolution neural network training method


Publications (2)

Publication Number Publication Date
CN104866900A CN104866900A (en) 2015-08-26
CN104866900B true CN104866900B (en) 2018-01-19

Family

ID=53912720



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853490A (en) * 2010-04-21 2010-10-06 中国科学院半导体研究所 A Bionic Image Restoration Method Based on Human Visual Characteristics
CN102546128A (en) * 2012-02-23 2012-07-04 广东白云学院 Method for multi-channel blind deconvolution on cascaded neural network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9799098B2 (en) * 2007-04-24 2017-10-24 Massachusetts Institute Of Technology Method and apparatus for image processing
WO2010003041A2 (en) * 2008-07-03 2010-01-07 Nec Laboratories America, Inc. Mitotic figure detector and counter system and method for detecting and counting mitotic figures


Non-Patent Citations (4)

Title
Adaptive deconvolutional networks for mid and high level feature learning; Zeiler, M. D. et al.; Computer Vision (ICCV), 2011 IEEE International Conference on; 2011; pp. 2018-2025 *
Breast image feature learning with adaptive deconvolutional networks; Jamieson, A. R. et al.; SPIE Medical Imaging; 2012; pp. 1506-1518 *
Deconvolutional network image representation and restoration; Chen Yangtai; China Master's Theses Full-text Database, Information Science and Technology; Apr. 15, 2012; pp. 25-66 *
Image decoding method based on a sparse representation model; Shi Yunhui et al.; Journal of Beijing University of Technology; Mar. 2013; vol. 39, no. 3; pp. 420-424 *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20150826

Assignee: Beijing Feiwang Technology Co.,Ltd.

Assignor: Beijing University of Technology

Contract record no.: X2024980041978

Denomination of invention: A training method for deconvolution neural network

Granted publication date: 20180119

License type: Open License

Record date: 20241226

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20150826

Assignee: BEIJING MOQIU TECHNOLOGY CO.,LTD.

Assignor: Beijing University of Technology

Contract record no.: X2025980023083

Denomination of invention: A training method for deconvolution neural network

Granted publication date: 20180119

License type: Open License

Record date: 20250917
