Residual error information compression method for video coding
Technical Field
The invention relates to the field of information compression, coding and decoding, in particular to a residual error information compression method for video coding.
Background
In the digital media era, vast amounts of image and video data are generated and stored in daily life, social networking, public security surveillance, industrial production, and other fields, consuming a large amount of storage space. H.264, the current mainstream video compression format, still leaves room for improvement in compression ratio, and its block-based motion estimation can also introduce color artifacts; meanwhile H.265, which has not yet been widely adopted, suffers from low encoding efficiency and various patent disputes.
Motion compensation, an effective method for reducing redundant information in a frame sequence, predicts and compensates the current local image from a previous local image. The prediction usually differs from the real video information by a residual, and this residual information can restore the information lost during motion compensation.
Given the large-scale application of neural networks and deep learning to tasks in the field of artificial intelligence, compressing data by means of neural networks is very promising.
Disclosure of Invention
Based on the above technical problem, the present invention provides a residual information compression method for video coding, which can obtain compressed residual information at a low bit rate for storing and compressing the residual information after motion estimation of video compression.
The method is based on a neural network structure of an autoencoder, uses a GDN activation function, and combines quantization and entropy coding to compress residual error information.
An autoencoder is an artificial neural network that learns an efficient representation of input data through unsupervised learning. It needs no specially labeled training data: the loss is computed from the difference between the input and the output. The network's representation of the input can be regarded as a code, and since the dimension of this code is usually smaller than that of the input data, compression and dimensionality reduction are achieved. Simply training the network to reproduce its input is of little value by itself, so it is forced to learn an efficient representation either by imposing internal size constraints, such as a bottleneck layer, or by adding noise to the training data and training the autoencoder to recover the original data.
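As an illustration of this idea, the following minimal sketch (hypothetical layer sizes, assuming PyTorch is available) shows an autoencoder whose bottleneck is smaller than its input, so the encoder output is a compressed representation, and whose loss compares the output with the input itself:

```python
import torch
import torch.nn as nn

# Toy autoencoder: a 64-dim input squeezed through a 16-dim bottleneck.
class AutoEncoder(nn.Module):
    def __init__(self, in_dim=64, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(),
                                     nn.Linear(32, in_dim))

    def forward(self, x):
        code = self.encoder(x)            # compressed representation
        return self.decoder(code), code

model = AutoEncoder()
x = torch.randn(8, 64)                    # batch of 8 unlabeled samples
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)   # loss compares output with input
print(code.shape)                         # bottleneck is 4x smaller than input
```

Because the target is the input itself, no labels are needed; minimizing the reconstruction loss forces the bottleneck code to retain the essential information.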
After an efficient representation is obtained, it can be quantized for further compression: high-precision floating-point numbers occupy substantial storage, yet the extra bits after the decimal point contribute little to the actual task. However, neural networks are optimized by gradient descent during back-propagation, and quantization is a non-differentiable operation that cannot take part in gradient computation. Several methods can replace direct quantization during training, such as adding uniform noise or soft quantization.
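The uniform-noise substitute mentioned above can be sketched as follows (assuming PyTorch; the function name is illustrative): noise drawn from [-0.5, 0.5) stands in for rounding during training, keeping gradients well-defined, while hard rounding is applied at inference time.

```python
import torch

def quantize(x, training):
    """Differentiable stand-in for rounding during training."""
    if training:
        # Uniform noise in [-0.5, 0.5) mimics the rounding-error
        # distribution while leaving gradients intact.
        return x + torch.empty_like(x).uniform_(-0.5, 0.5)
    return torch.round(x)  # hard quantization at inference time

x = torch.tensor([0.2, 1.7, -3.4])
print(quantize(x, training=False))   # tensor([ 0.,  2., -3.])
```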
The quantized feature values are further compressed by entropy coding; for commonly used entropy coders such as arithmetic coding, Huffman coding, and Shannon coding, designing an efficient probability model is essential.
Entropy coding is lossless data compression: it reduces the number of bits by identifying and eliminating statistical redundancy, so no information is lost. The goal is to represent discrete data with fewer bits than the original representation requires.
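As a small worked example of this goal, the Shannon entropy of a symbol distribution gives the minimum average bits per symbol that any lossless entropy coder can achieve under a given probability model (NumPy is used for the arithmetic; the values are illustrative):

```python
import numpy as np

# Shannon entropy: the lower bound on average bits per symbol
# for lossless coding under a given probability model.
def entropy_bits(probs):
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]            # 0 * log(0) contributes nothing
    return float(-(probs * np.log2(probs)).sum())

# Skewed distribution over 4 symbols: fewer than 2 bits/symbol suffice.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(p))  # 1.75
```

A uniform distribution over the same 4 symbols would need the full 2 bits per symbol; the more skewed the statistics, the more entropy coding saves.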
The method, which compresses residual information with an autoencoder and entropy coding, obtains compressed residual information at a low bit rate and is used to store and compress the residual information produced after motion estimation in video compression.
The autoencoder network is trained on the residual features. A trained Encoder network then extracts the data and generates a Feature Map; quantization reduces the storage footprint of the data, and Entropy Coding further compresses the quantized data. To decode the residual information, the reverse flow is applied: the stored entropy-coded data is decoded and dequantized, then passed through a Decoder with the mirrored structure, and the residual information is restored from the feature map.
The implementation steps comprise: building a neural network architecture, encoding, quantization, entropy coding, storing the generated file, and entropy decoding. Specifically:
1) Build the neural network architecture, specifying the number of convolution layers, the convolution kernel sizes, the padding method, and the number of channels required for encoding. As a general design principle, kernel sizes go from large to small, channel counts go from small to large (or stay constant), and strides > 1 are placed at certain layers to reduce the feature-map size;
2) Train on the training set, with each residual sample serving as its own label; construct the loss function from MSE and bpp (bits per pixel), and optimize with the Adam optimizer. After multiple iterations, a trained neural network model is obtained;
3) Encoding feeds the existing residual information into the Encoder part of the trained network and obtains a Feature Map through multiple strided convolutions, where each convolution layer uses ReLU or GDN as its activation function;
4) Quantization commonly uses one of two schemes: adding uniform noise, or soft quantization. Adding uniform noise replaces quantization during training: because the difference before and after quantization resembles uniform noise, it is simulated by artificially adding noise;
5) Entropy coding begins with binarization: non-binary numbers must be converted to binary before arithmetic coding. The probability density function of all binary symbols is then estimated by counting, and each bit is arithmetically coded according to the estimated probabilities;
6) The encoded file is stored in serialized form and can be handled with a serialization package such as pickle;
7) For entropy decoding, the serialized file is read and converted into a decimal fraction, i.e., a radix point is placed before the most significant bit of the stored bits, and decoding then proceeds according to the existing probability density function;
8) After entropy decoding, a feature map of the same size as before entropy encoding is obtained. A neural network mirroring the encoding network, with the convolution layers replaced by deconvolution (transposed convolution) layers, restores the feature map to three-channel residual information; a final rounding quantization is applied when storing.
The invention has the following advantages:
The method performs well on image compression and super-resolution tasks.
The method can be applied to video coding, decoding, and compression: by compressing (or re-compressing) existing residual information, it reduces storage space and storage cost severalfold. The compressed residual information mainly supplements the information lost in video compression, improving the picture quality of compressed video.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention;
fig. 2 is an exemplary diagram of a neural network structure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are clearly only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of the present invention.
Using the autoencoder idea, the method extracts the residual information through a trained Encoder network to generate a Feature Map, reduces the storage footprint of the data through quantization, and further compresses the quantized data through entropy coding. To decode the residual information, the reverse flow is applied: the stored entropy-coded data is decoded and dequantized, then passed through a Decoder with the mirrored structure, and the three-channel residual information is recovered from the feature map.
The specific steps are: building a neural network architecture, encoding, quantization, entropy coding, storing the generated file, and entropy decoding. Specifically:
1) Build the neural network architecture, specifying the number of convolution layers, the convolution kernel sizes, the padding method, and the number of channels required for encoding. As a general design principle, kernel sizes go from large to small, channel counts go from small to large (or stay constant), and strides > 1 are placed at certain layers to reduce the feature-map size;
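Step 1) might be sketched as follows (all layer sizes are illustrative; GDN is not available in core PyTorch, so ReLU is used here). Kernel sizes shrink layer by layer, channel counts grow, and each stride-2 layer halves the spatial size of the feature map:

```python
import torch
import torch.nn as nn

# Encoder sketch: kernels go large -> small, channel counts grow,
# and stride=2 layers reduce the spatial size of the feature map.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
)

residual = torch.randn(1, 3, 64, 64)   # dummy 3-channel residual block
feature_map = encoder(residual)
print(feature_map.shape)               # torch.Size([1, 64, 8, 8])
```

Three stride-2 layers reduce a 64x64 input to an 8x8 feature map, i.e. a 64-fold reduction in spatial positions.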
2) Train on the training set, with each residual sample serving as its own label; construct the loss function from MSE and bpp (bits per pixel), and optimize with the Adam optimizer. After multiple iterations, a trained neural network model is obtained;
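A sketch of the training loop in step 2) follows (assuming PyTorch; all sizes are illustrative). The target is the input itself, and the loss trades distortion (MSE) against an estimated rate; the rate term here is only a simple stand-in, since a real model would derive bits per pixel from a learned entropy model:

```python
import torch
import torch.nn as nn

# Minimal self-supervised rate-distortion training step.
enc = nn.Conv2d(3, 8, 3, stride=2, padding=1)
dec = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.randn(4, 3, 32, 32)            # residual patches; label == input
lam = 0.01                               # rate-distortion trade-off weight
for _ in range(3):                       # a few illustrative iterations
    code = enc(x)
    recon = dec(code)
    mse = nn.functional.mse_loss(recon, x)
    rate_proxy = code.abs().mean()       # placeholder for a real bpp estimate
    loss = mse + lam * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```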
3) Encoding feeds the existing residual information into the Encoder part of the trained network and obtains a Feature Map through multiple strided convolutions, where each convolution layer uses ReLU or GDN as its activation function;
4) Quantization commonly uses one of two schemes: adding uniform noise, or soft quantization. Adding uniform noise replaces quantization during training: because the difference before and after quantization resembles uniform noise, it is simulated by artificially adding noise;
5) Entropy coding begins with binarization: non-binary numbers must be converted to binary before arithmetic coding. The probability density function of all binary symbols is then estimated by counting, and each bit is arithmetically coded according to the estimated probabilities;
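The binarization and probability counting of step 5) can be illustrated as follows (toy coefficient values; the helper name is hypothetical). Quantized values are expanded to fixed-width binary, and the per-bit probability needed by an arithmetic coder is estimated by simple counting:

```python
# Quantized feature values are binarized, and arithmetic coding then
# needs per-bit probabilities, estimated here by counting.
def to_bits(values, width=4):
    """Fixed-width binary expansion of small non-negative integers."""
    return [int(b) for v in values for b in format(v, f'0{width}b')]

values = [3, 0, 1, 0, 2, 0, 0, 1]     # toy quantized coefficients
bits = to_bits(values)
p1 = sum(bits) / len(bits)            # empirical P(bit == 1)
print(bits[:8], p1)                   # [0, 0, 1, 1, 0, 0, 0, 0] 0.15625
```

Because zeros dominate (p1 is far below 0.5), an arithmetic coder driven by this model can represent the stream in well under one bit per binary symbol on average.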
6) The encoded file is stored in serialized form and can be handled with a serialization package such as pickle;
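Step 6) can be illustrated with pickle (the payload fields shown are illustrative): the entropy-coded byte string is stored together with whatever side information decoding needs, such as the probability model and the feature-map shape:

```python
import os
import pickle
import tempfile

# Store the entropy-coded payload plus the side information needed
# for decoding as a single serialized object.
payload = {"bits": b"\xa7\x01", "p1": 0.15625, "shape": (1, 64, 8, 8)}

path = os.path.join(tempfile.mkdtemp(), "residual.bin")
with open(path, "wb") as f:
    pickle.dump(payload, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == payload)   # True
```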
7) For entropy decoding, the serialized file is read and converted into a decimal fraction, i.e., a radix point is placed before the most significant bit of the stored bits, and decoding then proceeds according to the existing probability density function;
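The binary-fraction interpretation in step 7) can be illustrated as follows (the function name is illustrative): placing a radix point before the most significant bit turns the bit string b1 b2 b3 ... into the value b1/2 + b2/4 + b3/8 + ..., a number in [0, 1) that arithmetic decoding then maps back to symbols:

```python
# Arithmetic decoding reads the stored bits as a binary fraction in [0, 1).
def bits_to_fraction(bits):
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

print(bits_to_fraction([1, 0, 1]))   # 0.5 + 0.0 + 0.125 = 0.625
```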
8) After entropy decoding, a feature map of the same size as before entropy encoding is obtained. A neural network mirroring the encoding network, with the convolution layers replaced by deconvolution (transposed convolution) layers, restores the feature map to three-channel residual information; a final rounding quantization is applied when storing.
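The mirrored decoder of step 8) might be sketched as follows (illustrative sizes, assuming PyTorch): transposed convolutions with stride 2 undo the downsampling of a three-layer stride-2 encoder, recovering a three-channel residual, and the result is rounded before storage:

```python
import torch
import torch.nn as nn

# Decoder sketch mirroring a 3-layer stride-2 encoder: transposed
# convolutions upsample the feature map back to 3-channel residuals.
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
)

feature_map = torch.randn(1, 64, 8, 8)   # decoded, dequantized feature map
residual = decoder(feature_map)
rounded = torch.round(residual)          # final rounding step before storage
print(residual.shape)                    # torch.Size([1, 3, 64, 64])
```

Each kernel-4, stride-2, padding-1 layer exactly doubles the spatial size, so an 8x8 feature map is restored to the original 64x64 residual.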
The above description covers only a preferred embodiment of the present invention and is intended to illustrate, not to limit, its technical solutions. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.