Residual error information compression method for video coding
Technical Field
The invention relates to the field of information compression, coding and decoding, in particular to a residual error information compression method for video coding.
Background
In the digital media era, vast amounts of image and video data are generated and stored in daily life, social networking, public security surveillance, industrial production, and other fields, consuming a large amount of storage space. H.264, the current mainstream video compression format, still leaves room for improvement in compression ratio, and its block-based motion estimation can also introduce color artifacts; meanwhile H.265, which has not yet been widely adopted, suffers from low encoding efficiency and various patent disputes.
Motion compensation, an effective method for reducing redundant information in a frame sequence, predicts and compensates the current local image from a previous local image. The prediction usually differs from the real video information by a residual, and this residual information can restore the information lost during motion compensation.
Given the large-scale application of neural networks and deep learning to tasks in the field of artificial intelligence, compressing data by means of neural networks is very promising.
Disclosure of Invention
Based on the above technical problem, the present invention provides a residual information compression method for video coding, which can obtain compressed residual information at a low bit rate for storing and compressing the residual information after motion estimation of video compression.
The method is based on a neural network structure of an autoencoder, uses a GDN activation function, and combines quantization and entropy coding to compress residual error information.
An autoencoder is an artificial neural network that learns an efficient representation of input data through unsupervised learning. It needs no specially labeled training data: the loss is computed from the difference between the input and the output. The network's representation of the input can be regarded as a code, and since the dimension of this code is usually smaller than that of the input data, compression and dimensionality reduction are achieved. Simply training the network to reproduce its input is of little value by itself, so it is forced to learn an efficient representation either by imposing internal size constraints, such as a bottleneck layer, or by adding noise to the training data and training the autoencoder to recover the original data.
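As an illustration of this idea, the following minimal sketch (hypothetical layer sizes, assuming PyTorch is available) shows an autoencoder whose bottleneck is smaller than its input, so the encoder output is a compressed representation, and whose loss compares the output with the input itself:

```python
import torch
import torch.nn as nn

# Toy autoencoder: a 64-dim input squeezed through a 16-dim bottleneck.
class AutoEncoder(nn.Module):
    def __init__(self, in_dim=64, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(),
                                     nn.Linear(32, in_dim))

    def forward(self, x):
        code = self.encoder(x)            # compressed representation
        return self.decoder(code), code

model = AutoEncoder()
x = torch.randn(8, 64)                    # batch of 8 unlabeled samples
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)   # loss compares output with input
print(code.shape)                         # bottleneck is 4x smaller than input
```

Because the target is the input itself, no labels are needed; minimizing the reconstruction loss forces the bottleneck code to retain the essential information.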
After an efficient representation is obtained, it can be quantized for further compression: high-precision floating-point numbers occupy substantial storage, yet the extra bits after the decimal point contribute little to the actual task. However, neural networks are optimized by gradient descent during back-propagation, and quantization is a non-differentiable operation that cannot take part in gradient computation. Several methods can replace direct quantization during training, such as adding uniform noise or soft quantization.
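The uniform-noise substitute mentioned above can be sketched as follows (assuming PyTorch; the function name is illustrative): noise drawn from [-0.5, 0.5) stands in for rounding during training, keeping gradients well-defined, while hard rounding is applied at inference time.

```python
import torch

def quantize(x, training):
    """Differentiable stand-in for rounding during training."""
    if training:
        # Uniform noise in [-0.5, 0.5) mimics the rounding-error
        # distribution while leaving gradients intact.
        return x + torch.empty_like(x).uniform_(-0.5, 0.5)
    return torch.round(x)  # hard quantization at inference time

x = torch.tensor([0.2, 1.7, -3.4])
print(quantize(x, training=False))   # tensor([ 0.,  2., -3.])
```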
The quantized feature values are further compressed by entropy coding; for commonly used entropy coders such as arithmetic coding, Huffman coding, and Shannon coding, designing an efficient probability model is essential.
Entropy coding is lossless data compression: it reduces the number of bits by identifying and eliminating statistical redundancy, so no information is lost. The goal is to represent discrete data with fewer bits than the original representation requires.
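As a small worked example of this goal, the Shannon entropy of a symbol distribution gives the minimum average bits per symbol that any lossless entropy coder can achieve under a given probability model (NumPy is used for the arithmetic; the values are illustrative):

```python
import numpy as np

# Shannon entropy: the lower bound on average bits per symbol
# for lossless coding under a given probability model.
def entropy_bits(probs):
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]            # 0 * log(0) contributes nothing
    return float(-(probs * np.log2(probs)).sum())

# Skewed distribution over 4 symbols: fewer than 2 bits/symbol suffice.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(p))  # 1.75
```

A uniform distribution over the same 4 symbols would need the full 2 bits per symbol; the more skewed the statistics, the more entropy coding saves.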
The method, which compresses residual information with an autoencoder and entropy coding, obtains compressed residual information at a low bit rate and is used to store and compress the residual information produced after motion estimation in video compression.
The autoencoder network is trained on the residual features. A trained Encoder network then extracts the data and generates a Feature Map; quantization reduces the storage footprint of the data, and Entropy Coding further compresses the quantized data. To decode the residual information, the reverse flow is applied: the stored entropy-coded data is decoded and dequantized, then passed through a Decoder with the mirrored structure, and the residual information is restored from the feature map.
The implementation steps comprise: building a neural network architecture, encoding, quantization, entropy coding, storing the generated file, and entropy decoding. Specifically:
1) Build the neural network architecture, specifying the number of convolution layers, the convolution kernel sizes, the padding method, and the number of channels required for encoding. As a general design principle, kernel sizes go from large to small, channel counts go from small to large (or stay constant), and strides > 1 are placed at certain layers to reduce the feature-map size;
2) Train on the training set, with each residual sample serving as its own label; construct the loss function from MSE and bpp (bits per pixel), and optimize with the Adam optimizer. After multiple iterations, a trained neural network model is obtained;
3) Encoding feeds the existing residual information into the Encoder part of the trained network and obtains a Feature Map through multiple strided convolutions, where each convolution layer uses ReLU or GDN as its activation function;
4) Quantization commonly uses one of two schemes: adding uniform noise, or soft quantization. Adding uniform noise replaces quantization during training: because the difference before and after quantization resembles uniform noise, it is simulated by artificially adding noise;
5) Entropy coding begins with binarization: non-binary numbers must be converted to binary before arithmetic coding. The probability density function of all binary symbols is then estimated by counting, and each bit is arithmetically coded according to the estimated probabilities;
6) The encoded file is stored in serialized form and can be handled with a serialization package such as pickle;
7) For entropy decoding, the serialized file is read and converted into a decimal fraction, i.e., a radix point is placed before the most significant bit of the stored bits, and decoding then proceeds according to the existing probability density function;
8) After entropy decoding, a feature map of the same size as before entropy encoding is obtained. A neural network mirroring the encoding network, with the convolution layers replaced by deconvolution (transposed convolution) layers, restores the feature map to three-channel residual information; a final rounding quantization is applied when storing.
The invention has the following advantages:
The method performs well on image compression and super-resolution tasks.
The method can be applied to video coding, decoding, and compression: by compressing (or re-compressing) existing residual information, it reduces storage space and storage cost severalfold. The compressed residual information mainly supplements the information lost in video compression, improving the picture quality of compressed video.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention;
fig. 2 is an exemplary diagram of a neural network structure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are clearly only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of the present invention.
Using the autoencoder idea, the method extracts the residual information through a trained Encoder network to generate a Feature Map, reduces the storage footprint of the data through quantization, and further compresses the quantized data through entropy coding. To decode the residual information, the reverse flow is applied: the stored entropy-coded data is decoded and dequantized, then passed through a Decoder with the mirrored structure, and the three-channel residual information is recovered from the feature map.
The specific steps are: building a neural network architecture, encoding, quantization, entropy coding, storing the generated file, and entropy decoding. Specifically:
1) Build the neural network architecture, specifying the number of convolution layers, the convolution kernel sizes, the padding method, and the number of channels required for encoding. As a general design principle, kernel sizes go from large to small, channel counts go from small to large (or stay constant), and strides > 1 are placed at certain layers to reduce the feature-map size;
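Step 1) might be sketched as follows (all layer sizes are illustrative; GDN is not available in core PyTorch, so ReLU is used here). Kernel sizes shrink layer by layer, channel counts grow, and each stride-2 layer halves the spatial size of the feature map:

```python
import torch
import torch.nn as nn

# Encoder sketch: kernels go large -> small, channel counts grow,
# and stride=2 layers reduce the spatial size of the feature map.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
)

residual = torch.randn(1, 3, 64, 64)   # dummy 3-channel residual block
feature_map = encoder(residual)
print(feature_map.shape)               # torch.Size([1, 64, 8, 8])
```

Three stride-2 layers reduce a 64x64 input to an 8x8 feature map, i.e. a 64-fold reduction in spatial positions.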
2) Train on the training set, with each residual sample serving as its own label; construct the loss function from MSE and bpp (bits per pixel), and optimize with the Adam optimizer. After multiple iterations, a trained neural network model is obtained;
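A sketch of the training loop in step 2) follows (assuming PyTorch; all sizes are illustrative). The target is the input itself, and the loss trades distortion (MSE) against an estimated rate; the rate term here is only a simple stand-in, since a real model would derive bits per pixel from a learned entropy model:

```python
import torch
import torch.nn as nn

# Minimal self-supervised rate-distortion training step.
enc = nn.Conv2d(3, 8, 3, stride=2, padding=1)
dec = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.randn(4, 3, 32, 32)            # residual patches; label == input
lam = 0.01                               # rate-distortion trade-off weight
for _ in range(3):                       # a few illustrative iterations
    code = enc(x)
    recon = dec(code)
    mse = nn.functional.mse_loss(recon, x)
    rate_proxy = code.abs().mean()       # placeholder for a real bpp estimate
    loss = mse + lam * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```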
3) Encoding feeds the existing residual information into the Encoder part of the trained network and obtains a Feature Map through multiple strided convolutions, where each convolution layer uses ReLU or GDN as its activation function;
4) Quantization commonly uses one of two schemes: adding uniform noise, or soft quantization. Adding uniform noise replaces quantization during training: because the difference before and after quantization resembles uniform noise, it is simulated by artificially adding noise;
5) Entropy coding begins with binarization: non-binary numbers must be converted to binary before arithmetic coding. The probability density function of all binary symbols is then estimated by counting, and each bit is arithmetically coded according to the estimated probabilities;
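The binarization and probability counting of step 5) can be illustrated as follows (toy coefficient values; the helper name is hypothetical). Quantized values are expanded to fixed-width binary, and the per-bit probability needed by an arithmetic coder is estimated by simple counting:

```python
# Quantized feature values are binarized, and arithmetic coding then
# needs per-bit probabilities, estimated here by counting.
def to_bits(values, width=4):
    """Fixed-width binary expansion of small non-negative integers."""
    return [int(b) for v in values for b in format(v, f'0{width}b')]

values = [3, 0, 1, 0, 2, 0, 0, 1]     # toy quantized coefficients
bits = to_bits(values)
p1 = sum(bits) / len(bits)            # empirical P(bit == 1)
print(bits[:8], p1)                   # [0, 0, 1, 1, 0, 0, 0, 0] 0.15625
```

Because zeros dominate (p1 is far below 0.5), an arithmetic coder driven by this model can represent the stream in well under one bit per binary symbol on average.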
6) The encoded file is stored in serialized form and can be handled with a serialization package such as pickle;
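Step 6) can be illustrated with pickle (the payload fields shown are illustrative): the entropy-coded byte string is stored together with whatever side information decoding needs, such as the probability model and the feature-map shape:

```python
import os
import pickle
import tempfile

# Store the entropy-coded payload plus the side information needed
# for decoding as a single serialized object.
payload = {"bits": b"\xa7\x01", "p1": 0.15625, "shape": (1, 64, 8, 8)}

path = os.path.join(tempfile.mkdtemp(), "residual.bin")
with open(path, "wb") as f:
    pickle.dump(payload, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == payload)   # True
```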
7) For entropy decoding, the serialized file is read and converted into a decimal fraction, i.e., a radix point is placed before the most significant bit of the stored bits, and decoding then proceeds according to the existing probability density function;
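The binary-fraction interpretation in step 7) can be illustrated as follows (the function name is illustrative): placing a radix point before the most significant bit turns the bit string b1 b2 b3 ... into the value b1/2 + b2/4 + b3/8 + ..., a number in [0, 1) that arithmetic decoding then maps back to symbols:

```python
# Arithmetic decoding reads the stored bits as a binary fraction in [0, 1).
def bits_to_fraction(bits):
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

print(bits_to_fraction([1, 0, 1]))   # 0.5 + 0.0 + 0.125 = 0.625
```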
8) After entropy decoding, a feature map of the same size as before entropy encoding is obtained. A neural network mirroring the encoding network, with the convolution layers replaced by deconvolution (transposed convolution) layers, restores the feature map to three-channel residual information; a final rounding quantization is applied when storing.
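The mirrored decoder of step 8) might be sketched as follows (illustrative sizes, assuming PyTorch): transposed convolutions with stride 2 undo the downsampling of a three-layer stride-2 encoder, recovering a three-channel residual, and the result is rounded before storage:

```python
import torch
import torch.nn as nn

# Decoder sketch mirroring a 3-layer stride-2 encoder: transposed
# convolutions upsample the feature map back to 3-channel residuals.
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
)

feature_map = torch.randn(1, 64, 8, 8)   # decoded, dequantized feature map
residual = decoder(feature_map)
rounded = torch.round(residual)          # final rounding step before storage
print(residual.shape)                    # torch.Size([1, 3, 64, 64])
```

Each kernel-4, stride-2, padding-1 layer exactly doubles the spatial size, so an 8x8 feature map is restored to the original 64x64 residual.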
The above description covers only a preferred embodiment of the present invention and is intended to illustrate, not to limit, its technical solutions. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.