Improved method for video encoder implementation
The present invention belongs to the field of moving-image coding technology.
In the current information age, the storage and transmission of images is becoming more and more important. Since the amount of information in original image data is enormous, compression of image data, that is, encoding of moving images, is essential in order to store images on a storage medium of limited capacity and to transmit them over a channel of limited capacity. Moving-image coding is achieved by comprehensively exploiting the redundancy of the picture signal in three respects, time, space, and statistics, together with knowledge of the scene and of human visual characteristics. The currently mature approach is a hybrid coding method that integrates predictive coding, transform coding, and entropy coding with a motion compensation technique.
One of the encoding implementation methods is shown in fig. 1, and includes the following steps:
(1) performing motion estimation (ME) between the input current image and the previous reconstructed image to obtain a motion vector (MV);
(2) performing prediction (P) on the previous reconstructed image according to the motion vector to obtain a predicted image of the current image;
(3) subtracting the predicted image from the current image to obtain a prediction error (PE);
(4) performing a discrete cosine transform (DCT) and quantization (Q) on the prediction error;
(5) performing variable-length coding (VLC) on the result of step (4) to obtain the current coded image; and
(6) performing inverse quantization (IQ) and the inverse discrete cosine transform (IDCT) on the result of step (4) to obtain a reconstructed prediction error, adding it to the current predicted image to obtain the current reconstructed image, which becomes the previous reconstructed image via a frame memory (FM).
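As a rough illustration only, the loop of steps (1) through (6) can be sketched with toy stand-ins for each stage. The function names, the zero-motion-vector estimator, and the divide-by-quantization-step model of DCT+Q are illustrative simplifications, not the actual method; frames are flat lists of pixel values here.

```python
def motion_estimate(cur, prev_recon):
    return 0  # toy ME: always returns a zero motion vector

def predict(prev_recon, mv):
    return prev_recon  # toy prediction: copy the reference

def dct_q(pe, qp):
    return [round(e / (2 * qp)) for e in pe]  # stand-in for DCT + Q

def iq_idct(coef, qp):
    return [c * 2 * qp for c in coef]  # stand-in for IQ + IDCT

def encode_frame(cur, prev_recon, qp):
    mv = motion_estimate(cur, prev_recon)       # step (1)
    pred = predict(prev_recon, mv)              # step (2)
    pe = [c - p for c, p in zip(cur, pred)]     # step (3)
    coef = dct_q(pe, qp)                        # step (4)
    # step (5): coef would now be variable-length coded
    recon_pe = iq_idct(coef, qp)                # step (6): reconstruction
    recon = [p + e for p, e in zip(pred, recon_pe)]
    return mv, coef, recon  # recon becomes the next "previous reconstructed image"

mv, coef, recon = encode_frame([103] * 8, [100] * 8, qp=2)
```

Note how the encoder reconstructs its own output (step 6) rather than reusing the original frame, so that prediction at the decoder, which only has reconstructed frames, stays synchronized.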
The functions of the steps in fig. 1 are as follows:
the motion estimation, the prediction, and the computation of the prediction error (the subtracter) constitute predictive coding, whose purpose is to eliminate the temporal correlation of the image signal. Temporal correlation means that part of the current frame can be obtained from part of the previous frame that has undergone motion. The motion is described by a motion vector; motion estimation obtains this motion vector, and prediction uses it to compensate for the signal change between the current frame and the previous frame caused by the motion.
The discrete cosine transform DCT constitutes transform coding, whose purpose is to remove the spatial correlation of the image signal. Quantization Q is both a prerequisite for the subsequent entropy coding and a means of exploiting human visual characteristics to improve coding quality.
Variable-length coding VLC constitutes entropy coding, which further eliminates the statistical correlation of the image signal.
The inverse quantization IQ, the inverse discrete cosine transform IDCT, and the adder realize image reconstruction and provide the reference for prediction.
An encoder implementing the hybrid encoding method described above is shown in fig. 2. Block DCT, block Q, block IQ, block IDCT, and block VLC in the figure mean that one block (8 × 8 pixels according to the international standard) of a frame (image) undergoes the discrete cosine transform DCT, quantization Q, inverse quantization IQ, inverse discrete cosine transform IDCT, and variable-length coding VLC, respectively. Macroblock P means that prediction P is performed in units of one macroblock (6 blocks according to the international standard) of a frame. Macroblock MV refers to the motion vector of a macroblock. A 0 block is a block whose 8 × 8 elements are all 0.
The encoder works as follows. First, one frame (image) is encoded in units of one block (8 × 8 pixels) or one macroblock (6 blocks). Second, the whole encoding process is divided into two parts, motion estimation and the coding core, shown in dashed outline in fig. 2. A current-frame macroblock first enters the motion estimation part; the motion vector MV obtained there is input to the coding core, which encodes the macroblock. Specifically:
first, motion estimation is performed on the current frame macroblock and the previous reconstructed frame macroblock. The motion estimation is divided into two steps: an integer-pixel search and a half-pixel search. The integer-pixel search adopts a hierarchical motion search method, i.e., the search domain is graded into levels: a stationary point, a small search domain, and a large search domain. This 3-level grading is typical, and implementations may vary, e.g., 2 levels (a stationary point and a single search domain, merging the small and large domains) or 4 levels (a stationary point, a small search domain, a large search domain, and a larger search domain, i.e., the large domain subdivided into two levels); at least two levels are required. After the search in each of the first two levels finishes, decision devices A and B determine whether a criterion is satisfied so that the integer-pixel search can stop and proceed to the half-pixel search and the subsequent coding core; the specific decision criteria may vary.
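A hedged sketch of such a hierarchical search with early termination follows. The candidate lists, the SAD cost, and the `good_enough` predicate standing in for decision devices A and B are hypothetical; real search patterns and criteria differ.

```python
def sad(cur, ref):
    # sum of absolute differences: the usual block-matching cost
    return sum(abs(c - r) for c, r in zip(cur, ref))

def hierarchical_search(cur, ref_at, levels, good_enough):
    """levels: candidate motion-vector lists, one per search domain, coarse to fine.
    good_enough: early-stop criterion (stand-in for decision devices A, B)."""
    best_mv, best_cost = (0, 0), sad(cur, ref_at((0, 0)))  # stationary point
    if good_enough(best_cost):
        return best_mv, best_cost       # stop after the stationary point
    for candidates in levels:
        for mv in candidates:
            cost = sad(cur, ref_at(mv))
            if cost < best_cost:
                best_mv, best_cost = mv, cost
        if good_enough(best_cost):      # stop before the next, larger domain
            break
    return best_mv, best_cost

# toy usage: 1-D "blocks", one search level beyond the stationary point
cur = [5, 5, 5, 5]
blocks = {(1, 0): [5, 5, 5, 5], (-1, 0): [9, 9, 9, 9]}
ref_at = lambda mv: blocks.get(mv, [0, 0, 0, 0])
mv, cost = hierarchical_search(cur, ref_at, [[(1, 0), (-1, 0)]], lambda c: c == 0)
```

The design point is that each level is only searched when the previous level's best match failed the criterion, so easy (static or slowly moving) macroblocks never pay for the large search domain.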
second, after the coding core is entered, the previous reconstructed frame macroblock is predicted according to the macroblock motion vector obtained by motion estimation to yield the current predicted frame macroblock; the subtracter then subtracts it from the current frame macroblock to compute the prediction error, which undergoes DCT and Q.
finally, after DCT and Q, a decision device judges whether the current data block is a 0 block. Since a 0 block contributes nothing to the subsequent entropy coding and image reconstruction, if the block is a 0 block the coding of the current block ends and processing returns to the prediction and subtraction stage for the next block; if the block is not a 0 block, entropy coding and image reconstruction are performed.
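This post-quantization 0-block test can be illustrated with a toy stand-in; here `dct_q` merely divides by the quantization step (2 · QP), whereas the real encoder applies a DCT first, and the returned strings are illustrative labels, not actual encoder output.

```python
def dct_q(block, qp):
    return [int(v / (2 * qp)) for v in block]  # stand-in for DCT + Q

def encode_block(pe_block, qp):
    coeffs = dct_q(pe_block, qp)
    if all(c == 0 for c in coeffs):
        # the block quantized to all zeros: nothing to entropy-code,
        # nothing to add back during reconstruction
        return "zero block: skip VLC and reconstruction"
    return "non-0 block: VLC + IQ/IDCT reconstruction"
```

Note that this check happens only after the full DCT and Q have already been spent on the block; that cost is exactly what the invention's pre-transform decision avoids.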
The disadvantage of this encoder implementation is that the encoding speed is not high enough. For simple-motion image sequences in QCIF format (e.g., the Claire sequence), software real-time encoding (25 frames/second) is still not possible on a Pentium-133 PC.
The invention aims to overcome this defect of the prior art by adding, to the original coding method, decisions on the large amount of zero data in an image, so as to increase the encoder's speed with no, or only a slight, loss in other performance.
The invention provides an improved method for implementing a video encoder, characterized by comprising the following steps:
(1) performing motion estimation on a macroblock of the current frame and a macroblock of the previous reconstructed frame, wherein the motion estimation comprises an integer-pixel search and a half-pixel search; the integer-pixel search adopts a hierarchical motion search method with the search domain graded into levels: a stationary point, a small search domain, and a large search domain (a typical 3-level grading, which may vary, as described above); after the search of each level finishes, a decision device G judges in advance, by computing the prediction error from the motion vector obtained at the current level, whether the current prediction-error macroblock would become a 0 macroblock (all 6 blocks in the macroblock being 0 blocks) after DCT and Q; if so, the coding of the current macroblock ends and processing moves to the next macroblock; otherwise the next-level search continues; after the integer-pixel search finishes, the half-pixel search is performed, yielding the motion vector, and the coding core is entered;
(2) after the current frame macroblock enters the coding core, predicting the previous reconstructed frame macroblock according to the macroblock motion vector MV obtained by motion estimation to yield the current predicted frame macroblock, which is then subtracted from the current frame macroblock to compute the prediction error;
(3) before DCT and Q are performed, the prediction error first enters decision device L, which judges in advance whether the current error block would become a 0 block after DCT and Q; if so, the coding of the current block ends and processing moves to the next block of the current macroblock; otherwise DCT and Q are performed;
(4) because decision device L cannot guarantee that every prediction-error block turned into a 0 block by DCT and Q is detected in advance, a decision device is still retained after DCT and Q to judge whether the data block is a 0 block; if so, no subsequent processing is needed and the next block is processed; otherwise the non-0 block undergoes entropy coding and image reconstruction.
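Steps (3) and (4) can be sketched as follows. The sum-of-absolute-values pre-test used here is a hypothetical simplification standing in for decision device L (the actual order criterion is given later in this description), and `dct_q` again models DCT + Q by plain division.

```python
def decider_L(pe_block, qp):
    # simplified stand-in for the pre-transform decision device L
    return sum(abs(v) for v in pe_block) < 20 * qp

def dct_q(pe_block, qp):
    return [int(v / (2 * qp)) for v in pe_block]  # stand-in for DCT + Q

def encode_block(pe_block, qp, stats):
    if decider_L(pe_block, qp):          # step (3): pre-transform decision
        stats["skipped"] += 1
        return None                      # 0 block: done, go to next block
    coeffs = dct_q(pe_block, qp)         # DCT and Q run only when needed
    if all(c == 0 for c in coeffs):      # step (4): retained post-check
        stats["late_zero"] += 1
        return None
    return coeffs                        # non-0 block: VLC + reconstruction

stats = {"skipped": 0, "late_zero": 0}
encode_block([1] * 64, qp=7, stats=stats)    # caught by decider L, no DCT/Q
encode_block([90] * 64, qp=7, stats=stats)   # survives to DCT + Q
```

The retained post-check makes the scheme safe even when the pre-test misses a block: every block that quantizes to zero is still treated as a 0 block, whichever decision device detects it.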
Compared with the prior art, the invention has the following characteristics:
first, in the coding core, a pre-transform decision device L is added before DCT and Q. It can identify in advance most prediction-error blocks that would become 0 blocks after DCT and Q, so those blocks skip DCT and Q entirely, saving a large amount of computation.
Second, a global decision device G is placed after each level of search in motion estimation. Once the criterion in G is satisfied, the entire coding process for the macroblock ends: the integer-pixel search stops, and neither the half-pixel search nor the coding core needs to be run, greatly increasing the coding speed.
Brief description of the accompanying drawings:
fig. 1 is a block diagram of a hybrid encoding method.
Fig. 2 is a block diagram of a conventional video encoder.
Fig. 3 is a block diagram of a video encoder according to the present invention.
Fig. 4 is a flowchart of an implementation of decision device L in this embodiment.
Fig. 5 is a flowchart of an implementation of the decision device G in this embodiment.
An embodiment of a video encoder implemented by the encoding method of the present invention is shown in figs. 3-5 and is described in detail below with reference to the figures:
the block diagram of the implementation of the novel video encoder of the present invention is shown in fig. 3, and the operation thereof includes the following steps:
1. perform motion estimation, i.e., a motion-vector search, on the current frame macroblock and the previous reconstructed frame macroblock. The motion-vector search comprises an integer-pixel search and a half-pixel search. The integer-pixel search is performed first and adopts a hierarchical motion search method, as follows:
a. search the stationary point and judge whether the current prediction-error macroblock satisfies the criterion in decision device G, i.e., whether it becomes a 0 macroblock (all 6 blocks in the macroblock are 0 blocks) after the DCT transform and quantization Q; if so, go to step 2; otherwise continue;
b. search the small search domain and judge whether the current prediction-error macroblock satisfies the criterion in decision device G; if so, go to step 2; otherwise continue;
c. search the large search domain and judge whether the current prediction-error macroblock satisfies the criterion in decision device G; if so, go to step 2; otherwise go to step 3;
2. obtain the current predicted macroblock from the currently found motion vector and the previous reconstructed frame; the current predicted macroblock is itself the current reconstructed macroblock; go to step 6;
3. perform the half-pixel search to obtain the motion vector; obtain the current predicted macroblock from this motion vector and the previous reconstructed frame; subtract it from the current frame macroblock to compute the prediction error and judge whether the prediction error satisfies the criterion of decision device L: if so, the current predicted macroblock is the current reconstructed macroblock, go to step 6; otherwise continue;
4. perform the DCT transform and quantization Q and judge whether the current data block is a 0 block after DCT and Q: if so, the current predicted macroblock is the current reconstructed macroblock, go to step 6; otherwise continue;
5. perform inverse quantization IQ and the inverse DCT transform, and add the result to the current predicted macroblock to obtain the current reconstructed macroblock;
6. perform entropy coding; the coding of the current macroblock is finished, and coding of the next macroblock begins.
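The six steps above can be sketched as a single control flow per macroblock. The boolean flags and the returned strings are illustrative stand-ins for the actual stage computations and decisions; only the ordering of the decisions (G at each search level, L before DCT+Q, the 0-block test after DCT+Q) reflects the method.

```python
def encode_macroblock(meets_G_per_level, L_passes, post_q_zero):
    # steps 1a-1c: integer-pixel search levels, decision device G after each
    for level, meets_G in zip(("stationary", "small", "large"), meets_G_per_level):
        if meets_G:
            return f"step 2: stop at {level} level, prediction is reconstruction"
    # step 3: half-pixel search, prediction error, decision device L
    if L_passes:
        return "step 3: decider L met, prediction is reconstruction"
    # step 4: DCT + Q, then the retained 0-block test
    if post_q_zero:
        return "step 4: 0 block after DCT+Q, prediction is reconstruction"
    # step 5: IQ + IDCT and reconstruction, then step 6: entropy coding
    return "steps 5-6: reconstruct from coefficients and entropy code"
```

Each early return corresponds to skipping all of the remaining, more expensive stages for that macroblock.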
The specific implementation block diagrams of decision devices L and G of the present invention are shown in figs. 4 and 5 and are described as follows:
decision device L relies on a block decision criterion (which may be called the order criterion). An 8 × 8 block of data is represented by the set {f(x, y) | x, y = 0, 1, …, 7}; in fig. 3 this is specifically a prediction-error block. When

$$(x_{i_0}+y_{i_0'})\cos^2\frac{\pi}{16}+(x_{i_1}+y_{i_1'})\cos^2\frac{3\pi}{16}+(x_{i_2}+y_{i_2'})\cos^2\frac{5\pi}{16}+(x_{i_3}+y_{i_3'})\cos^2\frac{7\pi}{16}<20\,QP,$$

the current block becomes a 0 block after DCT and Q. Here

$$x_j=\sum_{i=0}^{7}|f(i,j)|+\sum_{i=0}^{7}|f(i,7-j)|,\qquad j=0,1,2,3,$$

with $x_{i_0}\ge x_{i_1}\ge x_{i_2}\ge x_{i_3}$, where $i_0,i_1,i_2,i_3\in\{0,1,2,3\}$ are mutually distinct; likewise

$$y_i=\sum_{j=0}^{7}|f(i,j)|+\sum_{j=0}^{7}|f(7-i,j)|,\qquad i=0,1,2,3,$$

with $y_{i_0'}\ge y_{i_1'}\ge y_{i_2'}\ge y_{i_3'}$, where $i_0',i_1',i_2',i_3'\in\{0,1,2,3\}$ are mutually distinct. QP is the quantization parameter of the macroblock containing the block and equals half the quantization step (QP is fixed when the macroblock is coded). The implementation flow of decision device L is thus as shown in fig. 4. In fig. 4:
process the prediction-error block {f(x, y) | x, y = 0, 1, …, 7} by rows and by columns.

For the rows, first compute the sum of absolute values of each row,

$$u_x=\sum_{y=0}^{7}|f(x,y)|,\qquad x=0,1,\dots,7,$$

then fold, $u_x \leftarrow u_x+u_{7-x}$ for $x=0,1,2,3$, and arrange $u_0,u_1,u_2,u_3$ in descending order.

For the columns, compute the sum of absolute values of each column,

$$v_y=\sum_{x=0}^{7}|f(x,y)|,\qquad y=0,1,\dots,7,$$

then fold, $v_y \leftarrow v_y+v_{7-y}$ for $y=0,1,2,3$, and arrange $v_0,v_1,v_2,v_3$ in descending order.

Next compute the weighted sum

$$sum=\sum_{k=0}^{3}\bigl(u_{(k)}+v_{(k)}\bigr)\cos^2\frac{(2k+1)\pi}{16},$$

where $u_{(k)}$ and $v_{(k)}$ denote the $(k+1)$-th largest folded row and column sums.

Finally, compare: is sum < 20 QP? If sum < 20 QP, the criterion is satisfied and the current prediction-error block becomes a 0 block after DCT and Q; if sum ≥ 20 QP, the criterion is not satisfied.
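The decision-device-L flow above translates directly into code. This is a sketch assuming integer-valued prediction-error samples stored as an 8 × 8 nested list and the 20 · QP threshold as stated; the function name is illustrative.

```python
import math

def order_criterion(block, qp):
    """block: 8x8 prediction-error values. Returns True if the block is
    judged to become a 0 block after DCT and quantization Q."""
    # absolute row and column sums
    row = [sum(abs(block[x][y]) for y in range(8)) for x in range(8)]
    col = [sum(abs(block[x][y]) for x in range(8)) for y in range(8)]
    # fold symmetric pairs and sort in descending order
    u = sorted((row[x] + row[7 - x] for x in range(4)), reverse=True)
    v = sorted((col[y] + col[7 - y] for y in range(4)), reverse=True)
    # weights cos^2((2k+1)*pi/16), largest weight on the largest sums
    w = [math.cos((2 * k + 1) * math.pi / 16) ** 2 for k in range(4)]
    s = sum((uk + vk) * wk for uk, vk, wk in zip(u, v, w))
    return s < 20 * qp
```

Note that the four weights sum to exactly 2 (the cos² terms pair up as complements), so the test is roughly a weighted bound on the block's total absolute energy, computed with only additions, comparisons, and four multiplications, far cheaper than an 8 × 8 DCT.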
The decision device G is implemented as follows:
decision device G makes its decision on a macroblock, and one macroblock is 6 blocks, so the macroblock decision can be divided into decisions on the 6 blocks, each made by the method of decision device L above. Fig. 5 is a flowchart of an implementation of decision device G.
In fig. 5:
the current prediction-error (PE) macroblock is first computed,
PE macroblock = current frame macroblock − current predicted frame macroblock. The order criterion is then applied to each of the 6 blocks in the PE macroblock to judge whether that block would become a 0 block after DCT and Q; when all 6 blocks satisfy the order criterion, the whole macroblock satisfies the criterion.
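Decision device G can accordingly be sketched as six applications of the order criterion, one per block of the PE macroblock (under the international standard's 6-block macroblock, 4 luma and 2 chroma blocks). The function names are illustrative; the criterion body repeats the decider-L sketch so this fragment stands alone.

```python
import math

def order_criterion(block, qp):
    # order criterion as in decision device L
    row = [sum(abs(block[x][y]) for y in range(8)) for x in range(8)]
    col = [sum(abs(block[x][y]) for x in range(8)) for y in range(8)]
    u = sorted((row[x] + row[7 - x] for x in range(4)), reverse=True)
    v = sorted((col[y] + col[7 - y] for y in range(4)), reverse=True)
    w = [math.cos((2 * k + 1) * math.pi / 16) ** 2 for k in range(4)]
    return sum((a + b) * c for a, b, c in zip(u, v, w)) < 20 * qp

def decider_G(pe_macroblock, qp):
    """pe_macroblock: the 6 prediction-error blocks of one macroblock.
    The macroblock meets the criterion only if every block would become
    a 0 block after DCT and Q."""
    return all(order_criterion(b, qp) for b in pe_macroblock)
```

Because `all` short-circuits, the remaining blocks are not even tested once one block fails, which suits G's role as a cheap gate inside the motion-estimation loop.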
The coding parameters and coding speeds for the simple Claire sequence and the complex Foreman sequence, both in QCIF format, on a Pentium-133 PC are given below.
Claire sequence: the quantization parameter for I frames is 5 and for P frames is 7 (i.e., QP = 7 in the formula above). The integer-pixel search in motion estimation uses a two-level search: a stationary point and a small search domain with region length 5. Each search level uses a sampling method with a sampling interval of 3, and the block-matching operation uses sub-sampling.
Result: the encoding frame rate averages 25 frames/second.
Foreman sequence: the quantization parameter is 15 for both I frames and P frames (i.e., QP = 15 in the formula above). The integer-pixel search in motion estimation uses a three-level search: a stationary point, a small search domain (region length 4), and a large search domain (region length 10). Each search level uses a sampling method with a sampling interval of 3, and the block-matching operation uses sub-sampling.
Result: the encoding frame rate averages 10 frames/second.