CN1219403C

CN1219403C - MPEG video code rate conversion method based on visual model

Info

Publication number: CN1219403C
Application number: CN 02157889
Authority: CN
Inventors: 张勇东; 曹岗; 林守勋; 李锦涛
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2002-12-20
Filing date: 2002-12-20
Publication date: 2005-09-14
Anticipated expiration: 2022-12-20
Also published as: CN1510923A

Abstract

An MPEG video bit-rate conversion method that incorporates a visual model, comprising the steps of: partially decoding the input stream; truncating DCT coefficients by removing the coefficients above the cutoff frequency; rate control, which re-determines the quantization factor of each macroblock; and re-encoding. By exploiting the Fovea visual model during conversion, the invention improves conversion efficiency, produces a low-bit-rate stream with comparatively better subjective quality, and further reduces the amount of computation.

Description

MPEG video code rate conversion method based on visual model

Technical field

The invention relates to a bit-rate conversion method for MPEG video streams.

Background

With the development of video compression and network technology, network multimedia services such as multipoint video conferencing, video on demand, and digital television continue to emerge. To support these services, a video server must adapt to the heterogeneity of clients and transmission channels, which requires it to be able to convert video streams. Stream conversion includes syntax conversion, (spatial and temporal) resolution conversion, bit-rate conversion, and so on. The present invention addresses bit-rate conversion, that is, converting an existing video stream into a lower-bit-rate stream that fits the actual bandwidth limit of the transmission channel.

Many methods for video stream conversion exist. They fall into three architectures: (1) cascaded pixel-domain conversion; (2) fast cascaded pixel-domain conversion; (3) DCT (discrete cosine transform) domain conversion. Cascaded pixel-domain conversion fully decodes the stream and then re-encodes it, so it is computationally heavy and slow. DCT-domain conversion operates directly in the DCT domain without any DCT/IDCT step, so its computational cost is small, but its flexibility is limited: it is hard to apply when motion vectors must change, and it is hard to extend. Fast cascaded pixel-domain conversion is a simplified version of cascaded pixel-domain conversion. Because it needs no motion estimation, it is markedly faster than cascaded pixel-domain conversion; because it still performs DCT/IDCT, it is slower than DCT-domain conversion.

Existing video stream conversion methods do not make good use of the characteristics of the human visual system (HVS). The resulting low-bit-rate streams therefore match HVS characteristics poorly, giving poor subjective quality and low conversion efficiency.

Summary of the invention

The purpose of the invention is to provide a fast MPEG video bit-rate conversion method that is consistent with HVS characteristics, so that video streams with better subjective quality can be delivered in a heterogeneous network environment.

To achieve this purpose, an MPEG video bit-rate conversion method that incorporates a visual model comprises the steps of:

partially decoding the input stream;

truncating DCT coefficients by removing the coefficients above the cutoff frequency;

rate control, re-determining the quantization factor of each macroblock;

re-encoding.

By exploiting the Fovea visual model during conversion, the invention improves conversion efficiency, produces a low-bit-rate stream with comparatively better subjective quality, and further reduces the amount of computation.

Description of drawings

Fig. 1 is a structural diagram of the invention;

Fig. 2 is the multi-resolution frequency-band representation of an 8×8 block of DCT coefficients.

Detailed description

To make the invention easier to understand, the Fovea visual model is explained first. Research on the HVS shows that the human eye samples visual information non-uniformly. When viewing an image, the eye normally has a fixation point, called the Fovea point, at which perceived sharpness is highest. Perceived sharpness falls off rapidly with distance from that point. Based on this property, a Fovea visual model applicable to video coding has been formulated: given the Fovea point, the cutoff frequency (the highest frequency the eye can perceive) f_c(x, y) at any point (x, y) in the image is determined by the following formulas:

$f_c(x, y) = \min\left\{\frac{i}{8} : d \geq B[i, V],\ 1 \leq i \leq 8,\ i \in Z^{+}\right\}$

$d = (x - x_f)^2 + (y - y_f)^2$

$B[i, V] = \min\left\{r^2 : [f_c(r, V) \times 8] = i,\ r \in Z^{+}\right\}$

$f_c(r, V) = \frac{1}{1 + K \arctan\left(\frac{r - R}{V}\right)}$

where (x_f, y_f) are the coordinates of the Fovea point in the image, V is the distance from the viewpoint to the image, the model parameter K = 13.75, and R is the radius of a circular region centered on the Fovea point that is coded at the highest perceived sharpness (that is, f_c = 1.0). Information in the image at frequencies above the cutoff frequency f_c(x, y) cannot be perceived by the human eye.
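The model above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the search limit `r_max`, the reading of the brackets [·] as the floor function, the clamping of f_c to 1.0 inside the radius-R region, and the fallback of 1.0 for the Fovea point itself are all assumptions made for the example.

```python
import math

K = 13.75  # model parameter given in the text


def fc_radial(r, V, R):
    """Cutoff frequency at distance r from the Fovea point."""
    if r <= R:
        return 1.0  # assumed: full sharpness inside the radius-R region
    return 1.0 / (1.0 + K * math.atan((r - R) / V))


def band_thresholds(V, R, r_max):
    """B[i]: smallest r^2 (r a positive integer) with floor(8 * f_c) == i."""
    B = {}
    for r in range(1, r_max + 1):
        i = math.floor(fc_radial(r, V, R) * 8)  # assumed: [.] means floor
        if 1 <= i <= 8 and i not in B:
            B[i] = r * r
    return B


def cutoff(x, y, xf, yf, B):
    """Quantized cutoff f_c(x, y) = min{ i/8 : d >= B[i] }."""
    d = (x - xf) ** 2 + (y - yf) ** 2
    reached = [i for i, b in B.items() if d >= b]
    # assumed fallback: d below every threshold (the Fovea point) gets 1.0
    return min(reached) / 8 if reached else 1.0
```

Because the table `B` depends only on V and R, it can be built once per viewing setup and reused for every block of every frame.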

A frame is divided into 8 regions. The cutoff frequency is constant within a region and differs between regions; it takes the values $\frac{i}{8}\ (1 \leq i \leq 8,\ i \in Z^{+})$.

Fig. 1 shows the structure of the invention. The abbreviations in the figure are: VLD, variable-length decoding; VLC, variable-length coding; DCT, discrete cosine transform; IDCT, inverse discrete cosine transform; Q, quantization; IQ, inverse quantization; MV, motion vector; MC, motion compensation; FM, frame memory. Because the fast cascaded pixel-domain conversion architecture has a modest computational cost, a flexible structure, and is easy to extend, the invention is built on that architecture and modifies it according to the Fovea visual model. The invention consists mainly of the following parts:

●Partial decoding

The input MPEG video stream with bit rate R₁ is variable-length decoded (VLD) and then inverse quantized (IQ1) using the quantization factor information carried in the stream, which yields the DCT coefficients of each 8×8 block.

●DCT coefficient truncation

According to the Fovea visual model, coefficients above the cutoff frequency within an 8×8 DCT block cannot be perceived by the viewer. Removing them does not affect subjective visual quality and improves conversion efficiency. The DCT coefficient truncation module is added for this purpose.

An 8×8 block can be treated approximately as having a single cutoff frequency: the center point of the block is taken as its representative, and the block's cutoff frequency f_c is computed from the coordinates of that point. An 8×8 block of DCT coefficients can be divided into 8 frequency bands that form a multi-resolution representation, as shown in Fig. 2. The frequency f(m) of any band m is $\frac{m}{8}\ (1 \leq m \leq 8,\ m \in Z^{+})$. The DCT coefficient truncation method based on the Fovea visual model can then be stated as follows: given the Fovea point, for an 8×8 DCT block with cutoff frequency f_c and a DCT coefficient F(u, v) belonging to band m:

$F(u, v) = \begin{cases} F(u, v) & f(m) \leq f_c \\ 0 & f(m) > f_c \end{cases}$
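The truncation rule can be illustrated with a short Python sketch. The band layout of Fig. 2 is not reproduced in the text, so the mapping `band_of` below (L-shaped bands, m = max(u, v) + 1) is an assumption made for illustration, as are the function names. The block's cutoff frequency is passed as the integer t, with f_c = t/8.

```python
def band_of(u, v):
    """Band index m in 1..8 of coefficient (u, v); assumed L-shaped bands."""
    return max(u, v) + 1


def truncate_block(F, t):
    """Zero every coefficient whose band frequency m/8 exceeds the cutoff t/8."""
    return [[F[u][v] if band_of(u, v) <= t else 0 for v in range(8)]
            for u in range(8)]
```

Under this band layout a cutoff of t/8 keeps the top-left t×t corner of the block, so a block with t = 3 retains 9 of its 64 coefficients.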

●Rate control

To reduce the bit rate of the MPEG video stream from R₁ to R₂, the rate control module re-determines the quantization factor of each macroblock, and the DCT coefficients are requantized with that factor. The invention modifies the original MPEG TM5 rate control method according to the Fovea visual model, giving a new rate control method based on the Fovea visual model. Its main steps are as follows:

(1) Frame-level target bit allocation

The method is the same as in TM5 and is not described in detail.

(2) Macroblock-level target bit allocation

Suppose a frame is allotted R coding bits and contains M macroblocks, each with N 8×8 blocks. The original TM5 method distributes the target bits evenly over the macroblocks, so any macroblock k is allocated $r^{(k)} = \frac{R}{M}$ target bits. In the modified method, the target bits of a macroblock are allocated in proportion to its cutoff frequencies (the higher the cutoff frequencies within a macroblock, the more target bits it should receive), that is:

$r^{(k)} = \frac{\sum_{j=1}^{N} \left(f_c^{(k)}(j)\right)^2}{\sum_{i=1}^{M \times N} \left(f_c(i)\right)^2} R$

where $\sum_{j=1}^{N} \left(f_c^{(k)}(j)\right)^2$ is the sum of the squared cutoff frequencies of the N 8×8 blocks in macroblock k, and $\sum_{i=1}^{M \times N} \left(f_c(i)\right)^2$ is the sum of the squared cutoff frequencies of all 8×8 blocks in the image.
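The allocation rule is a weighted split of the frame budget, which the following Python sketch makes concrete. The function name and the input layout (one list of per-block cutoff frequencies per macroblock) are assumptions chosen for the example.

```python
def allocate_macroblock_bits(R, fc_per_macroblock):
    """Split the frame budget R in proportion to each macroblock's
    sum of squared block cutoff frequencies."""
    weights = [sum(f * f for f in blocks) for blocks in fc_per_macroblock]
    total = sum(weights)
    return [R * w / total for w in weights]
```

For instance, with a budget of 1000 bits and two macroblocks of four blocks each, one at f_c = 1.0 and one at f_c = 0.5, the weights are 4 and 1, so the macroblocks receive 800 and 200 bits. The equal split R/M of the unmodified method would give each 500.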

(3) Rate control

The reference quantization factor Q_i of each macroblock is determined from the fullness of the virtual buffer (VBV). The method is the same as in TM5 and is not described in detail.

(4) Adaptive quantization

In the TM5 method, the final quantization factor of a macroblock is determined adaptively from its spatial activity. The spatial activity of a macroblock is the minimum of the spatial activities of all 8×8 blocks in it, and the spatial activity of an 8×8 block is given by the information variation V within the block (the variance of its pixel values):

$V = \frac{1}{64} \sum_{i=1}^{64} (p_i - p_{mean})^2,$ where $p_{mean} = \frac{1}{64} \sum_{i=1}^{64} p_i$

Here p_i is the luminance value of the i-th pixel in the block. This information is not available in the compressed domain, so the invention proposes the following calculation of the spatial activity V_DCT of a DCT block:

$V_{DCT} = \frac{1}{N} \sum_{i=1}^{N} |F_i|^2$

where N is the number of AC coefficients in the DCT block that lie below the block's cutoff frequency, and F_i is the value of the i-th of these N coefficients.

From the spatial activities of all 8×8 DCT blocks in a macroblock, the (normalized) spatial activity NV_i of the macroblock is determined. The final quantization factor mq_i of the macroblock is then:

mq_i = Q_i × NV_i
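A minimal Python sketch of this step follows. Several details are assumptions rather than statements from the text: the band layout m = max(u, v) + 1 used to decide which AC coefficients fall within the cutoff t/8, the value 0.0 returned when no AC coefficient is retained, the TM5-style normalization (2·act + avg_act)/(act + 2·avg_act) with act = 1 + min(block activities), and clamping the result to the range 1..31. The text only says the activity is normalized, so the normalization shown here is borrowed from TM5.

```python
def dct_block_activity(F, t):
    """V_DCT: mean squared value of the AC coefficients kept under cutoff t/8."""
    kept = [F[u][v] for u in range(8) for v in range(8)
            if (u, v) != (0, 0) and max(u, v) + 1 <= t]  # assumed band layout
    if not kept:
        return 0.0  # assumed: only the DC coefficient survives
    return sum(c * c for c in kept) / len(kept)


def normalized_activity(block_activities, avg_act):
    """Assumed TM5-style normalization; result lies between 0.5 and 2."""
    act = 1.0 + min(block_activities)
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def final_quantizer(q_ref, nv):
    """mq = Q * NV, rounded and clamped to 1..31 (assumed range)."""
    return max(1, min(31, round(q_ref * nv)))
```

A macroblock whose activity equals the average gets NV = 1 and keeps its reference quantizer unchanged; busier macroblocks are quantized more coarsely and flatter ones more finely.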

●Re-encoding

The coefficients of all DCT blocks in each macroblock are requantized (Q2) with the macroblock's final quantization factor mq_i and then variable-length coded (VLC), producing an MPEG video stream with bit rate R₂.

●Error drift compensation

The steps above are sufficient to convert an MPEG video stream. However, requantizing the DCT coefficients (Q2) causes a mismatch between the reference pictures at the encoder and the decoder. This leads to error drift and degrades the picture quality of the converted stream, so an error drift compensation module is needed.

The difference between the DCT coefficients before and after requantization is transformed by an IDCT into pixel-domain values, which are stored in the frame memory. Then, using the motion vector (MV) information obtained during partial decoding, motion compensation (MC) is performed in the pixel domain. The resulting prediction is converted back to DCT coefficients by a DCT and added to the residual DCT coefficients of the predicted frame, which compensates the error drift.
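The quantity fed into the drift compensation loop is the per-coefficient requantization error. The sketch below illustrates only that first step, under a strong simplification: a single uniform step size with rounding to the nearest level, ignoring the quantization matrices and intra DC handling of a real MPEG quantizer. The function names are hypothetical.

```python
def requantize(coeff, step):
    """Reconstruction value after quantizing with a uniform step (simplified)."""
    return step * round(coeff / step)


def requantization_error(F, step):
    """Difference between the coefficients before and after requantization;
    in the scheme described above this block would go through the IDCT
    and be stored in the frame memory."""
    return [[F[u][v] - requantize(F[u][v], step) for v in range(8)]
            for u in range(8)]
```

With this quantizer the error magnitude of any coefficient is at most half the step size, which bounds the correction that the compensation loop has to carry from frame to frame.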

Because IDCT and DCT transforms are required, the amount of computation is larger than in DCT-domain conversion. According to the Fovea visual model, however, some of the DCT coefficients need not be computed. On this basis the invention proposes a fast DCT/IDCT calculation method that significantly reduces the DCT/IDCT workload. The original DCT and IDCT formulas are, respectively:

$F(u, v) = \frac{1}{4} C(u) C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i, j) \times \cos\frac{\pi u (2i + 1)}{16} \cos\frac{\pi v (2j + 1)}{16}$

$f(i, j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u, v) \times \cos\frac{\pi u (2i + 1)}{16} \cos\frac{\pi v (2j + 1)}{16}$

Let the cutoff frequency of an 8×8 block be $\frac{t}{8}\ (1 \leq t \leq 8,\ t \in Z^{+})$. All high-frequency DCT coefficients above the cutoff frequency in that block are imperceptible to the human eye and need not be processed, that is, they are set to 0. When the DCT/IDCT is applied to the block, only the DCT coefficients below the cutoff frequency are computed, so the calculation becomes:

$f(i, j) = \frac{1}{4} \sum_{u=0}^{t} \sum_{v=0}^{t} C(u) C(v) F(u, v) \times \cos\frac{\pi u (2i + 1)}{16} \cos\frac{\pi v (2j + 1)}{16}$
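The saving can be seen in a direct (non-optimized) Python rendering of the reduced IDCT. The text does not define C(·); the sketch assumes the usual convention C(0) = 1/√2 and C(k) = 1 otherwise. It also caps the upper summation limit at 7, since the formula's limit t would run past the last index when t = 8. The function names are illustrative.

```python
import math


def C(k):
    """Assumed standard DCT normalization factor."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_8x8(F, t=8):
    """Direct 8x8 IDCT that sums only over u, v in 0..min(t, 7)."""
    top = min(t, 7)
    out = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            acc = 0.0
            for u in range(top + 1):
                for v in range(top + 1):
                    acc += (C(u) * C(v) * F[u][v]
                            * math.cos(math.pi * u * (2 * i + 1) / 16)
                            * math.cos(math.pi * v * (2 * j + 1) / 16))
            out[i][j] = acc / 4.0
    return out
```

Each output pixel needs (min(t, 7) + 1)² multiply-accumulate terms instead of 64, so a block with t = 3 costs a quarter of the full transform. When the omitted coefficients are already zero, the reduced and full transforms give the same pixels.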

Finally, it should be noted that in the invention the Fovea point can be selected interactively by the user with the mouse.

Claims

1. An MPEG video stream bit-rate conversion method incorporating a visual model, comprising the steps of:

partially decoding the input stream;

truncating DCT coefficients by removing the coefficients above the cutoff frequency;

rate control, re-determining the quantization factor of each macroblock;

re-encoding.

2. The method of claim 1, characterized in that the partial decoding comprises the steps of:

variable-length decoding the input video stream;

performing inverse quantization according to the quantization factors in the stream.

3. The method of claim 1, characterized in that the rate control comprises the steps of:

frame-level target bit allocation;

macroblock-level target bit allocation, performed according to the magnitude of the cutoff frequency;

determining the reference quantization factor Q_i of each macroblock according to the fullness of the virtual buffer;

adaptive quantization.

4. The method of claim 1, characterized in that the re-encoding comprises the steps of:

requantizing the coefficients of all DCT blocks in each macroblock according to the final quantization factor of that macroblock;

then performing variable-length coding.

5. The method of claim 1, characterized in that it further comprises an error drift compensation step:

applying an IDCT to the difference between the DCT coefficients before requantization and the DCT coefficients after requantization;

performing motion compensation in the pixel domain according to the motion vector information obtained by the partial decoding;

converting the resulting prediction into DCT coefficients with a DCT, and feeding them back to be added to the residual DCT coefficients of the original predicted frame.

6. The method of claim 5, characterized in that the DCT/IDCT is computed by the following formula:

$f(i, j) = \frac{1}{4} \sum_{u=0}^{t} \sum_{v=0}^{t} C(u) C(v) F(u, v) \times \cos\frac{\pi u (2i + 1)}{16} \cos\frac{\pi v (2j + 1)}{16}$