
CN1843040A - Scalable video coding and decoding methods, and scalable video encoder and decoder - Google Patents


Info

Publication number
CN1843040A
CN1843040A · CN 200480024363 · CN200480024363A
Authority
CN
China
Prior art keywords
scalable
frame
video
weight
transform coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200480024363
Other languages
Chinese (zh)
Inventor
李培根
河昊振
韩宇镇
李宰荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN1843040A


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Scalable video coding and decoding methods, a scalable video encoder, and a scalable video decoder are provided. The scalable video coding method includes receiving a GOP, performing temporal filtering and spatial transformation on it, quantizing the result, and generating a bitstream. The scalable video encoder that performs the scalable video coding method includes a weight determination block, which determines a weight for scaling. The scalable video decoding method includes dequantizing the coded image information obtained from a received bitstream, and performing descaling, inverse spatial transformation, and inverse temporal filtering on the scaled transform coefficients, thereby recovering the video frames. The scalable video decoder that performs the scalable video decoding method includes an inverse weighting block. The standard deviation of the peak signal-to-noise ratios (PSNRs) of the frames in a group of pictures (GOP) is reduced, so that video coding performance can be increased.

Description

Scalable video coding and decoding methods, and scalable video encoder and decoder

Technical Field

The present invention relates to video compression and, more particularly, to scalable video coding and decoding methods that use weights, and to an encoder and a decoder that use the respective methods.

Background Art

With the development of information and communication technology, including the Internet, video communication has increased alongside text and voice communication. Conventional text-based communication cannot satisfy users' various demands, so demand has grown for multimedia services that can deliver various types of information such as text, pictures, and music. Because the amount of multimedia data is usually large, multimedia data requires high-capacity storage media and wide bandwidth for transmission. For example, a 24-bit true-color image with a resolution of 640*480 requires 640*480*24 bits per frame, that is, about 7.37 Mbits of data. Transmitting this image at 30 frames per second requires a bandwidth of about 221 Mbits/s, and storing a 90-minute movie made of such images requires about 1,200 Gbits of storage space. A compression coding method is therefore essential for transmitting multimedia data, including text, video, and audio.
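As a sanity check, the arithmetic behind these figures can be reproduced in a few lines (a sketch; variable names are illustrative):

```python
# Back-of-the-envelope check of the bandwidth and storage figures quoted above.
width, height, bits_per_pixel = 640, 480, 24
fps = 30
movie_seconds = 90 * 60

bits_per_frame = width * height * bits_per_pixel   # one true-color frame
bits_per_second = bits_per_frame * fps             # uncompressed bandwidth
movie_bits = bits_per_second * movie_seconds       # a 90-minute movie

print(bits_per_frame / 1e6)    # ~7.37 Mbit per frame
print(bits_per_second / 1e6)   # ~221 Mbit/s
print(movie_bits / 1e9)        # ~1194 Gbit, i.e. about 1,200 Gbit
```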

The basic principle of data compression is the removal of redundancy. Data can be compressed by removing spatial redundancy, in which the same color or object is repeated within an image; temporal redundancy, in which adjacent frames of a moving picture change very little, or the same sound is repeated in audio; or perceptual visual redundancy, which exploits the limits of human vision and its insensitivity to high-frequency signals. Data compression can be classified as lossy or lossless according to whether source data is lost, as intra-frame or inter-frame according to whether each frame is compressed independently, and as symmetric or asymmetric according to whether compression and recovery take the same amount of time. Data compression is further classified as real-time compression when the compression/recovery delay does not exceed 50 ms, and as scalable compression when frames can have different resolutions. Lossless compression is usually used for text or medical data, and lossy compression for multimedia data. Meanwhile, intra-frame compression is typically used to remove spatial redundancy, and inter-frame compression is typically used to remove temporal redundancy.

Different types of transmission media for multimedia have different capabilities. Transmission media currently in use span a wide range of transmission rates: for example, an ultra-high-speed communication network can carry tens of megabits of data per second, while a mobile communication network has a transmission rate of 384 kbits per second. In conventional video coding methods such as Moving Picture Experts Group (MPEG)-1, MPEG-2, H.263, and H.264, temporal redundancy is removed by motion compensation based on motion estimation, and spatial redundancy is removed by transform coding. These methods achieve satisfactory compression ratios, but they lack the flexibility of a truly scalable bitstream because they use a recursive approach in their main algorithms. Accordingly, to support transmission media of various speeds, or to transmit multimedia at a data rate suited to the transmission environment, scalable data coding methods such as wavelet video coding and subband video coding may be better suited to the multimedia environment. Scalability denotes the ability to partially decode a single compressed bitstream. Scalability includes spatial scalability, which concerns video resolution; signal-to-noise ratio (SNR) scalability, which concerns video quality level; and temporal scalability, which concerns frame rate. A scalable video encoder encodes a single stream, and portions of the encoded stream can be transmitted at different quality levels, resolutions, or frame rates to accommodate constraints such as bit rate, error, and resources. A scalable video decoder can decode the transmitted video stream while varying the quality level, resolution, or frame rate.

Interframe wavelet video coding (IWVC) can provide a very flexible, scalable bitstream. However, conventional IWVC performs worse than coding methods such as H.264. Because of this lower performance, IWVC is used only in very limited applications despite its good scalability. Improving the performance of scalable data coding methods is therefore an open problem.

FIG. 1 is a flowchart of IWVC.

In step S1, images are received in units of a group of pictures (GOP) comprising a plurality of frames. For temporal scalability, the GOP preferably includes 2^n (n = 1, 2, 3, ...) frames. In the embodiments of the present invention, a GOP includes 16 frames, and the various operations are performed in GOP units.

Next, in step S2, motion estimation is performed using hierarchical variable size block matching (HVSBM). When the original image size is N*N, the wavelet transform is used to obtain images at level 0 (N*N), level 1 (N/2*N/2), and level 2 (N/4*N/4). For the level-2 image, the motion estimation block size is varied from 16*16 down to 8*8 and 4*4; motion estimation is performed for each block, and the magnitude of absolute distortion (MAD) is obtained for each block. Similarly, for the level-1 image, the block size is varied from 32*32 down to 16*16, 8*8, and 4*4, and for the level-0 image from 64*64 down to 32*32, 16*16, 8*8, and 4*4, with motion estimation performed and a MAD obtained for each block.
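The block-matching cost used at each level can be sketched as below. This is an illustrative mean-absolute-difference search over toy integer "pixels", not the patent's HVSBM itself; the `mad` and `best_match` names and the search-window logic are assumptions for illustration:

```python
# Minimal sketch of block matching with a mean-absolute-difference criterion,
# standing in for the MAD cost; real HVSBM also varies the block size
# (e.g. 64*64 down to 4*4) at each wavelet level.
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b)) / count

def best_match(block, ref, top, left, search):
    """Scan a +/- `search` pixel window in `ref` for the lowest-MAD position."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue  # candidate block would fall outside the frame
            cand = [row[x:x + w] for row in ref[y:y + h]]
            cost = mad(block, cand)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)  # cost plus motion vector (dy, dx)
    return best
```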

Next, in step S3, the motion estimation tree is pruned so as to minimize the MAD.

Then, in step S4, motion-compensated temporal filtering (MCTF) is performed using the pruned optimal motion estimation tree, which will be explained with reference to FIG. 2. In FIG. 2, the number written in each frame indicates that frame's position in the temporal sequence, and Wn (n = 1, 2, ..., 15) denotes the subbands obtained after MCTF. In other words, fr0 through fr15 denote the 16 frames included in a single GOP before MCTF is applied to them.

First, at temporal level 0, MCTF is performed forward on the 16 image frames, yielding 8 low-frequency frames and 8 high-frequency subbands W8, W9, W10, W11, W12, W13, W14, and W15. At temporal level 1, MCTF is performed forward on the 8 low-frequency frames, yielding 4 low-frequency frames and 4 high-frequency subbands W4, W5, W6, and W7. At temporal level 2, MCTF is performed forward on the 4 low-frequency frames obtained at temporal level 1, yielding 2 low-frequency frames and 2 high-frequency subbands W2 and W3. Finally, at temporal level 3, MCTF is performed forward on the 2 low-frequency frames obtained at temporal level 2, yielding a single low-frequency subband W0 and a single high-frequency subband W1. Thus, as a result of MCTF, a total of 16 subbands W0 through W15 is obtained: 15 high-frequency subbands and the single low-frequency subband of the final level. After the 16 subbands are obtained, spatial transformation and quantization are performed on them in step S5 of FIG. 1. Thereafter, a bitstream is generated in step S6, which includes the data resulting from spatial transformation and quantization together with the motion vector data obtained from motion estimation.
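Ignoring motion compensation, the dyadic structure of this decomposition can be sketched with scalar "frames" (the `mctf_levels` helper is a hypothetical illustration; real MCTF filters along motion trajectories between actual images):

```python
# Haar-style temporal filtering without motion compensation: each pass halves
# the number of low-frequency frames and emits as many high-frequency subbands.
def mctf_levels(frames):
    levels = []                  # (num_lows, num_highs) per temporal level
    lows = list(frames)
    while len(lows) > 1:
        pairs = list(zip(lows[0::2], lows[1::2]))
        highs = [(b - a) / 2 for a, b in pairs]   # high-frequency subbands
        lows = [(a + b) / 2 for a, b in pairs]    # low-frequency subbands
        levels.append((len(lows), len(highs)))
    return levels, lows[0]       # final low-frequency subband W0

levels, w0 = mctf_levels(range(16))
print(levels)  # [(8, 8), (4, 4), (2, 2), (1, 1)] -> 15 highs + 1 low in total
```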

Summary of the Invention

Technical Problem

Although conventional IWVC has good scalability, it still has drawbacks. In general, the peak signal-to-noise ratio (PSNR) is used to measure video coding performance quantitatively. When the difference between the original image and the coded image is small, the PSNR value is large; when the difference is large, the PSNR value is small; and when the two images are exactly identical, the PSNR value is infinite. FIG. 3 shows the distribution of average PSNR values versus frame index for conventional IWVC. As shown in FIG. 3, the PSNR value varies greatly with the frame index within a GOP. At positions such as fr0, fr4, fr8, fr12, and fr16 (i.e., fr0 of the next GOP), the PSNR values are smaller than at their neighboring positions. When the PSNR value varies greatly with the frame index, the video picture quality varies greatly over time, and when picture quality changes sharply, viewers perceive the picture quality as degraded. As noted above, such differences in picture quality hinder commercial services such as streaming services. Therefore, reducing the variation of PSNR values is critical for wavelet-based scalable video coding. Moreover, reducing the variation of PSNR values between frames within a GOP is important not only in scalable video coding that uses wavelet-based spatial transforms, but also in scalable video coding that uses other types of spatial transform such as the discrete cosine transform (DCT).
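PSNR is conventionally computed as 10·log10(peak²/MSE); a short sketch on 8-bit samples (the `psnr` helper is illustrative, not the patent's measurement code):

```python
import math

def psnr(original, coded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = sum((o - c) ** 2 for o, c in zip(original, coded)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

# Identical images give infinite PSNR; a small difference gives a high PSNR.
print(psnr([100, 120, 90], [100, 120, 90]))   # inf
print(psnr([100, 120, 90], [101, 119, 90]))   # ~49.9 dB
```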

Technical Solution

The present invention provides scalable video coding and decoding methods, and a scalable video encoder and decoder, that make it possible to reduce the variation in peak signal-to-noise ratio (PSNR).

According to an aspect of the present invention, there is provided a scalable video coding method comprising: (a) receiving a plurality of video frames and performing motion-compensated temporal filtering (MCTF) on the plurality of video frames to remove temporal redundancy from them; and (b) obtaining scaled transform coefficients from the video frames from which temporal redundancy has been removed, quantizing the scaled transform coefficients, and generating a bitstream.

The video frames received in step (a) may already have been wavelet-transformed to remove spatial redundancy, and the scaled transform coefficients may be obtained by applying a predetermined weight to some of the subbands of the video frames from which temporal redundancy has been removed.

The scaled transform coefficients may also be obtained in step (b) by applying a predetermined weight to some of the subbands of the video frames from which temporal redundancy has been removed, and then performing a spatial transformation on the weighted subbands.

Preferably, the scaled transform coefficients are obtained in step (b) by performing a spatial transformation on the video frames from which temporal redundancy has been removed, and applying a predetermined weight to those transform coefficients, among the coefficients produced by the spatial transformation, that are obtained from some of the subbands. In this case, the predetermined weight is determined for each group of pictures (GOP). The predetermined weight has a single value for a single GOP, and is preferably determined according to the magnitude of absolute distortion of the GOP. Here, it is preferable that the transform coefficients scaled using the predetermined weight are obtained from subbands which, among the subbands used to construct the low-PSNR frames, exert substantially little influence on the high peak signal-to-noise ratio (PSNR) frames compared with the low-PSNR frames.

The bitstream generated in step (b) includes information about the weight used to obtain the scaled transform coefficients.

According to another aspect of the present invention, there is provided a scalable video encoder that receives a plurality of video frames and generates a bitstream. The scalable video encoder comprises: a temporal filtering block, which performs MCTF on the video frames to remove temporal redundancy from them; a spatial transform block, which performs a spatial transformation on the video frames to remove spatial redundancy from them; a weight determination block, which determines a weight to be used for scaling those transform coefficients, among the coefficients obtained as a result of removing temporal and spatial redundancy from the video frames, that are obtained from some of the subbands; a quantization block, which quantizes the scaled transform coefficients; and a bitstream generation block, which generates a bitstream using the quantized transform coefficients.

The spatial transform block may perform a wavelet transform on the video frames to remove spatial redundancy from them; the temporal filtering block may generate transform coefficients using the subbands obtained by performing MCTF on the wavelet-transformed video frames; and the weight determination block may determine the weight using the wavelet-transformed frames and multiply the determined weight by the transform coefficients obtained from some of the subbands, thereby obtaining the scaled transform coefficients.

The temporal filtering block may obtain subbands by performing MCTF on the video frames; the weight determination block may determine the weight using the video frames and multiply the determined weight by some of the subbands to obtain scaled subbands; and the spatial transform block may perform a spatial transformation on the scaled subbands, thereby obtaining the scaled transform coefficients.

Alternatively, the temporal filtering block may obtain subbands by performing MCTF on the video frames; the spatial transform block may generate transform coefficients by performing a spatial transformation on the subbands; and the weight determination block may determine the weight using the video frames and multiply the determined weight by the transform coefficients obtained from predetermined subbands, thereby obtaining the scaled transform coefficients.

Here, the predetermined weight is preferably determined for each group of pictures (GOP) according to the magnitude of absolute distortion of the GOP. Preferably, the transform coefficients scaled using the predetermined weight are obtained from subbands which, among the subbands used to construct the low-PSNR frames, exert substantially little influence on the high peak signal-to-noise ratio (PSNR) frames compared with the low-PSNR frames.

The bitstream generated by the bitstream generation block may include information about the weight used to obtain the scaled transform coefficients.

According to another aspect of the present invention, there is provided a scalable video decoding method comprising: extracting coded image information, coding order information, and weight information from a bitstream; obtaining scaled transform coefficients by dequantizing the coded image information; and performing descaling, inverse spatial transformation, and inverse temporal filtering on the scaled transform coefficients in a decoding order that is the reverse of the coding order indicated by the coding order information, thereby recovering the video frames.

The decoding order is, for example, descaling, inverse temporal filtering, and inverse spatial transformation. Alternatively, the decoding order may be inverse spatial transformation, descaling, and inverse temporal filtering, or descaling, inverse spatial transformation, and inverse temporal filtering.

For each group of pictures (GOP), a predetermined weight, for example, is extracted from the bitstream. Here, the number of frames constituting a GOP is 2^k (k = 1, 2, 3, ...).

The transform coefficients to be descaled using the predetermined weight are obtained, for example, from the subbands W4, W6, W8, W10, W12, and W14 generated during encoding.

According to another aspect of the present invention, there is provided a scalable video decoder comprising: a bitstream analysis block, which analyzes a received bitstream to extract coded image information, coding order information, and weight information from it; an inverse quantization block, which dequantizes the coded image information to obtain scaled transform coefficients; an inverse weighting block, which performs descaling; an inverse spatial transform block, which performs inverse spatial transformation; and an inverse temporal filtering block, which performs inverse temporal filtering. The scalable video decoder performs descaling, inverse spatial transformation, and inverse temporal filtering on the scaled transform coefficients in the reverse of the coding order indicated by the coding order information, thereby recovering the video frames.

In a non-limiting example, the decoder performs decoding in the order of descaling, inverse temporal filtering, and inverse spatial transformation. Alternatively, the decoder may perform decoding in the order of inverse spatial transformation, descaling, and inverse temporal filtering, or in the order of descaling, inverse spatial transformation, and inverse temporal filtering.

In another non-limiting example, the bitstream analysis block extracts a predetermined weight from the bitstream for each group of pictures (GOP). Here, the number of frames constituting a GOP is 2^k (k = 1, 2, 3, ...).

According to an embodiment of the present invention, the inverse weighting block performs inverse scaling on the transform coefficients scaled from the subbands W4, W6, W8, W10, W12, and W14 generated during encoding.

Brief Description of the Drawings

The above and other features and advantages of the present invention will become more apparent from the following detailed description of exemplary embodiments thereof with reference to the attached drawings, in which:

FIG. 1 is a flowchart of conventional interframe wavelet video coding (IWVC);

FIG. 2 illustrates conventional motion-compensated temporal filtering (MCTF);

FIG. 3 is a graph showing the peak signal-to-noise ratios (PSNRs) obtained when two groups of pictures (GOPs) of the Foreman sequence are coded with conventional IWVC at 512 kbps;

FIG. 4 is a flowchart of a scalable video coding method according to an embodiment of the present invention;

FIG. 5 illustrates a procedure for determining the subbands to be scaled according to an embodiment of the present invention;

FIG. 6 is a sketch of the optimal scaling factor as a function of the magnitude of absolute distortion (MAD);

FIG. 7 is a graph comparing the average PSNR values obtained with the present invention against those obtained with the conventional technique;

FIG. 8 illustrates MCTF using different temporal directions according to an embodiment of the present invention;

FIG. 9 is a functional block diagram of a scalable video encoder according to an embodiment of the present invention;

FIG. 10 is a functional block diagram of a scalable video encoder according to another embodiment of the present invention; and

FIG. 11 is a functional block diagram of a scalable video decoder according to an embodiment of the present invention.

Detailed Description of Exemplary Embodiments

Exemplary, non-limiting embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 4 is a flowchart of a scalable video coding method according to an embodiment of the present invention.

First, in step S10, images are received in units of a group of pictures (GOP) comprising a plurality of frames. In an embodiment of the present invention, a single GOP includes 16 frames, and all operations are performed in GOP units.

After the images are received, a weight, i.e., a scaling factor, is computed in step S20. The computation of the scaling factor is described below.

Thereafter, in step S30, motion estimation is performed using hierarchical variable size block matching (HVSBM). After the motion estimation, in step S40, the motion estimation tree is pruned so as to minimize the magnitude of absolute distortion (MAD).

Next, in step S50, motion-compensated temporal filtering (MCTF) is performed using the pruned optimal motion estimation tree. As a result of MCTF, a total of 16 subbands is obtained: 15 high-frequency subbands and a single low-frequency subband. In step S60, the 16 subbands undergo a spatial transformation. The discrete cosine transform (DCT) can be used as the spatial transformation, but the wavelet transform is preferred. Thereafter, in step S70, frame scaling is performed using the scaling factor obtained in step S20; frame scaling is described below. After frame scaling, embedded quantization is performed in step S80, and a bitstream is generated in step S90. The bitstream includes coded image information, motion vector information, and scaling factor information. During encoding, the temporal transformation may instead be performed after the spatial transformation, and scaling may be performed after the temporal transformation. Information about the coding order can be included in the bitstream so that a decoder can recognize the different coding orders; however, the bitstream need not include coding order information. When coding order information is not included in the bitstream, coding can be assumed to have been performed in a predetermined order.

In an embodiment of the present invention, a high-frequency subband denotes the result of comparing two image frames 'a' and 'b', i.e., (a-b)/2, and a low-frequency subband denotes the average of the two image frames, i.e., (a+b)/2. However, the present invention is not limited thereto. For example, a high-frequency subband may denote the difference between the two frames (a-b), and a low-frequency subband may denote one of the two compared frames (a).
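Both filter conventions above are invertible for a single frame pair; a scalar sketch (the sample values are arbitrary):

```python
# Convention of this embodiment: low = (a + b) / 2, high = (a - b) / 2,
# inverted by a = low + high and b = low - high.
a, b = 40.0, 30.0
low, high = (a + b) / 2, (a - b) / 2
assert (low + high, low - high) == (a, b)   # perfect reconstruction

# Alternative convention also mentioned: high = a - b, low = a,
# inverted by a = low and b = low - high.
low2, high2 = a, a - b
assert (low2, low2 - high2) == (a, b)

print(low, high)  # 35.0 5.0
```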

FIG. 5 illustrates a procedure for determining the subbands to be scaled according to an embodiment of the present invention. The subbands denote the plurality of high-frequency frames and the single low-frequency frame obtained as a result of temporal filtering; a high-frequency frame is referred to as a high-frequency subband, and a low-frequency frame as a low-frequency subband. In scalable video coding, MCTF is used for temporal filtering. With MCTF, temporal redundancy can be removed and temporal scalability can be obtained.

The relationship between the video frames fr0 through fr15 and the subbands W0 through W15 produced by MCTF, and the method of recovering the temporal frames, will be explained with reference to FIG. 5. The relationship between the video frames fr0-fr15 and the subbands W0-W15 can be defined as follows:

fr15=W0+W1+W3+W7+W15

fr14=W0+W1+W3+W7-W15

fr13=W0+W1+W3-W7+W14

fr12=W0+W1+W3-W7-W14

fr11=W0+W1-W3+W6+W13

fr10=W0+W1-W3+W6-W13

fr9=W0+W1-W3-W6+W12

fr8=W0+W1-W3-W6-W12

fr7=W0-W1+W2+W5+W11

fr6=W0-W1+W2+W5-W11

fr5=W0-W1+W2-W5+W10

fr4=W0-W1+W2-W5-W10

fr3=W0-W1-W2+W4+W9

fr2=W0-W1-W2+W4-W9

fr1=W0-W1-W2-W4+W8

fr0=W0-W1-W2-W4-W8.
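Taking the high-frequency subband of each pair as half the difference between the later and the earlier frame, consistent with the signs in the list above, these identities can be checked with a motion-free scalar sketch (the `mctf` helper is illustrative; scalars stand in for whole frames):

```python
def mctf(frames):
    """Return subbands W[0..n-1] from a motion-free Haar MCTF of n = 2^k frames."""
    n = len(frames)
    w = [0.0] * n
    lows = list(frames)
    while len(lows) > 1:
        half = len(lows) // 2
        for k in range(half):
            a, b = lows[2 * k], lows[2 * k + 1]
            w[half + k] = (b - a) / 2        # high-frequency subband
            lows[k] = (a + b) / 2            # low-frequency subband
        lows = lows[:half]
    w[0] = lows[0]                           # final low-frequency subband
    return w

fr = [float(3 * i + 1) for i in range(16)]   # arbitrary test "frames"
w = mctf(fr)
assert fr[15] == w[0] + w[1] + w[3] + w[7] + w[15]
assert fr[8]  == w[0] + w[1] - w[3] - w[6] - w[12]
assert fr[0]  == w[0] - w[1] - w[2] - w[4] - w[8]
```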

As shown in FIG. 3, the frames fr0, fr4, fr8, and fr12 have particularly low peak signal-to-noise ratios (PSNRs) compared with their neighboring frames; these are referred to as low-PSNR frames. The reason low-PSNR frames occur periodically is related to the MCTF order. In other words, motion estimation errors arise during MCTF and accumulate as the temporal level increases, and the degree of accumulation is determined by the MCTF structure. For frames replaced by high-frequency subbands at low temporal levels, the degree of accumulation is high. Conversely, frames replaced by high-frequency subbands at high temporal levels and the frame replaced by the low-frequency subband at the highest temporal level have high PSNR values; these are referred to as high-PSNR frames.

Thus, the filtered subbands to be multiplied by the scaling factor can be selected from among the subbands needed to reconstruct the low-PSNR frames. Multiplying by the scaling factor means that more bits are allocated. In other words, given that bits are preferentially allocated to larger transform coefficients during embedded quantization, multiplying a subband by the scaling factor means that the transform coefficients obtained from the selected subband are allocated more bits than the other transform coefficients. Allocating more bits to the low-PSNR frames in a GOP encoded with a predetermined number of bits means allocating fewer bits to the other frames in the GOP. Accordingly, as the PSNR values of the low-PSNR frames rise, the PSNR values of the high-PSNR frames fall. The subbands that are needed to reconstruct the low-PSNR frames while exerting less influence on the high-PSNR frames are selected to be multiplied by the scaling factor. In other words, the subbands least used to reconstruct the high-PSNR frames (hereinafter referred to as minimally changed subbands) should be selected. Accordingly, subbands W8, W10, W12 and W14 are selected first. However, because frames fr0 and fr8 have particularly lower PSNR values than the other frames, special compensation is required for them. To this end, in the described embodiment of the invention, subbands W4 and W6 are additionally selected as minimally changed subbands to be multiplied by the scaling factor, in order to greatly reduce the variation in PSNR values.

Likewise, as shown in FIG. 5, among the subbands W0-W15 obtained using MCTF, the minimally changed subbands W4, W6, W8, W10, W12 and W14 are multiplied by a scaling factor 'a'. To reduce the computational load of video encoding, it is preferable to compute the scaling factor per GOP rather than computing scaling factors for all frames of the video at once. In the above embodiment, the same scaling factor is used for the minimally changed subbands W4, W6, W8, W10, W12 and W14 in order to reduce computation, but the spirit of the invention is not limited to this embodiment. Video encoding and decoding techniques in which subbands obtained through MCTF are weighted so as to reduce the variation in PSNR values should be construed as falling within the spirit of the invention. Therefore, multiplying the subbands by different scaling factors is also within the scope of the invention.

Various methods can be used to determine the scaling factor by which the subbands are multiplied. In one embodiment of the invention, the scaling factor is obtained for each GOP as a function of the MAD. In this embodiment, the MAD is defined by formula (1).

MAD = 8 × Σ_{i=0}^{(n-1)/2} Σ_{x=0}^{p-1} Σ_{y=0}^{q-1} |T_{2i+1}(x, y) - T_{2i}(x, y)|        (1)

Here, 'i' indicates a frame index, 'n' indicates the last frame index in the GOP, T(x, y) indicates the picture value at position (x, y) in frame T, and the size of a single frame is p×q.
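Formula (1) can be transcribed directly as follows. This sketch takes the formula literally, summing absolute differences over non-overlapping frame pairs; the patent does not spell out any normalization, so how the result lands in the 30-140 range used by formula (2) depends on details not given here.

```python
def mad(frames):
    """Formula (1) transcribed literally: frames is a list of 2-D pixel
    arrays (p rows x q cols) forming one GOP; consecutive frames are
    compared in non-overlapping pairs (T0, T1), (T2, T3), ..."""
    total = 0
    for i in range(len(frames) // 2):
        a, b = frames[2 * i], frames[2 * i + 1]
        total += sum(abs(b[x][y] - a[x][y])
                     for x in range(len(a)) for y in range(len(a[0])))
    return 8 * total

# Two 2x2 frames differing by 1 at every pixel -> sum of |diffs| = 4
assert mad([[[0, 0], [0, 0]], [[1, 1], [1, 1]]]) == 32
```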

To implement the invention, the subbands are multiplied by the scaling factor according to the MAD. The PSNR value of each frame is then obtained, and the optimal scaling factor 'a' is obtained as shown in FIG. 6.

FIG. 6 is a graph of the optimal scaling factor as a function of MAD. In FIG. 6, the solid line plots values obtained in actual experiments, and the dashed line is obtained by fitting those values with a linear formula. The scaling factor 'a' is obtained using formula (2).

a = 1.3                 (if MAD < 30)
a = 1.4 - 0.0033 × MAD  (if 30 < MAD < 140)
a = 1                   (if MAD > 140)        (2)
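Formula (2) is a simple piecewise-linear rule and can be written down directly; the treatment of the exact boundary values MAD = 30 and MAD = 140, which the strict inequalities leave open, is an assumption in this sketch.

```python
def scaling_factor(mad_value):
    """Piecewise scaling factor 'a' of formula (2). Boundary handling at
    MAD == 30 and MAD == 140 is an assumption; the text uses strict
    inequalities that leave those two points undefined."""
    if mad_value < 30:
        return 1.3
    if mad_value <= 140:
        return 1.4 - 0.0033 * mad_value
    return 1.0

assert scaling_factor(10) == 1.3
assert abs(scaling_factor(100) - 1.07) < 1e-9
assert scaling_factor(200) == 1.0
```

Note the pieces join up approximately: at MAD = 30 the linear branch gives 1.4 - 0.099 ≈ 1.3, matching the constant branch.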

After the scaling factor 'a' is obtained, scaling is performed on the subbands. In other words, among the subbands W0-W15 obtained using MCTF, the minimally changed subbands W4, W6, W8, W10, W12 and W14 are scaled according to formula (3).

W4 = a × W4, W6 = a × W6
W8 = a × W8, W10 = a × W10
W12 = a × W12, W14 = a × W14   (with 'a' obtained from formula (2))        (3)

FIG. 7 is a graph comparing the average PSNR values obtained in one embodiment of the present invention with those obtained using conventional MCTF.

Referring to FIG. 7, the variation in PSNR values is smaller in this embodiment of the invention than with conventional MCTF. In addition, the low PSNR values of the conventional case are raised by the invention, while the high PSNR values of the conventional case are lowered.

Besides weighting some frames in a GOP during conventional MCTF, which is performed only in the forward direction, the PSNR values can also be improved by combining forward temporal filtering and inverse temporal filtering according to predetermined rules during MCTF. Examples of combined forward and inverse temporal filtering are shown in Table 1.

Table 1

  Mode flag                            Level 0     Level 1   Level 2   Level 3
  Forward (F=0)                        ++++++++    ++++      ++        +
  Inverse (F=1)                        --------    ----      --        -
  Combined forward and inverse (F=2)
    Case (a)                           +-+-+-+-    ++--      +-        +(-)
    Case (b)                           +-+-+-+-    +-+-      +-        +(-)
    Case (c)                           ++++++++    ++--      +-        -
    Case (d)                           ++++----    ++--      +-        -

Cases (c) and (d) are characterized in that the low-frequency frame at the last level (hereinafter referred to as the reference frame) is located at the center of the 1st to 16th frames (i.e., the 8th frame). The reference frame is the most essential frame in video coding: the other frames are restored from it, and restoration performance degrades as the temporal distance between a frame and the reference frame increases. Therefore, in cases (c) and (d), forward and inverse temporal filtering are combined so that the reference frame is located at the center, i.e., the 8th frame, minimizing the temporal distance between the reference frame and every other frame.

In cases (a) and (b), the average temporal distance (ATD) is minimized. To calculate the ATD, temporal distances are calculated. A temporal distance is defined as the difference between the positions of two frames. Referring to FIG. 3, the temporal distance between the first and second frames is defined as 1, and the temporal distance between frame 2 and frame 4 is defined as 2. The ATD is obtained by dividing the sum of the temporal distances between the frame pairs on which motion estimation is performed by the number of frame pairs defined for motion estimation.

In case (a),

ATD = (8 × 1 + 4 × 1 + 2 × 4 + 1 × 3) / 15 ≈ 1.53

In case (b),

ATD = (8 × 1 + 4 × 1 + 2 × 3 + 1 × 5) / 15 ≈ 1.53

In the forward and inverse modes shown in Table 1,

ATD = (8 × 1 + 4 × 2 + 2 × 4 + 1 × 8) / 15 ≈ 2.13

In case (c),

ATD = (8 × 1 + 4 × 2 + 2 × 4 + 1 × 2) / 15 ≈ 1.73

In case (d),

ATD = (8 × 1 + 4 × 2 + 2 × 4 + 1 × 1) / 15 ≈ 1.67
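Each ATD above is the pair-count-weighted sum of temporal distances divided by the 15 motion-estimation pairs in a 16-frame GOP. The following sketch reproduces the values; the per-level (pair count, distance) terms are read off the numerators above.

```python
# (number_of_pairs, temporal_distance) per MCTF level, read off the numerators.
CASES = {
    "forward/inverse": [(8, 1), (4, 2), (2, 4), (1, 8)],
    "(a)":             [(8, 1), (4, 1), (2, 4), (1, 3)],
    "(b)":             [(8, 1), (4, 1), (2, 3), (1, 5)],
    "(c)":             [(8, 1), (4, 2), (2, 4), (1, 2)],
    "(d)":             [(8, 1), (4, 2), (2, 4), (1, 1)],
}

def atd(levels):
    """Sum of pairwise temporal distances divided by the number of pairs."""
    pairs = sum(n for n, _ in levels)
    return sum(n * d for n, d in levels) / pairs

assert round(atd(CASES["forward/inverse"]), 2) == 2.13
assert round(atd(CASES["(a)"]), 2) == 1.53
assert round(atd(CASES["(b)"]), 2) == 1.53
assert round(atd(CASES["(c)"]), 2) == 1.73
assert round(atd(CASES["(d)"]), 2) == 1.67
```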

In actual simulations, the PSNR value increases as the ATD decreases, improving video coding performance.

FIG. 8 illustrates MCTF performed in the different temporal directions shown for case (a). Solid lines indicate forward temporal filtering, and dashed lines indicate inverse temporal filtering. When MCTF is performed as shown in FIG. 8, the relationship between the frames fr0-fr15 and the subbands W0-W15 is defined as follows:

fr15 = W0 + W1 - W3 - W7 - W15
fr14 = W0 + W1 - W3 - W7 + W15
fr13 = W0 + W1 - W3 + W7 + W14
fr12 = W0 + W1 - W3 + W7 - W14
fr11 = W0 + W1 + W3 - W6 - W13
fr10 = W0 + W1 + W3 - W6 + W13
fr9 = W0 + W1 + W3 + W6 + W12
fr8 = W0 + W1 + W3 + W6 - W12
fr7 = W0 - W1 + W2 + W5 - W11
fr6 = W0 - W1 + W2 + W5 + W11
fr5 = W0 - W1 + W2 - W5 + W10
fr4 = W0 - W1 + W2 - W5 - W10
fr3 = W0 - W1 - W2 + W4 - W9
fr2 = W0 - W1 - W2 + W4 + W9
fr1 = W0 - W1 - W2 - W4 + W8
fr0 = W0 - W1 - W2 - W4 - W8.

In case (a) of Table 1, the PSNR value also varies with the frame index. The frame indices having low PSNR values are determined, and the minimally changed subbands that exert less influence on frames other than those corresponding to the determined indices are also determined. After the MAD is calculated, the minimally changed subbands are multiplied by the appropriate scaling factor. Depending on the direction of temporal filtering during MCTF, frames corresponding to certain indices in a GOP have good performance while frames corresponding to other indices have poor performance. The invention is characterized by the following operations: when the temporal filtering order is determined, the frame indices having low PSNR values are determined; then, among the subbands used to reconstruct the frames corresponding to the determined indices, the minimally changed subbands exerting less influence on the other frames are determined; and those subbands are multiplied by the scaling factor. In one embodiment of the invention, a single scaling factor, determined according to the MAD, is used for the subbands in a GOP.

In addition, even when MCTF is performed using multiple reference frames, unlike conventional MCTF, the multiplication by the scaling factor can be performed in the same manner as described above, using the relationship between frames and subbands.

FIG. 9 is a functional block diagram of a scalable video encoder according to one embodiment of the present invention.

The scalable video encoder includes a motion estimation block 110, a motion vector encoding block 120, a bitstream generation block 130, a temporal filtering block 140, a spatial transform block 150, an embedded quantization block 160, and a weight determination block 170.

The motion estimation block 110 obtains, for each block of a frame to be encoded, a motion vector with respect to the matching block in a reference frame. The temporal filtering block 140 also uses these frames. The motion vectors can be obtained using a hierarchical method such as hierarchical variable-size block matching (HVSBM). The motion vectors obtained by the motion estimation block 110 are provided to the temporal filtering block 140 so that MCTF can be performed. They are also encoded by the motion vector encoding block 120 and then included in the bitstream by the bitstream generation block 130.

The temporal filtering block 140 performs temporal filtering of the video frames with reference to the motion vectors received from the motion estimation block 110. The temporal filtering is performed using MCTF, but is not limited to conventional MCTF; for example, the temporal filtering order may be changed, or multiple reference frames may be used.

Meanwhile, the weight determination block 170 calculates the MAD for the video frames using formula (1) and uses the calculated MAD to obtain the weight according to formula (2). The obtained weight can be multiplied by the subbands according to formula (3). In an exemplary embodiment, the weight is multiplied by the transform coefficients resulting from the spatial transform performed by the spatial transform block 150. In other words, the transform coefficients are obtained by spatially transforming the subbands to be weighted in formula (3), and those transform coefficients are then multiplied by the weight. The multiplication by the weight may obviously be performed right after temporal filtering, with the spatial transform performed thereafter.

The transform coefficients scaled by the weight are sent to the embedded quantization block 160, which performs embedded quantization on them and thereby produces coded image information. The coded image information and the encoded motion vectors are sent to the bitstream generation block 130, which generates a bitstream including the coded image information, the encoded motion vectors, and the weight information. The bitstream is transmitted over a channel.

According to an exemplary embodiment, the spatial transform block 150 uses a wavelet transform to remove spatial redundancy from the video frames and obtain spatial scalability. Alternatively, the spatial transform block 150 may use the DCT to remove spatial redundancy from the video frames.

Meanwhile, when the wavelet transform is used, the spatial transform can be performed before the temporal filtering, unlike in conventional video coding. This operation is explained with reference to FIG. 10.

FIG. 10 is a functional block diagram of a scalable video encoder according to another embodiment of the present invention.

Referring to FIG. 10, the video frames are wavelet-transformed by the spatial transform block 210. According to the well-known wavelet transform method, a single frame is divided into four quadrants; one quadrant is replaced by a reduced image (referred to as the L image) that resembles the entire frame and has one quarter of its area, and the other three quadrants are replaced by information (referred to as the H image) from which the entire image can be restored from the L image. In the same manner, the L image frame can be replaced by an LL image having one quarter of its area, together with information from which the L image can be restored. The compression method known as JPEG2000 uses image compression based on this wavelet method. Unlike a DCT image, a wavelet-transformed image retains the original image information, and the reduced images enable video coding with spatial scalability.

The motion estimation block 220 obtains motion vectors for the spatially transformed frames. The motion vectors are used for temporal filtering by the temporal filtering block 240. They are also encoded by the motion vector encoding block 230 and then included in the bitstream generated by the bitstream generation block 270.

The weight determination block 260 determines the weight from the spatially transformed frames. The determined weight is multiplied by the transform coefficients obtained from the minimally changed subbands among the subbands produced by temporal filtering. The scaled transform coefficients are quantized by the embedded quantization block 250 and thus converted into coded images. The coded images are used by the bitstream generation block 270, together with the motion vectors and the weight, to generate a bitstream.

Meanwhile, a video encoder may include both of the encoders shown in FIGS. 9 and 10 in order to perform the two types of video coding, and may generate a bitstream using the coded images obtained with whichever of the two coding orders shown in FIGS. 9 and 10 provides the better performance for each GOP. In such a video encoder, information about the coding order is included in the bitstream to be transmitted. In the embodiments shown in FIGS. 9 and 10, information about the coding order is likewise included in the bitstream so that a decoder can decode all images, whichever order they were encoded in.

When temporal filtering is performed before the spatial transform, as in conventional video compression, a transform coefficient denotes a value produced by the spatial transform. In other words, a transform coefficient is called a DCT coefficient when it is produced by the DCT, or a wavelet coefficient when it is produced by the wavelet transform.

In the embodiments of the present invention, the term 'transform coefficient' is intended to denote a value obtained by removing spatial and temporal redundancy from the frames before quantization (i.e., embedded quantization) is performed. In other words, in the embodiment shown in FIG. 9, transform coefficients denote coefficients produced by the spatial transform, as in conventional video compression, whereas in the embodiment shown in FIG. 10 they denote coefficients produced by temporal filtering.

The term 'scaled transform coefficients' as used in the present invention is intended to cover values produced either by scaling the transform coefficients using the weight, or by performing a spatial transform on the result of scaling, with the weight, the subbands obtained by temporal filtering. Meanwhile, transform coefficients that are not scaled with a weight can be regarded as multiplied by 1; thus, the scaled transform coefficients may include transform coefficients that have not been scaled as well as transform coefficients that have been scaled using the weight.

FIG. 11 is a functional block diagram of a scalable video decoder according to one embodiment of the present invention.

The scalable video decoder includes: a bitstream analysis block 310, which analyzes an input bitstream and thereby extracts the coded image information, the encoded motion vector information, and the weight information; an inverse embedded quantization block 320, which dequantizes the coded image information extracted by the bitstream analysis block 310, thereby obtaining the scaled transform coefficients; an inverse weighting block 370, which uses the weight information to descale the scaled transform coefficients; inverse spatial transform blocks 330 and 360, which perform inverse spatial transforms; and inverse temporal filtering blocks 340 and 350, which perform inverse temporal filtering.

The scalable video decoder shown in FIG. 11 includes two inverse temporal filtering blocks 340 and 350 and two inverse spatial transform blocks 330 and 360 so that it can restore all images encoded in the different orders. In an actual implementation, however, temporal filtering and the spatial transform may be executed in software on a computing device. In that case, only a single software module for temporal filtering and a single software module for the spatial transform may be provided, together with an option for selecting the order of operations.

The bitstream analysis block 310 extracts the coded image information from the bitstream and sends it to the inverse embedded quantization block 320, which performs inverse embedded quantization on it and thereby obtains the scaled transform coefficients. The bitstream analysis block 310 also sends the weight information to the inverse weighting block 370.

The inverse weighting block 370 descales the scaled transform coefficients according to the weight information to obtain the transform coefficients. The descaling depends on the coding order. When encoding was performed in the order of temporal filtering, spatial transform, and scaling, the inverse weighting block 370 descales the scaled transform coefficients before the inverse spatial transform block 330. The inverse spatial transform block 330 then performs the inverse spatial transform, after which the inverse temporal filtering block 340 restores the video frames by inverse temporal filtering.
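As a minimal sketch of the scaling/descaling round trip (the assumption here, not stated explicitly in the text, is that descaling divides each scaled coefficient by the same weight 'a' used at the encoder):

```python
# Assumption: descaling inverts the encoder-side multiplication by dividing
# each scaled coefficient by the weight 'a' carried in the bitstream.
a = 1.3
coeffs = [4.0, -2.0, 0.5]
scaled = [a * c for c in coeffs]      # encoder-side scaling
restored = [s / a for s in scaled]    # decoder-side descaling
assert all(abs(r - c) < 1e-9 for r, c in zip(restored, coeffs))
```

In a real decoder the descaling would be applied only to the coefficients of the weighted subbands (W4, W6, W8, W10, W12, W14 in the described embodiment); the others pass through unchanged, as if their weight were 1.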

When encoding was performed in the order of temporal filtering, scaling, and spatial transform, the inverse spatial transform block 330 performs the inverse spatial transform on the scaled transform coefficients, and the inverse weighting block 370 then descales the coefficients processed by the inverse spatial transform block 330. Thereafter, the inverse temporal filtering block 340 restores the video frames by inverse temporal filtering.

When encoding was performed in the order of spatial transform, temporal filtering, and scaling, the inverse weighting block 370 descales the scaled transform coefficients, thereby obtaining the transform coefficients. The inverse temporal filtering block 350 then constructs images using the transform coefficients and performs inverse temporal filtering on them, after which the inverse spatial transform block 360 performs the inverse spatial transform, thereby restoring the video frames. The coding order may be changed per GOP; in that case, the bitstream analysis block 310 obtains the coding order information from the GOP header of the bitstream. Alternatively, a basic coding order may be predetermined and the bitstream may omit the coding order information, in which case decoding is performed in the reverse of the basic coding order. For example, when the basic coding order is temporal filtering, spatial transform, and scaling, a bitstream without coding order information undergoes descaling, inverse spatial transform, and inverse temporal filtering in sequence (i.e., decoding using the inverse spatial transform block 330 and the inverse temporal filtering block 340 in the lower dashed box in FIG. 11).

In the above embodiments, the scalable video encoder has been described as transmitting a bitstream that includes the weight, which the scalable video decoder uses to restore the video images. The invention is not limited to this; for example, the scalable video encoder may transmit information (i.e., MAD information) from which the scalable video decoder can derive the weight.

The video encoder and the video decoder may be implemented in hardware. Alternatively, they may be realized with a general-purpose computer, including a central processing unit and memory, and software executing the encoding and decoding methods. Such software may be recorded on a recording medium such as a compact disc read-only memory (CD-ROM) or a hard disk, so that a computer running the software can implement both the video encoder and the video decoder.

Accordingly, it will be apparent to those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims. In the embodiments described above, MCTF has been used, but any type of periodic temporal filtering is to be construed as falling within the scope of the invention.

Values obtained in experiments on the invention are shown in Tables 2-7. The average PSNR of the invention differs little from that obtained by conventional MCTF; however, the invention reduces the standard deviation compared with conventional MCTF.

Table 2: Average PSNR for the Foreman sequence

  Bit rate   Invention   Conventional MCTF (forward filtering)
  128        30.88       30.91
  256        35.66       35.68
  512        39.19       39.23
  1024       43.65       43.71

Table 3: Standard deviation for the Foreman sequence

  Bit rate   Invention   Conventional MCTF (forward filtering)
  128        1.22        1.23
  256        0.89        0.94
  512        0.75        0.84
  1024       0.62        0.74

Table 4: Average PSNR for the Canoa sequence

  Bit rate   Invention   Conventional MCTF (forward filtering)
  128        28.46       28.45
  256        32.58       32.58
  512        37.76       37.76
  1024       45.36       45.43

Table 5: Standard deviation for the Canoa sequence

  Bit rate   Invention   Conventional MCTF (forward filtering)
  128        0.859       0.861
  256        1.004       1.007
  512        1.000       1.020
  1024       1.070       1.090

Table 6: Average PSNR for the Tempete sequence

  Bit rate   Invention   Conventional MCTF (forward filtering)
  128        27.98       27.99
  256        32.2        32.28
  512        35.42       35.5
  1024       37.78       37.82

Table 7: Standard deviation for the Tempete sequence

  Bit rate   Invention   Conventional MCTF (forward filtering)
  128        0.348       0.350
  256        0.591       0.670
  512        0.555       0.682
  1024       0.564       0.654

Industrial Applicability

The present invention provides a model capable of reducing the variation in PSNR values across frame indices in scalable video coding. In other words, according to the invention, the high PSNR values of some frames in a single GOP are lowered while the low PSNR values of the other frames in the GOP are raised, so that video coding performance can be improved.

It should therefore be understood that the above embodiments are illustrative only and are not to be construed as limiting the invention. The scope of the invention is given by the appended claims rather than by the foregoing description, and all changes and equivalents falling within the scope of the claims are intended to be embraced therein.

Claims (33)

1. A scalable video coding method comprising:
(a) receiving a plurality of video frames and performing motion-compensated temporal filtering (MCTF) on the video frames to remove temporal redundancy from the video frames; and
(b) obtaining scaled transform coefficients from the video frames from which the temporal redundancy has been removed, quantizing the scaled transform coefficients, and generating a bitstream.
2. The scalable video coding method of claim 1, wherein the video frames received in step (a) have undergone a wavelet transform so that spatial redundancy has been removed from them, and the scaled transform coefficients are obtained by applying a predetermined weight to some subbands of the video frames from which the temporal redundancy has been removed.
3. according to the described scalable method for video coding of claim 1, wherein, by using predetermined weight to some subbands in the frame of video that is removed time redundancy, then the subband of weighting is carried out spatial alternation, come in step (b) acquisition scalable conversion coefficient.
4. according to the described scalable method for video coding of claim 1, wherein, by the frame of video that is removed time redundancy is carried out spatial alternation, use predetermined weight to conversion coefficient in the conversion coefficient that produces by spatial alternation, that obtain from some subbands then, come in step (b) acquisition scalable conversion coefficient.
5. according to the described scalable method for video coding of claim 4, wherein, determine predefined weight for each picture group (GOP), for single GOP, described predefined weight has single and identical value.
6. according to the described scalable method for video coding of claim 5, wherein, determine described predefined weight according to the amplitude of the absolute distortion of GOP.
7. according to the described scalable method for video coding of claim 6, wherein, obtain to use described predefined weight and scalable conversion coefficient from subband, described subband is at the subband that is used for constructing low PSNR frame, compare with low PSNR frame, high Y-PSNR (PSNR) frame is applied the essence slight influence.
8. according to the described scalable method for video coding of claim 7, wherein, each GOP comprises 16 frames; On single direction, carry out MCTF; Calculate the amplitude (MAD) of absolute distortion by following formula,
MAD = 8 &times; &Sigma; i = 0 n - 1 2 &Sigma; x = 0 p - 1 &Sigma; y = 0 q - 1 | T 2 i + 1 ( x , y ) - T 2 i ( x , y ) |
Wherein, ' i ' indicates frame index, the last frame index of ' n ' indication in GOP, T (x, y) position of indication in the T frame (x, picture value y), and the size of single frame is p*q; According to the following predefined weight ' a ' that calculates, a=1.3 (if MAD<30), a=1.4-0.0033MAD (if 30<MAD<140), and a=1 (if MAD>140); And obtain to use predefined weight and scalable conversion coefficient from subband W4, W6, W8, W10, W12 and W14.
9. according to the described scalable method for video coding of claim 1, wherein, the bit stream that in step (b), produces comprise about be used to obtain the information of weight of scalable conversion coefficient.
10. A scalable video encoder which receives a plurality of video frames and generates a bitstream, the scalable video encoder comprising:
a temporal filtering block which performs motion-compensated temporal filtering (MCTF) on the video frames to remove temporal redundancy from the video frames;
a spatial transform block which performs a spatial transform on the video frames to remove spatial redundancy from the video frames;
a weight determination block which determines a weight to be used for scaling those transform coefficients, among the transform coefficients obtained as a result of removing the temporal and spatial redundancy from the video frames, that are obtained from some subbands;
a quantization block which quantizes the scaled transform coefficients; and
a bitstream generation block which generates a bitstream using the quantized transform coefficients.
11. The scalable video encoder of claim 10, wherein the spatial transform block performs a wavelet transform on the video frames to remove the spatial redundancy, the temporal filtering block generates transform coefficients using subbands obtained by performing MCTF on the wavelet-transformed frames, and the weight determination block determines the weight using the wavelet-transformed frames and multiplies the transform coefficients obtained from some subbands by the determined weight, thereby obtaining the scaled transform coefficients.
12. The scalable video encoder of claim 10, wherein the temporal filtering block obtains subbands by performing MCTF on the video frames, the weight determination block determines the weight using the video frames and multiplies some subbands by the determined weight to obtain scaled subbands, and the spatial transform block performs a spatial transform on the scaled subbands, thereby obtaining the scaled transform coefficients.
13. The scalable video encoder of claim 10, wherein the temporal filtering block obtains subbands by performing MCTF on the video frames, the spatial transform block generates transform coefficients by performing a spatial transform on the subbands, and the weight determination block determines the weight using the video frames and multiplies the transform coefficients obtained from predetermined subbands by the determined weight, thereby obtaining the scaled transform coefficients.
14. The scalable video encoder of claim 13, wherein the weight is determined for each group of pictures (GOP) and has a single, identical value within a single GOP.
15. The scalable video encoder of claim 14, wherein the weight is determined according to the magnitude of absolute distortion (MAD) of the GOP.
16. The scalable video encoder of claim 15, wherein the scaled transform coefficients to which the weight is applied are obtained from subbands that are used to construct low-PSNR frames and that, compared with the low-PSNR frames, have only a substantially slight influence on high peak signal-to-noise ratio (PSNR) frames.
17. The scalable video encoder of claim 16, wherein each GOP comprises 16 frames; the MCTF is performed in a single direction; the magnitude of absolute distortion (MAD) is calculated by

MAD = 8 × Σ_{i=0}^{(n-1)/2} Σ_{x=0}^{p-1} Σ_{y=0}^{q-1} |T_{2i+1}(x, y) - T_{2i}(x, y)|

where 'i' denotes a frame index, 'n' denotes the last frame index in the GOP, T(x, y) denotes the picture value at position (x, y) in frame T, and the size of a single frame is p*q; the weight 'a' is calculated as a = 1.3 if MAD < 30, a = 1.4 - 0.0033·MAD if 30 < MAD < 140, and a = 1 if MAD > 140; and the scaled transform coefficients to which the weight is applied are obtained from subbands W4, W6, W8, W10, W12, and W14.
18. The scalable video encoder of claim 10, wherein the bitstream generation block includes, in the bitstream, information about the weight used to obtain the scaled transform coefficients.
19. A scalable video decoding method comprising:
extracting coded image information, coding-order information, and weight information from a bitstream;
obtaining scaled transform coefficients by dequantizing the coded image information; and
performing descaling, an inverse spatial transform, and inverse temporal filtering on the scaled transform coefficients in a decoding order opposite to the coding order indicated by the coding-order information, thereby recovering video frames.
20. The scalable video decoding method of claim 19, wherein the decoding order is descaling, the inverse temporal filtering, and the inverse spatial transform.
21. The scalable video decoding method of claim 19, wherein the decoding order is the inverse spatial transform, descaling, and the inverse temporal filtering.
22. The scalable video decoding method of claim 19, wherein the decoding order is descaling, the inverse spatial transform, and the inverse temporal filtering.
23. The scalable video decoding method of claim 22, wherein a predetermined weight is extracted from the bitstream for each group of pictures (GOP).
24. The scalable video decoding method of claim 23, wherein the number of frames constituting a GOP is 2^k (where k = 1, 2, 3, ...).
25. The scalable video decoding method of claim 23, wherein the scaled transform coefficients that are descaled using the predetermined weight are obtained from subbands W4, W6, W8, W10, W12, and W14 produced during encoding.
26. A scalable video decoder comprising:
a bitstream analysis block which analyzes a received bitstream to extract coded image information, coding-order information, and weight information from the bitstream;
a dequantization block which dequantizes the coded image information to obtain scaled transform coefficients;
an inverse weighting block which performs descaling;
an inverse spatial transform block which performs an inverse spatial transform; and
an inverse temporal filtering block which performs inverse temporal filtering,
wherein the scalable video decoder performs descaling, the inverse spatial transform, and the inverse temporal filtering on the scaled transform coefficients in an order opposite to the coding order indicated by the coding-order information, thereby recovering video frames.
27. The scalable video decoder of claim 26, wherein the decoding order is descaling, the inverse temporal filtering, and the inverse spatial transform.
28. The scalable video decoder of claim 26, wherein the decoding order is the inverse spatial transform, descaling, and the inverse temporal filtering.
29. The scalable video decoder of claim 26, wherein the decoding order is descaling, the inverse spatial transform, and the inverse temporal filtering.
30. The scalable video decoder of claim 29, wherein the bitstream analysis block extracts a predetermined weight from the bitstream for each group of pictures (GOP).
31. The scalable video decoder of claim 30, wherein the number of frames constituting a GOP is 2^k (where k = 1, 2, 3, ...).
32. The scalable video decoder of claim 26, wherein the inverse weighting block performs descaling on the scaled transform coefficients from subbands W4, W6, W8, W10, W12, and W14 produced during encoding.
33. A recording medium having recorded thereon computer-readable code for executing the method according to any one of claims 1 to 9 and 19 to 25.
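The MAD-based weight rule of claims 8 and 17, and the matching descaling of claims 19 and 26, can be sketched as follows. This is a hedged illustration only: the function names are mine, the frames are toy 2x2 arrays, and the MAD sum is implemented exactly as the claims state it, with no extra normalization assumed:

```python
def gop_mad(frames):
    """Magnitude of absolute distortion per claims 8/17:
    MAD = 8 * sum over frame pairs (2i, 2i+1) of |T_{2i+1}(x, y) - T_{2i}(x, y)|."""
    p, q = len(frames[0]), len(frames[0][0])
    pairs = len(frames) // 2
    return 8 * sum(
        abs(frames[2 * i + 1][x][y] - frames[2 * i][x][y])
        for i in range(pairs)
        for x in range(p)
        for y in range(q)
    )

def weight_from_mad(mad):
    """Piecewise weight 'a' from claims 8/17."""
    if mad < 30:
        return 1.3
    if mad < 140:
        return 1.4 - 0.0033 * mad
    return 1.0

# Encoder side: scale the coefficients of the selected subbands by 'a'.
# Decoder side (inverse weighting block of claims 19/26): descale by
# dividing by the same 'a' carried in the bitstream as weight information.
a = weight_from_mad(100)   # in the middle band: 1.4 - 0.0033 * 100
coeff = 12.5
scaled = coeff * a         # written to the bitstream
recovered = scaled / a     # descaling at the decoder
assert abs(recovered - coeff) < 1e-9
```

Note that the rule is nearly continuous at its breakpoints (1.4 - 0.0033 * 30 ≈ 1.3 and 1.4 - 0.0033 * 140 ≈ 0.94), so the weight decays smoothly toward 1 as the GOP's motion, measured by MAD, grows.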
CN 200480024363 2003-08-26 2004-08-14 Scalable video coding and decoding methods, and scalable video encoder and decoder Pending CN1843040A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US49756603P 2003-08-26 2003-08-26
US60/497,566 2003-08-26
KR1020030066958 2003-09-26
KR1020040002013 2004-01-12

Publications (1)

Publication Number Publication Date
CN1843040A true CN1843040A (en) 2006-10-04

Family

ID=37031205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200480024363 Pending CN1843040A (en) 2003-08-26 2004-08-14 Scalable video coding and decoding methods, and scalable video encoder and decoder

Country Status (1)

Country Link
CN (1) CN1843040A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103026705A (en) * 2010-05-14 2013-04-03 三星电子株式会社 Method and apparatus for encoding video signal and method and apparatus for decoding video signal
US9788015B2 (en) 2008-10-03 2017-10-10 Velos Media, Llc Video coding with large macroblocks

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10225581B2 (en) 2008-10-03 2019-03-05 Velos Media, Llc Video coding with large macroblocks
US12389043B2 (en) 2008-10-03 2025-08-12 Qualcomm Incorporated Video coding with large macroblocks
US11758194B2 (en) 2008-10-03 2023-09-12 Qualcomm Incorporated Device and method for video decoding video blocks
US11039171B2 (en) 2008-10-03 2021-06-15 Velos Media, Llc Device and method for video decoding video blocks
US9788015B2 (en) 2008-10-03 2017-10-10 Velos Media, Llc Video coding with large macroblocks
US9930365B2 (en) 2008-10-03 2018-03-27 Velos Media, Llc Video coding with large macroblocks
US10027967B2 (en) 2010-05-14 2018-07-17 Samsung Electronics Co., Ltd. Method and apparatus for encoding video signal and method and apparatus for decoding video signal
US10075721B2 (en) 2010-05-14 2018-09-11 Samsung Electronics Co., Ltd. Method and apparatus for encoding video signal and method and apparatus for decoding video signal
US10091516B2 (en) 2010-05-14 2018-10-02 Samsung Electronics Co., Ltd. Method and apparatus for encoding video signal and method and apparatus for decoding video signal
US10116949B2 (en) 2010-05-14 2018-10-30 Samsung Electronics Co., Ltd. Method and apparatus for encoding video signal and method and apparatus for decoding video signal
CN103026705A (en) * 2010-05-14 2013-04-03 三星电子株式会社 Method and apparatus for encoding video signal and method and apparatus for decoding video signal
CN105939477B (en) * 2010-05-14 2019-04-05 三星电子株式会社 Method and apparatus for decoding a video signal
US9525889B2 (en) 2010-05-14 2016-12-20 Samsung Electronics Co., Ltd. Method and apparatus for encoding video signal and method and apparatus for decoding video signal
CN105939477A (en) * 2010-05-14 2016-09-14 三星电子株式会社 Method and apparatus for decoding video signal
CN103026705B (en) * 2010-05-14 2016-06-29 三星电子株式会社 Method and device for encoding a video signal and method and device for decoding a video signal

Similar Documents

Publication Publication Date Title
CN1161708C (en) Apparatus and method for encoding wavelet tree generated based on wavelet encoding method
CN1914921A (en) Apparatus and method for scalable video coding providing scalability in encoder part
CN1961582A (en) Method and apparatus for effectively compressing motion vectors in multi-layer structure
CN1906945A (en) Method and apparatus for scalable video encoding and decoding
CN1947426A (en) Method and apparatus for implementing motion scalability
CN101049026A (en) Scalable video coding with grid motion estimation and compensation
CN1943244A (en) Inter prediction method in video coding, video encoder, video decoding method and video decoder
US20050047509A1 (en) Scalable video coding and decoding methods, and scalable video encoder and decoder
CN1890711A (en) Method for encoding a digital signal into a scalable bitstream, method for decoding a scalable bitstream
CN1291314A (en) Method and device for image compression and decompression
CN1625265A (en) Method and apparatus for scalable video encoding and decoding
CN101069430A (en) Hierarchical multi-view image encoding and decoding device and method
CN1722836A (en) Video encoding and decoding method and video encoder and decoder
JP2005236895A (en) Image encoding apparatus and method, and image decoding apparatus and method
CN1130970A (en) Video Compression Using Iterative Error Data Coding Method
CN1890974A (en) System and method for improved scalability support in MPEG-2 systems
CN1926875A (en) Motion compensation method
CN1678073B (en) Direction Adaptive Scalable Motion Parameter Coding for Scalable Video Coding
CN1783727A (en) Encoding method for compression encoding of multi-channel digital audio signal
CN1717058A (en) Image encoding method and device, and image decoding method and device
CN101049006A (en) Image encoding method and device, and image decoding method and device
CN101026758B (en) Video transcoding method and apparatus
CN1871858A (en) Bit-rate control method and apparatus for normalizing visual quality
CN1906946A (en) Device and method for playing back scalable video streams
CN1276407C (en) Stereo audio encoding method and device, audio stream decoding method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication