CN1708991B

CN1708991B - Method and apparatus for controlling rate-distortion tradeoff using Lagrange multipliers and visual masking

Info

Publication number: CN1708991B
Application number: CN200380101963.0A
Authority: CN
Inventors: B. G. Haskell; A. Dumitras; A. Puri
Original assignee: Apple Inc; Apple Computer Inc
Current assignee: Apple Inc
Priority date: 2002-11-08
Filing date: 2003-10-31
Publication date: 2010-08-18
Anticipated expiration: 2023-10-31
Also published as: CN101409835B; CN101409835A; CN1708991A

Abstract

The present invention discloses a method and apparatus for controlling the rate-distortion tradeoff through mode selection in a video encoder. The system first selects a distortion value D near the expected distortion value. Next, the system determines a quantizer value Q from the selected distortion value D. The system then calculates the Lagrange multiplier lambda from the quantizer value Q. Using the selected Lagrange multiplier lambda and quantizer value Q, the system begins encoding pixel blocks. If the system detects a potential buffer overflow, it increases the Lagrange multiplier lambda; if lambda then exceeds a maximum lambda threshold, the system increases the quantizer value Q. If the system detects a potential buffer underflow, it decreases the Lagrange multiplier lambda; if lambda then falls below a minimum lambda threshold, the system decreases the quantizer value Q.

Description

Method and apparatus for controlling rate-distortion tradeoff using Lagrangian multipliers and visual masking

Technical Field

The invention relates to the field of multimedia compression and coding systems. In particular, the present invention discloses a method and system for controlling the rate-distortion tradeoff in a digital video encoder.

Background

Digital electronic media formats are completely replacing traditional analog electronic media formats. In the audio world, digital compact discs (CDs) replaced analog vinyl records many years ago, and analog cassette tapes are becoming increasingly rare. Second- and third-generation digital audio systems, such as MiniDisc and MP3 (MPEG Audio Layer 3) based formats, are now taking market share from the first-generation digital audio format, the compact disc.

Digital still photography is rapidly replacing film-based still photography. The immediate availability of images and their easy distribution over the Internet are compelling features for users.

However, the video world has moved toward digital storage and transmission formats more slowly than audio and still photography, mainly because representing video accurately in digital form requires a large amount of digital information. Storing and transmitting that much information requires very high-capacity digital storage systems and high-bandwidth transmission systems.

But video has finally adopted digital storage and transmission formats. Faster computer processors, high-density storage systems, high-bandwidth optical transmission lines, and new, efficient video encoding algorithms have finally made digital video systems practical at consumer prices. DVD (Digital Versatile Disc) digital video systems have become one of the fastest-selling consumer electronics products. Thanks to their outstanding video quality, high-quality 5.1-channel digital audio, convenience, and other features, DVDs have rapidly replaced video cassette recorders (VCRs) as the pre-recorded video playback system of choice. In video transmission, the outdated analog NTSC (National Television System Committee) standard is being replaced by the digital ATSC (Advanced Television Systems Committee) system, which uses digital compression and encoding techniques.

Over the years, computer systems have used a variety of digital video encoding formats. The best-known digital video compression and encoding systems used by computer systems are those standardized by the Moving Picture Experts Group, commonly known by its acronym MPEG. The three best-known and most widely used MPEG digital video formats are known simply as MPEG-1, MPEG-2, and MPEG-4. Video CDs and consumer-grade digital video editing systems use the early MPEG-1 format. Digital Versatile Discs (DVDs) and the Dish Network brand Direct Broadcast Satellite (DBS) television broadcasting system use the MPEG-2 digital video compression and encoding system. The latest computer-based digital video encoders and associated digital video players are rapidly adopting the MPEG-4 encoding system.

The MPEG-2 and MPEG-4 standards compress a series of video frames or fields and then assemble the compressed frames or fields into a digital bitstream. The rate of the digital bitstream must be closely monitored so that it does not overflow or underflow buffers or exceed the transmission channel capacity. Digital video encoders therefore need sophisticated rate control systems that deliver the best possible picture quality within the allocated channel capacity without overflowing or underflowing buffers.

Summary of the Invention

The present invention discloses a method and apparatus for controlling the rate-distortion tradeoff through mode selection in a video encoder. The system first selects a distortion value D near the expected distortion value. Next, the system uses the selected distortion value D to determine a quantizer value Q. The system then computes the Lagrange multiplier lambda from the quantizer value Q. Using the selected Lagrange multiplier lambda and quantizer value Q, the system begins encoding pixel blocks.

If the system detects a potential buffer overflow, it increases the Lagrange multiplier lambda. A potential buffer overflow is detected when the buffer occupancy exceeds an overflow threshold. If lambda then exceeds a maximum lambda threshold, the system increases the quantizer value Q.

If the system detects a potential buffer underflow, it decreases the Lagrange multiplier lambda. A potential buffer underflow is detected when the buffer occupancy drops below an underflow threshold. If lambda then falls below a minimum lambda threshold, the system decreases the quantizer value Q.
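The two-level control loop described above (adjust lambda first, and move Q only when lambda leaves its allowed range) can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the threshold values, the multiplicative step of 1.25, the 1 to 31 quantizer range, the factor-of-two lambda window, and the mapping lambda = 0.85 · Q² (a rule of thumb from H.263-era rate-distortion work) are all assumptions, because the text names the thresholds but gives no values.

```python
from dataclasses import dataclass

# Hypothetical tuning constants (assumptions, not values from the source).
OVERFLOW_THRESHOLD = 0.8    # buffer occupancy as a fraction of buffer size
UNDERFLOW_THRESHOLD = 0.2
LAMBDA_STEP = 1.25          # multiplicative lambda adjustment
Q_MIN, Q_MAX = 1, 31        # assumed quantizer range


def lambda_from_q(q: int) -> float:
    """Assumed lambda(Q) mapping: 0.85 * Q^2."""
    return 0.85 * q * q


@dataclass
class RateState:
    q: int
    lam: float

    @property
    def lambda_max(self) -> float:
        # Assumed: lambda may rise to twice its nominal value for this Q.
        return 2.0 * lambda_from_q(self.q)

    @property
    def lambda_min(self) -> float:
        # Assumed: lambda may fall to half its nominal value for this Q.
        return 0.5 * lambda_from_q(self.q)


def update(state: RateState, occupancy: float) -> RateState:
    """One control step; occupancy is the encoder buffer fullness (0..1)."""
    q, lam = state.q, state.lam
    if occupancy > OVERFLOW_THRESHOLD:
        # Potential overflow: raise lambda so mode decisions favour fewer bits.
        lam *= LAMBDA_STEP
        if lam > state.lambda_max and q < Q_MAX:
            q += 1                    # lambda out of range: coarser quantizer
            lam = lambda_from_q(q)
    elif occupancy < UNDERFLOW_THRESHOLD:
        # Potential underflow: lower lambda so mode decisions spend more bits.
        lam /= LAMBDA_STEP
        if lam < state.lambda_min and q > Q_MIN:
            q -= 1                    # lambda out of range: finer quantizer
            lam = lambda_from_q(q)
    return RateState(q, lam)
```

Adjusting lambda first changes only the mode decisions, which is a gentler correction than changing the quantizer; Q moves only after several consecutive overflow or underflow warnings.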

Other objects, features, and advantages of the present invention will be apparent from the accompanying drawings and from the following detailed description.

Brief Description of the Drawings

The objects, features, and advantages of the present invention will be apparent to those skilled in the art from the following detailed description, in which:

Figure 1 depicts a high-level block diagram of one possible digital video coding system;

Figure 2 depicts a series of video pictures in display order, in which the arrows connecting different pictures indicate the inter-picture dependencies created by motion compensation;

Figure 3 depicts the video pictures of Figure 2 rearranged into a preferred transmission order, in which the arrows connecting different pictures indicate the inter-picture dependencies created by motion compensation; and

Figure 4 depicts a family of R-D curves, one curve for each different value of the quantizer Q.

Detailed Description

A method and apparatus for controlling the rate-distortion tradeoff through mode selection in a video encoder are disclosed. In the following description, specific nomenclature is set forth for purposes of explanation to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the present invention. For example, the invention is described with reference to the MPEG-4 Part 10 (H.264) multimedia compression and coding system, but the same techniques can easily be applied to other types of compression and coding systems.

Multimedia Compression and Coding Overview

Figure 1 depicts a high-level block diagram of a typical digital video encoder 100 as known in the art. The digital video encoder 100 receives an input video stream 105 at the left of the block diagram. Each video frame is processed by a discrete cosine transform (DCT) unit 110. A video frame may be processed independently (an intra-frame) or with reference to information from other frames (an inter-frame) using the motion estimation unit 160. A quantizer (Q) unit 120 then quantizes the information from the DCT unit 110. Finally, an entropy encoder (H) 180 encodes the quantized frames to produce an encoded video bitstream.

Because inter-frames are encoded with reference to other nearby video frames, the digital video encoder 100 needs an exact copy of each reference frame as it will appear in the digital video decoder, so that inter-frames can be encoded against it. The lower portion of the digital video encoder 100 is therefore actually a digital video decoder. In particular, the inverse quantizer (Q⁻¹) 130 reverses the quantization of the frame information, and the inverse discrete cosine transform (DCT⁻¹) unit 140 reverses the discrete cosine transform of the video frame information. After the inverse DCT has reconstructed the pixel data from the DCT coefficients, the motion compensation unit uses this information together with the motion vectors to reconstruct a video frame, which can then serve as a reference frame for motion estimation of other video frames.

The decoded video frames may be used to encode inter-frames, which are defined relative to information in the decoded frames. In particular, the motion compensation (MC) unit 150 and the motion estimation (ME) unit 160 determine motion vectors and generate the difference values used to encode inter-frames.

The rate controller 190 receives information from many different components of the digital video encoder 100 and uses it to allocate a bit budget to each video frame to be encoded. The bit budget is allocated so as to produce the highest-quality digital bitstream that satisfies a specified set of constraints. In particular, the rate controller 190 tries to produce the highest-quality compressed video stream without overflowing a buffer (exceeding the available buffer space by sending video frame information faster than it can be displayed and discarded) or underflowing a buffer (sending video frame information too slowly, so that the receiving digital video decoder runs out of video frame information to display).
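The overflow and underflow conditions the rate controller guards against can be made concrete with a simplified constant-bit-rate decoder buffer model. This sketch is an assumption in the spirit of MPEG's video buffering verifier, not the model used in the patent: the channel delivers a fixed number of bits per frame interval, and the decoder removes each frame's bits all at once when the frame is due. All names are hypothetical.

```python
def simulate_decoder_buffer(frame_bits, bits_per_interval, buffer_size, initial_fullness):
    """Return a list of (frame_index, event) for overflow/underflow events."""
    fullness = initial_fullness
    events = []
    for i, bits in enumerate(frame_bits):
        # The channel delivers bits at a constant rate during each frame interval.
        fullness += bits_per_interval
        if fullness > buffer_size:
            # Frames are too small: incoming data piles up faster than it is consumed.
            events.append((i, "overflow"))
            fullness = buffer_size          # bits that do not fit are lost
        if bits > fullness:
            # The frame is due for display but has not fully arrived.
            events.append((i, "underflow"))
            fullness = 0
        else:
            fullness -= bits                # decoder removes the frame for display
    return events
```

For example, with 100 bits arriving per interval, frames of 10 bits overflow a 250-bit buffer within two intervals, while a single 500-bit frame underflows a buffer that holds only 200 bits when the frame is due.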

Pixel Block Coding

Many digital video coding algorithms first divide each video image into small subsets of pixels, generally called pixel blocks. Specifically, the video image is divided into a grid of rectangular pixel blocks. The terms macroblock, block, and sub-block are also commonly used for such subsets of pixels; this document uses the term pixel block to cover all of these different but similar concepts. Different digital video coding systems may use different pixel block sizes, for example 8×8, 8×4, 16×16, and 4×4 pixel blocks.

To encode a video image, each individual pixel block of the image is encoded using some coding method. Some pixel blocks, called intra blocks, are encoded without reference to any other pixel block. Other pixel blocks are encoded using a predictive coding method, such as motion compensation, that refers to the closest-matching pixel block in the same or a different video image.

Each individual pixel block in a video image is compressed and encoded separately. Some video coding standards, such as ISO MPEG or ITU H.264, use different types of predicted pixel blocks to encode digital video images. In one scheme, a pixel block can be one of the following three types:

1. I pixel block: an intra (I) pixel block does not use information from any other video image in its encoding (an intra pixel block is therefore completely self-contained);

2. P pixel block: a unidirectionally predicted (P) pixel block refers to image information from an earlier video image; or

3. B pixel block: a bidirectionally predicted (B) pixel block uses information from an earlier video image or a later video image.

If all of the pixel blocks in an encoded digital video image are intra pixel blocks (I pixel blocks), the encoded video image frame is called an intra-frame (I-frame). Note that an intra-frame does not reference any other video image, so an intra-frame digital video image is completely self-contained.

If a digital video image frame contains only unidirectionally predicted pixel blocks (P pixel blocks) and intra pixel blocks (I pixel blocks), but no bidirectionally predicted pixel blocks (B pixel blocks), the video image is called a P-frame. I pixel blocks can appear in a P-frame when predictive coding (P pixel block coding) would require more bits than coding the pixel block independently (I pixel block coding).

If a digital video image frame contains any bidirectionally predicted pixel blocks (B pixel blocks), the video image frame is called a B-frame. For simplicity, this application considers the case in which all pixel blocks within a given image are of the same type (I-frames contain only I pixel blocks, P-frames contain only P pixel blocks, and B-frames contain only B pixel blocks).
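The frame-type rules above reduce to a simple precedence check: any B pixel block makes a B-frame, otherwise any P pixel block makes a P-frame, otherwise the frame is an I-frame. A minimal sketch, with block types represented as the hypothetical strings "I", "P", and "B":

```python
VALID_BLOCK_TYPES = {"I", "P", "B"}


def classify_frame(block_types):
    """Return 'I', 'P' or 'B' for a frame, given the types of its pixel blocks."""
    types = set(block_types)
    if not types:
        raise ValueError("a frame must contain at least one pixel block")
    unknown = types - VALID_BLOCK_TYPES
    if unknown:
        raise ValueError(f"unknown pixel block types: {sorted(unknown)}")
    if "B" in types:        # any bidirectionally predicted block
        return "B"
    if "P" in types:        # P blocks (possibly mixed with I blocks), no B blocks
        return "P"
    return "I"              # intra blocks only
```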

An example sequence of video images to be encoded can be represented as:

I₁ B₂ B₃ B₄ P₅ B₆ B₇ B₈ B₉ P₁₀ B₁₁ P₁₂ B₁₃ I₁₄ ...

where the letter I, P, or B indicates whether each digital video image frame is an I-frame, P-frame, or B-frame, and the numerical subscript indicates the capture order of the video image in the sequence. The capture order is the order in which the camera recorded the video images, and therefore also the order in which they should be displayed (display order).

Figure 2 conceptually depicts the series of video images from the preceding example. In Figure 2, the arrows indicate that pixel blocks from a stored picture (here an I-frame or P-frame) are used in the motion-compensated prediction of other digital video pictures (B-frames and P-frames).

Referring to Figure 2, no information from any other video picture is used in encoding the first video picture frame, the intra-frame video picture I₁. Video picture P₅ is a P-frame that uses video information from the previous video picture I₁ in its encoding, so an arrow is drawn from intra-frame video picture I₁ to P-frame video picture P₅. Video pictures B₂, B₃, and B₄ all use information from video pictures I₁ and P₅ in their encoding, so dependency arrows are drawn from video pictures I₁ and P₅ to video pictures B₂, B₃, and B₄.

Because B-frame video images use information from later video images (images displayed afterwards), the transmission order of a set of digital video images usually differs from their display order. Specifically, a reference video image needed to construct other video images should be transmitted before the video images that depend on it. Thus, for the display order of Figure 2, a preferred transmission order would be:

I₁ P₅ B₂ B₃ B₄ P₁₀ B₆ B₇ B₈ B₉ P₁₂ B₁₁ I₁₄ B₁₃ ...
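The reordering from display order to transmission order can be sketched as a single pass: each B-frame is held back until the next reference frame (I or P) has been sent. This is a minimal sketch assuming frames are labelled with strings such as "B2" and that every B-frame depends on the nearest reference frames on either side, as in Figure 2.

```python
def display_to_transmission(frames):
    """Reorder frame labels from display order to transmission order."""
    out, pending_b = [], []
    for frame in frames:
        if frame[0] == "B":
            # A B-frame needs the next reference frame, so hold it back.
            pending_b.append(frame)
        else:
            # I or P frame: send the reference, then the B-frames waiting on it.
            out.append(frame)
            out.extend(pending_b)
            pending_b.clear()
    # Trailing B-frames whose later reference lies beyond the end of the list.
    out.extend(pending_b)
    return out
```

Applied to the display order I₁ B₂ B₃ B₄ P₅ B₆ B₇ B₈ B₉ P₁₀ B₁₁ P₁₂ B₁₃ I₁₄, this produces I₁ P₅ B₂ B₃ B₄ P₁₀ B₆ B₇ B₈ B₉ P₁₂ B₁₁ I₁₄ B₁₃.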

Figure 3 depicts the preferred transmission order of the video images of Figure 2. The arrows indicate that pixel blocks from reference video images (here I-frame or P-frame video images) are used in the motion-compensated prediction of other video images (P-frame and B-frame video images).

Referring to Figure 3, the transmission system first transmits I-frame I₁, which does not depend on any other video frame. Next, the system transmits P-frame video image P₅, which depends only on the previously transmitted video image I₁. Next, the system transmits B-frame video image B₂ after video image P₅, even though B₂ is displayed before P₅. The reason is that by the time the dependent video image B₂ is to be decoded and rendered, the digital video decoder has already received and decoded the information in video images I₁ and P₅ that is needed to decode B₂. Likewise, the decoded video images I₁ and P₅ are ready to be used in decoding and rendering the next two dependent video images, B₃ and B₄.

The receiver/decoder system then reorders the video images into the proper display order. In this operation, reference video images I₁ and P₅ are referred to as "stored pictures." Stored pictures are used to reconstruct other video images that reference them. (Note that some digital video coding systems also allow B-frames to be used as stored pictures.)

P-Pictures

Coding of P-pictures typically uses motion compensation (MC), in which a motion vector (MV) pointing to a location in a previous video image is computed for each pixel block in the current video image. The motion vector refers to a closely matching pixel block in the reference video image. Using the motion vector, a predicted pixel block can be formed by translating the referenced pixels in that previous video image. The difference between the actual pixel block in the P-picture and the predicted pixel block is then encoded for transmission, and the decoder uses this difference to reconstruct the original pixel block accurately.
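The text does not say how the encoder finds the closely matching pixel block. A common (but here assumed) approach is exhaustive block matching using the sum of absolute differences (SAD) as the matching cost, followed by computing the residual that is actually coded. In this sketch, pictures are lists of rows of luma samples, the motion vector (dx, dy) is an offset into the reference picture, and all function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive motion search; returns ((dx, dy), best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that fall outside the reference picture.
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def residual(cur, ref, bx, by, mv, n):
    """Prediction error: actual block minus the motion-compensated prediction."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
            for y in range(n)]
```

If the current picture is the reference shifted by two samples horizontally and one vertically, the search returns the motion vector (2, 1) with a SAD of 0, and the residual to be coded is all zeros.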

Each motion vector may itself be transmitted using a predictive coding method. For example, a motion vector prediction can be formed from neighboring motion vectors. In this case, the difference between the actual motion vector and the predicted motion vector is encoded for transmission, and the decoder uses this difference to regenerate the actual motion vector from the predicted motion vector.
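One well-known way to form the prediction from neighboring motion vectors is the component-wise median of the left, top, and top-right neighbors, which H.264 uses in its general case. The sketch below assumes that predictor and ignores H.264's special cases (unavailable neighbors, 16×8 and 8×16 partitions); only the motion vector difference would be entropy-coded.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(left, top, top_right):
    """Component-wise median of three neighbouring motion vectors."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))


def encode_mv(mv, left, top, top_right):
    """Motion vector difference that is sent in the bitstream."""
    px, py = predict_mv(left, top, top_right)
    return (mv[0] - px, mv[1] - py)


def decode_mv(mvd, left, top, top_right):
    """Decoder side: add the same prediction back to the received difference."""
    px, py = predict_mv(left, top, top_right)
    return (mvd[0] + px, mvd[1] + py)
```

With neighbors (4, -2), (6, 0), and (5, 8), the prediction is (5, 0), so an actual motion vector of (6, 1) is sent as the small difference (1, 1).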

B-Pictures

Each B pixel block in a B-frame uses two different motion vectors: a first motion vector that refers to a pixel block in an earlier video image and a second motion vector that refers to a pixel block in a later video image. From these two motion vectors, two predicted pixel blocks are computed. The two predicted pixel blocks are then combined using some function to form the final predicted pixel block (for example, the two pixel blocks can simply be averaged). As with P pixel blocks, the difference between the actual pixel block of the B-frame video image and the final predicted pixel block is encoded for transmission. The decoder then uses this pixel block difference to reconstruct the original pixel block accurately.
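The simple-averaging case can be sketched with integer arithmetic. The rounding rule (a + b + 1) >> 1 matches H.264's default (unweighted) bi-prediction; treating it as the combining function here is an assumption, since the text leaves the function open.

```python
def bipredict(pred_fwd, pred_bwd):
    """Average the forward and backward predicted blocks, rounding halves up."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_f, row_b)]
            for row_f, row_b in zip(pred_fwd, pred_bwd)]


def b_residual(actual, pred_fwd, pred_bwd):
    """Prediction error of a B pixel block against the averaged prediction."""
    final = bipredict(pred_fwd, pred_bwd)
    return [[x - p for x, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, final)]
```

For forward samples [10, 11] and backward samples [20, 12], the final prediction is [15, 12]; an actual row of [16, 12] then leaves a residual of [1, 0] to be coded.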

As with P pixel blocks, each motion vector (MV) of a B pixel block may also be transmitted using a predictive coding method. Specifically, a predicted motion vector can be formed from some combination of neighboring motion vectors. The difference between the actual motion vector and the predicted motion vector is then encoded for transmission, and the decoder uses this difference to re-create the actual motion vector from the predicted motion vector.

For B pixel blocks, however, there is also the opportunity to interpolate motion vectors from the motion vectors of co-located or neighboring pixel blocks in the stored pictures. Such motion vector interpolation is performed in both the digital video encoder and the digital video decoder. (Recall that a digital video encoder always includes a digital video decoder.)

In some cases, the interpolated motion vector is good enough to be used without any correction. In such cases, no motion vector data needs to be sent. In the ITU H.263 and H.264 digital video coding standards, this is called "direct mode."
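One way to interpolate the motion vectors is to scale the co-located block's motion vector by temporal distance, as in H.264's temporal direct mode. The sketch below uses plain floating-point division for readability; the standard itself uses a clipped fixed-point scale factor, so this is an illustration of the idea rather than a bit-exact implementation.

```python
def temporal_direct(mv_col, tb, td):
    """Derive forward/backward MVs for a direct-mode B block.

    mv_col: MV of the co-located block in the later reference picture,
            pointing back to the earlier reference picture.
    tb: temporal distance from the earlier reference to the current B picture.
    td: temporal distance between the two reference pictures.
    """
    fwd = (mv_col[0] * tb / td, mv_col[1] * tb / td)   # scaled toward the earlier picture
    bwd = (fwd[0] - mv_col[0], fwd[1] - mv_col[1])     # remainder points to the later picture
    return fwd, bwd
```

For B₃ in Figure 2, which lies two frames after I₁ and two before P₅ (tb = 2, td = 4), a co-located motion vector of (8, -4) yields a forward vector of (4, -2) and a backward vector of (-4, 2), with no motion vector bits transmitted.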

This motion vector interpolation technique works particularly well on video sequences captured by a camera slowly panning across a static background. In such cases, motion vector interpolation alone is often good enough, which means that no differential motion vector information needs to be computed or transmitted for the B pixel blocks coded with interpolated motion vectors.

Pixel Block Coding

Pixel blocks within each video image can also be coded in different ways. For example, a pixel block may be divided into smaller sub-blocks, with a motion vector computed and transmitted for each sub-block. The sub-blocks may also vary in shape and need not be square.

Within a P-picture or B-picture, some pixel blocks can be coded more efficiently without motion compensation if no closely matching pixel block is found in the stored reference pictures. Such pixel blocks are then coded as intra pixel blocks (I pixel blocks). In a B-picture, some pixel blocks are better coded using unidirectional rather than bidirectional motion compensation. Depending on whether the closest-matching pixel block is found in an earlier or a later video image, those pixel blocks are coded as forward-predicted pixel blocks (P pixel blocks) or backward-predicted pixel blocks.

Before transmission, the prediction error of a pixel block or sub-block is typically transformed by an orthogonal transform, such as the discrete cosine transform or an approximation of it. The result of the transform is a set of transform coefficients equal in number to the pixels in the transformed pixel block or sub-block. At the receiver/decoder, the received transform coefficients are inverse-transformed to recover the prediction error values used later in decoding. Not all of the transform coefficients need to be transmitted to obtain acceptable video quality. Depending on the available transmission bit rate, half or sometimes more than half of the transform coefficients may be discarded and not transmitted. At the decoder, the discarded coefficient values are replaced with 0 before the inverse transform.
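The forward and inverse transforms, and the zero-filling of discarded coefficients, can be sketched with an orthonormal 2-D DCT-II computed as C · X · Cᵀ. This is a plain-Python illustration, not the integer approximation a real H.264 encoder uses; the rule for which coefficients to discard (keep only those with row + column index below a cutoff) is an assumption chosen for simplicity.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    m = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        m.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return m


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2-D transform: C · X · Cᵀ."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D transform: Cᵀ · Y · C (C is orthogonal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def drop_high_frequencies(coeffs, keep):
    """Replace discarded coefficients (row + column >= keep) with 0."""
    return [[v if i + j < keep else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(coeffs)]
```

A flat 4×4 block of value 10 transforms to a single nonzero DC coefficient of 40, which is why smooth prediction errors compress so well; `idct2(dct2(x))` recovers `x` up to floating-point error.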

In addition, before transmission the transform coefficients are typically quantized and entropy-coded, as shown in Figure 1. Quantization represents the transform coefficient values using a limited subset of possible values, which reduces the precision of the transmitted values. Quantization also often turns small transform coefficient values into 0, further reducing the number of transform coefficient values that must be transmitted.

In the quantization step, each transform coefficient value is typically divided by a quantizer step size Q and rounded to the nearest integer. For example, an original transform coefficient C can be quantized to a quantized coefficient value C_q using the following formula:

$$C_q = \operatorname{trunc}\left(\frac{C + Q/2}{Q}\right)$$
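The quantization formula and its inverse can be sketched directly. The formula rounds to the nearest integer only for non-negative C, so this sketch applies it to the magnitude and restores the sign afterwards; that symmetric treatment of negative coefficients is an assumption beyond the text.

```python
def quantize(c, q):
    """C_q = trunc((|C| + Q/2) / Q), with the sign of C restored (assumed)."""
    sign = -1 if c < 0 else 1
    return sign * int((abs(c) + q / 2) / q)


def dequantize(cq, q):
    """Decoder-side reconstruction: multiply back by the step size."""
    return cq * q
```

With Q = 8, a coefficient of 37 quantizes to 5 and is reconstructed as 40, while a coefficient of 3 quantizes to 0 and is not transmitted at all; this is the source of the zero coefficients mentioned above.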

After the quantization step, these integers are entropy-coded using a variable-length code, such as a Huffman code, or using arithmetic coding. Because many transform coefficient values are truncated to 0, the quantization and variable-length coding steps together yield a large amount of compression.
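A Huffman code built from the frequencies of the quantized values shows why the many zeros compress well: the most frequent value receives the shortest codeword. This is a generic textbook Huffman construction using Python's `heapq`, not the specific variable-length code tables of any MPEG standard; the sample symbol counts are illustrative only.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bit string} from a symbol list."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()   # keeps heap entries comparable without comparing dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 to their codes.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]
```

For 16 quantized values consisting of ten 0s, three 1s, two -1s, and one 2, the code assigns 1 bit to 0, 2 bits to 1, and 3 bits each to -1 and 2, for 25 bits in total instead of 32 with a fixed 2-bit code.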

Selecting Bit Rate and Distortion Values Using the Lagrangian Function

A digital video encoder must determine, among all possible coding methods (or coding modes), the best one for encoding each pixel block in a video image. This is generally called the mode selection problem, and different digital video encoder implementations use many different methods to address it. The combination of transform coefficient deletion, quantization of the transmitted transform coefficients, and mode selection reduces the bit rate R needed for transmission. However, these bit-rate-reduction techniques also introduce distortion D into the decoded video images.

Ideally, when designing a video encoder, one would like either to fix the bit rate R at a constant value and minimize the coding distortion D, or to fix the coding distortion D at a constant value and minimize the bit rate R. However, especially at the pixel block level, the bit rate R and/or distortion D can differ considerably from the desired fixed values, which makes such constrained optimization approaches impractical.

What can be done instead is to use a Lagrange multiplier to convert the constrained optimization problem into an unconstrained one. Thus, rather than fixing one variable (bit rate R or distortion D) and optimizing the other, one can simply minimize the Lagrangian expression:

D + lambda × R

where lambda is the Lagrange multiplier. Thus, for each pixel block in the video image, the encoder selects the pixel block coding mode that minimizes D + lambda × R.
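As a minimal sketch of this per-block decision, the Python below assumes each candidate mode has already been trial-encoded into a hypothetical `ModeResult` record holding its distortion D and bit count R; the mode names and numbers are illustrative only, not taken from any real encoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeResult:
    """Outcome of trial-encoding one pixel block in one candidate mode (hypothetical)."""
    name: str
    distortion: float  # D, e.g. sum of squared errors for this block
    bits: float        # R, bits needed to code this block in this mode


def lagrangian_cost(result: ModeResult, lam: float) -> float:
    """Unconstrained cost J = D + lambda * R."""
    return result.distortion + lam * result.bits


def select_mode(candidates: list[ModeResult], lam: float) -> ModeResult:
    """Return the candidate mode with the smallest Lagrangian cost."""
    return min(candidates, key=lambda r: lagrangian_cost(r, lam))


# Illustrative trial-encoding results for a single pixel block.
modes = [
    ModeResult("intra", distortion=120.0, bits=300.0),
    ModeResult("inter_16x16", distortion=180.0, bits=90.0),
    ModeResult("skip", distortion=400.0, bits=1.0),
]
for lam in (0.1, 1.0, 10.0):
    print(lam, select_mode(modes, lam).name)  # intra, then inter_16x16, then skip
```

As lambda grows, each bit becomes more expensive, so the choice shifts toward cheaper but more distorted modes.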

In theory, full optimization of each individual video image is achieved by encoding it repeatedly with all possible lambda values, each lambda yielding a {D, R} pair. From these pairs, the lambda value that gives the desired bit rate R (or distortion D), together with the corresponding distortion D (or bit rate R), can be found. The video image is then encoded one final time with the selected lambda value, which produces the desired result.
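The exhaustive procedure can be sketched as follows. `encode_image` is a toy stand-in for a real encoder: each block is a hypothetical list of (D, R) pairs, one per mode, and a "full encode" simply picks the minimum-cost mode in every block. The lambda grid and the target rate are assumptions chosen for illustration.

```python
def encode_image(blocks, lam):
    """Toy full encode: choose the min-cost mode in every block and sum D and R.

    `blocks` is a list of blocks; each block is a list of (D, R) pairs, one per mode.
    """
    total_d = total_r = 0.0
    for modes in blocks:
        d, r = min(modes, key=lambda m: m[0] + lam * m[1])
        total_d += d
        total_r += r
    return total_d, total_r


def sweep_for_target_rate(blocks, lambdas, target_bits):
    """Encode once per candidate lambda and return (lambda, D, R) for the encoding
    with the largest rate that does not exceed target_bits, or None if none fits."""
    best = None
    for lam in lambdas:
        d, r = encode_image(blocks, lam)
        if r <= target_bits and (best is None or r > best[2]):
            best = (lam, d, r)
    return best


blocks = [
    [(120.0, 300.0), (180.0, 90.0), (400.0, 1.0)],
    [(50.0, 200.0), (90.0, 60.0), (300.0, 2.0)],
]
print(sweep_for_target_rate(blocks, [0.1, 0.3, 1.0, 3.0, 10.0], target_bits=200.0))
# (0.3, 270.0, 150.0)
```

The selected lambda would then drive the final encode. When several lambdas give the same rate, this sketch keeps the first one found; the cost of one full encode per lambda is exactly why the next paragraph calls the approach impractical per picture.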

In practice, this ideal approach is usually too complex and resource-intensive to carry out for every video image. To determine an approximate relationship between lambda, distortion D, and quantizer Q, the usual practice is to run many preliminary experiments on multiple video images, using the full optimization method over a wide range of lambda values.

In these experiments, it is often advantageous to hold the quantizer Q constant while varying the Lagrange multiplier lambda. If Q is held constant in each experiment, the result is a family of R-D curves, one curve for each value of Q. Figure 4 depicts an example of such a family of R-D curves. On each constant-Q curve, the slope at the {R, D} point obtained with a particular value of lambda is -lambda. The optimal {R, D} relationship is obtained by taking the minimum over all the R-D curves.
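Taking the minimum over the constant-Q curves amounts to keeping only the {R, D} points that no other point beats. A minimal sketch, assuming the experiments produced hypothetical (R, D, Q) samples:

```python
def lower_envelope(points):
    """Return the {R, D} points not beaten by any other point.

    `points` holds (R, D, Q) tuples sampled from constant-Q curves. After sorting
    by rate, a point is kept only if its distortion is lower than that of every
    point with equal or smaller rate.
    """
    frontier = []
    best_d = float("inf")
    for r, d, q in sorted(points):
        if d < best_d:
            frontier.append((r, d, q))
            best_d = d
    return frontier


samples = [(100, 50.0, 10), (120, 55.0, 10), (150, 30.0, 8), (150, 40.0, 9), (200, 35.0, 9)]
print(lower_envelope(samples))  # [(100, 50.0, 10), (150, 30.0, 8)]
```

The Q recorded with each surviving point indicates which quantizer is best in that part of the rate range.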

Then, for each quantizer value Q, a representative lambda value lambda_Q is selected. For example, lambda_Q may be the value that yields a distortion D midway between the intersection points with the Q+1 and Q-1 curves in Figure 4. Other ways to select a representative lambda value include lambda_Q = 0.85 × Q² and lambda_Q = 0.85 × 2^(Q/3). For B-pictures, a larger value of lambda_Q is usually chosen. Thus, we have
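The two closed-form choices can be written as below. Reading the second formula's exponent as Q/3 is an assumption about the original notation.

```python
def lambda_q_quadratic(q: float) -> float:
    """lambda_Q = 0.85 * Q**2."""
    return 0.85 * q ** 2


def lambda_q_exponential(q: float) -> float:
    """lambda_Q = 0.85 * 2**(Q/3), reading the exponent as Q/3."""
    return 0.85 * 2.0 ** (q / 3.0)


for q in (3, 6, 12):
    print(q, lambda_q_quadratic(q), round(lambda_q_exponential(q), 3))
```

In the exponential form, each increase of Q by 3 doubles lambda_Q, whereas the quadratic form grows polynomially; which one fits better depends on how Q maps to step size in the codec.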

lambda_Q = f(Q)

D_Q = g(Q), from which Q = h(D_Q) can be obtained (h being the inverse of g)

Then, to encode a sequence of video images with a desired distortion D, one first finds the nearest D_Q, from which Q = h(D_Q) is obtained. The video images are then encoded with the corresponding lambda_Q = f(Q), which provides the optimal bit rate R for the distortion D_Q.
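A minimal sketch of this table lookup, assuming the offline experiments were stored as a table of D_Q = g(Q). The Q²/12 values (the usual uniform-quantizer noise approximation) and the choice f(Q) = 0.85 × Q² are placeholders, not measured data.

```python
# Hypothetical calibration table D_Q = g(Q). A real encoder would fill this from
# the offline experiments described above; Q**2 / 12 is only a placeholder.
G_TABLE = {q: q * q / 12.0 for q in range(1, 32)}


def f(q: int) -> float:
    """Nominal Lagrange multiplier lambda_Q = f(Q); here 0.85 * Q**2."""
    return 0.85 * q * q


def h(desired_d: float, table: dict[int, float] = G_TABLE) -> tuple[int, float]:
    """Find the tabulated D_Q nearest to desired_d and return (Q, D_Q), i.e. Q = h(D_Q)."""
    q = min(table, key=lambda k: abs(table[k] - desired_d))
    return q, table[q]


def picture_parameters(desired_d: float) -> tuple[int, float]:
    """Return the starting (Q, lambda) for a picture with target distortion desired_d."""
    q, _d_q = h(desired_d)
    return q, f(q)


print(picture_parameters(8.4))
```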

In many applications, the resulting bit rate R may be too large or too small, so rate control is needed to ensure that neither buffer overflow nor buffer underflow occurs. As in most rate control algorithms, the usual approach is to vary the quantizer Q from pixel block to pixel block and/or from video image to video image. When there are signs that the encoder buffer may become too full (and possibly overflow), Q is increased to reduce the bit rate R. When the encoder buffer is likely to become too empty (and possibly underflow), Q is decreased to increase the bit rate R.

However, a change in Q may cause too large a change in the bit rate R. Moreover, changes in Q must be signaled to the decoder, which adds to the number of bits that must be transmitted. Changing Q may also have other effects on video image quality, for example through in-loop filtering.

An alternative to changing Q for rate control is to change the Lagrange multiplier lambda. Smaller values of lambda lead to a larger bit rate R (and smaller distortion D); likewise, larger values of lambda reduce R (and increase D). Changes in lambda can be arbitrarily small, whereas Q is digitized and coded and is therefore restricted to certain values. In many digital video compression and coding systems, including all the MPEG video compression and coding standards, not every integer value of Q is allowed to be sent, in which case the abrupt changes in bit rate R caused by changing Q can be even more pronounced.

When lambda would have to exceed a threshold lambda_max(Q) to achieve a required bit rate reduction, Q is increased instead, and with the newly increased Q, lambda returns to its nominal value f(Q). When lambda would have to fall below a threshold lambda_min(Q) to achieve a required bit rate increase, Q is decreased, and with the newly decreased Q, lambda returns to its nominal value f(Q).

The values of lambda_max(Q) and lambda_min(Q) are determined by the intersection points of the bit rate-distortion curves in Figure 4. If D(lambda, Q) denotes the distortion obtained when encoding with Lagrange multiplier lambda and quantizer step size Q, the operating relations are:

D(lambda_min(Q+1), Q+1) = D(lambda_max(Q), Q)

lambda_min(Q) <= f(Q) <= lambda_max(Q)
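The first relation says that stepping from (lambda_max(Q), Q) to (lambda_min(Q+1), Q+1) leaves the distortion unchanged. Under a purely hypothetical distortion model `toy_distortion` (not from the text), lambda_min(Q+1) can be solved for by bisection:

```python
def toy_distortion(lam: float, q: int) -> float:
    """Hypothetical model of D(lambda, Q): rises with Q and, more gently, with lambda.

    For a fixed Q it runs from Q**2/12 (lambda = 0) toward Q**2/6 (lambda -> inf).
    """
    return (q * q / 12.0) * (1.0 + lam / (lam + 50.0))


def lambda_min_next(q: int, lambda_max_q: float, hi: float = 1e6, iters: int = 200) -> float:
    """Solve D(lambda_min(Q+1), Q+1) = D(lambda_max(Q), Q) for lambda_min(Q+1).

    Bisection works because the model is increasing in lambda.
    """
    target = toy_distortion(lambda_max_q, q)
    lo = 0.0
    if target < toy_distortion(lo, q + 1):
        raise ValueError("lambda_max(Q) too small: the Q and Q+1 curves do not meet")
    if target > toy_distortion(hi, q + 1):
        raise ValueError("no solution below hi")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if toy_distortion(mid, q + 1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lmin_11 = lambda_min_next(10, 40.0)
print(round(lmin_11, 3), toy_distortion(lmin_11, 11), toy_distortion(40.0, 10))
```

The `ValueError` branch shows a consequence of the relation under this model: lambda_max(Q) must be large enough for the distortion ranges of Q and Q+1 to overlap.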

The detailed operation of such a rate control algorithm for a video coding system is set forth in the following pseudocode:

Start_encoding_picture:                        // start encoding a video image
  input desired D;                             // get the desired distortion value D
  find D_Q nearest to D;                       // find the D_Q value closest to the desired D
  Q = h(D_Q);                                  // determine the quantizer value Q
  lambda = f(Q);                               // determine the Lagrange multiplier lambda
start_encoding_pixelblock:                     // start encoding a pixel block of the image
  code_pixelblock(lambda, Q);                  // encode the pixel block using lambda and Q
  if (encoder_buffer > Tfull) {                // is the buffer approaching overflow?
    lambda = lambda + deltalambda;             // yes, so increase lambda; deltalambda may depend on Q
    if (lambda > lambda_max(Q)) {              // if lambda is too large, increase Q
      Q = Q + deltaQ;                          // increase the quantizer step size Q
      lambda = f(Q);                           // set the new Lagrange multiplier lambda
    }
  }
  if (encoder_buffer < Tempty) {               // is the buffer approaching underflow?
    lambda = lambda - deltalambda;             // yes, so decrease lambda
    if (lambda < lambda_min(Q)) {              // if lambda is too small, decrease Q
      Q = Q - deltaQ;                          // decrease the quantizer step size Q
      lambda = f(Q);                           // set the new Lagrange multiplier lambda
    }
  }
  if (not last pixelblock) goto start_encoding_pixelblock;   // proceed to the next pixel block
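The pseudocode can be sketched as a small Python controller. Here f, lambda_min and lambda_max are supplied by the caller, and the example's f(Q) = 0.85 × Q², the ±20 % thresholds, Tfull/Tempty and deltalambda are illustrative assumptions, not values from the text. Buffer fullness is assumed to be a fraction of capacity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RateController:
    """Buffer-driven lambda/Q control loop following the pseudocode above."""
    f: Callable[[int], float]
    lambda_min: Callable[[int], float]
    lambda_max: Callable[[int], float]
    t_full: float
    t_empty: float
    delta_lambda: float
    delta_q: int = 1
    q: int = 0
    lam: float = 0.0

    def start_picture(self, q: int) -> None:
        """The caller computes Q = h(D_Q); lambda starts at its nominal f(Q)."""
        self.q = q
        self.lam = self.f(q)

    def after_block(self, buffer_fullness: float) -> None:
        """Update lambda (and, if needed, Q) after coding one pixel block."""
        if buffer_fullness > self.t_full:           # approaching overflow
            self.lam += self.delta_lambda
            if self.lam > self.lambda_max(self.q):  # lambda too large: raise Q
                self.q += self.delta_q
                self.lam = self.f(self.q)
        if buffer_fullness < self.t_empty:          # approaching underflow
            self.lam -= self.delta_lambda
            if self.lam < self.lambda_min(self.q):  # lambda too small: lower Q
                self.q -= self.delta_q
                self.lam = self.f(self.q)


def f(q: int) -> float:
    return 0.85 * q * q


rc = RateController(f=f, lambda_min=lambda q: 0.8 * f(q), lambda_max=lambda q: 1.2 * f(q),
                    t_full=0.8, t_empty=0.2, delta_lambda=10.0)
rc.start_picture(10)
for fullness in (0.9, 0.9, 0.5):
    rc.after_block(fullness)
    print(rc.q, round(rc.lam, 2))  # 10 95.0, then 11 102.85 twice
```

In an encoder, the `goto` loop becomes `for block in picture: code_pixelblock(rc.lam, rc.q); rc.after_block(buffer_fullness)`.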

Variations of this basic rate control algorithm may use several thresholds on the encoder buffer occupancy: if the encoder buffer exceeds the Tfull threshold by a large amount, Q can be increased immediately, without waiting for lambda to exceed its threshold. Similarly, if the encoder buffer falls well below the Tempty threshold, Q can be decreased immediately. Alternatively, the step size deltalambda can be increased when the encoder buffer greatly exceeds Tfull or falls well below Tempty.
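The two-tier variant can be sketched as a classification of buffer fullness. The single `margin` separating "exceeds" from "greatly exceeds" is a hypothetical parameter; the text does not specify how the outer thresholds are chosen.

```python
def buffer_action(fullness: float, t_empty: float, t_full: float, margin: float) -> str:
    """Map buffer fullness to a control action using two tiers of thresholds."""
    if fullness > t_full + margin:
        return "increase_q_now"     # far above Tfull: skip the lambda stage
    if fullness > t_full:
        return "increase_lambda"    # mildly above Tfull: nudge lambda first
    if fullness < t_empty - margin:
        return "decrease_q_now"     # far below Tempty
    if fullness < t_empty:
        return "decrease_lambda"
    return "no_change"


for level in (0.95, 0.85, 0.5, 0.15, 0.05):
    print(level, buffer_action(level, t_empty=0.2, t_full=0.8, margin=0.1))
```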

The values of deltalambda and deltaQ may vary with Q or with the picture type (I-, P-, or B-picture). Also, the additive increase of lambda can be replaced by a multiplication that changes lambda by a certain percentage. For example, lambda can be increased using:

lambda = (1 + deltalambda) × lambda

Similarly, lambda can be decreased using:

lambda = (1 - deltalambda) × lambda
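The multiplicative update is a one-liner; here `delta` plays the role of deltalambda expressed as a fraction (0.05 means 5 %).

```python
def update_lambda_multiplicative(lam: float, delta: float, increase: bool) -> float:
    """Scale lambda up by (1 + delta) or down by (1 - delta)."""
    return (1.0 + delta) * lam if increase else (1.0 - delta) * lam


up = update_lambda_multiplicative(100.0, 0.1, increase=True)    # about 110
down = update_lambda_multiplicative(up, 0.1, increase=False)    # about 99, not 100
print(up, down)
```

Note that an increase followed by a decrease does not restore the original value, because (1 + delta)(1 - delta) = 1 - delta²; the drift is small for small delta.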

This simple rate control algorithm illustrates the use of a varying lambda in this application. Other, more complex rate control algorithms have also been devised, and they too may benefit from a varying Lagrange multiplier lambda.

Visual Distortion Tradeoff

Another application of a varying Lagrange multiplier lambda is in the use of visual distortion criteria. Distortion D is usually measured by summing the squared errors between the original and decoded pixel values. However, this simple measure does not adequately account for the actual visibility of pixel errors in video images. As a result, minimizing D + lambda × R with it can give less than optimal results. Algorithms that take subjective effects into account are therefore often more useful.

The visibility of coding noise can be taken into account by computing a visual masking value M for each pixel block or sub-block to be coded in the video image. M is based on the spatial and temporal variation of the pixels within the region.
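The text only states that M depends on the spatial and temporal variation of the pixels. The formula below (1 plus the log of a weighted sum of spatial variance and mean squared frame difference, with hypothetical weights `k_spatial` and `k_temporal`) is an assumed example of such a measure, chosen so that M = 1 means no masking.

```python
import math


def visual_masking(block, prev_block, k_spatial=1.0, k_temporal=1.0):
    """Hypothetical masking value M for one pixel block (M >= 1; larger = more masking).

    block, prev_block: equal-sized 2-D lists of luma samples for the current block
    and the co-located block in the previous picture.
    """
    pixels = [p for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    spatial = sum((p - mean) ** 2 for p in pixels) / n  # spatial variance
    temporal = sum(                                     # mean squared frame difference
        (c - p) ** 2
        for row_c, row_p in zip(block, prev_block)
        for c, p in zip(row_c, row_p)
    ) / n
    # Log compression keeps M in a modest range for very busy blocks.
    return 1.0 + math.log1p(k_spatial * spatial + k_temporal * temporal)


flat = [[128] * 4 for _ in range(4)]
busy = [[0, 255, 0, 255] for _ in range(4)]
print(visual_masking(flat, flat))                                # 1.0: flat, static block
print(visual_masking(busy, busy) > visual_masking(flat, flat))   # True
```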

A larger visual masking value M indicates stronger masking, which makes distortion harder to detect visually. In such regions, the distortion D can be allowed to increase and the bit rate R can be reduced. This is conveniently done by using M × lambda instead of lambda alone in the coding optimization algorithm. The following pseudocode describes the modified algorithm.

Start_encoding_picture:                        // start encoding a video image
  input desired D;                             // get the desired distortion value D
  find D_Q nearest to D;                       // find the D_Q value closest to the desired D
  Qnorm = h(D_Q);                              // determine the normal Q without masking
  lambda = f(Qnorm);                           // determine the Lagrange multiplier lambda
start_encoding_pixelblock:                     // start encoding a pixel block of the image
  Q = Qnorm;                                   // reset Q to the normal Q without masking
  calculate visual mask M;                     // determine the amount of visual masking
  while (M × lambda > lambda_max(Q)) {         // if there is strong masking, increase Q
    Q = Q + deltaQ;                            // increase the quantizer step size Q
  }
  code_pixelblock(M × lambda, Q);              // encode using M × lambda and Q
  if (encoder_buffer > Tfull) {                // is the buffer approaching overflow?
    lambda = lambda + deltalambda;             // yes, so increase lambda
    if (lambda > lambda_max(Qnorm)) {          // test lambda
      Qnorm = Qnorm + deltaQ;                  // if lambda is too large, increase Qnorm
      lambda = f(Qnorm);                       // compute the new lambda
    }
  }
  if (encoder_buffer < Tempty) {               // is the buffer approaching underflow?
    lambda = lambda - deltalambda;             // yes, so decrease lambda
    if (lambda < lambda_min(Qnorm)) {          // test lambda
      Qnorm = Qnorm - deltaQ;                  // if lambda is too small, decrease Qnorm
      lambda = f(Qnorm);                       // compute the new lambda
    }
  }
  if (not last pixelblock) goto start_encoding_pixelblock;   // proceed to the next pixel block
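A Python sketch of the two per-block steps of the masking algorithm follows. `f`, `lambda_min` and `lambda_max` are the same hypothetical choices as in the earlier examples (f(Q) = 0.85 × Q², thresholds at ±20 %), and `q_max` is an assumed cap that guarantees the `while` loop terminates.

```python
from typing import Callable


def masked_block_params(lam: float, q_norm: int, m: float,
                        lambda_max: Callable[[int], float],
                        delta_q: int = 1, q_max: int = 31) -> tuple[float, int]:
    """Return (M * lambda, Q) for one block: starting from Qnorm, raise Q while
    M * lambda exceeds lambda_max(Q), but never beyond the hypothetical cap q_max."""
    q = q_norm
    while m * lam > lambda_max(q) and q < q_max:
        q += delta_q
    return m * lam, q


def update_nominal(lam, q_norm, fullness, f, lambda_min, lambda_max,
                   t_full, t_empty, delta_lambda, delta_q=1):
    """Buffer-driven update of the nominal (lambda, Qnorm) pair after one block."""
    if fullness > t_full:                 # approaching overflow
        lam += delta_lambda
        if lam > lambda_max(q_norm):
            q_norm += delta_q
            lam = f(q_norm)
    if fullness < t_empty:                # approaching underflow
        lam -= delta_lambda
        if lam < lambda_min(q_norm):
            q_norm -= delta_q
            lam = f(q_norm)
    return lam, q_norm


def f(q: int) -> float:
    return 0.85 * q * q


def lambda_max(q: int) -> float:
    return 1.2 * f(q)


def lambda_min(q: int) -> float:
    return 0.8 * f(q)


print(masked_block_params(f(10), 10, 1.0, lambda_max))  # (85.0, 10): no extra masking
print(masked_block_params(f(10), 10, 2.0, lambda_max))  # (170.0, 13): strong masking
```

Keeping lambda tied to Qnorm, rather than to the per-block Q, means a heavily masked block does not permanently push the picture-level quantizer upward.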

This second simple algorithm, for visual masking, illustrates the use of a varying lambda in this application. Other, more complex visual masking algorithms have also been devised, and they too may benefit from a varying Lagrange multiplier lambda.

Varying the Lagrange multiplier lambda can also be useful in other coding decisions. For example, when encoding a sequence of video images, deciding how many B-pictures to code is often very difficult. For a particular quantizer value Q and lambda_Q = f(Q), coding with one B-picture per P-picture may yield R₁, D₁, whereas coding with two B-pictures per P-picture may yield R₂, D₂.

If R₂ < R₁ and D₂ < D₁, then two B-pictures is clearly the better choice. Often, however, the result is R₂ < R₁ and D₂ > D₁, so it is unclear which number of B-pictures is better. In that case, we can re-encode with two B-pictures per P-picture using a smaller lambda, chosen so that D₂ is approximately equal to D₁. We can then simply compare the resulting R₂ with R₁ to see which bit rate is lower.
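The decision can be sketched with a hypothetical `encode(num_b, lam) -> (R, D)` callable standing in for re-encoding the sequence. The toy encoder (R = a / (1 + lambda), D = c × (1 + lambda)) and the handling of the mirror case (one B-picture has the lower rate but higher distortion) are assumptions; the text describes only the R₂ < R₁, D₂ > D₁ case.

```python
from typing import Callable, Tuple

# encode(num_b, lam) -> (R, D), a stand-in for re-encoding the sequence with
# num_b B-pictures per P-picture at Lagrange multiplier lam (hypothetical).
Encoder = Callable[[int, float], Tuple[float, float]]


def match_distortion(encode: Encoder, num_b: int, target_d: float,
                     lam_hi: float, iters: int = 60) -> float:
    """Bisect for lambda in [0, lam_hi] so that D(num_b, lambda) ~= target_d,
    assuming D increases with lambda."""
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if encode(num_b, mid)[1] > target_d:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def choose_num_b(encode: Encoder, lam: float) -> int:
    """Return 1 or 2 B-pictures per P-picture, following the comparison in the text."""
    r1, d1 = encode(1, lam)
    r2, d2 = encode(2, lam)
    if r2 <= r1 and d2 <= d1:
        return 2  # two B-pictures win on both rate and distortion
    if r1 <= r2 and d1 <= d2:
        return 1  # one B-picture wins on both
    if d2 > d1:
        # R2 < R1 but D2 > D1: re-encode the two-B case with a smaller lambda
        # so that D2 ~= D1, then compare rates.
        r2, _ = encode(2, match_distortion(encode, 2, d1, lam))
    else:
        # Mirror case (an assumption, not in the text): match D1 to D2 instead.
        r1, _ = encode(1, match_distortion(encode, 1, d2, lam))
    return 2 if r2 < r1 else 1


def make_toy_encoder(a1: float, c1: float, a2: float, c2: float) -> Encoder:
    """Toy model with R = a / (1 + lam) and D = c * (1 + lam), so R * D = a * c."""
    params = {1: (a1, c1), 2: (a2, c2)}
    return lambda n, lam: (params[n][0] / (1.0 + lam), params[n][1] * (1.0 + lam))


print(choose_num_b(make_toy_encoder(1000.0, 1.0, 800.0, 1.1), lam=9.0))  # 2
print(choose_num_b(make_toy_encoder(1000.0, 1.0, 950.0, 1.1), lam=9.0))  # 1
```

In both example calls the first encode is ambiguous (R₂ < R₁ but D₂ > D₁); only after matching distortions does the rate comparison decide.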

Other alternatives can be compared in a similar way, for example interlaced versus progressive coding, coding with different motion search ranges, coding with or without a certain coding mode, and so on.

In summary, we have provided a simple but effective method for managing the rate-distortion tradeoff, with many applications in video coding. The foregoing describes a system for controlling the rate-distortion tradeoff through coding mode selection in multimedia compression and coding systems. It is contemplated that changes and modifications may be made by one of ordinary skill in the art to the materials and arrangements of elements of the present invention without departing from the scope of the invention.

Claims

1. A method of controlling rate-distortion in a video compression and coding system, said method comprising:

selecting a distortion value D around an expected distortion value;

using said distortion value D to determine a quantizer value Q;

computing a Lagrangian multiplier lambda using said quantizer value Q;

encoding pixel blocks using said Lagrangian multiplier lambda and said quantizer value Q;

when an occupancy value of a buffer exceeds an overflow threshold, increasing the Lagrangian multiplier lambda, and when the Lagrangian multiplier lambda exceeds a maximum lambda threshold, increasing the quantizer value Q, wherein said buffer is used to store a digital bit stream; and

when the occupancy value of the buffer falls below an underflow threshold, decreasing the Lagrangian multiplier lambda, and when the Lagrangian multiplier lambda falls below a minimum lambda threshold, decreasing the quantizer value Q.

2. The method according to claim 1, further comprising recalculating the Lagrangian multiplier lambda when the quantizer value Q is adjusted.

3. The method of claim 1, wherein the Lagrangian multiplier lambda is increased or decreased by an amount that depends on the quantizer value Q.

4. A method of controlling rate-distortion in a video compression and coding system, said method comprising:

selecting a distortion value D around an expected distortion value;

using said distortion value D to determine a quantizer value Q;

computing a Lagrangian multiplier lambda using said quantizer value Q;

calculating, for a pixel block, a visual masking value M based on temporal and spatial variables of the pixels in the pixel block to quantify the visual masking of the pixel block, wherein the visual masking of the pixel block represents the visibility of coding noise in the pixel block; and

increasing the quantizer value Q when the visual masking value M multiplied by the Lagrangian multiplier lambda is greater than a maximum threshold of the Lagrangian multiplier lambda.

5. The method of claim 4, wherein a minimum threshold lambda_min based on a first quantizer value Q+1 and a maximum threshold lambda_max based on a second quantizer value Q are related by the bit rate-distortion relationship:

D(lambda_min(Q+1), Q+1)=D(lambda_max(Q), Q).

6. The method of claim 4, further comprising:

7. The method according to claim 6, further comprising recalculating the Lagrangian multiplier lambda if the quantizer value Q is adjusted.

8. An apparatus for controlling rate distortion, the apparatus comprising:

means for selecting a distortion value D around an expected distortion value;

means for determining a quantizer value Q using said distortion value D;

means for calculating a Lagrangian multiplier lambda using said quantizer value Q;

means for encoding a block of pixels using said Lagrangian multiplier lambda and said quantizer value Q;

means for increasing the Lagrangian multiplier lambda when an occupancy value of a buffer exceeds an overflow threshold, and for increasing the quantizer value Q when the Lagrangian multiplier lambda exceeds a maximum lambda threshold, wherein the buffer is used to store a digital bit stream; and

means for decreasing the Lagrangian multiplier lambda when the occupancy value of the buffer falls below an underflow threshold, and for decreasing said quantizer value Q when the Lagrangian multiplier lambda falls below a minimum lambda threshold.

9. The apparatus of claim 8, further comprising:

means for recomputing said Lagrangian multiplier lambda when said quantizer value Q is adjusted.

10. The apparatus of claim 8, wherein the Lagrangian multiplier lambda is increased or decreased by an amount that depends on the quantizer value Q.

11. An apparatus for controlling rate distortion, the apparatus comprising:

means for selecting a distortion value D around an expected distortion value;

means for determining a quantizer value Q using said distortion value D;

means for calculating, for a pixel block, a visual masking value M based on temporal and spatial variables of the pixels in said pixel block to quantify the visual masking of said pixel block, wherein the visual masking of said pixel block represents the visibility of coding noise in the pixel block; and

means for increasing said quantizer value Q when said visual masking value M multiplied by said Lagrangian multiplier lambda is greater than a maximum threshold of said Lagrangian multiplier lambda.

12. The apparatus of claim 11, wherein a minimum threshold lambda_min based on a first quantizer value Q+1 and a maximum threshold lambda_max based on a second quantizer value Q are related by the bit rate-distortion relationship:

D(lambda_min(Q+1), Q+1)=D(lambda_max(Q), Q).

13. The apparatus of claim 11, further comprising:

means for increasing the Lagrangian multiplier lambda when an occupancy value of a buffer exceeds an overflow threshold, and for increasing the quantizer value Q when the Lagrangian multiplier lambda exceeds a maximum lambda threshold, wherein the buffer is used to store a digital bit stream; and

means for decreasing the Lagrangian multiplier lambda when the occupancy value of the buffer falls below an underflow threshold, and for decreasing the quantizer value Q when the Lagrangian multiplier lambda falls below a minimum lambda threshold.

14. The apparatus of claim 13, further comprising: