HK1205841B

HK1205841B - Video coding with improved random access point picture behaviors

Info

Publication number: HK1205841B
Application number: HK15106300.5A
Authority: HK
Inventors: 王益魁
Original assignee: 高通股份有限公司
Priority date: 2012-09-20
Filing date: 2013-08-27
Publication date: 2019-12-13

Description

Video coding with improved random access point picture behaviors

This application claims the benefit of U.S. Provisional Application No. 61/703,695, filed September 20, 2012, the entire contents of which are incorporated herein by reference.

Technical Field

This disclosure relates generally to processing video data and, more particularly, to random access pictures used in video data.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. By implementing such video coding techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
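The residual, quantization, and scanning steps described above can be illustrated with a minimal Python sketch. It is not the HEVC or H.264 algorithm: the transform and entropy coding stages are left out, the quantizer is a plain uniform divider, and the function names `residual`, `quantize`, and `zigzag_scan` are hypothetical names chosen for this illustration.

```python
def residual(original, predicted):
    """Per-sample difference between the original block and its predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def quantize(block, step):
    """Uniform quantization that rounds toward zero (illustrative only)."""
    return [[int(v / step) for v in row] for row in block]


def zigzag_scan(block):
    """Serialize a square two-dimensional block into a one-dimensional list.

    Values are visited along anti-diagonals, alternating direction on each
    diagonal, which is one common scan order; real codecs define their own.
    """
    n = len(block)
    out = []
    for s in range(2 * n - 1):
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            coords.reverse()
        out.extend(block[i][j] for i, j in coords)
    return out


original = [[10, 12], [14, 16]]
predicted = [[8, 8], [8, 8]]
vector = zigzag_scan(quantize(residual(original, predicted), step=2))
```

In a real codec the residual would be transformed before quantization, so the scan would order transform coefficients rather than quantized pixel differences.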

Summary

In general, this disclosure describes techniques that provide improved support for random access point (RAP) pictures, including clean random access (CRA) pictures and broken link access (BLA) pictures, in video coding. In some cases, RAP pictures may alternatively be referred to as intra random access point (IRAP) pictures. In particular, this disclosure describes techniques for selecting the coded picture buffer (CPB) parameters used to define a CPB of a video coding device for CRA pictures or BLA pictures in a video bitstream. Either a default set or an alternative set of CPB parameters may be used to define the CPB. If the default set is used when the alternative set should have been selected, the CPB may overflow.

In one example, this disclosure is directed to a method of processing video data, the method comprising receiving a bitstream representing a plurality of pictures, the plurality of pictures including one or more of CRA pictures or BLA pictures; and receiving a message indicating whether to use an alternative set of CPB parameters for at least one of the CRA pictures or the BLA pictures. The method further comprises setting, based on the received message, a variable defined to indicate the set of CPB parameters for the one of the CRA pictures or the BLA pictures; and selecting the set of CPB parameters for the one of the CRA pictures or the BLA pictures based on the variable for the picture.

In another example, this disclosure is directed to a video coding device for processing video data, the device comprising a CPB configured to store video data; and one or more processors configured to receive a bitstream representing a plurality of pictures, the plurality of pictures including one or more of CRA pictures or BLA pictures; receive a message indicating whether to use an alternative set of CPB parameters for at least one of the CRA pictures or the BLA pictures; set, based on the received message, a variable defined to indicate the set of CPB parameters for the one of the CRA pictures or the BLA pictures; and select the set of CPB parameters for the one of the CRA pictures or the BLA pictures based on the variable for the picture.

In a further example, this disclosure is directed to a video coding device for processing video data, the device comprising means for receiving a bitstream representing a plurality of pictures, the plurality of pictures including one or more of CRA pictures or BLA pictures; means for receiving a message indicating whether to use an alternative set of CPB parameters for at least one of the CRA pictures or the BLA pictures; means for setting, based on the received message, a variable defined to indicate the set of CPB parameters for the one of the CRA pictures or the BLA pictures; and means for selecting the set of CPB parameters for the one of the CRA pictures or the BLA pictures based on the variable for the picture.

In an additional example, this disclosure is directed to a computer-readable medium comprising instructions for processing video data, the instructions, when executed, causing one or more processors to receive a bitstream representing a plurality of pictures, the plurality of pictures including one or more of CRA pictures or BLA pictures; receive a message indicating whether to use an alternative set of CPB parameters for at least one of the CRA pictures or the BLA pictures; set, based on the received message, a variable defined to indicate the set of CPB parameters for the one of the CRA pictures or the BLA pictures; and select the set of CPB parameters for the one of the CRA pictures or the BLA pictures based on the variable for the picture.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Brief Description of the Drawings

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.

FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

FIG. 4 is a block diagram illustrating an example destination device configured to operate according to a hypothetical reference decoder (HRD).

FIG. 5 is a flowchart illustrating an example operation of selecting a set of coded picture buffer (CPB) parameters based on a variable that indicates the set of CPB parameters for a particular random access point (RAP) picture in a bitstream.

FIG. 6 is a flowchart illustrating an example operation of setting a network abstraction layer (NAL) unit type for a particular RAP picture based on a variable that indicates the set of CPB parameters for the picture.

FIG. 7 is a flowchart illustrating an example operation of selecting a set of CPB parameters for a particular RAP picture based on the NAL unit type for the picture and a variable that indicates the set of CPB parameters for the picture.

FIG. 8 is a flowchart illustrating an example operation of selecting a set of CPB parameters based on a variable defined to indicate a network abstraction layer (NAL) unit type for a particular RAP picture in a bitstream.

FIG. 9 is a block diagram illustrating an example set of devices that form part of a network.

Detailed Description

This disclosure describes techniques that provide improved support for random access point (RAP) pictures, including clean random access (CRA) pictures and broken link access (BLA) pictures, in video coding. In some cases, RAP pictures may alternatively be referred to as intra random access point (IRAP) pictures. In particular, this disclosure describes techniques for selecting the coded picture buffer (CPB) parameters used to define a CPB of a video coding device for CRA pictures and BLA pictures in a video bitstream. A hypothetical reference decoder (HRD) relies on HRD parameters that include buffering period information and picture timing information. The buffering period information defines the CPB parameters, namely the initial CPB removal delay and the initial CPB removal delay offset. Either a default set or an alternative set of CPB parameters may be used to define the CPB, depending on the type of picture used to initialize the HRD. If the default set is used when the alternative set should have been selected, an HRD-conforming CPB in the video coding device may overflow.

According to the techniques, a video coding device receives a bitstream representing a plurality of pictures, the plurality of pictures including one or more CRA pictures or BLA pictures, and also receives a message indicating whether to use an alternative set of CPB parameters for each of the CRA pictures or BLA pictures. The message may be received from an external means, such as a processing means included in a streaming server, an intermediate network element, or another network entity.

The video coding device sets, based on the received message, a variable defined to indicate the set of CPB parameters for a given one of the CRA pictures or BLA pictures. The video coding device then selects the set of CPB parameters for the picture based on the variable for the given one of the CRA pictures or BLA pictures. The selected set of CPB parameters is applied to the CPB included in a video encoder or video decoder to ensure that the CPB will not overflow during video coding. In some cases, the video coding device may set a network abstraction layer (NAL) unit type for the given one of the CRA pictures or BLA pictures. The video coding device may set the NAL unit type for the picture as signaled, or it may set the NAL unit type based on the variable for the picture. The video coding device may select the set of CPB parameters for the given picture based on both the NAL unit type and the variable for the picture.
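The message-to-variable-to-selection flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the normative HRD process: the names `CpbParams`, `BufferingPeriod`, `set_use_alt_variable`, and `select_cpb_params` are hypothetical, the numeric values are made up, and falling back to the default set when no message has been received is an assumption of this sketch. The NAL unit type, which the text says may also influence the selection, is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CpbParams:
    """The two CPB parameters that the buffering period information defines."""
    initial_removal_delay: int
    initial_removal_delay_offset: int


@dataclass(frozen=True)
class BufferingPeriod:
    """Default and alternative CPB parameter sets for one CRA or BLA picture."""
    default: CpbParams
    alternative: CpbParams


def set_use_alt_variable(message_flag: Optional[bool]) -> bool:
    """Set the per-picture variable from the externally received message.

    Assumption of this sketch: if no message arrived (None), use the default set.
    """
    return bool(message_flag)


def select_cpb_params(bp: BufferingPeriod, use_alt: bool) -> CpbParams:
    """Select the CPB parameter set indicated by the variable."""
    return bp.alternative if use_alt else bp.default


# Hypothetical values in 90 kHz clock ticks, for illustration only.
bp = BufferingPeriod(default=CpbParams(90000, 0),
                     alternative=CpbParams(45000, 45000))
chosen = select_cpb_params(bp, set_use_alt_variable(True))
```

Keeping the variable separate from the selection mirrors the description, in which the variable is set once from the message and the selection step then depends only on the variable for that picture.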

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive, via computer-readable medium 16, the encoded video data to be decoded. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by an input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The illustrated system 10 of FIG. 1 is merely one example. The techniques of this disclosure may be performed by any digital video encoding and/or decoding device. Although the techniques are generally performed by a video encoding device, they may also be performed by a video encoder/decoder, typically referred to as a "codec." Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.

Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements describing characteristics and/or processing of blocks and other coded units, e.g., GOPs. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, by the ITU-T Study Group, dated March 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCUs) that include both luma and chroma samples. Syntax data within a bitstream may define a size for the LCU, which is the largest coding unit in terms of the number of pixels. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. In general, a quadtree data structure includes one node per CU, with a root node corresponding to the treeblock. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.

Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag, indicating whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively, and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is referred to as a leaf-CU. In this disclosure, four sub-CUs of a leaf-CU will also be referred to as leaf-CUs even if there is no explicit splitting of the original leaf-CU. For example, if a CU of 16×16 size is not split further, the four 8×8 sub-CUs will also be referred to as leaf-CUs although the 16×16 CU was never split.

A CU has a purpose similar to that of a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a treeblock may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf-CU. Syntax data associated with a coded bitstream may define the maximum number of times a treeblock may be split, referred to as the maximum CU depth, and may also define the minimum size of the coding nodes. Accordingly, a bitstream may also define a smallest coding unit (SCU). This disclosure uses the term "block" to refer to any of a CU, PU, or TU in the context of HEVC, or to similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
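The recursive splitting described above can be sketched as follows. This is a minimal illustration, not the HM implementation: the names `CUNode`, `split_treeblock`, `leaf_cus`, and the `want_split` callback (standing in for a decoded split flag) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CUNode:
    """One node of the CU quadtree (hypothetical structure for illustration)."""
    x: int
    y: int
    size: int
    depth: int
    children: List["CUNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def split_treeblock(x: int, y: int, size: int, depth: int, max_depth: int,
                    min_size: int, want_split: Callable[[CUNode], bool]) -> CUNode:
    """Recursively split a square block into four sub-CUs while allowed.

    A split is allowed only below the maximum CU depth and only if the
    resulting sub-CUs are not smaller than the smallest coding unit (SCU).
    """
    node = CUNode(x, y, size, depth)
    can_split = depth < max_depth and size // 2 >= min_size
    if can_split and want_split(node):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                node.children.append(
                    split_treeblock(x + dx, y + dy, half, depth + 1,
                                    max_depth, min_size, want_split))
    return node


def leaf_cus(node: CUNode) -> List[CUNode]:
    """Collect the unsplit nodes (leaf-CUs) of the quadtree."""
    if node.is_leaf:
        return [node]
    leaves: List[CUNode] = []
    for child in node.children:
        leaves.extend(leaf_cus(child))
    return leaves


# Example: a 64x64 treeblock in which only the top-left CU keeps splitting.
root = split_treeblock(0, 0, 64, 0, max_depth=3, min_size=8,
                       want_split=lambda n: n.x == 0 and n.y == 0)
leaves = leaf_cus(root)
```

In this example the leaf-CUs tile the treeblock exactly: three 32×32, three 16×16, and four 8×8 blocks.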

A CU includes a coding node and the prediction units (PUs) and transform units (TUs) associated with the coding node. The size of the CU corresponds to the size of the coding node, and the CU must be square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock, with a maximum of 64×64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU may be square or non-square (e.g., rectangular) in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size as the PUs, or smaller. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

A leaf-CU may include one or more prediction units (PUs). In general, a PU represents a spatial area corresponding to all or a portion of the corresponding CU, and may include data for retrieving a reference sample for the PU. Moreover, a PU includes data related to prediction. For example, when the PU is intra-mode encoded, data for the PU may be included in a residual quadtree (RQT), which may include data describing an intra-prediction mode for a TU corresponding to the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining a motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.

A leaf-CU having one or more PUs may also include one or more transform units (TUs). The transform units may be specified using an RQT (also referred to as a TU quadtree structure), as discussed above. For example, a split flag may indicate whether a leaf-CU is split into four transform units. Each transform unit may then be split further into additional sub-TUs. When a TU is not split further, it may be referred to as a leaf-TU. Generally, for intra coding, all the leaf-TUs belonging to a leaf-CU share the same intra-prediction mode. That is, the same intra-prediction mode is generally applied to calculate predicted values for all TUs of a leaf-CU. For intra coding, a video encoder may calculate a residual value for each leaf-TU using the intra-prediction mode, as the difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, TUs may be larger or smaller than a PU. For intra coding, a PU may be collocated with a corresponding leaf-TU for the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.

Moreover, the TUs of leaf-CUs may also be associated with respective quadtree data structures, referred to as residual quadtrees (RQTs). That is, a leaf-CU may include a quadtree indicating how the leaf-CU is partitioned into TUs. The root node of a TU quadtree generally corresponds to a leaf-CU, while the root node of a CU quadtree generally corresponds to a treeblock (or LCU). TUs of the RQT that are not split are referred to as leaf-TUs. In general, unless noted otherwise, this disclosure uses the terms CU and TU to refer to a leaf-CU and a leaf-TU, respectively.

A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data, in a header of the GOP, in a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra prediction in PU sizes of 2N×2N or N×N, and inter prediction in symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction in PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up," "Down," "Left," or "Right." Thus, for example, "2N×nU" refers to a 2N×2N CU that is partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom.
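The partition sizes listed above can be tabulated with a short sketch. The function name `pu_partitions` and the mode strings are illustrative choices, and N is assumed to be even so that 0.5N is an integer number of pixels.

```python
def pu_partitions(mode: str, n: int):
    """Return the (width, height) of each PU for a 2N x 2N CU.

    Illustrative only: `mode` uses the partition names from the text,
    and `n` must be even so that the 25% partition is a whole pixel count.
    """
    if n % 2:
        raise ValueError("n must be even for asymmetric partitions")
    two_n = 2 * n
    quarter = two_n // 4              # 0.5N, the 25% partition
    rest = two_n - quarter            # 1.5N, the 75% partition
    table = {
        # Symmetric partitions
        "2Nx2N": [(two_n, two_n)],
        "2NxN": [(two_n, n), (two_n, n)],
        "Nx2N": [(n, two_n), (n, two_n)],
        "NxN": [(n, n)] * 4,
        # Asymmetric partitions (inter prediction only)
        "2NxnU": [(two_n, quarter), (two_n, rest)],
        "2NxnD": [(two_n, rest), (two_n, quarter)],
        "nLx2N": [(quarter, two_n), (rest, two_n)],
        "nRx2N": [(rest, two_n), (quarter, two_n)],
    }
    return table[mode]


# A 32x32 CU (N = 16) in 2NxnU mode: a 32x8 PU on top of a 32x24 PU.
example = pu_partitions("2NxnU", 16)
```

Whatever the mode, the PU areas sum to the area of the CU, which is a convenient sanity check on any such table.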

In this disclosure, "N×N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N×M pixels, where M is not necessarily equal to N.

Following intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise syntax data describing a method or mode of generating predictive pixel data in the spatial domain (also referred to as the pixel domain), and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 20 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.
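As a rough illustration of the residual-then-transform step, the sketch below forms a residual block and applies a floating-point, orthonormal 2-D DCT-II. This is an assumption made for clarity: practical codecs typically use integer approximations of the transform, and the function names are hypothetical.

```python
import math


def residual_block(original, prediction):
    """Pixel-wise difference between the unencoded block and its prediction."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, prediction)]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (floating point, for illustration)."""
    n = len(block)

    def basis(k, i):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))

    return [[sum(basis(u, y) * basis(v, x) * block[y][x]
                 for y in range(n) for x in range(n))
             for v in range(n)]
            for u in range(n)]


# A flat prediction error of 2 across a 4x4 block.
original = [[12] * 4 for _ in range(4)]
prediction = [[10] * 4 for _ in range(4)]
residual = residual_block(original, prediction)
coeffs = dct_2d(residual)
```

For a constant residual, all of the energy ends up in the single DC coefficient `coeffs[0][0]`, which is why smooth residuals compress well after the transform.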

Following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
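The bit-depth reduction mentioned above can be shown with a one-line shift. This sketch assumes unsigned values and only illustrates rounding an n-bit value down to m bits; it is not a model of a real quantizer, which would also involve a quantization step size. The name `round_down_bits` is hypothetical.

```python
def round_down_bits(value: int, n: int, m: int) -> int:
    """Round an unsigned n-bit value down to an m-bit value (requires n > m)."""
    if not n > m > 0:
        raise ValueError("requires n > m > 0")
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    # Dropping the (n - m) least significant bits rounds toward zero.
    return value >> (n - m)


# A 10-bit value reduced to 8 bits.
reduced = round_down_bits(517, 10, 8)
```

The discarded low-order bits are the information lost to quantization; they cannot be recovered by the decoder.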

Following quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix that includes the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array and lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding methodology. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
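One way to picture a predefined scan is the classic zigzag order, sketched below. The text does not say which scan order is used, so zigzag is chosen here purely as a familiar illustration of serializing low-frequency coefficients first; `zigzag_scan` is a hypothetical helper.

```python
def zigzag_scan(block):
    """Serialize a square 2-D coefficient matrix in zigzag order.

    Positions are visited one anti-diagonal at a time (row + column is
    constant on each), alternating direction between diagonals, so that
    coefficients near the top-left (low frequency) come first.
    """
    n = len(block)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return [block[r][c] for r, c in order]


# Entries are numbered by their zigzag position, so the scan returns 0..8.
block = [[0, 1, 5],
         [2, 4, 6],
         [3, 7, 8]]
vector = zigzag_scan(block)
```

After quantization the tail of such a vector is often a long run of zeros, which the entropy coder can represent compactly.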

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
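The principle that shorter codewords go to more probable symbols can be sketched with a Huffman construction. This is a stand-in technique chosen for illustration: CAVLC as standardized relies on predefined code tables rather than codes built at run time, and `build_vlc` is a hypothetical name.

```python
import heapq
from itertools import count


def build_vlc(probabilities):
    """Build a prefix-free variable length code via Huffman's algorithm.

    `probabilities` maps each symbol to its probability; the result maps
    each symbol to a bit string. More probable symbols get shorter codes.
    """
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in probabilities}
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in codes0.items()}
        merged.update({s: "1" + w for s, w in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = build_vlc(probs)
average_bits = sum(probs[s] * len(codes[s]) for s in probs)
```

With these probabilities the average codeword length is 1.75 bits per symbol, compared with 2 bits for equal-length codewords over four symbols.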

Video encoder 20 may further send syntax data, such as block-based syntax data, frame-based syntax data, and GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data may describe the number of frames in the respective GOP, and the frame syntax data may indicate the encoding/prediction mode used to encode the corresponding frame.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

Video coding standards may include a specification of a video buffering model. In AVC and HEVC, the buffering model is referred to as a hypothetical reference decoder (HRD), which includes a buffering model of both the coded picture buffer (CPB) and the decoded picture buffer (DPB) included in video encoder 20 and/or video decoder 30, and the CPB and DPB behaviors are mathematically specified. The HRD directly imposes constraints on different timing, buffer sizes, and bit rates, and indirectly imposes constraints on bitstream characteristics and statistics. A complete set of HRD parameters includes five basic parameters: initial CPB removal delay, CPB size, bit rate, initial DPB output delay, and DPB size. In AVC and HEVC, bitstream conformance and decoder conformance are specified as parts of the HRD specification. Although the HRD is named as a kind of decoder, it is typically needed at the encoder side to guarantee bitstream conformance (i.e., conformance of the bitstream generated by the encoder to the requirements of a decoder), while it is typically not needed at the decoder side.

In the AVC and HEVC HRD models, decoding or CPB removal is access unit based, and picture decoding is assumed to be instantaneous. In practical applications, if a conforming decoder strictly follows the decoding times signaled, e.g., in picture timing supplemental enhancement information (SEI) messages, to start decoding of access units, then the earliest possible time to output a particular decoded picture is equal to the decoding time of that particular picture plus the time needed for decoding that particular picture. Unlike in the AVC and HEVC HRD models, the time needed for decoding a picture in the real world is not equal to zero. The terms "instantaneous" and "instantaneously" as used in this disclosure may refer to any duration of time that may be assumed to be instantaneous in one or more coding models or in an idealized aspect of any one or more coding models, with the understanding that this may differ from being "instantaneous" in a physical or literal sense. For example, for purposes of this disclosure, a function or process may be considered nominally "instantaneous" if it takes place at or within a practical margin of a hypothetical or idealized earliest possible time for the function or process to be performed. In some examples, syntax and variable names as used herein may be understood in accordance with their meaning within the HEVC model.

The following descriptions of example hypothetical reference decoder (HRD) operation, example operation of a coded picture buffer, example timing of bitstream arrival, example timing of decoding unit removal, example decoding of a decoding unit, example operation of a decoded picture buffer, example removal of pictures from a decoded picture buffer, example picture output, and example current decoded picture marking and storage are provided to illustrate examples of video encoder 20 and/or video decoder 30. Video encoder 20 and/or video decoder 30 may be configured to, among other functions, store one or more decoding units of video data in a picture buffer, obtain a respective buffer removal time for the one or more decoding units, remove the decoding units from the picture buffer in accordance with the obtained buffer removal time for each of the decoding units, and code video data corresponding to the removed decoding units. The operations may be defined or performed differently in other examples. In this manner, video encoder 20 and/or video decoder 30 may be configured to operate according to the various examples of HRD operations described below.

The HRD may be initialized at any one of the buffering period supplemental enhancement information (SEI) messages. Prior to initialization, the CPB may be empty. After initialization, the HRD may not be initialized again by subsequent buffering period SEI messages. The access unit that is associated with the buffering period SEI message that initializes the CPB may be referred to as access unit 0. The decoded picture buffer may contain picture storage buffers. Each of the picture storage buffers may contain a decoded picture that is marked as "used for reference" or is held for future output. Prior to initialization, the DPB may be empty.

The HRD (e.g., video encoder 20 and/or video decoder 30) may operate as follows. Data associated with decoding units that flow into the CPB according to a specified arrival schedule may be delivered by a hypothetical stream scheduler (HSS). In one example, the data associated with each decoding unit may be removed and decoded instantaneously by an instantaneous decoding process at the CPB removal time. Each decoded picture may be placed in the DPB. A decoded picture may be removed from the DPB at the later of the DPB output time or the time at which it becomes no longer needed for inter-prediction reference.
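The buffer behavior described above can be approximated with a toy model. The sketch below assumes a constant-bit-rate arrival schedule starting at time zero and instantaneous removal of each decoding unit; the function names and the simplified arrival model are assumptions made for illustration, not the normative HRD arithmetic.

```python
def cpb_peak_fullness(unit_bits, removal_times, bit_rate, cpb_size):
    """Track CPB fullness under constant-rate arrival and instantaneous removal.

    Returns the peak fullness in bits. Raises ValueError on underflow (a unit
    is due for removal before all of its bits have arrived) and OverflowError
    if the fullness exceeds `cpb_size`.
    """
    total = sum(unit_bits)
    removed = 0
    peak = 0
    for bits, t in zip(unit_bits, removal_times):
        arrived = min(total, bit_rate * t)
        if arrived < removed + bits:
            raise ValueError("CPB underflow at t=%s" % t)
        fullness = arrived - removed  # fullness just before this removal
        if fullness > cpb_size:
            raise OverflowError("CPB overflow at t=%s" % t)
        peak = max(peak, fullness)
        removed += bits               # instantaneous removal and decoding
    return peak


def dpb_removal_time(output_time, last_reference_time):
    """A picture leaves the DPB at the later of its output time and the time
    it is no longer needed for inter-prediction reference."""
    return max(output_time, last_reference_time)


# Three decoding units, 8000 bit/s, first removal after a 0.5 s initial delay.
peak = cpb_peak_fullness([4000, 2000, 2000], [0.5, 0.75, 1.0],
                         bit_rate=8000, cpb_size=5000)
```

In this toy model the first removal time plays the role of the initial CPB removal delay: a longer delay lets more bits accumulate before decoding starts, which raises the peak fullness and the risk of overflow for a given CPB size.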

The HRD depends on HRD parameters, including CPB parameters for the initial CPB removal delay and the initial CPB removal delay offset. In some cases, the HRD parameters may be determined based on the type of picture used to initialize the HRD. In the case of random access, the HRD may be initialized with a random access point (RAP) picture, such as a clean random access (CRA) picture or a broken link access (BLA) picture. In some cases, a RAP picture may alternatively be referred to as an intra random access point (IRAP) picture. For example, when the HRD is initialized with a BLA picture that does not have associated non-decodable leading pictures in the bitstream (also referred to as tagged for discard (TFD) pictures or random access skipped leading (RASL) pictures), an alternative set of CPB parameters may be used. Otherwise, a default set of CPB parameters is used for the HRD. If the default set of CPB parameters is used when the alternative set should have been selected, the CPB may overflow.

In some examples, a given CRA picture or BLA picture may have associated TFD pictures in an original bitstream, and the TFD pictures may be removed from the original bitstream by an external device. The external device may comprise a processing device included in a streaming server, an intermediate network element, or another network entity. The external device, however, may be unable to change the signaled type of the given CRA picture or BLA picture to reflect the removal of the associated TFD pictures. In this case, the default set of CPB parameters may be selected based on the type of the CRA picture or BLA picture as signaled in the original bitstream. This may result in CPB overflow, because the TFD pictures have been removed by the external device such that the picture no longer has associated TFD pictures, and the alternative set of CPB parameters should be used for the HRD.

This disclosure describes techniques for selecting the CPB parameters used to define a CPB of video encoder 20 and/or video decoder 30 for CRA pictures or BLA pictures in a video bitstream. According to the techniques, video decoder 30 receives a bitstream representing a plurality of pictures, including one or more CRA pictures or BLA pictures, and also receives a message indicating whether to use the alternative set of CPB parameters for at least one of the CRA pictures or BLA pictures. The message may be received from an external device, such as a processing device included in a streaming server, an intermediate network element, or another network entity.

Video decoder 30 sets a variable, defined to indicate the set of CPB parameters for a given one of the CRA pictures or BLA pictures, based on the received message. Video decoder 30 then selects the set of CPB parameters for the picture based on the variable for the given one of the CRA pictures or BLA pictures. In some cases, video decoder 30 may set a network abstraction layer (NAL) unit type for the given one of the CRA pictures or BLA pictures, and may select the set of CPB parameters for the picture based on the NAL unit type and the variable for the given picture.
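The selection logic described above might look like the following sketch. Everything here is hypothetical: the type names, the `use_alt_cpb_params` variable, and in particular the rule that either the externally set variable or a "BLA without leading pictures" NAL unit type selects the alternative set are assumptions made to illustrate the idea, not the normative decision process.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NalType(Enum):
    """Hypothetical picture types, for illustration only."""
    CRA = auto()
    BLA_WITH_LEADING = auto()
    BLA_NO_LEADING = auto()


@dataclass(frozen=True)
class CpbParams:
    initial_removal_delay: int
    initial_removal_delay_offset: int


@dataclass
class RapPicture:
    nal_type: NalType
    use_alt_cpb_params: bool = False  # variable set from the external message


def apply_external_message(pic: RapPicture, use_alternative: bool) -> None:
    """Record what an external device (e.g., a streaming server) indicated."""
    pic.use_alt_cpb_params = use_alternative


def select_cpb_params(pic: RapPicture, default: CpbParams,
                      alternative: CpbParams) -> CpbParams:
    """Pick the CPB parameter set from the NAL unit type and the variable."""
    if pic.nal_type is NalType.BLA_NO_LEADING or pic.use_alt_cpb_params:
        return alternative
    return default


default_set = CpbParams(initial_removal_delay=90000, initial_removal_delay_offset=0)
alternative_set = CpbParams(initial_removal_delay=45000, initial_removal_delay_offset=45000)

# A CRA picture whose leading pictures were stripped by an external device:
cra = RapPicture(NalType.CRA)
apply_external_message(cra, use_alternative=True)
chosen = select_cpb_params(cra, default_set, alternative_set)
```

The point of the external message is visible in the example: the signaled picture type alone (CRA) would select the default set, while the variable lets the decoder pick the alternative set after the leading pictures have been removed. The numeric delay values are placeholders.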

The selected set of CPB parameters is applied to the CPB included in video decoder 30 to ensure that the CPB will not overflow during video decoding. Video encoder 20 may be configured to perform a similar operation and apply the selected set of CPB parameters to the CPB included in video encoder 20, to ensure that the CPB included in video encoder 20 will not overflow during video encoding and that the CPB included in video decoder 30 will not overflow upon receiving the encoded bitstream generated by video encoder 20.

FIG. 2 is a block diagram illustrating an example of video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra-coding and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based coding modes. Inter-modes, such as uni-directional prediction (P mode) or bi-directional prediction (B mode), may refer to any of several temporal-based coding modes.

As shown in FIG. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of FIG. 2, video encoder 20 includes mode select unit 40, summer 50, transform processing unit 52, quantization unit 54, entropy encoding unit 56, decoded picture buffer (DPB) 64, and coded picture buffer (CPB) 66. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction processing unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional filters (in-loop or post-loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity but, if desired, may filter the output of summer 50 (as an in-loop filter).

During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction processing unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

Moreover, partition unit 48 may partition blocks of video data into sub-blocks, based on evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further produce a quadtree data structure indicative of the partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree may include one or more PUs and one or more TUs.

Mode select unit 40 may select one of the coding modes, intra or inter, e.g., based on error results, and provides the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit), relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in DPB 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
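The block-matching idea behind the motion search can be sketched with an exhaustive full-pixel search using SAD. This is a deliberately simple stand-in: it ignores fractional-pixel interpolation and the faster search strategies a practical encoder would use, and the function names are hypothetical.

```python
def get_block(frame, x, y, size):
    """Extract a size x size block whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def full_search(cur_block, ref_frame, x0, y0, search_range, metric=sad):
    """Exhaustive full-pixel search around (x0, y0) in the reference frame.

    Returns (best_cost, (dx, dy)), where (dx, dy) is the motion vector of
    the best-matching predictive block inside the search window.
    """
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x and x + size <= width and 0 <= y and y + size <= height:
                cost = metric(cur_block, get_block(ref_frame, x, y, size))
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best


# Reference frame with distinct pixel values; the current block at (2, 2)
# is a copy of the reference block at (3, 2), i.e. it moved by (+1, 0).
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = get_block(ref, 3, 2, 2)
cost, mv = full_search(cur, ref, x0=2, y0=2, search_range=2)
```

Swapping `metric=ssd` into the call changes the cost measure without changing the search itself, which mirrors the text's point that SAD, SSD, or other difference metrics may be used.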

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in DPB 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to the luma component, and motion compensation unit 44 uses motion vectors calculated based on the luma component for both the chroma components and the luma component. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

As described above, intra-prediction processing unit 46 may intra-predict a current block as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 (or, in some examples, mode select unit 40) may select an appropriate intra-prediction mode to use from among the tested modes.

For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
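As a rough illustration of rate-distortion mode selection, the sketch below scores each tested mode with a Lagrangian cost J = D + λ·R and picks the minimum. The Lagrangian form is a commonly used stand-in and an assumption here; the text above speaks only of ratios computed from distortion and rate. The mode names, the numbers, and the helpers `rd_cost` and `select_mode` are hypothetical.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * bits


def select_mode(candidates, lam):
    """Pick the mode with the lowest cost.

    candidates maps a mode name to a (distortion, bits) pair measured
    by trial-encoding the block with that mode.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical measurements for three tested intra-prediction modes.
tested = {
    "DC": (100, 10),
    "planar": (80, 30),
    "angular_26": (60, 60),
}
best = select_mode(tested, lam=0.8)
```

A larger λ weights bit cost more heavily and so favors cheaper modes; a smaller λ favors lower distortion.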

After selecting an intra-prediction mode for a block, intra-prediction processing unit 46 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include configuration data in the transmitted bitstream, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to the DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
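The effect of the quantization parameter can be sketched with a simplified uniform scalar quantizer. The sketch assumes the step-size relation used by H.264/HEVC-style codecs, in which the step doubles for every increase of 6 in QP (approximately 2^((QP-4)/6)); real codecs use integer arithmetic and scaling tables, so the floating-point helpers below (`q_step`, `quantize`, `dequantize`) are illustrative only.

```python
def q_step(qp):
    """Approximate quantization step size; doubles every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (lossy).

    Uses Python's round(), which rounds exact halves to the nearest even
    integer; a real encoder would apply its own rounding offset.
    """
    step = q_step(qp)
    return [[int(round(c / step)) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Scale integer levels back to approximate coefficient values."""
    step = q_step(qp)
    return [[level * step for level in row] for row in levels]


coeffs = [[80, -17], [3, 0]]
levels = quantize(coeffs, qp=22)        # step size is 8 at QP 22
approx = dequantize(levels, qp=22)
```

Raising QP enlarges the step, so more coefficients collapse to zero and fewer bits are needed, at the cost of larger reconstruction error.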

Following quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the context may be based on neighboring blocks. Following the entropy coding by entropy encoding unit 56, the encoded bitstream may be buffered or stored, more or less temporarily, in CPB 66, transmitted to another device (e.g., video decoder 30), or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in DPB 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

DPB 64 may be, or may be included in, a data storage device, such as any permanent or volatile memory capable of storing data, e.g., synchronous dynamic random access memory (SDRAM), embedded dynamic random access memory (eDRAM), or static random access memory (SRAM). DPB 64 may operate according to any combination of the example coded picture buffer and/or decoded picture buffer behaviors described in this disclosure. For example, video encoder 20 may be configured to operate according to a hypothetical reference decoder (HRD). In this case, DPB 64 included in video encoder 20 may be defined by HRD parameters, including CPB parameters and DPB parameters, according to the buffering model of the HRD.

Similarly, CPB 66 may be, or may be included in, a data storage device, such as any permanent or volatile memory capable of storing data, e.g., synchronous dynamic random access memory (SDRAM), embedded dynamic random access memory (eDRAM), or static random access memory (SRAM). Although shown as forming part of video encoder 20, in some examples CPB 66 may form part of a device, unit, or module external to video encoder 20. For example, CPB 66 may form part of a stream scheduler unit (e.g., a delivery scheduler or a hypothetical stream scheduler (HSS)) external to video encoder 20. In the case where video encoder 20 is configured to operate according to an HRD, CPB 66 included in video encoder 20 may be defined by HRD parameters, including CPB parameters for initial CPB removal delay and offset, according to the buffering model of the HRD.

According to the techniques of this disclosure, video encoder 20 may apply either a default set or an alternative set of CPB parameters to CPB 66 to ensure that CPB 66 does not overflow during encoding of video data, and that the CPB included in video decoder 30 does not overflow upon receiving the encoded bitstream generated by video encoder 20. If the default set is used when the alternative set should have been selected, CPB 66 included in video encoder 20 or the CPB included in video decoder 30 may overflow. Selection of the appropriate CPB parameters is primarily a concern when a random access point (RAP) picture, such as a clean random access (CRA) picture or a broken link access (BLA) picture, is used to initialize the HRD. The techniques may therefore provide improved support for RAP pictures in video coding.

Video encoder 20 may be configured to receive a bitstream representing a plurality of pictures, the plurality of pictures including one or more CRA pictures or BLA pictures, and also to receive a message indicating whether to use an alternative set of CPB parameters for at least one of the CRA pictures or BLA pictures. In some cases, the bitstream may be received at the decoding portion of video encoder 20 (i.e., inverse quantization unit 58 and inverse transform processing unit 60) directly from the encoding portion of video encoder 20 (e.g., entropy encoding unit 56 or CPB 66). The message may be received from an external device, e.g., a processing device included in a streaming server, an intermediate network element, or another network entity.

Based on the received message, video encoder 20 sets a variable defined to indicate the set of CPB parameters for a given one of the CRA pictures or BLA pictures. Video encoder 20 then selects the set of CPB parameters for the given one of the CRA pictures or BLA pictures based on the variable for the picture. Video encoder 20 applies the selected set of CPB parameters to CPB 66 included in video encoder 20 to ensure that CPB 66 will not overflow during video encoding, and to ensure that the CPB included in video decoder 30 will not overflow upon receiving the encoded bitstream generated by video encoder 20. In some cases, video encoder 20 may set a network abstraction layer (NAL) unit type for the given one of the CRA pictures or BLA pictures, and may select the set of CPB parameters for the given picture based on the NAL unit type and the variable for the picture. The CPB parameter selection process for RAP pictures is described in more detail with respect to video decoder 30 of FIG. 3.

FIG. 3 is a block diagram illustrating an example of video decoder 30 that may implement the techniques described in this disclosure. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 70, a prediction processing unit 71 including a motion compensation unit 72 and an intra-prediction processing unit 74, an inverse quantization unit 76, an inverse transform processing unit 78, a summer 80, a coded picture buffer (CPB) 68, and a decoded picture buffer (DPB) 82. In some examples, video decoder 30 may perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 from FIG. 2.

During the decoding process, video decoder 30 receives from video encoder 20 an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements. Video decoder 30 may receive the encoded video bitstream from network entity 29. Network entity 29 may, for example, be a streaming server, a media-aware network element (MANE), a video editor/splicer, an intermediate network element, or another such device configured to implement one or more of the techniques described above. Network entity 29 may include an external device configured to perform the techniques of this disclosure. As described above, some of the techniques described in this disclosure may be implemented by network entity 29 before network entity 29 transmits the encoded video bitstream to video decoder 30. In some video decoding systems, network entity 29 and video decoder 30 may be parts of separate devices, while in other instances the functionality described with respect to network entity 29 may be performed by the same device that includes video decoder 30.

The bitstream may be buffered or stored, more or less temporarily, in CPB 68 before entropy decoding by entropy decoding unit 70. Entropy decoding unit 70 of video decoder 30 then entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists (List 0 and List 1) using default construction techniques based on reference pictures stored in DPB 82.

Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.

Inverse quantization unit 76 inverse quantizes (i.e., de-quantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include use of a quantization parameter QP_Y, calculated by video decoder 30 for each video block in the video slice, to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 78 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform processing unit 78 with the corresponding predictive blocks generated by motion compensation unit 72. Summer 80 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in DPB 82, which stores reference pictures used for subsequent motion compensation. DPB 82 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.
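The summation step can be sketched as a per-sample addition of the residual block to the predictive block. Clipping the result to the valid sample range is an assumption added here for completeness (it is standard practice but not stated above), and the function name `reconstruct_block` and the default `bit_depth` are hypothetical.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a residual block to a predictive block, sample by sample.

    Results are clipped to [0, 2**bit_depth - 1] so that reconstructed
    samples stay in the valid range (an assumption of this sketch).
    """
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value)
             for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


prediction = [[250, 10], [128, 0]]
residual = [[10, -20], [5, -1]]
decoded = reconstruct_block(prediction, residual)
```

In-loop filters such as deblocking would operate on the output of this step before the block is stored as reference data.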

DPB 82 may be, or may be included in, a data storage device, such as any permanent or volatile memory capable of storing data, e.g., synchronous dynamic random access memory (SDRAM), embedded dynamic random access memory (eDRAM), or static random access memory (SRAM). DPB 82 may operate according to any combination of the example coded picture buffer and/or decoded picture buffer behaviors described in this disclosure. For example, video decoder 30 may be configured to operate according to a hypothetical reference decoder (HRD). In this case, video decoder 30 may decode HRD parameters, including CPB parameters and DPB parameters, that define DPB 82 according to the buffering model of the HRD.

Similarly, CPB 68 may be, or may be included in, a data storage device, such as any permanent or volatile memory capable of storing data, e.g., synchronous dynamic random access memory (SDRAM), embedded dynamic random access memory (eDRAM), or static random access memory (SRAM). Although shown as forming part of video decoder 30, in some examples CPB 68 may form part of a device, unit, or module external to video decoder 30. For example, CPB 68 may form part of a stream scheduler unit (e.g., a delivery scheduler or a hypothetical stream scheduler (HSS)) external to video decoder 30. In the case where video decoder 30 is configured to operate according to an HRD, video decoder 30 may decode HRD parameters, including CPB parameters for initial CPB removal delay and offset, that define CPB 68 according to the buffering model of the HRD.

According to the techniques of this disclosure, video decoder 30 may apply either a default set or an alternative set of CPB parameters to CPB 68 to ensure that CPB 68 does not overflow during decoding of video data. If the default set is used when the alternative set should have been selected, CPB 68 included in a video decoder configured to operate according to the HRD may overflow. Selection of the appropriate CPB parameters is primarily a concern when a random access point (RAP) picture, such as a clean random access (CRA) picture or a broken link access (BLA) picture, is used to initialize the HRD. The techniques may therefore provide improved support for RAP pictures in video coding.
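To show why the choice of initial removal delay matters for buffer fullness, the following is a heavily simplified leaky-bucket sketch, not the HRD model itself. It assumes constant-bit-rate arrival, instantaneous removal of each picture at evenly spaced removal times, and checks overflow only; underflow is not modeled. The function `cpb_overflows` and all numbers are hypothetical.

```python
def cpb_overflows(picture_bits, bit_rate, cpb_size,
                  initial_removal_delay, picture_interval):
    """Return True if buffer fullness ever exceeds cpb_size.

    Bits arrive at a constant bit_rate (bits per second) until the whole
    stream has arrived. Picture n is removed instantaneously at time
    initial_removal_delay + n * picture_interval. Fullness peaks just
    before each removal, so only those instants are checked.
    """
    total_bits = sum(picture_bits)
    removed = 0
    for n, bits in enumerate(picture_bits):
        removal_time = initial_removal_delay + n * picture_interval
        arrived = min(total_bits, bit_rate * removal_time)
        if arrived - removed > cpb_size:
            return True
        removed += bits
    return False


pictures = [4000, 1000, 1000, 1000]   # hypothetical coded picture sizes in bits
short_delay = cpb_overflows(pictures, 10000, 5000, 0.4, 0.1)
long_delay = cpb_overflows(pictures, 10000, 5000, 0.6, 0.1)
```

In this toy setup the same stream fits the buffer with one initial delay and overflows with another, which is the kind of mismatch that applying the wrong parameter set could cause.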

Video decoder 30 receives a bitstream representing a plurality of pictures, the plurality of pictures including one or more CRA pictures or BLA pictures, and also receives a message indicating whether to use an alternative set of CPB parameters for at least one of the CRA pictures or BLA pictures. The message may be received from network entity 29 or another external device, e.g., a processing device included in a streaming server or an intermediate network element.

Based on the received message, video decoder 30 sets a variable defined to indicate the set of CPB parameters for a given one of the CRA pictures or BLA pictures. The video coding device then selects the set of CPB parameters for the given one of the CRA pictures or BLA pictures based on the variable for the picture. Video decoder 30 applies the selected set of CPB parameters to CPB 68 to ensure that CPB 68 will not overflow during video decoding. In some cases, video decoder 30 may set a network abstraction layer (NAL) unit type for the given one of the CRA pictures or BLA pictures. Video decoder 30 may set the NAL unit type for the picture as signaled, or may set the NAL unit type based on the variable for the picture. Video decoder 30 may then select the set of CPB parameters for the given picture based on the NAL unit type and the variable for the picture.
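The selection flow (message, then variable, then parameter set) might be sketched as below. The exact decision rule is not given in this passage, so the rule coded here is an assumption made purely for illustration: a BLA picture without leading pictures always takes the alternative set, while a CRA picture or a BLA picture that may have leading pictures follows the variable. The message key `use_alt_cpb_params` and the NAL unit type labels are hypothetical placeholders, not syntax from any standard.

```python
from enum import Enum


class CpbParamSet(Enum):
    DEFAULT = "default"
    ALTERNATIVE = "alternative"


def set_variable_from_message(message):
    """Derive the per-picture variable from an externally received message.

    The key name is a hypothetical placeholder for this sketch.
    """
    return bool(message.get("use_alt_cpb_params", False))


def select_cpb_params(nal_unit_type, use_alt_cpb_params):
    """Choose a CPB parameter set from the NAL unit type and the variable.

    Assumed rule for illustration only; see the lead-in above.
    """
    if nal_unit_type == "BLA_NO_LEADING":
        return CpbParamSet.ALTERNATIVE
    if nal_unit_type in ("CRA", "BLA_WITH_LEADING"):
        if use_alt_cpb_params:
            return CpbParamSet.ALTERNATIVE
        return CpbParamSet.DEFAULT
    raise ValueError(f"not a CRA or BLA picture: {nal_unit_type}")


flag = set_variable_from_message({"use_alt_cpb_params": True})
chosen = select_cpb_params("CRA", flag)
```

Keeping the variable separate from the NAL unit type lets an external entity, such as a server that has discarded pictures, steer the choice without rewriting the bitstream.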

In general, this disclosure describes techniques that provide improved support for RAP pictures, including improved methods of selecting HRD parameters for RAP pictures and of handling a CRA picture as a BLA picture. As described above, video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, there is a new video coding standard, namely High Efficiency Video Coding (HEVC), being developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). A recent working draft (WD) of HEVC, referred to hereinafter as HEVC WD8, is described in document JCTVC-J1003_d7, Bross et al., "High Efficiency Video Coding (HEVC) Text Specification Draft 8," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 10th Meeting, Stockholm, Sweden, July 11-20, 2012, which, as of September 20, 2012, is available at http://phenix.int-evry.fr/jct/doc_end_user/documents/10_Stockholm/wg11/JCTVC-J1003-v8.zip.

Random access refers to decoding of a video bitstream starting from a coded picture that is not the first coded picture in the bitstream. Random access to a bitstream is needed in many video applications, such as broadcasting and streaming, e.g., for users to tune in to a program at any time, to switch between different channels, to jump to specific parts of the video, or to switch to a different bitstream for stream adaptation of bit rate, frame rate, spatial resolution, and the like. This feature is enabled by inserting random access pictures or random access points, many times at regular intervals, into the video bitstream.

Bitstream splicing refers to the concatenation of two or more bitstreams or parts thereof. For example, a second bitstream may be appended to a first bitstream, possibly with some modifications to either one or both of the bitstreams, to generate a spliced bitstream. The first coded picture in the second bitstream is also referred to as the splicing point. Therefore, pictures after the splicing point in the spliced bitstream originate from the second bitstream, while pictures preceding the splicing point in the spliced bitstream originate from the first bitstream.

Splicing of bitstreams is performed by bitstream splicers. Bitstream splicers are often lightweight and much less intelligent than encoders. For example, a bitstream splicer may not be equipped with entropy decoding and encoding capabilities. Bitstream switching may be used in adaptive streaming environments. A bitstream switching operation at a certain picture in the switched-to bitstream is effectively a bitstream splicing operation in which the splicing point is the bitstream switching point (i.e., the first picture from the switched-to bitstream).

Instantaneous decoding refresh (IDR) pictures, as specified in AVC or HEVC, can be used for random access. However, because pictures following an IDR picture in decoding order cannot use pictures decoded prior to the IDR picture as reference, bitstreams relying on IDR pictures for random access can have significantly lower coding efficiency. To improve coding efficiency, the concept of clean random access (CRA) pictures was introduced in HEVC to allow pictures that follow a CRA picture in decoding order but precede it in output order to use pictures decoded before the CRA picture as reference pictures.

Pictures that follow a CRA picture in decoding order but precede the CRA picture in output order are referred to as leading pictures associated with the CRA picture, or leading pictures of the CRA picture. The leading pictures of a CRA picture are correctly decodable if decoding starts from an IDR or CRA picture that precedes the current CRA picture. When random access occurs from the current CRA picture, however, the leading pictures of the CRA picture may be non-decodable. Therefore, leading pictures are typically discarded during random access decoding. To prevent error propagation from reference pictures that may be unavailable (depending on where decoding starts), all pictures that follow a CRA picture in both decoding order and output order should not use any picture that precedes the CRA picture in either decoding order or output order (including the leading pictures) as a reference picture.
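The ordering relationships in the preceding paragraph can be sketched as follows. This is only an illustrative model: the `Pic` type and the two helper functions are hypothetical names that do not appear in HEVC, and the positions in decoding order and output order are represented as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pic:
    decode_idx: int  # position in decoding order (hypothetical field)
    output_idx: int  # position in output order (hypothetical field)


def is_leading(pic: Pic, cra: Pic) -> bool:
    """True if pic follows the CRA picture in decoding order but precedes it in output order."""
    return pic.decode_idx > cra.decode_idx and pic.output_idx < cra.output_idx


def may_reference(pic: Pic, ref: Pic, cra: Pic) -> bool:
    """Check the reference constraint described above.

    A picture that follows the CRA picture in both orders should not reference
    a picture that precedes the CRA picture in either order.
    """
    follows_in_both = pic.decode_idx > cra.decode_idx and pic.output_idx > cra.output_idx
    ref_precedes = ref.decode_idx < cra.decode_idx or ref.output_idx < cra.output_idx
    return not (follows_in_both and ref_precedes)
```

With a CRA picture at `Pic(4, 8)`, a picture at `Pic(5, 6)` is classified as a leading picture, and a picture at `Pic(6, 9)` may not use it as a reference.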

After the introduction of CRA pictures, the concept of broken link access (BLA) pictures was further introduced in HEVC, based on the concept of CRA pictures. BLA pictures typically originate from bitstream splicing at the position of a CRA picture, and in the spliced bitstream the splice-point CRA picture is changed to a BLA picture. IDR pictures, CRA pictures, and BLA pictures are collectively referred to as random access point (RAP) pictures or intra random access point (IRAP) pictures.

The main differences between BLA pictures and CRA pictures are as follows. For a CRA picture, the associated leading pictures are correctly decodable if decoding starts from a RAP picture that precedes the CRA picture in decoding order, and may not be correctly decodable when random access occurs from the CRA picture (i.e., when decoding starts from the CRA picture, or in other words, when the CRA picture is the first picture in the bitstream). For a BLA picture, the associated leading pictures may be non-decodable in all cases, even when decoding starts from a RAP picture that precedes the BLA picture in decoding order.

For a particular CRA or BLA picture, some of the associated leading pictures are correctly decodable even when the CRA or BLA picture is the first picture in the bitstream. These leading pictures are referred to as decodable leading pictures (DLPs), and the other leading pictures are referred to as non-decodable leading pictures (NLPs). In some cases, DLPs are alternatively referred to as random access decodable leading (RADL) pictures. In HEVC WD8, NLPs are referred to as tagged for discard (TFD) pictures. In other cases, NLPs are alternatively referred to as random access skipped leading (RASL) pictures. For the purposes of this disclosure, the terms "non-decodable leading picture," "TFD picture," and "RASL picture" are used interchangeably.

In HEVC WD8, the hypothetical reference decoder (HRD) is specified in Annex C. The HRD relies on HRD parameters (which may be provided in the bitstream in the hrd_parameters() syntax structure included in the video parameter set (VPS) and/or the sequence parameter set (SPS)), buffering period supplemental enhancement information (SEI) messages, and picture timing SEI messages. The buffering period SEI message mainly includes CPB parameters, namely the initial coded picture buffer (CPB) removal delay and the initial CPB removal delay offset. Two sets of CPB parameters may be provided: a default set, signaled by the syntax elements initial_cpb_removal_delay[] and initial_cpb_removal_delay_offset[], and an alternative set, signaled by the syntax elements initial_alt_cpb_removal_delay[] and initial_alt_cpb_removal_delay_offset[].

When sub_pic_cpb_params_present_flag is equal to 0 and rap_cpb_params_present_flag is equal to 1, the following applies. When the HRD is initialized with a BLA picture that does not have associated TFD pictures in the bitstream, video decoder 30 uses the alternative set of CPB parameters to define CPB 68. A BLA picture that does not have associated non-decodable leading pictures has a nal_unit_type that indicates a BLA picture with decodable leading pictures (e.g., BLA_W_DLP) or a BLA picture with no leading pictures (e.g., BLA_N_LP). If the default set were used instead, the CPB may overflow. When the HRD is initialized with a CRA picture, or with a BLA picture that has associated TFD pictures, video decoder 30 uses the default set of CPB parameters to define CPB 68. A BLA picture that has associated TFD pictures has a nal_unit_type that indicates a BLA picture with non-decodable leading pictures (e.g., BLA_W_TFD). This is reflected in the following text in subclause C.2.1 of HEVC WD8:

The variables InitCpbRemovalDelay[SchedSelIdx] and InitCpbRemovalDelayOffset[SchedSelIdx] are set as follows.

- If any of the following conditions is true, InitCpbRemovalDelay[SchedSelIdx] and InitCpbRemovalDelayOffset[SchedSelIdx] are set to the values of the corresponding initial_alt_cpb_removal_delay[SchedSelIdx] and initial_alt_cpb_removal_delay_offset[SchedSelIdx], respectively, of the associated buffering period SEI message:

- Access unit 0 is a BLA access unit for which the coded picture has nal_unit_type equal to BLA_W_DLP or BLA_N_LP, and the value of rap_cpb_params_present_flag of the associated buffering period SEI message is equal to 1;

- SubPicCpbFlag is equal to 1.

- Otherwise, InitCpbRemovalDelay[SchedSelIdx] and InitCpbRemovalDelayOffset[SchedSelIdx] are set to the values of the corresponding initial_cpb_removal_delay[SchedSelIdx] and initial_cpb_removal_delay_offset[SchedSelIdx], respectively, of the associated buffering period SEI message.
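The selection rule quoted above from subclause C.2.1 can be sketched as a small function. This is a minimal illustration, not decoder code: the buffering period SEI message is modeled as a plain dictionary keyed by the syntax element names, NAL unit types are modeled as strings, and the function name is hypothetical.

```python
def select_initial_cpb_params(nal_unit_type, bp_sei, sub_pic_cpb_flag=0, sched_sel_idx=0):
    """Return (InitCpbRemovalDelay, InitCpbRemovalDelayOffset) for access unit 0.

    bp_sei is a hypothetical dict standing in for the associated buffering
    period SEI message.
    """
    bla_without_tfd = (
        nal_unit_type in ("BLA_W_DLP", "BLA_N_LP")
        and bp_sei["rap_cpb_params_present_flag"] == 1
    )
    if bla_without_tfd or sub_pic_cpb_flag == 1:
        # Alternative set of CPB parameters.
        return (
            bp_sei["initial_alt_cpb_removal_delay"][sched_sel_idx],
            bp_sei["initial_alt_cpb_removal_delay_offset"][sched_sel_idx],
        )
    # Default set of CPB parameters.
    return (
        bp_sei["initial_cpb_removal_delay"][sched_sel_idx],
        bp_sei["initial_cpb_removal_delay_offset"][sched_sel_idx],
    )
```

Under this rule a picture with nal_unit_type CRA_NUT or BLA_W_TFD always receives the default set unless SubPicCpbFlag is equal to 1, regardless of whether its TFD pictures are actually present in the bitstream.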

As can be seen from the above, the selection of which set of CPB parameters to use for a given picture may be based on the value of the picture's nal_unit_type.

HEVC WD8 also includes the following text, in subclause 8.1, for handling a CRA picture as a BLA picture.

When the current picture is a CRA picture, the following applies.

- If some external device not specified in this Specification is available to set the variable HandleCraAsBlaFlag to a value, HandleCraAsBlaFlag is set to the value provided by the external device.

- Otherwise, the value of HandleCraAsBlaFlag is set to 0.

When HandleCraAsBlaFlag is equal to 1, the following applies during the parsing and decoding processes for each coded slice NAL unit:

- The value of nal_unit_type is set to BLA_W_TFD.

- The value of no_output_of_prior_pics_flag is set to 1.
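The WD8 handling quoted above can be sketched as follows. The function name and the use of `None` to mean "no external device is available" or "value left unchanged" are assumptions made for illustration only.

```python
def handle_cra_wd8(nal_unit_type, external_handle_cra_as_bla=None):
    """Model the quoted subclause 8.1 handling of a CRA picture.

    Returns (nal_unit_type, no_output_of_prior_pics_flag override), where the
    override is None when the signaled flag value is left untouched.
    """
    # Without an external device, HandleCraAsBlaFlag defaults to 0.
    handle_cra_as_bla_flag = (
        0 if external_handle_cra_as_bla is None else external_handle_cra_as_bla
    )
    if nal_unit_type == "CRA_NUT" and handle_cra_as_bla_flag == 1:
        return "BLA_W_TFD", 1
    return nal_unit_type, None
```

Note that the only possible outcome of the conversion is BLA_W_TFD, which under the quoted subclause C.2.1 text leads to the default set of CPB parameters.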

In HEVC WD8, a CRA picture has nal_unit_type equal to CRA_NUT in the NAL unit headers of its coded slices, and it may have associated TFD pictures and DLP pictures.

The following problems are associated with existing methods for selecting CPB parameters for CRA pictures, BLA pictures, and CRA pictures handled as BLA pictures. The first problem concerns the selection of CPB parameters for CRA pictures and BLA pictures. A CRA picture may have associated TFD pictures. When a CRA picture has associated TFD pictures in the original bitstream, but those TFD pictures are discarded by a streaming server or an intermediate network element, network entity 29 or another external device must change the CRA picture to a BLA picture before sending it to video decoder 30 in order to enable selection of the appropriate set of CPB parameters (i.e., the alternative set). However, network entity 29 may be unable to do so. In such cases, either the selection of the appropriate set of initial CPB removal delay and offset cannot succeed, which may result in overflow of CPB 68, or the discarding of TFD pictures cannot be performed, which results in wasted bandwidth or lower video quality.

The second problem concerns the handling of a CRA picture as a BLA picture. A CRA picture may have associated TFD pictures. When a CRA picture has associated TFD pictures in the original bitstream, but those TFD pictures are discarded by network entity 29 or another external device (e.g., a processing device included in a streaming server or an intermediate network element), the external device indicates that the CRA picture is to be handled as a BLA picture. As specified in HEVC WD8, video decoder 30 then sets the value of nal_unit_type to indicate a BLA picture with non-decodable leading pictures (e.g., BLA_W_TFD), which results in use of the default set of CPB parameters, and therefore CPB 68 may overflow.

The techniques of this disclosure provide improved RAP picture behaviors that can eliminate or avoid the problems described above. According to the techniques, a variable is defined whose value can be set, outside the scope of the video coding specification, by network entity 29 or another external device (e.g., a processing device included in a streaming server, an intermediate network element, or another network entity). In one example, the variable may specify whether the alternative set of CPB parameters is used, and which NAL unit type is used when a CRA picture is handled as a BLA picture. In another example, the variable may specify the NAL unit type value to be used for a particular picture, from which use of the default set or the alternative set of CPB parameters can be derived.

In the following sections, the above techniques are described in more detail. Underlining may indicate additions relative to HEVC WD8, and strikethrough may indicate deletions relative to HEVC WD8.

In one example, video decoder 30 receives a bitstream representing a plurality of pictures, the plurality of pictures including one or more CRA pictures or BLA pictures. Video decoder 30 also receives a message from network entity 29 indicating whether the alternative set of CPB parameters is to be used for at least one of the CRA pictures or BLA pictures. Video decoder 30 sets a variable based on the received message, the variable being defined to indicate the set of CPB parameters for a given one of the CRA pictures or BLA pictures. Video decoder 30 then selects the set of CPB parameters for the given one of the CRA pictures or BLA pictures based on the variable for that picture.

According to this example, a variable UseAltCpbParamsFlag may be defined for each BLA or CRA picture. The value of this variable is set to 0 or 1 by network entity 29 or some other external device. If no such external device is available, video decoder 30 may set the value of the variable to 0.

In this case, the text in subclause 8.1 of HEVC WD8 cited above may be replaced with the following:

When the current picture is a BLA picture with nal_unit_type equal to BLA_W_TFD or is a CRA picture, the following applies.

- If some external device not specified in this Specification is available to set the variable UseAltCpbParamsFlag to a value, UseAltCpbParamsFlag is set to the value provided by the external device.

- Otherwise, the value of UseAltCpbParamsFlag is set to 0.

- If some external device not specified in this Specification is available to set the variable HandleCraAsBlaFlag to a value, HandleCraAsBlaFlag is set to the value provided by the external device.

When the current picture is a CRA picture and HandleCraAsBlaFlag is equal to 1, the following applies during the parsing and decoding processes for each coded slice NAL unit, and the CRA picture is considered a BLA picture and the CRA access unit is considered a BLA access unit:

- If UseAltCpbParamsFlag is equal to 0, the value of nal_unit_type is set to BLA_W_TFD. Otherwise, the value of nal_unit_type is set to BLA_W_DLP.
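The replacement text above can be sketched as follows. This is a minimal illustration under the assumption that NAL unit types are modeled as strings and that both flags have already been resolved to 0 or 1 (defaulting to 0 when no external device provides a value); the function name is hypothetical.

```python
def resolve_cra_as_bla(nal_unit_type, handle_cra_as_bla_flag=0, use_alt_cpb_params_flag=0):
    """Return the nal_unit_type used for parsing and decoding the current picture."""
    if nal_unit_type != "CRA_NUT" or handle_cra_as_bla_flag != 1:
        # Not a CRA picture handled as a BLA picture: type is unchanged.
        return nal_unit_type
    # CRA picture handled as a BLA picture; UseAltCpbParamsFlag picks the BLA type.
    return "BLA_W_TFD" if use_alt_cpb_params_flag == 0 else "BLA_W_DLP"
```

Compared with the WD8 behavior, the external device can now steer a converted CRA picture to BLA_W_DLP, a type that indicates no associated TFD pictures are present.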

In addition, the text in subclause C.2.1 of HEVC WD8 cited above may be replaced with the following:

- If one of the following conditions is true, InitCpbRemovalDelay[SchedSelIdx] and InitCpbRemovalDelayOffset[SchedSelIdx] are set to the values of the corresponding initial_alt_cpb_removal_delay[SchedSelIdx] and initial_alt_cpb_removal_delay_offset[SchedSelIdx], respectively, of the associated buffering period SEI message:

- Access unit 0 is a BLA access unit for which the coded picture has nal_unit_type equal to BLA_W_TFD or is a CRA access unit, UseAltCpbParamsFlag is equal to 1, and the value of rap_cpb_params_present_flag of the associated buffering period SEI message is equal to 1;

- SubPicCpbFlag is equal to 1.
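The two conditions listed above can be expressed as a predicate. This sketch models only the conditions reproduced in this passage; any unchanged WD8 conditions that are not reproduced here would be additional alternatives. The function name is hypothetical, and NAL unit types are modeled as strings.

```python
def uses_alternative_cpb_set(nal_unit_type, use_alt_cpb_params_flag,
                             rap_cpb_params_present_flag, sub_pic_cpb_flag=0):
    """True if the alternative set of CPB parameters is selected for access unit 0."""
    # A BLA_W_TFD access unit or a CRA access unit may have had TFD pictures discarded.
    bla_w_tfd_or_cra = nal_unit_type in ("BLA_W_TFD", "CRA_NUT")
    signaled = (
        bla_w_tfd_or_cra
        and use_alt_cpb_params_flag == 1
        and rap_cpb_params_present_flag == 1
    )
    return signaled or sub_pic_cpb_flag == 1
```

The key change relative to WD8 is that the nal_unit_type alone no longer decides the outcome for these pictures: UseAltCpbParamsFlag carries the information about discarded TFD pictures.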

Network entity 29, or another external device configured to set the value of UseAltCpbParamsFlag, may operate as follows. Network entity 29 may send a message to video decoder 30 or to a receiver containing video decoder 30. The message may indicate that a particular BLA or CRA picture had associated TFD pictures but that those TFD pictures have been discarded, and that the alternative set of CPB parameters should therefore be used. Upon receiving this message, video decoder 30 may set the value of UseAltCpbParamsFlag for the particular BLA or CRA picture to 1. If the particular BLA or CRA picture has no TFD pictures, or has TFD pictures that were not discarded, then either no message needs to be sent, or a message is sent to direct video decoder 30 to set the value of UseAltCpbParamsFlag for the particular BLA or CRA picture to 0.
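The behavior of the network entity described above might be sketched as follows. The message layout, the picture labels ("TFD", "DLP", "TRAIL"), and both function names are hypothetical; the disclosure does not define a message format.

```python
def discard_tfd_and_build_message(rap_id, associated_pictures):
    """Drop TFD pictures associated with one RAP picture and build the decoder message.

    associated_pictures is a list of (picture_id, kind) pairs, where kind is a
    hypothetical label such as "TFD", "DLP", or "TRAIL".
    """
    kept = [(pid, kind) for pid, kind in associated_pictures if kind != "TFD"]
    discarded_any = len(kept) != len(associated_pictures)
    message = {"rap_id": rap_id, "use_alt_cpb_params_flag": 1 if discarded_any else 0}
    return kept, message


def apply_message(message=None):
    """Decoder side: UseAltCpbParamsFlag is 0 when no message is received."""
    return 0 if message is None else message["use_alt_cpb_params_flag"]
```

In this sketch a message is always built; an implementation could equally send nothing when no TFD pictures were discarded, which `apply_message(None)` treats the same way.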

In some cases, video decoder 30 may set a network abstraction layer (NAL) unit type for a given one of the CRA pictures or BLA pictures, and may select the set of CPB parameters for the given picture based on the NAL unit type and the variable for the picture. As another example, instead of using only one NAL unit type that indicates a general CRA picture (e.g., CRA_NUT), the techniques of this disclosure allow the use of three different NAL unit types that respectively indicate a CRA picture with non-decodable leading pictures (e.g., CRA_W_TFD), a CRA picture with decodable leading pictures (e.g., CRA_W_DLP), and a CRA picture with no leading pictures (e.g., CRA_N_LP). In this case, Table 7-1 in HEVC WD8 and the notes below the table are changed as shown below.

Table 7-1 - NAL unit type codes and NAL unit type classes

NOTE 3 - A CRA picture with nal_unit_type equal to CRA_W_TFD may have associated TFD pictures, associated DLP pictures, or both present in the bitstream. A CRA picture with nal_unit_type equal to CRA_W_DLP does not have associated TFD pictures present in the bitstream, but may have associated DLP pictures in the bitstream. A CRA picture with nal_unit_type equal to CRA_N_LP does not have associated leading pictures present in the bitstream.

NOTE 4 - A BLA picture with nal_unit_type equal to BLA_W_TFD may have associated TFD pictures, associated DLP pictures, or both present in the bitstream. A BLA picture with nal_unit_type equal to BLA_W_DLP does not have associated TFD pictures present in the bitstream, but may have associated DLP pictures in the bitstream. A BLA picture with nal_unit_type equal to BLA_N_LP does not have associated leading pictures present in the bitstream.

NOTE 5 - An IDR picture with nal_unit_type equal to IDR_N_LP does not have associated leading pictures present in the bitstream. An IDR picture with nal_unit_type equal to IDR_W_DLP does not have associated TFD pictures present in the bitstream, but may have associated DLP pictures in the bitstream.

In addition, as in the first example above, a variable UseAltCpbParamsFlag is defined for each BLA or CRA picture. The value of this variable is set to 0 or 1 by network entity 29 or another external device. If no such external device is available, video decoder 30 may set the value of the variable to 0.

When the current picture is a BLA picture with nal_unit_type equal to BLA_W_TFD or is a CRA picture with nal_unit_type equal to CRA_W_TFD, the following applies.

When the current picture is a CRA picture and HandleCraAsBlaFlag is equal to 1, the following applies during the parsing and decoding processes for each coded slice NAL unit, and the CRA picture is considered a BLA picture and the CRA access unit is considered a BLA access unit:

- If the value of nal_unit_type is equal to CRA_W_TFD, the value of nal_unit_type is set to BLA_W_TFD. Otherwise, if the value of nal_unit_type is equal to CRA_W_DLP, the value of nal_unit_type is set to BLA_W_DLP. Otherwise, the value of nal_unit_type is set to BLA_N_LP.
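With three CRA NAL unit types, the conversion above becomes a one-to-one mapping that preserves the leading-picture information. A minimal sketch, with NAL unit types modeled as strings and a hypothetical function name:

```python
# Each CRA type maps to the BLA type with the same leading-picture suffix.
CRA_TO_BLA = {
    "CRA_W_TFD": "BLA_W_TFD",
    "CRA_W_DLP": "BLA_W_DLP",
    "CRA_N_LP": "BLA_N_LP",
}


def treat_cra_as_bla(nal_unit_type, handle_cra_as_bla_flag):
    """Return the nal_unit_type used when a CRA picture is handled as a BLA picture."""
    if handle_cra_as_bla_flag == 1 and nal_unit_type in CRA_TO_BLA:
        return CRA_TO_BLA[nal_unit_type]
    return nal_unit_type
```

For instance, `treat_cra_as_bla("CRA_W_DLP", 1)` yields `"BLA_W_DLP"`, while any picture is left unchanged when HandleCraAsBlaFlag is 0.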

- Access unit 0 is a CRA access unit for which the coded picture has nal_unit_type equal to CRA_W_DLP or CRA_N_LP, and the value of rap_cpb_params_present_flag of the associated buffering period SEI message is equal to 1;

- Access unit 0 is a BLA access unit for which the coded picture has nal_unit_type equal to BLA_W_TFD or is a CRA access unit for which the coded picture has nal_unit_type equal to CRA_W_TFD, UseAltCpbParamsFlag is equal to 1, and the value of rap_cpb_params_present_flag of the associated buffering period SEI message is equal to 1;

- SubPicCpbFlag is equal to 1.
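The three conditions listed above can be combined into one predicate. As before, this sketch models only the conditions reproduced in this passage, uses strings for NAL unit types, and uses a hypothetical function name.

```python
def uses_alt_set_three_cra_types(nal_unit_type, use_alt_cpb_params_flag,
                                 rap_cpb_params_present_flag, sub_pic_cpb_flag=0):
    """True if the alternative set of CPB parameters is selected for access unit 0."""
    if sub_pic_cpb_flag == 1:
        return True
    if rap_cpb_params_present_flag != 1:
        return False
    if nal_unit_type in ("CRA_W_DLP", "CRA_N_LP"):
        # The type itself indicates that no TFD pictures are present.
        return True
    # TFD pictures may exist; the external flag reports whether they were discarded.
    return nal_unit_type in ("BLA_W_TFD", "CRA_W_TFD") and use_alt_cpb_params_flag == 1
```

Here UseAltCpbParamsFlag matters only for the W_TFD types, because the other CRA types already state that no TFD pictures are present.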

Network entity 29, or another external device configured to set the value of UseAltCpbParamsFlag, may operate as follows. Network entity 29 may send a message to video decoder 30 or to a receiver containing video decoder 30. The message may indicate that a particular BLA or CRA picture had associated TFD pictures but that those TFD pictures have been discarded, and that the alternative set of CPB parameters should therefore be used. Upon receiving this message, video decoder 30 may set the value of UseAltCpbParamsFlag for the particular BLA or CRA picture to 1. If the particular BLA or CRA picture has no TFD pictures, or has TFD pictures that were not discarded, then either no message needs to be sent, or a message is sent to direct video decoder 30 to set the value of UseAltCpbParamsFlag for the particular BLA or CRA picture to 0.

In another example, video decoder 30 receives a bitstream representing a plurality of pictures, the plurality of pictures including one or more CRA pictures or BLA pictures, and also receives a message from network entity 29 indicating a NAL unit type for at least one of the CRA pictures or BLA pictures. Video decoder 30 sets a variable based on the received message, the variable being defined to indicate the NAL unit type for a given one of the CRA pictures or BLA pictures. Video decoder 30 then sets the NAL unit type for the given one of the CRA pictures or BLA pictures and selects the set of CPB parameters for the given picture based on the NAL unit type.

According to this example, a variable UseThisNalUnitType may be defined for each CRA or BLA picture. The value of this variable is set by network entity 29 or some other external device. If no such external device is available, video decoder 30 may set the value of the variable to the nal_unit_type of the CRA or BLA picture. In some examples, the possible values of this variable are CRA_NUT, BLA_W_TFD, BLA_W_DLP, and BLA_N_LP. In other examples, the possible values of this variable may include other nal_unit_type values configured to indicate a general CRA picture, a BLA picture with non-decodable leading pictures, a BLA picture with decodable leading pictures, and a BLA picture with no leading pictures.

When the current picture is a BLA or CRA picture, the following applies.

- If some external device not specified in this Specification is available to set the variable UseThisNalUnitType to a value, UseThisNalUnitType is set to the value provided by the external device. For a BLA picture with nal_unit_type equal to BLA_N_LP, the external device may only set UseThisNalUnitType to BLA_N_LP; for a BLA picture with nal_unit_type equal to BLA_W_DLP, the external device may only set UseThisNalUnitType to BLA_W_DLP or BLA_N_LP; for a BLA picture with nal_unit_type equal to BLA_W_TFD, the external device may only set UseThisNalUnitType to one of BLA_W_TFD, BLA_W_DLP, and BLA_N_LP; for a BLA picture, the external device should never set UseThisNalUnitType to indicate a CRA picture or any other picture type; for a CRA picture, the external device may set UseThisNalUnitType to one of CRA_NUT, BLA_W_TFD, BLA_W_DLP, and BLA_N_LP, but not to any other value.

- Otherwise, the value of UseThisNalUnitType is set to the nal_unit_type of the current picture.

When the current picture is a CRA or BLA picture, the following applies during the parsing and decoding processes for each coded slice NAL unit:

- The value of nal_unit_type is set to UseThisNalUnitType, and the current picture or access unit is considered a CRA or BLA picture or access unit according to the value of nal_unit_type equal to UseThisNalUnitType.

- If the current picture was a CRA picture before the above step and has become a BLA picture, the value of no_output_of_prior_pics_flag is set to 1.
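The constraints and steps above can be sketched as a lookup table plus one function. This is only an illustration: NAL unit types are modeled as strings, `None` stands for "no external device is available" or "flag left unchanged", and raising `ValueError` for a disallowed value is an assumption, since the text does not say how a decoder should react.

```python
# Values the external device may provide, keyed by the picture's signaled type.
ALLOWED = {
    "BLA_N_LP": {"BLA_N_LP"},
    "BLA_W_DLP": {"BLA_W_DLP", "BLA_N_LP"},
    "BLA_W_TFD": {"BLA_W_TFD", "BLA_W_DLP", "BLA_N_LP"},
    "CRA_NUT": {"CRA_NUT", "BLA_W_TFD", "BLA_W_DLP", "BLA_N_LP"},
}


def apply_use_this_nal_unit_type(nal_unit_type, external_value=None):
    """Return (nal_unit_type to use, no_output_of_prior_pics_flag override or None)."""
    # Without an external device, UseThisNalUnitType equals the signaled type.
    use_this = nal_unit_type if external_value is None else external_value
    if use_this not in ALLOWED[nal_unit_type]:
        raise ValueError(f"{use_this} is not permitted for a {nal_unit_type} picture")
    became_bla = nal_unit_type == "CRA_NUT" and use_this.startswith("BLA_")
    return use_this, (1 if became_bla else None)
```

For example, a CRA picture whose TFD pictures were discarded could be given `external_value="BLA_W_DLP"`, which yields `("BLA_W_DLP", 1)`.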

The text in subclause C.2.1 of HEVC WD8 cited above does not need to be changed.

As another example, instead of using only one NAL unit type that indicates a general CRA picture (e.g., CRA_NUT), the techniques of this disclosure allow the use of three different NAL unit types that respectively indicate a CRA picture with non-decodable leading pictures (e.g., CRA_W_TFD), a CRA picture with decodable leading pictures (e.g., CRA_W_DLP), and a CRA picture with no leading pictures (e.g., CRA_N_LP). In this case, Table 7-1 in HEVC WD8 and the notes below the table are changed as described above.

In addition, as in the second example above, a variable UseThisNalUnitType is defined for each CRA or BLA picture. The value of this variable is set by network entity 29 or another external device. If no such external device is available, video decoder 30 may set the value of the variable to the nal_unit_type of the CRA or BLA picture. In some examples, the possible values of this variable are CRA_W_TFD, CRA_W_DLP, CRA_N_LP, BLA_W_TFD, BLA_W_DLP, and BLA_N_LP. In other examples, the possible values of this variable may include other nal_unit_type values configured to indicate a CRA picture with non-decodable leading pictures, a CRA picture with decodable leading pictures, a CRA picture with no leading pictures, a BLA picture with non-decodable leading pictures, a BLA picture with decodable leading pictures, and a BLA picture with no leading pictures.

- If some external device not specified in this Specification is available to set the variable UseThisNalUnitType to a value, UseThisNalUnitType is set to the value provided by the external device.

For a BLA picture with nal_unit_type equal to BLA_N_LP, the external device may only set UseThisNalUnitType to BLA_N_LP; for a BLA picture with nal_unit_type equal to BLA_W_DLP, the external device may only set UseThisNalUnitType to BLA_W_DLP or BLA_N_LP; for a BLA picture with nal_unit_type equal to BLA_W_TFD, the external device may only set UseThisNalUnitType to one of BLA_W_TFD, BLA_W_DLP, and BLA_N_LP; for a BLA picture, the external device should never set UseThisNalUnitType to indicate a CRA picture or any other picture type.

For a CRA picture with nal_unit_type equal to CRA_N_LP, the external device may only set UseThisNalUnitType to CRA_N_LP or BLA_N_LP; for a CRA picture with nal_unit_type equal to CRA_W_DLP, the external device may only set UseThisNalUnitType to CRA_W_DLP, CRA_N_LP, BLA_W_DLP, or BLA_N_LP; for a CRA picture with nal_unit_type equal to CRA_W_TFD, the external device may only set UseThisNalUnitType to CRA_W_TFD, CRA_W_DLP, CRA_N_LP, BLA_W_TFD, BLA_W_DLP, or BLA_N_LP.
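The permitted values listed above follow a regular pattern that can be captured in a compact check: a BLA picture may not become a CRA picture, and the leading-picture suffix may stay the same or move toward "fewer leading pictures" (W_TFD, then W_DLP, then N_LP) but never the other way. The ranking and the function below are an observation about the listed values, offered as a sketch with hypothetical names, not as text from the specification.

```python
# Rank of each leading-picture suffix: a higher rank permits more leading pictures.
LEADING_RANK = {"N_LP": 0, "W_DLP": 1, "W_TFD": 2}


def is_allowed(current, requested):
    """True if the external device may set UseThisNalUnitType to `requested`
    for a picture whose signaled nal_unit_type is `current`."""
    cur_kind, cur_suffix = current.split("_", 1)
    req_kind, req_suffix = requested.split("_", 1)
    if req_kind not in ("CRA", "BLA") or req_suffix not in LEADING_RANK:
        return False
    if cur_kind == "BLA" and req_kind != "BLA":
        # A BLA picture is never turned back into a CRA picture.
        return False
    return LEADING_RANK[req_suffix] <= LEADING_RANK[cur_suffix]
```

For example, `is_allowed("CRA_W_DLP", "BLA_N_LP")` is true, while `is_allowed("CRA_W_DLP", "BLA_W_TFD")` and `is_allowed("BLA_W_TFD", "CRA_W_TFD")` are false.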

- Access unit 0 is a CRA access unit whose coded picture has nal_unit_type equal to CRA_W_DLP or CRA_N_LP, and the value of rap_cpb_params_present_flag of the associated buffering period SEI message is equal to 1;

- SubPicCpbFlag is equal to 1.

FIG. 4 is a block diagram illustrating an example destination device 100 configured to operate according to a hypothetical reference decoder (HRD). In this example, destination device 100 includes an input interface 102, a stream scheduler 104, a coded picture buffer (CPB) 106, a video decoder 108, a decoded picture buffer (DPB) 110, a rendering unit 112, and an output interface 114. Destination device 100 may substantially correspond to destination device 14 of FIG. 1. Input interface 102 may comprise any input interface capable of receiving a coded bitstream of video data, and may substantially correspond to input interface 28 of FIG. 1. For example, input interface 102 may comprise a receiver, a modem, a network interface such as a wired or wireless interface, a memory or memory interface, a drive for reading data from a disc (e.g., an optical drive interface or a magnetic media interface), or another interface component.

Input interface 102 may receive a coded bitstream including video data and provide the bitstream to stream scheduler 104. Stream scheduler 104 extracts units of video data (e.g., access units and/or decoding units) from the bitstream and stores the extracted units in CPB 106. In this manner, stream scheduler 104 represents an example implementation of a hypothetical stream scheduler (HSS). CPB 106 may substantially conform to CPB 68 of FIG. 3, except that, as shown in FIG. 4, CPB 106 is separate from video decoder 108. In different examples, CPB 106 may be separate from video decoder 108 or integrated as part of video decoder 108.

Video decoder 108 includes DPB 110. Video decoder 108 may substantially conform to video decoder 30 of FIGS. 1 and 3. DPB 110 may substantially conform to DPB 82 of FIG. 3. Thus, video decoder 108 may decode the decoding units of CPB 106. Furthermore, video decoder 108 may output decoded pictures from DPB 110. Video decoder 108 may pass the output pictures to rendering unit 112. Rendering unit 112 may crop the pictures and then pass the cropped pictures to output interface 114. Output interface 114 may, in turn, provide the cropped pictures to a display device, which may substantially conform to display device 32 of FIG. 1.
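The data path of FIG. 4 can be pictured as a short pipeline: scheduler to CPB, CPB to decoder, decoder to DPB, DPB to rendering. The following Python sketch is a toy model under stated assumptions; the class `DestinationDevice` and its methods are hypothetical, strings stand in for coded and decoded data, and no HRD timing is modeled.

```python
from collections import deque


class DestinationDevice:
    """Toy model of the FIG. 4 data path (hypothetical names, no timing)."""

    def __init__(self):
        self.cpb = deque()  # stands in for CPB 106 (first in, first out)
        self.dpb = []       # stands in for DPB 110

    def schedule(self, units):
        # Stream scheduler 104: store extracted units in the CPB.
        for unit in units:
            self.cpb.append(unit)

    def decode_next(self):
        # Video decoder 108: remove one unit from the CPB, decode it,
        # and keep the decoded picture in the DPB.
        unit = self.cpb.popleft()
        picture = f"decoded({unit})"
        self.dpb.append(picture)
        return picture

    def output_next(self):
        # Rendering unit 112: crop the oldest picture output from the DPB.
        picture = self.dpb.pop(0)
        return f"cropped({picture})"
```

In this model, scheduling two access units and decoding one leaves one unit in the CPB and one decoded picture ready for cropping and output.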

The display device may form part of destination device 100, or may be communicatively coupled to destination device 100. For example, the display device may comprise a screen, touch screen, projector, or other display unit integrated with destination device 100, or may comprise a separate display, such as a television, monitor, projector, touch screen, or other device that is communicatively coupled to destination device 100. The communicative coupling may comprise a wired or wireless coupling, such as a coaxial cable, a composite video cable, a component video cable, a High-Definition Multimedia Interface (HDMI) cable, a radio frequency broadcast, or another wired or wireless coupling.

FIG. 5 is a flowchart illustrating an example operation of selecting a set of coded picture buffer (CPB) parameters based on a variable that indicates the set of CPB parameters to use for a particular random access point (RAP) picture in a bitstream. The illustrated operation is described with respect to video decoder 30 of FIG. 3, which includes CPB 68. In other examples, similar operations may be performed by video encoder 20 of FIG. 2, which includes CPB 66, by destination device 100 of FIG. 4, which includes CPB 106 and video decoder 108, or by other devices including a video encoder or video decoder with a CPB configured to operate according to HRD operations.

Video decoder 30 receives a bitstream including one or more CRA pictures or BLA pictures (120). Along with the bitstream, video decoder 30 also receives a message indicating whether to use an alternative set of CPB parameters for a particular one of the CRA or BLA pictures (122). More specifically, video decoder 30 may receive the message from an external device, such as network entity 29, that is capable of discarding the TFD pictures associated with the particular picture and is also capable of notifying video decoder 30 that the TFD pictures have been discarded.

For example, when the particular picture had TFD pictures in the original bitstream output from video encoder 20, and the TFD pictures have been discarded by the external device, the message received by video decoder 30 indicates that the alternative set of CPB parameters is to be used for the particular picture. As another example, when the particular picture did not have TFD pictures in the original bitstream output from video encoder 20, or when the particular picture had TFD pictures in the original bitstream and the TFD pictures have not been discarded by the external device, the message received by video decoder 30 does not indicate that the alternative set of CPB parameters is to be used for the particular picture. In this case, either the default set or the alternative set of CPB parameters may be used for the CRA or BLA picture, based on the NAL unit type of the picture.

Video decoder 30 sets a variable (e.g., UseAltCpbParamsFlag), which is defined to indicate the set of CPB parameters for the particular picture, based on the received message (124). For example, when the received message indicates the alternative set of CPB parameters for the particular picture, video decoder 30 may set UseAltCpbParamsFlag equal to 1. Conversely, when the received message does not explicitly indicate the alternative set of CPB parameters for the particular picture, video decoder 30 may set UseAltCpbParamsFlag equal to 0. In some cases, video decoder 30 may not receive a message for at least one of the CRA or BLA pictures. Video decoder 30 may then set UseAltCpbParamsFlag equal to 0.
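Steps 122 and 124 can be sketched as two small functions, one for the external device and one for the decoder. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the message is modeled as a Boolean, and `None` models the case in which no message is received.

```python
from typing import Optional


def external_message(had_tfd_pictures: bool, tfd_discarded: bool) -> bool:
    """Message from the external device (hypothetical Boolean form).

    True only when the picture had TFD pictures in the original bitstream
    and the external device has discarded them.
    """
    return had_tfd_pictures and tfd_discarded


def set_use_alt_cpb_params_flag(message: Optional[bool]) -> int:
    """Step 124: derive UseAltCpbParamsFlag from the received message.

    A message of None (no message received) or False yields 0.
    """
    return 1 if message else 0
```

The flag is 1 only when TFD pictures existed and were dropped; every other case, including a missing message, leaves it at 0.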

Video decoder 30 then sets the NAL unit type for the particular picture (126). In some cases, video decoder 30 may set the NAL unit type for the particular picture as signaled in the bitstream. In other cases, video decoder 30 may set the NAL unit type for the particular picture based at least in part on the variable for the picture. The NAL unit type selection operation is described in more detail below with respect to FIG. 6. Video decoder 30 selects the default set or the alternative set of CPB parameters for the particular picture based on the NAL unit type and the variable for the particular picture (128). In particular, video decoder 30 selects the default set of CPB parameters for one or more NAL unit types when the variable does not indicate the alternative set of CPB parameters. Video decoder 30 selects the alternative set of CPB parameters for those one or more NAL unit types when the variable indicates the alternative set, and also for one or more different NAL unit types. The CPB parameter set selection operation is described in more detail below with respect to FIG. 7.

FIG. 6 is a flowchart illustrating an example operation of setting a network abstraction layer (NAL) unit type for a particular RAP picture based on a variable that indicates the set of CPB parameters for the picture. The illustrated operation is described with respect to video decoder 30 of FIG. 3, which includes CPB 68. In other examples, similar operations may be performed by video encoder 20 of FIG. 2, which includes CPB 66, by destination device 100 of FIG. 4, which includes CPB 106 and video decoder 108, or by other devices including a video encoder or video decoder with a CPB configured to operate according to HRD operations.

Video decoder 30 receives a bitstream including one or more CRA pictures or BLA pictures (150). Video decoder 30 receives a message indicating whether to use an alternative set of CPB parameters for a particular one of the CRA or BLA pictures (152). Video decoder 30 sets a variable, defined to indicate the set of CPB parameters for the particular picture, based on the received message (154).

When the particular picture is a BLA picture (NO branch of 156), video decoder 30 sets the NAL unit type for the particular BLA picture as signaled in the bitstream (158). When the particular picture is a CRA picture (YES branch of 156) and the CRA picture is not handled as a BLA picture (NO branch of 160), video decoder 30 likewise sets the NAL unit type for the particular CRA picture as signaled in the bitstream (158).

Conventionally, when a CRA picture is handled as a BLA picture, the NAL unit type for the CRA picture is set to indicate a BLA picture with non-decodable leading pictures (e.g., BLA_W_TFD), which results in selection of the default set of CPB parameters for the picture. In some cases, the picture may not have associated TFD pictures, and using the default set of CPB parameters may result in overflow of the CPB. According to the techniques of this disclosure, when the particular picture is a CRA picture (YES branch of 156) and the CRA picture is handled as a BLA picture (YES branch of 160), video decoder 30 sets the NAL unit type for the particular picture based on the variable for the particular CRA picture.

For example, when the variable does not explicitly indicate the alternative set of CPB parameters (NO branch of 162), video decoder 30 sets the NAL unit type for the particular picture to indicate a BLA picture with non-decodable leading pictures (e.g., BLA_W_TFD), which indicates that the particular picture has associated TFD pictures (164). In this case, the default set of CPB parameters will be appropriately selected for the particular picture. When the variable indicates the alternative set of CPB parameters (YES branch of 162), video decoder 30 sets the NAL unit type for the particular picture to indicate a BLA picture with decodable leading pictures (e.g., BLA_W_DLP), which indicates that the particular picture does not have associated TFD pictures (166). In this case, the alternative set of CPB parameters will be appropriately selected for the particular picture. In this way, the techniques ensure that the CPB of the video decoder will not overflow due to the use of inappropriate CPB parameters.
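The decisions at 156, 160, and 162 of FIG. 6 can be sketched as a single function. The following Python sketch assumes NAL unit types are represented as strings and that the caller knows whether a CRA picture is being handled as a BLA picture; the function name `set_nal_unit_type` and its parameters are hypothetical.

```python
def set_nal_unit_type(signaled_type: str,
                      handled_as_bla: bool,
                      use_alt_cpb_params_flag: int) -> str:
    """Return the NAL unit type to use for a CRA or BLA picture (FIG. 6)."""
    is_cra = signaled_type.startswith("CRA")          # decision 156
    if not is_cra or not handled_as_bla:              # decision 160
        # BLA pictures, and CRA pictures not handled as BLA pictures,
        # keep the type signaled in the bitstream (158).
        return signaled_type
    if use_alt_cpb_params_flag:                       # decision 162
        # No associated TFD pictures remain (166).
        return "BLA_W_DLP"
    # Associated TFD pictures are assumed to be present (164).
    return "BLA_W_TFD"
```

Only a CRA picture handled as a BLA picture has its type rewritten, and the variable decides between BLA_W_DLP and BLA_W_TFD.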

FIG. 7 is a flowchart illustrating an example operation of selecting a set of CPB parameters for a particular RAP picture based on the NAL unit type for the picture and a variable that indicates the set of CPB parameters for the picture. The illustrated operation is described with respect to video decoder 30 of FIG. 3, which includes CPB 68. In other examples, similar operations may be performed by video encoder 20 of FIG. 2, which includes CPB 66, by destination device 100 of FIG. 4, which includes CPB 106 and video decoder 108, or by other devices including a video encoder or video decoder with a CPB configured to operate according to HRD operations.

Video decoder 30 receives a bitstream including one or more CRA pictures or BLA pictures (170). Video decoder 30 receives a message indicating whether to use an alternative set of CPB parameters for a particular one of the CRA or BLA pictures (172). Video decoder 30 sets a variable, defined to indicate the set of CPB parameters for the particular picture, based on the received message (174). Video decoder 30 then sets the NAL unit type for the particular picture (176). As described above with respect to FIG. 6, video decoder 30 may set the NAL unit type for the particular picture as signaled in the bitstream, or may set the NAL unit type for the particular picture based on the variable for the picture.

When the particular picture is a BLA picture having a NAL unit type that indicates a BLA picture with decodable leading pictures (e.g., BLA_W_DLP) or a BLA picture with no leading pictures (e.g., BLA_N_LP), which indicates that the particular picture does not have associated TFD pictures (YES branch of 178), video decoder 30 selects the alternative set of CPB parameters for the particular picture based on the NAL unit type (180). Conventionally, the default set of CPB parameters is used for any CRA picture, or for any BLA picture with associated TFD pictures (e.g., BLA_W_TFD). In some cases, however, the TFD pictures associated with the particular picture in the original bitstream may be discarded before the bitstream reaches the video decoder. The video decoder then uses the default CPB parameters based on the NAL unit type even though the picture no longer has associated TFD pictures, which may result in overflow of the CPB.

According to the techniques of this disclosure, when the particular picture is a CRA picture, or a BLA picture having a NAL unit type that indicates a BLA picture with non-decodable leading pictures (e.g., BLA_W_TFD), which indicates that the particular picture has associated TFD pictures (YES branch of 182), video decoder 30 selects the set of CPB parameters for the particular picture based on the variable for the particular picture. For example, when the variable does not explicitly indicate the alternative set of CPB parameters (NO branch of 184), video decoder 30 selects the default set of CPB parameters for the particular picture based on the variable (186). When the variable indicates the alternative set of CPB parameters (YES branch of 184), video decoder 30 selects the alternative set of CPB parameters for the particular picture based on the variable (188). In this way, the techniques ensure that the CPB of the video decoder will not overflow due to the use of inappropriate CPB parameters.
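The selection logic of FIG. 7 (decisions 178, 182, and 184) can be sketched as follows. This Python sketch represents NAL unit types as strings and returns the labels "default" and "alternative" in place of actual CPB parameter values; the function name `select_cpb_params` and the error raised for other picture types are assumptions of the sketch.

```python
def select_cpb_params(nal_unit_type: str, use_alt_cpb_params_flag: int) -> str:
    """Return which CPB parameter set applies to a RAP picture (FIG. 7)."""
    if nal_unit_type in ("BLA_W_DLP", "BLA_N_LP"):            # decision 178
        # No associated TFD pictures: the NAL unit type alone decides (180).
        return "alternative"
    if nal_unit_type.startswith("CRA") or nal_unit_type == "BLA_W_TFD":  # 182
        # The variable decides between the two sets (184).
        return "alternative" if use_alt_cpb_params_flag else "default"
    # Assumption: other picture types are outside the scope of this flowchart.
    raise ValueError(f"not a CRA or BLA NAL unit type: {nal_unit_type}")
```

A BLA_W_TFD picture whose TFD pictures were dropped upstream therefore still receives the alternative set, provided the variable was set to 1.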

FIG. 8 is a flowchart illustrating an example operation of selecting a set of CPB parameters based on a variable defined to indicate a network abstraction layer (NAL) unit type for a particular RAP picture in a bitstream. The illustrated operation is described with respect to video decoder 30 of FIG. 3, which includes CPB 68. In other examples, similar operations may be performed by video encoder 20 of FIG. 2, which includes CPB 66, by destination device 100 of FIG. 4, which includes CPB 106 and video decoder 108, or by other devices including a video encoder or video decoder with a CPB configured to operate according to HRD operations.

Video decoder 30 receives a bitstream including one or more CRA pictures or BLA pictures (190). Along with the bitstream, video decoder 30 also receives a message indicating a NAL unit type for a particular one of the CRA or BLA pictures (192). More specifically, video decoder 30 may receive the message from an external device, such as network entity 29, that is capable of discarding the TFD pictures associated with the particular picture and is also capable of notifying video decoder 30 that the TFD pictures have been discarded.

For example, when the particular picture had TFD pictures in the original bitstream output from video encoder 20, and the TFD pictures have been discarded by the external device, the message received by video decoder 30 may indicate a NAL unit type for the particular picture that indicates a BLA picture with decodable leading pictures (e.g., BLA_W_DLP) or a BLA picture with no leading pictures (e.g., BLA_N_LP). As another example, when the particular picture had TFD pictures in the original bitstream and the TFD pictures have not been discarded by the external device, the message received by video decoder 30 may indicate a NAL unit type for the CRA or BLA picture that indicates a BLA picture with non-decodable leading pictures (e.g., BLA_W_TFD).

Video decoder 30 sets a variable (e.g., UseThisNalUnitType), which is defined to indicate the NAL unit type for the particular picture, based on the received message (194). For example, video decoder 30 may set UseThisNalUnitType equal to the NAL unit type indicated by the received message for the particular picture. In some cases, video decoder 30 may not receive a message for at least one of the CRA or BLA pictures. Video decoder 30 may then set UseThisNalUnitType equal to the NAL unit type signaled in the bitstream for the particular picture. Video decoder 30 sets the NAL unit type for the particular picture based on the variable (196). Video decoder 30 then selects the default set or the alternative set of CPB parameters for the particular picture based on the NAL unit type for the particular picture (198).
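Steps 194 through 198 can be sketched end to end as follows. In this Python sketch the function name, the string representation of NAL unit types, and the use of `None` for a missing message are hypothetical. The mapping in step 198 is also an assumption: the text states only that the selection is based on the NAL unit type, so the sketch assumes that types indicating associated TFD pictures map to the default set and all other CRA and BLA types map to the alternative set.

```python
from typing import Optional, Tuple

RAP_TYPES = {"CRA_W_TFD", "CRA_W_DLP", "CRA_N_LP",
             "BLA_W_TFD", "BLA_W_DLP", "BLA_N_LP"}
# Assumption: only these types imply that TFD pictures are still present.
TYPES_WITH_TFD = {"CRA_W_TFD", "BLA_W_TFD"}


def select_cpb_params_from_type(signaled_type: str,
                                message_type: Optional[str] = None
                                ) -> Tuple[str, str]:
    """Return (NAL unit type, CPB parameter set label) per FIG. 8."""
    # Step 194: use the message if one was received, else the signaled type.
    use_this_nal_unit_type = (message_type if message_type is not None
                              else signaled_type)
    # Step 196: the NAL unit type of the picture follows the variable.
    nal_unit_type = use_this_nal_unit_type
    if nal_unit_type not in RAP_TYPES:
        raise ValueError(f"not a CRA or BLA NAL unit type: {nal_unit_type}")
    # Step 198: choose the parameter set from the NAL unit type alone.
    cpb_set = "default" if nal_unit_type in TYPES_WITH_TFD else "alternative"
    return nal_unit_type, cpb_set
```

Unlike the FIG. 7 flow, no separate flag is consulted here; the relabeled NAL unit type carries all of the information needed for the selection.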

FIG. 9 is a block diagram illustrating an example set of devices that form part of network 200. In this example, network 200 includes routing devices 204A, 204B (routing devices 204) and transcoding device 206. Routing devices 204 and transcoding device 206 are intended to represent a small number of the devices that may form part of network 200. Other network devices, such as switches, hubs, gateways, firewalls, bridges, and other such devices, may also be included within network 200. Moreover, additional network devices may be provided along the network path between server device 202 and client device 208. In some examples, server device 202 may correspond to source device 12 of FIG. 1, while client device 208 may correspond to destination device 14 of FIG. 1.

In general, routing devices 204 implement one or more routing protocols to exchange network data through network 200. In some examples, routing devices 204 may be configured to perform proxy or cache operations. Therefore, in some examples, routing devices 204 may be referred to as proxy devices. In general, routing devices 204 execute routing protocols to discover routes through network 200. By executing such routing protocols, routing device 204B may discover a network route from itself to server device 202 via routing device 204A.

The techniques of this disclosure may be implemented by network devices such as routing devices 204 and transcoding device 206, and may also be implemented by client device 208. In this manner, routing devices 204, transcoding device 206, and client device 208 represent examples of devices configured to perform the techniques of this disclosure, including the techniques recited in the claims of this disclosure. Moreover, the devices of FIG. 1, the encoder shown in FIG. 2, and the decoder shown in FIG. 3 are also exemplary devices that can be configured to perform the techniques of this disclosure, including the techniques recited in the claims of this disclosure.

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein may be performed in a different sequence, or may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently rather than sequentially, e.g., through multithreaded processing, interrupt processing, or multiple processors.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method of processing video data, the method comprising:

receiving a bitstream representing a plurality of pictures, the plurality of pictures including one or more clean random access (CRA) pictures or one or more broken link access (BLA) pictures, the one or more CRA pictures having a NAL unit type indicating that the CRA picture has associated tagged for discard (TFD) pictures, and the one or more BLA pictures having a NAL unit type indicating that the BLA picture has associated TFD pictures, wherein a TFD picture is a non-decodable leading picture;

receiving, from an external device, a message indicating whether the TFD pictures of at least one of the one or more CRA pictures or the one or more BLA pictures have been discarded by the external device, and therefore whether an alternative set of coded picture buffer (CPB) parameters is to be used for the at least one of the one or more CRA pictures or the one or more BLA pictures;

setting, based on the received message, a variable defined to indicate a set of CPB parameters for the at least one of the one or more CRA pictures or the one or more BLA pictures, such that the variable indicates the alternative set of CPB parameters when the TFD pictures have been discarded by the external device; and

selecting the alternative set of CPB parameters for the at least one of the one or more CRA pictures or the one or more BLA pictures when the variable indicates the alternative set of CPB parameters, or selecting a default set of CPB parameters for the at least one of the one or more CRA pictures or the one or more BLA pictures when the variable does not indicate the alternative set of CPB parameters.

2. The method of claim 1, further comprising initializing a hypothetical reference decoder (HRD) using at least one of the one or more CRA images or the one or more BLA images and associated HRD parameters, wherein the HRD parameters comprise the selected set of CPB parameters.

3. The method of claim 1, further comprising setting a Network Abstraction Layer (NAL) unit type for at least one of the one or more CRA images or the one or more BLA images, wherein selecting one of the default set or the alternative set of CPB parameters comprises selecting one of the default set or the alternative set of CPB parameters for the one or more CRA images or the one or more BLA images based on the NAL unit type and the value of the variable.

4. The method of claim 3, wherein at least one of the one or more CRA images or the one or more BLA images includes a CRA image treated as a BLA image, and wherein setting the NAL unit type includes setting the NAL unit type for the CRA image treated as the BLA image based on the value of the variable.

5. The method of claim 4, wherein setting the NAL unit type for the CRA image treated as the BLA image includes:

based on the value of the variable indicating use of the alternative set of CPB parameters, setting the NAL unit type for the CRA image treated as the BLA image to indicate a BLA image having an associated decodable leading image; and

based on the value of the variable not indicating use of the alternative set of CPB parameters, setting the NAL unit type for the CRA image treated as the BLA image to indicate a BLA image having an associated TFD image.

6. The method of claim 3, wherein at least one of the one or more CRA images or the one or more BLA images includes a CRA image, and wherein setting the NAL unit type includes setting the NAL unit type for the CRA image to indicate a general CRA image.

7. The method of claim 3, wherein at least one of the one or more CRA images or the one or more BLA images includes a CRA image, and wherein setting the NAL unit type includes setting the NAL unit type for the CRA image to indicate one of a CRA image with an associated non-decodable leading image, a CRA image with an associated decodable leading image, or a CRA image without a leading image.

8. The method of claim 1, wherein at least one of the one or more CRA images or the one or more BLA images does not have an associated non-decodable leading image in the original bitstream, or has an associated non-decodable leading image in the original bitstream and the associated non-decodable leading image has not been discarded by the external device, and wherein the value of the variable does not indicate use of the alternative set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images.

9. The method of claim 1, further comprising:

based on not receiving a message indicating whether to use the alternative set of CPB parameters for another one of the one or more CRA images or the one or more BLA images, setting the value of the variable to not indicate use of the alternative set of CPB parameters for the other one of the one or more CRA images or the one or more BLA images; and

selecting, based on the value of the variable, the default set of CPB parameters for the other one of the one or more CRA images or the one or more BLA images.

10. The method of claim 1, wherein each of the default set of CPB parameters and the alternative set of CPB parameters includes an initial CPB removal delay and an initial CPB removal delay offset.

11. The method of claim 1, further comprising applying the selected set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images to a CPB contained in a video decoding device to ensure that the CPB does not overflow during decoding of the video data.

12. The method of claim 1, further comprising applying the selected set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images to a first CPB included in a video encoding apparatus to ensure that the first CPB does not overflow during encoding of the video data, and to ensure that a second CPB included in a video decoding apparatus does not overflow upon receiving the encoded bitstream generated by the video encoding apparatus.

13. A video decoding apparatus for processing video data, the apparatus comprising:

a coded picture buffer (CPB) configured to store video data; and

one or more processors configured to:

receive a bitstream representing a plurality of images, the plurality of images comprising one or more Clean Random Access (CRA) images or one or more Broken Link Access (BLA) images, wherein the one or more CRA images have NAL unit types indicating that the CRA images have associated tagged-for-discard (TFD) images, and the one or more BLA images have NAL unit types indicating that the BLA images have associated TFD images, wherein the TFD images are non-decodable leading images;

receive a message from an external device, the message indicating whether the TFD image of at least one of the one or more CRA images or the one or more BLA images is discarded by the external device, and therefore whether an alternative set of CPB parameters is to be used for at least one of the one or more CRA images or the one or more BLA images;

set, based on the received message, a variable defined to indicate a set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images, such that the variable indicates the alternative set of CPB parameters when the TFD image is discarded by the external device; and

select the alternative set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images when the variable indicates the alternative set of CPB parameters, or select a default set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images when the variable does not indicate the alternative set of CPB parameters.

14. The video decoding apparatus of claim 13, wherein the one or more processors are further configured to initialize a hypothetical reference decoder (HRD) using at least one of the one or more CRA images or the one or more BLA images and associated HRD parameters, wherein the HRD parameters include the selected set of CPB parameters for the images.

15. The video decoding apparatus of claim 13, wherein the one or more processors are further configured to set a Network Abstraction Layer (NAL) unit type for at least one of the one or more CRA images or the one or more BLA images, and to select, based on the NAL unit type and the value of the variable, one of the default set or the alternative set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images.

16. The video decoding apparatus of claim 15, wherein at least one of the one or more CRA images or the one or more BLA images includes a CRA image treated as a BLA image, and wherein the one or more processors are further configured to set the NAL unit type for the CRA image treated as the BLA image based on the value of the variable.

17. The video decoding apparatus according to claim 16, wherein:

the one or more processors are further configured to set, based on the value of the variable indicating use of the alternative set of CPB parameters, the NAL unit type for the CRA image treated as the BLA image to indicate a BLA image with an associated decodable leading image; and

the one or more processors are further configured to set, based on the value of the variable not indicating use of the alternative set of CPB parameters, the NAL unit type for the CRA image treated as the BLA image to indicate a BLA image with an associated TFD image.

18. The video decoding apparatus of claim 15, wherein at least one of the one or more CRA images or the one or more BLA images comprises a CRA image, and wherein the one or more processors are further configured to set the NAL unit type for the CRA image to indicate a general CRA image.

19. The video decoding apparatus of claim 15, wherein at least one of the one or more CRA images or the one or more BLA images comprises a CRA image, and wherein the one or more processors are further configured to set the NAL unit type for the CRA image to indicate one of a CRA image having an associated non-decodable leading image, a CRA image having an associated decodable leading image, or a CRA image without a leading image.

20. The video decoding apparatus of claim 13, wherein at least one of the one or more CRA images or the one or more BLA images does not have an associated non-decodable leading image in the original bitstream, or has an associated non-decodable leading image in the original bitstream and the associated non-decodable leading image has not been discarded by the external device, and wherein the value of the variable does not indicate use of the alternative set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images.

21. The video decoding apparatus of claim 13, wherein the one or more processors are further configured to:

22. The video decoding apparatus of claim 13, wherein each of the default set of CPB parameters and the alternative set of CPB parameters includes an initial CPB removal delay and an initial CPB removal delay offset.

23. The video decoding apparatus of claim 13, wherein the one or more processors are further configured to apply the selected set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images to the CPB contained in the video decoding apparatus to ensure that the CPB does not overflow during decoding of the video data.

24. The video decoding apparatus of claim 13, wherein the one or more processors are further configured to apply the selected set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images to a first CPB included in a video encoding apparatus to ensure that the first CPB does not overflow during encoding of the video data, and to ensure that a second CPB included in the video decoding apparatus does not overflow upon receiving the encoded bitstream generated by the video encoding apparatus.

25. A video decoding apparatus for processing video data, the apparatus comprising:

means for receiving a bitstream representing a plurality of images, the plurality of images comprising one or more Clean Random Access (CRA) images or one or more Broken Link Access (BLA) images, the one or more CRA images having NAL unit types indicating that the CRA image has an associated tagged-for-discard (TFD) image, and the one or more BLA images having NAL unit types indicating that the BLA image has an associated TFD image, wherein the TFD image is a non-decodable leading image;

means for receiving a message from an external device, the message indicating whether the TFD image of at least one of the one or more CRA images or the one or more BLA images is discarded by the external device, and therefore whether an alternative set of coded picture buffer (CPB) parameters is to be used for at least one of the one or more CRA images or the one or more BLA images;

means for setting, based on the received message, a variable defined to indicate a set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images, such that the variable indicates the alternative set of CPB parameters when the TFD image is discarded by the external device; and

means for selecting the alternative set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images when the variable indicates the alternative set of CPB parameters, or for selecting a default set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images when the variable does not indicate the alternative set of CPB parameters.

26. The video decoding apparatus of claim 25, further comprising means for initializing a hypothetical reference decoder (HRD) using at least one of the one or more CRA images or the one or more BLA images and associated HRD parameters, wherein the HRD parameters comprise the selected set of CPB parameters.

27. The video decoding apparatus of claim 25, further comprising means for setting a Network Abstraction Layer (NAL) unit type for at least one of the one or more CRA images or the one or more BLA images, and means for selecting, based on the NAL unit type and the value of the variable, one of the default set or the alternative set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images.

28. The video decoding apparatus of claim 27, wherein at least one of the one or more CRA images or the one or more BLA images includes a CRA image treated as a BLA image, the video decoding apparatus further comprising means for setting the NAL unit type for the CRA image treated as the BLA image based on the value of the variable.

29. The video decoding apparatus of claim 25, further comprising means for applying the selected set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images to a CPB contained in the video decoding apparatus to ensure that the CPB does not overflow during decoding of the video data.

30. The video decoding apparatus of claim 25, further comprising means for applying the selected set of CPB parameters for at least one of the one or more CRA images or the one or more BLA images to a first CPB included in a video encoding apparatus to ensure that the first CPB does not overflow during encoding of the video data, and to ensure that a second CPB included in the video decoding apparatus does not overflow upon receiving the encoded bitstream generated by the video encoding apparatus.

31. A non-transitory computer-readable medium storing instructions for processing video data, said instructions, when executed, causing one or more processors to:

receive a message from an external device, the message indicating whether the TFD image of at least one of the one or more CRA images or the one or more BLA images is discarded by the external device, and therefore whether an alternative set of coded picture buffer (CPB) parameters is to be used for at least one of the one or more CRA images or the one or more BLA images.

32. The non-transitory computer-readable medium of claim 31, wherein the instructions cause the one or more processors to set a Network Abstraction Layer (NAL) unit type for at least one of the one or more CRA images or the one or more BLA images, and to select, based on the NAL unit type and the value of the variable, one of the default set or the alternative set of CPB parameters for the one or more CRA images or the one or more BLA images.

33. The non-transitory computer-readable medium of claim 32, wherein at least one of the one or more CRA images or the one or more BLA images includes a CRA image treated as a BLA image, and wherein the instructions cause the one or more processors to set the NAL unit type for the CRA image treated as the BLA image based on the value of the variable.