
HK1204181B - Coding SEI NAL units for video coding - Google Patents

Coding SEI NAL units for video coding

Info

Publication number
HK1204181B
Authority
HK
Hong Kong
Prior art keywords
nal unit
sei
unit
nal
video
Prior art date
Application number
HK15104410.7A
Other languages
Chinese (zh)
Other versions
HK1204181A1 (en)
Inventor
王益魁
Original Assignee
高通股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 13/802,005 (US9584804B2)
Application filed by 高通股份有限公司
Publication of HK1204181A1
Publication of HK1204181B


Description

Coding SEI NAL units for video coding

Related Applications

This application claims the benefit of:

U.S. Provisional Application No. 61/670,066, filed July 10, 2012, which is hereby incorporated by reference herein in its entirety.

Technical Field

This disclosure relates generally to processing video data and, more particularly, to random access pictures used in video data.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262, ISO/IEC MPEG-2 Visual, ITU-T H.263, and ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. By implementing such video coding techniques, video devices can more efficiently transmit, receive, encode, decode, and/or store digital video information.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based predictive video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as macroblocks, treeblocks, coding tree units (CTUs), coding tree blocks (CTBs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents the pixel differences between the original block to be coded and the predictive block. Pixels may also be referred to as picture elements, pels, or samples. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients that may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
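
As a concrete illustration of the scanning step described above, the following minimal C++ sketch scans a two-dimensional array of quantized transform coefficients into a one-dimensional vector using a simple zig-zag order. The 4x4 block size, scan order, and coefficient values are illustrative only; actual coders select among the scan orders defined by the applicable standard.

    #include <cstdio>
    #include <vector>

    // Zig-zag scan of a 4x4 block of quantized transform coefficients into a
    // one-dimensional vector, illustrating the scanning step that precedes
    // entropy coding.
    int main() {
        // Hypothetical quantized coefficients (low frequencies at the top-left).
        const int block[4][4] = {
            {12, 6, 3, 0},
            { 5, 4, 1, 0},
            { 2, 1, 0, 0},
            { 0, 0, 0, 0}};

        // Zig-zag scan positions for a 4x4 block, given as (row, column) pairs.
        const int scan[16][2] = {
            {0,0},{0,1},{1,0},{2,0},{1,1},{0,2},{0,3},{1,2},
            {2,1},{3,0},{3,1},{2,2},{1,3},{2,3},{3,2},{3,3}};

        std::vector<int> coeffs;
        for (const auto& pos : scan)
            coeffs.push_back(block[pos[0]][pos[1]]);

        for (int c : coeffs)
            std::printf("%d ", c);  // 12 6 5 2 4 3 0 1 1 0 0 0 0 0 0 0
        std::printf("\n");
        return 0;
    }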

Summary of the Invention

In general, this disclosure describes techniques for processing video data. In particular, this disclosure describes techniques that may be used to reduce delay in video applications, such as conversational applications, to provide improvements in random access to coded video sequences, and to provide information for video content having a fixed picture rate and supporting temporal scalability.

In one example, a method of decoding video data includes: decapsulating a slice of a random access point (RAP) picture of a bitstream from a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of a type that can have associated leading pictures and whether the RAP picture is an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; determining, based on the NAL unit type value, whether the RAP picture can have associated leading pictures; and decoding video data of the bitstream following the RAP picture based on the determination of whether the RAP picture can have associated leading pictures.

In another example, a device for decoding video data includes a processor configured to: decapsulate a slice of a random access point (RAP) picture of a bitstream from a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of a type that can have associated leading pictures and whether the RAP picture is an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; determine, based on the NAL unit type value, whether the RAP picture can have associated leading pictures; and decode video data of the bitstream following the RAP picture based on the determination of whether the RAP picture can have associated leading pictures.

In another example, a device for decoding video data includes: means for decapsulating a slice of a random access point (RAP) picture of a bitstream from a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of a type that can have associated leading pictures and whether the RAP picture is an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; means for determining, based on the NAL unit type value, whether the RAP picture can have associated leading pictures; and means for decoding video data of the bitstream following the RAP picture based on the determination of whether the RAP picture can have associated leading pictures.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor to: decapsulate a slice of a random access point (RAP) picture of a bitstream from a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of a type that can have associated leading pictures and whether the RAP picture is an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; determine, based on the NAL unit type value, whether the RAP picture can have associated leading pictures; and decode video data of the bitstream following the RAP picture based on the determination of whether the RAP picture can have associated leading pictures.
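
The determination described in the preceding examples can be sketched roughly as follows in C++. The enumerator values below are hypothetical placeholders, not the actual NAL unit type codes assigned in the HEVC draft tables; the sketch only shows how both properties of interest can be read directly from a NAL unit type value.

    #include <cstdint>
    #include <cstdio>

    // Hypothetical NAL unit type values for RAP pictures; the real numeric
    // assignments come from the applicable draft tables and are not reproduced here.
    enum class RapNalType : uint8_t {
        kIdrNoLeading      = 0,  // IDR picture with no associated leading pictures
        kIdrMayHaveLeading = 1,  // IDR picture that may have associated leading pictures
        kCraNoLeading      = 2,  // CRA picture with no associated leading pictures
        kCraMayHaveLeading = 3,  // CRA picture that may have associated leading pictures
    };

    struct RapProperties {
        bool is_idr;            // true for IDR, false for CRA
        bool may_have_leading;  // picture may have associated leading pictures
    };

    // Derive both properties directly from the NAL unit type value, without
    // inspecting any pictures that follow the RAP picture in the bitstream.
    RapProperties ClassifyRap(RapNalType type) {
        switch (type) {
            case RapNalType::kIdrNoLeading:      return {true, false};
            case RapNalType::kIdrMayHaveLeading: return {true, true};
            case RapNalType::kCraNoLeading:      return {false, false};
            case RapNalType::kCraMayHaveLeading: return {false, true};
        }
        return {false, false};
    }

    int main() {
        const RapProperties p = ClassifyRap(RapNalType::kCraMayHaveLeading);
        std::printf("IDR: %d, may have leading pictures: %d\n",
                    p.is_idr, p.may_have_leading);
        return 0;
    }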

In another example, a method of generating a bitstream including video data includes: determining whether a random access point (RAP) picture is of a type that can have associated leading pictures and whether the RAP picture comprises an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; encapsulating a slice of the RAP picture in a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of the type that can have associated leading pictures; and generating a bitstream including the NAL unit.

In another example, a device for generating a bitstream including video data includes a processor configured to: determine whether a random access point (RAP) picture is of a type that can have associated leading pictures and whether the RAP picture comprises an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; encapsulate a slice of the RAP picture in a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of the type that can have associated leading pictures; and generate a bitstream including the NAL unit.

In another example, a device for generating a bitstream including video data includes: means for determining whether a random access point (RAP) picture is of a type that can have associated leading pictures and whether the RAP picture comprises an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; means for encapsulating a slice of the RAP picture in a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of the type that can have associated leading pictures; and means for generating a bitstream including the NAL unit.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor to: determine whether a random access point (RAP) picture is of a type that can have associated leading pictures and whether the RAP picture comprises an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; encapsulate a slice of the RAP picture in a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value that indicates whether the RAP picture is of the type that can have associated leading pictures; and generate a bitstream including the NAL unit.
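
Conversely, a bitstream generator could select the NAL unit type value from the same two properties and wrap the RAP slice in a NAL unit. The following sketch reuses the hypothetical type values from the previous sketch and a simplified one-byte NAL unit header; an actual HEVC NAL unit header carries additional fields.

    #include <cstdint>
    #include <vector>

    // Select a (placeholder) NAL unit type value for a RAP picture from its
    // properties, then prepend a simplified one-byte header to the slice payload.
    uint8_t SelectRapNalType(bool is_idr, bool may_have_leading) {
        if (is_idr) return may_have_leading ? 1 : 0;  // hypothetical values
        return may_have_leading ? 3 : 2;              // hypothetical values
    }

    std::vector<uint8_t> EncapsulateRapSlice(uint8_t nal_unit_type,
                                             const std::vector<uint8_t>& slice_payload) {
        std::vector<uint8_t> nal_unit;
        nal_unit.push_back(nal_unit_type);  // simplified NAL unit header
        nal_unit.insert(nal_unit.end(), slice_payload.begin(), slice_payload.end());
        return nal_unit;
    }

    int main() {
        const std::vector<uint8_t> slice_payload = {0xAB, 0xCD};  // hypothetical payload
        const std::vector<uint8_t> nal_unit =
            EncapsulateRapSlice(SelectRapNalType(/*is_idr=*/false,
                                                 /*may_have_leading=*/true),
                                slice_payload);
        return nal_unit.size() == slice_payload.size() + 1 ? 0 : 1;
    }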

In another example, a method of decoding video data includes: determining, for a supplemental enhancement information (SEI) network abstraction layer (NAL) unit of a bitstream, whether a NAL unit type value of the SEI NAL unit indicates that the NAL unit comprises a prefix SEI NAL unit including a prefix SEI message or a suffix SEI NAL unit including a suffix SEI message; and decoding video data of the bitstream following the SEI NAL unit based on whether the SEI NAL unit is the prefix SEI NAL unit or the suffix SEI NAL unit and based on data of the SEI NAL unit.

In another example, a device for decoding video data includes a processor configured to: determine, for a supplemental enhancement information (SEI) network abstraction layer (NAL) unit of a bitstream, whether a NAL unit type value of the SEI NAL unit indicates that the NAL unit comprises a prefix SEI NAL unit including a prefix SEI message or a suffix SEI NAL unit including a suffix SEI message; and decode video data of the bitstream following the SEI NAL unit based on whether the SEI NAL unit is the prefix SEI NAL unit or the suffix SEI NAL unit and based on data of the SEI NAL unit.

In another example, a device for decoding video data includes: means for determining, for a supplemental enhancement information (SEI) network abstraction layer (NAL) unit of a bitstream, whether a NAL unit type value of the SEI NAL unit indicates that the NAL unit comprises a prefix SEI NAL unit including a prefix SEI message or a suffix SEI NAL unit including a suffix SEI message; and means for decoding video data of the bitstream following the SEI NAL unit based on whether the SEI NAL unit is the prefix SEI NAL unit or the suffix SEI NAL unit and based on data of the SEI NAL unit.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor to: determine, for a supplemental enhancement information (SEI) network abstraction layer (NAL) unit of a bitstream, whether a NAL unit type value of the SEI NAL unit indicates that the NAL unit comprises a prefix SEI NAL unit including a prefix SEI message or a suffix SEI NAL unit including a suffix SEI message; and decode video data of the bitstream following the SEI NAL unit based on whether the SEI NAL unit is the prefix SEI NAL unit or the suffix SEI NAL unit and based on data of the SEI NAL unit.
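
A decoder-side sketch of the prefix/suffix distinction might look as follows. The two numeric type values are placeholders chosen for illustration, not values taken from the draft text.

    #include <cstdint>
    #include <cstdio>

    // Placeholder NAL unit type values for prefix and suffix SEI NAL units.
    constexpr uint8_t kPrefixSeiNalType = 39;  // hypothetical value
    constexpr uint8_t kSuffixSeiNalType = 40;  // hypothetical value

    // A prefix SEI NAL unit precedes the coded slice NAL units of its access
    // unit, while a suffix SEI NAL unit follows them; a decoder can make this
    // distinction from the NAL unit type value alone and treat the contained
    // SEI messages accordingly.
    bool IsPrefixSei(uint8_t nal_unit_type) { return nal_unit_type == kPrefixSeiNalType; }
    bool IsSuffixSei(uint8_t nal_unit_type) { return nal_unit_type == kSuffixSeiNalType; }

    int main() {
        const uint8_t nal_unit_type = kSuffixSeiNalType;
        std::printf("prefix SEI: %d, suffix SEI: %d\n",
                    IsPrefixSei(nal_unit_type), IsSuffixSei(nal_unit_type));
        return 0;
    }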

In another example, a method of generating a bitstream including video data includes: determining whether a supplemental enhancement information (SEI) message is a prefix SEI message or a suffix SEI message, wherein the SEI message includes data related to encoded video data; encapsulating the SEI message in an SEI NAL unit, wherein the SEI NAL unit includes a NAL unit type value that indicates whether the SEI NAL unit is a prefix SEI NAL unit or a suffix SEI NAL unit and whether the SEI message is a prefix SEI message or a suffix SEI message; and generating a bitstream including at least the SEI NAL unit.

In another example, a device for generating a bitstream including video includes a processor configured to: determine whether a supplemental enhancement information (SEI) message is a prefix SEI message or a suffix SEI message, wherein the SEI message includes data related to encoded video data; encapsulate the SEI message in an SEI NAL unit, wherein the SEI NAL unit includes a NAL unit type value that indicates whether the SEI NAL unit is a prefix SEI NAL unit or a suffix SEI NAL unit and whether the SEI message is a prefix SEI message or a suffix SEI message; and generate a bitstream including at least the SEI NAL unit.

In another example, a device for generating a bitstream including video data includes: means for determining whether a supplemental enhancement information (SEI) message is a prefix SEI message or a suffix SEI message, wherein the SEI message includes data related to encoded video data; means for encapsulating the SEI message in an SEI NAL unit, wherein the SEI NAL unit includes a NAL unit type value that indicates whether the SEI NAL unit is a prefix SEI NAL unit or a suffix SEI NAL unit and whether the SEI message is a prefix SEI message or a suffix SEI message; and means for generating a bitstream including at least the SEI NAL unit.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor to: determine whether a supplemental enhancement information (SEI) message is a prefix SEI message or a suffix SEI message, wherein the SEI message includes data related to encoded video data; encapsulate the SEI message in an SEI NAL unit, wherein the SEI NAL unit includes a NAL unit type value that indicates whether the SEI NAL unit is a prefix SEI NAL unit or a suffix SEI NAL unit and whether the SEI message is a prefix SEI message or a suffix SEI message; and generate a bitstream including at least the SEI NAL unit.
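
On the generating side, the corresponding step can be sketched as wrapping the SEI payload in an SEI NAL unit whose type value records the prefix/suffix decision. The type values and the one-byte header are again simplifications, not the actual syntax of the draft.

    #include <cstdint>
    #include <vector>

    // Wrap an SEI payload in an SEI NAL unit whose (placeholder) NAL unit type
    // value indicates whether it is a prefix or a suffix SEI NAL unit.
    std::vector<uint8_t> EncapsulateSei(const std::vector<uint8_t>& sei_payload,
                                        bool is_prefix_sei) {
        const uint8_t kPrefixSeiNalType = 39;  // hypothetical value
        const uint8_t kSuffixSeiNalType = 40;  // hypothetical value
        std::vector<uint8_t> nal_unit;
        nal_unit.push_back(is_prefix_sei ? kPrefixSeiNalType : kSuffixSeiNalType);
        nal_unit.insert(nal_unit.end(), sei_payload.begin(), sei_payload.end());
        return nal_unit;
    }

    int main() {
        const std::vector<uint8_t> sei_payload = {0x01, 0x02};  // hypothetical payload
        const std::vector<uint8_t> prefix_sei = EncapsulateSei(sei_payload, true);
        const std::vector<uint8_t> suffix_sei = EncapsulateSei(sei_payload, false);
        return (prefix_sei[0] != suffix_sei[0]) ? 0 : 1;
    }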

In another example, a method of presenting video data includes: determining an integer value for the video data; determining a difference between a presentation time of a first picture and a presentation time of a second picture, wherein the difference is equal to the integer value multiplied by a clock tick value; and presenting the first picture and the second picture according to the determined difference.

In another example, a device for presenting video data includes a processor configured to: determine an integer value for the video data; determine a difference between a presentation time of a first picture and a presentation time of a second picture, wherein the difference is equal to the integer value multiplied by a clock tick value; and present the first picture and the second picture according to the determined difference.

In another example, a device for presenting video data includes: means for determining an integer value for the video data; means for determining a difference between a presentation time of a first picture and a presentation time of a second picture, wherein the difference is equal to the integer value multiplied by a clock tick value; and means for presenting the first picture and the second picture according to the determined difference.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor to: determine an integer value for the video data; determine a difference between a presentation time of a first picture and a presentation time of a second picture, wherein the difference is equal to the integer value multiplied by a clock tick value; and present the first picture and the second picture according to the determined difference.
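
A minimal sketch of the presentation-time arithmetic described above, assuming the clock tick value is derived as num_units_in_tick divided by time_scale (as in typical video timing parameters) and using hypothetical numbers:

    #include <cstdio>

    // Compute the presentation time of a second picture from the presentation
    // time of a first picture, a determined integer value n, and the clock
    // tick value. All numeric values are hypothetical.
    int main() {
        const double num_units_in_tick = 1001.0;
        const double time_scale = 60000.0;                         // ~59.94 pictures per second
        const double clock_tick = num_units_in_tick / time_scale;  // seconds per clock tick

        const double first_presentation_time = 0.0;  // seconds
        const int n = 2;                             // integer value determined for the video data

        // The difference between the two presentation times equals n clock ticks.
        const double second_presentation_time = first_presentation_time + n * clock_tick;
        std::printf("delta = %.6f s, second picture presented at %.6f s\n",
                    n * clock_tick, second_presentation_time);
        return 0;
    }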

In another example, a method of generating a bitstream including video data includes: generating data that indicates whether a difference between a presentation time of a first picture and a presentation time of a second picture is an integer multiple of a clock tick value; and, when the data indicates that the difference is the integer multiple of the clock tick value, generating data representing the integer multiple.

In another example, a device for generating a bitstream including video data includes a processor configured to: generate data that indicates whether a difference between a presentation time of a first picture and a presentation time of a second picture is an integer multiple of a clock tick value; and, when the data indicates that the difference is the integer multiple of the clock tick value, generate data representing the integer multiple.

In another example, a device for generating a bitstream including video data includes: means for generating data that indicates whether a difference between a presentation time of a first picture and a presentation time of a second picture is an integer multiple of a clock tick value; and means for generating, when the data indicates that the difference is the integer multiple of the clock tick value, data representing the integer multiple.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor to: generate data that indicates whether a difference between a presentation time of a first picture and a presentation time of a second picture is an integer multiple of a clock tick value; and, when the data indicates that the difference is the integer multiple of the clock tick value, generate data representing the integer multiple.
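
The generating side can be sketched as checking whether the presentation-time difference is an integer multiple of the clock tick and, if so, deriving the multiple to be signaled. The values are hypothetical, and a real encoder would operate on integer time units rather than floating point.

    #include <cmath>
    #include <cstdio>

    int main() {
        const double clock_tick = 1001.0 / 60000.0;  // hypothetical clock tick in seconds
        const double t1 = 0.0;                       // presentation time of the first picture
        const double t2 = 2 * clock_tick;            // presentation time of the second picture

        // Check whether (t2 - t1) is an integer multiple of the clock tick.
        const double ratio = (t2 - t1) / clock_tick;
        const bool is_integer_multiple = std::fabs(ratio - std::round(ratio)) < 1e-9;

        if (is_integer_multiple) {
            // Signal a flag indicating an integer multiple, plus the multiple itself.
            std::printf("flag = 1, integer multiple = %ld\n", std::lround(ratio));
        } else {
            std::printf("flag = 0\n");
        }
        return 0;
    }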

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Brief Description of the Drawings

FIG. 1 is a conceptual diagram illustrating a video sequence coded according to predictive video coding techniques.

FIG. 2 is a conceptual diagram illustrating an example of a coded video sequence.

FIG. 3 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.

FIG. 4 is a block diagram illustrating an example encapsulation unit that may implement the techniques described in this disclosure.

FIG. 5 is a flowchart illustrating an example of generating VCL NAL units according to the techniques of this disclosure.

FIG. 6 is a flowchart illustrating an example of generating non-VCL NAL units according to the techniques of this disclosure.

FIG. 7 is a flowchart illustrating an example of signaling a presentation time delta value.

FIG. 8 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 9 is a flowchart illustrating an example of determining a presentation time delta value.

FIG. 10 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

Detailed Description

This disclosure describes various improved video coding designs. In particular, this disclosure describes techniques that may be used to reduce delay in video applications, such as conversational applications, and to provide improvements in random access to coded video sequences.

Digital video devices implement video compression techniques to encode and decode digital video information more efficiently. Video compression techniques may be defined according to a video coding standard, such as AVC or HEVC. The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). The H.264 standard is described by the ITU-T Study Group in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, dated March 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.

The latest working draft of HEVC, referred to as "HEVC Working Draft 7" or "WD7," is described in document JCTVC-I1003_d5 (Bross et al., "WD7: Working Draft 7 of High-Efficiency Video Coding (HEVC)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 9th Meeting: Geneva, Switzerland, April 27 - May 7, 2012). In addition, another recent working draft of HEVC, Working Draft 9, is described in document JCTVC-K1003_d7 (Bross et al., "High Efficiency Video Coding (HEVC) Text Specification Draft 9," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 11th Meeting: Shanghai, China, October 2012). The upcoming HEVC standard may also be referred to as ISO/IEC 23008-HEVC, which is intended to be the standard number for the delivered version of HEVC. In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to the H.264 standard and/or the upcoming HEVC standard. Although the techniques of this disclosure are described with respect to the H.264 standard and the upcoming HEVC standard, the techniques of this disclosure are generally applicable to any video coding standard.

A video sequence typically includes a series of video frames, also referred to as pictures. Examples of video applications that encode and/or decode video sequences include local playback, streaming, broadcast, multicast, and conversational applications. Conversational applications include video telephony and video conferencing, and are also referred to as low-delay applications. Conversational applications require a relatively low end-to-end delay of the entire system, i.e., the delay between the time a video frame is captured at a first digital video device and the time the video frame is displayed at a second digital video device. For conversational applications, a typically acceptable end-to-end delay should be less than 400 ms, and an end-to-end delay of around 150 ms is considered excellent.

Each step associated with processing a video sequence may contribute to the overall end-to-end delay. Examples of delays associated with processing a video sequence include capture delay, preprocessing delay, encoding delay, transmission delay, reception buffering delay (for de-jittering), decoding delay, decoded picture output delay, post-processing delay, and display delay. The delay associated with coding a video sequence according to a particular video coding standard may be referred to as the codec delay, and it may include the encoding delay, the decoding delay, and the decoded picture output delay. The codec delay should be minimized in conversational applications. In particular, the coding structure of a video sequence should ensure that the output order of the pictures in the video sequence is identical to the decoding order of the pictures in the video sequence, so that the decoded picture output delay is equal to zero. The coding structure of a video sequence refers in part to the allocation of picture types used to encode the video sequence.

A group of pictures (GOP) generally comprises a sequence of one or more pictures arranged according to display order. According to HEVC, a video encoder may divide a video frame or picture into a series of equally sized video blocks. A video block may have a luma component (denoted Y) and two chroma components (denoted U and V, or Cb and Cr). These video blocks may also be referred to as largest coding units (LCUs), treeblocks, or coding tree units (CTUs). The LCUs of HEVC may be broadly analogous to the macroblocks of previous standards, such as H.264/AVC. However, an LCU is not necessarily limited to a particular size. According to HEVC, syntax data within the bitstream may define the LCU according to the number of horizontal and/or vertical luma samples. For example, an LCU may be defined as including 64x64 or 32x32 luma samples. Further, an LCU may be partitioned into multiple coding units (CUs) according to a quadtree partitioning scheme. In general, quadtree partitioning refers to recursively splitting a CU into four sub-CUs. Syntax data associated with a coded bitstream may define the maximum number of times an LCU may be split, referred to as the maximum CU depth, and may also define the minimum size of a CU. Accordingly, a bitstream may also define a smallest coding unit (SCU). For example, an SCU may be defined as including 8x8 luma samples.
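
The quadtree partitioning described above can be sketched with a small recursion. In this illustration every CU is split until a hypothetical 8x8 minimum size is reached; a real encoder would instead decide per CU whether to split, for example based on rate-distortion cost.

    #include <cstdio>

    // Recursively split a 64x64 LCU into CUs along a quadtree, stopping at the
    // given minimum CU size (here the 8x8 SCU), and print each resulting leaf CU.
    void SplitCu(int x, int y, int size, int depth, int min_cu_size) {
        if (size <= min_cu_size) {
            std::printf("CU at (%d,%d), size %dx%d, depth %d\n", x, y, size, size, depth);
            return;
        }
        const int half = size / 2;
        SplitCu(x,        y,        half, depth + 1, min_cu_size);
        SplitCu(x + half, y,        half, depth + 1, min_cu_size);
        SplitCu(x,        y + half, half, depth + 1, min_cu_size);
        SplitCu(x + half, y + half, half, depth + 1, min_cu_size);
    }

    int main() {
        SplitCu(0, 0, 64, 0, 8);  // a 64x64 LCU split down to 8x8 yields 64 leaf CUs at depth 3
        return 0;
    }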

Further, according to HEVC, a video encoder may partition a picture into a plurality of slices, where each of the slices includes an integer number of LCUs. A slice may be an I slice, a P slice, or a B slice, where I, P, and B define how other video blocks are used to predict a CU. An I slice is predicted using an intra-prediction mode (e.g., from video blocks within the same frame). Intra coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. A P slice is predicted using a unidirectional inter-prediction mode (e.g., from a video block in a previous frame). A B slice is predicted using a bidirectional inter-prediction mode (e.g., from video blocks within a previous frame and a subsequent frame). Inter coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence.

FIG. 1 is a conceptual diagram illustrating a video sequence coded according to predictive video coding techniques. As illustrated in FIG. 1, video sequence 100 includes pictures Pic1 through Pic10. In the conceptual diagram of FIG. 1, pictures Pic1 through Pic10 are arranged and sequentially numbered according to the order in which they are to be displayed. As described in more detail below, the display order does not necessarily correspond to the decoding order. As illustrated in FIG. 1, video sequence 100 includes GOP1 and GOP2, where pictures Pic1 through Pic5 are included in GOP1 and pictures Pic6 through Pic10 are included in GOP2. FIG. 1 illustrates Pic5 being partitioned into slice1 and slice2, where each of slice1 and slice2 includes consecutive LCUs according to a left-to-right, top-to-bottom raster scan. Although not shown, the other pictures illustrated in FIG. 1 may be partitioned into one or more slices in a similar manner. FIG. 1 also illustrates the concept of I slices, P slices, and B slices with respect to GOP2. The arrows associated with each of Pic6 through Pic10 in GOP2 indicate whether a picture includes an I slice, a P slice, or a B slice based on the reference picture indicated by the arrow. In FIG. 1, pictures Pic6 and Pic9 represent pictures including I slices (i.e., references are to the picture itself), pictures Pic7 and Pic10 represent pictures including P slices (i.e., each references a previous picture), and Pic8 represents a picture including B slices (i.e., references to both a previous picture and a subsequent picture).

In HEVC, each of a video sequence, a GOP, a picture, a slice, and a CU may be associated with syntax data that describes video coding properties. For example, a slice includes a header that includes a syntax element indicating whether the slice is an I slice, a P slice, or a B slice. In addition, HEVC includes the concept of parameter sets. A parameter set is a syntax structure that includes syntax elements that allow a video decoder to reconstruct a video sequence. HEVC employs a hierarchical parameter set mechanism in which syntax elements are included in a type of parameter set based on the frequency with which the syntax elements are expected to change. The parameter set mechanism in HEVC decouples the transmission of infrequently changing information from the transmission of coded block data. In addition, in some applications parameter sets may be conveyed "out-of-band," i.e., not transported together with the units containing coded video data. Out-of-band transmission is typically reliable.

In HEVC WD7, a parameter set ID is used to identify a particular parameter set. In HEVC WD7, the parameter set ID is an unsigned integer Exp-Golomb-coded syntax element with the left bit first (a minimal decoding sketch for such ue(v) syntax elements follows the parameter set definitions below). HEVC WD7 defines the following parameter sets:

Video parameter set (VPS): A VPS is a syntax structure containing syntax elements that apply to zero or more entire coded video sequences. That is, a VPS includes syntax elements that are expected to remain unchanged for a sequence of frames (e.g., picture order, number of reference frames, and picture size). A VPS is identified using a VPS ID. A sequence parameter set includes the VPS ID.

Sequence parameter set (SPS): An SPS is a syntax structure containing syntax elements that apply to zero or more entire coded video sequences. That is, an SPS includes syntax elements that are expected to remain unchanged for a sequence of frames (e.g., picture order, number of reference frames, and picture size). An SPS is identified using an SPS ID. A picture parameter set includes the SPS ID.

Picture parameter set (PPS): A PPS is a syntax structure containing syntax elements that apply to one or more pictures. That is, a PPS includes syntax elements that may change from picture to picture within a sequence (e.g., entropy coding mode, quantization parameters, and bit depth). A PPS is identified using a PPS ID. A slice header includes the PPS ID.

Adaptation parameter set (APS): An APS is a syntax structure containing syntax elements that apply to one or more pictures. An APS includes syntax elements that are expected to change within the pictures of a sequence (e.g., block size and deblocking filtering). An APS is identified using an APS ID. A slice header may include an APS ID.
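
As noted above, parameter set IDs are unsigned integer Exp-Golomb-coded (ue(v)) syntax elements. The following sketch decodes one such codeword from a simplified bit buffer that stores one bit per byte; it is illustrative only and omits the byte-oriented bit handling of a real bitstream reader.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Decode an unsigned integer Exp-Golomb (ue(v)) codeword, e.g. a parameter
    // set ID, starting at bit position pos. A codeword consists of n leading
    // zero bits, a one bit, and n suffix bits; the value is 2^n - 1 + suffix.
    uint32_t DecodeUe(const std::vector<uint8_t>& bits, size_t& pos) {
        int leading_zeros = 0;
        while (bits[pos++] == 0)
            ++leading_zeros;
        uint32_t value = (1u << leading_zeros) - 1;
        for (int i = 0; i < leading_zeros; ++i)
            value += static_cast<uint32_t>(bits[pos++]) << (leading_zeros - 1 - i);
        return value;
    }

    int main() {
        // The bit pattern 0 0 1 0 1 decodes to 4 (two leading zeros, suffix 01).
        const std::vector<uint8_t> bits = {0, 0, 1, 0, 1};
        size_t pos = 0;
        std::printf("ue(v) value: %u\n", DecodeUe(bits, pos));
        return 0;
    }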

According to the parameter set types defined in HEVC WD7, each SPS references a VPS ID, each PPS references an SPS ID, and each slice header references a PPS ID and possibly an APS ID. It should be noted that, in some cases, the linear reference relationship of including the VPS ID in the SPS and the SPS ID in the PPS may be inefficient. For example, although the VPS is supported in HEVC WD7, most of the sequence-level information parameters are still present only in the SPS. In addition to the parameter set concepts, HEVC includes the concepts of a coded video sequence and an access unit. According to HEVC WD7, a coded video sequence and an access unit are defined as follows:

Coded video sequence: A sequence of access units that consists, in decoding order, of a CRA access unit that is the first access unit in the bitstream, an IDR access unit, or a BLA access unit, followed by zero or more non-IDR and non-BLA access units, including all subsequent access units up to but not including any subsequent IDR or BLA access unit [CRA access units, IDR access units, and BLA access units are described in detail below].

Access unit: A set of NAL units that are consecutive in decoding order and contain one coded picture. In addition to the coded slice NAL units of the coded picture, the access unit may also contain other NAL units that do not contain slices of the coded picture. The decoding of an access unit always results in a decoded picture.

A NAL unit refers to a network abstraction layer unit. Thus, according to HEVC, a bitstream of coded video data includes a sequence of NAL units. An access unit is a set of NAL units that are arranged consecutively in decoding order and contain exactly one coded picture, and a coded video sequence includes a sequence of access units arranged in decoding order. FIG. 2 is a conceptual diagram illustrating an example of a coded video sequence. FIG. 2 represents an example of a coded video sequence 200 that may correspond to GOP2 illustrated in FIG. 1. As illustrated in FIG. 2, coded video sequence 200 includes an access unit corresponding to each of Pic6 through Pic10. The access units of coded video sequence 200 are arranged sequentially according to decoding order. It should be noted that the access unit corresponding to Pic9 is located before the access unit corresponding to Pic8. Thus, the decoding order does not correspond to the display order illustrated in FIG. 1. In this example, this is due to the fact that Pic8 references Pic9. Thus, Pic9 must be decoded before Pic8 can be decoded. FIG. 2 illustrates that the access unit corresponding to Pic9 includes the following NAL units: AU delimiter NAL unit 202, PPS NAL unit 204, slice1 NAL unit 206, and slice2 NAL unit 208. Each NAL unit may include a header that identifies the NAL unit type.

HEVC defines two classes of NAL unit types: coded slice NAL units (VCL NAL units) and non-VCL NAL units. A coded slice NAL unit contains a slice of video data. In the example illustrated in FIG. 2, slice1 NAL unit 206 and slice2 NAL unit 208 each contain a slice of video data and are examples of VCL NAL units. In the example of FIG. 2, each of slice1 NAL unit 206 and slice2 NAL unit 208 may be an I slice. A non-VCL NAL unit contains information other than a slice of video data. For example, a non-VCL NAL unit may contain delimiter data or a parameter set. In the example illustrated in FIG. 2, AU delimiter NAL unit 202 includes information to delimit the access unit corresponding to Pic9 from the access unit corresponding to Pic7. In addition, PPS NAL unit 204 includes a picture parameter set. Thus, AU delimiter NAL unit 202 and PPS NAL unit 204 are examples of non-VCL NAL units.

Another example of a non-VCL NAL unit in HEVC is the supplemental enhancement information (SEI) NAL unit. The SEI mechanism, supported in both AVC and HEVC, enables an encoder to include metadata in the bitstream that is not required for correct decoding of the sample values of output pictures, but that can be used for various other purposes, such as picture output timing, display, and loss detection and concealment. For example, an SEI NAL unit may include a picture timing message that is used by a video decoder when decoding the bitstream. The picture timing message may include information indicating when the video decoder should begin decoding a VCL NAL unit. An encoder may include any number of SEI NAL units in an access unit, and each SEI NAL unit may contain one or more SEI messages. The draft HEVC standard includes syntax and semantics for several SEI messages, but the handling of the SEI messages is not specified because they do not affect the normative decoding process. One reason for having SEI messages in the draft HEVC standard is to enable supplemental data to be interpreted identically in different systems using HEVC. Specifications and systems using HEVC may require an encoder to generate certain SEI messages or may define specific handling of particular types of received SEI messages. Table 1 lists the SEI messages specified in HEVC and briefly describes their purposes.

Table 1: Overview of SEI messages

Random access refers to decoding of a video bitstream starting from a coded picture that is not the first coded picture in the bitstream. Random access to a bitstream is needed in many video applications, such as broadcasting and streaming, e.g., for users to switch between different channels, to jump to a specific part of the video, or to switch to a different bitstream for stream adaptation (e.g., of the bit rate, frame rate, or spatial resolution). For video sequences, random access is enabled by a coding structure that includes random access point (RAP) pictures or access units many times at regular intervals. Instantaneous decoder refresh (IDR) pictures, clean random access (CRA) pictures, and broken link access (BLA) pictures are the RAP picture types defined in HEVC WD7. Each of the IDR, CRA, and BLA pictures contains only I slices. However, the IDR, CRA, and BLA pictures differ based on the defined referencing constraints.

IDR pictures are specified in AVC and are also defined in HEVC WD7. Although IDR pictures can be used for random access, IDR pictures are constrained in that pictures following the IDR picture in decoding order cannot use pictures decoded before the IDR picture as reference. In the example illustrated in FIGS. 1 and 2, as described above, Pic6 in video sequence 100 may be an IDR picture. Due to the constraints associated with IDR pictures, bitstreams relying on IDR pictures for random access may have significantly lower coding efficiency.

To improve coding efficiency, the concept of CRA pictures was introduced in HEVC. According to HEVC WD7, like an IDR picture, a CRA picture includes only I slices. However, pictures that follow a CRA picture in decoding order but precede the CRA picture in output order are allowed to use pictures decoded before the CRA picture as reference. Pictures that follow a CRA picture in decoding order but precede the CRA picture in output order are referred to as leading pictures associated with the CRA picture (or leading pictures of the CRA picture). The leading pictures of a CRA picture are correctly decodable if decoding starts from an IDR picture or CRA picture preceding the current CRA picture. However, the leading pictures of a CRA picture may not be correctly decodable when random access from the CRA picture occurs. Referring to the examples illustrated in FIGS. 1 and 2, Pic9 may be a CRA picture and Pic8 may be a leading picture of Pic9. Pic8 is correctly decodable if GOP2 is accessed at Pic6, but may not be correctly decodable if GOP2 is accessed at Pic9. This is due to the fact that Pic7 may not be available if GOP2 is accessed at Pic9. To prevent error propagation from reference pictures that may not be available depending on where decoding starts, according to HEVC WD7, all pictures that follow a CRA picture in both decoding order and output order are constrained not to use, as reference, any picture that precedes the CRA picture in decoding order or output order (which includes the leading pictures). In addition, leading pictures are typically discarded during random access decoding.

Bitstream splicing refers to the concatenation of two or more bitstreams or parts thereof. For example, a second bitstream may be appended to a first bitstream, possibly with some modifications to either or both of the bitstreams, to generate a spliced bitstream. The first coded picture in the second bitstream is also referred to as the splicing point. Therefore, pictures after the splicing point in the spliced bitstream originate from the second bitstream, while pictures before the splicing point in the spliced bitstream originate from the first bitstream. Splicing of bitstreams is typically performed by bitstream splicers. Bitstream splicers are often lightweight and less intelligent than video encoders. For example, a bitstream splicer may not be equipped with entropy decoding and encoding capabilities. Temporal scalability is an application that may use bitstream splicing. Temporal scalability may refer to decoding a video sequence at one or more frame rates. For example, a video sequence may be decodable at 30 frames per second (fps) or 60 fps, depending on system capabilities. To achieve temporal scalability, a video sequence may include multiple temporal layers, where each temporal layer is a coded video sequence associated with a frame rate. The temporal layer with the highest frame rate may be referred to as the highest temporal layer. Multiple temporal layers may be spliced together to generate a video sequence at the highest frame rate, e.g., a coded video sequence with 30 fps is spliced with a coded video sequence that enables 60 fps.

Bitstream switching may be used in adaptive streaming environments. A bitstream switching operation at a certain picture in the switched-to bitstream is effectively a bitstream splicing operation, where the splicing point is the bitstream switching point, i.e., the first picture from the switched-to bitstream. It should be noted that bitstream switching is usually performed on two streams with the same coding structure. That is, the two streams have the same prediction structure and the same allocation of IDR pictures, CRA pictures, P pictures, B pictures, and so on.

After the introduction of CRA pictures, the concept of broken link access (BLA) pictures was further introduced in HEVC WD7, and it is based on the concept of CRA pictures. A BLA picture typically originates from bitstream splicing at the position of a CRA picture, and in the spliced bitstream the splicing-point CRA picture is changed to a BLA picture. The most essential difference between BLA pictures and CRA pictures is as follows: for a CRA picture, the associated leading pictures are correctly decodable if decoding starts from a RAP picture that precedes the CRA picture in decoding order, and may not be correctly decodable when random access starts from the CRA picture; for a BLA picture, the associated leading pictures may not be correctly decodable in all cases, even when decoding starts from a RAP picture that precedes the BLA picture in decoding order. It should be noted that, for a particular CRA or BLA picture, some of the associated leading pictures are correctly decodable even when the CRA or BLA picture is the first picture in the bitstream. These leading pictures are referred to as decodable leading pictures (DLPs), and the other leading pictures are referred to as non-decodable leading pictures (NLPs). In HEVC WD9, NLPs are also referred to as tagged for discard (TFD) pictures. It should be noted that all leading pictures associated with an IDR picture are DLP pictures. Table 2, included in HEVC WD7, specifies the NAL units defined according to HEVC WD7. As illustrated in Table 2, the NAL unit types in HEVC WD7 include CRA picture, BLA picture, IDR picture, VPS, SPS, PPS, and APS NAL unit types, which correspond to the pictures and parameter sets described above.

Table 2: HEVC WD7 NAL unit type codes and NAL unit type classes

To simplify NAL unit allocation, "Refinement of Random Access Point Support" by S. Kanumuri and G. Sullivan (10th Meeting, Stockholm, SE, July 2012, document JCTVC-J0344, hereinafter "Kanumuri"), which is incorporated herein by reference in its entirety, proposed: (1) a constraint on IDR pictures such that there is no leading picture associated with any IDR picture (i.e., no picture that follows an IDR picture in decoding order and precedes the IDR picture in output order), and (2) a modified allocation of NAL unit types 4 through 7, as defined in Table 2 above, for RAP pictures, as follows:

表3:根据Kanumuri的所提议的NAL单元类型Table 3: Proposed NAL unit types according to Kanumuri

在表3中,SAP类型指代ISO/IEC 14496-12第4版本的“信息技术-视听对象的译码-第12部分:ISO基础媒体文件格式(Information technology-Coding of audio-visualobjects-Part 12:ISO base media file format)”(w12640,第100次MPEG会议,日内瓦,2012年4月)中所定义的流存取点类型,所述文件以其全文引用的方式并入本文中。如上文所描述,对于位流切换,IDR图片及BLA/CRA图片在功能上不同,但对于随机存取,其在功能上相同(例如,搜索应用程序)。对于IDR图片处的位流切换,视频译码系统可知道或假定呈现可为连续的,而不会出现故障(例如,未呈现的图片的遗失)。这是因为在解码次序上在IDR图片之后的图片不可使用在IDR图片之前解码的图片作为参考(即,与IDR图片相关联的引导图片为DLP)。然而,对于BLA图片处的位流切换,可能需要对来自两个流的一或多个图片进行一些重叠解码以确保呈现连续。在无额外能力的情况下,对于符合HEVC WD7解码器,此重叠解码当前可能为不可能的。在无额外能力的情况下,在相关联的TFD图片位置处可能不存在待呈现的任何图片,这是因为可能已丢弃所述图片。此可导致呈现未必为连续的。另外,即使BLA图片为不具有相关联的TFD图片的BLA图片,问题也是相同的,这是因为可能已丢弃存在于原始位流中的TFD图片。另外,如果原始位流中不存在TFD图片,那么CRA图片(归因于位流拼接/切换等,稍后改变为BLA图片)可能已经编码为IDR图片。因此,不将具有引导图片的IDR图片标记为IDR图片(即,不允许IDR图片具有引导图片),如Kanumuri所提议,使得IDR图片对于系统的位流切换不太友好。In Table 3, the SAP type refers to the stream access point type defined in ISO/IEC 14496-12, Version 4, "Information technology - Coding of audio-visual objects - Part 12: ISO base media file format" (w12640, 100th MPEG meeting, Geneva, April 2012), which is incorporated herein by reference in its entirety. As described above, IDR pictures and BLA/CRA pictures are functionally different for bitstream switching, but they are functionally the same for random access (e.g., seek applications). For bitstream switching at IDR pictures, the video coding system can know or assume that presentation can be continuous without failure (e.g., loss of non-presented pictures). This is because pictures following the IDR picture in decoding order cannot use pictures decoded before the IDR picture as references (i.e., the leading pictures associated with the IDR picture are DLPs). However, for bitstream switching at BLA pictures, some overlap decoding of one or more pictures from both streams may be required to ensure continuous presentation. Without additional capabilities, this overlap decoding may not currently be possible for HEVC WD7-compliant decoders. Without additional capabilities, there may not be any pictures to be presented at the associated TFD picture location because they may have been discarded. This can result in presentation that is not necessarily continuous. Furthermore, the problem is the same even if the BLA picture is a BLA picture without an associated TFD picture because the TFD pictures present in the original bitstream may have been discarded. Furthermore, if TFD pictures are not present in the original bitstream, then CRA pictures (later changed to BLA pictures due to bitstream splicing/switching, etc.) may have been coded as IDR pictures. Therefore, not marking IDR pictures with leading pictures as IDR pictures (i.e., not allowing IDR pictures to have leading pictures), as proposed by Kanumuri, makes IDR pictures less friendly to system bitstream switching.

从流式传输系统视角来说,例如,经由HTTP的动态流式传输(DASH),能够容易地识别哪个图片为RAP图片及在解码从RAP图片开始的情况下,能够识别为最早呈现时间(例如,最早图片次序计数(POC)值)的时间是有益的。因此,可进一步改进将NAL单元类型分配到不同的RAP图片以及DLP图片及TFD图片的现有设计以对于流式传输系统更友好。根据现有设计,对于每一RAP图片,当解码从所述RAP图片开始时,系统必须查看是否存在相关联的DLP图片,以知道所述RAP图片自身的呈现时间是否为最早呈现时间。另外,系统必须查看并比较所有DLP图片的呈现时间,以计算出最早呈现时间值。From the perspective of streaming systems, such as Dynamic Streaming over HTTP (DASH), it is beneficial to be able to easily identify which picture is a RAP picture and, when decoding starts from a RAP picture, to identify the earliest presentation time (e.g., the earliest picture order count (POC) value). Therefore, the existing design of assigning NAL unit types to different RAP pictures, DLP pictures, and TFD pictures can be further improved to be more friendly to streaming systems. According to the existing design, for each RAP picture, when decoding starts from the RAP picture, the system must check whether there is an associated DLP picture to know whether the RAP picture's own presentation time is the earliest presentation time. In addition, the system must check and compare the presentation times of all DLP pictures to calculate the earliest presentation time value.
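To make the burden on such a system concrete, the following C++ sketch shows the check described above: when decoding starts from a RAP picture, the earliest presentation time is found by comparing the RAP picture's own presentation time against the presentation times of all associated DLP pictures. The RapInfo structure and its fields are hypothetical and are used only for illustration.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of a RAP picture and its associated decodable
// leading pictures (DLPs); this is not an HEVC data structure.
struct RapInfo {
    int64_t rapPresentationTime;               // presentation time of the RAP picture itself
    std::vector<int64_t> dlpPresentationTimes; // presentation times of associated DLP pictures
};

// Under the existing design, the system must inspect every associated DLP
// picture to learn the earliest presentation time when decoding starts here.
int64_t earliestPresentationTime(const RapInfo& rap) {
    int64_t earliest = rap.rapPresentationTime;
    for (int64_t t : rap.dlpPresentationTimes) {
        earliest = std::min(earliest, t);
    }
    return earliest;
}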

视频译码标准包含视频缓冲模型规范。在AVC及HEVC中,缓冲模型被称作假设参考解码器(HRD),其包含经译码图片缓冲器(CPB)及经解码图片缓冲器(DPB)两者的缓冲模型。根据HEVC WD7,将HRD定义为假设解码器模型,其指定对编码过程可产生的符合的NAL单元流或符合的位流的可变性的约束。因此,在AVC及HEVC中,将位流符合性及解码器符合性指定为HRD规范的部分。根据HEVC WD7,CPB为按解码次序含有存取单元的先进先出缓冲器,且DPB为保持经解码图片以用于参考的缓冲器。根据HRD,以数学方法指定CPB行为及DPB行为。HRD直接对定时、缓冲器大小及位速率强加约束,且间接地对位流特性及统计强加约束。HRD参数的完整集合包含五个基本参数:初始CPB移除延迟、CPB大小、位速率、初始DPB输出延迟及DPB大小。根据HEVC WD7,HRD参数可包含于视频可用性信息(VUI)参数中,且VUI参数可包含于SPS中。应注意,尽管HRD被称作解码器,但编码器侧通常需要HRD以保证位流符合性,且解码器侧通常不需要HRD。HEVC WD7指定用于HRD符合性的两种类型的位流,即类型I及类型II。HEVC WD7还指定两种类型的解码器符合性,即输出定时解码器符合性及输出次序解码器符合性。Video coding standards include video buffering model specifications. In AVC and HEVC, the buffering model is called the Hypothetical Reference Decoder (HRD), which includes buffering models for both the coded picture buffer (CPB) and the decoded picture buffer (DPB). According to HEVC WD7, the HRD is defined as a hypothetical decoder model that specifies constraints on the variability of conforming NAL unit streams or conforming bitstreams that can be produced by the encoding process. Therefore, in AVC and HEVC, bitstream conformance and decoder conformance are specified as part of the HRD specification. According to HEVC WD7, the CPB is a first-in, first-out buffer containing access units in decoding order, and the DPB is a buffer that holds decoded pictures for reference. According to the HRD, the CPB and DPB behaviors are mathematically specified. The HRD imposes constraints directly on timing, buffer size, and bitrate, and indirectly on bitstream characteristics and statistics. The complete set of HRD parameters includes five basic parameters: initial CPB removal delay, CPB size, bitrate, initial DPB output delay, and DPB size. According to HEVC WD7, HRD parameters can be included in the Video Usability Information (VUI) parameters, and the VUI parameters can be included in the SPS. It should be noted that although HRD is referred to as a decoder, HRD is generally required on the encoder side to ensure bitstream conformance, and is generally not required on the decoder side. HEVC WD7 specifies two types of bitstreams for HRD conformance, namely Type I and Type II. HEVC WD7 also specifies two types of decoder conformance, namely output timing decoder conformance and output order decoder conformance.
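As a reading aid, the five basic HRD parameters named above can be collected into a single structure as in the following C++ sketch; the field names and units are assumptions made for illustration and do not mirror HEVC syntax element names.

#include <cstdint>

// Illustrative grouping of the five basic HRD parameters described above.
struct HrdParameters {
    uint32_t initialCpbRemovalDelay;  // delay before the first removal from the CPB (assumed clock-tick units)
    uint64_t cpbSizeBits;             // coded picture buffer size, in bits
    uint64_t bitRateBps;              // bit rate at which data enters the CPB, in bits per second
    uint32_t initialDpbOutputDelay;   // delay before the first picture is output from the DPB
    uint32_t dpbSizePictures;         // decoded picture buffer size, in picture storage buffers
};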

在AVC及HEVC HRD模型中,解码或CPB移除是基于存取单元,且假定图片解码是瞬时的。在真实世界应用中解码图片所需的时间不可能等于零。因此,在实务应用中,如果符合的解码器严格地遵循(例如)在图片定时SEI消息中用信号发出的解码时间来开始解码存取单元,那么可输出特定经解码图片的最早可能时间等于所述特定图片的解码时间加上解码所述特定图片所需的时间。In the AVC and HEVC HRD models, decoding or CPB removal is based on access units and assumes that picture decoding is instantaneous. In real-world applications, the time required to decode a picture is unlikely to be zero. Therefore, in practical applications, if a conforming decoder strictly follows the decoding time signaled, for example, in a picture timing SEI message to start decoding an access unit, the earliest possible time that a particular decoded picture can be output is equal to the decoding time of that particular picture plus the time required to decode that particular picture.

类似于以下文件中所描述的CPB行为的基于子图片的CPB行为已包含于HEVC WD7中:Ye-Kui Wang等人的“基于子图片的CPB操作(Sub-picture based CPB operation)”(第9次会议:CH的日内瓦,2012年5月,JCTVC-I0588(下文称作“Wang”))。基于Wang子图片的CPB允许在存取单元(AU)层级或子图片层级进行CPB移除。允许AU层级或子图片层级CPB移除有助于以互通方式达成减少的编解码器延迟。当发生存取单元层级的CPB移除时,每次发生移除操作时,将存取单元从CPB移除。当发生子图片层级的CPB移除时,每次发生移除操作时,将含有一或多个切片的解码单元(DU)从CPB移除。Sub-picture based CPB behavior similar to that described in Ye-Kui Wang et al., "Sub-picture based CPB operation" (9th meeting: Geneva, CH, May 2012, JCTVC-I0588 (hereinafter referred to as "Wang")) has been included in HEVC WD7. Wang sub-picture based CPB allows CPB removal at the access unit (AU) level or the sub-picture level. Allowing AU-level or sub-picture level CPB removal helps achieve reduced codec latency in an interoperable manner. When access unit level CPB removal occurs, the access unit is removed from the CPB each time the removal operation occurs. When sub-picture level CPB removal occurs, the decoding unit (DU) containing one or more slices is removed from the CPB each time the removal operation occurs.

除AU层级CPB移除定时信息之外,还可用信号发出子图片层级CPB移除定时信息。当针对AU层级移除及子图片层级移除两者呈现CPB移除定时信息时,解码器可选择在AU层级或子图片层级操作CPB。应注意,为了达成当前图片定时SEI消息及同时实现AU层级及DU层级HRD CPB移除两者以达成子图片延迟的机制,必须在编码整个AU之前,将DU发送出去,且在编码整个AU之前,仍不可将AU层级SEI消息发送出去。In addition to AU-level CPB removal timing information, sub-picture-level CPB removal timing information can also be signaled. When CPB removal timing information is presented for both AU-level and sub-picture-level removal, the decoder can choose to operate the CPB at the AU level or the sub-picture level. It should be noted that, with the current picture timing SEI message mechanism and with both AU-level and DU-level HRD CPB removal enabled to achieve sub-picture delay, DUs must be sent out before the entire AU is encoded, yet AU-level SEI messages still cannot be sent out before the entire AU is encoded.
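The choice left to the decoder when both levels of timing information are present might be expressed as in the following sketch; the enumeration and flags are hypothetical and only illustrate the decision described above.

// Sketch of a decoder selecting the CPB operation level when both AU-level
// and sub-picture (DU) level removal timing information is signaled.
enum class CpbLevel { AccessUnit, DecodingUnit };

CpbLevel chooseCpbLevel(bool auTimingPresent, bool duTimingPresent, bool preferLowDelay) {
    if (auTimingPresent && duTimingPresent) {
        // Both levels are signaled, so the decoder is free to pick either one;
        // DU-level operation is the natural choice when low delay is desired.
        return preferLowDelay ? CpbLevel::DecodingUnit : CpbLevel::AccessUnit;
    }
    return duTimingPresent ? CpbLevel::DecodingUnit : CpbLevel::AccessUnit;
}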

根据HEVC WD7,定时信息可包含定义两个连续图片的HRD输出时间之间的时间距离的信息。HEVC WD7定义以下定时信息语法元素:According to HEVC WD7, timing information may include information defining the temporal distance between the HRD output times of two consecutive pictures. HEVC WD7 defines the following timing information syntax elements:

time_scale为一秒中经过的时间单位的数目。举例来说,使用27MHz时钟测量时间的时间坐标系具有27,000,000的time_scale。time_scale应大于0。time_scale is the number of time units that pass in one second. For example, a time coordinate system that uses a 27 MHz clock to measure time has a time_scale of 27,000,000. time_scale should be greater than 0.

num_units_in_tick为时钟以对应于时钟刻度计数器的一增量(称为时钟刻度)的频率time_scale Hz操作的时间单位的数目。num_units_in_tick应大于0。num_units_in_tick is the number of time units of a clock operating at the frequency time_scale Hz that corresponds to one increment of a clock tick counter (called a clock tick). num_units_in_tick should be greater than 0.

因此,基于time_scale及num_units_in_tick的值,可导出所谓的时钟刻度变量t_c如下:Therefore, based on the values of time_scale and num_units_in_tick, the so-called clock tick variable t_c can be derived as follows:

t_c = num_units_in_tick ÷ time_scale    (1)
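As a worked example of Equation (1), the following sketch uses an assumed 29.97 frames-per-second timing (time_scale = 30000, num_units_in_tick = 1001); these values are illustrative and are not taken from the text above.

#include <cstdio>

int main() {
    const double timeScale = 30000.0;      // time units per second (assumed)
    const double numUnitsInTick = 1001.0;  // time units per clock tick (assumed)
    const double tc = numUnitsInTick / timeScale;         // Equation (1)
    std::printf("clock tick t_c = %.6f seconds\n", tc);   // prints 0.033367
    return 0;
}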

根据HEVC WD7,时钟刻度变量可用于约束HRD输出时间。即,在一些状况下,可能需要在输出次序上连续的两个图片(即,第一及第二图片)的呈现时间之间的差等于时钟刻度。HEVC WD7包含fixed_pic_rate_flag语法元素,其指示在输出次序上连续的两个图片的呈现时间之间的差是否等于时钟刻度。fixed_pic_rate_flag语法元素可包含于VUI参数的集合中,VUI参数的集合可包含于SPS中。在HEVC WD7中,当fixed_pic_rate_flag语法元素等于一时,在以下条件中的任一者成立的情况下,在输出次序上的任何两个连续图片的HRD输出时间之间的时间距离受约束为等于所确定的时钟刻度:(1)第二图片处于与第一图片相同的经译码视频序列中;或(2)第二图片处于不同于第一图片的经译码视频序列中,且在含有第二图片的经译码视频序列中,fixed_pic_rate_flag等于1,且对于两个经译码视频序列,num_units_in_tick÷time_scale的值相同。当fixed_pic_rate_flag语法元素等于零时,此些约束不适用于在输出次序上的任何两个连续图片(即,第一图片及第二图片)的HRD输出时间之间的时间距离。应注意,当未呈现fixed_pic_rate_flag时,推断其等于0。应注意,根据HEVC WD7,当fixed_pic_rate_flag等于1时,在丢弃一些最高时间层的状况下,基于时间可缩放性的流调适将需要改变time_scale或num_units_in_tick的值。应注意,HEVC WD7提供fixed_pic_rate_flag的以下语义:According to HEVC WD7, a clock tick variable can be used to constrain HRD output time. That is, in some cases, it may be required that the difference between the presentation times of two consecutive pictures in output order (i.e., the first and second pictures) is equal to the clock tick. HEVC WD7 includes a fixed_pic_rate_flag syntax element, which indicates whether the difference between the presentation times of two consecutive pictures in output order is equal to the clock tick. The fixed_pic_rate_flag syntax element can be included in a set of VUI parameters, which can be included in an SPS. In HEVC WD7, when the fixed_pic_rate_flag syntax element is equal to one, the temporal distance between the HRD output times of any two consecutive pictures in output order is constrained to be equal to the determined clock tick if either of the following conditions holds: (1) the second picture is in the same coded video sequence as the first picture; or (2) the second picture is in a different coded video sequence from the first picture, and in the coded video sequence containing the second picture, fixed_pic_rate_flag is equal to one, and the value of num_units_in_tick÷time_scale is the same for both coded video sequences. When the fixed_pic_rate_flag syntax element is equal to zero, these constraints do not apply to the temporal distance between the HRD output times of any two consecutive pictures in output order (i.e., the first picture and the second picture). It should be noted that when fixed_pic_rate_flag is not present, it is inferred to be equal to 0. It should be noted that according to HEVC WD7, when fixed_pic_rate_flag is equal to 1, stream adaptation based on temporal scalability will need to change the value of time_scale or num_units_in_tick in the case of dropping some of the highest temporal layers. It should be noted that HEVC WD7 provides the following semantics for fixed_pic_rate_flag:

当对于含有图片n的经译码视频序列,fixed_pic_rate_flag等于1时,当对于经指定以用于等式C-13中的后续图片n_n,以下条件中的一或多者成立时,如等式C-13中所指定的针对Δt_o,dpb(n)计算的值应等于如等式C-1中所指定的t_c(对于含有图片n的经译码视频序列,使用t_c的值):When fixed_pic_rate_flag is equal to 1 for the coded video sequence containing picture n, the value of Δt_o,dpb(n) computed as specified in Equation C-13 shall be equal to t_c as specified in Equation C-1 (using the value of t_c for the coded video sequence containing picture n) when one or more of the following conditions hold for the subsequent picture n_n that is specified for use in Equation C-13:

-图片n_n处于与图片n相同的经译码视频序列中。Picture n_n is in the same coded video sequence as picture n.

-图片n_n处于不同的经译码视频序列中,且在含有图片n_n的经译码视频序列中fixed_pic_rate_flag等于1,且对于两个经译码视频序列,num_units_in_tick÷time_scale的值相同。Picture n_n is in a different coded video sequence, fixed_pic_rate_flag is equal to 1 in the coded video sequence containing picture n_n, and the value of num_units_in_tick÷time_scale is the same for both coded video sequences.

其中在HEVC WD7中,等式C-1对应于等式(1),且等式C-13如下定义:In HEVC WD7, Equation C-1 corresponds to Equation (1), and Equation C-13 is defined as follows:

Δt_o,dpb(n) = t_o,dpb(n_n) - t_o,dpb(n)    (2)
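Putting Equations (1) and (2) together, the constraint attached to fixed_pic_rate_flag amounts to the check sketched below; the function works on output times expressed in seconds and is not HRD conformance code.

#include <cmath>

// Checks whether the DPB output-time difference between picture n and the
// subsequent picture n_n equals one clock tick, as required when
// fixed_pic_rate_flag is equal to 1 and the listed conditions hold.
bool fixedPicRateConstraintHolds(double outputTimeN,    // t_o,dpb(n)
                                 double outputTimeNn,   // t_o,dpb(n_n)
                                 double numUnitsInTick,
                                 double timeScale) {
    const double tc = numUnitsInTick / timeScale;       // Equation (1)
    const double deltaT = outputTimeNn - outputTimeN;   // Equation (2)
    return std::fabs(deltaT - tc) < 1e-9;               // tolerance for floating-point comparison
}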

鉴于上文所提及的与HEVC WD7相关联的定时及随机存取特性,本发明描述可用于进行以下操作的技术:减少例如交谈式应用程序的视频应用程序中的延迟,及提供在随机存取经译码视频序列中的改进。在一实例中,本发明描述用于分配NAL单元类型的技术。在另一实例中,本发明描述子图片层级或解码单元层级HRD行为。在另一实例中,本发明描述用于参考参数集ID的技术。在又一实例中,本发明描述用于提供fixed_pic_rate_flag语法元素的改进的语义的技术。应注意,本文中所描述的此些技术及其它技术的任何及所有组合可并入于视频编码及解码系统中。In view of the timing and random access characteristics associated with HEVC WD7 mentioned above, this disclosure describes techniques that can be used to reduce latency in video applications, such as conversational applications, and provide improvements in random access to coded video sequences. In one example, this disclosure describes techniques for assigning NAL unit types. In another example, this disclosure describes sub-picture level or decoding unit level HRD behavior. In another example, this disclosure describes techniques for reference parameter set IDs. In yet another example, this disclosure describes techniques for providing improved semantics for the fixed_pic_rate_flag syntax element. It should be noted that any and all combinations of these and other techniques described herein can be incorporated into video encoding and decoding systems.

图3为说明可利用本文中所描述的技术的实例视频编码及解码系统10的框图。详细地说,视频编码及解码系统可利用本文中所描述的技术,所述技术与以下各者有关:(1)分配NAL单元类型,(2)子图片层级或解码单元层级HRD行为,(3)参考参数集ID,(4)fixed_pic_rate_flag的改进的语义,或此些技术的任何及所有组合。视频编码及解码系统10为可用于以下视频应用程序中的任一者的视频系统的实例:本地播放、流式传输、广播、多播及/或交谈式应用程序。源装置12及目的地装置14为译码装置的实例,其中源装置12产生经编码视频数据以用于发射到目的地装置14。在一些实例中,源装置12及目的地装置14可以实质上对称方式操作,使得源装置12及目的地装置14中的每一者包含视频编码及解码组件。因此,系统10可经配置以支持源装置12与目的地装置14之间的单向或双向视频传输。FIG3 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described herein. In detail, the video encoding and decoding system may utilize the techniques described herein related to: (1) assigning NAL unit types, (2) sub-picture level or decoding unit level HRD behavior, (3) reference parameter set IDs, (4) improved semantics of fixed_pic_rate_flag, or any and all combinations of such techniques. Video encoding and decoding system 10 is an example of a video system that may be used for any of the following video applications: local playback, streaming, broadcast, multicast, and/or interactive applications. Source device 12 and destination device 14 are examples of decoding devices, where source device 12 generates encoded video data for transmission to destination device 14. In some examples, source device 12 and destination device 14 may operate in a substantially symmetrical manner such that each of source device 12 and destination device 14 includes video encoding and decoding components. Thus, system 10 may be configured to support one-way or two-way video transmission between source device 12 and destination device 14 .

尽管结合源装置12及目的地装置14描述本文中所描述的技术,但可由任何数字视频编码及/或解码装置来执行所述技术。还可由视频预处理器来执行本发明的技术。另外,尽管将本发明的技术大体上描述为由视频编码装置及视频解码装置执行,但还可由视频编码器/解码器(通常被称作“编解码器(CODEC)”)来执行所述技术。因此,图3中的视频编码器20及视频解码器30中的每一者可包含于一或多个编码器或解码器中,其中的任一者可集成为相应装置中的组合式编码器/解码器(编解码器(CODEC))的部分。另外,包含视频编码器20及/或视频解码器30的装置可包括集成电路、微处理器及/或无线通信装置,例如蜂窝式电话。尽管图3中未展示,但在一些方面中,视频编码器20及视频解码器30可各自与音频编码器及解码器集成,且可包含适当的MUX-DEMUX单元或其它硬件及软件以处置共同数据流或单独数据流中的音频及视频两者的编码。在适用的情况下,MUX-DEMUX单元可符合ITUH.223多路复用器协议或例如用户数据报协议(UDP)等其它协议。Although the techniques described herein are described in conjunction with source device 12 and destination device 14, the techniques may be performed by any digital video encoding and/or decoding device. The techniques of this disclosure may also be performed by a video preprocessor. Additionally, although the techniques of this disclosure are generally described as being performed by a video encoding device and a video decoding device, they may also be performed by a video encoder/decoder (often referred to as a "CODEC"). Thus, each of the video encoder 20 and video decoder 30 in FIG. 3 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. Additionally, a device including the video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone. Although not shown in FIG. 3 , in some aspects, the video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. Where applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).

如图3中所说明,系统10包含源装置12,其提供待由目的地装置14于稍后时间解码的经编码视频数据。详细地说,源装置12经由计算机可读媒体16将经编码视频数据提供给目的地装置14。目的地装置14可经由计算机可读媒体16接收待解码的经编码视频数据。源装置12及目的地装置14可包括广泛范围的装置中的任一者,包含桌上型计算机、笔记型(即,膝上型)计算机、平板计算机、机顶盒、电话手持机(例如,所谓的“智能”电话、所谓的“智能”平板设备(smart pad)、电视、相机、显示装置、数字媒体播放器、视频游戏控制台、视频流式传输装置,或其类似者。在一些状况下,源装置12及目的地装置14可经装备用于无线通信。3 , system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In detail, source device 12 provides the encoded video data to destination device 14 via a computer-readable medium 16. Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (e.g., so-called “smart” phones, so-called “smart” pads), televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

计算机可读媒体16可包括能够将经编码视频数据从源装置12移动到目的地装置14的任何类型的媒体或装置。计算机可读媒体16可包含暂时性媒体,例如无线广播或有线网络传输;或存储媒体(即,非暂时性存储媒体),例如硬盘、闪存驱动器、压缩光盘、数字视频光盘、蓝光光盘或其它计算机可读媒体。在一些实例中,网络服务器(未展示)可从源装置12接收经编码视频数据,且(例如)经由网络传输将经编码视频数据提供给目的地装置14。类似地,媒体生产设施(例如,光盘压印设施)的计算装置可从源装置12接收经编码视频数据且生产含有经编码视频数据的光盘。Computer-readable medium 16 may comprise any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. Computer-readable medium 16 may include transitory media, such as wireless broadcasts or wired network transmissions, or storage media (i.e., non-transitory storage media) such as a hard drive, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive the encoded video data from source device 12 and provide the encoded video data to destination device 14, for example, via a network transmission. Similarly, a computing device at a media production facility (e.g., a disc pressing facility) may receive the encoded video data from source device 12 and produce a disc containing the encoded video data.

在一实例中,计算机可读媒体16可包括用以使得源装置12能够实时地将经编码视频数据直接发射到目的地装置14的通信媒体。可根据通信标准(例如,无线通信协议)调制经编码视频数据,及将经编码视频数据发射到目的地装置14。通信媒体可包括任何无线或有线通信媒体,例如,射频(RF)频谱或一或多个物理传输线。通信媒体可形成基于包的网络(例如,局域网、广域网或例如因特网等全球网络)的部分。通信媒体可包含路由器、交换器、基站,或可用以促进从源装置12到目的地装置14的通信的任何其它设备。In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be used to facilitate communication from source device 12 to destination device 14.

存储装置可包含多种分布式或本地存取式数据存储媒体中的任一者,例如,硬盘、蓝光光盘、DVD、CD-ROM、快闪存储器、易失性或非易失性存储器,或用于存储经编码视频数据的任何其它合适的数字存储媒体。在另一实例中,存储装置可对应于可存储由源装置12产生的经编码视频的文件服务器或另一中间存储装置。目的地装置14可经由流式传输或下载从存储装置存取所存储的视频数据。文件服务器可为能够存储经编码视频数据且将所述经编码视频数据发射到目的地装置14的任何类型的服务器。实例文件服务器包含web服务器(例如,用于网站)、FTP服务器、网络附加存储(NAS)装置或本地磁盘驱动器。目的地装置14可经由任何标准数据连接(包含因特网连接)存取经编码视频数据。此数据连接可包含适合于存取存储于文件服务器上的经编码视频数据的无线信道(例如,Wi-Fi连接)、有线连接(例如,DSL、缆线调制解调器等),或两者的组合。经编码视频数据从存储装置的传输可为流式传输、下载传输,或其组合。The storage device may include any of a variety of distributed or locally accessible data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another example, the storage device may correspond to a file server or another intermediate storage device that can store the encoded video generated by source device 12. Destination device 14 can access the stored video data from the storage device via streaming or downloading. The file server may be any type of server capable of storing and transmitting the encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, a network-attached storage (NAS) device, or a local disk drive. Destination device 14 can access the encoded video data via any standard data connection, including an Internet connection. This data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.) suitable for accessing the encoded video data stored on the file server, or a combination of both. The transmission of the encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

本发明的技术未必限于无线应用或设定。所述技术可应用于支持例如以下应用等多种多媒体应用中的任一者的视频译码:无线电视广播、有线电视传输、卫星电视传输、例如经由HTTP的动态自适应流式传输(DASH)等因特网流式视频传输、编码到数据存储媒体上的数字视频、存储于数据存储媒体上的数字视频的解码,或其它应用。The techniques of this disclosure are not necessarily limited to wireless applications or settings. They may be applied to video coding supporting any of a variety of multimedia applications, such as wireless television broadcasting, cable television transmission, satellite television transmission, Internet streaming video transmission such as Dynamic Adaptive Streaming over HTTP (DASH), encoding of digital video onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.

在图3的实例中,源装置12包含视频源18、译码结构单元19、视频编码器20、封装单元21及输出接口22。目的地装置14包含输入接口28、解封装单元29、视频解码器30及显示装置32。在其它实例中,源装置12及目的地装置14可包含其它组件或布置。举例来说,源装置12可从外部视频源18(例如,外部相机)接收视频数据。同样地,目的地装置14可与外部显示装置介接,而非包含集成显示装置。源装置12及目的地装置14的组件各自可实施为多种合适电路中的任一者,例如一或多个微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、离散逻辑、软件、硬件、固件或其任何组合。当部分地在软件中实施本文中所描述的技术时,装置可将用于软件的指令存储于合适的非暂时性计算机可读媒体中,且在硬件中使用一或多个处理器来执行所述指令以执行所述技术。In the example of FIG3 , source device 12 includes a video source 18, a coding structure unit 19, a video encoder 20, an encapsulation unit 21, and an output interface 22. Destination device 14 includes an input interface 28, a decapsulation unit 29, a video decoder 30, and a display device 32. In other examples, source device 12 and destination device 14 may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18 (e.g., an external camera). Similarly, destination device 14 may interface with an external display device rather than including an integrated display device. The components of source device 12 and destination device 14 may each be implemented as any of a variety of suitable circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques described herein are implemented partially in software, the device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques.

源装置12的视频源18可包含视频俘获装置,例如视频相机、含有先前所俘获的视频的视频存档,及/或用以从视频内容提供者接收视频的视频馈入接口。作为另一替代例,视频源18可产生基于计算机图形的数据作为源视频,或直播视频、存档视频及计算机产生的视频的组合。在一些状况下,如果视频源18为视频相机,那么源装置12及目的地装置14可形成所谓的相机电话或视频电话。然而,如上文所提及,本发明中所描述的技术可大体上适用于视频译码,且可应用于无线及/或有线应用。在每一状况下,可由视频编码器20接收所俘获视频、预俘获的视频或计算机产生的视频。输出接口22可经配置以将经编码视频数据(例如,经译码视频序列)输出到计算机可读媒体16上。在一些实例中,可将经译码视频序列从输出接口22输出到存储装置。目的地装置14的输入接口28从计算机可读媒体16接收经编码视频数据。显示装置32向用户显示经解码视频数据,且可包括多种显示装置中的任一者,例如,阴极射线管(CRT)、液晶显示器(LCD)、等离子体显示器、有机发光二极管(OLED)显示器或另一类型的显示装置。Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, the techniques described in this disclosure may be generally applicable to video coding and may be applied to wireless and/or wired applications. In each case, captured video, pre-captured video, or computer-generated video may be received by video encoder 20. Output interface 22 may be configured to output encoded video data (e.g., a coded video sequence) onto computer-readable medium 16. In some examples, the coded video sequence may be output from output interface 22 to a storage device. Input interface 28 of destination device 14 receives the encoded video data from computer-readable medium 16. Display device 32 displays the decoded video data to a user and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

译码结构单元19、视频编码器20、封装单元21、解封装单元29及视频解码器30可根据例如上文所描述的即将到来的HEVC等视频译码标准操作,且可大体上符合HEVC测试模型(HM)。或者,视频编码器20及视频解码器30可根据以下标准操作:其它专属或工业标准,例如ITU-T H.264标准(或者被称作MPEG-4第10部分,高级视频译码(AVC)),或此些标准的扩展。译码结构单元19、视频编码器20、封装单元21、解封装单元29及视频解码器30还可根据视频译码标准的修改的版本操作,其中视频译码标准的修改的版本经修改以包含本文中所描述的技术的任何及所有组合。Coding structure unit 19, video encoder 20, encapsulation unit 21, decapsulation unit 29, and video decoder 30 may operate according to a video coding standard such as the upcoming HEVC described above and may generally conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard (alternatively known as MPEG-4 Part 10, Advanced Video Coding (AVC)), or extensions of such standards. Coding structure unit 19, video encoder 20, encapsulation unit 21, decapsulation unit 29, and video decoder 30 may also operate according to modified versions of the video coding standards, where the modified versions of the video coding standards are modified to include any and all combinations of the techniques described herein.

视频编码器20可将视频帧或图片划分成一系列大小相等的视频块,例如HEVC WD7中所描述的CU。CU包含译码节点及与所述译码节点相关联的预测单元(PU)及变换单元(TU)。CU的大小对应于译码节点的大小,且形状必须为正方形。CU的大小的范围可从8x8像素直到具有最大64x64像素或大于64x64像素的树型块的大小。每一CU可含有一或多个PU及一或多个TU。与CU相关联的语法数据可描述(例如)将CU分割成一或多个PU。分割模式可在以下情形间不同:CU经跳转或直接模式编码、经帧内预测模式编码或经帧间预测模式编码。PU的形状可分割成非正方形。与CU相关联的语法数据还可描述(例如)根据四分树将CU分割成一或多个TU。TU的形状可为正方形或非正方形(例如,矩形)。Video encoder 20 may divide a video frame or picture into a series of equally sized video blocks, such as CUs described in HEVC WD7. A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. The size of a CU corresponds to the size of the coding node and must be square in shape. The size of a CU may range from 8x8 pixels up to a treeblock size of 64x64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, the partitioning of the CU into one or more PUs. The partitioning mode may differ between cases where the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. The shape of a PU may be non-square. Syntax data associated with a CU may also describe, for example, the partitioning of the CU into one or more TUs according to a quadtree. The shape of a TU may be square or non-square (e.g., rectangular).

HEVC标准允许根据TU的变换,所述变换对于不同CU可不同。通常基于针对经分割的LCU定义的给定CU内的PU的大小而设定TU的大小,但可能并非始终为此状况。TU通常具有与PU相同的大小,或小于PU。在一些实例中,可使用被称为“残余四分树”(RQT)的四分树结构将对应于CU的残余样本再分为较小单元。RQT的叶节点可被称作变换单元(TU)。可变换与TU相关联的像素差值以产生可量化的变换系数。The HEVC standard allows for transforms based on TUs, which can be different for different CUs. The size of a TU is typically set based on the size of the PU within a given CU defined for the partitioned LCU, but this may not always be the case. A TU is typically the same size as a PU, or smaller than a PU. In some examples, a quadtree structure called a "residual quadtree" (RQT) can be used to subdivide the residual samples corresponding to a CU into smaller units. The leaf nodes of the RQT can be referred to as transform units (TUs). Pixel difference values associated with a TU can be transformed to produce quantizable transform coefficients.

叶CU可包含一或多个预测单元(PU)。大体上,PU表示对应于对应CU的全部或一部分的空间区域,且可包含用于检索PU的参考样本的数据。此外,PU包含与预测有关的数据。举例来说,当PU经帧内模式编码时,用于PU的数据可包含于残余四分树(RQT)中,残余四分树可包含描述对应于PU的TU的帧内预测模式的数据。作为另一实例,当PU经帧间模式编码时,PU可包含定义所述PU的一或多个运动向量的数据。定义PU的运动向量的数据可描述(例如)运动向量的水平分量、运动向量的垂直分量、运动向量的分辨率(例如,四分之一像素精度或八分之一像素精度)、运动向量所指向的参考图片,及/或运动向量的参考图片列表(例如,列表0、列表1或列表C)。A leaf-CU may include one or more prediction units (PUs). Generally, a PU represents a spatial region corresponding to all or a portion of the corresponding CU and may include data used to retrieve reference samples for the PU. In addition, a PU includes data related to prediction. For example, when the PU is intra-mode encoded, the data for the PU may be included in a residual quadtree (RQT), which may include data describing the intra-prediction mode of the TU corresponding to the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining the motion vector of the PU may describe, for example, the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference picture to which the motion vector points, and/or the reference picture list (e.g., List 0, List 1, or List C) for the motion vector.

具有一或多个PU的叶CU还可包含一或多个变换单元(TU)。可使用RQT(还被称作TU四分树结构)指定变换单元,如上文所论述。举例来说,分裂旗标可指示叶CU是否分裂成四个变换单元。接着,每一变换单元可进一步分裂成其它子TU。当TU不进一步分裂时,其可被称作叶TU。大体上,对于帧内译码,属于叶CU的所有叶TU共享相同的帧内预测模式。即,大体上应用相同的帧内预测模式来计算叶CU的所有TU的所预测值。对于帧内译码,视频编码器可使用帧内预测模式将每一叶TU的残余值计算为在CU的对应于所述TU的部分与原始块之间的差。TU未必限于PU的大小。因此,TU可能大于或小于PU。对于帧内译码,PU可与相同CU的对应叶TU共置。在一些实例中,叶TU的最大大小可对应于对应叶CU的大小。A leaf-CU with one or more PUs may also include one or more transform units (TUs). Transform units may be specified using an RQT (also known as a TU quadtree structure), as discussed above. For example, a split flag may indicate whether a leaf-CU is split into four transform units. Each transform unit may then be further split into additional sub-TUs. When a TU is not further split, it may be referred to as a leaf-TU. Generally, for intra coding, all leaf-TUs belonging to a leaf-CU share the same intra prediction mode. That is, generally the same intra prediction mode is applied to calculate the predicted values for all TUs of a leaf-CU. For intra coding, the video encoder may use the intra prediction mode to calculate the residual value of each leaf-TU as the difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, a TU may be larger or smaller than a PU. For intra coding, a PU may be co-located with a corresponding leaf-TU of the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.

此外,叶CU的TU还可与被称作残余四分树(RQT)的相应四分树数据结构相关联。即,叶CU可包含指示如何将叶CU分割成TU的四分树。TU四分树的根节点大体上对应于叶CU,而CU四分树的根节点大体上对应于树型块(或LCU)。RQT的不分裂的TU被称作叶TU。大体上,除非另有注释,否则本发明分别使用术语CU及TU来指代叶CU及叶TU。本发明使用术语“块”来指代在HEVC的上下文中的CU、PU或TU中的任一者,或在其它标准的上下文中的类似数据结构(例如,H.264/AVC中的宏块及其子块)。In addition, the TUs of a leaf-CU may also be associated with corresponding quadtree data structures referred to as residual quadtrees (RQTs). That is, a leaf-CU may include a quadtree that indicates how the leaf-CU is partitioned into TUs. The root node of a TU quadtree generally corresponds to a leaf-CU, while the root node of a CU quadtree generally corresponds to a treeblock (or LCU). Unsplit TUs of an RQT are referred to as leaf-TUs. In general, unless otherwise noted, this disclosure uses the terms CU and TU to refer to leaf-CUs and leaf-TUs, respectively. This disclosure uses the term "block" to refer to any of a CU, PU, or TU in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and subblocks thereof in H.264/AVC).
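The recursive quadtree splitting described above (for both the CU tree and the RQT) can be pictured with the minimal structure below; it records only the block geometry and the split decision, whereas a real coding tree also carries prediction and transform data. The structure is an illustration, not HEVC syntax.

#include <array>
#include <memory>

// Minimal illustrative quadtree node for CU (or RQT) splitting as described above.
struct QuadTreeNode {
    int x = 0, y = 0;   // top-left luma sample position
    int size = 64;      // block width/height in luma samples (square)
    bool split = false; // split flag: divide into four equally sized children?
    std::array<std::unique_ptr<QuadTreeNode>, 4> children;

    void splitNode() {
        split = true;
        const int half = size / 2;
        for (int i = 0; i < 4; ++i) {
            children[i] = std::make_unique<QuadTreeNode>();
            children[i]->x = x + (i % 2) * half;
            children[i]->y = y + (i / 2) * half;
            children[i]->size = half;
        }
    }
};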

作为一实例,HM支持以各种PU大小进行的预测。假定特定CU的大小为2Nx2N,那么HM支持以2Nx2N或NxN的PU大小进行的帧内预测,及以2Nx2N、2NxN、Nx2N或NxN的对称PU大小进行的帧间预测。HM还支持以2NxnU、2NxnD、nLx2N及nRx2N的PU大小进行的用于帧间预测的不对称分割。在不对称分割中,不分割CU的一个方向,而将另一方向分割成25%及75%。CU的对应于25%分割区的部分由“n”后跟着“上”、“下”、“左”或“右”的指示来指示。因此,例如,“2NxnU”指代在水平方向上以顶部2Nx0.5N PU及底部2Nx1.5N PU分割的2Nx2N CU。As an example, the HM supports prediction with various PU sizes. Assuming a particular CU is 2Nx2N, the HM supports intra prediction with PU sizes of 2Nx2N or NxN, and inter prediction with symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, or NxN. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "up," "down," "left," or "right." Thus, for example, "2NxnU" refers to a 2Nx2N CU partitioned horizontally with a 2Nx0.5N PU at the top and a 2Nx1.5N PU at the bottom.
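As a worked example of the asymmetric sizes described above, assume N = 16 (a 32x32 CU); the 2NxnU mode then yields a 32x8 top PU and a 32x24 bottom PU, as the short sketch below prints.

#include <cstdio>

int main() {
    const int N = 16;                      // assumed example: a 2Nx2N CU of 32x32 luma samples
    const int puWidth = 2 * N;             // both PUs span the full CU width in 2NxnU
    const int topPuHeight = N / 2;         // 2N x 0.5N top PU    -> 32x8
    const int bottomPuHeight = 3 * N / 2;  // 2N x 1.5N bottom PU -> 32x24
    std::printf("2NxnU: top %dx%d, bottom %dx%d\n",
                puWidth, topPuHeight, puWidth, bottomPuHeight);
    return 0;
}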

在本发明中,“NxN”与“N乘N”可互换地使用以指代视频块在垂直尺寸与水平尺寸方面的像素尺寸,例如,16x16像素或16乘16像素。大体上,16x16块在垂直方向上将具有16个像素(y=16)且在水平方向上将具有16个像素(x=16)。同样地,NxN块大体上在垂直方向上具有N个像素,且在水平方向上具有N个像素,其中N表示非负整数值。可按行及列来布置块中的像素。此外,块未必需要在水平方向上具有与垂直方向上相同的数目个像素。举例来说,块可包括NxM个像素,其中M未必等于N。In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. Generally, a 16x16 block will have 16 pixels in the vertical direction (y=16) and 16 pixels in the horizontal direction (x=16). Similarly, an NxN block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise NxM pixels, where M is not necessarily equal to N.

在使用CU的PU进行帧内预测性译码或帧间预测性译码之后,视频编码器20可计算CU的TU的残余数据。PU可包括描述产生空间域(还被称作像素域)中的预测性像素数据的方法或模式的语法数据,且TU可包括在将例如离散余弦变换(DCT)、整数变换、小波变换或概念上类似的变换等变换应用于残余视频数据之后的变换域中的系数。残余数据可对应于未经编码图片的像素与对应于PU的预测值之间的像素差。视频编码器20可形成包含CU的残余数据的TU,且接着变换所述TU以产生CU的变换系数。After intra-predictive coding or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. A PU may include syntax data that describes a method or mode of generating predictive pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after applying a transform, such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to the residual video data. The residual data may correspond to pixel differences between pixels of an unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form TUs including the residual data for the CU, and then transform the TUs to generate transform coefficients for the CU.

在应用任何变换以产生变换系数之后,视频编码器20可执行变换系数的量化。量化大体上指代如下过程:将变换系数量化以可能地减少用以表示所述系数的数据的量,从而提供进一步压缩。所述量化过程可减少与所述系数中的一些或全部相关联的位深度。举例来说,可在量化期间将n位值向下舍入到m位值,其中n大于m。After applying any transforms to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to the process of quantizing transform coefficients to potentially reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
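The bit-depth reduction mentioned above can be illustrated as follows; rounding an n-bit value down to an m-bit value corresponds to discarding the (n - m) least significant bits. This is only a simplified picture of quantization, not the HEVC quantizer.

#include <cstdint>
#include <cstdio>

// An n-bit value is rounded down to an m-bit value by a right shift.
uint32_t reduceBitDepth(uint32_t value, int n, int m) {
    return value >> (n - m);  // keep the m most significant of the n bits
}

int main() {
    // Example: a 12-bit coefficient value of 3000 reduced to 8 bits gives 3000 >> 4 = 187.
    std::printf("%u\n", reduceBitDepth(3000u, 12, 8));
    return 0;
}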

在量化之后,视频编码器可扫描变换系数,从而从包含经量化的变换系数的二维矩阵产生一维向量。扫描可经设计成将较高能量(且因此较低频率)系数置放于阵列前部,及将较低能量(且因此较高频率)系数置放于阵列后部。在一些实例中,视频编码器20可利用预定义扫描次序来扫描经量化的变换系数,以产生可经熵编码的串行化向量。在其它实例中,视频编码器20可执行自适应扫描。在扫描经量化的变换系数以形成一维向量之后,视频编码器20可(例如)根据上下文自适应可变长度译码(CAVLC)、上下文自适应二进制算术译码(CABAC)、基于语法的上下文自适应二进制算术译码(SBAC)、概率区间分割熵(PIPE)译码或另一熵编码方法熵编码所述一维向量。视频编码器20还可熵编码与经编码视频数据相关联的语法元素以供视频解码器30用于解码视频数据。After quantization, the video encoder may scan the transform coefficients, generating a one-dimensional vector from the two-dimensional matrix comprising the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array and lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, for example, according to context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.

为了执行CABAC,视频编码器20可将上下文模型内的上下文指派给待发射的符号。所述上下文可关于(例如)符号的相邻值是否为非零。为了执行CAVLC,视频编码器20可针对待发射的符号选择可变长度码。VLC中的码字可经建构以使得相对较短码对应于较有可能的符号,而较长码对应于较不可能的符号。以此方式,使用VLC可达成位节省(与(例如)针对待发射的每一符号使用相等长度的码字相比较)。概率确定可基于指派给符号的上下文而进行。To perform CABAC, video encoder 20 may assign context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in VLC may be constructed so that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, using VLC may achieve bit savings (compared to, for example, using codewords of equal length for each symbol to be transmitted). Probability determinations may be made based on the context assigned to the symbol.

如上文所描述,可根据所确定的视频译码结构来译码视频序列,其中译码结构定义用于编码视频序列的图片类型(例如,RAP图片及非RAP图片)的分配。举例来说,可用以预定间隔包含的RAP图片编码视频序列,以便促进视频序列的随机存取。此译码结构可用于广播应用。另外,可根据使低延迟应用程序的延迟最小化的译码结构编码视频序列。译码结构单元19可经配置以确定待由视频编码器20使用以编码从视频源18接收的视频序列的译码结构。在一实例中,译码结构单元19可存储对应于相应视频应用程序的预定义的译码结构。译码结构单元19可经配置以输出向视频编码器20及封装单元21中的每一者指示特定译码结构的信息。视频编码器20接收来自视频源18的视频序列及来自译码结构单元19的译码结构信息,且产生经编码视频数据。封装单元21接收来自视频编码器20的经编码视频数据及指示特定译码结构的信息,且产生包含存取单元的经译码视频序列。解封装单元29可经配置以接收经译码视频序列,及剖析存取单元及NAL单元。视频解码器30可经配置以接收NAL单元,及基于包含于所接收的NAL单元中的信息重建构视频数据。As described above, a video sequence may be coded according to a determined video coding structure, wherein the coding structure defines the allocation of picture types (e.g., RAP pictures and non-RAP pictures) used to encode the video sequence. For example, a video sequence may be coded with RAP pictures included at predetermined intervals to facilitate random access to the video sequence. This coding structure may be used in broadcast applications. Additionally, a video sequence may be coded according to a coding structure that minimizes latency for low-latency applications. Coding structure unit 19 may be configured to determine a coding structure to be used by video encoder 20 to encode a video sequence received from video source 18. In one example, coding structure unit 19 may store a predefined coding structure corresponding to a respective video application. Coding structure unit 19 may be configured to output information indicating a specific coding structure to each of video encoder 20 and encapsulation unit 21. Video encoder 20 receives the video sequence from video source 18 and the coding structure information from coding structure unit 19 and generates encoded video data. Encapsulation unit 21 receives encoded video data and information indicating a specific coding structure from video encoder 20 and generates a coded video sequence including access units. Decapsulation unit 29 may be configured to receive the coded video sequence and parse access units and NAL units. Video decoder 30 may be configured to receive NAL units and reconstruct video data based on information contained in the received NAL units.

应注意,译码结构单元19及/或视频编码器20可经配置以产生包含于参数集中的语法元素。在一些实例中,译码结构单元19可经配置以产生包含于例如SPS的高层级参数集中的语法元素,且视频编码器20可经配置以基于从译码结构单元19所接收的语法元素执行视频编码,以及将经熵编码的语法元素作为经编码视频数据的部分输出。It should be noted that coding structure unit 19 and/or video encoder 20 may be configured to generate syntax elements for inclusion in parameter sets. In some examples, coding structure unit 19 may be configured to generate syntax elements for inclusion in a higher-level parameter set, such as an SPS, and video encoder 20 may be configured to perform video encoding based on the syntax elements received from coding structure unit 19 and output the entropy-encoded syntax elements as part of the encoded video data.

根据本发明的技术,可按以下方式执行NAL单元类型的分配:使得例如目的地装置14等装置可容易地识别RAP图片及相关联的定时信息。在一实例中,不具有相关联的引导图片的IDR图片具有与可具有相关联的引导图片的IDR图片相异的NAL单元类型。举例来说,不具有相关联的引导图片的IDR图片具有NAL单元类型M,而可具有相关联的引导图片的IDR图片具有NAL单元类型N,其中M不等于N,如表4中所说明。应注意,在表4中所说明的实例中,与IDR图片相关联的引导图片可为DLP图片。在一实例中,表4中所说明的NAL单元类型可并入到表2中所说明的HEVC WD7 NAL单元类型码及NAL单元类型类别中。举例来说,表2中保留的NAL单元类型值可用于表4中的NAL单元类型M及N。According to the techniques of this disclosure, the assignment of NAL unit types may be performed in a manner that allows a device, such as destination device 14, to easily identify RAP pictures and associated timing information. In one example, an IDR picture that does not have an associated leading picture has a different NAL unit type than an IDR picture that may have an associated leading picture. For example, an IDR picture that does not have an associated leading picture has a NAL unit type of M, while an IDR picture that may have an associated leading picture has a NAL unit type of N, where M is not equal to N, as illustrated in Table 4. Note that in the example illustrated in Table 4, the leading picture associated with the IDR picture may be a DLP picture. In one example, the NAL unit types illustrated in Table 4 may be incorporated into the HEVC WD7 NAL unit type codes and NAL unit type categories illustrated in Table 2. For example, the reserved NAL unit type values in Table 2 may be used for NAL unit types M and N in Table 4.

表4:相异IDR NAL单元类型Table 4: Different IDR NAL unit types

在另一实例中,不具有相关联的引导图片的CRA图片具有不同于可具有相关联的引导图片的CRA图片的相异NAL单元类型。此外,不具有相关联的TFD图片的CRA图片具有不同于可具有相关联的TFD图片的CRA图片的相异NAL单元。因此,三个不同的NAL单元类型可用于不同类型的CRA图片,如表5中所说明。在一实例中,表5中所说明的NAL单元类型可并入到表2中所说明的HEVC WD7 NAL单元类型码及NAL单元类型类别中。举例来说,表2中保留的NAL单元类型值可用于表5中的NAL单元类型X、Y及Z。In another example, a CRA picture that does not have an associated leading picture has a different NAL unit type than a CRA picture that may have an associated leading picture. Furthermore, a CRA picture that does not have an associated TFD picture has a different NAL unit type than a CRA picture that may have an associated TFD picture. Thus, three different NAL unit types may be used for different types of CRA pictures, as illustrated in Table 5. In one example, the NAL unit types illustrated in Table 5 may be incorporated into the HEVC WD7 NAL unit type codes and NAL unit type categories illustrated in Table 2. For example, the reserved NAL unit type values in Table 2 may be used for NAL unit types X, Y, and Z in Table 5.

表5:相异CRA NAL单元类型Table 5: Different CRA NAL unit types

在另一实例中,不具有相关联的引导图片的BLA图片可具有不同于可具有相关联的引导图片的BLA图片的相异NAL单元类型。此外,不具有相关联的TFD图片的BLA图片可具有不同于可具有相关联的TFD图片的BLA图片的相异NAL单元。因此,三个不同的NAL单元类型可用于不同类型的BLA,如表6中所说明。在一实例中,表6中所说明的NAL单元类型可并入到表2中所说明的HEVC WD7 NAL单元类型码及NAL单元类型类别中。举例来说,表2中保留的NAL单元类型值可用于表6中的NAL单元类型A、B及C。In another example, a BLA picture that does not have an associated leading picture may have a different NAL unit type than a BLA picture that may have an associated leading picture. Furthermore, a BLA picture that does not have an associated TFD picture may have a different NAL unit type than a BLA picture that may have an associated TFD picture. Thus, three different NAL unit types may be used for different types of BLAs, as illustrated in Table 6. In one example, the NAL unit types illustrated in Table 6 may be incorporated into the HEVC WD7 NAL unit type codes and NAL unit type categories illustrated in Table 2. For example, the reserved NAL unit type values in Table 2 may be used for NAL unit types A, B, and C in Table 6.

表6:相异BLA NAL单元类型Table 6: Different BLA NAL unit types

关于表4到表6所描述的NAL单元类型的任何及所有组合可用于NAL单元类型的分配。在一实例中,关于表4到表6所描述的所有NAL单元类型可用于NAL单元类型的分配。表7说明表4到表6中所说明的所有NAL类型用于NAL单元类型的分配的实例。如表7中所说明,NAL单元类型包含关于表4到表6所描述的CRA图片、BLA图片及IDR图片NAL单元类型,以及上文所描述的VPS、SPS、PPS及APS NAL单元类型。表7可与上文的表2形成对比,这是因为:表7中所提供的NAL单元类型的分配包含针对IDR图片、CRA图片及BLA图片的多个NAL单元类型,然而,表2中所提供的NAL单元类型的分配包含针对IDR图片、CRA图片及BLA图片中的每一者的单一NAL单元类型。Any and all combinations of NAL unit types described with respect to Tables 4 through 6 may be used for the allocation of NAL unit types. In one example, all NAL unit types described with respect to Tables 4 through 6 may be used for the allocation of NAL unit types. Table 7 illustrates an example where all NAL types described in Tables 4 through 6 are used for the allocation of NAL unit types. As illustrated in Table 7, the NAL unit types include the CRA picture, BLA picture, and IDR picture NAL unit types described with respect to Tables 4 through 6, as well as the VPS, SPS, PPS, and APS NAL unit types described above. Table 7 may be contrasted with Table 2 above because the allocation of NAL unit types provided in Table 7 includes multiple NAL unit types for IDR pictures, CRA pictures, and BLA pictures, whereas the allocation of NAL unit types provided in Table 2 includes a single NAL unit type for each of the IDR pictures, CRA pictures, and BLA pictures.

表7:NAL单元类型码及NAL单元类型类别Table 7: NAL unit type code and NAL unit type category

封装单元21可经配置以接收来自视频编码器20的经编码视频数据及指示特定译码结构的信息,且基于表2到表7中所说明的NAL单元分配的任何及所有组合中所说明的NAL单元类型的分配,产生包含存取单元的经译码视频序列。另外,解封装单元29可经配置以接收经译码视频序列,并剖析存取单元及NAL单元,其中NAL单元是基于表2到表7中所说明的NAL单元分配的任何及所有组合而分配。Encapsulation unit 21 may be configured to receive encoded video data and information indicating a specific coding structure from video encoder 20, and generate a coded video sequence including access units based on the allocation of NAL unit types described in any and all combinations of the NAL unit allocations described in Tables 2 through 7. Additionally, decapsulation unit 29 may be configured to receive the coded video sequence and parse access units and NAL units, where the NAL units are allocated based on any and all combinations of the NAL unit allocations described in Tables 2 through 7.

如上文所描述,根据HEVC WD7,为了达成当前图片定时SEI消息及同时实现AU层级及DU层级HRD CPB移除两者以达成子图片延迟的机制,必须在编码整个AU之前,将DU发送出去,且在编码整个AU之前,仍不可将AU层级SEI消息发送出去。根据本发明的技术,封装单元21及解封装单元29可经配置以使得相比于HEVC WD7来说,可修改子图片层级或解码单元层级HRD行为。As described above, according to HEVC WD7, with the current picture timing SEI message mechanism and with both AU-level and DU-level HRD CPB removal enabled to achieve sub-picture delay, DUs must be sent out before the entire AU is encoded, and AU-level SEI messages still cannot be sent out before the entire AU is encoded. According to the techniques of this disclosure, the encapsulation unit 21 and the decapsulation unit 29 can be configured to modify the sub-picture level or decoding unit level HRD behavior compared to HEVC WD7.

举例来说,封装单元21可经配置以使得在编码整个AU之后,发送AU层级SEI消息。此AU层级SEI消息可包含于具有相异NAL单元类型的SEI NAL单元中。此SEI NAL单元与SEI NAL单元的现有定义(例如,如在HEVC WD7中所定义)之间的一差异在于:可允许此相异SEI NAL单元类型在解码次序上接在相同AU中的最后VCL NAL单元之后,且可受约束使得不会在解码次序上在相同AU中的第一VCL NAL单元之前。常规SEI NAL单元及SEI消息可分别被称作前缀SEI NAL单元及前缀SEI消息,而本文中所描述的相异SEI NAL单元及SEI消息可分别被称作后缀SEI NAL单元及后缀SEI消息。For example, encapsulation unit 21 may be configured so that an AU-level SEI message is sent after the entire AU is encoded. This AU-level SEI message may be included in a SEI NAL unit having a distinct NAL unit type. One difference between this SEI NAL unit and the existing definition of a SEI NAL unit (e.g., as defined in HEVC WD7) is that this distinct SEI NAL unit type may be allowed to follow the last VCL NAL unit in the same AU in decoding order and may be constrained not to precede the first VCL NAL unit in the same AU in decoding order. Conventional SEI NAL units and SEI messages may be referred to as prefix SEI NAL units and prefix SEI messages, respectively, while the distinct SEI NAL units and SEI messages described herein may be referred to as suffix SEI NAL units and suffix SEI messages, respectively.
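The ordering constraint described above for the suffix SEI NAL unit can be sketched as a simple validation over the NAL units of an access unit, listed in decoding order; the NalUnit structure below is hypothetical.

#include <vector>

// Hypothetical view of a NAL unit for the purpose of this check.
struct NalUnit {
    bool isVcl = false;        // true for VCL NAL units
    bool isSuffixSei = false;  // true for the distinct (suffix) SEI NAL unit type
};

// A suffix SEI NAL unit may follow the last VCL NAL unit of the access unit,
// but it must not precede the first VCL NAL unit in decoding order.
bool suffixSeiPlacementValid(const std::vector<NalUnit>& accessUnit) {
    bool seenVcl = false;
    for (const NalUnit& nal : accessUnit) {
        if (nal.isVcl) {
            seenVcl = true;
        }
        if (nal.isSuffixSei && !seenVcl) {
            return false;  // suffix SEI appeared before the first VCL NAL unit
        }
    }
    return true;
}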

除经配置以基于表2到表7中所说明的NAL单元分配的任何及所有组合而产生经译码视频序列之外,封装单元21还可经配置以产生包含前缀SEI NAL单元及后缀SEI NAL单元的经译码视频序列。同样地,解封装单元29可经配置以接收经译码视频序列,并剖析存取单元及NAL单元,其中NAL单元包含前缀SEI NAL单元类型及后缀SEI NAL单元类型。即,解封装单元29可经配置以从存取单元提取后缀SEI NAL单元。表8说明表4到表6中所说明的所有NAL类型用于NAL单元类型的分配的实例,以及前缀SEI NAL单元及后缀SEI NAL单元。In addition to being configured to generate a coded video sequence based on any and all combinations of the NAL unit assignments described in Tables 2 through 7, encapsulation unit 21 may also be configured to generate a coded video sequence that includes prefix SEI NAL units and suffix SEI NAL units. Similarly, decapsulation unit 29 may be configured to receive a coded video sequence and parse access units and NAL units, where the NAL units include prefix SEI NAL unit types and suffix SEI NAL unit types. That is, decapsulation unit 29 may be configured to extract suffix SEI NAL units from access units. Table 8 illustrates an example of all NAL types described in Tables 4 through 6 being used for the assignment of NAL unit types, as well as prefix SEI NAL units and suffix SEI NAL units.

表8:NAL单元类型码及NAL单元类型类别Table 8: NAL unit type code and NAL unit type category

如上文所描述,除SEI NAL单元之外,非VCL NAL单元类型还包含VPS、SPS、PPS及APS NAL单元。根据HEVC WD7中所定义的参数集类型,每一SPS参考VPS ID,每一PPS参考SPSID,且每一切片标头参考PPS ID且可能参考APS ID。视频编码器20及/或译码结构单元19可经配置以根据HEVC WD7中所定义的参数集产生参数集。另外,视频编码器20及/或译码结构单元19可经配置以产生参数集,其中可任选地在切片标头中用信号发出VPS ID及SPS ID(例如,其中VPS ID在SPS ID之前)。在于切片标头中用信号发出VPS ID及SPS ID的一实例中,无VPS ID将位于SPS中且无SPS ID将位于PPS中。另外,在一实例中,VPS ID及SPS ID可存在于每一RAP图片的切片标头中,且每一图片可与恢复点SEI消息相关联。另外,在其它实例中,VPS ID及SPS ID可存在于其它图片的切片标头中。As described above, in addition to SEI NAL units, non-VCL NAL unit types also include VPS, SPS, PPS, and APS NAL units. According to the parameter set type defined in HEVC WD7, each SPS references a VPS ID, each PPS references an SPS ID, and each slice header references a PPS ID and possibly an APS ID. Video encoder 20 and/or coding structure unit 19 may be configured to generate a parameter set according to the parameter set defined in HEVC WD7. Additionally, video encoder 20 and/or coding structure unit 19 may be configured to generate a parameter set in which the VPS ID and SPS ID may optionally be signaled in the slice header (e.g., where the VPS ID precedes the SPS ID). In one example where the VPS ID and SPS ID are signaled in the slice header, no VPS ID will be located in the SPS and no SPS ID will be located in the PPS. Additionally, in one example, the VPS ID and SPS ID may be present in the slice header of each RAP picture, and each picture may be associated with a recovery point SEI message. Additionally, in other examples, the VPS ID and SPS ID may be present in slice headers of other pictures.
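One way to read the slice-header option described above is sketched below: when the IDs are signaled in the slice header, a VPS ID is read before the SPS ID, and the slice header continues to reference a PPS ID as before. The ValueReader stands in for an Exp-Golomb bitstream reader so the sketch stays self-contained; this is not HEVC WD7 slice header syntax.

#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for an entropy/bitstream reader: it simply returns pre-decoded
// values. A real parser would decode Exp-Golomb codes from the RBSP.
struct ValueReader {
    std::vector<uint32_t> values;
    std::size_t pos = 0;
    uint32_t readUe() { return values[pos++]; }
};

struct SliceHeaderIds {
    std::optional<uint32_t> vpsId;  // present only when signaled in the slice header
    std::optional<uint32_t> spsId;
    uint32_t ppsId = 0;
};

SliceHeaderIds parseSliceHeaderIds(ValueReader& reader, bool idsPresentInSliceHeader) {
    SliceHeaderIds ids;
    if (idsPresentInSliceHeader) {
        ids.vpsId = reader.readUe();  // VPS ID signaled before the SPS ID
        ids.spsId = reader.readUe();
    }
    ids.ppsId = reader.readUe();      // the slice header still references a PPS ID
    return ids;
}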

图4为说明可实施本发明中所描述的技术的实例封装单元的框图。在图4中所说明的实例中,封装单元21包含VCL NAL单元建构器402、非VCL NAL单元建构器404、存取单元建构器406及位流输出接口408。封装单元21接收经编码视频数据及高阶语法,并输出经编码视频位流。经编码视频数据可包含与切片相关联的残余视频数据及语法数据。高阶语法数据可包含(例如)包含于参数集中的语法元素、SEI消息或由例如即将到来的HEVC标准等视频译码标准定义的其它语法元素。经编码视频位流可包含一或多个经译码视频序列,且可大体上符合例如即将到来的HEVC标准等视频译码标准。如上文所描述,VCL NAL单元包含视频数据的切片。VCL NAL单元建构器402可经配置以接收经编码视频数据的切片,且基于包含切片的图片的类型而产生VCL NAL单元。VCL NAL单元建构器402可经配置以根据上文关于表2到表8所描述的NAL分配的任何及所有组合而产生VCL NAL单元。VCL NAL单元建构器402可经配置以在VCLNAL单元中包含标头,其中所述标头识别VCLNAL单元的类型。FIG4 is a block diagram illustrating an example encapsulation unit that may implement the techniques described in this disclosure. In the example illustrated in FIG4 , encapsulation unit 21 includes a VCL NAL unit constructor 402, a non-VCL NAL unit constructor 404, an access unit constructor 406, and a bitstream output interface 408. Encapsulation unit 21 receives encoded video data and high-level syntax and outputs an encoded video bitstream. The encoded video data may include residual video data and syntax data associated with a slice. The high-level syntax data may include, for example, syntax elements included in a parameter set, SEI messages, or other syntax elements defined by a video coding standard such as the upcoming HEVC standard. The encoded video bitstream may include one or more coded video sequences and may generally conform to a video coding standard such as the upcoming HEVC standard. As described above, a VCL NAL unit includes a slice of video data. VCL NAL unit constructor 402 may be configured to receive a slice of encoded video data and generate a VCL NAL unit based on the type of picture comprising the slice. The VCL NAL unit constructor 402 may be configured to generate VCL NAL units according to any and all combinations of NAL allocations described above with respect to Tables 2 through 8. The VCL NAL unit constructor 402 may be configured to include a header in the VCL NAL unit, wherein the header identifies the type of the VCL NAL unit.

举例来说,VCL NAL单元建构器402可经配置以接收包含于一IDR图片中的视频数据的切片,且(1)如果所述IDR图片不具有相关联的引导图片,那么将视频数据的所述切片封装于具有指示所述IDR图片不具有引导图片的类型的NAL单元中,或(2)如果所述IDR图片具有相关联的引导图片,那么将视频数据的所述切片封装于具有指示所述IDR图片具有引导图片的类型的NAL单元中。VCL NAL单元建构器402可经配置以接收包含于一CRA图片中的视频数据的切片,且(1)如果所述CRA图片不具有相关联的引导图片,那么将视频数据的所述切片封装于具有指示所述CRA图片不具有引导图片的类型的NAL单元中,或(2)如果所述CRA图片具有相关联的引导图片,那么将视频数据的所述切片封装于具有指示所述CRA图片具有引导图片的类型的NAL单元中。另外,如果与CRA图片相关联的引导图片为TFD图片,那么VCL NAL单元建构器402可经配置以将视频数据的切片封装于具有指示与CRA图片相关联的引导图片为TFD的类型的NAL单元中。For example, the VCL NAL unit constructor 402 may be configured to receive a slice of video data contained in an IDR picture and (1) if the IDR picture does not have associated leading pictures, encapsulate the slice of video data in a NAL unit having a type indicating that the IDR picture does not have leading pictures, or (2) if the IDR picture has associated leading pictures, encapsulate the slice of video data in a NAL unit having a type indicating that the IDR picture has leading pictures. The VCL NAL unit constructor 402 may be configured to receive a slice of video data contained in a CRA picture and (1) if the CRA picture does not have associated leading pictures, encapsulate the slice of video data in a NAL unit having a type indicating that the CRA picture does not have leading pictures, or (2) if the CRA picture has associated leading pictures, encapsulate the slice of video data in a NAL unit having a type indicating that the CRA picture has leading pictures. Additionally, if the leading pictures associated with the CRA picture are TFD pictures, the VCL NAL unit constructor 402 may be configured to encapsulate the slice of video data in a NAL unit having a type indicating that the leading pictures associated with the CRA picture are TFD.

另外,如果与CRA图片相关联的引导图片并非TFD图片,那么VCL NAL单元建构器402可经配置以将视频数据的所述切片封装于具有指示与CRA图片相关联的引导图片并非TFD的类型的NAL单元中。另外,VCL NAL单元建构器402可经配置以接收包含于一BLA图片中的视频数据的切片,且(1)如果所述BLA图片不具有相关联的引导图片,那么将视频数据的所述切片封装于具有指示所述BLA图片不具有引导图片的类型的NAL单元中,或(2)如果所述BLA图片具有相关联的引导图片,那么将视频数据的所述切片封装于具有指示所述BLA图片具有引导图片的类型的NAL单元中。另外,如果与BLA图片相关联的引导图片为TFD图片,那么VCL NAL单元建构器402可经配置以将视频数据的切片封装于具有指示与BLA图片相关联的引导图片为TFD的类型的NAL单元中。另外,如果与BLA图片相关联的引导图片并非TFD图片,那么VCL NAL单元建构器402可经配置以将视频数据的切片封装于具有指示与BLA图片相关联的引导图片并非TFD的类型的NAL单元中。Additionally, if the leading picture associated with the CRA picture is not a TFD picture, the VCL NAL unit constructor 402 may be configured to encapsulate the slice of video data in a NAL unit having a type indicating that the leading picture associated with the CRA picture is not a TFD picture. Additionally, the VCL NAL unit constructor 402 may be configured to receive a slice of video data contained in a BLA picture and (1) if the BLA picture does not have an associated leading picture, encapsulate the slice of video data in a NAL unit having a type indicating that the BLA picture does not have a leading picture, or (2) if the BLA picture has an associated leading picture, encapsulate the slice of video data in a NAL unit having a type indicating that the BLA picture has a leading picture. Additionally, if the leading picture associated with the BLA picture is a TFD picture, the VCL NAL unit constructor 402 may be configured to encapsulate the slice of video data in a NAL unit having a type indicating that the leading picture associated with the BLA picture is a TFD picture. Additionally, if the leading picture associated with the BLA picture is not a TFD picture, the VCL NAL unit constructor 402 may be configured to encapsulate the slice of video data in a NAL unit having a type indicating that the leading picture associated with the BLA picture is not a TFD picture.
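
The following C++ sketch summarizes the selection logic described above for VCL NAL unit constructor 402. It is illustrative only: the enumerator names and the picture-kind classification are placeholders standing in for the draft NAL unit type assignments of Tables 2 through 8, which are not reproduced in this section.

    // Illustrative only: these enumerators are placeholders for the draft NAL unit
    // type assignments of Tables 2-8, which are not reproduced here.
    enum class RapNalType {
      IDR_N_LP,         // IDR without leading pictures
      IDR_W_LP,         // IDR that may have leading pictures
      CRA_N_LP,         // CRA without leading pictures
      CRA_W_TFD,        // CRA whose leading pictures include TFD pictures
      CRA_W_LP_NO_TFD,  // CRA with leading pictures, none of which are TFD
      BLA_N_LP,         // BLA without leading pictures
      BLA_W_TFD,        // BLA whose leading pictures include TFD pictures
      BLA_W_LP_NO_TFD   // BLA with leading pictures, none of which are TFD
    };

    enum class RapKind { IDR, CRA, BLA };

    // Mirrors the decisions described above for VCL NAL unit constructor 402.
    RapNalType selectRapNalType(RapKind kind, bool hasLeadingPictures, bool leadingAreTfd) {
      switch (kind) {
        case RapKind::IDR:
          return hasLeadingPictures ? RapNalType::IDR_W_LP : RapNalType::IDR_N_LP;
        case RapKind::CRA:
          if (!hasLeadingPictures) return RapNalType::CRA_N_LP;
          return leadingAreTfd ? RapNalType::CRA_W_TFD : RapNalType::CRA_W_LP_NO_TFD;
        case RapKind::BLA:
          if (!hasLeadingPictures) return RapNalType::BLA_N_LP;
          return leadingAreTfd ? RapNalType::BLA_W_TFD : RapNalType::BLA_W_LP_NO_TFD;
      }
      return RapNalType::IDR_N_LP;  // not reached
    }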

图5为说明根据本发明的技术产生VCL NAL单元的实例的流程图。尽管将图5中所说明的产生VCL NAL单元的实例描述为由VCL NAL单元建构器402执行,但源装置12、视频编码器20、封装单元21及其组件的组合的任何组合可执行图5中所说明的产生VCL NAL单元的实例。如图5中所说明,VCL NAL单元建构器402接收视频数据的切片(502)。可根据本文中所描述的编码技术中的任一者将视频数据的切片编码为经编码视频数据。视频数据的切片可包含于本文中所描述的图片类型中的一者中。VCL NAL单元建构器402确定视频数据的切片包含于IDR图片还是CRA图片中(504)。FIG5 is a flow chart illustrating an example of generating a VCL NAL unit according to the techniques of this disclosure. Although the example of generating a VCL NAL unit illustrated in FIG5 is described as being performed by VCL NAL unit constructor 402, any combination of source device 12, video encoder 20, encapsulation unit 21, and combinations of components thereof may perform the example of generating a VCL NAL unit illustrated in FIG5. As illustrated in FIG5, VCL NAL unit constructor 402 receives a slice of video data (502). The slice of video data may be encoded into encoded video data according to any of the encoding techniques described herein. The slice of video data may be contained in one of the picture types described herein. VCL NAL unit constructor 402 determines whether the slice of video data is contained in an IDR picture or a CRA picture (504).

如果视频数据的切片包含于IDR图片中(504的“IDR”分支),那么VCL NAL单元建构器402确定所述IDR图片是否具有相关联的引导图片(506)。如果IDR图片不具有相关联的引导图片(506的“否”分支),那么VCL NAL单元建构器402产生指示所述IDR图片不具有相关联的引导图片的VCL NAL单元(508)。如果IDR图片具有相关联的引导图片(506的“是”分支),那么VCL NAL单元建构器402产生指示所述IDR图片具有相关联的引导图片的VCL NAL单元(510)。If the slice of video data is included in an IDR picture ("IDR" branch of 504), VCL NAL unit constructor 402 determines whether the IDR picture has associated leading pictures (506). If the IDR picture does not have associated leading pictures ("No" branch of 506), VCL NAL unit constructor 402 generates a VCL NAL unit indicating that the IDR picture does not have associated leading pictures (508). If the IDR picture has associated leading pictures ("Yes" branch of 506), VCL NAL unit constructor 402 generates a VCL NAL unit indicating that the IDR picture has associated leading pictures (510).

如果视频数据的切片包含于CRA图片中,那么VCL NAL单元建构器402确定所述CRA图片是否具有相关联的引导图片(512)。如果CRA图片不具有相关联的引导图片(512的“否”分支),那么VCL NAL单元建构器402产生指示所述CRA图片不具有相关联的引导图片的VCLNAL单元(514)。如果CRA图片具有相关联的引导图片(512的“是”分支),那么VCL NAL单元建构器402确定相关联的引导图片是否为TFD图片(516)。If the slice of video data is contained in a CRA picture, VCL NAL unit constructor 402 determines whether the CRA picture has associated leading pictures (512). If the CRA picture does not have associated leading pictures (the "No" branch of 512), VCL NAL unit constructor 402 generates a VCL NAL unit indicating that the CRA picture does not have associated leading pictures (514). If the CRA picture has associated leading pictures (the "Yes" branch of 512), VCL NAL unit constructor 402 determines whether the associated leading pictures are TFD pictures (516).

如果CRA图片的相关联的引导图片为TFD图片(516的“是”分支)，那么VCL NAL单元建构器402产生指示CRA的相关联的引导图片为TFD图片的VCL NAL单元(518)。如果CRA图片的相关联的引导图片并非TFD图片(516的“否”分支)，那么VCL NAL单元建构器402产生指示相关联的引导图片并非TFD图片的VCL NAL单元(520)。If the associated leading picture of the CRA picture is a TFD picture ("yes" branch of 516), the VCL NAL unit constructor 402 generates a VCL NAL unit indicating that the associated leading picture of the CRA is a TFD picture (518). If the associated leading picture of the CRA picture is not a TFD picture ("no" branch of 516), the VCL NAL unit constructor 402 generates a VCL NAL unit indicating that the associated leading picture is not a TFD picture (520).

VCL NAL单元建构器402可通过将切片数据封装于NAL单元中及将NAL单元类型值包含于NAL单元标头中而产生NAL单元。每一NAL单元类型值可对应于一相应NAL单元类型。在一实例中,可根据表7定义NAL单元类型值。可由NAL单元建构器402将所产生的NAL单元输出到存取单元建构器406以用于包含于存取单元中(522)。VCL NAL unit constructor 402 may generate a NAL unit by encapsulating the slice data in a NAL unit and including a NAL unit type value in the NAL unit header. Each NAL unit type value may correspond to a respective NAL unit type. In one example, NAL unit type values may be defined according to Table 7. The generated NAL unit may be output by NAL unit constructor 402 to access unit constructor 406 for inclusion in an access unit (522).
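
As a rough illustration of encapsulating slice data in a NAL unit with a type value in the header (steps 502 to 522), the sketch below packs a payload behind a two-byte NAL unit header. The header layout shown (forbidden bit, six-bit nal_unit_type, layer identifier, temporal_id_plus1) follows the later published HEVC specification rather than the working draft tables referenced above, and start codes and emulation prevention are omitted, so treat the bit positions as an assumption.

    #include <cstdint>
    #include <vector>

    // Packs a slice payload behind a two-byte NAL unit header. Layer id 0 is
    // assumed, and start codes / emulation prevention bytes are omitted.
    std::vector<uint8_t> packNalUnit(uint8_t nalUnitType, uint8_t temporalIdPlus1,
                                     const std::vector<uint8_t>& slicePayload) {
      std::vector<uint8_t> nal;
      nal.push_back(static_cast<uint8_t>((nalUnitType & 0x3F) << 1));  // F = 0, 6-bit type, layer id MSB = 0
      nal.push_back(static_cast<uint8_t>(temporalIdPlus1 & 0x07));     // layer id LSBs = 0, 3-bit temporal_id_plus1
      nal.insert(nal.end(), slicePayload.begin(), slicePayload.end()); // raw byte sequence payload
      return nal;
    }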

以此方式,封装单元21表示用于产生包含视频数据的位流的装置的实例,所述装置包含处理器,所述处理器经配置以:确定随机存取点(RAP)图片是否为可具有相关联的引导图片的类型,及所述RAP图片包括瞬时解码器刷新(IDR)图片还是清洁随机存取(CRA)图片;将所述RAP图片的切片封装于网络抽象层(NAL)单元中,其中所述NAL单元包含指示所述RAP图片是否为可具有相关联的引导图片的类型的NAL单元类型值;及产生包含所述NAL单元的位流。In this way, encapsulation unit 21 represents an example of a device for generating a bitstream including video data, the device including a processor configured to: determine whether a random access point (RAP) picture is of a type that can have associated leading pictures, and whether the RAP picture comprises an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; encapsulate slices of the RAP picture in a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value indicating whether the RAP picture is of a type that can have associated leading pictures; and generate a bitstream including the NAL unit.

同样地,图5的方法表示产生包含视频数据的位流的方法的实例,所述方法包含:确定随机存取点(RAP)图片是否为可具有相关联的引导图片的类型,及所述RAP图片包括瞬时解码器刷新(IDR)图片还是清洁随机存取(CRA)图片;将所述RAP图片的切片封装于网络抽象层(NAL)单元中,其中所述NAL单元包含指示所述RAP图片是否为可具有相关联的引导图片的类型的NAL单元类型值;及产生包含所述NAL单元的位流。Likewise, the method of FIG5 represents an example of a method of generating a bitstream comprising video data, the method comprising: determining whether a random access point (RAP) picture is of a type that can have associated leading pictures, and whether the RAP picture comprises an instantaneous decoder refresh (IDR) picture or a clean random access (CRA) picture; encapsulating a slice of the RAP picture in a network abstraction layer (NAL) unit, wherein the NAL unit includes a NAL unit type value indicating whether the RAP picture is of a type that can have associated leading pictures; and generating a bitstream comprising the NAL unit.

再次参看图4，非VCL NAL单元建构器404可经配置以接收高阶语法元素，例如包含于参数集及SEI消息中的语法元素(如上文所描述)，且基于上文关于表2到表8所描述的NAL单元分配的任何及所有组合而产生非VCL NAL单元。非VCL NAL单元建构器404可经配置以通过将语法数据封装于NAL单元中及将NAL单元类型值包含于NAL单元标头中而产生非VCL NAL单元。举例来说，非VCL NAL单元建构器可经配置以接收包含于参数集中的语法元素，且将指示参数集类型的NAL单元类型值包含于NAL单元标头中。Referring again to FIG. 4, the non-VCL NAL unit constructor 404 may be configured to receive high-level syntax elements, such as syntax elements included in parameter sets and SEI messages (as described above), and generate non-VCL NAL units based on any and all combinations of NAL unit allocations described above with respect to Tables 2 through 8. The non-VCL NAL unit constructor 404 may be configured to generate non-VCL NAL units by encapsulating syntax data in NAL units and including a NAL unit type value in a NAL unit header. For example, the non-VCL NAL unit constructor may be configured to receive syntax elements included in a parameter set and include a NAL unit type value indicating the parameter set type in the NAL unit header.

另外,非VCL NAL单元建构器404可经配置以接收AU层级SEI消息,且产生SEI消息NAL单元。在一实例中,非VCL NAL单元建构器404可经配置以产生两种类型的SEI消息NAL单元,其中第一类型的SEI NAL单元指示此SEI NAL单元可在解码次序上接在存取单元中的最后VCL NAL单元之后,且第二类型的SEI NAL单元指示此SEI NAL单元不可在解码次序上接在存取单元中的最后VCL NAL单元之后。另外,第一类型的SEI NAL可受约束,使得不可允许其在解码次序上在相同存取单元中的第一VCL NAL单元之前。第一类型的NAL单元可被称作后缀SEI NAL单元,且第二类型的NAL单元可被称作前缀SEI NAL单元。非VCL NAL单元建构器404将非VCL NAL单元输出到存取单元建构器406。In addition, the non-VCL NAL unit constructor 404 may be configured to receive AU-level SEI messages and generate SEI message NAL units. In one example, the non-VCL NAL unit constructor 404 may be configured to generate two types of SEI message NAL units, wherein a first type of SEI NAL unit indicates that this SEI NAL unit may follow the last VCL NAL unit in the access unit in decoding order, and a second type of SEI NAL unit indicates that this SEI NAL unit may not follow the last VCL NAL unit in the access unit in decoding order. In addition, the first type of SEI NAL unit may be constrained such that it is not allowed to precede the first VCL NAL unit in the same access unit in decoding order. The first type of NAL unit may be referred to as a suffix SEI NAL unit, and the second type of NAL unit may be referred to as a prefix SEI NAL unit. The non-VCL NAL unit constructor 404 outputs the non-VCL NAL unit to the access unit constructor 406.
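
A minimal sketch of assigning the NAL unit type value for the two kinds of SEI NAL units described above follows. The numeric codes 39 and 40 are assumptions taken from the finalized HEVC specification (PREFIX_SEI_NUT and SUFFIX_SEI_NUT); the draft tables referenced in this description may assign different values.

    #include <cstdint>

    // Assumed values from the finalized HEVC specification; the draft tables
    // referenced above may use different codes.
    constexpr uint8_t kPrefixSeiNalType = 39;  // PREFIX_SEI_NUT
    constexpr uint8_t kSuffixSeiNalType = 40;  // SUFFIX_SEI_NUT

    // Picks the NAL unit type for an SEI NAL unit based on whether the carried
    // SEI message is a suffix SEI message.
    uint8_t seiNalUnitType(bool carriesSuffixSeiMessage) {
      return carriesSuffixSeiMessage ? kSuffixSeiNalType : kPrefixSeiNalType;
    }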

存取单元建构器406可经配置以接收VCL NAL单元及非VCL NAL单元，且产生存取单元。存取单元建构器406可接收表2到表8中所定义的任何类型的NAL单元。存取单元建构器406可经配置以基于本文中所描述的NAL单元类型的任何及所有组合而产生存取单元。如上文所描述，根据HEVC WD7，存取单元为在解码次序上连续且含有一经译码图片的NAL单元的集合。因此，存取单元建构器406可经配置以接收多个NAL单元，且根据解码次序布置所述多个NAL单元。另外，存取单元建构器406可经配置以布置后缀SEI NAL单元，如上文所描述，使得其接在存取单元中的最后VCL NAL单元之后及/或不在相同存取单元中的第一VCL NAL单元之前。The access unit constructor 406 can be configured to receive VCL NAL units and non-VCL NAL units and generate an access unit. The access unit constructor 406 can receive any type of NAL unit defined in Tables 2 to 8. The access unit constructor 406 can be configured to generate an access unit based on any and all combinations of NAL unit types described herein. As described above, according to HEVC WD7, an access unit is a set of NAL units that are consecutive in decoding order and contain one coded picture. Therefore, the access unit constructor 406 can be configured to receive multiple NAL units and arrange the multiple NAL units according to the decoding order. In addition, the access unit constructor 406 can be configured to arrange the suffix SEI NAL unit, as described above, so that it follows the last VCL NAL unit in the access unit and/or does not precede the first VCL NAL unit in the same access unit.
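
The ordering constraint described above can be checked with a single pass over the access unit, as in the following sketch. The NalDesc structure is a hypothetical stand-in for whatever per-NAL-unit bookkeeping access unit constructor 406 keeps.

    #include <vector>

    // Hypothetical per-NAL-unit bookkeeping for an access unit under construction.
    struct NalDesc {
      bool isVcl;
      bool isSuffixSei;
    };

    // Returns true if no suffix SEI NAL unit precedes the first VCL NAL unit of
    // the access unit, which is the constraint described above. (Following the
    // last VCL NAL unit is permitted and therefore not checked.)
    bool suffixSeiOrderingOk(const std::vector<NalDesc>& accessUnit) {
      bool seenVcl = false;
      for (const NalDesc& nal : accessUnit) {
        if (nal.isVcl) seenVcl = true;
        if (nal.isSuffixSei && !seenVcl) return false;
      }
      return true;
    }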

图6为说明根据本发明的技术产生非VCL NAL单元的实例的流程图。尽管将图6中所说明的产生非VCL NAL单元的实例描述为由非VCL NAL单元建构器404及存取单元建构器406执行，但源装置12、视频编码器20、封装单元21及其组件的组合的任何组合可执行图6中所说明的产生非VCL NAL单元的实例。FIG6 is a flowchart illustrating an example of generating non-VCL NAL units according to the techniques of this disclosure. Although the example of generating non-VCL NAL units illustrated in FIG6 is described as being performed by non-VCL NAL unit constructor 404 and access unit constructor 406, any combination of source device 12, video encoder 20, encapsulation unit 21, and combinations of components thereof may perform the example of generating non-VCL NAL units illustrated in FIG6.

如图6中所展示,非VCL NAL单元建构器404接收SEI消息(602)。SEI消息可为上文关于表1所描述的任何类型的SEI消息。非VCL NAL单元建构器404确定所述SEI消息为前缀SEI消息还是后缀SEI消息(604)。6, non-VCL NAL unit constructor 404 receives a SEI message (602). The SEI message may be any type of SEI message described above with respect to Table 1. Non-VCL NAL unit constructor 404 determines whether the SEI message is a prefix SEI message or a suffix SEI message (604).

如果所述SEI消息为后缀SEI消息(604的“后缀”分支),那么非VCL NAL单元建构器404产生SEI NAL单元的指示SEI NAL单元为后缀SEI消息的类型值(606)。如果所述SEI消息为前缀SEI消息(604的“前缀”分支),那么非VCL NAL单元建构器404产生SEI NAL单元的指示SEI NAL单元为常规SEI消息的类型值(608)。If the SEI message is a suffix SEI message (the "Suffix" branch of 604), the non-VCL NAL unit constructor 404 generates a type value for the SEI NAL unit indicating that the SEI NAL unit is a suffix SEI message (606). If the SEI message is a prefix SEI message (the "Prefix" branch of 604), the non-VCL NAL unit constructor 404 generates a type value for the SEI NAL unit indicating that the SEI NAL unit is a regular SEI message (608).

存取单元建构器406接收所产生的NAL单元,所述所产生的NAL单元可包含上文关于表2到表8所描述的NAL单元的类型的任何组合(610)。存取单元建构器406产生包含所接收的NAL单元的存取单元(612)。如果所产生的存取单元包含后缀SEI NAL单元,那么存取单元的NAL单元可经布置,使得所述后缀SEI NAL并不在相同存取单元中的第一VCL NAL单元之前,但可在解码次序上接在存取单元中的最后VCL NAL单元之后。The access unit constructor 406 receives the generated NAL units, which may include any combination of the types of NAL units described above with respect to Tables 2 through 8 (610). The access unit constructor 406 generates an access unit that includes the received NAL units (612). If the generated access unit includes a suffix SEI NAL unit, the NAL units of the access unit may be arranged such that the suffix SEI NAL unit does not precede the first VCL NAL unit in the same access unit, but may follow the last VCL NAL unit in the access unit in decoding order.

以此方式,封装单元21表示处理器的实例,所述处理器经配置以:确定补充增强信息(SEI)消息为前缀SEI消息还是后缀SEI消息,其中所述SEI消息包含与经编码视频数据有关的数据;将所述SEI消息封装于SEI NAL单元中,其中所述SEI NAL单元包含NAL单元类型值,所述NAL单元类型值指示所述SEI NAL单元为前缀SEI NAL单元还是后缀SEI NAL单元,及所述SEI消息为前缀SEI消息还是后缀SEI消息;及产生至少包含所述SEI NAL单元的位流。In this manner, encapsulation unit 21 represents an example of a processor configured to: determine whether a supplemental enhancement information (SEI) message is a prefix SEI message or a suffix SEI message, wherein the SEI message includes data related to the encoded video data; encapsulate the SEI message in a SEI NAL unit, wherein the SEI NAL unit includes a NAL unit type value that indicates whether the SEI NAL unit is a prefix SEI NAL unit or a suffix SEI NAL unit, and whether the SEI message is a prefix SEI message or a suffix SEI message; and generate a bitstream including at least the SEI NAL unit.

同样地,图6的方法表示产生包含视频数据的位流的方法的实例,所述方法包含:确定补充增强信息(SEI)消息为前缀SEI消息还是后缀SEI消息,其中所述SEI消息包含与经编码视频数据有关的数据;将所述SEI消息封装于SEI NAL单元中,其中所述SEI NAL单元包含NAL单元类型值,所述NAL单元类型值指示所述SEI NAL单元为前缀SEI NAL单元还是后缀SEI NAL单元,及所述SEI消息为前缀SEI消息还是后缀SEI消息;及产生至少包含所述SEINAL单元的位流。Likewise, the method of FIG. 6 represents an example of a method of generating a bitstream including video data, the method including: determining whether a supplemental enhancement information (SEI) message is a prefix SEI message or a suffix SEI message, wherein the SEI message includes data related to the encoded video data; encapsulating the SEI message in an SEI NAL unit, wherein the SEI NAL unit includes a NAL unit type value, the NAL unit type value indicating whether the SEI NAL unit is a prefix SEI NAL unit or a suffix SEI NAL unit, and whether the SEI message is a prefix SEI message or a suffix SEI message; and generating a bitstream including at least the SEI NAL unit.

再次参看图4，位流输出接口408可经配置以接收存取单元，且产生经译码视频序列。位流输出接口408可经进一步配置以将经译码视频序列作为经编码视频位流的部分输出，其中经编码视频位流包含基于本文中所描述的NAL单元类型的任何及所有组合的一或多个经译码视频序列。如上文所描述，根据HEVC WD7，经译码视频序列为在解码次序上连续的存取单元的集合。因此，位流输出接口408可经配置以接收多个存取单元，且根据解码次序布置所述多个存取单元。Referring again to FIG. 4, the bitstream output interface 408 can be configured to receive access units and generate a coded video sequence. The bitstream output interface 408 can be further configured to output the coded video sequence as part of an encoded video bitstream, wherein the encoded video bitstream includes one or more coded video sequences based on any and all combinations of NAL unit types described herein. As described above, according to HEVC WD7, a coded video sequence is a set of access units that are consecutive in decoding order. Therefore, the bitstream output interface 408 can be configured to receive multiple access units and arrange them according to decoding order.

如上文所描述,译码结构单元19及/或视频编码器20可经配置以产生包含于参数集中的语法元素,包含可包含于VUI参数的集合中的fixed_pic_rate_flag语法元素,VUI参数的集合可包含于SPS中,如HEVC WD7中所提供。另外,译码结构单元19及/或视频编码器20可经配置以产生fixed_pic_rate_flag语法元素,其中所述fixed_pic_rate_flag语法元素包含从HEVC WD7中所提供的那些语义修改的语义。举例来说,根据HEVC WD7中的fixed_pic_rate_flag的当前语义,当fixed_pic_rate_flag等于1时,需要在输出次序上连续的两个图片的呈现时间之间的差等于时钟刻度。然而,当丢弃一些最高时间层以达成基于时间可缩放性的流调适时,此情况将需要改变time_scale或num_units_in_tick的值。As described above, coding structure unit 19 and/or video encoder 20 may be configured to generate syntax elements for inclusion in a parameter set, including a fixed_pic_rate_flag syntax element that may be included in a set of VUI parameters, which may be included in an SPS, as provided in HEVC WD7. Additionally, coding structure unit 19 and/or video encoder 20 may be configured to generate a fixed_pic_rate_flag syntax element, wherein the fixed_pic_rate_flag syntax element includes semantics modified from those provided in HEVC WD7. For example, according to the current semantics of fixed_pic_rate_flag in HEVC WD7, when fixed_pic_rate_flag is equal to 1, the difference between the presentation times of two pictures consecutive in output order is required to be equal to the clock tick. However, when some of the highest temporal layers are dropped to achieve stream adaptation based on temporal scalability, this may require changes to the values of time_scale or num_units_in_tick.

在一实例中,不需要增量(即,在输出次序上连续的两个图片的呈现时间之间的差)确切等于时钟刻度,而是可能需要增量为时钟刻度的整数数目倍。以此方式,译码结构单元19及/或视频编码器20可经配置以产生fixed_pic_rate_flag语法元素,使得当fixed_pic_rate_flag等于1时,需要在输出次序上连续的两个图片的呈现时间之间的差等于时钟刻度的整数倍。In one example, the increment (i.e., the difference between the presentation times of two pictures consecutive in output order) need not be exactly equal to the clock tick, but rather the increment may need to be an integer number of multiples of the clock tick. In this way, coding structure unit 19 and/or video encoder 20 may be configured to generate a fixed_pic_rate_flag syntax element such that when fixed_pic_rate_flag is equal to 1, the difference between the presentation times of two pictures consecutive in output order is required to be equal to an integer multiple of the clock tick.

在另一实例中,可能需要译码结构单元19及/或视频编码器20针对每一时间层用信号发出fixed_pic_rate_flag。另外,在此实例中,如果特定时间层的fixed_pic_rate_flag等于1,即,时间层表示具有恒定图片速率,那么可用信号发出值N,且所述时间层表示的增量(在输出次序上连续的两个图片的呈现时间之间)可等于时钟刻度的N倍。In another example, coding structure unit 19 and/or video encoder 20 may need to signal a fixed_pic_rate_flag for each temporal layer. Additionally, in this example, if fixed_pic_rate_flag for a particular temporal layer is equal to 1, i.e., the temporal layer representation has a constant picture rate, a value N may be signaled, and the increment of the temporal layer representation (between presentation times of two pictures consecutive in output order) may be equal to N times the clock tick.

在另一实例中，译码结构单元19及/或视频编码器20可经配置以任选地针对每一时间层用信号发出fixed_pic_rate_flag。在此实例中，如果特定层的fixed_pic_rate_flag存在且等于1，即，时间层表示具有恒定图片速率，那么可用信号发出值N，且所述时间层表示的增量(在输出次序上连续的两个图片的呈现时间之间)等于时钟刻度的N倍。在任选地针对每一时间层用信号发出fixed_pic_rate_flag的状况下，假定针对最高时间层用信号发出fixed_pic_rate_flag且值等于1，接着，针对不具有用信号发出的fixed_pic_rate_flag的每一特定时间层，可导出fixed_pic_rate_flag的值等于针对最高时间层用信号发出的fixed_pic_rate_flag，且导出N的值等于2^(max_Tid - currTid)，其中max_Tid等于最高temporal_id值，且currTid等于特定时间层的temporal_id。In another example, coding structure unit 19 and/or video encoder 20 may be configured to optionally signal a fixed_pic_rate_flag for each temporal layer. In this example, if the fixed_pic_rate_flag for a particular layer is present and equal to 1, that is, the temporal layer representation has a constant picture rate, then a value N may be signaled, and the increment of the temporal layer representation (between presentation times of two pictures consecutive in output order) is equal to N times the clock tick. In the case where a fixed_pic_rate_flag is optionally signaled for each temporal layer, assuming that fixed_pic_rate_flag is signaled for the highest temporal layer and has a value equal to 1, then, for each specific temporal layer that does not have a signaled fixed_pic_rate_flag, the value of fixed_pic_rate_flag can be derived to be equal to the fixed_pic_rate_flag signaled for the highest temporal layer, and the value of N is derived to be equal to 2^(max_Tid - currTid), where max_Tid is equal to the highest temporal_id value and currTid is equal to the temporal_id of the specific temporal layer.
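
A small sketch of the derivation described in this example is given below, assuming fixed_pic_rate_flag is signalled for the highest temporal layer with a value of 1 and that currTid does not exceed max_Tid; the structure and function names are illustrative.

    #include <cstdint>

    struct DerivedFixedRate {
      bool fixedPicRateFlag;  // inherited from the highest temporal layer
      uint32_t n;             // presentation-time delta = n * clock tick
    };

    // Derivation for a temporal layer whose fixed_pic_rate_flag is not signalled,
    // assuming the highest layer's flag is present and equal to 1 and
    // currTid <= maxTid.
    DerivedFixedRate deriveForLayer(uint32_t maxTid, uint32_t currTid) {
      DerivedFixedRate d;
      d.fixedPicRateFlag = true;       // equal to the highest layer's flag
      d.n = 1u << (maxTid - currTid);  // N = 2^(max_Tid - currTid)
      return d;
    }

For example, with max_Tid equal to 2 and currTid equal to 0, N is derived as 4, so the presentation-time delta for that layer representation is four clock ticks.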

图7为说明用信号发出呈现时间增量值的实例的流程图。尽管将图7中所说明的用信号发出呈现时间增量值的实例描述为由封装单元21执行,但源装置12、视频编码器20、封装单元21及其组件的组合的任何组合可执行图7中所说明的用信号发出呈现时间增量值的实例。FIG7 is a flowchart illustrating an example of signaling presentation time delta values. Although the example of signaling presentation time delta values illustrated in FIG7 is described as being performed by encapsulation unit 21, any combination of source device 12, video encoder 20, encapsulation unit 21, and combinations of components thereof may perform the example of signaling presentation time delta values illustrated in FIG7.

如图7的实例中所说明，封装单元21产生指示第一图片的呈现时间(例如，POC值)与第二图片的呈现时间之间的增量是否为时钟刻度值的整数倍的旗标(702)。换句话说，封装单元21可产生指示第一图片与第二图片的呈现时间之间的差(例如，增量)是否为时钟刻度值的整数倍的数据。图7中所描述的旗标表示此所产生的数据的实例。在一些状况下，封装单元21可从译码结构单元19或视频编码器20接收旗标的值。旗标可为上文所描述的fixed_pic_rate_flag语法元素中的任一者。As illustrated in the example of FIG. 7, encapsulation unit 21 generates a flag indicating whether the delta between the presentation time (e.g., POC value) of a first picture and the presentation time of a second picture is an integer multiple of a clock tick value (702). In other words, encapsulation unit 21 may generate data indicating whether the difference (e.g., delta) between the presentation times of the first picture and the second picture is an integer multiple of a clock tick value. The flags described in FIG. 7 represent examples of such generated data. In some cases, encapsulation unit 21 may receive the value of the flag from coding structure unit 19 or video encoder 20. The flag may be any of the fixed_pic_rate_flag syntax elements described above.

在一实例中，封装单元21确定旗标的值是否可指示增量为时钟刻度值的整数倍(704)。当旗标指示增量为时钟刻度的整数倍时(704的“是”分支)，封装单元21可产生表示时钟刻度值的整数倍的整数值N(706)。可由例如目的地装置14等解码装置使用整数值N，以确定所述增量值，其中所述增量为时钟刻度值的整数倍。在一实例中，整数值N可为0到2047的值，且可指示比增量所等于的时钟刻度整数倍数小1的值。封装单元21可接着将旗标及整数值N作为位流的部分输出(708)。In one example, encapsulation unit 21 determines whether the value of the flag may indicate that the increment is an integer multiple of the clock tick value (704). When the flag indicates that the increment is an integer multiple of the clock tick (the "yes" branch of 704), encapsulation unit 21 may generate an integer value N representing an integer multiple of the clock tick value (706). The integer value N may be used by a decoding device, such as destination device 14, to determine the increment value, where the increment is an integer multiple of the clock tick value. In one example, the integer value N may be a value from 0 to 2047 and may indicate a value that is one less than the integer multiple of the clock tick to which the increment is equal. Encapsulation unit 21 may then output the flag and the integer value N as part of the bitstream (708).

另一方面,当封装单元21确定旗标指示增量值并非时钟刻度的整数倍时(704的“否”分支),封装单元21可仅输出所述旗标(710)。On the other hand, when encapsulation unit 21 determines that the flag indicates that the increment value is not an integer multiple of the clock tick (“No” branch of 704 ), encapsulation unit 21 may simply output the flag ( 710 ).
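
The decision in FIG. 7 can be sketched as follows. The BitWriter type is a hypothetical stand-in for the encoder's syntax writer, and the 11-bit fixed-length coding of N is only an assumption consistent with the 0-to-2047 range mentioned above; the draft may code this element differently.

    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-in for the encoder's syntax writer.
    struct BitWriter {
      void writeFlag(bool /*value*/) {}
      void writeUnsigned(uint32_t /*value*/, int /*numBits*/) {}
    };

    // Writes the flag and, if the delta is an integer multiple of the clock tick,
    // the value N = multiple - 1 (0..2047). The 11-bit width is an assumption.
    void signalPresentationDelta(BitWriter& bw, bool deltaIsIntegerMultiple, uint32_t multiple) {
      bw.writeFlag(deltaIsIntegerMultiple);
      if (deltaIsIntegerMultiple) {
        assert(multiple >= 1 && multiple <= 2048);
        bw.writeUnsigned(multiple - 1, 11);
      }
    }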

以此方式,源装置12表示处理器的实例,所述处理器经配置以:产生指示第一图片的呈现时间与第二图片的呈现时间之间的差是否为时钟刻度值的整数倍的数据,及在所述数据指示所述差为时钟刻度值的整数倍时,产生表示所述整数倍数的数据。In this way, source device 12 represents an example of a processor configured to generate data indicating whether the difference between the presentation time of the first picture and the presentation time of the second picture is an integer multiple of a clock tick value, and when the data indicates that the difference is an integer multiple of the clock tick value, generate data representing the integer multiple.

同样地,图7的方法表示用于产生包含视频数据的位流的方法的实例,所述方法包含:产生指示第一图片的呈现时间与第二图片的呈现时间之间的差是否为时钟刻度值的整数倍的数据,及在所述数据指示所述差为时钟刻度值的整数倍时,产生表示所述整数倍数的数据。Similarly, the method of Figure 7 represents an example of a method for generating a bitstream including video data, the method including: generating data indicating whether the difference between the presentation time of a first picture and the presentation time of a second picture is an integer multiple of a clock tick value, and when the data indicates that the difference is an integer multiple of the clock tick value, generating data representing the integer multiple.

如上文所描述,封装单元21接收经编码视频数据。图8为说明可产生经编码视频数据的视频编码器20的实例的框图。如图8中所展示,视频编码器20接收视频数据及高阶语法数据。视频编码器20通常对个别视频切片内的视频块操作,以便编码视频数据。视频块可对应于CU内的译码节点。视频块可具有固定或变化的大小,且大小可根据指定译码标准而不同。视频编码器20可进一步产生(例如)帧标头中、块标头中、切片标头中或GOP标头中的语法数据,例如,基于块的语法数据、基于帧的语法数据及基于GOP的语法数据。GOP语法数据可描述相应GOP中的帧的数目,且帧语法数据可指示用以编码对应帧的编码/预测模式。As described above, encapsulation unit 21 receives encoded video data. FIG8 is a block diagram illustrating an example of a video encoder 20 that can generate encoded video data. As shown in FIG8, video encoder 20 receives video data and high-level syntax data. Video encoder 20 typically operates on video blocks within individual video slices to encode the video data. Video blocks may correspond to coding nodes within a CU. Video blocks may have fixed or varying sizes, and the sizes may differ according to a specified coding standard. Video encoder 20 may further generate syntax data, such as block-based syntax data, frame-based syntax data, and GOP-based syntax data, in a frame header, a block header, a slice header, or a GOP header. GOP syntax data may describe the number of frames in the respective GOP, and frame syntax data may indicate the encoding/prediction mode used to encode the corresponding frame.

在图8的实例中,视频编码器20包含模式选择单元40、参考图片存储器64、求和器50、变换处理单元52、量化单元54及熵编码单元56。模式选择单元40又包含运动补偿单元44、运动估计单元42、帧内预测单元46及分割单元48。为了视频块重建构,视频编码器20还包含反量化单元58、反变换单元60及求和器62。还可包含解块滤波器(图8中未展示)以对块边界进行滤波以从经重建构的视频移除成块假影。在需要时,解块滤波器通常将对求和器62的输出进行滤波。除解块滤波器之外,还可使用额外的滤波器(回路内或回路后)。为简洁起见,未展示此些滤波器,但在需要时,此些滤波器可对求和器50的输出进行滤波(作为回路内滤波器)。In the example of FIG8 , video encoder 20 includes a mode select unit 40, a reference picture memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. Mode select unit 40, in turn, includes a motion compensation unit 44, a motion estimation unit 42, an intra-prediction unit 46, and a partitioning unit 48. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG8 ) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. If necessary, the deblocking filter will typically filter the output of summer 62. In addition to the deblocking filter, additional filters (in-loop or post-loop) may also be used. For simplicity, such filters are not shown, but if necessary, such filters may filter the output of summer 50 (as in-loop filters).

在编码过程期间,视频编码器20接收待译码的视频帧或切片。可将帧或切片划分成多个视频块。运动估计单元42及运动补偿单元44相对于一或多个参考帧中的一或多个块对所接收的视频块执行帧间预测性译码,以提供时间预测。帧内预测单元46可替代地相对于与待译码的块相同的帧或切片中的一或多个相邻块对所接收的视频块执行帧内预测性译码,以提供空间预测。视频编码器20可执行多个译码遍次(例如)以选择用于视频数据的每一块的适当译码模式。During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding on the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive coding on the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, for example, to select an appropriate coding mode for each block of video data.

此外,分割单元48可基于先前译码遍次中的先前分割方案的评估而将视频数据的块分割成子块。举例来说,分割单元48可最初将帧或切片分割成LCU,且基于速率-失真分析(例如,速率-失真最佳化)而将所述LCU中的每一者分割成子CU。模式选择单元40可进一步产生指示将LCU分割成子CU的四分树数据结构。四分树的叶节点CU可包含一或多个PU及一或多个TU。Furthermore, partition unit 48 may partition blocks of video data into sub-blocks based on an evaluation of a previous partitioning scheme in a previous coding pass. For example, partition unit 48 may initially partition a frame or slice into LCUs and, based on rate-distortion analysis (e.g., rate-distortion optimization), partition each of the LCUs into sub-CUs. Mode select unit 40 may further generate a quadtree data structure indicating the partitioning of the LCU into sub-CUs. A leaf node CU of the quadtree may include one or more PUs and one or more TUs.

模式选择单元40可选择译码模式(帧内或帧间)中的一者(例如,基于错误结果),且将所得的经帧内译码或经帧间译码的块提供到求和器50以产生残余块数据,及提供到求和器62以重建构经编码块以供用作参考帧。模式选择单元40还将语法元素(例如,运动向量、帧内模式指示符、分割信息及其它此语法信息)提供到熵编码单元56。Mode select unit 40 may select one of the coding modes (intra or inter) (e.g., based on the error result) and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.

运动估计单元42及运动补偿单元44可高度集成,但出于概念目的而分别加以说明。由运动估计单元42执行的运动估计为产生估计视频块的运动的运动向量的过程。举例来说,运动向量可指示当前视频帧或图片内的视频块的PU相对于参考帧(或其它经译码单元)内的预测性块(其相对于当前帧(或其它经译码单元)内正经译码的当前块)的位移。预测性块为经发现在像素差方面紧密地匹配待译码的块的块,所述像素差可通过绝对差总和(SAD)、平方差总和(SSD)或其它差量度来确定。在一些实例中,视频编码器20可计算存储于参考图片存储器64中的参考图片的子整数像素位置的值。举例来说,视频编码器20可内插参考图片的四分之一像素位置、八分之一像素位置或其它分数像素位置的值。因此,运动估计单元42可执行相对于全像素位置及分数像素位置的运动搜索,且以分数像素精度输出运动向量。Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are described separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit) relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of squared difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of a reference picture stored in reference picture memory 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of a reference picture. Thus, motion estimation unit 42 may perform motion searches relative to full-pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
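
As one concrete example of the pixel-difference measure mentioned above, the following sketch computes the sum of absolute differences (SAD) between a current block and a candidate reference block at integer-pixel positions; sub-pixel interpolation and the surrounding search loop are omitted.

    #include <cstdint>
    #include <cstdlib>

    // Sum of absolute differences between a current block and a reference block,
    // both given as 8-bit sample planes with row strides.
    uint32_t blockSad(const uint8_t* cur, int curStride,
                      const uint8_t* ref, int refStride,
                      int width, int height) {
      uint32_t acc = 0;
      for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
          acc += static_cast<uint32_t>(std::abs(cur[y * curStride + x] - ref[y * refStride + x]));
      return acc;
    }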

运动估计单元42通过比较经帧间译码的切片中的视频块的PU的位置与参考图片的预测性块的位置而计算所述PU的运动向量。参考图片可选自第一参考图片列表(列表0)或第二参考图片列表(列表1)，所述列表中的每一者识别存储于参考图片存储器64中的一或多个参考图片。运动估计单元42将所计算的运动向量发送到熵编码单元56及运动补偿单元44。Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

由运动补偿单元44执行的运动补偿可涉及基于由运动估计单元42确定的运动向量来提取或产生预测性块。此外,在一些实例中,运动估计单元42及运动补偿单元44可在功能上集成。在接收到当前视频块的PU的运动向量后,运动补偿单元44便可将运动向量所指向的预测性块定位于参考图片列表中的一者中。求和器50通过从正经译码的当前视频块的像素值减去预测性块的像素值从而形成像素差值来形成残余视频块,如下文所论述。大体上,运动估计单元42相对于明度分量执行运动估计,且运动补偿单元44将基于明度分量而计算的运动向量用于色度分量与明度分量两者。模式选择单元40还可产生与视频块及视频切片相关联的语法元素以供视频解码器30用于解码视频切片的视频块。Motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vector determined by motion estimation unit 42. Furthermore, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from pixel values of the current video block being coded, thereby forming pixel difference values, as discussed below. Generally, motion estimation unit 42 performs motion estimation with respect to the luma component, and motion compensation unit 44 uses the motion vector calculated based on the luma component for both chroma and luma components. Mode select unit 40 may also generate syntax elements associated with video blocks and video slices for use by video decoder 30 in decoding video blocks of video slices.

作为由运动估计单元42及运动补偿单元44执行的帧间预测(如上文所描述)的替代,帧内预测单元46可帧内预测当前块。详细地说,帧内预测单元46可确定使用帧内预测模式来编码当前块。在一些实例中,帧内预测单元46可(例如)在单独编码遍次期间使用各种帧内预测模式来编码当前块,且帧内预测单元46(或在一些实例中,模式选择单元40)可从所测试的模式选择适当帧内预测模式来使用。Intra-prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or, in some examples, mode select unit 40) may select an appropriate intra-prediction mode to use from the tested modes.

举例来说,帧内预测单元46可使用针对各种所测试的帧内预测模式的速率-失真分析计算速率-失真值,且在所测试的模式当中选择具有最佳速率-失真特性的帧内预测模式。速率-失真分析大体上确定经编码块与原始的未经编码块之间的失真(或错误)的量以及用以产生经编码块的位速率(即,位的数目),所述原始的未经编码块经编码以产生所述经编码块。帧内预测单元46可从各种经编码块的失真及速率计算比率以确定哪一帧内预测模式展现块的最佳速率-失真值。For example, intra-prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various intra-prediction modes tested and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. The rate-distortion analysis generally determines the amount of distortion (or error) between the encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortion and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
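
The mode decision described above can be illustrated with a Lagrangian cost, J = D + lambda * R, which is one common way to combine distortion and bit rate; the text does not mandate this exact formulation, and the lambda value would be chosen by the encoder.

    #include <limits>
    #include <vector>

    struct ModeResult {
      int mode;           // tested intra prediction mode
      double distortion;  // e.g., SSD between original and reconstructed block
      double bits;        // bits needed to code the block with this mode
    };

    // Returns the mode with the smallest Lagrangian cost J = D + lambda * R,
    // or -1 if no mode was tested.
    int pickBestIntraMode(const std::vector<ModeResult>& tested, double lambda) {
      int best = -1;
      double bestCost = std::numeric_limits<double>::infinity();
      for (const ModeResult& r : tested) {
        const double cost = r.distortion + lambda * r.bits;
        if (cost < bestCost) { bestCost = cost; best = r.mode; }
      }
      return best;
    }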

在选择块的帧内预测模式之后,帧内预测单元46可将指示块的所选择的帧内预测模式的信息提供到熵编码单元56。熵编码单元56可编码指示所选择的帧内预测模式的信息。视频编码器20可在经发射的位流配置数据中包含各种块的编码上下文的定义及用于所述上下文中的每一者的最有可能的帧内预测模式、帧内预测模式索引表及修改的帧内预测模式索引表的指示,所述经发射的位流配置数据可包含多个帧内预测模式索引表及多个修改的帧内预测模式索引表(还被称作码字映射表)。After selecting the intra-prediction mode for the block, intra-prediction unit 46 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include definitions of the encoding contexts for the various blocks and indications of the most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table in the transmitted bitstream configuration data, which may include multiple intra-prediction mode index tables and multiple modified intra-prediction mode index tables (also referred to as codeword mapping tables).

视频编码器20通过从正经译码的原始视频块减去来自模式选择单元40的预测数据而形成残余视频块。求和器50表示执行此减法运算的(多个)组件。变换处理单元52将例如离散余弦变换(DCT)或概念上类似的变换等变换应用于残余块,从而产生包括残余变换系数值的视频块。变换处理单元52可执行概念上类似于DCT的其它变换。还可使用小波变换、整数变换、次频带变换或其它类型的变换。在任一状况下,变换处理单元52将变换应用于残余块,从而产生残余变换系数的块。所述变换可将残余信息从像素值域转换到变换域(例如,频域)。变换处理单元52可将所得的变换系数发送到量化单元54。量化单元54量化所述变换系数以进一步减少位速率。量化过程可减少与所述系数中的一些或全部相关联的位深度。可通过调整量化参数而修改量化程度。在一些实例中,量化单元54可接着执行包含经量化的变换系数的矩阵的扫描。或者,熵编码单元56可执行所述扫描。Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component(s) that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms conceptually similar to the DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms may also be used. In either case, transform processing unit 52 applies a transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from the pixel value domain to a transform domain (e.g., the frequency domain). Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
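
A highly simplified sketch of the residual and quantization steps described above follows. A real encoder applies the standard's 2-D integer transform between these two steps and uses its exact scaling and rounding; the scalar round-to-nearest quantizer below is only illustrative.

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Residual formation: sample-wise difference between the original block and
    // the prediction, as performed by summer 50.
    std::vector<int16_t> formResidual(const std::vector<int16_t>& original,
                                      const std::vector<int16_t>& prediction) {
      std::vector<int16_t> residual(original.size());
      for (std::size_t i = 0; i < original.size(); ++i)
        residual[i] = static_cast<int16_t>(original[i] - prediction[i]);
      return residual;
    }

    // Illustrative scalar quantizer applied to transform coefficients; the real
    // process uses the standard's integer scaling driven by the quantization
    // parameter rather than a floating-point step size.
    std::vector<int32_t> quantize(const std::vector<int32_t>& coefficients, double stepSize) {
      std::vector<int32_t> levels(coefficients.size());
      for (std::size_t i = 0; i < coefficients.size(); ++i)
        levels[i] = static_cast<int32_t>(std::lround(coefficients[i] / stepSize));
      return levels;
    }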

在量化之后，熵编码单元56熵译码所述经量化的变换系数。举例来说，熵编码单元56可执行上下文自适应可变长度译码(CAVLC)、上下文自适应二进制算术译码(CABAC)、基于语法的上下文自适应二进制算术译码(SBAC)、概率区间分割熵(PIPE)译码或另一熵译码技术。在基于上下文的熵译码的状况下，上下文可基于相邻块。在由熵编码单元56进行熵译码之后，可将经编码位流发射到另一装置(例如，视频解码器30)或将其存档以供稍后发射或检索。After quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the context may be based on neighboring blocks. After entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

反量化单元58及反变换单元60分别应用反量化及反变换以在像素域中重建构残余块(例如)以供稍后用作参考块。运动补偿单元44可通过将残余块加到参考图片存储器64的帧中的一者的预测性块来计算参考块。运动补偿单元44还可对经重建构的残余块应用一或多个内插滤波器以计算用于在运动估计中使用的子整数像素值。求和器62将经重建构的残余块加到由运动补偿单元44产生的经运动补偿的预测块以产生经重建构的视频块以用于存储于参考图片存储器64中。经重建构的视频块可由运动估计单元42及运动补偿单元44用作参考块以帧间译码后续视频帧中的块。Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference picture memory 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference picture memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

如上文所描述,解封装单元29可经配置以接收经译码视频序列,并剖析存取单元及NAL单元,其中NAL单元是基于表2到表7中所说明的NAL单元分配的任何及所有组合而分配。另外,解封装单元29及视频解码器30可基于NAL单元类型分配而重建构视频数据。在一实例中,解封装单元29可经配置以接收NAL单元,其中所述NAL单元包含NAL类型值,且基于所述NAL类型值,确定所述NAL单元是否封装包含于与引导图片相关联的RAP图片中的视频数据的经编码切片,且视频解码器30可经配置以基于所述NAL单元是否封装包含于与引导图片相关联的RAP图片中的视频数据的经编码切片,重建构视频数据。在另一实例中,解封装单元29可经配置以接收NAL单元,其中所述NAL单元包含NAL类型值,且基于所述NAL类型值,确定所述NAL单元是否封装AU层级SEI消息,且视频解码器30可经配置以基于所述NAL单元是否封装AU层级SEI消息而重建构视频数据。在一些状况下,重建构视频数据可包含产生经拼接的位流,如上文所描述,且视频解码器30可基于NAL单元类型确定而确定经拼接的视频流中的图片的呈现时间。As described above, decapsulation unit 29 may be configured to receive a coded video sequence and parse access units and NAL units, wherein the NAL units are allocated based on any and all combinations of the NAL unit allocations described in Tables 2 through 7. Additionally, decapsulation unit 29 and video decoder 30 may reconstruct video data based on the NAL unit type allocations. In one example, decapsulation unit 29 may be configured to receive a NAL unit, wherein the NAL unit includes a NAL type value, and based on the NAL type value, determine whether the NAL unit encapsulates an encoded slice of video data included in a RAP picture associated with a leading picture, and video decoder 30 may be configured to reconstruct the video data based on whether the NAL unit encapsulates an encoded slice of video data included in a RAP picture associated with a leading picture. In another example, decapsulation unit 29 may be configured to receive a NAL unit, wherein the NAL unit includes a NAL type value, and based on the NAL type value, determine whether the NAL unit encapsulates an AU-level SEI message, and video decoder 30 may be configured to reconstruct the video data based on whether the NAL unit encapsulates the AU-level SEI message. In some cases, reconstructing the video data may include generating a spliced bitstream, as described above, and video decoder 30 may determine the presentation time of pictures in the spliced video stream based on the NAL unit type determination.

另外如上文所描述,例如源装置12的源装置可经配置以用信号发出第一图片的呈现时间与第二图片的呈现时间之间的增量,其中所述发信号使用可为上文所描述的fixed_pic_rate_flag语法元素中的任一者的语法元素中的任一者。因此,目的地装置14、解封装单元29及视频解码器30可经配置以确定第一图片及第二图片的呈现时间,且相应地呈现所述图片。As also described above, a source device, such as source device 12, may be configured to signal a delta between the presentation time of a first picture and the presentation time of a second picture, wherein the signaling uses any of the syntax elements, which may be any of the fixed_pic_rate_flag syntax elements described above. Accordingly, destination device 14, decapsulation unit 29, and video decoder 30 may be configured to determine the presentation times of the first picture and the second picture and present the pictures accordingly.

图9为说明确定呈现时间增量值的实例方法的流程图。尽管将图9中所说明的用信号发出呈现时间增量值的实例描述为由解封装单元29执行,但目的地装置14、视频解码器30、解封装单元29及其组件的组合的任何组合可执行图9中所说明的确定呈现时间增量值的实例。如图9中所说明,解封装单元29获得第一图片(902)。第一图片可为对应于存取单元的经编码图片。解封装单元29获得第二图片(904)。第二图片可为对应于存取单元的经编码图片。第二图片可包含于与第一图片相同的时间层中。另外,第一图片及第二图片可包含于视频数据的最高时间层中。FIG9 is a flowchart illustrating an example method of determining a presentation time delta value. Although the example of signaling a presentation time delta value illustrated in FIG9 is described as being performed by decapsulation unit 29, any combination of destination device 14, video decoder 30, decapsulation unit 29, and combinations of components thereof may perform the example of determining a presentation time delta value illustrated in FIG9. As illustrated in FIG9, decapsulation unit 29 obtains a first picture (902). The first picture may be an encoded picture corresponding to an access unit. Decapsulation unit 29 obtains a second picture (904). The second picture may be an encoded picture corresponding to the access unit. The second picture may be included in the same temporal layer as the first picture. Additionally, the first and second pictures may be included in the highest temporal layer of the video data.

解封装单元29可接着获得整数值N(906)。这是假定解封装单元29先前已获得数据，例如旗标的值，其指示呈现时间增量为时钟刻度值的整数倍。整数值N可包含于VUI参数的集合中，VUI参数的集合可包含于SPS中。解封装单元29确定时钟刻度值(908)。解封装单元29可根据上文所描述的等式(1)，基于time_scale及num_units_in_tick语法元素而确定时钟刻度值。Decapsulation unit 29 may then obtain an integer value N (906). This assumes that decapsulation unit 29 has previously obtained data, such as the value of a flag, indicating that the presentation time delta is an integer multiple of the clock tick value. The integer value N may be included in a set of VUI parameters, which may be included in the SPS. Decapsulation unit 29 determines a clock tick value (908). Decapsulation unit 29 may determine the clock tick value based on the time_scale and num_units_in_tick syntax elements according to equation (1) described above.

解封装单元29可接着确定第一图片的呈现时间与第二图片的呈现时间之间的增量(910)。所述增量可基于整数值N而等于时钟刻度值的整数倍。举例来说,增量可等于(N+1)*时钟刻度。Decapsulation unit 29 may then determine a delta between the presentation time of the first picture and the presentation time of the second picture 910. The delta may be equal to an integer multiple of the clock tick value based on an integer value N. For example, the delta may be equal to (N+1)*clock tick.
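
Assuming equation (1) has the usual HEVC form, clock tick = num_units_in_tick / time_scale, the determination in steps 906 to 910 can be sketched as follows; the function names are illustrative.

    #include <cstdint>

    // Clock tick in seconds from the VUI timing parameters (assumed form of
    // equation (1)).
    double clockTickSeconds(uint32_t numUnitsInTick, uint32_t timeScale) {
      return static_cast<double>(numUnitsInTick) / static_cast<double>(timeScale);
    }

    // Presentation-time delta between the two pictures, equal to (N + 1) clock ticks.
    double presentationDeltaSeconds(uint32_t n, uint32_t numUnitsInTick, uint32_t timeScale) {
      return (static_cast<double>(n) + 1.0) * clockTickSeconds(numUnitsInTick, timeScale);
    }

For example, with num_units_in_tick equal to 1, time_scale equal to 30 and N equal to 0, the delta is 1/30 second, corresponding to presentation at 30 pictures per second.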

解封装单元29及视频解码器30可接着根据所确定的增量呈现第一图片及第二图片(912)。在一实例中,解封装单元29可将所述增量值用信号发出到视频解码器30,且视频解码器30可基于所述增量值而执行解码过程。以此方式,目的地装置14表示包含处理器的装置的实例,所述处理器经配置以确定第一图片的呈现时间与第二图片的呈现时间之间的差值,其中所述差值等于整数值乘以时钟刻度值,及根据所确定的差值呈现第一图片及第二图片。Decapsulation unit 29 and video decoder 30 may then present the first picture and the second picture according to the determined delta (912). In one example, decapsulation unit 29 may signal the delta value to video decoder 30, and video decoder 30 may perform a decoding process based on the delta value. In this manner, destination device 14 represents an example of a device including a processor configured to determine a difference between a presentation time of a first picture and a presentation time of a second picture, wherein the difference is equal to an integer value multiplied by a clock tick value, and present the first picture and the second picture according to the determined difference.

同样地,图9的方法表示包含以下操作的方法的实例:确定第一图片的呈现时间与第二图片的呈现时间之间的差值,其中所述差值等于整数值乘以时钟刻度值,及根据所确定的差值呈现第一图片及第二图片。Similarly, the method of Figure 9 represents an example of a method that includes the following operations: determining a difference between the presentation time of a first picture and the presentation time of a second picture, wherein the difference is equal to an integer value multiplied by a clock tick value, and presenting the first picture and the second picture according to the determined difference.

图10为说明视频解码器30的实例的框图,视频解码器30可实施用于进行以下操作的技术:(1)接收包含NAL单元类型的数据,(2)处理所接收的子图片层级或解码单元层级HRD行为,(3)处理包含对参数集ID的参考的数据,(4)处理包含fixed_pic_rate_flag的改进的语义的所接收的数据,或此些各者的任何及所有组合。在图10的实例中,视频解码器30包含熵解码单元70、运动补偿单元72、帧内预测单元74、反量化单元76、反变换单元78、参考图片存储器82及求和器80。在一些实例中,视频解码器30可执行与关于视频编码器20(图2)所描述的编码遍次大体上互逆的解码遍次。运动补偿单元72可基于从熵解码单元70所接收的运动向量而产生预测数据,而帧内预测单元74可基于从熵解码单元70所接收的帧内预测模式指示符产生预测数据。FIG10 is a block diagram illustrating an example of a video decoder 30 that may implement techniques for: (1) receiving data including NAL unit types, (2) processing received sub-picture level or decoding unit level HRD behavior, (3) processing data including references to parameter set IDs, (4) processing received data including improved semantics for fixed_pic_rate_flag, or any and all combinations thereof. In the example of FIG10 , the video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra-prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference picture memory 82, and a summer 80. In some examples, the video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to the video encoder 20 ( FIG2 ). Motion compensation unit 72 may generate prediction data based on the motion vector received from entropy decoding unit 70 , while intra-prediction unit 74 may generate prediction data based on the intra-prediction mode indicator received from entropy decoding unit 70 .

在解码过程期间,视频解码器30从视频编码器20接收表示经编码视频切片的视频块的经编码视频位流及相关联的语法元素。视频解码器30的熵解码单元70熵解码位流以产生经量化的系数、运动向量或帧内预测模式指示符及其它语法元素。熵解码单元70将运动向量及其它语法元素转递到运动补偿单元72。视频解码器30可接收视频切片层级及/或视频块层级的语法元素。During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

当视频切片经译码为经帧内译码(I)切片时,帧内预测单元74可基于用信号发出的帧内预测模式及来自当前帧或图片的先前经解码块的数据而产生当前视频切片的视频块的预测数据。当视频帧经译码为经帧间译码(即,B、P或GPB)切片时,运动补偿单元72基于从熵解码单元70所接收的运动向量及其它语法元素而产生当前视频切片的视频块的预测性块。可从参考图片列表中的一者内的参考图片中的一者产生预测性块。视频解码器30可基于存储于参考图片存储器82中的参考图片,使用默认建构技术来建构参考帧列表:列表0及列表1。运动补偿单元72通过剖析运动向量及其它语法元素而确定当前视频切片的视频块的预测信息,且使用所述预测信息产生正经解码的当前视频块的预测性块。举例来说,运动补偿单元72使用所接收的语法元素中的一些语法元素来确定用以译码视频切片的视频块的预测模式(例如,帧内预测或帧间预测)、帧间预测切片类型(例如,B切片、P切片或GPB切片)、切片的参考图片列表中的一或多者的建构信息、切片的每一经帧间编码视频块的运动向量、切片的每一经帧间译码视频块的帧间预测状态,及用以解码当前视频切片中的视频块的其它信息。When the video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for the video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 72 generates predictive blocks for the video block of the current video slice based on motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct reference frame lists: List 0 and List 1, using default construction techniques based on the reference pictures stored in reference picture memory 82. Motion compensation unit 72 determines prediction information for the video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) used to code a video block of a video slice, an inter-prediction slice type (e.g., a B slice, a P slice, or a GPB slice), construction information for one or more of the slice's reference picture lists, motion vectors for each inter-coded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information used to decode video blocks in the current video slice.

运动补偿单元72还可基于内插滤波器执行内插。运动补偿单元72可使用如由视频编码器20在视频块的编码期间所使用的内插滤波器来计算参考块的子整数像素的内插值。在此状况下,运动补偿单元72可从所接收的语法元素确定由视频编码器20使用的内插滤波器,且使用所述内插滤波器来产生预测性块。Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may calculate interpolated values for sub-integer pixels of a reference block using interpolation filters as used by video encoder 20 during encoding of the video block. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce the predictive blocks.

反量化单元76将提供于位流中且由熵解码单元70解码的经量化的变换系数反量化(即，解量化)。反量化过程可包含使用由视频解码器30计算的视频切片中的每一视频块的量化参数QP_Y，来确定量化程度及(同样地)应应用的反量化的程度。Inverse quantization unit 76 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include using a quantization parameter QP_Y for each video block in a video slice calculated by video decoder 30 to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.

反变换单元78将反变换(例如,反DCT、反整数变换或概念上类似的反变换过程)应用于变换系数,以便在像素域中产生残余块。Inverse transform unit 78 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to produce residual blocks in the pixel domain.

在运动补偿单元72基于运动向量及其它语法元素产生当前视频块的预测性块之后,视频解码器30通过将来自反变换单元78的残余块与由运动补偿单元72产生的对应预测性块加总而形成经解码视频块。求和器80表示执行此加总运算的(多个)组件。在需要时,还可应用解块滤波器来对经解码块进行滤波以便移除成块假影。其它回路滤波器(译码回路中或译码回路后)还可用以使像素转变平滑,或以其它方式改进视频质量。接着将给定帧或图片中的经解码视频块存储于参考图片存储器82中,参考图片存储器82存储用于后续运动补偿的参考图片。参考图片存储器82还存储经解码视频以用于稍后呈现于显示装置(例如,图3的显示装置32)上。After motion compensation unit 72 generates a predictive block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 78 with the corresponding predictive block generated by motion compensation unit 72. Summer 80 represents the component(s) that perform this summing operation. If necessary, a deblocking filter may also be applied to filter the decoded block to remove blocking artifacts. Other loop filters (in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 82, which stores reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores the decoded video for later presentation on a display device (e.g., display device 32 of FIG. 3 ).
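
The summation performed by summer 80 can be sketched for 8-bit video as follows; deblocking and the other loop filters described above would then operate on the reconstructed samples.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Reconstruction for 8-bit samples: prediction plus residual, clipped to the
    // valid sample range.
    std::vector<uint8_t> reconstruct(const std::vector<uint8_t>& prediction,
                                     const std::vector<int16_t>& residual) {
      std::vector<uint8_t> recon(prediction.size());
      for (std::size_t i = 0; i < prediction.size(); ++i) {
        const int value = static_cast<int>(prediction[i]) + residual[i];
        recon[i] = static_cast<uint8_t>(std::clamp(value, 0, 255));
      }
      return recon;
    }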

应认识到,取决于实例,本文中所描述的技术中的任一者的某些动作或事件可以不同序列来执行,可经添加、合并或完全省略(例如,对于实践所述技术来说,并非所有所描述的动作或事件皆是必要的)。此外,在某些实例中,可(例如)经由多线程处理、中断处理或多个处理器同时而非顺序地执行动作或事件。It should be appreciated that, depending on the example, certain actions or events of any of the techniques described herein may be performed in a different sequence, may be added, combined, or omitted entirely (e.g., not all described actions or events are necessary to practice the techniques). Furthermore, in some examples, actions or events may be performed simultaneously rather than sequentially, for example, via multithreading, interrupt processing, or multiple processors.

在一或多个实例中,所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么所述功能可作为一或多个指令或代码而存储于计算机可读媒体上或经由计算机可读媒体予以发射,且由基于硬件的处理单元来执行。计算机可读媒体可包含计算机可读存储媒体(其对应于例如数据存储媒体等有形媒体)或通信媒体,通信媒体包含(例如)根据通信协议促进计算机程序从一处传送到另一处的任何媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)例如信号或载波等通信媒体。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索指令、代码及/或数据结构以用于实施本发明中所描述的技术的任何可用媒体。计算机程序产品可包含计算机可读媒体。In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted via a computer-readable medium as one or more instructions or codes and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, which includes, for example, any media that facilitates the transfer of a computer program from one place to another according to a communication protocol. In this manner, computer-readable media may generally correspond to (1) a non-transitory, tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

通过实例而非限制,此些计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储器、磁盘存储器或其它磁性存储装置、快闪存储器,或可用以存储呈指令或数据结构的形式的所要程序代码且可由计算机存取的任何其它媒体。而且,任何连接可适当地称为计算机可读媒体。举例来说,如果使用同轴电缆、光纤缆线、双绞线、数字订户线(DSL)或无线技术(例如,红外线、无线电及微波)而从网站、服务器或其它远程源发射指令,那么同轴电缆、光纤缆线、双绞线、DSL或无线技术(例如,红外线、无线电及微波)包含于媒体的定义中。然而,应理解,计算机可读存储媒体及数据存储媒体不包含连接、载波、信号或其它暂时性媒体,而实情为,针对非暂时性有形存储媒体。如本文中所使用,磁盘及光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、软性磁盘及蓝光光盘,其中磁盘通常以磁性方式复制数据,而光盘通过激光以光学方式复制数据。上文各者的组合还应包含于计算机可读媒体的范围内。By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Moreover, any connection may be properly referred to as a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies (e.g., infrared, radio, and microwave), then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies (e.g., infrared, radio, and microwave) are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but rather are directed to non-transitory, tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

可由例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指代上述结构或适于实施本文中所描述的技术的任何其它结构中的任一者。另外,在一些方面中,可将本文中所描述的功能性提供于经配置以用于编码及解码的专用硬件及/或软件模块内,或并入于组合式编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Thus, the term "processor," as used herein, may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.

本发明的技术可以广泛多种装置或设备来实施,所述装置或设备包含无线手持机、集成电路(IC)或IC的集合(例如,芯片组)。本发明中描述各种组件、模块或单元以强调经配置以执行所揭示的技术的装置的功能方面,但未必需要通过不同硬件单元来实现。确切地说,如上文所描述,可将各种单元组合于编解码器硬件单元中,或通过互操作性硬件单元(包含如上文所描述的一或多个处理器)的集合结合合适的软件及/或固件来提供所述单元。The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or collections of ICs (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Specifically, as described above, the various units can be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with appropriate software and/or firmware.

已描述各种实例。此些及其它实例在以下权利要求书的范围内。Various examples have been described. These and other examples are within the scope of the following claims.
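For illustration only, and not as part of the described examples, the following sketch shows one way a bitstream processor might distinguish prefix SEI NAL units, suffix SEI NAL units, and VCL NAL units from the NAL unit header. The numeric code points used here (nal_unit_type 39 for a prefix SEI NAL unit, 40 for a suffix SEI NAL unit, and 0 through 31 for VCL NAL units) and the two-byte HEVC NAL unit header layout are taken from the published HEVC specification; they are assumptions of this sketch rather than values stated in this document.

```c
#include <stdint.h>

/* Illustrative classification of an HEVC NAL unit by nal_unit_type.
 * Assumed (not stated in this document): the two-byte HEVC NAL unit header
 * layout and the code points PREFIX_SEI_NUT = 39, SUFFIX_SEI_NUT = 40,
 * with VCL NAL unit types occupying the range 0..31. */
enum nal_class { NAL_VCL, NAL_PREFIX_SEI, NAL_SUFFIX_SEI, NAL_OTHER };

/* nal points at the first byte of the NAL unit header (after the start code
 * or length prefix).  Header layout: forbidden_zero_bit (1 bit),
 * nal_unit_type (6 bits), nuh_layer_id (6 bits), nuh_temporal_id_plus1 (3 bits). */
static enum nal_class classify_nal(const uint8_t *nal)
{
    uint8_t nal_unit_type = (uint8_t)((nal[0] >> 1) & 0x3F);

    if (nal_unit_type <= 31)
        return NAL_VCL;          /* coded slice segment NAL units */
    if (nal_unit_type == 39)
        return NAL_PREFIX_SEI;   /* prefix SEI NAL unit */
    if (nal_unit_type == 40)
        return NAL_SUFFIX_SEI;   /* suffix SEI NAL unit */
    return NAL_OTHER;            /* parameter sets, access unit delimiters, etc. */
}
```

A decoder following this classification can route prefix SEI messages to processing that occurs before the associated coded picture is decoded, and defer suffix SEI messages until after the last VCL NAL unit of the access unit has been handled.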

Claims (12)

1. A method of decoding video data, the method comprising:
determining a first value of a first network abstraction layer (NAL) unit type of a first supplemental enhancement information (SEI) NAL unit of a bitstream and a second value of a second NAL unit type of a second SEI NAL unit of the bitstream, wherein the first NAL unit type value of the first SEI NAL unit indicates that the first SEI NAL unit comprises a prefix SEI NAL unit containing a prefix SEI message, and wherein the second NAL unit type value of the second SEI NAL unit indicates that the second SEI NAL unit comprises a suffix SEI NAL unit containing a suffix SEI message; and
decoding video data of the bitstream based on data of the first SEI NAL unit in accordance with the first SEI NAL unit being the prefix SEI NAL unit, and based on data of the second SEI NAL unit in accordance with the second SEI NAL unit being the suffix SEI NAL unit, wherein decoding the video data further comprises:
extracting, from a first access unit that includes the prefix SEI NAL unit, the prefix SEI NAL unit and subsequently a first one or more video coding layer (VCL) NAL units, the first one or more VCL NAL units including a last VCL NAL unit, wherein, in the first access unit, the first one or more VCL NAL units follow the prefix SEI NAL unit in decoding order; and
extracting, from a second access unit, a second one or more VCL NAL units including a last VCL NAL unit and subsequently the suffix SEI NAL unit, wherein the suffix SEI NAL unit follows, in decoding order, the second one or more VCL NAL units including the last VCL NAL unit in the second access unit.

2. An apparatus for decoding video data, the apparatus comprising:
a memory configured to store video data of a bitstream; and
a processor configured to:
determine a first value of a first NAL unit type of a first SEI NAL unit of the bitstream and a second value of a second NAL unit type of a second SEI NAL unit of the bitstream, wherein the first NAL unit type value of the first SEI NAL unit indicates that the first SEI NAL unit comprises a prefix SEI NAL unit containing a prefix SEI message, and wherein the second NAL unit type value of the second SEI NAL unit indicates that the second SEI NAL unit comprises a suffix SEI NAL unit containing a suffix SEI message; and
decode video data of the bitstream based on data of the first SEI NAL unit in accordance with the first SEI NAL unit being the prefix SEI NAL unit, and based on data of the second SEI NAL unit in accordance with the second SEI NAL unit being the suffix SEI NAL unit, wherein, to decode the video data, the processor is further configured to:
extract, from a first access unit that includes the prefix SEI NAL unit, the prefix SEI NAL unit and subsequently a first one or more VCL NAL units, the first one or more VCL NAL units including a last VCL NAL unit, wherein, in the first access unit, the first one or more VCL NAL units follow the prefix SEI NAL unit in decoding order; and
extract, from a second access unit, a second one or more VCL NAL units including a last VCL NAL unit and subsequently the suffix SEI NAL unit, wherein the suffix SEI NAL unit follows, in decoding order, the second one or more VCL NAL units including the last VCL NAL unit in the second access unit.

3. The apparatus of claim 2, wherein the apparatus comprises at least one of:
one or more integrated circuits;
one or more microprocessors;
one or more digital signal processors (DSPs);
one or more field-programmable gate arrays (FPGAs);
a desktop computer;
a laptop computer;
a tablet computer;
a telephone;
a television;
a camera;
a display device;
a digital media player;
a video game console;
a video game device;
a video streaming device; or
a wireless communication device.

4. The apparatus of claim 2, further comprising a display device configured to display at least a portion of the video data.

5. An apparatus for decoding video data, the apparatus comprising:
means for determining a first value of a first NAL unit type of a first supplemental enhancement information (SEI) network abstraction layer (NAL) unit of a bitstream and a second value of a second NAL unit type of a second SEI NAL unit of the bitstream, wherein the first NAL unit type value of the first SEI NAL unit indicates that the first SEI NAL unit comprises a prefix SEI NAL unit containing a prefix SEI message, and wherein the second NAL unit type value of the second SEI NAL unit indicates that the second SEI NAL unit comprises a suffix SEI NAL unit containing a suffix SEI message; and
means for decoding video data of the bitstream based on data of the first SEI NAL unit in accordance with the first SEI NAL unit being the prefix SEI NAL unit, and based on data of the second SEI NAL unit in accordance with the second SEI NAL unit being the suffix SEI NAL unit, wherein the means for decoding the video data further comprises:
means for extracting, from a first access unit that includes the prefix SEI NAL unit, the prefix SEI NAL unit and subsequently a first one or more VCL NAL units, the first one or more VCL NAL units including a last VCL NAL unit, wherein, in the first access unit, the first one or more VCL NAL units follow the prefix SEI NAL unit in decoding order; and
means for extracting, from a second access unit, a second one or more VCL NAL units including a last VCL NAL unit and subsequently the suffix SEI NAL unit, wherein the suffix SEI NAL unit follows, in decoding order, the second one or more VCL NAL units including the last VCL NAL unit in the second access unit.

6. The apparatus of claim 5, wherein the apparatus comprises at least one of:
one or more integrated circuits;
one or more microprocessors;
one or more digital signal processors (DSPs);
one or more field-programmable gate arrays (FPGAs);
a desktop computer;
a laptop computer;
a tablet computer;
a telephone;
a television;
a camera;
a display device;
a digital media player;
a video game console;
a video game device;
a video streaming device; or
a wireless communication device.

7. The apparatus of claim 5, further comprising a display device for displaying at least a portion of the video data.

8. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to:
determine a first value of a first NAL unit type of a first supplemental enhancement information (SEI) network abstraction layer (NAL) unit of a bitstream and a second value of a second NAL unit type of a second SEI NAL unit of the bitstream, wherein the first NAL unit type value of the first SEI NAL unit indicates that the first SEI NAL unit comprises a prefix SEI NAL unit containing a prefix SEI message, and wherein the second NAL unit type value of the second SEI NAL unit indicates that the second SEI NAL unit comprises a suffix SEI NAL unit containing a suffix SEI message; and
decode video data of the bitstream based on data of the first SEI NAL unit in accordance with the first SEI NAL unit being the prefix SEI NAL unit, and based on data of the second SEI NAL unit in accordance with the second SEI NAL unit being the suffix SEI NAL unit, wherein the instructions that cause the processor to decode the video data further comprise instructions that cause the processor to:
extract, from a first access unit that includes the prefix SEI NAL unit, the prefix SEI NAL unit and subsequently a first one or more VCL NAL units, the first one or more VCL NAL units including a last VCL NAL unit, wherein, in the first access unit, the first one or more VCL NAL units follow the prefix SEI NAL unit in decoding order; and
extract, from a second access unit, a second one or more VCL NAL units including a last VCL NAL unit and subsequently the suffix SEI NAL unit, wherein the suffix SEI NAL unit follows, in decoding order, the second one or more VCL NAL units including the last VCL NAL unit in the second access unit.

9. A method of generating a bitstream that includes video data, the method comprising:
encoding a first SEI message comprising a prefix supplemental enhancement information (SEI) message and a second SEI message comprising a suffix SEI message, wherein the prefix SEI message and the suffix SEI message each contain data related to encoded video data;
encapsulating the prefix SEI message in a first SEI network abstraction layer (NAL) unit;
setting a first NAL unit type value of the first SEI NAL unit to a value indicating that the first SEI NAL unit is a prefix SEI NAL unit;
forming a first access unit such that the first access unit includes the first SEI NAL unit, a last video coding layer (VCL) NAL unit following the first SEI NAL unit in decoding order;
encapsulating the suffix SEI message in a second SEI NAL unit;
setting a second NAL unit type value of the second SEI NAL unit to a value indicating that the second SEI NAL unit is a suffix SEI NAL unit;
forming a second access unit such that the second access unit includes one or more VCL NAL units including a last VCL NAL unit, the suffix SEI NAL unit following the last VCL NAL unit in decoding order such that the suffix SEI NAL unit follows all VCL NAL units in the second access unit in decoding order; and
generating a bitstream that includes at least the first access unit and the second access unit.

10. An apparatus for generating a bitstream that includes video data, the apparatus comprising a processor configured to:
encode a first SEI message comprising a prefix supplemental enhancement information (SEI) message and a second SEI message comprising a suffix SEI message, wherein the prefix SEI message and the suffix SEI message each contain data related to encoded video data;
encapsulate the prefix SEI message in a first SEI network abstraction layer (NAL) unit;
set a first NAL unit type value of the first SEI NAL unit to a value indicating that the first SEI NAL unit is a prefix SEI NAL unit;
form a first access unit such that the first access unit includes the first SEI NAL unit, a last video coding layer (VCL) NAL unit following the first SEI NAL unit in decoding order;
encapsulate the suffix SEI message in a second SEI NAL unit;
set a second NAL unit type value of the second SEI NAL unit to a value indicating that the second SEI NAL unit is a suffix SEI NAL unit;
form a second access unit such that the second access unit includes one or more VCL NAL units including a last VCL NAL unit, the suffix SEI NAL unit following the last VCL NAL unit in decoding order such that the suffix SEI NAL unit follows all VCL NAL units in the second access unit in decoding order; and
generate a bitstream that includes at least the first access unit and the second access unit.

11. An apparatus for generating a bitstream that includes video data, the apparatus comprising:
means for encoding a first SEI message comprising a prefix supplemental enhancement information (SEI) message and a second SEI message comprising a suffix SEI message, wherein the prefix SEI message and the suffix SEI message each contain data related to encoded video data;
means for encapsulating the prefix SEI message in a first SEI network abstraction layer (NAL) unit;
means for setting a first NAL unit type value of the first SEI NAL unit to a value indicating that the first SEI NAL unit is a prefix SEI NAL unit;
means for forming a first access unit such that the first access unit includes the first SEI NAL unit, a last video coding layer (VCL) NAL unit following the first SEI NAL unit in decoding order;
means for encapsulating the suffix SEI message in a second SEI NAL unit;
means for setting a second NAL unit type value of the second SEI NAL unit to a value indicating that the second SEI NAL unit is a suffix SEI NAL unit;
means for forming a second access unit such that the second access unit includes one or more VCL NAL units including a last VCL NAL unit, the suffix SEI NAL unit following the last VCL NAL unit in decoding order such that the suffix SEI NAL unit follows all VCL NAL units in the second access unit in decoding order; and
means for generating a bitstream that includes at least the first access unit and the second access unit.

12. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to:
encode a first SEI message comprising a prefix supplemental enhancement information (SEI) message and a second SEI message comprising a suffix SEI message, wherein the prefix SEI message and the suffix SEI message each contain data related to encoded video data;
encapsulate the prefix SEI message in a first SEI network abstraction layer (NAL) unit;
set a first NAL unit type value of the first SEI NAL unit to a value indicating that the first SEI NAL unit is a prefix SEI NAL unit;
form a first access unit such that the first access unit includes the first SEI NAL unit, a last video coding layer (VCL) NAL unit following the first SEI NAL unit in decoding order;
encapsulate the suffix SEI message in a second SEI NAL unit;
set a second NAL unit type value of the second SEI NAL unit to a value indicating that the second SEI NAL unit is a suffix SEI NAL unit;
form a second access unit such that the second access unit includes one or more VCL NAL units including a last VCL NAL unit, the suffix SEI NAL unit following the last VCL NAL unit in decoding order such that the suffix SEI NAL unit follows all VCL NAL units in the second access unit in decoding order; and
generate a bitstream that includes at least the first access unit and the second access unit.
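The following companion sketch, again offered only as an assumption-laden illustration and not as part of the claims, checks the access-unit ordering recited in claims 1 and 9: prefix SEI NAL units precede the first VCL NAL unit of an access unit, and suffix SEI NAL units follow the last VCL NAL unit. It reuses the hypothetical classify_nal helper and nal_class enumeration from the earlier sketch, and it takes the NAL units of one access unit as an array of header pointers in decoding order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the SEI NAL units of one access unit satisfy the ordering
 * constraints described in this document: no prefix SEI NAL unit appears
 * after a VCL NAL unit, and every suffix SEI NAL unit appears after the
 * last VCL NAL unit (equivalently, no VCL NAL unit follows a suffix SEI
 * NAL unit).  classify_nal is the hypothetical helper sketched above. */
static bool access_unit_sei_order_ok(const uint8_t *const *nal, size_t count)
{
    bool seen_vcl = false;
    bool seen_suffix_sei = false;

    for (size_t i = 0; i < count; ++i) {
        switch (classify_nal(nal[i])) {
        case NAL_VCL:
            if (seen_suffix_sei)
                return false;   /* suffix SEI must follow the last VCL NAL unit */
            seen_vcl = true;
            break;
        case NAL_PREFIX_SEI:
            if (seen_vcl)
                return false;   /* prefix SEI must precede the first VCL NAL unit */
            break;
        case NAL_SUFFIX_SEI:
            if (!seen_vcl)
                return false;   /* suffix SEI must come after at least one VCL NAL unit */
            seen_suffix_sei = true;
            break;
        default:
            break;              /* other non-VCL NAL units are not constrained here */
        }
    }
    return true;
}
```

An encoder forming access units as in claims 9 to 12 could run the same check before emitting each access unit into the bitstream.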
HK15104410.7A 2012-07-10 2013-07-08 Coding sei nal units for video coding HK1204181B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201261670066P 2012-07-10 2012-07-10
US61/670,066 2012-07-10
US13/802,005 2013-03-13
US13/802,005 US9584804B2 (en) 2012-07-10 2013-03-13 Coding SEI NAL units for video coding
PCT/US2013/049613 WO2014011569A1 (en) 2012-07-10 2013-07-08 Coding sei nal units for video coding

Publications (2)

Publication Number Publication Date
HK1204181A1 HK1204181A1 (en) 2015-11-06
HK1204181B true HK1204181B (en) 2019-08-30


Similar Documents

Publication Publication Date Title
CN104412600B (en) Decoding SEI NAL units for video decoding
CN106537921B (en) System and method for selectively signaling different numbers of video signal information syntax structures in a parameter set
HK1204181B (en) Coding sei nal units for video coding
HK1204835B (en) Method and device for processing video data and computer-readable storage medium