HK1210350B

HK1210350B - Signaling of picture order count to timing information relations for video timing in video coding

Info

Publication number: HK1210350B
Application number: HK15111066.9A
Authority: HK
Inventors: Ye-Kui Wang (王益魁)
Original assignee: Qualcomm Incorporated (高通股份有限公司)
Priority date: 2013-01-07
Filing date: 2013-12-20
Publication date: 2019-09-27

Description

Signaling the relationship between picture order count and timing information for video coding

This application claims the benefit of U.S. Provisional Application No. 61/749,866, filed January 7, 2013, the entire content of which is incorporated herein by reference.

Technical Field

This disclosure relates to video coding and video processing and, more particularly, to techniques for signaling timing information in video information.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and the High Efficiency Video Coding (HEVC) standard, and in extensions of these standards. By implementing such video compression techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents the pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.

A given coded video sequence encoded into a bitstream comprises an ordered sequence of coded pictures. In the H.264/AVC and HEVC standards, the decoding order of the coded pictures in the bitstream is the same as this ordered sequence. However, the standards also support an output order of the decoded pictures that differs from the decoding order; in such cases, each coded picture is associated with a picture order count (POC) value that specifies the output order of the pictures in the video sequence.

Video timing information for a video sequence may be signaled in syntax elements of one or more syntax structures (also referred to as "parameter set structures" or simply "parameter sets"). The syntax structures may include a sequence parameter set (SPS), which contains coding information that applies to all slices of a coded video sequence. The SPS may itself include parameters referred to as video usability information (VUI), which include hypothetical reference decoder (HRD) information as well as information that enhances the use of the corresponding video sequence for various purposes. The HRD information may itself be signaled using an HRD syntax structure, which may be included within other syntax structures, such as the VUI syntax structure. The syntax structures may also include a video parameter set (VPS), which describes characteristics of the corresponding video sequence, such as common syntax elements shared by multiple layers or operation points, as well as other operation point information that may be common to multiple sequence parameter sets, such as HRD information for various layers or sub-layers.

Summary

In general, this disclosure describes techniques for video coding and, more specifically, techniques for signaling video information, for example, to specify picture output timing and/or to define a buffering model, such as a hypothetical reference decoder (HRD). In some examples, the techniques may include generating an encoded bitstream for a coded video sequence that signals, in a video parameter set (VPS) syntax structure, a flag indicating whether the picture order count (POC) value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence. In some cases, the techniques may include generating the encoded bitstream such that the flag is signaled in the VPS syntax structure only when timing information, in the form of time scale and number-of-units-in-a-clock-tick syntax elements, is also included in the VPS syntax structure.

In one example of this disclosure, a method of processing video data includes receiving a coded video sequence comprising encoded pictures of a video sequence, and receiving timing parameters for the coded video sequence, the timing parameters including an indication, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

In another example of this disclosure, a method of encoding video data includes encoding pictures of a video sequence to generate a coded video sequence comprising the encoded pictures, and signaling timing parameters for the coded video sequence by signaling, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, an indication of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

In another example of this disclosure, a device for processing video data includes a processor configured to: receive a coded video sequence comprising encoded pictures of a video sequence; and receive timing parameters for the coded video sequence, the timing parameters including an indication, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

In another example of this disclosure, a device for encoding video data includes a processor configured to: encode pictures of a video sequence to generate a coded video sequence comprising the encoded pictures; and signal timing parameters for the coded video sequence by signaling, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, an indication of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

In another example of this disclosure, a device for processing video data includes: means for receiving a coded video sequence comprising encoded pictures of a video sequence; and means for receiving timing parameters for the coded video sequence, the timing parameters including an indication, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

In another example, this disclosure describes a computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to: receive a coded video sequence comprising encoded pictures of a video sequence; and receive timing parameters for the coded video sequence, the timing parameters including an indication, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.

FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

FIG. 4 is a block diagram illustrating timing information for an example coding structure of a reference picture set, in accordance with the techniques described herein.

FIG. 5 is a flowchart illustrating an example method of operation in accordance with the techniques described in this disclosure.

FIGS. 6A-6B are flowcharts illustrating example methods of operation in accordance with the techniques described in this disclosure.

FIG. 7 is a flowchart illustrating an example method of operation in accordance with the techniques described in this disclosure.

FIG. 8 is a flowchart illustrating an example method of operation in accordance with the techniques described in this disclosure.

FIGS. 9A-9B are flowcharts illustrating example methods of operation in accordance with the techniques described in this disclosure.

FIG. 10 is a flowchart illustrating an example method of operation in accordance with the techniques described in this disclosure.

DETAILED DESCRIPTION

This disclosure describes various techniques for video coding and, more specifically, techniques for signaling video information, for example, to specify picture output timing and/or to define a buffering or decoding model, such as a hypothetical reference decoder (HRD). In general, "signaling" is used in this disclosure to refer to signaling that occurs within a coded bitstream. An encoder may generate syntax elements to signal information in a bitstream as part of the video encoding process. A decoding device or other video processing device may receive the coded bitstream and interpret the syntax elements in the coded bitstream as part of the video decoding process or other video processing.

For example, to indicate the output timing when switching from a given picture in a coded video sequence to the next picture in output order, the timing information for the coded video sequence may, in some cases, signal the number of clock ticks corresponding to a difference of one in picture order count (POC) values. A POC difference of one may represent the difference between the POC value of a given picture and the POC value of the next picture in output order (e.g., between the POC values of the second and third pictures in output order). The video timing information may also include a condition that specifies whether the number of clock ticks corresponding to a POC difference of one is signaled: that number is signaled only when the condition is met, and in some cases the condition is not met, so the number is not signaled.
The duration of a clock tick, and therefore the time spanned by that number of clock ticks, depends on a time scale (corresponding to, for example, the frequency of an oscillator that defines the time coordinate system of the signaled information, such as 27 MHz) and on the number of time units of a clock operating at that time scale that corresponds to one increment of a clock tick counter, which is referred to as a "clock tick."
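The relation just described can be summarized as follows. This is a sketch under the stated semantics, not a formula quoted from the standard: t_c is the duration of one clock tick, N is the signaled number of clock ticks corresponding to a POC difference of one (in HEVC WD9, N = num_ticks_poc_diff_one_minus1 + 1), and t_out(n) is the output time of picture n. These symbols are introduced here for illustration; the second line holds only when POC values are proportional to output time, with picture 0 being the first picture in decoding order.

```latex
\begin{aligned}
t_c &= \frac{\texttt{num\_units\_in\_tick}}{\texttt{time\_scale}}\ \text{s},\\
t_{\text{out}}(n) - t_{\text{out}}(0) &= \bigl(\mathrm{POC}(n) - \mathrm{POC}(0)\bigr)\, N\, t_c .
\end{aligned}
```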

In some examples, the techniques may include generating an encoded bitstream for a coded video sequence that signals, in a video parameter set (VPS) syntax structure, a flag indicating whether the picture order count (POC) value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence. In some cases, the techniques may include generating the encoded bitstream such that the flag is signaled in the VPS syntax structure only when timing information, in the form of time scale and number-of-units-in-a-clock-tick syntax elements, is also included in the VPS syntax structure.

Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.

In addition, a new video coding standard, High Efficiency Video Coding (HEVC), is being developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The latest working draft of HEVC (hereinafter referred to as HEVC WD9 or simply WD9) is Bross et al., "Proposed editorial improvements for High Efficiency Video Coding (HEVC) text specification draft 9 (SoDIS)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Geneva, Switzerland, January 14-23, 2013, available as of January 7, 2013 from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L0030-v1.zip.

A recent draft of the HEVC standard, referred to as "HEVC Working Draft 10" or "WD10," is described in document JCTVC-L1003v34, Bross et al., "High efficiency video coding (HEVC) text specification draft 10 (for FDIS & Last Call)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting: Geneva, Switzerland, January 14-23, 2013, which can be downloaded from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip.

Another draft of the HEVC standard, referred to herein as the "WD10 revisions," is described in Bross et al., "Editors' proposed corrections to HEVC version 1," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 13th Meeting: Incheon, South Korea, April 2013, which was available as of June 7, 2013 from http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/JCTVC-M0432-v3.zip.

The HEVC standardization effort is based on a model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several improvements in the capabilities of current video coding devices relative to devices available during the development of earlier video coding standards, such as ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, HEVC provides as many as thirty-five. The entire contents of HEVC WD9 and HEVC WD10 are incorporated herein by reference.

Video coding standards typically include a specification of a video buffering model. In AVC and HEVC, the buffering model is referred to as the hypothetical reference decoder (HRD), which includes buffering models of both the coded picture buffer (CPB) and the decoded picture buffer (DPB). As defined in HEVC WD9, the HRD is a hypothetical decoder model that specifies constraints on the variability of conforming network abstraction layer (NAL) unit streams or conforming byte streams that an encoding process may produce. The CPB and DPB behaviors are specified mathematically. The HRD directly imposes constraints on timing, buffer sizes, and bit rates, and indirectly imposes constraints on bitstream characteristics and statistics. A complete set of HRD parameters includes five basic parameters: initial CPB removal delay, CPB size, bit rate, initial DPB output delay, and DPB size.
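To make the CPB part of the buffering model concrete, the following is a minimal leaky-bucket sketch that uses three of the five HRD parameters (initial CPB removal delay, CPB size, and bit rate). It is a simplified illustration, not the HRD as specified in AVC or HEVC: it assumes a constant bit rate, instantaneous removal of each picture at a fixed picture interval, and it ignores the DPB entirely. The names `CpbParams` and `check_cpb` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CpbParams:
    bit_rate: float               # bits per second, assumed constant (CBR)
    cpb_size: float               # CPB capacity in bits
    initial_removal_delay: float  # seconds from first arrival to first removal


def check_cpb(params: CpbParams, picture_bits: List[int],
              picture_interval: float) -> List[str]:
    """Simplified leaky-bucket check of a coded picture buffer.

    Bits enter the CPB continuously at params.bit_rate starting at t = 0;
    picture n is removed instantaneously at
    initial_removal_delay + n * picture_interval.
    """
    total_bits = sum(picture_bits)
    removed_bits = 0
    violations: List[str] = []
    for n, size in enumerate(picture_bits):
        t_removal = params.initial_removal_delay + n * picture_interval
        # Bits that have arrived by the removal time (arrival stops at end of stream).
        arrived = min(params.bit_rate * t_removal, total_bits)
        # Fullness peaks just before a removal, so checking here catches overflow.
        fullness = arrived - removed_bits
        if fullness > params.cpb_size:
            violations.append(f"overflow at picture {n}")
        # Underflow: picture n has not fully arrived when it must be removed.
        if removed_bits + size > arrived:
            violations.append(f"underflow at picture {n}")
        removed_bits += size
    return violations
```

For example, with a 1000 bit/s channel, a 2000-bit CPB, and a 1 s initial removal delay, three 800-bit pictures at one picture per second produce no violations, while a 1500-bit first picture underflows because only 1000 bits have arrived when it must be removed.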

In AVC and HEVC, bitstream conformance and decoder conformance are specified as parts of the HRD specification. Although the name "hypothetical reference decoder" contains the word "decoder," the HRD is typically needed at the encoder side to guarantee bitstream conformance and is typically not needed at the decoder side. Two types of bitstream or HRD conformance are specified, namely Type I and Type II. Two types of decoder conformance are also specified, namely output timing decoder conformance and output order decoder conformance.

In HEVC WD9, HRD operation requires parameters that are signaled in the hrd_parameters() syntax structure, the buffering period supplemental enhancement information (SEI) message, the picture timing SEI message, and, in some cases, the decoding unit information SEI message. The hrd_parameters() syntax structure can be signaled in a video parameter set (VPS), a sequence parameter set (SPS), or any combination thereof.

In HEVC WD9, the hrd_parameters() syntax structure contains syntax elements for signaling video timing information, including the time scale and the number of time units in a clock tick. The video usability information (VUI) portion of the SPS contains a flag that indicates whether the picture order count (POC) value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence; if so, the VUI may also signal the number of clock ticks corresponding to a difference of one in POC values.

The relevant syntax and semantics in HEVC WD9 are as follows. Table 1 shows an example video parameter set raw byte sequence payload (RBSP) syntax structure according to WD9.

Table 1: Example video parameter set RBSP syntax structure

In Table 1 above, the syntax element vps_num_hrd_parameters specifies the number of hrd_parameters() syntax structures present in the video parameter set RBSP. In bitstreams conforming to this version of this specification, the value of vps_num_hrd_parameters shall be less than or equal to 1. Although HEVC WD9 requires the value of vps_num_hrd_parameters to be less than or equal to 1, decoders shall allow other values of vps_num_hrd_parameters in the range of 0 to 1024, inclusive, to appear in the syntax.

The syntax element hrd_op_set_idx[i] specifies the index, into the list of operation point sets specified by the video parameter set (VPS), of the operation point set to which the i-th hrd_parameters() syntax structure in the VPS applies. In bitstreams conforming to this version of this specification, the value of hrd_op_set_idx[i] shall be equal to 0. Although HEVC WD9 requires the value of hrd_op_set_idx[i] to be equal to 0, decoders shall allow other values of hrd_op_set_idx[i] in the range of 0 to 1023, inclusive, to appear in the syntax.

The syntax element cprms_present_flag[i] equal to 1 specifies that the HRD parameters common to all sub-layers are present in the i-th hrd_parameters() syntax structure in the video parameter set. cprms_present_flag[i] equal to 0 specifies that the HRD parameters common to all sub-layers are not present in the i-th hrd_parameters() syntax structure in the video parameter set and are derived to be the same as those in the (i-1)-th hrd_parameters() syntax structure in the video parameter set. cprms_present_flag[0] is inferred to be equal to 1.
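Because Table 1 itself is not reproduced here, the following parsing sketch assumes a loop order inferred from the semantics above: vps_num_hrd_parameters, then, for each entry, hrd_op_set_idx[i], then cprms_present_flag[i] for i > 0 only (since cprms_present_flag[0] is inferred to be 1), then hrd_parameters(). The ue(v) and u(1) descriptors, and the position of this loop within the VPS, are assumptions. `BitReader`, `parse_vps_hrd_loop`, and the `parse_hrd_parameters` callback are hypothetical names; the callback stands in for the full hrd_parameters() syntax and is responsible for copying the common parameters from entry i-1 when the flag is 0.

```python
from typing import Callable, List, Tuple


class BitReader:
    """MSB-first bit reader supporting u(n) and ue(v) descriptors."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


# Hypothetical callback: (reader, common_params_present) -> parsed structure.
HrdParser = Callable[[BitReader, bool], object]


def parse_vps_hrd_loop(r: BitReader, parse_hrd_parameters: HrdParser
                       ) -> List[Tuple[int, int, object]]:
    num_hrd = r.ue()                      # vps_num_hrd_parameters
    if num_hrd > 1024:                    # decoders must tolerate 0..1024
        raise ValueError("vps_num_hrd_parameters out of range")
    entries = []
    for i in range(num_hrd):
        op_set_idx = r.ue()               # hrd_op_set_idx[i]
        if op_set_idx > 1023:             # decoders must tolerate 0..1023
            raise ValueError("hrd_op_set_idx out of range")
        # cprms_present_flag[0] is not read; it is inferred to be 1.
        cprms = 1 if i == 0 else r.u(1)
        hrd = parse_hrd_parameters(r, cprms == 1)
        entries.append((op_set_idx, cprms, hrd))
    return entries
```

Note that the parser deliberately accepts vps_num_hrd_parameters values above 1, as the semantics require of decoders, and only rejects values outside 0 to 1024.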

Table 2 below shows the VUI parameters syntax structure according to WD9.

Table 2: VUI parameters syntax structure

In Table 2 above, the syntax element hrd_parameters_present_flag equal to 1 specifies that the hrd_parameters() syntax structure is present in the vui_parameters() syntax structure. hrd_parameters_present_flag equal to 0 specifies that the hrd_parameters() syntax structure is not present in the vui_parameters() syntax structure.

The syntax element poc_proportional_to_timing_flag equal to 1 indicates that the picture order count value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence. poc_proportional_to_timing_flag equal to 0 indicates that the picture order count value of each picture in the coded video sequence (other than the first picture in decoding order) may or may not be proportional to the output time of the picture relative to the output time of the first picture in the coded video sequence.

The syntax element num_ticks_poc_diff_one_minus1 plus 1 specifies the number of clock ticks corresponding to a difference of picture order count values equal to 1.
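When poc_proportional_to_timing_flag is equal to 1, the semantics above let a receiver derive each picture's output time from its POC value alone. The sketch below assumes the flag is 1, that the POC list is given in decoding order with the first entry being the first picture of the coded video sequence, and that num_units_in_tick and time_scale are available from the applicable hrd_parameters() structure. The function name `relative_output_times` is hypothetical; `Fraction` is used so that the results are exact rather than rounded floating-point values.

```python
from fractions import Fraction
from typing import List


def relative_output_times(pocs_in_decoding_order: List[int],
                          num_units_in_tick: int,
                          time_scale: int,
                          num_ticks_poc_diff_one_minus1: int) -> List[Fraction]:
    """Output time of each picture, in seconds, relative to the first picture
    in decoding order, assuming poc_proportional_to_timing_flag == 1."""
    clock_tick = Fraction(num_units_in_tick, time_scale)   # seconds per tick
    ticks_per_poc_step = num_ticks_poc_diff_one_minus1 + 1
    poc_first = pocs_in_decoding_order[0]
    return [(poc - poc_first) * ticks_per_poc_step * clock_tick
            for poc in pocs_in_decoding_order]
```

For a hierarchical group of pictures decoded in POC order 0, 4, 2, 1, 3 with a 1/25 s clock tick and num_ticks_poc_diff_one_minus1 equal to 0, the derived output times are 0, 4/25, 2/25, 1/25, and 3/25 s, so no per-picture timing needs to be transmitted. A picture that precedes the first picture in output order would simply receive a negative relative time.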

Table 3 below shows an example HRD parameters syntax structure according to WD9.

Table 3: Example HRD parameters syntax structure

In Table 3 above, the syntax element timing_info_present_flag equal to 1 specifies that num_units_in_tick and time_scale are present in the hrd_parameters() syntax structure. timing_info_present_flag equal to 0 specifies that num_units_in_tick and time_scale are not present in the hrd_parameters() syntax structure. When timing_info_present_flag is not present, its value is inferred to be equal to 0.

The syntax element num_units_in_tick is the number of time units of a clock operating at the frequency time_scale Hz that corresponds to one increment (called a clock tick) of a clock tick counter. The value of num_units_in_tick shall be greater than 0. A clock tick is the minimum interval of time that can be represented in the coded data when sub_pic_cpb_params_present_flag is equal to 0. For example, when the picture rate of a video signal is 25 Hz, time_scale may be equal to 27,000,000 and num_units_in_tick may be equal to 1,080,000.
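The 25 Hz example can be checked directly: the clock tick is the ratio of the two syntax elements, which here equals exactly one picture period.

```latex
t_c = \frac{\texttt{num\_units\_in\_tick}}{\texttt{time\_scale}}
    = \frac{1\,080\,000}{27\,000\,000}\ \text{s}
    = 0.04\ \text{s}
    = \frac{1}{25}\ \text{s}
```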

The syntax element time_scale is the number of time units that pass in one second. For example, a time coordinate system that measures time using a 27 MHz clock has a time_scale of 27,000,000. The value of time_scale shall be greater than 0.
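A clock tick is num_units_in_tick time units of a time_scale Hz clock, so its duration in seconds is the ratio of the two values. The minimal sketch below reproduces the 25 Hz example. The helper name is hypothetical, and exact rational arithmetic is used to avoid floating-point rounding.

```python
from fractions import Fraction

def clock_tick_seconds(num_units_in_tick: int, time_scale: int) -> Fraction:
    """Length of one clock tick in seconds: num_units_in_tick / time_scale."""
    if num_units_in_tick <= 0 or time_scale <= 0:
        raise ValueError("num_units_in_tick and time_scale must both be greater than 0")
    return Fraction(num_units_in_tick, time_scale)

tick = clock_tick_seconds(1_080_000, 27_000_000)
print(tick, 1 / tick)  # 1/25 25
```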

Timing signaling as specified in HEVC WD9 and described above can present several problems. First, the condition for signaling the syntax element num_ticks_poc_diff_one_minus1 is "if( poc_proportional_to_timing_flag && timing_info_present_flag )". This condition depends on two signaled syntax elements, poc_proportional_to_timing_flag and timing_info_present_flag. However, the HEVC WD9 specification does not make clear which timing_info_present_flag the condition uses. It could be the timing_info_present_flag of the hrd_parameters() syntax structure in the VUI part of the referenced SPS (if present), or the timing_info_present_flag of an hrd_parameters() syntax structure in the VPS.

In addition, multiple layers or multiple possible bitstream subsets of a scalable video bitstream may share common values of the time scale and the number of units in a clock tick. HEVC WD9 specifies these values in the time_scale and num_units_in_tick syntax elements of the hrd_parameters() syntax structure. The common values may, for example, be signaled repeatedly in the VUI part of the SPS and in the VPS. Such repetition, when present in a bitstream, wastes bits.

Furthermore, if picture order count (POC) values are proportional to output times for any one of several layers of a scalable video bitstream, the POC values are typically proportional to output times for all layers of the scalable video bitstream. However, the HEVC WD9 specification provides no way to signal, in a scalable video bitstream, that POC values are proportional to output times for all layers, or for all possible bitstream subsets, of the scalable video bitstream. Here, a "layer" of a scalable video bitstream may refer to a scalable layer, a texture view, and/or a depth view. Additionally, HEVC WD9 specifies that the flag poc_proportional_to_timing_flag is always signaled in the VUI syntax structure of an SPS. However, the flag has no effect when the syntax elements time_scale and num_units_in_tick are not signaled in the bitstream.
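The proportionality property can be checked directly from (POC, output time) pairs. The sketch below makes two assumptions: each layer is given as a list of (POC, output time) pairs, and the first entry in each list is the first picture in decoding order. Both function names are hypothetical. A bitstream-level indication of the kind described above as missing would amount to asserting that all_layers_proportional holds.

```python
from fractions import Fraction

def poc_proportional_to_timing(pictures):
    """pictures: (poc, output_time) pairs; first entry = first picture in decoding order.

    Returns True if one constant k satisfies t - t0 == k * (poc - poc0)
    for every picture in the list.
    """
    poc0, t0 = pictures[0]
    k = None
    for poc, t in pictures[1:]:
        d_poc, d_t = poc - poc0, Fraction(t - t0)
        if d_poc == 0:
            if d_t != 0:
                return False
            continue
        ratio = d_t / d_poc
        if k is None:
            k = ratio          # first observed ratio fixes the constant
        elif ratio != k:
            return False       # a different ratio breaks proportionality
    return True

def all_layers_proportional(layers):
    """True if the POC/output-time relation is proportional in every layer."""
    return all(poc_proportional_to_timing(layer) for layer in layers)
```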

The techniques of this disclosure may address one or more of the problems above and provide other improvements, enabling efficient signaling of parameters for HRD operation. Various examples of the techniques are described herein with reference to HEVC WD9 and potential improvements to it. The solutions are applicable to any video coding standard that includes a specification of a video buffering model, such as AVC and HEVC. For purposes of illustration, however, the description focuses on HRD parameter signaling as defined in HEVC WD9 and modified according to the techniques of this disclosure.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that generates encoded video data to be decoded at a later time by a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" tablet computers, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via link 16. Link 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, link 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

Alternatively, encoded data may be output from output interface 22 to a storage device 34. Similarly, encoded data may be accessed from storage device 34 by input interface 28. Storage device 34 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 34 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 12. Destination device 14 may access stored video data from storage device 34 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 34 may be a streaming transmission, a download transmission, or a combination of both.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications. Examples include over-the-air television broadcasts, cable television transmissions, satellite television transmissions, and streaming video transmissions (e.g., via the Internet). They also include encoding digital video for storage on a data storage medium, decoding digital video stored on a data storage medium, and other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes a video source 18, video encoder 20, and an output interface 22. In some cases, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general and may be applied to wireless and/or wired applications.

The captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored onto storage device 34 for later access by destination device 14 or other devices, for decoding and/or playback.

Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some cases, input interface 28 may include a receiver and/or a modem. Input interface 28 of destination device 14 receives the encoded video data over link 16. The encoded video data communicated over link 16, or provided on storage device 34, may include a variety of syntax elements generated by video encoder 20. A video decoder, such as video decoder 30, uses these syntax elements when decoding the video data. Such syntax elements may be included with the encoded video data whether it is transmitted on a communication medium, stored on a storage medium, or stored on a file server.

Display device 32 may be integrated with, or external to, destination device 14. In some examples, destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user and may comprise any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard (alternatively referred to as MPEG-4 Part 10, Advanced Video Coding (AVC)), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263.

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder. They may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some examples, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium. The device may then execute those instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, for example, ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks, or largest coding units (LCUs), that include both luma and chroma samples. A treeblock has a purpose similar to that of a macroblock of the H.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as the root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block. Syntax data associated with a coded bitstream may define the maximum number of times a treeblock may be split, and may also define the minimum size of the coding nodes.
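The recursive treeblock-to-CU splitting described above can be sketched as a short recursive function. This is an illustration only. The should_split argument stands in for whatever decision an encoder makes, and min_cu_size plays the role of the minimum coding-node size carried in the syntax data. Both names are hypothetical.

```python
def split_treeblock(x, y, size, min_cu_size, should_split):
    """Recursively split a treeblock into leaf CUs, returned as (x, y, size).

    should_split(x, y, size) is a caller-supplied decision (hypothetical here;
    a real encoder would typically decide by rate-distortion cost).
    """
    if size > min_cu_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four children in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_treeblock(x + dx, y + dy, half, min_cu_size, should_split)
        return leaves
    return [(x, y, size)]  # leaf node: a coding node

# Split a 64x64 treeblock twice along its top-left corner.
cus = split_treeblock(0, 0, 64, 8, lambda x, y, s: (x, y) == (0, 0) and s > 16)
```

With a 64×64 treeblock and a minimum CU size of 8×8, the recursion can split at most log2(64/8) = 3 times. This corresponds to the maximum split count that the syntax data would define in this configuration.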

A CU includes a coding node and the prediction units (PUs) and transform units (TUs) associated with the coding node. The size of the CU generally corresponds to the size of the coding node, and the CU must be square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock, with a maximum of 64×64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, the following:

- a horizontal component of the motion vector;
- a vertical component of the motion vector;
- a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision);
- a reference frame to which the motion vector points;
- a reference picture list for the motion vector (e.g., List 0, List 1, or List C).

In general, a TU is used for the transform and quantization processes. A given CU having one or more PUs may also include one or more TUs. Following prediction, video encoder 20 may calculate residual values from the video block identified by the coding node in accordance with the PUs. The coding node is then updated to reference the residual values rather than the original video block. The residual values comprise pixel difference values. These values may be transformed into transform coefficients using the transforms and other transform information specified in the TUs. The coefficients may then be quantized and scanned to produce serialized transform coefficients for entropy coding. The coding node may once again be updated to refer to these serialized transform coefficients. This disclosure typically uses the term "video block" to refer to a coding node of a CU. In some specific cases, this disclosure may also use the term "video block" to refer to a treeblock (i.e., an LCU) or a CU that includes a coding node and PUs and TUs.

A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more video pictures. A GOP may include syntax data that describes the number of pictures included in the GOP. This syntax data may appear in a header of the GOP, in a header of one or more of the pictures, or elsewhere. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra-prediction in PU sizes of 2N×2N or N×N, and inter-prediction in symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a 2N×2N CU that is partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom.
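The partition geometry above can be tabulated for a CU of size 2N×2N. In this sketch, each PU is given as (width, height), and mode names use an ASCII x in place of ×. N is assumed to be even so that the 0.5N dimension is an integer. pu_sizes is a hypothetical helper, not an HM function.

```python
def pu_sizes(mode: str, n: int):
    """(width, height) of each PU when a 2N x 2N CU uses the given partition mode.

    Assumes N is even so that the 25% split (0.5N) is an integer.
    """
    s = 2 * n          # CU width and height
    q = s // 4         # 0.5N: the 25% part of an asymmetric split
    table = {
        "2Nx2N": [(s, s)],
        "2NxN":  [(s, n), (s, n)],
        "Nx2N":  [(n, s), (n, s)],
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # 25% on top
        "2NxnD": [(s, s - q), (s, q)],   # 25% on bottom
        "nLx2N": [(q, s), (s - q, s)],   # 25% on left
        "nRx2N": [(s - q, s), (q, s)],   # 25% on right
    }
    return table[mode]

print(pu_sizes("2NxnU", 8))  # [(16, 4), (16, 12)]
```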

In this disclosure, "N×N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block has 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block need not have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N×M pixels, where M is not necessarily equal to N.

Following intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data to which the transforms specified by the TUs of the CU are applied. The residual data may correspond to pixel differences between pixels of the unencoded picture and the prediction values corresponding to the CU. Video encoder 20 may form the residual data for the CU and then transform the residual data to produce transform coefficients.
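The residual computation is an element-wise subtraction of the prediction from the original samples. The minimal sketch below uses hypothetical helper names and plain nested lists for 2-D blocks. It also includes the inverse (reconstruction) step, which recovers the original samples exactly only if the residual is not quantized.

```python
def residual_block(original, prediction):
    """Pixel-wise difference original - prediction for equally sized 2-D blocks."""
    if len(original) != len(prediction) or any(
            len(o) != len(p) for o, p in zip(original, prediction)):
        raise ValueError("blocks must have the same dimensions")
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

def reconstruct_block(prediction, residual):
    """Add a (lossless) residual back onto the prediction."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]

print(residual_block([[10, 12], [9, 8]], [[10, 10], [10, 10]]))  # [[0, 2], [-1, -2]]
```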

Following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
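Rounding an n-bit value down to m bits corresponds to discarding its n - m least significant bits. The sketch below shows that operation together with a simplified uniform scalar quantizer. Both helpers are hypothetical illustrations, and the quantizer is deliberately simpler than the quantization in an actual HEVC encoder.

```python
def round_down_bits(value: int, n: int, m: int) -> int:
    """Reduce an unsigned n-bit value to m bits by dropping the n - m least significant bits."""
    if n <= m:
        raise ValueError("n must be greater than m")
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    return value >> (n - m)

def quantize(coeff: int, step: int) -> int:
    """Simplified uniform scalar quantizer: truncate |coeff| / step toward zero."""
    level = abs(coeff) // step
    return -level if coeff < 0 else level

print(round_down_bits(517, 10, 8), quantize(-37, 10))  # 129 -3
```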

In some examples, video encoder 20 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector. Suitable entropy coding methods include context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
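One possible predefined scan order traverses the block along anti-diagonals. The sketch below is modeled on the up-right diagonal scan of the finalized HEVC standard, but for simplicity it is applied to a whole square block. The function name and the block[y][x] layout are assumptions.

```python
def up_right_diagonal_scan(block):
    """Serialize a square block along anti-diagonals, each traversed bottom-left to top-right.

    block[y][x] holds the coefficient in row y, column x.
    """
    n = len(block)
    out = []
    for d in range(2 * n - 1):           # anti-diagonal index: x + y == d
        for x in range(d + 1):
            y = d - x                     # y decreases as x increases (up-right)
            if x < n and y < n:
                out.append(block[y][x])
    return out

block = [[4 * y + x for x in range(4)] for y in range(4)]  # values 0..15, row-major
print(up_right_diagonal_scan(block))
# [0, 4, 1, 8, 5, 2, 12, 9, 6, 3, 13, 10, 7, 14, 11, 15]
```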

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
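H.264/AVC and HEVC use unsigned Exp-Golomb codes, ue(v), for many syntax elements. These codes illustrate the principle that smaller (typically more probable) values receive shorter codewords. Note that this is a fixed VLC, not the context-adaptive CAVLC scheme itself. The helper names are hypothetical.

```python
def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb codeword, ue(v), as a bit string."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    code = value + 1
    prefix_zeros = code.bit_length() - 1   # leading zeros = number of info bits
    return "0" * prefix_zeros + format(code, "b")

def ue_decode(bits: str):
    """Decode one ue(v) codeword from the start of bits; returns (value, bits_consumed)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = 2 * zeros + 1
    return int(bits[zeros:length], 2) - 1, length

print([ue_encode(v) for v in range(5)])  # ['1', '010', '011', '00100', '00101']
```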

Source device 12 may generate an encoded bitstream that includes syntax elements conforming to syntax structures in accordance with the techniques described in this disclosure. In some examples, video encoder 20 may generate an encoded bitstream that directly signals all of the variables defining the condition for signaling the number of clock ticks that corresponds to a difference of picture order count (POC) values equal to 1. These variables are signaled in a video parameter set (VPS) syntax structure for a coded video sequence, or in the video usability information (VUI) part of a sequence parameter set (SPS) syntax structure. In other words, video encoder 20 does not signal the syntax elements for this condition in another syntax structure (e.g., an HRD parameters syntax structure) incorporated into the VPS syntax structure or into the VUI part of the SPS syntax structure. Instead, video encoder 20 signals the syntax elements defining the condition in the VPS and/or VUI syntax structures themselves, without reference to any other syntax structure that may be incorporated into either or both of them. These syntax elements may include a timing_info_present_flag syntax element, which HEVC WD9 specifies as a syntax element of the HRD parameters syntax structure. As a result, the techniques may reduce, and potentially eliminate, this ambiguity in HEVC WD9 by clearly specifying in the syntax the source of the syntax elements that define the condition.
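A parser for syntax organized this way can read every element of the condition from the same structure, so no cross-structure lookup is needed. The fragment below is a hypothetical sketch of a VUI timing section. The field order, the u(32) and ue(v) descriptors, and the function names are assumptions made for illustration and are not taken from WD9.

```python
class BitReader:
    """Minimal MSB-first bit reader supporting u(n) and ue(v)."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            bit = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        zeros = 0
        while self.u(1) == 0:       # count leading zero bits up to the first 1
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def parse_vui_timing(r: BitReader) -> dict:
    """Hypothetical VUI timing fragment: every element of the condition for
    num_ticks_poc_diff_one_minus1 is read here, not from hrd_parameters()."""
    t = {"timing_info_present_flag": r.u(1)}
    if t["timing_info_present_flag"]:
        t["num_units_in_tick"] = r.u(32)
        t["time_scale"] = r.u(32)
        t["poc_proportional_to_timing_flag"] = r.u(1)
        if t["poc_proportional_to_timing_flag"]:
            t["num_ticks_poc_diff_one_minus1"] = r.ue()
    return t
```

In this sketch, poc_proportional_to_timing_flag is read only when timing information is present. As a result, the sketch does not send that flag in cases where it would have no effect.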

Video encoder 20 may test an encoded bitstream for conformance to requirements specified as one or more bitstream conformance tests. These tests are defined in a video coding specification (e.g., HEVC WD9) or a subsequent specification (e.g., HEVC WD10). Video encoder 20 may include, or otherwise use, a hypothetical reference decoder to test the conformance of the encoded bitstream. In accordance with the techniques described herein, video encoder 20 may test conformance by decoding the encoded bitstream to determine the syntax elements that define the condition for signaling the number of clock ticks corresponding to a difference of POC values equal to 1. Video encoder 20 determines these syntax elements from the VPS syntax structure of a coded video sequence or from the VUI part of an SPS syntax structure. If the condition is true according to the syntax element values, video encoder 20 may determine the number of clock ticks corresponding to a difference of POC values equal to 1. Video encoder 20 may then use the determined number of clock ticks as an input for, e.g., determining CPB underflow or overflow during decoding of the encoded pictures included in the encoded bitstream.

In some cases, at destination device 14, video decoder 30 under test (or VUT) may receive a representation of an encoded bitstream generated by video encoder 20. This bitstream directly signals all of the syntax elements that define the condition for signaling the number of clock ticks corresponding to a difference of picture order count (POC) values equal to 1. These syntax elements are signaled in the VPS syntax structure for the coded video sequence or in the VUI part of an SPS syntax structure. Video decoder 30 may decode the encoded bitstream to determine these syntax elements from the VPS syntax structure for the coded video sequence or from the VUI part of the SPS syntax structure. If the condition is true according to the syntax element values, video decoder 30 may determine the number of clock ticks corresponding to a difference of POC values equal to 1. Video decoder 30 may then use the determined number of clock ticks as an input for, e.g., determining CPB underflow or overflow during decoding of the encoded pictures included in the encoded bitstream.
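A greatly simplified leaky-bucket model can illustrate how the clock-tick count might feed a CPB check. In this model, bits arrive at a constant rate, each picture is removed instantaneously, and picture i is removed ticks_per_picture clock ticks after picture i - 1. All names, the seconds-based initial_delay, and the arrival model are assumptions. An actual HRD conformance test follows the buffering and timing rules of the specification rather than this sketch.

```python
from fractions import Fraction

def check_cpb(pic_bits, bit_rate, cpb_size, initial_delay, clock_tick, ticks_per_picture=1):
    """Very simplified CPB check (constant-rate arrival, instantaneous removal).

    Picture i is removed at initial_delay + i * ticks_per_picture * clock_tick
    seconds. Returns a list of ("underflow" | "overflow", i) events; an empty
    list means no violation was found under this model.
    """
    events = []
    total_bits = sum(pic_bits)
    removed = 0
    for i, bits in enumerate(pic_bits):
        t_remove = initial_delay + i * ticks_per_picture * clock_tick
        arrived = min(total_bits, bit_rate * t_remove)  # bits delivered so far
        if arrived - removed > cpb_size:                # fullness peaks just before removal
            events.append(("overflow", i))
        if arrived < removed + bits:                    # picture not fully received in time
            events.append(("underflow", i))
        removed += bits
    return events

print(check_cpb([150, 40, 40, 40], 1000, 250, Fraction(1, 5), Fraction(1, 25)))  # []
```

Assume a constant picture rate in which consecutive pictures differ by one POC step. Under that assumption, ticks_per_picture could be set to num_ticks_poc_diff_one_minus1 + 1.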

In some examples, video encoder 20 may generate an encoded bitstream that signals the time scale and the number of units in a clock tick at most once in each of the VPS syntax structure and the VUI syntax structure for a given coded video sequence. That is, in a given VPS syntax structure of the encoded bitstream, video encoder 20 may signal the time scale and number-of-units-in-clock-tick syntax elements at most once. Similarly, in a given VUI syntax structure of the encoded bitstream (e.g., the VUI portion of an SPS syntax structure), video encoder 20 may signal the time scale and number-of-units-in-clock-tick syntax elements at most once. As a result, video encoder 20 operating according to the techniques described herein may reduce the number of instances of the time scale syntax element (time_scale in WD9) and of the number-of-units-in-clock-tick syntax element (num_units_in_tick in WD9) in an encoded bitstream. Additionally, in some cases, video encoder 20 may generate an encoded bitstream that signals the time scale and the number of units in a clock tick directly in each of the VPS and VUI syntax structures for a given coded video sequence, rather than in an HRD parameters syntax structure incorporated within the VPS and/or VUI syntax structures.

According to the techniques described herein, video encoder 20 may test the conformance of an encoded bitstream that video encoder 20 generated to signal the time scale and the number of units in a clock tick at most once in each of the VPS and VUI syntax structures for a given coded video sequence. To do so, video encoder 20 may decode the encoded bitstream to determine the time scale and the number of units in a clock tick from the VPS syntax structure, in which the bitstream encodes the time scale and number-of-units-in-clock-tick syntax elements at most once. In some cases, video encoder 20 may instead decode the encoded bitstream to determine the time scale and the number of units in a clock tick from the VUI syntax structure, in which the bitstream likewise encodes these syntax elements at most once. The time scale and the number of units in a clock tick are not signaled in an HRD parameters syntax structure incorporated within the VPS and/or VUI syntax structures. Video encoder 20 may use the determined time scale and number of units in a clock tick as inputs for, e.g., determining CPB underflow or overflow during decoding of the encoded pictures included in the encoded bitstream.

In some cases, at destination device 14, video decoder 30 under test may receive a representation of an encoded bitstream that video encoder 20 generated to signal the time scale and the number of units in a clock tick at most once in each of the VPS and VUI syntax structures for a given coded video sequence. Video decoder 30 may decode the encoded bitstream to determine the time scale and the number of units in a clock tick from the VPS syntax structure, in which the bitstream encodes the time scale and number-of-units-in-clock-tick syntax elements at most once. In some cases, video decoder 30 may test the conformance of the encoded bitstream by decoding it to determine the time scale and the number of units in a clock tick from the VUI syntax structure, in which the bitstream likewise encodes these syntax elements at most once. The time scale and the number of units in a clock tick are not signaled in an HRD parameters syntax structure incorporated within the VPS and/or VUI syntax structures. Video decoder 30 may use the determined time scale and number of units in a clock tick as inputs for, e.g., determining CPB underflow or overflow during decoding of the encoded pictures included in the encoded bitstream.

In some examples, video encoder 20 may generate an encoded bitstream that signals, in the VPS syntax structure for one or more coded video sequences, a flag indicating whether the POC value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence. This flag may also be referred to as the POC-proportional-to-timing indication flag. As a result, video encoder 20 may reduce the number of instances of this indication in the timing information signaled for multiple layers of a coded video sequence and/or for a scalable video bitstream having multiple layers. In some cases, video encoder 20 may include this flag in the VPS syntax structure only if the time scale and number-of-units-in-clock-tick syntax elements are also included. In this way, video encoder 20 can avoid signaling this particular timing information (i.e., whether the POC value of each picture other than the first is proportional to its output time relative to the first picture) when the clock tick information needed to use the POC-proportional-to-timing indication is not present.

According to the techniques described herein, video encoder 20 may test the conformance of an encoded bitstream that video encoder 20 generated to signal the POC-proportional-to-timing indication flag in the VPS syntax structure for one or more coded video sequences. Video encoder 20 may test the conformance of the encoded bitstream by decoding it to determine the value of the flag. Video encoder 20 may additionally or alternatively test an encoded bitstream that video encoder 20 generated to signal the flag in the VPS syntax structure only if the time scale and number-of-units-in-clock-tick syntax elements are also included. Video encoder 20 may use the determined value of the POC-proportional-to-timing indication flag, together with the time scale and the number of units in a clock tick, as inputs for, e.g., determining CPB underflow or overflow during decoding of the encoded pictures included in the encoded bitstream.

In some cases, at destination device 14, video decoder 30 under test may receive a representation of an encoded bitstream that video encoder 20 generated to signal the POC-proportional-to-timing indication flag in the VPS syntax structure for one or more coded video sequences. Video decoder 30 may test the encoded bitstream for conformance by decoding it to determine the value of the flag. Video decoder 30 may additionally or alternatively test an encoded bitstream, generated by video encoder 20, that signals the flag in the VPS syntax structure only if the time scale and number-of-units-in-clock-tick syntax elements are also included. Video decoder 30 may use the determined value of the POC-proportional-to-timing indication flag, together with the time scale and the number of units in a clock tick, as inputs for, e.g., determining CPB underflow or overflow during decoding of the encoded pictures included in the encoded bitstream.

FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based compression modes.

In the example of FIG. 2, video encoder 20 includes partitioning unit 35, prediction module 41, reference picture memory 64, summer 50, transform module 52, quantization unit 54, and entropy encoding unit 56. Prediction module 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction module 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform module 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries and remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter.

As shown in FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning according to a quadtree structure of LCUs and CUs, for example. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction module 41 may select one of a plurality of possible coding modes for the current video block, such as one of a plurality of intra-coding modes or one of a plurality of inter-coding modes, based on error results (e.g., coding rate and level of distortion). Prediction module 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as part of a reference picture.

Intra-prediction module 46 within prediction module 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded, to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction module 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures, to provide temporal compression.

Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
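The two block-matching metrics named above can be sketched as follows. This is a minimal illustration that assumes blocks are lists of rows of integer sample values; the function names are hypothetical and are not taken from any reference encoder.

```python
def _check_same_shape(block_a, block_b):
    """Raise ValueError unless both 2-D blocks have identical dimensions."""
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have the same dimensions")


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    _check_same_shape(block_a, block_b)
    return sum(
        abs(a - b) for row_a, row_b in zip(block_a, block_b) for a, b in zip(row_a, row_b)
    )


def ssd(block_a, block_b):
    """Sum of squared differences; penalizes large errors more heavily than SAD."""
    _check_same_shape(block_a, block_b)
    return sum(
        (a - b) ** 2 for row_a, row_b in zip(block_a, block_b) for a, b in zip(row_a, row_b)
    )


current_pu = [[10, 12], [14, 16]]
candidate = [[11, 12], [13, 18]]
print(sad(current_pu, candidate))  # 4
print(ssd(current_pu, candidate))  # 6
```

SSD weights the single 2-sample error more heavily than SAD does. SAD is cheaper to compute and is often used inside the search loop.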

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
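A simple way to see how a motion vector arises from comparing the PU position with a predictive block position is an exhaustive integer-pel search. The sketch below is an assumption-laden illustration: it uses SAD as the cost, a square search window, and pictures stored as lists of rows. It omits the fractional-pel refinement and the faster search patterns that practical encoders use.

```python
def block_at(picture, x, y, w, h):
    """Extract the w x h block whose top-left sample is at column x, row y."""
    return [row[x:x + w] for row in picture[y:y + h]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def full_search(cur_pic, ref_pic, x, y, w, h, search_range):
    """Integer-pel full search; returns (mv_x, mv_y, sad) of the best candidate.

    The motion vector is the displacement of the predictive block in the
    reference picture relative to the PU position (x, y).
    """
    cur_block = block_at(cur_pic, x, y, w, h)
    ref_h, ref_w = len(ref_pic), len(ref_pic[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that would fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + w > ref_w or ry + h > ref_h:
                continue
            cost = sad(cur_block, block_at(ref_pic, rx, ry, w, h))
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Reference picture: one bright sample at column 4, row 3.
ref = [[0] * 6 for _ in range(6)]
ref[3][4] = 100
# Current picture: the same sample now sits at column 2, row 2.
cur = [[0] * 6 for _ in range(6)]
cur[2][2] = 100
print(full_search(cur, ref, x=2, y=2, w=2, h=2, search_range=2))  # (2, 1, 0)
```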

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form the residual data for the block and may include both luma and chroma difference components. Summer 50 represents the component or components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements 55 associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
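The subtraction performed by summer 50, and the matching addition performed by summer 62 during reconstruction, can be sketched on a single component plane as follows. This is a simplification: the residual here is not transformed or quantized, so reconstruction is exact. In the encoder of FIG. 2 the residual passes through transform module 52 and quantization unit 54, so the reconstructed block is generally only an approximation of the original.

```python
def residual_block(current, prediction):
    """Residual = current samples minus predictive samples (the role of summer 50)."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


def reconstruct_block(prediction, residual):
    """Prediction plus residual (the role of summer 62 for reference pictures)."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


current = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
res = residual_block(current, prediction)
print(res)  # [[2, 5], [1, -1]]
print(reconstruct_block(prediction, res) == current)  # True (no quantization here)
```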

Motion compensation unit 44 may generate syntax elements 55 that conform to syntax structures according to the techniques described in this disclosure. In some examples, video encoder 20 may generate syntax elements 55 to directly signal, in the video parameter set (VPS) syntax structure associated with the video block or in the video usability information (VUI) portion of a sequence parameter set (SPS) syntax structure, all syntax elements that define the condition for signaling the number of clock ticks corresponding to a difference in picture order count (POC) values equal to 1. In other words, rather than signaling the syntax elements for this condition in another syntax structure (e.g., an HRD parameters syntax structure) incorporated into the VPS syntax structure or the VUI portion of the SPS syntax structure, motion compensation unit 44 generates an encoded bitstream that signals the syntax elements defining the condition in the VPS and/or VUI syntax structures themselves, without reference to another syntax structure that may be incorporated into either or both of them.

In some examples, motion compensation unit 44 may generate syntax elements 55 to signal the time scale and the number of units in a clock tick at most once in each of the VPS and VUI syntax structures for a given coded video sequence. That is, in a given VPS syntax structure of the encoded bitstream, motion compensation unit 44 may generate syntax elements 55 that signal the time scale and number-of-units-in-clock-tick syntax elements at most once. Similarly, in a given VUI syntax structure of the encoded bitstream (e.g., the VUI portion of an SPS syntax structure), motion compensation unit 44 may generate syntax elements 55 that signal the time scale and number-of-units-in-clock-tick syntax elements at most once. Additionally, in some cases, motion compensation unit 44 may generate syntax elements 55 to signal the time scale and the number of units in a clock tick directly in each of the VPS and VUI syntax structures for a given coded video sequence, rather than in an HRD parameters syntax structure incorporated within the VPS and/or VUI syntax structures.

In some examples, motion compensation unit 44 may generate syntax elements 55 to signal, in the VPS syntax structure for one or more coded video sequences, a flag indicating whether the POC value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence. This flag may also be referred to as the POC-proportional-to-timing indication flag. As a result, motion compensation unit 44 may reduce the number of instances of this indication in the timing information signaled for multiple layers of a coded video sequence and/or for a scalable video bitstream having multiple layers. In some cases, motion compensation unit 44 may include this flag in the VPS syntax structure only if the time scale and number-of-units-in-clock-tick syntax elements are also included. In this manner, motion compensation unit 44 may avoid signaling this particular timing information (i.e., whether the POC value of each picture other than the first is proportional to its output time relative to the first picture) when the clock tick information needed to use the POC-proportional-to-timing indication is not present.
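The conditional nesting described above can be sketched as a bit writer in which the POC-proportional flag is written only after the tick information. Table 4 is not reproduced in this text, so several details are assumptions modeled on the VPS timing syntax of the later published H.265 specification: the field order, the descriptors (u(1), u(32), u(32), u(1), ue(v)), and the dependence of the tick count on the flag. The `BitWriter` class and the dictionary keys are hypothetical names used only for illustration.

```python
class BitWriter:
    """Minimal MSB-first bit writer holding bits in a list (illustrative only)."""

    def __init__(self):
        self.bits = []

    def u(self, value, n):
        """Write value as an n-bit unsigned integer, u(n)."""
        if not 0 <= value < (1 << n):
            raise ValueError(f"{value} does not fit in {n} bits")
        for shift in range(n - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def ue(self, value):
        """Write value as an unsigned Exp-Golomb code, ue(v)."""
        code = value + 1
        length = code.bit_length()
        self.u(0, length - 1)  # leading zero bits
        self.u(code, length)   # a '1' followed by the info bits


def write_vps_timing_info(writer, timing=None):
    """Write the assumed VPS timing fields; timing=None means 'not present'."""
    if timing is None:
        writer.u(0, 1)  # vps_timing_info_present_flag
        return
    writer.u(1, 1)  # vps_timing_info_present_flag
    writer.u(timing["num_units_in_tick"], 32)  # vps_num_units_in_tick
    writer.u(timing["time_scale"], 32)         # vps_time_scale
    poc_prop = timing["poc_proportional"]
    # The POC-proportional flag is reachable only when tick info is present.
    writer.u(1 if poc_prop else 0, 1)  # vps_poc_proportional_to_timing_flag
    if poc_prop:
        # vps_num_ticks_poc_diff_one_minus1, coded as ue(v)
        writer.ue(timing["num_ticks_poc_diff_one"] - 1)


w = BitWriter()
write_vps_timing_info(w, {"num_units_in_tick": 1_080_000, "time_scale": 27_000_000,
                          "poc_proportional": True, "num_ticks_poc_diff_one": 1})
print(len(w.bits))  # 67 = 1 + 32 + 32 + 1 + 1
```

Because the flag sits inside the `timing is not None` branch, a bitstream without tick information can never carry it. This mirrors the "only if the time scale and number of units are also included" behavior.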

Example changes to the HEVC WD9 text to implement the techniques described above for generating syntax elements 55 are as follows (portions not mentioned may be unchanged relative to HEVC WD9):

The following is an example of a video parameter set RBSP syntax structure modified to address one or more of the issues described above (underlined syntax is added to the video parameter set RBSP syntax structure of HEVC WD9; other syntax may be unchanged relative to HEVC WD9):

Table 4: Example video parameter set RBSP syntax structure

The syntax elements newly added in Table 4 have the following video parameter set (VPS) RBSP semantics:

vps_timing_info_present_flag equal to 1 specifies that vps_num_units_in_tick, vps_time_scale, and vps_poc_proportional_to_timing_flag are present in the video parameter set. vps_timing_info_present_flag equal to 0 specifies that vps_num_units_in_tick, vps_time_scale, and vps_poc_proportional_to_timing_flag are not present in the video parameter set.
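A decoder-side sketch of how vps_timing_info_present_flag gates the elements defined in this subsection follows. It carries the same caveat as any reconstruction of Table 4: the field order, the descriptors, and the assumption that vps_num_ticks_poc_diff_one_minus1 is present only when vps_poc_proportional_to_timing_flag equals 1 are borrowed from the later published H.265 VPS syntax. They are not read from the missing table. `BitReader` and `parse_vps_timing_info` are hypothetical names.

```python
class BitReader:
    """Minimal MSB-first reader over a list of 0/1 ints (illustrative only)."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def u(self, n):
        """Read an n-bit unsigned integer, u(n)."""
        value = 0
        for _ in range(n):
            if self.pos >= len(self.bits):
                raise ValueError("read past end of data")
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

    def ue(self):
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zero_bits = 0
        while self.u(1) == 0:
            leading_zero_bits += 1
        return (1 << leading_zero_bits) - 1 + self.u(leading_zero_bits)


def parse_vps_timing_info(reader):
    """Parse the assumed VPS timing fields into a dict of syntax element values."""
    info = {"vps_timing_info_present_flag": reader.u(1)}
    if info["vps_timing_info_present_flag"]:
        info["vps_num_units_in_tick"] = reader.u(32)
        info["vps_time_scale"] = reader.u(32)
        info["vps_poc_proportional_to_timing_flag"] = reader.u(1)
        # Condition for signaling the number of clock ticks per POC difference of 1.
        if info["vps_poc_proportional_to_timing_flag"]:
            info["vps_num_ticks_poc_diff_one_minus1"] = reader.ue()
    return info


def to_bits(value, n):
    """Helper for the example: value as a list of n bits, MSB first."""
    return [(value >> s) & 1 for s in range(n - 1, -1, -1)]


bits = [1] + to_bits(1_080_000, 32) + to_bits(27_000_000, 32) + [1] + [0, 1, 0]
info = parse_vps_timing_info(BitReader(bits))
print(info["vps_num_ticks_poc_diff_one_minus1"])  # 1, i.e. 2 clock ticks per POC step
```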

vps_num_units_in_tick is the number of time units of a clock operating at the frequency vps_time_scale Hz that corresponds to one increment (called a clock tick) of a clock tick counter. The value of vps_num_units_in_tick shall be greater than 0. A clock tick, in units of seconds, is equal to the quotient of vps_num_units_in_tick divided by vps_time_scale. For example, when the picture rate of a video signal is 25 Hz, vps_time_scale may be equal to 27,000,000 and vps_num_units_in_tick may be equal to 1,080,000, and consequently a clock tick may be 0.04 seconds.
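The clock-tick arithmetic in the example above can be checked with exact rational numbers, which avoids floating-point rounding in timing calculations. This is a minimal sketch. The function name is hypothetical, and the use of Python's `fractions.Fraction` is an implementation choice rather than anything the text prescribes.

```python
from fractions import Fraction


def clock_tick_seconds(num_units_in_tick, time_scale):
    """Clock tick in seconds = num_units_in_tick / time_scale, kept exact."""
    if num_units_in_tick <= 0 or time_scale <= 0:
        raise ValueError("num_units_in_tick and time_scale shall be greater than 0")
    return Fraction(num_units_in_tick, time_scale)


tick = clock_tick_seconds(1_080_000, 27_000_000)
print(tick, float(tick))  # 1/25 0.04
print(1 / tick)           # 25 -> one clock tick per picture at 25 Hz
```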

vps_time_scale is the number of time units that pass in one second. For example, a time coordinate system that measures time using a 27 MHz clock has a vps_time_scale of 27,000,000. The value of vps_time_scale shall be greater than 0.

vps_poc_proportional_to_timing_flag equal to 1 indicates that the picture order count value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence. vps_poc_proportional_to_timing_flag equal to 0 indicates that the picture order count value of each picture in the coded video sequence (other than the first picture in decoding order) may or may not be proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence.

vps_num_ticks_poc_diff_one_minus1 plus 1 specifies the number of clock ticks corresponding to a difference of picture order count values equal to 1. The value of vps_num_ticks_poc_diff_one_minus1 shall be in the range of 0 to 2^32-1, inclusive.
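When the POC-proportional flag equals 1, the semantics above imply a direct mapping from POC to output time. The output time of a picture relative to the first picture is its POC difference, multiplied by (vps_num_ticks_poc_diff_one_minus1 + 1) clock ticks, multiplied by the clock tick duration. The sketch below spells out that derivation. The function name and the example values are illustrative only.

```python
from fractions import Fraction


def relative_output_time(poc, first_poc, num_ticks_poc_diff_one_minus1,
                         num_units_in_tick, time_scale):
    """Output time (seconds) of a picture relative to the first picture of the CVS.

    Only meaningful when the POC-proportional-to-timing flag equals 1.
    """
    clock_tick = Fraction(num_units_in_tick, time_scale)
    ticks_per_poc_step = num_ticks_poc_diff_one_minus1 + 1
    return (poc - first_poc) * ticks_per_poc_step * clock_tick


# 1/25 s clock tick, one tick per POC step: POC 10 is output 0.4 s after POC 0.
print(relative_output_time(10, 0, 0, 1_080_000, 27_000_000))  # 2/5
# 1/50 s clock tick, two ticks per POC step: POC 3 is output 0.12 s after POC 0.
print(relative_output_time(3, 0, 1, 540_000, 27_000_000))     # 3/25
```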

The following is an example of a VUI parameters syntax structure modified to address one or more of the issues described above (underlined syntax is added to the VUI parameters syntax structure of HEVC WD9; syntax in italics is removed from the VUI parameters syntax structure of HEVC WD9; other syntax may be unchanged relative to HEVC WD9):

Table 5: Example modified VUI parameters syntax structure

The syntax elements newly added in Table 5 have the following VUI parameters semantics (the semantics of the removed syntax elements are likewise removed):

sps_timing_info_present_flag equal to 1 specifies that sps_num_units_in_tick, sps_time_scale, and sps_poc_proportional_to_timing_flag are present in the vui_parameters() syntax structure. sps_timing_info_present_flag equal to 0 specifies that sps_num_units_in_tick, sps_time_scale, and sps_poc_proportional_to_timing_flag are not present in the vui_parameters() syntax structure.

sps_num_units_in_tick is the number of time units of a clock operating at the frequency sps_time_scale Hz that corresponds to one increment (called a clock tick) of a clock tick counter. sps_num_units_in_tick shall be greater than 0. A clock tick, in units of seconds, is equal to the quotient of sps_num_units_in_tick divided by sps_time_scale. For example, when the picture rate of a video signal is 25 Hz, sps_time_scale may be equal to 27,000,000 and sps_num_units_in_tick may be equal to 1,080,000, and consequently a clock tick may be equal to 0.04 seconds (see Equation (1)). When vps_num_units_in_tick is present in the video parameter set referred to by the sequence parameter set, sps_num_units_in_tick, when present, shall be equal to vps_num_units_in_tick.

The equation for deriving the variable ClockTick (also referred to herein as the "clock tick") is modified as follows:

$$\mathrm{ClockTick} = \frac{\mathtt{sps\_num\_units\_in\_tick}}{\mathtt{sps\_time\_scale}} \qquad (1)$$

sps_time_scale is the number of time units that pass in one second. For example, a time coordinate system that measures time using a 27 MHz clock has an sps_time_scale of 27,000,000. The value of sps_time_scale shall be greater than 0. When vps_time_scale is present in the video parameter set referred to by the sequence parameter set, sps_time_scale, when present, shall be equal to vps_time_scale.

sps_poc_proportional_to_timing_flag equal to 1 indicates that the picture order count value of each picture in the coded video sequence (other than the first picture in decoding order) is proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence. sps_poc_proportional_to_timing_flag equal to 0 indicates that the picture order count value of each picture in the coded video sequence (other than the first picture in decoding order) may or may not be proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence. When vps_poc_proportional_to_timing_flag is present in the video parameter set referred to by the sequence parameter set, sps_poc_proportional_to_timing_flag, when present, shall be equal to vps_poc_proportional_to_timing_flag.

sps_num_ticks_poc_diff_one_minus1 plus 1 specifies the number of clock ticks corresponding to a difference of picture order count values equal to 1. The value of sps_num_ticks_poc_diff_one_minus1 shall be in the range of 0 to 2^32-1, inclusive. When vps_num_ticks_poc_diff_one_minus1 is present in the video parameter set referred to by the sequence parameter set, sps_num_ticks_poc_diff_one_minus1, when present, shall be equal to vps_num_ticks_poc_diff_one_minus1.
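Each SPS timing element carries the same constraint: when the corresponding VPS element is present in the referenced VPS, the SPS element, if present, must equal it. A conformance checker could verify this as sketched below. The sketch assumes that parsed syntax elements are held in plain dictionaries keyed by element name, with absent elements simply missing. That representation and the function name are hypothetical.

```python
TIMING_FIELDS = (
    "num_units_in_tick",
    "time_scale",
    "poc_proportional_to_timing_flag",
    "num_ticks_poc_diff_one_minus1",
)


def sps_vps_timing_mismatches(vps, sps):
    """Return the timing fields whose SPS value differs from the referenced VPS value.

    vps/sps map syntax element names (with 'vps_'/'sps_' prefixes) to values;
    a missing key means the element is not present. A constraint applies only
    when the element is present in both structures.
    """
    mismatches = []
    for field in TIMING_FIELDS:
        vps_value = vps.get("vps_" + field)
        sps_value = sps.get("sps_" + field)
        if vps_value is not None and sps_value is not None and vps_value != sps_value:
            mismatches.append(field)
    return mismatches


vps = {"vps_num_units_in_tick": 1_080_000, "vps_time_scale": 27_000_000}
sps = {"sps_num_units_in_tick": 1_080_000, "sps_time_scale": 25_000_000}
print(sps_vps_timing_mismatches(vps, sps))  # ['time_scale']
```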

The following is an example of an HRD parameters syntax structure modified to address one or more of the issues described above (syntax in italics is removed from the HRD parameters syntax structure of HEVC WD9):

Table 6: Example modified HRD parameters syntax structure

The semantics of the syntax elements removed from the HRD parameters syntax structure in the example of Table 6 are likewise removed.

As an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above, intra-prediction module 46 may intra-predict the current block. In particular, intra-prediction module 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction module 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction module 46 (or, in some examples, mode select unit 40) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction module 46 may use rate-distortion analysis to calculate rate-distortion values for the various tested intra-prediction modes, and may select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce it. It also determines the bit rate (that is, the number of bits) used to produce the encoded block. Intra-prediction module 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
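The text leaves the exact cost measure open, referring only to "ratios" of distortion and rate. A common alternative is the Lagrangian cost J = D + λ·R. This cost is assumed here and is not taken from this disclosure. The sketch below selects a mode using that cost, with the sum of squared errors as distortion. The candidate data and the function name are illustrative only.

```python
def select_intra_mode(original, candidates, lam):
    """Pick the candidate mode with the lowest cost J = D + lam * R.

    original:   list of sample values for the block
    candidates: dict mapping mode name -> (reconstructed samples, bits used)
    lam:        Lagrange multiplier trading distortion against rate
    """
    best_mode, best_cost = None, float("inf")
    for mode, (recon, bits) in candidates.items():
        # Distortion: sum of squared errors against the original block.
        distortion = sum((o - r) ** 2 for o, r in zip(original, recon))
        cost = distortion + lam * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

With λ = 1, a mode that costs one unit of distortion but saves 12 bits wins over a lossless but more expensive mode. With λ = 0, only distortion counts.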

In any case, after selecting an intra-prediction mode for a block, intra-prediction module 46 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode this information in accordance with the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream. The configuration data may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), and definitions of encoding contexts for various blocks. For each of the contexts, it may also include indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table.

After prediction module 41 generates the prediction block for the current video block via either inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform module 52. Transform module 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform module 52 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.
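The following is a minimal floating-point sketch of residual formation followed by a separable 2-D DCT-II. It is illustrative only, since practical codecs such as HEVC use integer approximations of the DCT. The function names are hypothetical.

```python
import math

def residual(current, prediction):
    """Residual block: current samples minus prediction samples."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]

def dct2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (floating point)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out
```

A flat residual concentrates all of its energy in the DC coefficient. For example, a 4x4 residual of constant value 2 yields a DC coefficient of 8 and zero AC coefficients.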

Transform module 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then scan the matrix of quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
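The sketch below illustrates scalar quantization controlled by a quantization parameter, followed by a coefficient scan. It makes three assumptions. First, it uses the H.264/HEVC-style relationship in which the step size doubles every 6 QP, with a step of 1 at QP 4. Second, it uses a rounding offset of 0.5, whereas real encoders often use a smaller dead-zone offset. Third, it uses an up-right diagonal scan order. The helper names are hypothetical.

```python
import math

def qstep(qp):
    """Approximate quantizer step size: doubles every 6 QP, equal to 1 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp, offset=0.5):
    """Sign-magnitude scalar quantization with a rounding offset."""
    step = qstep(qp)

    def q(c):
        level = math.floor(abs(c) / step + offset)
        return -level if c < 0 else level

    return [[q(c) for c in row] for row in coeffs]

def up_right_diagonal_scan(block):
    """Serialize an N x N block along anti-diagonals, bottom-left to top-right."""
    n = len(block)
    order = []
    for d in range(2 * n - 1):
        for y in range(min(d, n - 1), -1, -1):  # row index, moving up
            x = d - y                            # column index, moving right
            if x < n:
                order.append(block[y][x])
    return order
```

At QP 10 the step size is 2, so the coefficients [[8, -3], [1, 0]] quantize to [[4, -2], [1, 0]] and scan to [4, 1, -2, 0].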

Following quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. Following entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements for the current video slice being coded.
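CABAC and the other listed methods are too involved for a short example. As a simpler, related illustration, the sketch below produces the Exp-Golomb codes ue(v) and se(v). H.264/AVC and HEVC use these codes for many parameter-set syntax elements. In the published HEVC syntax, this includes the num_ticks_poc_diff_one_minus1 elements discussed in this document. Bits are returned as a '0'/'1' string for readability.

```python
def ue(value):
    """Unsigned Exp-Golomb code ue(v) as a string of '0'/'1' characters."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    code = bin(value + 1)[2:]            # binary representation of value + 1
    return "0" * (len(code) - 1) + code  # prefix: (bit length - 1) zeros

def se(value):
    """Signed Exp-Golomb se(v): maps k > 0 to 2k - 1 and k <= 0 to -2k."""
    return ue(2 * value - 1 if value > 0 else -2 * value)
```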

Inverse quantization unit 58 and inverse transform module 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a prediction block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block produced by motion compensation unit 44. This produces a reference block for storage in reference picture memory 64, sometimes called a decoded picture buffer (DPB). Motion estimation unit 42 and motion compensation unit 44 may use the reference block to inter-predict a block in a subsequent video frame or picture.
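The reconstruction performed by summer 62 can be sketched as a per-sample addition followed by clipping to the valid sample range. The 8-bit default and the function name are assumptions for illustration.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add the residual to the prediction and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(prediction, residual)]
```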

Video encoder 20 may optionally include a hypothetical reference decoder (HRD) 57, shown as optional by the use of dashed lines. HRD 57 checks the encoded bitstream produced by the elements of video encoder 20 for conformance to the buffer model defined for HRD 57. HRD 57 may check HRD conformance of Type I and/or Type II bitstreams or bitstream subsets. The parameters needed for the operation of HRD 57 are signaled in one of two types of HRD parameter sets: NAL HRD parameters and VCL HRD parameters. As described above, the HRD parameter sets may be incorporated into the SPS syntax structure and/or the VPS syntax structure.

HRD 57 may test video blocks and associated syntax elements 55 for conformance with the requirements of one or more bitstream conformance tests. These tests may be defined in a video coding specification (e.g., HEVC WD9) or a subsequent specification (e.g., HEVC WD10). For example, HRD 57 may test the conformance of the encoded bitstream by processing syntax elements 55. The processing determines the syntax elements that define the condition under which the number of clock ticks corresponding to a difference of POC values equal to 1 is signaled. These syntax elements are obtained from the VPS syntax structure of the coded video sequence or from the VUI portion of the SPS syntax structure. If the condition holds according to their values, HRD 57 may determine the number of clock ticks corresponding to a difference of POC values equal to 1. It may then use this number as an input for, e.g., determining CPB underflow or overflow during decoding of the coded pictures included in the encoded bitstream. As used herein with respect to syntax elements, the term “processing” may refer to extracting, decoding and extracting, reading, parsing, or any other applicable operation or combination of operations for obtaining the syntax elements in a form usable by a decoder or HRD 57.
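A minimal sketch of this conditional derivation follows. It assumes, as in the published HEVC VUI and VPS syntax, that the condition is the conjunction of a timing-information-present flag and the POC-proportional-to-timing flag. The function names are hypothetical, and the clock tick is passed in as an exact fraction of a second.

```python
from fractions import Fraction

def ticks_per_poc_diff_one(timing_info_present_flag, poc_proportional_to_timing_flag,
                           num_ticks_poc_diff_one_minus1=None):
    """Clock ticks for a POC difference of 1, or None if the value is not signaled."""
    if timing_info_present_flag and poc_proportional_to_timing_flag:
        if num_ticks_poc_diff_one_minus1 is None:
            raise ValueError("num_ticks_poc_diff_one_minus1 must be present")
        return num_ticks_poc_diff_one_minus1 + 1
    return None

def output_time(poc, first_poc, first_output_time, num_ticks, clock_tick):
    """Output time of a picture, assuming its POC is proportional to timing."""
    return first_output_time + (poc - first_poc) * num_ticks * clock_tick
```

For instance, with 2 ticks per POC difference and a clock tick of 1/50 s, a picture with POC 3 is output 3/25 s after a first picture with POC 0.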

As another example, HRD 57 may test the conformance of the encoded bitstream by decoding it to determine the time scale and the number of units in a clock tick from the VPS syntax structure of syntax elements 55. In that structure, the time scale and number-of-units-in-a-clock-tick syntax elements are encoded at most once. In some cases, HRD 57 may instead decode syntax elements 55 to determine these values from the VUI syntax structure of the encoded bitstream, where they are likewise encoded at most once. The time scale and the number of units in a clock tick may be omitted from the HRD parameter syntax structures incorporated into the VPS and/or VUI syntax structures. HRD 57 may use the determined time scale and number of units in a clock tick as inputs for, e.g., determining CPB underflow or overflow during decoding of the coded pictures included in the encoded bitstream.
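The sketch below illustrates signaling the timing fields at most once. It keeps a single TimingInfo record per parameter set, which all HRD parameter sets share. The clock tick is derived as num_units_in_tick / time_scale seconds. The class names and the simplified HRD fields (bit_rate, cpb_size) are hypothetical and do not reproduce the HEVC hrd_parameters( ) syntax.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class TimingInfo:
    """Timing fields signaled once, directly in the VPS (or VUI)."""
    num_units_in_tick: int
    time_scale: int  # time units per second, e.g. 27_000_000 for a 27 MHz clock

    def clock_tick(self) -> Fraction:
        # ClockTick = num_units_in_tick / time_scale, in seconds.
        return Fraction(self.num_units_in_tick, self.time_scale)

@dataclass
class HrdParameters:
    """Hypothetical HRD parameter set that carries no timing fields of its own."""
    bit_rate: int
    cpb_size: int

@dataclass
class ParameterSet:
    """One timing record shared by any number of HRD parameter sets."""
    timing: TimingInfo
    hrd: list = field(default_factory=list)
```

With a 27 MHz time scale and 1,080,000 units per tick, every HRD parameter set in the structure shares a clock tick of 1/25 s.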

According to the techniques described herein, HRD 57 may test the conformance of the encoded bitstream by decoding the value of the POC-proportional-to-timing indication flag from the VPS syntax structure of syntax elements 55 for one or more coded video sequences. HRD 57 may additionally or alternatively decode this flag from the VPS syntax structure only if the time scale and number-of-units-in-a-clock-tick syntax elements are also included there. HRD 57 may use the determined value of the flag, together with the time scale and the number of units in a clock tick, as inputs for, e.g., determining CPB underflow or overflow during decoding of the coded pictures included in the encoded bitstream.
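The conditional presence of the flag can be sketched as a small parser. The field order follows the timing portion of the VPS in the published HEVC specification: vps_timing_info_present_flag, vps_num_units_in_tick, vps_time_scale, vps_poc_proportional_to_timing_flag, and vps_num_ticks_poc_diff_one_minus1. All other VPS fields are omitted. The BitReader class is a hypothetical helper that operates on a '0'/'1' string.

```python
class BitReader:
    """Minimal MSB-first reader over a string of '0'/'1' characters (hypothetical)."""

    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def u(self, n):
        """Read an n-bit unsigned integer, u(n)."""
        val = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return val

    def ue(self):
        """Read an unsigned Exp-Golomb code, ue(v)."""
        zeros = 0
        while self.bits[self.pos] == "0":
            zeros += 1
            self.pos += 1
        self.pos += 1  # skip the terminating '1'
        suffix = self.u(zeros) if zeros else 0
        return (1 << zeros) - 1 + suffix

def parse_vps_timing(reader):
    """Parse VPS timing fields; the POC flag is read only when timing is present."""
    info = {"vps_timing_info_present_flag": reader.u(1)}
    if info["vps_timing_info_present_flag"]:
        info["vps_num_units_in_tick"] = reader.u(32)
        info["vps_time_scale"] = reader.u(32)
        info["vps_poc_proportional_to_timing_flag"] = reader.u(1)
        if info["vps_poc_proportional_to_timing_flag"]:
            info["vps_num_ticks_poc_diff_one_minus1"] = reader.ue()
    return info
```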

FIG. 3 is a block diagram illustrating an example video decoder 76 that may implement the techniques described in this disclosure. In the example of FIG. 3, video decoder 76 includes a coded picture buffer (CPB) 78, an entropy decoding unit 80, a prediction module 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a decoded picture buffer (DPB) 92. Prediction module 81 includes a motion compensation unit 82 and an intra-prediction module 84. In some examples, video decoder 76 may perform a decoding pass that is generally reciprocal to the encoding pass described with respect to video encoder 20 of FIG. 2. Video decoder 76 may represent an example instance of video decoder 30 of destination device 14 or of hypothetical reference decoder 57 of FIG. 2.

CPB 78 stores coded pictures from the coded picture bitstream. In one example, CPB 78 is a first-in, first-out buffer containing access units (AUs) in decoding order. An AU is a set of network abstraction layer (NAL) units with three properties: they are associated with each other according to a specified classification rule, they are consecutive in decoding order, and together they contain exactly one coded picture. Decoding order is the order in which pictures are decoded, and may differ from the order in which pictures are displayed (i.e., display order). The operation of the CPB may be specified by a hypothetical reference decoder (HRD).
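The sketch below shows, under strongly simplifying assumptions, how a CPB could be checked for underflow and overflow. Bits arrive at a constant bit rate starting at time 0, and each AU is removed instantaneously at its removal time in decoding order. This is not the normative HRD model, which includes, among other things, initial removal delays and variable bit rate operation. The function name and parameters are hypothetical.

```python
from fractions import Fraction

def check_cpb(au_sizes, removal_times, bit_rate, cpb_size):
    """Simplified constant-bit-rate CPB check.

    Returns a list of (au_index, 'underflow' | 'overflow') events.
    """
    events = []
    total_bits = sum(au_sizes)
    cumulative = 0  # bits of AUs 0..i that have entered the CPB
    for i, (size, t_r) in enumerate(zip(au_sizes, removal_times)):
        removed_before = cumulative
        cumulative += size
        final_arrival = Fraction(cumulative, bit_rate)
        if final_arrival > t_r:
            # AU i has not fully arrived by the time it must be decoded.
            events.append((i, "underflow"))
        arrived = min(Fraction(bit_rate) * Fraction(t_r), total_bits)
        if arrived - removed_before > cpb_size:
            # Occupancy just before removing AU i exceeds the buffer size.
            events.append((i, "overflow"))
    return events
```

For example, three 400-bit AUs arriving at 1000 bits/s and removed at 0.5 s, 1 s, and 1.5 s peak at 600 bits of occupancy. They therefore overflow a 550-bit buffer but not a 1000-bit one.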

During the decoding process, video decoder 76 receives from video encoder 20 an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements. Entropy decoding unit 80 of video decoder 76 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements 55. Entropy decoding unit 80 forwards the motion vectors and other syntax elements 55 to prediction module 81. Video decoder 76 may receive syntax elements 55 at the video slice level and/or the video block level. The encoded video bitstream may include timing information signaled according to the techniques described below. For example, it may include a video parameter set (VPS), a sequence parameter set (SPS), or any combination thereof. These parameter sets may have syntax structures that signal parameters for HRD operations according to the techniques described herein.

When the video slice is coded as an intra-coded (I) slice, intra-prediction module 84 of prediction module 81 may generate prediction data for a video block of the current video slice. It does so based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video slice is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 82 of prediction module 81 produces prediction blocks for a video block of the current video slice based on the motion vectors and other syntax elements 55 received from entropy decoding unit 80. The prediction blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 76 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in DPB 92.

Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements 55. It uses the prediction information to produce the prediction blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements 55 to determine the following:
- a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice;
- an inter-prediction slice type (e.g., B slice, P slice, or GPB slice);
- construction information for one or more of the reference picture lists for the slice;
- motion vectors and inter-prediction status for each inter-coded video block of the slice;
- other information used to decode the video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use the interpolation filters that video encoder 20 used during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine those filters from the received syntax elements 55 and use them to produce the prediction blocks.
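As an example of such a filter, the sketch below applies the 8-tap HEVC luma half-sample filter to one row of samples. The text does not specify which filters video encoder 20 uses, so this choice is an assumption. The sketch also pads edges by repeating samples, and it collapses HEVC's intermediate-precision stages into a single rounding and clipping step. The function name is hypothetical.

```python
# Coefficients of the HEVC luma half-sample interpolation filter (they sum to 64).
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)

def half_pel_row(samples, bit_depth=8):
    """Half-sample values between samples[i] and samples[i + 1] for one row."""
    n = len(samples)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            j = min(max(i - 3 + k, 0), n - 1)  # taps span samples i-3 .. i+4
            acc += tap * samples[j]
        # Normalize by 64 with rounding, then clip to the sample range.
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out
```

A flat row is reproduced unchanged. At a step edge from 0 to 64, the half-sample value midway between the two levels is 32.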

Inverse quantization unit 86 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may use a quantization parameter that video encoder 20 calculated for each video block in the video slice. This parameter determines the degree of quantization and, likewise, the degree of inverse quantization to apply. Inverse transform unit 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
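The decoder-side counterpart of quantization and transformation can be sketched as follows. The sketch assumes an approximate QP-to-step-size relationship (the step doubles every 6 QP and equals 1 at QP 4). It also assumes an orthonormal floating-point inverse DCT rather than the integer transforms used in practice. Function names are hypothetical.

```python
import math

def dequantize(levels, qp):
    """Scale quantized levels back by the approximate step size (floating point)."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return [[lvl * step for lvl in row] for row in levels]

def idct2d(coeffs):
    """Orthonormal 2-D inverse DCT (DCT-III) of an N x N coefficient block."""
    n = len(coeffs)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (alpha(u) * alpha(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = s
    return out
```

For example, a 4x4 block whose only nonzero level is 20 at DC, dequantized at QP 10 (step 2), gives a DC coefficient of 40. That coefficient inverse-transforms to a flat residual of 10.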

After motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements 55, video decoder 76 forms a decoded video block. It does so by summing the residual blocks from inverse transform unit 88 with the corresponding predictive blocks generated by motion compensation unit 82. Summer 90 represents the component or components that perform this summation. If desired, a deblocking filter may also be applied to the decoded blocks to remove blockiness artifacts. Other loop filters, either in the coding loop or after it, may also be used to smooth pixel transitions or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in DPB 92, which stores reference pictures used for subsequent motion compensation. DPB 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1. Similar to CPB 78, in one example, the operation of DPB 92 may be specified by a hypothetical reference decoder (HRD).

As described in this disclosure, encoder 20 and decoder 76 represent examples of devices configured to perform the techniques of this disclosure for signaling timing in a video coding process. Accordingly, the operations described in this disclosure for signaling timing may be performed by encoder 20, by decoder 76, or by both. In some cases, encoder 20 may signal timing information, and decoder 76 may receive that information, e.g., for use in defining one or more HRD features, characteristics, parameters, or conditions.

In some cases, video decoder 76 may be a video decoder under test (VUT). Video decoder 76 may receive a representation of an encoded bitstream generated by video encoder 20. This bitstream signals all of the syntax elements that define the condition under which the number of clock ticks corresponding to a difference of picture order count (POC) values equal to 1 is signaled. These syntax elements are signaled directly in the VPS syntax structure of syntax elements 55 for a coded video sequence or in the VUI portion of the SPS syntax structure. Video decoder 76 may decode the encoded bitstream to determine these syntax elements from the VPS syntax structure of the coded video sequence or from the VUI portion of the SPS syntax structure. If the condition holds according to their values, video decoder 76 may determine the number of clock ticks corresponding to a difference of POC values equal to 1. It may then use this number as an input for, e.g., determining CPB 78 underflow or overflow during decoding of the coded pictures included in the encoded bitstream.

In another example, video decoder 76 may receive a representation of an encoded bitstream generated by video encoder 20. This bitstream signals the time scale and the number of units in a clock tick at most once in each of the VPS and VUI syntax structures of syntax elements 55 for a given coded video sequence. Video decoder 76 may decode the encoded bitstream to determine these values from the VPS syntax structure, in which the corresponding syntax elements are encoded at most once. In some cases, video decoder 76 may test the conformance of the encoded bitstream by decoding these values from the VUI syntax structure, in which they are likewise encoded at most once. The time scale and the number of units in a clock tick may be omitted from the HRD parameter syntax structures incorporated into the VPS and/or VUI syntax structures. Video decoder 76 may use the determined time scale and number of units in a clock tick as inputs for, e.g., determining CPB 78 underflow or overflow during decoding of the coded pictures included in the encoded bitstream.

In another example, video decoder 76 may receive a representation of an encoded bitstream generated by video encoder 20 that signals a POC-proportional-to-timing indication flag in the VPS syntax structure of syntax elements 55 for one or more coded video sequences. Video decoder 76 may test the encoded bitstream for conformance by decoding it to determine the value of the flag. Video decoder 76 may additionally or alternatively test whether the bitstream signals the flag in the VPS syntax structure only when the time scale and number-of-units-in-a-clock-tick syntax elements are also included. Video decoder 76 may use the determined value of the flag, together with the time scale and the number of units in a clock tick, as inputs for, e.g., determining CPB 78 underflow or overflow during decoding of the coded pictures included in the encoded bitstream.

FIG. 4 is a block diagram illustrating an example coding structure 100 for a reference picture set. Coding structure 100 includes slices 102A-102E (collectively, "slices 102"). Picture order count 108, associated with coding structure 100, represents the output order of the corresponding slices in the reference picture set. For example, I slice 102A (POC value 0) is output first, and b slice 102B (POC value 1) is output second. Decoding order 110, associated with coding structure 100, represents the decoding order of the corresponding slices in the reference picture set. For example, I slice 102A (decoding order 1) is decoded first, and b slice 102B (decoding order 2) is decoded second.

Arrow 104 indicates the output times of pictures along a time continuum t. Time interval 106 represents the time interval corresponding to a difference of picture order count (POC) values equal to 1. Time interval 106 may comprise a number of clock ticks, which may depend on two quantities. The first is a time scale, corresponding, for example, to the frequency of an oscillator (e.g., 27 MHz) that defines the time coordinate system for the signaled information. The second is the number of time units of a clock operating at that time scale that corresponds to one increment of a clock tick counter (called a "clock tick"). According to the techniques described herein, video encoder 20 may generate a bitstream that directly signals the syntax elements defining the condition under which the number of clock ticks corresponding to a difference of POC values equal to 1 is signaled. These syntax elements are signaled in the video parameter set (VPS) syntax structure for a coded video sequence or in the video usability information (VUI) portion of the sequence parameter set (SPS) syntax structure.
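As a worked example with hypothetical values, the length of time interval 106 follows from the clock tick, defined as num_units_in_tick divided by time_scale:

$$
\Delta t_{\Delta\mathrm{POC}=1} = (\mathtt{num\_ticks\_poc\_diff\_one\_minus1} + 1)\cdot\frac{\mathtt{num\_units\_in\_tick}}{\mathtt{time\_scale}}
$$

Assume a 27 MHz time scale ($\mathtt{time\_scale} = 27\,000\,000$), $\mathtt{num\_units\_in\_tick} = 540\,000$, and $\mathtt{num\_ticks\_poc\_diff\_one\_minus1} = 1$. Then:

$$
\Delta t_{\Delta\mathrm{POC}=1} = 2\cdot\frac{540\,000}{27\,000\,000}\,\mathrm{s} = 2\cdot 0.02\,\mathrm{s} = 0.04\,\mathrm{s}
$$

Under these assumed values, and when POC is proportional to timing, a picture with POC 3 would be output $3 \cdot 0.04\,\mathrm{s} = 0.12\,\mathrm{s}$ after a first picture with POC 0.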

FIG. 5 is a flowchart illustrating an example method of operation according to the techniques described in this disclosure. Video encoder 20 encodes pictures of a video sequence to produce a coded video sequence (200). Video encoder 20 also generates parameter sets for the coded video sequence. The parameter sets may include parameters encoded according to a sequence parameter set (SPS) syntax structure and/or a video parameter set (VPS) syntax structure. According to the techniques described herein, video encoder 20 encodes the syntax elements for the number of units in a clock tick and the time scale directly into the VPS syntax structure and/or the SPS syntax structure of the coded video sequence (202). The term "directly" means that this encoding does not require incorporating a separate parameter set syntax structure that defines these syntax elements into the VPS or SPS syntax structure (as applicable). An example of such a separate structure is the hypothetical reference decoder (HRD) parameter set defined in HEVC WD9.

In addition, video encoder 20 encodes, directly into the VPS syntax structure and/or the SPS syntax structure of the coded video sequence, the condition under which the number of clock ticks corresponding to a difference of picture order count (POC) values equal to one is signaled (204). The condition may comprise one or more syntax elements representing variables of a Boolean formula. In that case, video encoder 20 may encode each such syntax element directly into the VPS syntax structure and/or the SPS syntax structure of the coded video sequence. Video encoder 20 outputs the coded video sequence along with its VPS syntax structure and/or SPS syntax structure (206). In some cases, video encoder 20 outputs these structures to the HRD of video encoder 20.

FIGS. 6A-6B are flowcharts illustrating example methods of operation according to the techniques described in this disclosure. In FIG. 6A, video encoder 20 encodes pictures of a video sequence to produce a coded video sequence (300). Video encoder 20 also generates a parameter set for the coded video sequence. The parameter set may include parameters encoded according to a video parameter set (VPS) syntax structure. According to the techniques described herein, video encoder 20 encodes the syntax elements for the number of units in a clock tick and the time scale directly, and at most once, into the VPS syntax structure of the coded video sequence (302). These syntax elements are encoded directly into the VPS syntax structure (at most once), rather than into an HRD parameter set or any other incorporated parameter set syntax structure. As a result, in some cases the VPS syntax structure may contain a single syntax element for each of the number of units in a clock tick and the time scale, even when it includes multiple instances of HRD parameters. Video encoder 20 outputs the coded video sequence along with its VPS syntax structure (304). In some cases, video encoder 20 outputs these structures to the HRD of video encoder 20.

In FIG. 6B, video encoder 20 encodes pictures of a video sequence to produce a coded video sequence (310). Video encoder 20 also generates a parameter set for the coded video sequence. The parameter set may include parameters encoded according to a sequence parameter set (SPS) syntax structure. According to the techniques described herein, video encoder 20 encodes the syntax elements for the number of units in a clock tick and the time scale directly, and at most once, into the SPS syntax structure of the coded video sequence (312). These syntax elements are encoded directly into the SPS syntax structure (at most once), rather than into an HRD parameter set or any other incorporated parameter set syntax structure. As a result, in some cases the SPS syntax structure may contain a single syntax element for each of the number of units in a clock tick and the time scale, even when it includes multiple instances of HRD parameters. Video encoder 20 outputs the coded video sequence along with its SPS syntax structure (314). In some cases, video encoder 20 outputs these structures to the HRD of video encoder 20. In some cases, video encoder 20 encodes the syntax elements for the number of units in a clock tick and the time scale into both the VPS syntax structure and the SPS syntax structure of the coded video sequence.

FIG. 7 is a flowchart illustrating an example method of operation according to the techniques described in this disclosure. Video encoder 20 encodes pictures of a video sequence to produce a coded video sequence (400). Video encoder 20 also generates a parameter set for the coded video sequence. The parameter set may include parameters encoded according to a video parameter set (VPS) syntax structure. If timing information is to be included, e.g., for defining an HRD buffering model ("yes" branch of 402), video encoder 20 encodes a syntax element directly into the VPS syntax structure of the coded video sequence whose value specifies whether the picture order count (POC) value of each picture in the coded video sequence, other than the first picture in decoding order, is proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence (404). This syntax element may be semantically similar to poc_proportional_to_timing_flag as defined in HEVC WD9. The timing information may represent the number of units in a clock tick and the time scale.

If the value of the syntax element is true ("yes" branch of 406), video encoder 20 also encodes a syntax element for the number of clock ticks corresponding to a difference in POC values equal to one (408). Because video encoder 20 encodes these syntax elements into the VPS, which is the highest-level parameter set and describes the overall characteristics of the coded video sequence, their values may apply to all layers, or to all possible bitstream subsets, of a scalable video bitstream.

If timing information is not to be included in the VPS syntax structure ("no" branch of 402), video encoder 20 encodes neither the syntax element indicating whether POC is proportional to the timing information nor the syntax element for the number of clock ticks corresponding to a difference in POC values equal to one. If POC is not proportional to the timing information (i.e., the value is false) ("no" branch of 406), video encoder 20 does not encode the syntax element for the number of clock ticks corresponding to a difference in POC values equal to one.

Video encoder 20 outputs the coded video sequence and the VPS syntax structure for the coded video sequence (410). In some cases, video encoder 20 outputs these structures to the HRD of video encoder 20.
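The branching of FIG. 7 can be sketched as follows, with the flowchart step numbers in the comments. The flag names vps_timing_info_present_flag and vps_poc_proportional_to_timing_flag come from the claims; coding the tick count as a "minus 1" Exp-Golomb value mirrors vps_num_ticks_poc_diff_one_minus1 in the published HEVC standard and is an assumption here, as are the `VpsTiming` class and the function name.

```python
from dataclasses import dataclass
from typing import List, Optional


class BitWriter:
    """Minimal MSB-first bit writer for illustration (assumes values fit)."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def u(self, value: int, n: int) -> None:
        self.bits.extend((value >> (n - 1 - i)) & 1 for i in range(n))

    def ue(self, value: int) -> None:
        code = value + 1
        self.bits.extend([0] * (code.bit_length() - 1))
        self.u(code, code.bit_length())


@dataclass
class VpsTiming:
    num_units_in_tick: int
    time_scale: int
    # None means POC is not proportional to output time (flag written as 0).
    num_ticks_poc_diff_one: Optional[int] = None


def write_vps_timing_info(w: BitWriter, timing: Optional[VpsTiming]) -> None:
    present = timing is not None
    w.u(1 if present else 0, 1)           # vps_timing_info_present_flag (402)
    if not present:
        return                            # neither POC-related element is written
    w.u(timing.num_units_in_tick, 32)
    w.u(timing.time_scale, 32)
    proportional = timing.num_ticks_poc_diff_one is not None
    w.u(1 if proportional else 0, 1)      # vps_poc_proportional_to_timing_flag (404)
    if proportional:                      # "yes" branch of 406
        # Tick count for a POC difference of one, coded minus 1 (408).
        w.ue(timing.num_ticks_poc_diff_one - 1)
```

Passing `None` writes a single zero bit, which matches the "no" branch of 402 skipping both POC-related elements.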

FIG. 8 is a flowchart illustrating an example method of operation according to the techniques described in this disclosure. A hypothetical reference decoder (HRD) 57 of video decoder 30 or of video encoder 20 (hereinafter the "decoder") receives a coded video sequence and a video parameter set (VPS) syntax structure and/or a sequence parameter set (SPS) syntax structure for the coded video sequence (500). The coded video sequence and/or the syntax structures may be encoded into a bitstream that includes one or more encoded pictures.

The decoder processes the VPS syntax structure and/or the SPS syntax structure to extract syntax elements, present directly in the VPS syntax structure and/or the SPS syntax structure, that specify the conditions under which the number of clock ticks corresponding to a difference in picture order count (POC) values equal to one is signaled (502). The conditions may include one or more syntax elements representing variables of a Boolean formula, in which case the decoder may process each such syntax element directly from the VPS syntax structure and/or the SPS syntax structure of the coded video sequence.

The decoder also processes the VPS syntax structure and/or the SPS syntax structure to extract the syntax elements for the number of units in a clock tick and the time scale directly from the VPS syntax structure and/or the SPS syntax structure of the coded video sequence (504). The decoder may then verify conformance of the coded video sequence to a video buffering model that is defined at least in part by the conditions and by the values of the number of units in a clock tick and the time scale, as extracted from the VPS syntax structure and/or the SPS syntax structure and read from the corresponding syntax elements (506).
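A decoder-side sketch of steps 502 and 504 follows: the timing-presence flag and the POC-proportionality flag act as the conditions that gate whether the remaining elements are read. The same parsing shape is assumed for the VPS and for the VUI portion of the SPS; the `BitReader` class, the dictionary keys, and the minus-1 coding of the tick count are assumptions, not the normative syntax of this disclosure.

```python
from typing import List, Optional


class BitReader:
    """Minimal MSB-first reader over a list of 0/1 ints (illustrative)."""

    def __init__(self, bits: List[int]) -> None:
        self.bits = bits
        self.pos = 0

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer, u(n)."""
        if self.pos + n > len(self.bits):
            raise EOFError("ran past the end of the bitstream")
        value = 0
        for bit in self.bits[self.pos:self.pos + n]:
            value = (value << 1) | bit
        self.pos += n
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_timing_info(r: BitReader) -> Optional[dict]:
    """Return the timing parameters, or None when they are absent."""
    if r.u(1) == 0:                      # timing-info-present flag (condition 1)
        return None
    info = {"num_units_in_tick": r.u(32), "time_scale": r.u(32)}  # step 504
    info["poc_proportional"] = bool(r.u(1))                         # condition 2
    # The tick count is read only when both conditions hold (502).
    info["num_ticks_poc_diff_one"] = (
        r.ue() + 1 if info["poc_proportional"] else None)
    return info
```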

FIGS. 9A-9B are flowcharts illustrating example methods of operation according to the techniques described in this disclosure. In FIG. 9A, a hypothetical reference decoder (HRD) 57 of video decoder 30 or of video encoder 20 (hereinafter the "decoder") receives a coded video sequence and a video parameter set (VPS) syntax structure for the coded video sequence (600). The coded video sequence and/or the VPS syntax structure may be encoded into a bitstream that includes one or more encoded pictures.

According to the techniques described herein, the decoder processes the VPS syntax structure to extract the syntax elements for the number of units in a clock tick and the time scale, which appear directly (and at most once) in the VPS syntax structure of the coded video sequence (602). The decoder may then verify conformance of the coded video sequence to a video buffering model defined at least in part by the values of the number of units in a clock tick and the time scale, as extracted from the VPS syntax structure and read from the corresponding syntax elements (604).

In FIG. 9B, the decoder receives a coded video sequence and a sequence parameter set (SPS) syntax structure for the coded video sequence (610). The coded video sequence and/or the SPS syntax structure may be encoded into a bitstream that includes one or more encoded pictures.

According to the techniques described herein, the decoder processes the SPS syntax structure to extract the syntax elements for the number of units in a clock tick and the time scale, which appear directly (and at most once) in the SPS syntax structure of the coded video sequence (612). The decoder may then verify conformance of the coded video sequence to a video buffering model defined at least in part by the values of the number of units in a clock tick and the time scale, as extracted from the SPS syntax structure and read from the corresponding syntax elements (614).

FIG. 10 is a flowchart illustrating an example method of operation according to the techniques described in this disclosure. In FIG. 10, a hypothetical reference decoder (HRD) 57 of video decoder 30 or of video encoder 20 (hereinafter the "decoder") receives a coded video sequence and a video parameter set (VPS) syntax structure for the coded video sequence (700). The coded video sequence and/or the VPS syntax structure may be encoded into a bitstream that includes one or more encoded pictures.

The decoder processes the VPS syntax structure to extract a syntax element that specifies whether the picture order count value of each picture in the coded video sequence, other than the first picture in decoding order, is proportional to the output time of that picture relative to the output time of the first picture in the coded video sequence (702). If the value of the syntax element is true, the decoder further processes the VPS syntax structure to extract a syntax element for the number of clock ticks corresponding to a difference in picture order count values equal to one (706). The decoder may then verify conformance of the coded video sequence to a video buffering model defined at least in part by the value of the number of clock ticks corresponding to a difference in picture order count values equal to one, as extracted from the VPS syntax structure and read from the corresponding syntax element (708).
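When the proportionality flag is true, each picture's output time follows from its POC: t(n) = t(first) + (POC(n) - POC(first)) · num_ticks_poc_diff_one · num_units_in_tick / time_scale, where "first" is the first picture in decoding order. The sketch below uses that relation to check a list of (POC, output time) pairs. It is one simple consistency check a conformance tester might run, not the full video buffering model, and the function name and input format are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def poc_timing_consistent(pictures: List[Tuple[int, Fraction]],
                          num_units_in_tick: int, time_scale: int,
                          num_ticks_poc_diff_one: int) -> bool:
    """Check output times against POC, assuming the proportionality flag is set.

    `pictures` holds (POC, output time in seconds) in decoding order; the
    first entry is the picture the relation is measured from.
    """
    if not pictures:
        return True
    clock_tick = Fraction(num_units_in_tick, time_scale)
    seconds_per_poc = num_ticks_poc_diff_one * clock_tick
    poc0, t0 = pictures[0]
    # Negative POC differences (e.g., leading pictures) are handled as well.
    return all(t - t0 == (poc - poc0) * seconds_per_poc
               for poc, t in pictures[1:])
```

With num_units_in_tick = 1001, time_scale = 60000, and two clock ticks per POC step, consecutive POC values are 1001/30000 s apart, i.e. a 29.97 Hz output cadence.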

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method for processing video data, the method comprising:

receiving a coded video sequence comprising encoded pictures of a video sequence; and

receiving timing parameters for the coded video sequence, the timing parameters including an indication, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

2. The method of claim 1, wherein the indication comprises the vps_poc_proportional_to_timing_flag syntax element.

3. The method of claim 1, wherein receiving the timing parameters for the coded video sequence further comprises:

receiving, in the VPS syntax structure, a number of clock ticks corresponding to a difference of one between POC values only if the indication indicates that the POC value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

4. The method according to claim 1,

wherein receiving the indication comprises receiving the indication only if syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

5. The method of claim 1, wherein receiving the timing parameters for the coded video sequence further comprises:

receiving an indication of whether syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

6. The method of claim 5, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure comprises the vps_timing_info_present_flag syntax element.

7. The method of claim 1, wherein receiving the timing parameters for the coded video sequence further comprises:

receiving, in a sequence parameter set (SPS) syntax structure referenced by the coded video sequence, a number of clock ticks corresponding to a difference of one between POC values only if the POC value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

8. The method of claim 1, wherein the indication comprises a first indication, and wherein receiving the timing parameters for the coded video sequence further comprises:

receiving a second indication in a video usability information (VUI) portion of a sequence parameter set (SPS) syntax structure referenced by the coded video sequence only if syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure, the second indication indicating whether the POC value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

9. The method of claim 8, wherein the second indication comprises the sps_poc_proportional_to_timing_flag syntax element.

10. The method of claim 1, wherein receiving the timing parameters for the coded video sequence further comprises:

receiving, in a video usability information (VUI) portion of a sequence parameter set (SPS) syntax structure referenced by the coded video sequence, an indication of whether syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure.

11. The method of claim 10, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure comprises the sps_timing_info_present_flag syntax element.

12. The method of claim 1, wherein receiving the coded video sequence comprises receiving a coded bitstream that includes a sequence of bits forming a representation of the encoded pictures, the method further comprising verifying conformance of the bitstream to a video buffering model of a coded picture buffer and a decoded picture buffer, the video buffering model being defined at least in part by the indication.

13. The method of claim 1, wherein the timing parameters comprise timing parameters for hypothetical reference decoder operations.

14. A method for encoding video data, the method comprising:

encoding pictures of a video sequence to generate a coded video sequence comprising the encoded pictures; and

signaling timing parameters for the coded video sequence by signaling, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, an indication of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

15. The method of claim 14, wherein the indication comprises the vps_poc_proportional_to_timing_flag syntax element.

16. The method of claim 14, wherein signaling the timing parameters for the coded video sequence further comprises:

signaling, in the VPS syntax structure, a number of clock ticks corresponding to a difference of one between POC values only when the POC value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

17. The method according to claim 14,

wherein signaling the indication comprises signaling the indication only when syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

18. The method of claim 14, wherein signaling the timing parameters for the coded video sequence further comprises:

signaling, in the VPS syntax structure, an indication of whether syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

19. The method of claim 18, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure comprises the vps_timing_info_present_flag syntax element.

20. The method of claim 14, wherein signaling the timing parameters for the coded video sequence further comprises:

signaling, in a video usability information (VUI) portion of a sequence parameter set (SPS) syntax structure referenced by the coded video sequence, a number of clock ticks corresponding to a difference of one between POC values only when the POC value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

21. The method of claim 14, wherein the indication comprises a first indication, and wherein signaling the timing parameters for the coded video sequence further comprises:

signaling a second indication in a video usability information (VUI) portion of a sequence parameter set (SPS) syntax structure referenced by the coded video sequence only when syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure, the second indication indicating whether the POC value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

22. The method of claim 21, wherein the second indication comprises the sps_poc_proportional_to_timing_flag syntax element.

23. The method of claim 14, wherein signaling the timing parameters for the coded video sequence further comprises:

signaling, in a video usability information (VUI) portion of a sequence parameter set (SPS) syntax structure referenced by the coded video sequence, an indication of whether syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure.

24. The method of claim 23, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure comprises the sps_timing_info_present_flag syntax element.

25. An apparatus for processing video data, the apparatus comprising:

a processor configured to:

26. The apparatus of claim 25, wherein the indication comprises the vps_poc_proportional_to_timing_flag syntax element.

27. The apparatus of claim 25, wherein, to receive the timing parameters for the coded video sequence, the processor is further configured to:

28. The apparatus according to claim 25,

wherein, to receive the indication, the processor is further configured to receive the indication only if syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

29. The apparatus of claim 25, wherein, to receive the timing parameters for the coded video sequence, the processor is further configured to receive an indication of whether syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

30. The apparatus of claim 29, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure comprises the vps_timing_info_present_flag syntax element.

31. The apparatus of claim 25, wherein, to receive the timing parameters for the coded video sequence, the processor is further configured to:

32. The apparatus of claim 25, wherein the indication comprises a first indication, and wherein, to receive the timing parameters for the coded video sequence, the processor is further configured to:

33. The apparatus of claim 32, wherein the second indication comprises the sps_poc_proportional_to_timing_flag syntax element.

34. The apparatus of claim 25, wherein, to receive the timing parameters for the coded video sequence, the processor is further configured to:

35. The apparatus of claim 34, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure referenced by the coded video sequence comprises the sps_timing_info_present_flag syntax element.

36. The apparatus of claim 25, wherein, to receive the coded video sequence, the processor is further configured to:

receive a coded bitstream that includes a sequence of bits forming a representation of the encoded pictures; and verify conformance of the bitstream to a video buffering model of a coded picture buffer and a decoded picture buffer, the video buffering model being defined at least in part by the indication.

37. The apparatus of claim 25, wherein the timing parameters comprise timing parameters for hypothetical reference decoder operations.

38. An apparatus for encoding video data, the apparatus comprising:

a processor configured to:

39. The apparatus of claim 38, wherein the indication comprises the vps_poc_proportional_to_timing_flag syntax element.

40. The apparatus of claim 38, wherein, to signal the timing parameters for the coded video sequence, the processor is further configured to:

41. The apparatus according to claim 38,

wherein, to signal the indication, the processor is further configured to signal the indication only when syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure.

42. The apparatus of claim 38, wherein, to signal the timing parameters for the coded video sequence, the processor is further configured to:

43. The apparatus of claim 42, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VPS syntax structure comprises the vps_timing_info_present_flag syntax element.

44. The apparatus of claim 38, wherein, to signal the timing parameters for the coded video sequence, the processor is further configured to:

45. The apparatus of claim 38, wherein the indication comprises a first indication, and wherein, to signal the timing parameters for the coded video sequence, the processor is further configured to:

46. The apparatus of claim 45, wherein the second indication comprises the sps_poc_proportional_to_timing_flag syntax element.

47. The apparatus of claim 38, wherein, to signal the timing parameters for the coded video sequence, the processor is further configured to:

48. The apparatus of claim 47, wherein the indication of whether the syntax elements for the time scale and the number of units in a clock tick are present in the VUI portion of the SPS syntax structure referenced by the coded video sequence comprises the sps_timing_info_present_flag syntax element.

49. An apparatus for processing video data, comprising:

means for receiving a coded video sequence comprising encoded pictures of a video sequence; and

means for receiving timing parameters for the coded video sequence, the timing parameters including an indication, in a video parameter set (VPS) syntax structure referenced by the coded video sequence, of whether the picture order count (POC) value of each picture in the coded video sequence that is not the first picture in the coded video sequence in decoding order is proportional to the time interval between the output time of the first picture in the coded video sequence and the output time of the picture.

50. A non-transitory computer-readable storage medium storing instructions for processing video data that, when executed, cause one or more processors to perform the following operations: