HK1220062B

HK1220062B - Methods and devices of multi-layer video file format designs

Info

Publication number: HK1220062B
Application number: HK16108012.9A
Authority: HK
Inventors: 王益魁; 陈颖; 阿达许．克里许纳．瑞玛苏布雷蒙尼安; 伏努．亨利
Original assignee: 高通股份有限公司
Priority date: 2013-10-23
Filing date: 2014-10-23
Publication date: 2020-01-03

Description

Methods and devices for multi-layer video file format designs

This application claims the benefit of U.S. Provisional Patent Application No. 61/894,886, filed October 23, 2013, the entire content of which is incorporated herein by reference.

Technical Field

This disclosure relates to video coding.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, and ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. By implementing such video compression techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents the pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
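The residual, quantization, and scanning steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular standard: the transform step is omitted, the quantizer is a plain division by a step size, and a zigzag order stands in for whatever scan a real codec would use. The function names `residual`, `quantize`, and `zigzag_scan` are hypothetical.

```python
def residual(original, predicted):
    """Per-sample difference between the original block and the predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def quantize(coeffs, step):
    """Illustrative quantizer: divide by the step size and truncate toward zero."""
    return [[int(c / step) for c in row] for row in coeffs]


def zigzag_scan(block):
    """Flatten a square 2-D block into a 1-D list along its anti-diagonals.

    Odd anti-diagonals are walked with the row index increasing, even ones
    with the column index increasing, which gives the familiar zigzag order.
    """
    n = len(block)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return [block[r][c] for r, c in order]
```

In a real encoder the residual would pass through a transform before quantization, and the resulting one-dimensional vector would be entropy coded; both steps are left out here to keep the sketch short.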

Summary

In general, this disclosure relates to storing video content in a file based on the International Organization for Standardization (ISO) base media file format (ISOBMFF). Some examples of this disclosure relate to methods for storing video streams containing multiple coded layers, where each layer may be a scalable layer, a texture view, a depth view, and so on, and the methods may apply to the storage of Multi-View High Efficiency Video Coding (MV-HEVC), Scalable HEVC (SHVC), three-dimensional HEVC (3D-HEVC), and other types of video data.

In one aspect, this disclosure describes a method of processing multi-layer video data, the method comprising: generating a file that comprises a media data box that encloses media content, the media content comprising a sequence of samples, each of the samples being an access unit of the multi-layer video data, wherein generating the file comprises: responsive to a determination that at least one access unit of a bitstream of the multi-layer video data includes a coded picture that has a picture output flag equal to a first value and a coded picture that has a picture output flag equal to a second value, using at least a first track and a second track to store the bitstream in the file, wherein: for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag; and pictures having picture output flags equal to the first value are allowed to be output, and pictures having picture output flags equal to the second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a method of processing multi-layer video data, the method comprising: obtaining, from a file, a first track box and a second track box, the first track box containing metadata for a first track in the file, the second track box containing metadata for a second track in the file, wherein: each of the first track and the second track comprises a sequence of samples, each of the samples being a video access unit of the multi-layer video data; for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of a picture output flag; and pictures having picture output flags equal to a first value are allowed to be output, and pictures having picture output flags equal to a second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a video device comprising: a data storage medium configured to store multi-layer video data; and one or more processors configured to: generate a file that comprises a media data box that encloses media content, the media content comprising a sequence of samples, each of the samples being an access unit of the multi-layer video data, wherein, to generate the file, the one or more processors are configured to: responsive to a determination that at least one access unit of a bitstream of the multi-layer video data includes a coded picture that has a picture output flag equal to a first value and a coded picture that has a picture output flag equal to a second value, use at least a first track and a second track to store the bitstream in the file, wherein: for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag; and pictures having picture output flags equal to the first value are allowed to be output, and pictures having picture output flags equal to the second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a video device comprising: a data storage medium configured to store multi-layer video data; and one or more processors configured to: obtain, from a file, a first track box and a second track box, the first track box containing metadata for a first track in the file, the second track box containing metadata for a second track in the file, wherein: each of the first track and the second track comprises a sequence of samples, each of the samples being a video access unit of the multi-layer video data; for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of a picture output flag; and pictures having picture output flags equal to a first value are allowed to be output, and pictures having picture output flags equal to a second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a video device comprising: means for generating a file that comprises a media data box that encloses media content, the media content comprising a sequence of samples, each of the samples being an access unit of multi-layer video data, wherein generating the file comprises: responsive to a determination that at least one access unit of a bitstream of the multi-layer video data includes a coded picture that has a picture output flag equal to a first value and a coded picture that has a picture output flag equal to a second value, using at least a first track and a second track to store the bitstream in the file, wherein: for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag; and pictures having picture output flags equal to the first value are allowed to be output, and pictures having picture output flags equal to the second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a video device comprising: means for obtaining, from a file, a first track box and a second track box, the first track box containing metadata for a first track in the file, the second track box containing metadata for a second track in the file, wherein: each of the first track and the second track comprises a sequence of samples, each of the samples being a video access unit of multi-layer video data; for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of a picture output flag; and pictures having picture output flags equal to a first value are allowed to be output, and pictures having picture output flags equal to a second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to: generate a file that comprises a media data box that encloses media content, the media content comprising a sequence of samples, each of the samples being an access unit of multi-layer video data, wherein, to generate the file, the instructions cause the one or more processors to: responsive to a determination that at least one access unit of a bitstream of the multi-layer video data includes a coded picture that has a picture output flag equal to a first value and a coded picture that has a picture output flag equal to a second value, use at least a first track and a second track to store the bitstream in the file, wherein: for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag; and pictures having picture output flags equal to the first value are allowed to be output, and pictures having picture output flags equal to the second value are allowed to be used as reference pictures but are not allowed to be output.

In another aspect, this disclosure describes a computer-readable data storage medium having instructions stored thereon that, when executed, cause one or more processors to: obtain, from a file, a first track box and a second track box, the first track box containing metadata for a first track in the file, the second track box containing metadata for a second track in the file, wherein: each of the first track and the second track comprises a sequence of samples, each of the samples being a video access unit of multi-layer video data; for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of a picture output flag; and pictures having picture output flags equal to a first value are allowed to be output, and pictures having picture output flags equal to a second value are allowed to be used as reference pictures but are not allowed to be output.

The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

Brief Description of the Drawings

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may use the techniques described in this disclosure.

FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

FIG. 4 is a block diagram illustrating an example set of devices that form part of a network.

FIG. 5 is a conceptual diagram illustrating an example structure of a file, in accordance with one or more techniques of this disclosure.

FIG. 6 is a conceptual diagram illustrating an example structure of a file, in accordance with one or more techniques of this disclosure.

FIG. 7 is a flowchart illustrating an example operation of a file generation device, in accordance with one or more techniques of this disclosure.

FIG. 8 is a flowchart illustrating an example operation in which a computing device performs random access and/or level switching, in accordance with one or more techniques of this disclosure.

FIG. 9 is a flowchart illustrating an example operation of a file generation device, in accordance with one or more techniques of this disclosure.

FIG. 10 is a flowchart illustrating an example operation of a computing device, in accordance with one or more techniques of this disclosure.

FIG. 11 is a flowchart illustrating an example operation of a file generation device, in accordance with one or more techniques of this disclosure.

FIG. 12 is a flowchart illustrating an example operation of a destination device, in accordance with one or more techniques of this disclosure.

Detailed Description

The ISO base media file format (ISOBMFF) is a file format for storing media data. ISOBMFF is extensible to support the storage of video data conforming to particular video coding standards. For example, ISOBMFF has previously been extended to support the storage of video data conforming to the H.264/AVC and High Efficiency Video Coding (HEVC) video coding standards. Furthermore, ISOBMFF has previously been extended to support the storage of video data conforming to the multi-view coding (MVC) and scalable video coding (SVC) extensions of H.264/AVC. MV-HEVC, 3D-HEVC, and SHVC are extensions of the HEVC video coding standard that support multi-layer video data. The features added to ISOBMFF for the storage of video data conforming to the MVC and SVC extensions of H.264/AVC are not sufficient for the effective storage of video data conforming to MV-HEVC, 3D-HEVC, and SHVC. In other words, various problems may arise if one attempts to reuse the ISOBMFF extensions designed for the MVC and SVC extensions of H.264/AVC to store video data conforming to MV-HEVC, 3D-HEVC, and SHVC.
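For readers unfamiliar with the container, an ISOBMFF file is a sequence of boxes, each beginning with a 32-bit big-endian size and a four-character type; a size of 1 signals that a 64-bit size follows, and a size of 0 means the box runs to the end of the enclosing data. The following Python sketch walks that structure. It is an illustration only: the helper names `iter_boxes` and `make_box` are hypothetical, and extended (`uuid`) types and full-box version/flags fields are not handled.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (test helper, hypothetical)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

Container boxes such as the movie box nest further boxes in their payload, so the same function can be called again on the `payload_start` and `box_end` range of a parent to reach a track box.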

For example, unlike a bitstream conforming to the MVC or SVC extension of H.264/AVC, a bitstream conforming to MV-HEVC, 3D-HEVC, or SHVC may include access units that contain both intra random access point (IRAP) pictures and non-IRAP pictures. An access unit containing both IRAP pictures and non-IRAP pictures may be used for random access in MV-HEVC, 3D-HEVC, and SHVC. However, ISOBMFF and its existing extensions provide no way of identifying such access units. This may hinder the ability of a computing device to perform random access and layer switching.

Hence, in accordance with one example of this disclosure, a computing device may generate a file that comprises a track box containing metadata for a track in the file. Media data for the track comprises a sequence of samples. Each of the samples may be a video access unit of multi-layer video data (e.g., MV-HEVC, 3D-HEVC, or SHVC video data). As part of generating the file, the computing device may generate, in the file, an additional box that documents all of the samples containing at least one IRAP picture. Being able to determine, from the information in the additional box, which samples contain IRAP pictures may enable a computing device that receives the file to perform random access and layer switching without parsing and interpreting NAL units. This may reduce complexity and processing time.
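To make the benefit concrete, the sketch below models the additional box as a sorted list of sample numbers. This in-memory form, the `is_irap` key, and both function names are assumptions made for illustration; the passage above does not specify the box syntax. A file writer records every sample with at least one IRAP picture, and a reader then finds a random access point with a binary search rather than by inspecting NAL units.

```python
from bisect import bisect_right


def build_irap_index(samples):
    """Return the 1-based numbers of samples holding at least one IRAP picture.

    Each sample is a list of per-picture dicts; the 'is_irap' key is a
    hypothetical stand-in for information the file writer already has.
    """
    return [number
            for number, pictures in enumerate(samples, start=1)
            if any(picture["is_irap"] for picture in pictures)]


def nearest_random_access_sample(irap_index, target):
    """Return the last documented sample at or before target, or None."""
    position = bisect_right(irap_index, target)
    return irap_index[position - 1] if position else None
```

Note that a sample qualifies even when only one of its layers carries an IRAP picture, which is the mixed case that the existing formats could not signal.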

Furthermore, multi-layer video data, such as MV-HEVC, 3D-HEVC, and SHVC video data, may include multiple coded pictures for each access unit. However, when an access unit contains multiple coded pictures, ISOBMFF and its existing extensions provide no information about the individual coded pictures within the access unit. Thus, in examples where a computing device (e.g., a streaming server) is determining whether to forward NAL units in a file, the computing device may need to parse and interpret the information stored in the NAL units in order to make that determination. Parsing and interpreting the information stored in the NAL units may increase the complexity of the computing device and may increase streaming latency.

Hence, in accordance with one example of this disclosure, a computing device may generate a file that comprises a track box containing metadata for a track in the file. Media data for the track comprises a sequence of samples. Each of the samples is a video access unit of multi-layer video data. As part of generating the file, the computing device generates, in the file, a sub-sample information box that contains flags specifying the type of sub-sample information given in the sub-sample information box. When the flags have a particular value, a sub-sample corresponding to the sub-sample information box contains exactly one coded picture and zero or more non-video coding layer (VCL) NAL units associated with the coded picture. In this way, a computing device that receives the file may be able to use the sub-sample information given in the sub-sample information box to make determinations regarding individual coded pictures within a sample of the file. The non-VCL NAL units associated with a coded picture may include NAL units for the parameter sets (e.g., PPS, SPS, VPS) and SEI applicable to the coded picture.
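As a rough illustration of how a receiver might use picture-based sub-samples, the sketch below slices one sample into per-picture byte ranges using only a list of sub-sample sizes, then keeps a chosen subset. The list of sizes stands in for the size fields a sub-sample information box would carry; the function names, the index-based selection, and the policy for choosing pictures are all assumptions made for the example.

```python
def split_sample(sample, subsample_sizes):
    """Slice a sample into consecutive sub-samples of the given byte sizes."""
    if sum(subsample_sizes) != len(sample):
        raise ValueError("sub-sample sizes do not cover the sample")
    parts, position = [], 0
    for size in subsample_sizes:
        parts.append(sample[position:position + size])
        position += size
    return parts


def forward_pictures(sample, subsample_sizes, wanted):
    """Concatenate only the sub-samples whose index is in wanted.

    With picture-based sub-samples, each kept slice is one coded picture
    plus its associated non-VCL NAL units, so no NAL unit is parsed here.
    """
    parts = split_sample(sample, subsample_sizes)
    return b"".join(part for index, part in enumerate(parts) if index in wanted)
```

A streaming server following this pattern would decide what to forward from file-level metadata alone, which is the reduction in complexity and latency the passage describes.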

In multi-layer video data, an access unit may include coded pictures that are marked as being for output and coded pictures that are marked as not being for output. A video decoder may use the coded pictures marked as not being for output as reference pictures when decoding the pictures that are marked as being for output. A NAL unit header of a NAL unit for a slice of a picture may include a picture output flag (e.g., pic_output_flag in HEVC) that indicates whether the picture is marked as being for output. In an ISOBMFF file, each sample is required to be associated with an output time (e.g., a composition time) that indicates when the sample is to be output. However, pictures marked as not being for output have no output time. Hence, the presence of pictures marked as not being for output may violate this requirement of ISOBMFF or may require non-standard workarounds.

Hence, in accordance with one or more techniques of this disclosure, a computing device may generate a file that comprises a media data box that encloses media content. The media content comprises a sequence of samples. Each of the samples comprises an access unit of multi-layer video data. As part of generating the file, the computing device may, responsive to at least one access unit of a bitstream of the multi-layer video data including a coded picture that has a picture output flag equal to a first value (e.g., 1) and a coded picture that has a picture output flag equal to a second value (e.g., 0), use at least two tracks to store the bitstream in the file. For each respective track from the at least two tracks, all coded pictures in each sample of the respective track have the same value of the picture output flag. Pictures having picture output flags equal to the first value (e.g., 1) are allowed to be output, and pictures having picture output flags equal to the second value (e.g., 0) are allowed to be used as reference pictures but are not allowed to be output. Using at least two tracks may resolve the problem described above because each sample in each track can be assigned an appropriate output time, and a video decoder need not output the pictures in the track whose samples are not allowed to be output.
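The track-splitting rule can be sketched as follows. This is a simplified model under stated assumptions, not the file writer described in this disclosure: access units are lists of per-picture dicts, the `pic_output_flag` key holds the flag value (1 for output, 0 for reference only), and each stored sample is paired with the index of the access unit it came from so that timing could be reconstructed. The function name `assign_tracks` is hypothetical.

```python
def assign_tracks(access_units):
    """Distribute coded pictures over tracks by picture output flag.

    Returns a list of tracks; each track is a list of
    (access_unit_index, pictures) samples whose pictures all share one
    flag value. Two tracks are used only when some access unit mixes
    pictures with flag 1 and pictures with flag 0.
    """
    mixed = any(len({p["pic_output_flag"] for p in au}) > 1
                for au in access_units)
    if not mixed:
        # Every sample is already uniform, so one track suffices.
        return [[(i, list(au)) for i, au in enumerate(access_units)]]

    output_track, reference_track = [], []
    for i, au in enumerate(access_units):
        for_output = [p for p in au if p["pic_output_flag"] == 1]
        reference_only = [p for p in au if p["pic_output_flag"] == 0]
        if for_output:
            output_track.append((i, for_output))
        if reference_only:
            reference_track.append((i, reference_only))
    return [output_track, reference_track]
```

Because every sample in the first returned track holds only pictures that may be output, each of those samples can carry an ordinary output time, while a player can decode the second track purely for reference.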

Although much of the description of the techniques of this disclosure refers to MV-HEVC, 3D-HEVC, and SHVC, the reader will appreciate that the techniques of this disclosure may be applicable to other video coding standards and/or extensions thereof.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may use the techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that generates encoded video data to be decoded at a later time by a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (e.g., so-called "smart" phones), so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Source device 12 and destination device 14 may be considered video devices.

In the example of FIG. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some cases, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.

Video encoder 20 may encode the captured, pre-captured, or computer-generated video. Source device 12 may transmit the encoded video data directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored on storage device 33 for later access by destination device 14 or other devices, for decoding and/or playback.

Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some cases, input interface 28 may include a receiver and/or a modem. Input interface 28 of destination device 14 receives the encoded video data over link 16. The encoded video data communicated over link 16, or provided on storage device 33, may include a variety of syntax elements generated by video encoder 20 for use by a video decoder, such as video decoder 30, in decoding the video data. Such syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.

Display device 32 may be integrated with, or external to, destination device 14. In some examples, destination device 14 may include an integrated display device and may also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

Destination device 14 may receive the encoded video data to be decoded via link 16. Link 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, link 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard (e.g., a wireless communication protocol) and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

Alternatively, output interface 22 may output the encoded data to storage device 33. Similarly, input interface 28 may access the encoded data on storage device 33. Storage device 33 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 33 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 12. Destination device 14 may access stored video data from storage device 33 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, and a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 33 may be a streaming transmission, a download transmission, or a combination of both.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

Furthermore, in the example of FIG. 1, video coding system 10 includes a file generation device 34. File generation device 34 may receive encoded video data generated by source device 12 and may generate a file that includes the encoded video data. Destination device 14 may receive the file generated by file generation device 34. In various examples, file generation device 34 may include various types of computing devices. For instance, file generation device 34 may comprise a media-aware network element (MANE), a server computing device, a personal computing device, a special-purpose computing device, a commercial computing device, or another type of computing device. In some examples, file generation device 34 is part of a content delivery network. File generation device 34 may receive the encoded video data from source device 12 via a channel such as link 16. Furthermore, destination device 14 may receive the file from file generation device 34 via a channel such as link 16. File generation device 34 may be considered a video device.

In other examples, source device 12 or another computing device may generate a file that includes the encoded video data. For ease of explanation, however, this disclosure describes file generation device 34 as generating the file. It should be understood that such descriptions apply to computing devices in general.

Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard or an extension thereof. The HEVC standard may also be referred to as ISO/IEC 23008-2. Recently, the design of HEVC was finalized by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The latest HEVC draft specification, referred to hereinafter as HEVC WD, is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip. The multi-view extension to HEVC, namely MV-HEVC, is also being developed by the JCT-3V. A recent working draft (WD) of MV-HEVC, entitled "MV-HEVC Draft Text 5" and referred to hereinafter as MV-HEVC WD5, is available from http://phenix.it-sudparis.eu/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1004-v6.zip. The scalable extension to HEVC, namely SHVC, is also being developed by the JCT-VC. A recent working draft (WD) of SHVC, entitled "High efficiency video coding (HEVC) scalable extension draft 3" and referred to hereinafter as SHVC WD3, is available from http://phenix.it-sudparis.eu/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1008-v3.zip. A recent working draft (WD) of the range extension of HEVC is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1005-v3.zip. A recent working draft (WD) of the 3D extension of HEVC, namely 3D-HEVC, entitled "3D-HEVC Draft Text 1," is available from http://phenix.int-evry.fr/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1001-v3.zip. Video encoder 20 and video decoder 30 may operate according to one or more of these standards.

Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video compression standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some examples, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols such as the User Datagram Protocol (UDP).

The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264/AVC provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCUs) that include both luma and chroma samples. A treeblock may also be referred to as a coding tree unit (CTU). A treeblock has a similar purpose to a macroblock of the H.264/AVC standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as the root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node (i.e., a coded video block). Syntax data associated with a coded bitstream may define the maximum number of times a treeblock may be split, and may also define the minimum size of the coding nodes.
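The recursive quadtree split described above can be sketched as follows. This is a minimal illustration, not the HM's actual partitioning logic: the function name `split_treeblock`, the `should_split` callback, and the example split rule are hypothetical, and a real encoder would derive split decisions from rate-distortion analysis and signal them with syntax elements.

```python
def split_treeblock(x, y, size, min_size, should_split):
    """Return the leaf coding nodes (x, y, size) of a quadtree rooted at a treeblock.

    min_size models the minimum coding node size that syntax data may define.
    should_split is a hypothetical callback standing in for the encoder's decision.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four child nodes: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_treeblock(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    # An unsplit node is a leaf of the quadtree, i.e., a coding node.
    return [(x, y, size)]


def example_rule(x, y, size):
    """Hypothetical rule: split the 64x64 root, then only its top-left 32x32 child."""
    return size == 64 or (size == 32 and x == 0 and y == 0)


leaves = split_treeblock(0, 0, 64, 8, example_rule)
for leaf in leaves:
    print(leaf)
```

With this rule the 64x64 treeblock yields four 16x16 coding nodes and three 32x32 coding nodes, which together cover the treeblock exactly once.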

A CU includes a coding node and the prediction units (PUs) and transform units (TUs) associated with the coding node. The size of the CU corresponds to the size of the coding node, and the CU must be square in shape. The size of the CU may range from 8x8 pixels up to the size of the treeblock, with a maximum of 64x64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU may be square or non-square in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size as, or smaller than, the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as TUs. Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.

In general, a TU is used for the transform and quantization processes. A given CU having one or more PUs may also include one or more TUs. Following prediction, video encoder 20 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy coding. This disclosure typically uses the term "video block" to refer to a coding node (i.e., a coding block) of a CU. In some specific cases, this disclosure may also use the term "video block" to refer to a treeblock (i.e., an LCU) or to a CU, which includes a coding node together with PUs and TUs.

A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data in a header of the GOP, in a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2Nx2N, the HM supports intra-prediction in PU sizes of 2Nx2N or NxN, and inter-prediction in symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, or NxN. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up," "Down," "Left," or "Right." Thus, for example, "2NxnU" refers to a 2Nx2N CU that is partitioned horizontally with a 2Nx0.5N PU on top and a 2Nx1.5N PU on bottom.
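The partition arithmetic above can be made concrete with a small lookup. The sketch below is illustrative only: the function name `pu_sizes` and the string mode labels are hypothetical conveniences, and the code assumes N is a multiple of 2 so that 0.5N is a whole number of pixels.

```python
def pu_sizes(part_mode, n):
    """Return the (width, height) of each PU of a 2Nx2N CU for a partition mode.

    n is N in pixels. Asymmetric modes split one direction into 25% and 75%.
    """
    s = 2 * n          # CU width and height
    q = s // 4         # the 25% share, i.e., 0.5N
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # small PU on top
        "2NxnD": [(s, s - q), (s, q)],   # small PU on bottom
        "nLx2N": [(q, s), (s - q, s)],   # small PU on the left
        "nRx2N": [(s - q, s), (q, s)],   # small PU on the right
    }
    return table[part_mode]


# For a 32x32 CU (N = 16), 2NxnU gives a 32x8 PU above a 32x24 PU.
print(pu_sizes("2NxnU", 16))
```

Every mode covers the full CU area, which is a quick sanity check on the table.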

In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block has 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an NxN block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block need not have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise NxM pixels, where M is not necessarily equal to N.

Following intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in the spatial domain (also referred to as the pixel domain), and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 20 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.

Following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
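The bit-depth reduction mentioned above can be illustrated with a right shift. This is a deliberately simplified sketch of only the "n-bit value to m-bit value" step for unsigned values; the function name `round_down_bits` is hypothetical, and actual HEVC quantization additionally involves a quantization parameter and scaling, which are not modeled here.

```python
def round_down_bits(value, n, m):
    """Round an unsigned n-bit value down to an m-bit value, with n > m.

    Dropping the (n - m) least significant bits is one simple way to
    reduce the bit depth of a coefficient.
    """
    if not n > m > 0:
        raise ValueError("expected n > m > 0")
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    return value >> (n - m)


# 0b10110111 (183) as an 8-bit value becomes 0b1011 (11) as a 4-bit value.
print(round_down_bits(0b10110111, 8, 4))
```

The discarded low-order bits are what makes quantization lossy: several distinct inputs map to the same output.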

In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding methodology. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
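As a sketch of how a predefined scan order serializes a two-dimensional coefficient block, the example below walks the anti-diagonals of a square block. The specific order (bottom-left to top-right along each anti-diagonal) and the function name `diagonal_scan` are assumptions chosen for illustration; the text above does not specify which scan the encoder uses.

```python
def diagonal_scan(block):
    """Serialize a square block of coefficients along its anti-diagonals.

    Each anti-diagonal is walked from bottom-left to top-right, so the
    low-frequency coefficients near the top-left corner come first.
    """
    n = len(block)
    out = []
    for d in range(2 * n - 1):
        # Rows on anti-diagonal d run from min(d, n-1) down to max(0, d-n+1).
        for y in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            out.append(block[y][d - y])
    return out


quantized = [
    [9, 3, 0, 0],
    [4, 1, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
]
vector = diagonal_scan(quantized)
print(vector)
```

Because quantization tends to zero out high-frequency coefficients, the non-zero values cluster at the start of the vector and the tail is a run of zeros, which suits entropy coding.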

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in variable length coding (VLC) may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on a context assigned to the symbol.
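The principle that shorter codewords go to more probable symbols can be illustrated with a Huffman code. Note that this is a stand-in technique chosen for illustration: CAVLC uses predefined, context-selected code tables rather than building a Huffman tree, and the function name `build_vlc` and the sample frequencies are hypothetical.

```python
import heapq


def build_vlc(freqs):
    """Build a prefix-free variable length code (Huffman) from symbol counts.

    Returns a dict mapping each symbol to its codeword as a bit string.
    """
    # Each heap entry: (total count, tie-breaker, {symbol: partial codeword}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix one subtree with 0 and the other with 1, then merge them.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


freqs = {"a": 8, "b": 4, "c": 2, "d": 2}
code = build_vlc(freqs)
vlc_bits = sum(freqs[s] * len(code[s]) for s in freqs)
fixed_bits = sum(freqs.values()) * 2  # four symbols need 2 bits each
print(code, vlc_bits, fixed_bits)
```

For these counts the variable length code spends 28 bits where fixed 2-bit codewords would spend 32, which is the kind of bit savings the paragraph above describes.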

Video encoder 20 may output a bitstream that includes a sequence of bits forming a representation of coded pictures and associated data. The term "bitstream" may be a collective term used to refer to either a network abstraction layer (NAL) unit stream (e.g., a sequence of NAL units) or a byte stream (e.g., an encapsulation of a NAL unit stream containing start code prefixes and NAL units, as specified by Annex B of the HEVC standard). A NAL unit is a syntax structure containing an indication of the type of data in the NAL unit and bytes containing that data in the form of a raw byte sequence payload (RBSP), interspersed as necessary with emulation prevention bits. Each of the NAL units may include a NAL unit header and may encapsulate an RBSP. The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP may be a syntax structure containing an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.
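To show how a NAL unit type code might be read from a NAL unit header, the sketch below parses the two-byte header layout of the published HEVC specification (a 1-bit forbidden_zero_bit, a 6-bit nal_unit_type, a 6-bit nuh_layer_id, and a 3-bit nuh_temporal_id_plus1). That layout comes from the HEVC specification rather than from the text above, and the function name `parse_nal_header` and the returned dictionary are hypothetical.

```python
def parse_nal_header(data):
    """Parse the first two bytes of an HEVC NAL unit into header fields.

    data is the NAL unit without any Annex B start code prefix.
    """
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    header = (data[0] << 8) | data[1]
    return {
        "forbidden_zero_bit": header >> 15,
        "nal_unit_type": (header >> 9) & 0x3F,
        "nuh_layer_id": (header >> 3) & 0x3F,
        "nuh_temporal_id_plus1": header & 0x07,
    }


# 0x40 0x01 is a typical header for a video parameter set NAL unit (type 32).
print(parse_nal_header(bytes([0x40, 0x01])))
```

The nuh_layer_id field is the part of the header that multi-layer extensions such as MV-HEVC and SHVC rely on to tell layers apart.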

Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a picture parameter set (PPS), a second type of NAL unit may encapsulate an RBSP for a slice segment, a third type of NAL unit may encapsulate an RBSP for supplemental enhancement information (SEI), and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as video coding layer (VCL) NAL units. NAL units that contain parameter sets (e.g., video parameter sets (VPSs), sequence parameter sets (SPSs), PPSs, etc.) may be referred to as parameter set NAL units.

This disclosure may refer to a NAL unit that encapsulates an RBSP for a slice segment as a coded slice NAL unit. As defined in HEVC WD, a slice segment is an integer number of CTUs ordered consecutively in tile scan and contained in a single NAL unit. In contrast, in HEVC WD, a slice may be an integer number of CTUs contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit. An independent slice segment is a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment. A dependent slice segment is a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order. The RBSP of a coded slice NAL unit may include a slice segment header and slice data. A slice segment header is the part of a coded slice segment containing the data elements pertaining to the first or all CTUs represented in the slice segment. A slice header is the slice segment header of the independent slice segment that is the current slice segment or the most recent independent slice segment that precedes the current dependent slice segment in decoding order.

A VPS is a syntax structure containing syntax elements that apply to zero or more entire coded video sequences (CVSs). An SPS is a syntax structure containing syntax elements that apply to zero or more entire CVSs. An SPS may include a syntax element that identifies the VPS that is active when the SPS is active. Thus, the syntax elements of a VPS may be more generally applicable than the syntax elements of an SPS.

A parameter set (e.g., a VPS, SPS, PPS, etc.) may contain an identification that is referenced, directly or indirectly, from a slice header of a slice. The referencing process is known as "activation." Thus, when video decoder 30 is decoding a particular slice, a parameter set referenced, directly or indirectly, by a syntax element in the slice header of the particular slice is said to be "activated." Depending on the parameter set type, activation may occur on a per-picture basis or a per-sequence basis. For example, a slice header of a slice may include a syntax element that identifies a PPS. Thus, when a video coder codes the slice, the PPS may be activated. Furthermore, the PPS may include a syntax element that identifies an SPS. Thus, when a PPS that identifies the SPS is activated, the SPS may be activated. The SPS may include a syntax element that identifies a VPS. Thus, when an SPS that identifies the VPS is activated, the VPS is activated.
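The activation chain from slice header to PPS to SPS to VPS amounts to following identifiers through three tables. The sketch below assumes, hypothetically, that previously received parameter sets are stored in dictionaries keyed by identifier and that each stored set records the identifier it references under the keys `sps_id` and `vps_id`; these names are illustrative and are not the syntax element names of the standard.

```python
def activate_parameter_sets(slice_pps_id, pps_table, sps_table, vps_table):
    """Follow the references from a slice header to the PPS, SPS, and VPS.

    Raises KeyError if a referenced parameter set has not been received.
    """
    pps = pps_table[slice_pps_id]      # directly referenced by the slice header
    sps = sps_table[pps["sps_id"]]     # indirectly referenced via the PPS
    vps = vps_table[sps["vps_id"]]     # indirectly referenced via the SPS
    return pps, sps, vps


# Hypothetical stored parameter sets with made-up payload fields.
vps_table = {0: {"max_layers": 2}}
sps_table = {0: {"vps_id": 0, "width": 1920, "height": 1080}}
pps_table = {3: {"sps_id": 0, "init_qp": 26}}

active_pps, active_sps, active_vps = activate_parameter_sets(
    3, pps_table, sps_table, vps_table
)
print(active_sps["width"], active_vps["max_layers"])
```

A lookup failure here corresponds to a slice that references a parameter set the decoder never received, which is one reason a file format needs to make parameter sets available before the samples that use them.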

Video decoder 30 may receive a bitstream generated by video encoder 20. In addition, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct the pictures of the video data based at least in part on the syntax elements obtained from the bitstream. The process to reconstruct the video data may be generally reciprocal to the process performed by video encoder 20. For instance, video decoder 30 may use motion vectors of PUs to determine predictive blocks for the PUs of a current CU. In addition, video decoder 30 may inverse quantize coefficient blocks of TUs of the current CU. Video decoder 30 may perform inverse transforms on the coefficient blocks to reconstruct transform blocks of the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding the samples of the predictive blocks for PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of a picture, video decoder 30 may reconstruct the picture.
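The final reconstruction step, adding predictive samples to the corresponding residual samples, can be sketched as below. The clipping of each result to the valid sample range and the default 8-bit depth are assumptions added for the illustration, and the function name `reconstruct_block` is hypothetical.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add residual samples to predictive samples, sample by sample.

    pred and resid are equally sized lists of rows. Results are clipped
    to [0, 2**bit_depth - 1], which is an assumption of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


pred = [[100, 250], [5, 128]]
resid = [[-3, 10], [-9, 0]]
print(reconstruct_block(pred, resid))
```

The encoder side is the mirror image: the residual is the original sample minus the prediction, so adding it back recovers the original up to quantization loss.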

In HEVC WD, a CVS may start from an instantaneous decoding refresh (IDR) picture, a broken link access (BLA) picture, or a clean random access (CRA) picture that is the first picture in the bitstream, and includes all subsequent pictures that are not IDR or BLA pictures. An IDR picture contains only I slices (i.e., slices in which only intra prediction is used). An IDR picture may be the first picture in the bitstream in decoding order, or may appear later in the bitstream. Each IDR picture is the first picture of a CVS in decoding order. In HEVC WD, an IDR picture may be an intra random access point (IRAP) picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
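The rule for where a CVS starts can be sketched as a pass over the per-picture NAL unit types in decoding order. The numeric nal_unit_type values below (16 to 18 for BLA, 19 and 20 for IDR, 21 for CRA_NUT, 1 for an ordinary trailing picture) are taken from the published HEVC specification and are not stated in the text above; the function name `split_into_cvs` is hypothetical, and the sketch follows only the simplified rule in the preceding paragraph.

```python
BLA_TYPES = {16, 17, 18}   # BLA_W_LP, BLA_W_RADL, BLA_N_LP
IDR_TYPES = {19, 20}       # IDR_W_RADL, IDR_N_LP
CRA_NUT = 21


def split_into_cvs(nal_types):
    """Group pictures, given by nal_unit_type in decoding order, into CVSs.

    A new CVS starts at every IDR or BLA picture, and at a CRA picture
    only when it is the first picture in the bitstream.
    """
    sequences = []
    for index, nal_type in enumerate(nal_types):
        starts_cvs = (
            nal_type in IDR_TYPES
            or nal_type in BLA_TYPES
            or (nal_type == CRA_NUT and index == 0)
        )
        if starts_cvs or not sequences:
            sequences.append([])
        sequences[-1].append(nal_type)
    return sequences


# CRA first, trailing pictures, a mid-stream CRA, then an IDR picture.
print(split_into_cvs([21, 1, 1, 21, 1, 19, 1]))
```

In this example the mid-stream CRA picture does not open a new CVS, while the IDR picture does, so the seven pictures form two coded video sequences.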

IDR pictures may be used for random access. However, pictures following an IDR picture in decoding order cannot use pictures decoded before the IDR picture as references. Accordingly, bitstreams that rely on IDR pictures for random access may have significantly lower coding efficiency than bitstreams that use additional types of random access pictures. In at least some examples, an IDR access unit is an access unit that contains an IDR picture.

The concept of CRA pictures was introduced in HEVC to improve coding efficiency: it allows pictures that follow a CRA picture in decoding order but precede it in output order to use pictures decoded before the CRA picture for reference. Pictures that follow a CRA picture in decoding order but precede it in output order are referred to as leading pictures associated with the CRA picture (or leading pictures of the CRA picture). A CRA access unit is an access unit in which the coded picture is a CRA picture. In HEVC WD, a CRA picture is an intra random access picture for which each VCL NAL unit has a nal_unit_type equal to CRA_NUT.

The leading pictures of a CRA picture are correctly decodable if decoding starts from an IDR picture or a CRA picture that occurs before the CRA picture in decoding order. However, the leading pictures of a CRA picture may be non-decodable when random access from the CRA picture occurs. Therefore, a video decoder typically discards the leading pictures of a CRA picture during random access decoding. To prevent error propagation from reference pictures that may not be available depending on where decoding starts, no picture that follows a CRA picture in both decoding order and output order may use any picture that precedes the CRA picture in either decoding order or output order (which includes the leading pictures) for reference.

The concept of BLA pictures was introduced in HEVC after the introduction of CRA pictures and is based on the concept of CRA pictures. A BLA picture typically originates from bitstream splicing at the position of a CRA picture; in the spliced bitstream, the splicing point CRA picture is changed to a BLA picture. Thus, a BLA picture may be a CRA picture in the original bitstream that a bitstream splicer changes to a BLA picture after splicing at the position of that CRA picture. In some cases, an access unit that contains a RAP picture may be referred to herein as a RAP access unit. A BLA access unit is an access unit that contains a BLA picture. In HEVC WD, a BLA picture may be an intra random access picture for which each VCL NAL unit has a nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

In general, an IRAP picture contains only I slices and may be a BLA picture, a CRA picture, or an IDR picture. For example, HEVC WD indicates that an IRAP picture may be a coded picture for which each VCL NAL unit has a nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive. Furthermore, HEVC WD indicates that the first picture in the bitstream in decoding order must be an IRAP picture. Table 7-1 of HEVC WD shows the NAL unit type codes and NAL unit type classes. Table 7-1 of HEVC WD is reproduced below.

Table 7-1 - NAL unit type codes and NAL unit type classes

One difference between BLA pictures and CRA pictures is as follows. For a CRA picture, the associated leading pictures are correctly decodable if decoding starts from a RAP picture that precedes the CRA picture in decoding order. However, the leading pictures associated with a CRA picture may not be correctly decodable when random access from the CRA picture occurs (i.e., when decoding starts from the CRA picture, or in other words, when the CRA picture is the first picture in the bitstream). In contrast, there may be no scenario in which the leading pictures associated with a BLA picture are decodable, even when decoding starts from a RAP picture that precedes the BLA picture in decoding order.

Some of the leading pictures associated with a particular CRA picture or a particular BLA picture may be correctly decodable even when that CRA picture or BLA picture is the first picture in the bitstream. Such leading pictures may be referred to as decodable leading pictures (DLPs) or random access decodable leading (RADL) pictures. In HEVC WD, a RADL picture may be a coded picture for which each VCL NAL unit has a nal_unit_type equal to RADL_R or RADL_N. Furthermore, HEVC WD indicates that all RADL pictures are leading pictures and that RADL pictures are not used as reference pictures for the decoding process of trailing pictures of the same associated IRAP picture. When present, all RADL pictures precede, in decoding order, all trailing pictures of the same associated IRAP picture. HEVC WD indicates that a RADL access unit may be an access unit in which the coded picture is a RADL picture. A trailing picture may be a picture that follows the associated IRAP picture (i.e., the previous IRAP picture in decoding order) in output order.

Other leading pictures may be referred to as non-decodable leading pictures (NLPs) or random access skipped leading (RASL) pictures. In HEVC WD, a RASL picture may be a coded picture for which each VCL NAL unit has a nal_unit_type equal to RASL_R or RASL_N. All RASL pictures are leading pictures of an associated BLA picture or CRA picture.

Provided that the necessary parameter sets are available when they need to be activated, an IRAP picture and all subsequent non-RASL pictures in decoding order can be correctly decoded without performing the decoding process of any picture that precedes the IRAP picture in decoding order. A bitstream may contain pictures that contain only I slices but are not IRAP pictures.
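As an illustration of the random access behavior described above, the following Python sketch selects which pictures to decode when decoding starts at an IRAP picture. The numeric nal_unit_type values are those assigned in Table 7-1 of the HEVC specification; the function names and the one-type-per-picture list model are assumptions of this sketch, not part of this disclosure.

```python
# nal_unit_type values from Table 7-1 of the HEVC specification.
TRAIL_R = 1
RADL_N, RADL_R, RASL_N, RASL_R = 6, 7, 8, 9
BLA_W_LP, BLA_N_LP = 16, 18
IDR_W_RADL = 19
CRA_NUT = 21
RSV_IRAP_VCL23 = 23


def is_irap(nal_type):
    # IRAP pictures use types BLA_W_LP..RSV_IRAP_VCL23, inclusive.
    return BLA_W_LP <= nal_type <= RSV_IRAP_VCL23


def is_bla(nal_type):
    return BLA_W_LP <= nal_type <= BLA_N_LP


def is_rasl(nal_type):
    return nal_type in (RASL_N, RASL_R)


def pictures_to_decode(nal_types):
    """Return indices of pictures to decode.

    nal_types holds one nal_unit_type per picture, in decoding order,
    starting at the picture where decoding begins.
    """
    if not nal_types or not is_irap(nal_types[0]):
        raise ValueError("decoding must start at an IRAP picture")
    kept = []
    skip_rasl = False
    for index, nal_type in enumerate(nal_types):
        if is_irap(nal_type):
            # RASL pictures are skipped when their associated IRAP picture is
            # the picture decoding started from, or is a BLA picture.
            skip_rasl = index == 0 or is_bla(nal_type)
        if skip_rasl and is_rasl(nal_type):
            continue
        kept.append(index)
    return kept


stream = [CRA_NUT, RASL_N, RADL_N, TRAIL_R, CRA_NUT, RASL_R, TRAIL_R]
print(pictures_to_decode(stream))
```

In this example only the RASL picture associated with the first CRA picture is skipped; the RASL picture of the second CRA picture is kept because its references were decoded.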

In multi-view coding, there may be multiple views of the same scene from different viewpoints. The term "access unit" may be used to refer to the set of pictures that correspond to the same time instance. Thus, video data may be conceptualized as a series of access units occurring over time. A "view component" may be a coded representation of a view in a single access unit. In this disclosure, a "view" may refer to a set or sequence of one or more view components associated with the same view identifier. A view component may contain a texture view component and a depth view component.

A texture view component (i.e., a texture picture) may be a coded representation of the texture of a view in a single access unit. A texture view may be a sequence of texture view components associated with the same value of a view order index. The view order index of a view may indicate the camera position of the view relative to other views. A depth view component (i.e., a depth picture) may be a coded representation of the depth of a view in a single access unit. A depth view may be a set or sequence of one or more depth view components associated with the same value of a view order index.

In MV-HEVC, 3D-HEVC, and SHVC, a video encoder may generate a bitstream that comprises a series of NAL units. Different NAL units of the bitstream may be associated with different layers of the bitstream. A layer may be defined as a set of VCL NAL units and associated non-VCL NAL units that have the same layer identifier. A layer may be equivalent to a view in multi-view video coding. In multi-view video coding, a layer may contain all view components of the same layer at different time instances. Each view component may be a coded picture of the video scene belonging to a specific view at a specific time instance. In some examples of 3D video coding, a layer may contain either all coded depth pictures of a specific view or all coded texture pictures of a specific view. In other examples of 3D video coding, a layer may contain both the texture view components and the depth view components of a specific view. Similarly, in the context of scalable video coding, a layer typically corresponds to coded pictures having video characteristics different from those of coded pictures in other layers. Such video characteristics typically include spatial resolution and quality level (e.g., signal-to-noise ratio). In HEVC and its extensions, temporal scalability may be achieved within one layer by defining a group of pictures with a particular temporal level as a sub-layer.

For each respective layer of the bitstream, data in a lower layer may be decoded without reference to data in any higher layer. In scalable video coding, for example, data in a base layer may be decoded without reference to data in an enhancement layer. In general, a NAL unit may encapsulate data of only a single layer. Thus, NAL units that encapsulate data of the highest remaining layer of the bitstream may be removed from the bitstream without affecting the decodability of data in the remaining layers of the bitstream. In multi-view coding and 3D-HEVC, higher layers may include additional view components. In SHVC, higher layers may include signal-to-noise ratio (SNR) enhancement data, spatial enhancement data, and/or temporal enhancement data. In MV-HEVC, 3D-HEVC, and SHVC, a layer may be referred to as a "base layer" if a video decoder can decode pictures in the layer without reference to data of any other layer. The base layer may conform to the HEVC base specification (e.g., HEVC WD).

In SVC, layers other than the base layer may be referred to as "enhancement layers" and may provide information that enhances the visual quality of video data decoded from the bitstream. SVC can enhance spatial resolution, signal-to-noise ratio (i.e., quality), or temporal rate. In scalable video coding (e.g., SHVC), a "layer representation" may be a coded representation of a spatial layer in a single access unit. For ease of explanation, this disclosure may refer to view components and/or layer representations as "view components/layer representations" or simply "pictures."

To implement the layers, the header of a NAL unit may include a nuh_reserved_zero_6bits syntax element. In HEVC WD, the nuh_reserved_zero_6bits syntax element is reserved. However, in MV-HEVC, 3D-HEVC, and SVC, the nuh_reserved_zero_6bits syntax element is referred to as the nuh_layer_id syntax element. The nuh_layer_id syntax element specifies an identifier of a layer. NAL units of a bitstream whose nuh_layer_id syntax elements specify different values belong to different layers of the bitstream.

In some examples, the nuh_layer_id syntax element of a NAL unit is equal to 0 if the NAL unit relates to a base layer in multi-view coding (e.g., MV-HEVC), 3DV coding (e.g., 3D-HEVC), or scalable video coding (e.g., SHVC). Data in the base layer of a bitstream may be decoded without reference to data in any other layer of the bitstream. If a NAL unit does not relate to a base layer in multi-view coding, 3DV, or scalable video coding, the nuh_layer_id syntax element of the NAL unit may have a non-zero value.

Furthermore, some view components/layer representations within a layer may be decoded without reference to other view components/layer representations within the same layer. Thus, NAL units that encapsulate data of certain view components/layer representations of a layer may be removed from the bitstream without affecting the decodability of other view components/layer representations in the layer. Removing NAL units that encapsulate data of such view components/layer representations may reduce the frame rate of the bitstream. A subset of view components/layer representations within a layer that may be decoded without reference to other view components/layer representations within the layer may be referred to herein as a "sub-layer" or a "temporal sub-layer."

A NAL unit may include a temporal_id syntax element that specifies a temporal identifier (i.e., TemporalId) of the NAL unit. The temporal identifier of a NAL unit identifies the sub-layer to which the NAL unit belongs. Thus, each sub-layer of a bitstream may have a different temporal identifier. In general, if the temporal identifier of a first NAL unit of a layer is less than the temporal identifier of a second NAL unit of the same layer, the data encapsulated by the first NAL unit may be decoded without reference to the data encapsulated by the second NAL unit.
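For illustration, the layer identifier and temporal identifier discussed above can be read from the two-byte HEVC NAL unit header. The following Python sketch assumes the header layout of the HEVC specification (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1, with the temporal identifier equal to nuh_temporal_id_plus1 minus 1); the function name and the returned dictionary are hypothetical.

```python
def parse_nal_unit_header(data: bytes) -> dict:
    """Parse the first two bytes of an HEVC NAL unit."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    value = int.from_bytes(data[:2], "big")
    return {
        "forbidden_zero_bit": value >> 15,
        "nal_unit_type": (value >> 9) & 0x3F,
        "nuh_layer_id": (value >> 3) & 0x3F,
        # The header carries the temporal identifier plus one.
        "temporal_id": (value & 0x07) - 1,
    }


# 0x2A09: nal_unit_type 21 (CRA_NUT), nuh_layer_id 1, temporal identifier 0.
header = parse_nal_unit_header(bytes([0x2A, 0x09]))
print(header)
```

A NAL unit with nuh_layer_id equal to 0 would belong to the base layer; the example header above belongs to a non-base layer.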

A bitstream may be associated with a plurality of operation points. Each operation point of a bitstream is associated with a set of layer identifiers (e.g., a set of nuh_layer_id values) and a temporal identifier. The set of layer identifiers may be denoted OpLayerIdSet and the temporal identifier may be denoted TemporalID. A NAL unit is associated with an operation point if the NAL unit's layer identifier is in the operation point's set of layer identifiers and the NAL unit's temporal identifier is less than or equal to the operation point's temporal identifier. Thus, an operation point may correspond to a subset of the NAL units in the bitstream.
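The membership rule for operation points can be sketched as a simple filter. In the Python sketch below, the `NalUnit` tuple and the function name `extract_operation_point` are hypothetical names introduced for illustration; only the rule itself (layer identifier in OpLayerIdSet, temporal identifier not greater than TemporalID) comes from the text above.

```python
from typing import Iterable, List, NamedTuple, Set


class NalUnit(NamedTuple):
    layer_id: int      # nuh_layer_id of the NAL unit
    temporal_id: int   # temporal identifier of the NAL unit
    payload: bytes


def extract_operation_point(
    nal_units: Iterable[NalUnit], op_layer_id_set: Set[int], op_temporal_id: int
) -> List[NalUnit]:
    """Keep the NAL units associated with the given operation point."""
    return [
        nal
        for nal in nal_units
        if nal.layer_id in op_layer_id_set and nal.temporal_id <= op_temporal_id
    ]


bitstream = [
    NalUnit(0, 0, b"base, lowest sub-layer"),
    NalUnit(0, 1, b"base, higher sub-layer"),
    NalUnit(1, 0, b"enhancement, lowest sub-layer"),
    NalUnit(1, 1, b"enhancement, higher sub-layer"),
]
base_only = extract_operation_point(bitstream, {0}, 0)
both_layers = extract_operation_point(bitstream, {0, 1}, 0)
print(len(base_only), len(both_layers))
```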

As introduced above, this disclosure relates to storing video content in a file based on the ISO base media file format (ISOBMFF). In particular, this disclosure describes various techniques for the storage of video streams containing multiple coded layers, where each layer may be a scalable layer, a texture view, a depth view, or another type of layer or view. The techniques of this disclosure may be applied to, for example, the storage of MV-HEVC video data, SHVC video data, 3D-HEVC video data, and/or other types of video data.

File formats and file format standards will now be briefly discussed. File format standards include the ISO base media file format (ISOBMFF, ISO/IEC 14496-12) and other file format standards derived from ISOBMFF, including the MPEG-4 file format (ISO/IEC 14496-14), the 3GPP file format (3GPP TS 26.244), and the AVC file format (ISO/IEC 14496-15). Thus, ISO/IEC 14496-12 specifies the ISO base media file format. Other documents extend the ISO base media file format for specific applications. For example, ISO/IEC 14496-15 describes the carriage of NAL unit structured video in the ISO base media file format. H.264/AVC and HEVC, as well as their extensions, are examples of NAL unit structured video. ISO/IEC 14496-15 includes sections describing the carriage of H.264/AVC NAL units. In addition, Section 8 of ISO/IEC 14496-15 describes the carriage of HEVC NAL units.

ISOBMFF is used as the basis for many codec encapsulation formats (e.g., the AVC file format) as well as for many multimedia container formats (e.g., the MPEG-4 file format, the 3GPP file format (3GP), and the DVB file format). In addition to continuous media such as audio and video, static media such as images, as well as metadata, can be stored in a file conforming to ISOBMFF. Files structured according to ISOBMFF may be used for many purposes, including local media file playback, progressive downloading of a remote file, segments for Dynamic Adaptive Streaming over HTTP (DASH), containers for content to be streamed together with its packetization instructions, and recording of received real-time media streams. Thus, although originally designed for storage, ISOBMFF has proven valuable for streaming (e.g., for progressive download or DASH). For streaming purposes, the movie fragments defined in ISOBMFF can be used.

A file conforming to the HEVC file format may comprise a series of objects called boxes. A box may be an object-oriented building block defined by a unique type identifier and a length. For example, a box may be the elementary syntax structure in ISOBMFF, comprising a four-character coded box type, a byte count of the box, and a payload. In some cases, all data in a file conforming to the HEVC file format is contained within boxes, and the file contains no data that is not in a box. Thus, an ISOBMFF file may consist of a sequence of boxes, and boxes may contain other boxes. For example, the payload of a box may include one or more additional boxes. Figures 5 and 6, described in detail elsewhere in this disclosure, show example boxes within a file in accordance with one or more techniques of this disclosure.
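The box structure described above can be sketched with a small reader. The Python sketch below assumes the box header layout of ISO/IEC 14496-12: a 32-bit big-endian size followed by a four-character type, where a size of 1 indicates that a 64-bit size follows the type and a size of 0 indicates that the box extends to the end of its container. The helper names `iter_boxes` and `make_box` are hypothetical.

```python
import struct


def iter_boxes(buf: bytes, start: int = 0, end: int = None):
    """Yield (box_type, payload_start, payload_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # A 64-bit size follows the four-character type.
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # The box runs to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield raw_type.decode("ascii"), pos + header, pos + size
        pos += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a plain 32-bit size field (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# A toy file: one box with an 8-byte payload, and one box containing another box.
data = make_box("ftyp", b"isom" + bytes(4)) + make_box("moov", make_box("trak"))
top_level = list(iter_boxes(data))
print(top_level)
```

Because a payload may itself hold boxes, the same function can be applied to the payload range of a container box, for example `iter_boxes(data, 24, 32)` for the second box above.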

A file conforming to ISOBMFF may include various types of boxes. For example, a file conforming to ISOBMFF may include a file type box, a media data box, a movie box, a movie fragment box, and so on. In this example, the file type box includes file type and compatibility information. The media data box may contain samples (e.g., coded pictures). The movie box ("moov") contains metadata for the continuous media streams present in the file. Each of the continuous media streams may be represented in the file as a track. For example, the movie box may contain metadata about a movie (e.g., logical and timing relationships between samples, as well as pointers to the locations of samples). The movie box may include several types of sub-boxes. The sub-boxes in the movie box may include one or more track boxes. A track box may include information about an individual track of the movie. A track box may include a track header box that specifies overall information of a single track. In addition, a track box may include a media box that contains a media information box. The media information box may include a sample table box that contains data indexing the media samples in the track. Information in the sample table box may be used to locate samples in time and, for each sample of the track, to determine its type, size, container, and offset into that container. Thus, the metadata for a track is enclosed in a track box ("trak"), while the media content of a track is either enclosed in a media data box ("mdat") or stored directly in a separate file. The media content for a track comprises (e.g., consists of) a sequence of samples, such as audio or video access units.
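The nesting of the movie, track, media, media information, and sample table boxes can be illustrated with a short Python sketch that lists a box hierarchy. The four-character codes "mdia", "minf", "tkhd", and "stsd" are the ISOBMFF codes for the media box, media information box, track header box, and sample description box. The toy file below uses empty payloads and omits boxes that a real file would carry, and the sketch handles only plain 32-bit box sizes; both simplifications, and the helper names, are assumptions made for illustration.

```python
import struct

# Boxes whose payload is itself a sequence of boxes (subset used in this sketch).
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl"}


def box(box_type: str, payload: bytes = b"") -> bytes:
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def box_tree(buf: bytes, start: int = 0, end: int = None, depth: int = 0):
    """Return (depth, box_type) pairs, descending into known container boxes."""
    end = len(buf) if end is None else end
    out, pos = [], start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            raise ValueError("unsupported or malformed box at offset %d" % pos)
        name = raw_type.decode("ascii")
        out.append((depth, name))
        if name in CONTAINERS:
            out += box_tree(buf, pos + 8, pos + size, depth + 1)
        pos += size
    return out


stbl = box("stbl", box("stsd"))
trak = box("trak", box("tkhd") + box("mdia", box("minf", stbl)))
toy_file = box("ftyp") + box("moov", trak) + box("mdat", b"samples")
tree = box_tree(toy_file)
for depth, name in tree:
    print("  " * depth + name)
```

The printed outline shows the track metadata nested under "moov" while the sample data sits in a sibling "mdat" box.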

ISOBMFF specifies the following types of tracks: a media track, which contains an elementary media stream; a hint track, which either includes media transmission instructions or represents a received packet stream; and a timed metadata track, which comprises time-synchronized metadata. The metadata for each track includes a list of sample description entries, each providing the coding or encapsulation format used in the track and the initialization data needed for processing that format. Each sample is associated with one of the sample description entries of the track.

ISOBMFF enables sample-specific metadata to be specified through various mechanisms. Specific boxes within the sample table box ("stbl") have been standardized to respond to common needs. For example, a sync sample box ("stss") is a box within a sample table box. The sync sample box is used to list the random access samples of the track. This disclosure may refer to a sample listed by the sync sample box as a sync sample. In another example, a sample grouping mechanism enables samples to be mapped, according to a four-character grouping type, into groups of samples that share the same property, which is specified as a sample group description entry in the file. Several grouping types have been specified in ISOBMFF.
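As an illustration of how a reader might use the sync sample box, the following Python sketch parses the payload of an "stss" box. It assumes the layout given in ISO/IEC 14496-12: a 32-bit field holding the version and flags, a 32-bit entry count, and then one 32-bit sample number (counted from 1) per sync sample. The function name is hypothetical.

```python
import struct


def parse_stss_payload(payload: bytes) -> list:
    """Return the sync sample numbers listed in an 'stss' box payload.

    payload is the box content after the 8-byte box header.
    """
    if len(payload) < 8:
        raise ValueError("payload too short for an 'stss' box")
    _version_and_flags, entry_count = struct.unpack_from(">II", payload, 0)
    if len(payload) < 8 + 4 * entry_count:
        raise ValueError("truncated sync sample table")
    return list(struct.unpack_from(">%dI" % entry_count, payload, 8))


# Toy payload: version/flags 0, three sync samples (1, 25 and 49).
payload = struct.pack(">II3I", 0, 3, 1, 25, 49)
sync_samples = parse_stss_payload(payload)
print(sync_samples)
```

A player seeking to sample 30 could, for example, start decoding at sample 25, the closest preceding sync sample in this list.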

A sample table box may include one or more SampleToGroup boxes and one or more sample group description boxes (i.e., SampleGroupDescription boxes). A SampleToGroup box may be used to determine the sample group to which a sample belongs, along with the associated description of that sample group. In other words, a SampleToGroup box may indicate the group to which a sample belongs. A SampleToGroup box may have a box type of "sbgp." A SampleToGroup box may include a grouping type element (e.g., grouping_type). The grouping type element may be an integer that identifies the type of sample grouping (i.e., the criterion used to form the sample groups). Furthermore, a SampleToGroup box may include one or more entries. Each entry in a SampleToGroup box may be associated with a different, non-overlapping series of consecutive samples in the track. Each entry may indicate a sample count element (e.g., sample_count) and a group description index element (e.g., group_description_index). The sample count element of an entry may indicate the number of samples associated with the entry. In other words, the sample count element of an entry may be an integer that gives the number of consecutive samples with the same sample group descriptor. The group description index element may identify a SampleGroupDescription box that contains a description of the samples associated with the entry. The group description index elements of multiple entries may identify the same SampleGroupDescription box.
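The run-length style mapping carried by a SampleToGroup box can be sketched as a lookup. In the Python sketch below, the entries are modeled as (sample_count, group_description_index) pairs in track order, and samples are numbered from 1. Treating an index of 0, and any sample beyond the mapped runs, as "no group of this type" follows ISO/IEC 14496-12; the function name is hypothetical.

```python
def group_index_for_sample(entries, sample_number):
    """Return the group_description_index that applies to a sample.

    entries: list of (sample_count, group_description_index) pairs.
    sample_number: 1-based sample number within the track.
    """
    if sample_number < 1:
        raise ValueError("sample numbers start at 1")
    first_in_run = 1
    for sample_count, group_description_index in entries:
        if sample_number < first_in_run + sample_count:
            return group_description_index
        first_in_run += sample_count
    # Samples past the last run belong to no group of this grouping type.
    return 0


# Samples 1-3 use description 1, samples 4-5 none, samples 6-9 description 2.
entries = [(3, 1), (2, 0), (4, 2)]
lookup = [group_index_for_sample(entries, n) for n in range(1, 11)]
print(lookup)
```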

Current file format designs may have one or more problems. To store video content of a particular video codec based on ISOBMFF, a file format specification for that video codec may be needed. For the storage of video streams containing multiple layers, such as MV-HEVC and SHVC streams, some of the concepts from the SVC and MVC file formats can be reused. However, many parts cannot be used directly for SHVC and MV-HEVC video streams. Direct application of the HEVC file format has at least the following shortcoming: SHVC and MV-HEVC bitstreams can start with an access unit that contains an IRAP picture in the base layer but non-IRAP pictures in other layers, or vice versa. The sync sample mechanism currently does not allow such a point to be indicated for random access.

This disclosure describes potential solutions to the above problem, and provides other potential improvements, to enable efficient and flexible storage of video streams containing multiple layers. The techniques described in this disclosure potentially apply to any file format for storing such video content coded by any video codec, although the description is specific to the storage of SHVC and MV-HEVC video streams based on the HEVC file format, which is specified in Clause 8 of ISO/IEC 14496-15.

Detailed implementations of the techniques of this disclosure are discussed below. The techniques of this disclosure may be summarized in the following examples. The following examples may be used separately. Alternatively, various combinations of the following examples may be used together.

In a first example, Compressorname is a value specified in a VisualSampleEntry box. As described in Section 8.5.2.1 of ISO/IEC 14496-12, a VisualSampleEntry box is a type of sample table box for video tracks that stores detailed information about the coding type used and any initialization information needed for that coding. Compressorname indicates the name of the compressor used to generate the media data. A video decoder may use the value of Compressorname to determine how and/or whether to decode the video data in the file. As defined in Section 8.5.3 of ISO/IEC 14496-12, Compressorname is formatted in a fixed 32-byte field, with the first byte set to the number of bytes to be displayed, followed by that number of bytes of displayable data, and then padding to complete 32 bytes in total (including the size byte).

The first example allows two new values of Compressorname. The first new value of Compressorname is "\013SHVC Coding," for files containing SHVC video streams. The second new value of Compressorname is "\016MV-HEVC Coding," for files containing MV-HEVC video streams. This first example may be implemented as shown in Sections 9.5.3.1.3 and 10.5.3.2, below.
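The 32-byte Compressorname layout can be checked with a short Python sketch. The leading "\013" and "\016" in the values above are octal escapes for 11 and 14, which are the byte lengths of "SHVC Coding" and "MV-HEVC Coding". The helper names and the choice of zero bytes for padding are assumptions of this sketch.

```python
def encode_compressorname(name: str) -> bytes:
    """Pack a name into the fixed 32-byte Compressorname field."""
    raw = name.encode("utf-8")
    if len(raw) > 31:
        raise ValueError("at most 31 bytes fit after the size byte")
    # Size byte, displayable bytes, then padding up to 32 bytes in total.
    return bytes([len(raw)]) + raw + bytes(31 - len(raw))


def decode_compressorname(field: bytes) -> str:
    """Recover the displayable name from a 32-byte Compressorname field."""
    if len(field) != 32 or field[0] > 31:
        raise ValueError("not a valid 32-byte Compressorname field")
    return field[1:1 + field[0]].decode("utf-8")


shvc = encode_compressorname("SHVC Coding")
mv_hevc = encode_compressorname("MV-HEVC Coding")
print(len(shvc), shvc[0], decode_compressorname(shvc))
print(len(mv_hevc), mv_hevc[0], decode_compressorname(mv_hevc))
```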

As briefly described above, a file may include a movie box that contains metadata for the tracks of the file. The movie box may include a track box for each track of the file. Furthermore, a track box may include a media information box that contains all objects declaring characteristic information of the media of the track. The media information box may include a sample table box. The sample table box may specify sample-specific metadata. For example, the sample table box may include a plurality of sample description boxes. Each of the sample description boxes may be an instance of a sample entry. In ISO/IEC 14496-12, instances of the VisualSampleEntry class may be used as sample entries. Classes of sample entries specific to particular video coding standards may extend the VisualSampleEntry class. For example, a class of sample entries specific to HEVC may extend the VisualSampleEntry class. Thus, this disclosure may refer to the different classes that extend the VisualSampleEntry class as different sample entry types.

In a second example, two new sample entry (i.e., "sample") types, "hev2" and "hvc2", are defined for HEVC tracks. The two new sample entry types allow the use of aggregators and extractors. In general, an aggregator aggregates multiple NAL units into a single aggregated data unit. For example, an aggregator may contain multiple NAL units and/or may virtually concatenate multiple NAL units. In general, an extractor indicates a type of data obtained from other tracks. For example, storing media data (e.g., HEVC data) across multiple tracks can result in compact files, because duplication of data can be avoided by referencing data across media tracks using relatively small data units called extractors, which are embedded in the media data as NAL units. This second example may be implemented as shown in Sections 9.5.3.1.1, 9.5.3.1.2, 9.5.4, 9.5.6, 10.4.5, 10.5.3.1.1.1, and 10.5.3.2 below.

In a third example, the definition of a sample entry that is associated with particular requirements on the storage of parameter sets for a multi-layer bitstream is modified to enable convenient random access to a particular layer or a particular operation point. For example, when an SHVC, MV-HEVC, or 3D-HEVC track has the sample entry and a sample contains at least one IRAP picture, all parameter sets needed for decoding that sample shall be included in the sample entry or in the sample itself. In this example, when a sample does not contain any IRAP picture, all parameter sets (e.g., VPS, SPS, PPS) needed for decoding that sample shall be included in the sample entry or in any of the samples from the previous sample containing at least one IRAP picture to the sample itself, inclusive. This third example may be implemented as shown below in Section 9.5.3.1.1.

In an alternative version of the third example, when an SHVC, MV-HEVC, or 3D-HEVC track has the sample entry and a picture in a sample is an IRAP picture, all parameter sets needed for decoding that picture shall be included in the sample entry or in the sample itself. Furthermore, in this alternative, when the sample does not contain any IRAP picture, all parameter sets needed for decoding the picture shall be included in the sample entry or in any of the samples, in the same layer, following the previous sample containing at least one IRAP picture up to the sample itself, inclusive.

In a fourth example, the following cases are defined for existing sample entry types. In this example, samples belonging to sample entry types "hev1" and "hvc1" contain HEVC, SHVC, and MV-HEVC configurations for SHVC and MV-HEVC tracks that have HEVC VCL NAL units. Furthermore, sample entry types "hev1" and "hvc1" containing SHVC and MV-HEVC configurations are defined for SHVC and MV-HEVC tracks that have no HEVC NAL units but have VCL NAL units with nuh_layer_id greater than 0, where extractors are not allowed. This fourth example may be implemented as shown below in Section 9.5.3.1.1.

In a fifth example, a sync sample in an SHVC, MV-HEVC, or 3D-HEVC track is defined as a sample containing pictures that are all IRAP pictures. This fifth example may be implemented as shown below in Sections 9.5.5 and 10.4.3. As specified below in Section 9.5.5, an SHVC sample is considered a sync sample if each coded picture in the access unit is an IRAP picture, as defined in HEVC WD. Furthermore, as specified below in Section 10.4.3, an MV-HEVC sample is considered a sync sample if each coded picture in the access unit is an IRAP picture without RASL pictures, as defined in HEVC WD.

Thus, in the fifth example, as part of generating a file, file generation device 34 may generate a sync sample box that includes a sync sample table documenting the sync samples of a track of multi-layer video data. Each sync sample of the track is a random access sample of the track. A scalable video coding sample is a sync sample if each coded picture in the access unit is an IRAP picture. A multiview video coding sample is a sync sample if each coded picture in the access unit is an IRAP picture without RASL pictures.
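The two sync sample conditions can be sketched in Python as follows. The `CodedPicture` type and its `has_rasl` field are hypothetical modeling choices made for illustration; they are not structures defined by the file format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CodedPicture:
    """Hypothetical summary of one coded picture in an access unit."""
    nuh_layer_id: int
    is_irap: bool
    has_rasl: bool = False  # True if RASL pictures are associated with this IRAP picture


def is_shvc_sync_sample(pictures: Sequence[CodedPicture]) -> bool:
    """Scalable case: every coded picture in the access unit is an IRAP picture."""
    return len(pictures) > 0 and all(p.is_irap for p in pictures)


def is_mv_hevc_sync_sample(pictures: Sequence[CodedPicture]) -> bool:
    """Multiview case: every coded picture is an IRAP picture without RASL pictures."""
    return len(pictures) > 0 and all(p.is_irap and not p.has_rasl for p in pictures)
```

A file writer following this sketch would list a sample in the sync sample table only when the relevant predicate holds for the sample's access unit.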

In an alternative version of the fifth example, a sync sample in an SHVC, MV-HEVC, or 3D-HEVC track is defined as a sample containing pictures that are all IRAP pictures without RASL pictures. The sync sample table documents sync samples. Optionally, a sync sample sample group documents sync samples. In other words, the sync sample sample group includes information identifying sync samples.

In a sixth example, the "rap" sample group is defined to contain those samples that contain pictures that are all IRAP pictures (with or without RASL pictures). This sixth example may be implemented as shown below in Section 9.5.5. Alternatively, in the sixth example, the "rap" sample group is defined to contain those samples that contain pictures that are all IRAP pictures, excluding those samples that are indicated as sync samples.

In a seventh example, a new sample group or a new box is defined that documents all samples containing at least one IRAP picture, the NAL unit type of the VCL NAL units in the IRAP pictures in the sample, whether all coded pictures in the sample are IRAP pictures and, if not, the number of IRAP pictures in the sample, and the layer ID values of these IRAP pictures in the sample.

Thus, in this seventh example, file generation device 34 may generate a file that comprises a track box containing metadata for a track in the file. Media data for the track comprises a sequence of samples. Each of the samples may be an access unit of multi-layer video data. As part of generating the file, file generation device 34 generates, in the file, an additional box that documents all of the samples containing at least one IRAP picture.

This seventh example is implemented in part as shown below in Section 9.5.5.1. As shown below in Section 9.5.5.1, a random accessible sample entry class extends the VisualSampleGroupEntry class. An instance of the random accessible sample entry class (i.e., a random accessible sample entry box) corresponds to a sample containing at least one IRAP picture. Furthermore, a random accessible sample entry box includes an all_pics_are_IRAP value that specifies whether all coded pictures in the corresponding sample are IRAP pictures.

Thus, in the seventh example, file generation device 34 may generate a sample entry that includes a value (e.g., all_pics_are_IRAP). The value being equal to 1 specifies that each coded picture in the sample is an IRAP picture. The value being equal to 0 specifies that not all coded pictures in the sample are IRAP pictures.

Furthermore, in accordance with the seventh example, when not all coded pictures of a sample are IRAP pictures, file generation device 34 may include, in the sample entry corresponding to the sample, a value indicating the number of IRAP pictures in each sample of the sample group. Additionally, when not all coded pictures in the sample are IRAP pictures, file generation device 34 may include, in the sample entry corresponding to the sample, values indicating the layer identifiers of the IRAP pictures in the sample.

Alternatively, in the seventh example, the new sample group or new box documents such samples, excluding those that are indicated as sync samples or as members of the "rap" sample group.

This seventh example may resolve one or more problems that can arise when multi-layer video data is stored using ISOBMFF or its existing extensions. For example, in single-layer video coding, there is typically only a single coded picture per access unit. In multi-layer video coding, however, there is typically more than one coded picture per access unit. ISOBMFF and its existing extensions do not provide a way of indicating which samples include one or more IRAP pictures. This may hinder the ability of a computing device to locate random access points in a file or to perform layer switching. For example, in the absence of information indicating which of the samples contain one or more IRAP pictures, the computing device may need to parse and interpret NAL units in order to determine whether an access unit can be used as a random access point and/or for layer switching. Parsing and interpreting NAL units may add complexity to the computing device and may consume time and processing resources. Furthermore, some computing devices that perform random access and/or layer switching (e.g., streaming servers) are not configured to parse or interpret NAL units.
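To make the cost concrete, the following Python sketch shows the kind of NAL unit inspection a device would have to perform without file-level signaling. It assumes the two-byte HEVC NAL unit header layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) and treats NAL unit types 16 to 23 as IRAP, the range given for IRAP_nal_unit_type in Section 9.5.5.1.3. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def parse_nal_header(nal: bytes) -> Tuple[int, int, int]:
    """Return (nal_unit_type, nuh_layer_id, temporal_id) from a two-byte NAL header."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    nal_unit_type = (nal[0] >> 1) & 0x3F
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    temporal_id = (nal[1] & 0x07) - 1  # header stores nuh_temporal_id_plus1
    return nal_unit_type, nuh_layer_id, temporal_id


def is_irap_type(nal_unit_type: int) -> bool:
    """True for NAL unit types in the IRAP range 16..23, inclusive."""
    return 16 <= nal_unit_type <= 23


def sample_contains_irap(nal_units: Iterable[bytes]) -> bool:
    """True if any NAL unit of the sample belongs to an IRAP picture."""
    return any(is_irap_type(parse_nal_header(n)[0]) for n in nal_units)
```

A box or sample group that records this result in the file lets a streaming server skip this step entirely.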

An eighth example introduces a new type of sub-sample, where each sub-sample contains one coded picture and its associated non-VCL NAL units. This eighth example may be implemented as shown below in Section 9.5.8. Thus, in this eighth example, file generation device 34 may generate a file that comprises a track box containing metadata for a track in the file. Media data for the track comprises a sequence of samples. Each of the samples is an access unit of multi-layer video data. As part of generating the file, file generation device 34 generates, in the file, a sub-sample information box that contains flags specifying the type of sub-sample information given in the sub-sample information box. When the flags have a particular value, a sub-sample corresponding to the sub-sample information box contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture.

This eighth example may resolve one or more problems that can arise when multi-layer video data is stored using ISOBMFF or its existing extensions. For example, in multi-layer video coding, there may be multiple coded pictures per sample. For instance, there may be one or more pictures in the sample for each layer. However, in the extensions of ISOBMFF for H.264/AVC and HEVC, the sub-sample information box does not provide information about individual pictures within a sample when the sample includes multiple pictures. The techniques of this eighth example may resolve this problem by providing a new type of sub-sample information box that provides information about a sub-sample containing only one coded picture and the non-VCL NAL units associated with that coded picture. Providing information about an individual coded picture in the file structure, as opposed to providing it only within the NAL units associated with the coded picture, may enable a computing device to determine information about the coded picture without having to interpret the NAL units. In some cases, to reduce the complexity of the computing device and/or increase its throughput, the computing device is not configured to interpret NAL units. In some examples where a computing device is streaming NAL units stored in a file, the computing device may use the information in the sub-sample information box to determine whether to forward the NAL units of the sub-sample to a client device.

A ninth example relates to the handling of non-output samples in a multi-layer context. In particular, in the ninth example, when an access unit contains some coded pictures that have pic_output_flag equal to 1 and some other coded pictures that have pic_output_flag equal to 0, at least two tracks must be used to store the stream, such that within each track all coded pictures in each sample have the same value of pic_output_flag. This ninth example may be implemented as shown below in Section 9.5.9.

Thus, in this ninth example, file generation device 34 may generate a file that comprises a media data box enclosing media content. The media content comprises a sequence of samples. Each of the samples is an access unit of multi-layer video data. In response to at least one access unit of a bitstream of the multi-layer video data including a coded picture that has a picture output flag (e.g., pic_output_flag) equal to 1 and a coded picture that has a picture output flag equal to 0, file generation device 34 may use at least two tracks to store the bitstream in the file. For each respective track of the at least two tracks, all coded pictures in each sample of the respective track have the same picture output flag value.
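One way to satisfy this constraint is sketched below in Python. The splitting policy (one track for pictures with pic_output_flag equal to 1 and one for pictures with the flag equal to 0) is an assumption chosen for illustration; the text only requires that the flag be uniform within each sample of each track. The function name and the tuple-based data model are hypothetical.

```python
from typing import List, Tuple

Picture = Tuple[str, int]                 # (picture name, pic_output_flag)
AccessUnit = Tuple[int, List[Picture]]    # (time, coded pictures)


def split_by_output_flag(access_units: List[AccessUnit]) -> List[List[AccessUnit]]:
    """Return tracks in which every sample has a single pic_output_flag value."""
    mixed = any(len({flag for _, flag in pics}) > 1 for _, pics in access_units)
    if not mixed:
        # No access unit mixes flag values, so a single track is sufficient.
        return [list(access_units)]
    tracks = {1: [], 0: []}
    for time, pics in access_units:
        for flag in (1, 0):
            subset = [p for p in pics if p[1] == flag]
            if subset:
                # Each track holds at most one sample per time instant.
                tracks[flag].append((time, subset))
    return [tracks[1], tracks[0]]
```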

This ninth example may resolve one or more problems that can arise when multi-layer video data is stored using ISOBMFF or its existing extensions. For example, if a single track were used to store coded pictures with picture output flags equal to 0 and coded pictures with picture output flags equal to 1, various file formatting rules would be violated. For instance, file formatting rules typically require that there be only one sample in a track per time instant. If a single track stored coded pictures with picture output flags equal to 0 and coded pictures with picture output flags equal to 1, there would be multiple samples in the track per time instant. Forcing coded pictures with different picture output flag values into different tracks of the file may resolve this problem.

Example implementations of some techniques of this disclosure are described below. The example implementations described below are based on the latest integrated specification of 14496-15 in MPEG output document W13478. Changes to Annex A (shown by underlining) and the added sections (Section 9 for SHVC and Section 10 for MV-HEVC) are included below. In other words, particular examples of this disclosure may modify Annex A of ISO/IEC 14496-15 and may add Sections 9 and/or 10 to ISO/IEC 14496-15. Text shown with underlining and double underlining may be of particular relevance to the examples of this disclosure. Although the term SHVC is used throughout the examples described herein, the design of this disclosure is not limited to the SHVC codec; unless explicitly stated otherwise, it can support all multi-layer codecs, including MV-HEVC and 3D-HEVC.

9 SHVC elementary stream and sample definitions

9.1 Introduction

This clause specifies the storage format of SHVC data. It extends the definitions of the storage format of HEVC in clause 8.

The file format for storage of SHVC content, as defined in this clause and Annexes A to D, uses the existing capabilities of the ISO base media file format and the plain HEVC file format (i.e., the file format specified in clause 8). In addition, the following structures or extensions, among others, are used to support SHVC-specific features.

Aggregator:

A structure that enables efficient scalable grouping of NAL units by changing irregular patterns of NAL units into regular patterns of aggregated data units.

Extractor:

A structure that enables efficient extraction of NAL units from tracks other than the one containing the media data.

Temporal metadata statements:

Structures for storing time-aligned information of media samples.

HEVC compatibility:

A provision for storing an SHVC bitstream in an HEVC-compatible manner, such that the HEVC-compatible base layer can be used by any reader compliant with the plain HEVC file format.

9.2 Elementary stream structure

SHVC streams are stored in accordance with 8.2, with the following definition of an SHVC video elementary stream:

● SHVC video elementary streams shall contain all video coding related NAL units (i.e., those NAL units containing video data or signaling video structure) and may contain non-video coding related NAL units such as SEI messages and access unit delimiter NAL units. Aggregators (see A.2) or extractors (see A.3) may also be present. Aggregators and extractors shall be processed as defined in this International Standard (e.g., shall not be placed directly in the output buffer while accessing the file). Other NAL units that are not expressly prohibited may be present, and if they are unrecognized they should be ignored (e.g., not placed in the output buffer while accessing the file).

SHVC streams shall not be stored using associated parameter set streams.

An SHVC video elementary stream may contain VCL NAL units with nuh_layer_id equal to 0, VCL NAL units with nuh_layer_id greater than 0, and non-VCL NAL units. Additionally, aggregator NAL units and extractor NAL units may be present in an SHVC video elementary stream.

9.3 Use of plain HEVC file format

The SHVC file format is an extension of the plain HEVC file format defined in clause 8.

9.4 Sample and configuration definitions

9.4.1 Introduction

SHVC sample: An SHVC sample is also an access unit as defined in Annex H of ISO/IEC 23008-2.

9.4.2 Canonical order and restrictions

9.4.2.1 Restrictions

In addition to the requirements in 8.3.2, the following restrictions apply to SHVC data.

● VCL NAL units: All VCL NAL units in one access unit shall be contained in the sample whose composition time is that of the picture represented by the access unit. An SHVC sample shall contain at least one VCL NAL unit.

● Aggregators/extractors: The order of all NAL units included in an aggregator or referenced by an extractor is exactly the decoding order, as if these NAL units were present in a sample not containing aggregators/extractors. After processing the aggregator or the extractor, all NAL units must be in valid decoding order as specified in ISO/IEC 23008-2.

9.4.2.2 Decoder configuration record

When the decoder configuration record defined in 8.3.3.1 is used for a stream that can be interpreted as either an SHVC or an HEVC stream, the HEVC decoder configuration record shall reflect the properties of the HEVC-compatible base layer; e.g., it shall contain only the parameter sets needed for decoding the HEVC base layer.

The SHVCDecoderConfigurationRecord is structurally identical to an HEVCDecoderConfigurationRecord. The syntax is as follows:

aligned(8) class SHVCDecoderConfigurationRecord {
    // same fields as in the HEVCDecoderConfigurationRecord syntax structure
}

The semantics of the fields in SHVCDecoderConfigurationRecord are the same as defined for an HEVCDecoderConfigurationRecord.

9.5 Derivation from the ISO base media file format

9.5.1 SHVC track structure

A scalable video stream is represented by one or more video tracks in a file. Each track represents one or more operation points of the scalable stream. A scalable stream may, of course, be further thinned if desired.

Let the lowest operation point be the one, among all the operation points, that contains NAL units with nuh_layer_id equal to 0 only and TemporalId equal to 0 only. A track that contains the lowest operation point shall be nominated as the "scalable base track". All the other tracks that are part of the same scalable encoded information shall be linked to this base track by means of a track reference of type 'sbas' (scalable base).
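The nomination of the scalable base track can be sketched in Python as follows. The `Track` class, its fields, and the representation of an operation point as a pair of identifier sets are hypothetical modeling choices, not structures from the file format.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# An operation point is modeled as (set of nuh_layer_id values, set of TemporalId values).
OperationPoint = Tuple[FrozenSet[int], FrozenSet[int]]


@dataclass
class Track:
    track_id: int
    operation_points: List[OperationPoint]
    track_refs: Dict[str, List[int]] = field(default_factory=dict)


def assign_scalable_base(tracks: List[Track]) -> Track:
    """Nominate the scalable base track and link the other tracks to it via 'sbas'."""
    base = next(
        (t for t in tracks
         if any(layers == {0} and tids == {0} for layers, tids in t.operation_points)),
        None,
    )
    if base is None:
        raise ValueError("no track contains the lowest operation point")
    for t in tracks:
        if t is not base:
            t.track_refs.setdefault("sbas", []).append(base.track_id)
    return base
```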

All the tracks sharing the same scalable base track must share the same timescale as the scalable base track.

9.5.2 Data sharing and extraction

Different tracks may logically share data. This sharing can take one of the following two forms:

a) The sample data is copied from one track into another track (and possibly compacted or re-interleaved with other data, such as audio). This creates larger overall files, but the low bit rate data may be compacted and/or interleaved with other material for ease of extraction.

b) There may be instructions on how to perform this copy at the time the file is read.

For the second case, extractors (defined in A.3) are used.

9.5.3 SHVC video stream definition

9.5.3.1 Sample entry name and format

9.5.3.1.1 Definition

Types: 'hvc2', 'hev2', 'shc1', 'shv1', 'shcC'

Container: Sample description box ('stsd')

Mandatory: An 'hvc1', 'hev1', 'hvc2', 'hev2', 'shc1', or 'shv1' sample entry is mandatory

Quantity: One or more sample entries may be present

When the sample entry name is 'shc1', the default and mandatory value of array_completeness is 1 for arrays of all types of parameter sets, and 0 for all other arrays. When the sample entry name is 'shv1', the default value of array_completeness is 0 for all arrays.

When the sample entry name is 'shv1', the following applies:

● If a sample contains at least one IRAP picture as defined in ISO/IEC 23008-2, all parameter sets needed for decoding that sample shall be included either in the sample entry or in that sample itself.

● Otherwise (the sample contains no IRAP picture), all parameter sets needed for decoding that sample shall be included either in the sample entry or in any of the samples from the previous sample containing at least one IRAP picture to that sample itself, inclusive.
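The two rules for 'shv1' can be checked with a short Python sketch. The `Sample` type, the identification of a parameter set by a (kind, id) pair, and the function name are hypothetical. If no earlier sample contains an IRAP picture, the sketch searches back to the start of the track, which is an assumption the text does not address.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple

ParamSet = Tuple[str, int]  # e.g. ("SPS", 0); hypothetical identifier


@dataclass(frozen=True)
class Sample:
    has_irap: bool
    needed: FrozenSet[ParamSet] = frozenset()   # parameter sets required to decode
    carried: FrozenSet[ParamSet] = frozenset()  # parameter sets stored in the sample


def parameter_sets_available(samples: Sequence[Sample], index: int,
                             sample_entry_sets: FrozenSet[ParamSet]) -> bool:
    """Check the 'shv1' storage rule for samples[index]."""
    # Walk back to the nearest sample with an IRAP picture. For a sample that
    # itself has an IRAP picture the window is just that sample.
    start = index
    while start > 0 and not samples[start].has_irap:
        start -= 1
    available = set(sample_entry_sets)
    for s in samples[start:index + 1]:
        available |= s.carried
    return samples[index].needed <= available
```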

If an SHVC elementary stream contains a usable HEVC-compatible base layer, then an HEVC visual sample entry ('hvc1' or 'hev1') shall be used. Here, the entry shall initially contain an HEVC configuration box, possibly followed by an SHVC configuration box as defined below. The HEVC configuration box documents the profile, tier, level, and possibly also parameter sets pertaining to the HEVC-compatible base layer, as defined by the HEVCDecoderConfigurationRecord. The SHVC configuration box documents the profile, tier, level, and possibly also parameter sets (as defined by the HEVCDecoderConfigurationRecord) pertaining to the entire stream containing the SHVC-compatible enhancement layers, stored in the SHVCConfigurationBox.

If the SHVC elementary stream does not contain a usable HEVC base layer, then an SHVC visual sample entry ('shc1' or 'shv1') shall be used. The SHVC visual sample entry shall contain an SHVC configuration box, as defined below. This includes an SHVCDecoderConfigurationRecord, as defined in this International Standard.

The lengthSizeMinusOne field in the SHVC and HEVC configurations in any given sample entry shall have the same value.
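A plausible reason for this constraint is that a reader splits each sample into NAL units with a single length-prefix size. The Python sketch below assumes the NAL-unit-structured sample layout of ISO/IEC 14496-15, in which each NAL unit is preceded by a big-endian length field of lengthSizeMinusOne + 1 bytes (1, 2, or 4 bytes). This layout is stated here as background and is not spelled out in the text above; the function name is hypothetical.

```python
from typing import List


def split_nal_units(sample: bytes, length_size_minus_one: int) -> List[bytes]:
    """Split a length-prefixed sample into its NAL units."""
    size = length_size_minus_one + 1
    if size not in (1, 2, 4):
        raise ValueError("length field must be 1, 2, or 4 bytes")
    units = []
    pos = 0
    while pos < len(sample):
        if pos + size > len(sample):
            raise ValueError("truncated length field")
        nal_length = int.from_bytes(sample[pos:pos + size], "big")
        pos += size
        if pos + nal_length > len(sample):
            raise ValueError("truncated NAL unit")
        units.append(sample[pos:pos + nal_length])
        pos += nal_length
    return units
```

If the HEVC and SHVC configurations in one sample entry disagreed on the value, the same sample bytes would split differently depending on which configuration a reader consulted.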

Extractors or aggregators may be used for NAL units with nuh_layer_id greater than 0 in 'hvc1', 'hev1', 'hvc2', 'hev2', 'shc1', or 'shv1' tracks. The 'extra_boxes' in an 'hvc2' or 'hev2' sample entry may be an SHVCConfigurationBox or other extension boxes.

NOTE: When HEVC compatibility is indicated, it may be necessary to indicate an unrealistic level for the HEVC base layer to accommodate the bit rate of the entire stream, because all the NAL units are considered to be included in the HEVC base layer and hence may be fed to the decoder, which is expected to discard those NAL units it does not recognize. This case occurs when the 'hvc1' or 'hev1' sample entry is used and both HEVC and SHVC configurations are present.

An SHVCConfigurationBox may be present in an 'hvc1' or 'hev1' sample entry. In this case, the HEVCSHVCSampleEntry definition below applies.

The following table shows, for a video track, all the possible uses of sample entries, configurations, and the SHVC tools (excluding timed metadata, which is always used in another track):

Table 10 - Use of sample entries for HEVC and SHVC tracks

9.5.3.1.2 Syntax

9.5.3.1.3 Semantics

When the stream to which the sample entry applies contains NAL units with nuh_layer_id greater than 0, Compressorname in the base class VisualSampleEntry indicates the name of the compressor used, with the value "\013SHVC Coding" being recommended (\013 is 11, the length of the string "SHVC Coding" in bytes).

9.5.4 SHVC visual width and height

The visual width and height documented in a VisualSampleEntry of a stream containing NAL units with nuh_layer_id greater than 0 are the visual width and height of the HEVC base layer if the stream is described by a sample entry of type 'hvc1', 'hev1', 'hvc2', or 'hev2'; otherwise, they are the visual width and height of the decoded pictures of the highest layer obtained by decoding the entire stream.
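This selection rule can be sketched in Python as follows. The function name and the mapping from nuh_layer_id to picture dimensions are hypothetical, and taking "the highest layer" to mean the largest nuh_layer_id present is an assumption.

```python
from typing import Dict, Tuple

BASE_LAYER_ENTRY_TYPES = {"hvc1", "hev1", "hvc2", "hev2"}


def documented_dimensions(entry_type: str,
                          layer_dims: Dict[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (width, height) to record in the VisualSampleEntry.

    layer_dims maps each nuh_layer_id in the stream to its decoded picture size.
    """
    if entry_type in BASE_LAYER_ENTRY_TYPES:
        return layer_dims[0]            # HEVC base layer dimensions
    return layer_dims[max(layer_dims)]  # highest layer (assumed: largest nuh_layer_id)
```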

9.5.5 Sync sample

An SHVC sample is considered a sync sample if each coded picture in the access unit is an IRAP picture, as defined in ISO/IEC 23008-2. Sync samples are documented by the sync sample table, and may additionally be documented by the sync sample sample group and the 'rap' sample group.

9.5.5.1随机存取样本样本群组9.5.5.1 Random Access Samples and Sample Groups

9.5.5.1.1定义9.5.5.1.1 Definition

群组类型：‘ras’Group type: ‘ras’

容器：样本群组描述框(‘ras’)Container: Sample group description box (‘ras’)

Mandatory: No

数量：零或多个Quantity: zero or more

随机存取样本样本群组识别含有至少一IRAP图片的样本。A random access sample group identifies samples that contain at least one IRAP picture.

9.5.5.1.2语法9.5.5.1.2 Syntax

9.5.5.1.3语义9.5.5.1.3 Semantics

all_pics_are_IRAP equal to 1 specifies that all coded pictures in each sample of the group are IRAP pictures. When the value is equal to 0, this constraint may or may not apply.

IRAP_nal_unit_type specifies the NAL unit type of the IRAP pictures in each sample of the group. The value of IRAP_nal_unit_type shall be in the range of 16 to 23, inclusive.

num_IRAP_pics指定群组的每一样本中的IRAP图片的数目。num_IRAP_pics specifies the number of IRAP pictures in each sample of the group.

IRAP_pic_layer_id指定群组的每一样本中的第i个IRAP图片的nuh_layer_id 的值。IRAP_pic_layer_id specifies the value of nuh_layer_id of the i-th IRAP picture in each sample of the group.
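The field semantics above can be sketched as a derivation from the coded pictures of one sample. This is an illustrative sketch only: the function name and the input shape (a list of `(nuh_layer_id, nal_unit_type)` pairs) are assumptions, and it takes IRAP_nal_unit_type from the first IRAP picture, assuming all IRAP pictures of the sample share one type.

```python
IRAP_TYPES = range(16, 24)  # nal_unit_type 16..23, inclusive


def describe_ras_sample(pictures):
    """pictures: list of (nuh_layer_id, nal_unit_type), one pair per coded picture."""
    irap = [(layer_id, nal_type) for layer_id, nal_type in pictures
            if nal_type in IRAP_TYPES]
    if not irap:
        return None  # no IRAP picture, so the sample is not in a 'ras' group
    return {
        "all_pics_are_IRAP": int(len(irap) == len(pictures)),
        "IRAP_nal_unit_type": irap[0][1],
        "num_IRAP_pics": len(irap),
        "IRAP_pic_layer_id": [layer_id for layer_id, _ in irap],
    }
```

A sample with an IRAP picture in the base layer and a non-IRAP picture in an enhancement layer would then yield all_pics_are_IRAP equal to 0 and a single entry in IRAP_pic_layer_id.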

9.5.6关于随机存取恢复点及随机存取点的样本群组9.5.6 Random Access Recovery Points and Random Access Point Sample Groups

For video data described by a sample entry of type 'hvc1', 'hev1', 'hvc2', or 'hev2', the random access recovery sample group and the random access point sample group identify random access recovery points and random access points, respectively, for both an HEVC decoder and an SHVC decoder (if any) operating on the entire bitstream.

For video data described by a sample entry of type 'shc1' or 'shv1', the random access recovery sample group identifies random access recovery in the entire SHVC bitstream, and the random access point sample group identifies random access points in the entire SHVC bitstream.

如果存取单元中的每一经译码图片为IRAP图片(具有或无RASL图片)，那么将SHVC样本考虑为随机存取点，如在ISO/IEC 23008-2中所定义，且ISO/IEC 14496-2中的前置样本为所有图片皆为RASL图片的样本，如在ISO/IEC 23008-2中所定义。If every coded picture in the access unit is an IRAP picture (with or without RASL pictures), SHVC samples are considered random access points, as defined in ISO/IEC 23008-2, and the leading samples in ISO/IEC 14496-2 are samples where all pictures are RASL pictures, as defined in ISO/IEC 23008-2.

9.5.7 Independent and disposable samples box

If this box is used in a track that is both HEVC and SHVC compatible, care should be taken that its statements are true regardless of which valid subset of the SHVC data (possibly only the HEVC data) is used. The "unknown" values (value 0 of the fields sample-depends-on, sample-is-depended-on, and sample-has-redundancy) may be needed if the information varies.

9.5.8用于SHVC的子样本的定义9.5.8 Definition of Subsamples for SHVC

此子条款扩展8.4.8中的HEVC的子样本的定义。This subclause extends the definition of subsamples for HEVC in 8.4.8.

对于在SHVC流中的子样本信息框(ISO/IEC 14496-12的8.7.7)的使用，基于如以下指定的子样本信息框的旗标的值定义子样本。此框的存在是可选的；然而，如果存在于含有SHVC数据的播放轨中，那么其应具有此处定义的语义。For use of the Subsample Information box (8.7.7 of ISO/IEC 14496-12) in SHVC streams, subsamples are defined based on the values of the flags in the Subsample Information box as specified below. The presence of this box is optional; however, if present in a track containing SHVC data, it shall have the semantics defined here.

旗标指定在此框中给出的子样本信息的类型，如下：The flags specify the type of subsample information given in this box, as follows:

0: NAL-unit-based subsamples. A subsample contains one or more contiguous NAL units.

1: Decoding-unit-based subsamples. A subsample contains exactly one decoding unit.

2: Tile-based subsamples. A subsample contains either one tile and the associated non-VCL NAL units (if any) of the VCL NAL unit(s) that contain the tile, or one or more non-VCL NAL units.

3: CTU-row-based subsamples. A subsample contains either one CTU row within a slice and the associated non-VCL NAL units (if any) of the VCL NAL unit(s) that contain the CTU row, or one or more non-VCL NAL units. This type of subsample information shall not be used when entropy_coding_sync_enabled_flag is equal to 0.

4: Slice-based subsamples. A subsample contains either one slice (where each slice may contain one or more slice segments, each of which is a NAL unit) and the associated non-VCL NAL units (if any), or one or more non-VCL NAL units.

其它旗标值是保留的。Other flag values are reserved.

subsample_priority字段应设定到根据在ISO/IEC 14496-12中的此字段的规格的值。The subsample_priority field shall be set to a value according to the specification of this field in ISO/IEC 14496-12.

The discardable field shall be set to 1 only if this sample can still be decoded when this subsample is discarded (e.g., the subsample consists of an SEI NAL unit).

当NAL单元的第一字节包含于子样本中时，先前长度字段必须也包含于同一子样本中。When the first byte of a NAL unit is contained in a subsample, the preceding length field must also be contained in the same subsample.

SubLayerRefNalUnitFlag等于0指示子样本中的所有NAL单元为如在ISO/IEC23008-2中指定的子层非参考图片的VCL NAL单元。值1指示子样本中的所有NAL 单元为如在ISO/IEC 23008-2中指定的子层参考图片的VCL NAL单元。SubLayerRefNalUnitFlag equal to 0 indicates that all NAL units in the subsample are VCL NAL units of sub-layer non-reference pictures as specified in ISO/IEC 23008-2. A value of 1 indicates that all NAL units in the subsample are VCL NAL units of sub-layer reference pictures as specified in ISO/IEC 23008-2.

RapNalUnitFlag等于0指示子样本中的NAL单元中无一者具有等于如在 ISO/IEC23008-2中指定的IDR_W_RADL、IDR_N_LP、CRA_NUT、BLA_W_LP、 BLA_W_RADL、BLA_N_LP、RSV_IRAP_VCL22或RSV_IRAP_VCL23的 nal_unit_type。值1指示子样本中的所有NAL单元具有如在ISO/IEC 23008-2中指定的IDR_W_RADL、IDR_N_LP、CRA_NUT、BLA_W_LP、BLA_W_RADL、BLA_N_LP、RSV_IRAP_VCL22或RSV_IRAP_VCL23的nal_unit_type。RapNalUnitFlag equal to 0 indicates that none of the NAL units in the subsample has a nal_unit_type equal to IDR_W_RADL, IDR_N_LP, CRA_NUT, BLA_W_LP, BLA_W_RADL, BLA_N_LP, RSV_IRAP_VCL22, or RSV_IRAP_VCL23 as specified in ISO/IEC 23008-2. A value of 1 indicates that all NAL units in the subsample have a nal_unit_type equal to IDR_W_RADL, IDR_N_LP, CRA_NUT, BLA_W_LP, BLA_W_RADL, BLA_N_LP, RSV_IRAP_VCL22, or RSV_IRAP_VCL23 as specified in ISO/IEC 23008-2.

VclNalUnitFlag等于0指示子样本中的所有NAL单元为非VCL NAL单元。值 1指示子样本中的所有NAL单元为VCL NAL单元。VclNalUnitFlag equal to 0 indicates that all NAL units in the subsample are non-VCL NAL units. A value of 1 indicates that all NAL units in the subsample are VCL NAL units.

vcl_idc指示子样本含有视频译码层(VCL)数据、非VCL数据或是两者，如下：vcl_idc indicates whether the subsample contains Video Coding Layer (VCL) data, non-VCL data, or both, as follows:

0：子样本含有VCL数据且不含有非VCL数据0: The subsample contains VCL data and no non-VCL data

1：子样本不含有VCL数据且含有非VCL数据1: The subsample does not contain VCL data and contains non-VCL data

2：子样本可含有应彼此相关联的VCL及非VCL数据两者。举例来说，子样本可含有解码单元信息SEI消息，接着为与SEI消息相关联的NAL单元的集合。2: A subsample may contain both VCL and non-VCL data that should be associated with each other. For example, a subsample may contain a decoding unit information SEI message, followed by a set of NAL units associated with the SEI message.

3: Reserved
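The vcl_idc values above can be derived from the NAL unit types found in a subsample, as in the following sketch. It relies on the fact that ISO/IEC 23008-2 assigns nal_unit_type values 0 to 31 to VCL NAL units; the function names are hypothetical.

```python
def is_vcl(nal_unit_type: int) -> bool:
    # In ISO/IEC 23008-2, nal_unit_type values 0..31 are VCL NAL unit types.
    return 0 <= nal_unit_type <= 31


def vcl_idc_for(nal_unit_types) -> int:
    """Return vcl_idc (0, 1, or 2) for the NAL unit types of one subsample."""
    has_vcl = any(is_vcl(t) for t in nal_unit_types)
    has_non_vcl = any(not is_vcl(t) for t in nal_unit_types)
    if has_vcl and has_non_vcl:
        return 2  # both VCL and non-VCL data
    if has_vcl:
        return 0  # VCL data only
    if has_non_vcl:
        return 1  # non-VCL data only
    raise ValueError("a subsample must contain at least one NAL unit")
```

For instance, a subsample holding a prefix SEI NAL unit (type 39) followed by slice segment NAL units would map to vcl_idc equal to 2.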

log2_min_luma_ctb指示ctb_x及ctb_y的单位，如下所指定：log2_min_luma_ctb indicates the units of ctb_x and ctb_y as specified below:

0: 8 luma samples

1: 16 luma samples

2: 32 luma samples

3: 64 luma samples

ctb_x specifies the zero-based coordinate of the rightmost luma samples of the tile associated with the subsample when flags is equal to 2 and vcl_idc is equal to 1 or 2, in units derived from log2_min_luma_ctb as specified above.

ctb_y specifies the zero-based coordinate of the bottom-most luma samples of the tile associated with the subsample when flags is equal to 2 and vcl_idc is equal to 1 or 2, in units derived from log2_min_luma_ctb as specified above.
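The unit table for log2_min_luma_ctb amounts to 8 shifted left by the field value. The sketch below shows that mapping together with one possible reading of how a ctb_x or ctb_y value translates into a span of luma samples; the span interpretation and both function names are assumptions made for illustration.

```python
def ctb_unit_size(log2_min_luma_ctb: int) -> int:
    """Map the 2-bit field to a unit size: 0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64."""
    if not 0 <= log2_min_luma_ctb <= 3:
        raise ValueError("log2_min_luma_ctb is a 2-bit field")
    return 8 << log2_min_luma_ctb


def unit_to_luma_span(coord: int, log2_min_luma_ctb: int):
    """Return the (first, last) luma sample covered by unit number `coord`.

    Assumption: a coordinate given in units covers `unit` consecutive luma samples.
    """
    unit = ctb_unit_size(log2_min_luma_ctb)
    return coord * unit, coord * unit + unit - 1
```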

9.5.9处置非输出样本9.5.9 Handling Non-Output Samples

8.4.9中的规格适用，其中“HEVC”由“SHVC”替换，且将非输出样本定义为目标输出层的图片具有等于0的pic_output_flag的样本。当存取单元含有具有等于1的 pic_output_flag的一些经译码图片及具有等于0的pic_output_flag的一些其它经译码图片时，必须使用至少两个播放轨来存储流，使得在每一播放轨内，每一样本中的所有经译码图片具有相同的pic_output_flag值。The specifications in 8.4.9 apply, with "HEVC" replaced by "SHVC", and non-output samples are defined as samples whose pictures of the target output layer have pic_output_flag equal to 0. When an access unit contains some coded pictures with pic_output_flag equal to 1 and some other coded pictures with pic_output_flag equal to 0, the stream must be stored using at least two tracks such that within each track, all coded pictures in each sample have the same pic_output_flag value.

10 MV-HEVC elementary stream and sample definitions

10.1介绍10.1 Introduction

此条款指定MV-HEVC数据的存储格式。其扩展在第8条中的HEVC的存储格式的定义。This clause specifies the storage format of MV-HEVC data. It extends the definition of the HEVC storage format in clause 8.

The file format for storage of MV-HEVC content, as defined in this clause and Annexes A to D, uses the existing capabilities of the ISO base media file format and the normal HEVC file format (i.e., the file format specified in clause 8). In addition, the following structures or extensions, among others, are used to support MV-HEVC-specific features.

聚集器：Aggregator:

提取器：Extractor:

HEVC兼容性：HEVC compatibility:

提供用于以HEVC兼容方式存储MV-HEVC位流，使得HEVC兼容基础层可由任一遵照普通HEVC文件格式的读取器使用。Provides for storing MV-HEVC bitstreams in an HEVC-compatible manner so that the HEVC-compatible base layer can be used by any reader conforming to the normal HEVC file format.

Support for MV-HEVC includes a number of tools, and there are various "models" of how they might be used. In particular, an MV-HEVC stream can be placed in tracks in a number of ways, among which are the following:

1. All the views in one track, labelled with sample groups;

2. Each view in its own track, labelled in the sample entries;

3. A hybrid: one track containing all views, and one or more single-view tracks each containing a view that can be independently coded;

4. The expected operation points, each in a track (e.g., the HEVC base, a stereo pair, a multiview scene).

MV-HEVC文件格式允许将一或多个视图存储到播放轨内，类似于第9条中对SHVC的支持。每播放轨可使用多个视图的存储，例如，当内容提供者想要提供并不意欲用于子集建构的多视图位流时，或当已针对少数预定义的输出视图的集合(例如，1个、2个、 5个或9个视图)建立位流时(其中可相应地建立播放轨)。如果将一个以上视图存储于播放轨中且存在表示MV-HEVC位流的若干播放轨(一个以上)，那么推荐样本分组机制的使用。The MV-HEVC file format allows one or more views to be stored in a track, similar to the support for SHVC in clause 9. The storage of multiple views per track can be used, for example, when a content provider wants to provide a multi-view bitstream that is not intended for subset construction, or when a bitstream has been created for a small set of predefined output views (e.g., 1, 2, 5, or 9 views) (where tracks can be created accordingly). If more than one view is stored in a track and there are several tracks (more than one) representing the MV-HEVC bitstream, the use of a sample grouping mechanism is recommended.

When an MV-HEVC bitstream is represented by multiple tracks and a player uses an operation point that contains data in multiple tracks, the player must reconstruct MV-HEVC access units before passing them to the MV-HEVC decoder. An MV-HEVC operation point may be explicitly represented by a track, i.e., an access unit is reconstructed simply by resolving all extractor and aggregator NAL units of a sample. If the number of operation points is large, creating a track for each operation point may be space-consuming and impractical. In such a case, MV-HEVC access units are reconstructed as specified in 10.5.2. The MV-HEVC decoder configuration record contains a field indicating whether the associated samples use explicit or implicit access unit reconstruction (see the explicit_au_track field).

10.2 MV-HEVC track structure

根据8.2存储MV-HEVC流，具有MV-HEVC视频基本流的以下定义：MV-HEVC streams are stored according to 8.2, with the following definition of an MV-HEVC video elementary stream:

● An MV-HEVC video elementary stream shall contain all video coding-related NAL units (i.e., those NAL units that contain video data or signal video structure) and may contain non-video coding-related NAL units such as SEI messages and access unit delimiter NAL units. Aggregators (see A.2) or extractors (see A.3) may also be present. Aggregators and extractors shall be processed as defined in this International Standard (e.g., they shall not be placed directly in the output buffer when accessing the file). Other NAL units that are not expressly prohibited may be present and, if not recognized, should be ignored (e.g., not placed in the output buffer when accessing the file).

当需要时，不应使用相关联的参数集流来存储MV-HEVC流。When required, the MV-HEVC stream shall not be stored with the associated parameter set stream.

VCL NAL units with nuh_layer_id equal to 0, VCL NAL units with nuh_layer_id greater than 0, and non-VCL NAL units may be present in an MV-HEVC video elementary stream. In addition, aggregator and extractor NAL units may be present in an MV-HEVC video elementary stream.

10.3 Use of the normal HEVC file format
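Telling these kinds of NAL unit apart requires only the two-byte NAL unit header of ISO/IEC 23008-2 (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1). The following is a minimal sketch; the function names and the returned labels are illustrative choices, not terms from the file format.

```python
def parse_nal_header(header: bytes) -> dict:
    """Decode the 2-byte HEVC NAL unit header."""
    if len(header) < 2:
        raise ValueError("an HEVC NAL unit header is 2 bytes long")
    value = int.from_bytes(header[:2], "big")
    return {
        "forbidden_zero_bit": value >> 15,
        "nal_unit_type": (value >> 9) & 0x3F,
        "nuh_layer_id": (value >> 3) & 0x3F,
        "temporal_id": (value & 0x07) - 1,  # nuh_temporal_id_plus1 minus 1
    }


def classify(header: bytes) -> str:
    """Label a NAL unit as base-layer VCL, non-base-layer VCL, or non-VCL."""
    fields = parse_nal_header(header)
    if fields["nal_unit_type"] > 31:  # types 32..63 are non-VCL
        return "non-VCL"
    return "base VCL" if fields["nuh_layer_id"] == 0 else "non-base VCL"
```

A reader that only understands the normal HEVC file format would keep the NAL units labelled "base VCL" plus the non-VCL NAL units that apply to them.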

MV-HEVC文件格式为在第8条中定义的普通HEVC文件格式的扩展。The MV-HEVC file format is an extension of the normal HEVC file format defined in clause 8.

10.4样本及配置定义10.4 Sample and Configuration Definition

10.4.1介绍10.4.1 Introduction

MV-HEVC样本：MV-HEVC样本也为如在ISO/IEC 23008-2的附录F中定义的存取单元。MV-HEVC sample: An MV-HEVC sample is also an access unit as defined in Annex F of ISO/IEC 23008-2.

10.4.2 Canonical order and restrictions

10.4.2.1 Restrictions

除8.3.2中的要求外，以下限制也应用于MV-HEVC数据。In addition to the requirements in 8.3.2, the following restrictions apply to MV-HEVC data.

● VCL NAL units: All VCL NAL units of one access unit shall be contained in the sample whose composition time is that of the picture represented by the access unit. An MV-HEVC sample shall contain at least one VCL NAL unit.

10.4.2.2解码器配置记录10.4.2.2 Decoder Configuration Record

当将在8.3.3.1中定义的解码器配置记录用于可解译为MV-HEVC或HEVC流的流时，HEVC解码器配置记录应反映HEVC兼容基础视图的性质，例如，其应仅含有对于解码HEVC基础视图所需要的参数集。When the decoder configuration record defined in 8.3.3.1 is used for a stream interpretable as an MV-HEVC or HEVC stream, the HEVC decoder configuration record shall reflect the properties of the HEVC compatible base view, e.g., it shall only contain the parameter sets required for decoding the HEVC base view.

MVHEVCDecoderConfigurationRecord在结构上与HEVCDecoderConfigurationRecord 相同。语法如下：MVHEVCDecoderConfigurationRecord is structurally identical to HEVCDecoderConfigurationRecord. The syntax is as follows:

aligned(8) class MVHEVCDecoderConfigurationRecord {

}

MVHEVCDecoderConfigurationRecord中的字段的语义与针对HEVCDecoderConfigurationRecord所定义相同。The semantics of the fields in MVHEVCDecoderConfigurationRecord are the same as defined for HEVCDecoderConfigurationRecord.

10.4.3 Sync sample

An MV-HEVC sample is considered a sync sample if each coded picture in the access unit is an IRAP picture without RASL pictures, as defined in ISO/IEC 23008-2. Sync samples are documented by the sync sample table and may additionally be documented by the sync sample sample group and the 'rap' sample group, defined similarly as in SHVC.

10.4.4 Independent and disposable samples box

If this box is used in a track that is both HEVC and MV-HEVC compatible, care should be taken that its statements are true regardless of which valid subset of the MV-HEVC data (possibly only the HEVC data) is used. The "unknown" values (value 0 of the fields sample-depends-on, sample-is-depended-on, and sample-has-redundancy) may be needed if the information varies.

10.4.5关于随机存取恢复点及随机存取点的样本群组10.4.5 Random Access Recovery Points and Random Access Point Sample Groups

For video data described by a sample entry of type 'hvc1', 'hev1', 'hvc2', or 'hev2', the random access recovery sample group and the random access point sample group identify random access recovery points and random access points, respectively, for both an HEVC decoder and an MV-HEVC decoder (if any) operating on the entire bitstream.

For video data described by an MV-HEVC sample entry type, the random access recovery sample group identifies random access recovery in the entire MV-HEVC bitstream, and the random access point sample group identifies random access points in the entire MV-HEVC bitstream.

10.5 Derivation from the ISO base media file format

10.5.1 MV-HEVC播放轨结构10.5.1 MV-HEVC Track Structure

多视图视频流由文件中的一或多个视频播放轨表示。每一播放轨表示流的一或多个视图。A multi-view video stream is represented by one or more video tracks in the file. Each track represents one or more views of the stream.

There is a minimal set of one or more tracks that, when taken together, contain the complete set of encoded information. All these tracks shall have the flag "complete_representation" set in all their sample entries. This group of tracks that form the complete encoded information is called the "complete subset".

Let the lowest operation point be the one, of all the operation points, that contains NAL units with nuh_layer_id equal to 0 only and TemporalId equal to 0 only. The track that contains the lowest operation point shall be nominated as the "base view track". All the other tracks that are part of the same stream shall be linked to this base track by means of a track reference of type 'sbas' (view base).

共享同一基础视图播放轨的所有播放轨必须共享与基础视图播放轨相同的时间标度。All tracks that share the same base view track must share the same timescale as the base view track.
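The two linking rules (an 'sbas' reference from every other track to the base view track, and a shared timescale) lend themselves to a simple consistency check. The sketch below uses a hypothetical in-memory `Track` record rather than a real file parser; all names in it are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    track_id: int
    timescale: int
    is_base_view: bool = False
    # Track references keyed by type, e.g. {"sbas": [1]} (hypothetical model).
    references: dict = field(default_factory=dict)


def check_base_view_links(tracks):
    """Return a list of problems; an empty list means the rules hold."""
    bases = [t for t in tracks if t.is_base_view]
    if len(bases) != 1:
        return ["expected exactly one base view track"]
    base = bases[0]
    problems = []
    for t in tracks:
        if t is base:
            continue
        if base.track_id not in t.references.get("sbas", []):
            problems.append(f"track {t.track_id}: no 'sbas' reference to track {base.track_id}")
        if t.timescale != base.timescale:
            problems.append(f"track {t.track_id}: timescale differs from base view track")
    return problems
```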

If a view represented by a track uses another view represented by another track as an inter-view prediction reference, a track reference of type 'scal' shall be included in the track, referring to the source track used for inter-view prediction.

If edits are applied to tracks containing view components of an MV-HEVC bitstream, the edit lists shall be consistent over all tracks affected by the edits.

10.5.2存取单元的重建构10.5.2 Reconstruction of Access Units

为了从一或多个MV-HEVC播放轨的样本重建构存取单元，可能需要首先确定目标输出视图。In order to reconstruct an access unit from samples of one or more MV-HEVC tracks, it may be necessary to first determine the target output view.

对于解码确定的目标输出视图需要的视图可从在视图识别符框或‘scal’播放轨参考中包含的参考视图识别符得出。The views required for decoding the determined target output view can be derived from the reference view identifiers contained in the view identifier box or the 'scal' track reference.
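When the dependencies are taken from 'scal' track references, finding every track needed for a target output view is a transitive closure over those references. A minimal sketch, assuming the references have already been read into a plain dictionary (the function name and data shape are hypothetical):

```python
def required_tracks(target_track, scal_refs):
    """Return the set of track IDs needed to decode `target_track`.

    scal_refs maps a track ID to the track IDs it references with type 'scal'.
    """
    needed, stack = set(), [target_track]
    while stack:
        track = stack.pop()
        if track in needed:
            continue  # already visited; also guards against reference cycles
        needed.add(track)
        stack.extend(scal_refs.get(track, []))
    return needed
```

With references {3: [2], 2: [1], 4: [1]}, a target view carried in track 3 would require tracks 1, 2, and 3, while track 4 could be left out.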

If several tracks contain data for an access unit, the alignment of the respective samples in the tracks is performed on decoding time, i.e., using the time-to-sample table only, without considering edit lists.

An access unit is reconstructed from the respective samples in the required tracks by arranging their NAL units in an order conforming to ISO/IEC 23008-2. The following order provides an outline of the procedure for forming a conforming access unit:
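Alignment on decoding time can be sketched by expanding each track's time-to-sample table into decoding times and matching equal times across tracks. The sketch assumes the table has already been read as `(sample_count, sample_delta)` pairs and that all tracks use the same timescale; the function names are hypothetical.

```python
def decoding_times(stts_entries):
    """Expand (sample_count, sample_delta) pairs into per-sample decoding times."""
    times, current = [], 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(current)
            current += sample_delta
    return times


def align_samples(tracks):
    """Map each decoding time to {track name: sample index}; edit lists are ignored."""
    aligned = {}
    for name, entries in tracks.items():
        for index, time in enumerate(decoding_times(entries)):
            aligned.setdefault(time, {})[name] = index
    return dict(sorted(aligned.items()))
```

Each entry of the result lists the samples that contribute to one access unit; a track with no sample at a given time simply does not appear in that entry.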

●所有参数集NAL单元(从相关联的参数集播放轨且从相关联的基本流播放轨)。• All parameter set NAL units (from the associated parameter set track and from the associated elementary stream track).

●所有SEI NAL单元(从相关联的参数集播放轨且从相关联的基本流播放轨)。• All SEI NAL units (from the associated parameter set track and from the associated elementary stream track).

●按视图次序索引值的降序的视图分量。在视图分量内的NAL单元在所述样本内按其出现的次序。• View components in descending order of view order index values. NAL units within a view component in the order they appear within the sample.
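The outline above can be sketched as a three-way partition followed by a stable sort. The sketch follows the list literally, including the descending view order index and the placement of all SEI NAL units before the view components. The NAL unit type numbers are those of ISO/IEC 23008-2 (VPS 32, SPS 33, PPS 34, prefix SEI 39, suffix SEI 40); the dictionary-based NAL unit model and the function name are assumptions.

```python
PARAMETER_SET_TYPES = {32, 33, 34}  # VPS, SPS, PPS
SEI_TYPES = {39, 40}                # prefix SEI, suffix SEI


def order_access_unit(nal_units):
    """nal_units: dicts with 'type' and, for view components, 'view_order_index'."""
    params = [n for n in nal_units if n["type"] in PARAMETER_SET_TYPES]
    sei = [n for n in nal_units if n["type"] in SEI_TYPES]
    rest = [n for n in nal_units
            if n["type"] not in PARAMETER_SET_TYPES | SEI_TYPES]
    # sorted() is stable even with reverse=True, so NAL units of the same
    # view component keep the order in which they appear in the sample.
    views = sorted(rest, key=lambda n: n["view_order_index"], reverse=True)
    return params + sei + views
```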

10.5.3 Sample entry

10.5.3.1 Boxes for sample entry

10.5.3.1.1视图识别符框10.5.3.1.1 View Identifier Box

10.5.3.1.1.1定义10.5.3.1.1.1 Definition

Box type: 'vwid'

Container: Sample entry ('hev1', 'hvc1', 'hev2', 'hvc2', 'mhc1', 'mhv1') or MultiviewGroupEntry

Mandatory: Yes (for sample entries)

Quantity: Exactly one (for sample entries)

当包含于样本项中时，此框指示包含于播放轨中的视图。此框也指示针对每一列出的视图的视图次序索引。另外，当视图识别符框包含于样本项中时，所述框包含在播放轨中包含的temporal_id的最小值及最大值。此外，所述框指示对于解码播放轨中包含的视图所需要的参考的视图。When included in a sample entry, this box indicates the views included in the track. This box also indicates the view order index for each listed view. Additionally, when the View Identifier box is included in a sample entry, it contains the minimum and maximum temporal_id values for the track. Furthermore, this box indicates the reference views required to decode the views included in the track.

10.5.3.1.1.2语法10.5.3.1.1.2 Syntax

10.5.3.1.1.3语义10.5.3.1.1.3 Semantics

When the View Identifier box is included in a sample entry, min_temporal_id and max_temporal_id take, respectively, the minimum and maximum value of the temporal_id syntax element present in the NAL unit header extension of the NAL units mapped to the track or layer. For AVC streams, this takes the value that is, or would be, in the prefix NAL unit.

num_views, when the View Identifier box is present in a sample entry, indicates the number of views included in the track.

layer_id[i] indicates the value of the nuh_layer_id syntax element in the NAL unit header of a layer included in the track when the View Identifier box is included in a sample entry.

view_id指示具有等于layer_id[i]的nuh_layer_id的第i层的视图识别符，如在ISO/IEC 23008-2的附录F中所指定。view_id indicates the view identifier of the i-th layer with nuh_layer_id equal to layer_id[i], as specified in Annex F of ISO/IEC 23008-2.

base_view_type指示视图是否为基础视图(是否虚拟)。其选取以下值：base_view_type indicates whether the view is a base view (virtual or not). It can take the following values:

0指示视图既非基础视图，也非虚拟基础视图。0 indicates that the view is neither a base view nor a virtual base view.

1 shall be used to label the non-virtual base view of the MV-HEVC bitstream.

2 is a reserved value and shall not be used.

3 indicates that the view with view_id[i] is a virtual base view. The respective independently coded non-base view with view_id[i] resides in another track. When base_view_type is equal to 3, the subsequent num_ref_views shall be equal to 0.

depdent_layer[i][j]指示具有等于j的nuh_layer_id的第j层可为具有等于layer_id[i]的nuh_layer_id的层的直接或是间接参考的层。当视图识别符框包含于样本项中时，推荐其指示同一样本项中的参考的视图。depdent_layer[i][j] indicates that the jth layer with nuh_layer_id equal to j may be a layer directly or indirectly referenced by the layer with nuh_layer_id equal to layer_id[i]. When the view identifier box is included in a sample entry, it is recommended to indicate the referenced view in the same sample entry.

10.5.3.2 Sample entry definition

Sample entry types: 'hvc2', 'hev2', 'mhc1', 'mhv1', 'mhcC'

Mandatory: One of the 'hvc1', 'hev1', 'hvc2', 'hev2', 'mhc1', or 'mhv1' boxes is mandatory

If an MV-HEVC elementary stream contains a usable HEVC-compatible base layer, an HEVC visual sample entry ('hvc1', 'hev1', 'hvc2', 'hev2') shall be used. Here, the entry shall initially contain an HEVC Configuration Box, possibly followed by an MV-HEVC Configuration Box as defined below. The HEVC Configuration Box documents the profile, tier, and possibly also the parameter sets pertaining to the HEVC-compatible base layer, as defined by HEVCDecoderConfigurationRecord. The MV-HEVC Configuration Box documents the profile, tier, and possibly also the parameter set information pertaining to the entire stream containing the non-base views, as defined by the MVHEVCDecoderConfigurationRecord stored in the MVHEVCConfigurationBox.

对于所有样本项‘hvc1’、‘hev1’、‘hvc2’、‘hev2’，样本项中的宽度及高度字段以文件记载HEVC基础层。对于MV-HEVC样本项(‘mhc1’、‘mhv1’)，宽度及高度以文件记载通过解码整个流的任一单一视图所达成的分辨率。For all sample entries 'hvc1', 'hev1', 'hvc2', and 'hev2', the width and height fields in the sample entry document the HEVC base layer. For MV-HEVC sample entries ('mhc1', 'mhv1'), the width and height fields document the resolution achieved by decoding any single view of the entire stream.

If the MV-HEVC elementary stream does not contain a usable HEVC base layer, an MV-HEVC visual sample entry ('mhc1', 'mhv1') shall be used. The MV-HEVC visual sample entry shall contain an MV-HEVC Configuration Box, as defined below. This includes an MVHEVCDecoderConfigurationRecord, as defined in this International Standard.

The lengthSizeMinusOne field in the MV-HEVC and HEVC configurations in any given sample entry shall have the same value.

The requirements for the sample entry types 'hvc1' and 'hev1' as documented in 6.5.3.1.1 also apply here.
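The lengthSizeMinusOne value governs how a reader walks the NAL units of a sample, which is why the two configurations in one sample entry must agree. A minimal sketch of that walk, assuming the usual length-prefixed sample layout of the NAL-unit-structured file formats (a big-endian length field of lengthSizeMinusOne + 1 bytes before each NAL unit); the function name is hypothetical.

```python
def split_nal_units(sample: bytes, length_size_minus_one: int):
    """Split a sample into NAL units using its length prefixes."""
    size = length_size_minus_one + 1
    if size not in (1, 2, 4):
        raise ValueError("length fields are 1, 2, or 4 bytes long")
    units, pos = [], 0
    while pos < len(sample):
        if pos + size > len(sample):
            raise ValueError("truncated length field")
        length = int.from_bytes(sample[pos:pos + size], "big")
        pos += size
        if pos + length > len(sample):
            raise ValueError("truncated NAL unit")
        units.append(sample[pos:pos + length])
        pos += length
    return units
```

If the HEVC and MV-HEVC configurations carried different values, the same sample bytes would split differently depending on which configuration a reader consulted.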

MVHEVCConfigurationBox可存在于‘hvc1’、‘hev1’、‘hvc2’、‘hev2’样本项中。在此些情况下，以下HEVCMVHEVCSampleEntry或HEVC2MVHEVCSampleEntry定义分别适用。MVHEVCConfigurationBox may be present in 'hvc1', 'hev1', 'hvc2', or 'hev2' sample entries. In these cases, the following HEVCMVHEVCSampleEntry or HEVC2MVHEVCSampleEntry definitions apply, respectively.

Compressorname in the base class VisualSampleEntry indicates the name of the compressor used; the recommended value is "\016MV-HEVC Coding" (\016 is octal for 14, the length of the string "MV-HEVC Coding" in bytes).

直接或通过来自提取器的参考解码存在于视频流的样本数据中的NAL单元所需要的参数集应存在于所述视频流的解码器配置中或相关联的参数集流(如果使用)中。The parameter sets required to decode NAL units present in the sample data of a video stream, either directly or by reference from an extractor, shall be present in the decoder configuration of that video stream or in the associated parameter set stream (if used).

下表展示对于视频播放轨的样本项(当MV-HEVC基本流存储于一或多个播放轨中时)、配置及MV-HEVC工具的所有可能使用。The following table shows sample entries for video tracks (when MV-HEVC elementary streams are stored in one or more tracks), configurations, and all possible uses of MV-HEVC tools.

Table 14 – Use of sample entries for HEVC and MV-HEVC tracks

The sample entry mvhevc-type in the following is one of {mhv1, mhc1}.

10.5.3.3语法10.5.3.3 Syntax

10.5.4用于MV-HEVC的子样本的定义10.5.4 Definition of subsamples for MV-HEVC

类似于针对SHVC定义的定义来定义用于MV-HEVC的子样本的定义。The definition of subsamples for MV-HEVC is defined similarly to that defined for SHVC.

10.5.5处置非输出样本10.5.5 Handling Non-Output Samples

类似于针对SHVC定义的处置来处置用于MV-HEVC的非输出样本。Non-output samples for MV-HEVC are handled similarly to the handling defined for SHVC.

The following shows the changes to Annex A.

Annex A (normative)

流中结构In-stream structure

A.1介绍A.1 Introduction

Aggregators and extractors are file format internal structures that enable efficient grouping of NAL units or extraction of NAL units from other tracks.

聚集器及提取器使用NAL单元语法。此些结构被看作样本结构的上下文中的NAL单元。在存取样本时，必须移除聚集器(留下其含有或参考的NAL单元)且提取器必须由其参考的数据替换。聚集器及提取器必须不存在于文件格式外的流中。Aggregators and extractors use NAL unit syntax. These structures are treated as NAL units in the context of a sample structure. When accessing a sample, the aggregator must be removed (leaving the NAL units it contains or references) and the extractor must be replaced by the data it references. Aggregators and extractors must not be present in streams outside the file format.
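The access rule (aggregators removed, extractors replaced) can be sketched with a small in-memory model. The classes below are hypothetical stand-ins and do not reproduce the on-disk syntax of aggregators or extractors; in particular, the data an extractor points to is assumed to have been fetched already, and NAL units aggregated by reference are modelled as ordinary items that follow the aggregator.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Nal:
    payload: bytes            # an ordinary NAL unit


@dataclass
class Aggregator:
    included: List["Item"]    # NAL units aggregated by inclusion


@dataclass
class Extractor:
    referenced: bytes         # data the extractor refers to, already fetched


Item = Union[Nal, Aggregator, Extractor]


def resolve(items):
    """Return the NAL unit payloads a decoder should see for one sample."""
    out = []
    for item in items:
        if isinstance(item, Aggregator):
            out.extend(resolve(item.included))  # drop the wrapper, keep contents
        elif isinstance(item, Extractor):
            out.append(item.referenced)         # substitute the referenced data
        else:
            out.append(item.payload)
    return out
```

After resolution the list holds only payload bytes, consistent with the rule that neither structure may appear in a stream outside the file format.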

此些结构使用由ISO/IEC 14496-10或ISO/IEC 23008-2针对应用/输送层保留的NAL 单元类型。Such structures use NAL unit types reserved by ISO/IEC 14496-10 or ISO/IEC 23008-2 for the application/transport layer.

注意以下来自ISO/IEC 14496-10：Note the following from ISO/IEC 14496-10:

"NOTE - NAL unit types 0 and 24..31 may be used as determined by the application. No decoding process for these values of nal_unit_type is specified in this Recommendation | International Standard."

注意以下来自ISO/IEC 23008-2：Note the following from ISO/IEC 23008-2:

“注意1——可使用在UNSPEC48..UNSPEC63的范围中的NAL单元类型，如由应用程序所确定。在此规格中未指定针对此些nal_unit_type值的解码过程。由于不同应用程序可将此些NAL单元类型用于不同目的，因此必须在产生具有此些nal_unit_type值的NAL单元的编码器的设计中及在解译具有此些 nal_unit_type值的NAL单元的内容的解码器的设计中格外小心。”"NOTE 1—NAL unit types in the range of UNSPEC48..UNSPEC63 may be used, as determined by the application. The decoding process for such nal_unit_type values is not specified in this specification. Because different applications may use these NAL unit types for different purposes, care must be taken in the design of encoders that generate NAL units with such nal_unit_type values and in the design of decoders that interpret the content of NAL units with such nal_unit_type values."

A.2聚集器A.2 Aggregator

A.2.1定义A.2.1 Definition

This subclause describes aggregators, which enable NALU-map-group entries to be consistent and repetitive (see Annex B).

聚集器用以对属于同一样本的NAL单元分组。Aggregators are used to group NAL units belonging to the same sample.

为了ISO/IEC 14496-10视频的存储，以下规则适用：For the storage of ISO/IEC 14496-10 video, the following rules apply:

-聚集器使用与SVC VCL NAL单元或MVC VCL NAL单元相同但具有不同 NAL单元类型值的NAL单元标头。- The aggregator uses the same NAL unit header as the SVC VCL NAL unit or MVC VCL NAL unit but with a different NAL unit type value.

-当聚集器的NAL单元语法(在ISO/IEC 14496-10的7.3.1中所指定)的 svc_extension_flag等于1时，SVC VCL NAL单元的NAL单元标头用于聚集器。否则，将MVC VCLNAL单元的NAL单元标头用于聚集器。- When the svc_extension_flag of the NAL unit syntax of the aggregator (specified in 7.3.1 of ISO/IEC 14496-10) is equal to 1, the NAL unit header of the SVC VCL NAL unit is used for the aggregator. Otherwise, the NAL unit header of the MVC VCL NAL unit is used for the aggregator.

为了ISO/IEC 23008-2视频的存储，聚集器使用如在ISO/IEC 23008-2中所定义的NAL单元标头，其对于普通HEVC、SHVC及MV-HEVC具有相同语法。For storage of ISO/IEC 23008-2 video, the aggregator uses the NAL unit header as defined in ISO/IEC 23008-2, which has the same syntax for normal HEVC, SHVC and MV-HEVC.

聚集器可通过包含来将NAL单元聚集于其内(在由其长度指示的大小内)，且也通过参考聚集其后的NAL单元(在由其内的additional_bytes字段指示的区域内)。当流由AVC或HEVC文件读取器扫描时，仅将包含的NAL单元看作“在聚集器内”。此准许AVC 或HEVC文件读取器跳过整个一组不需要的NAL单元(当其通过包含而经聚集时)。此也准许AVC或HEVC读取器不跳过需要的NAL单元，而让其保持在流中(当其通过参考而经聚集时)。An aggregator can aggregate NAL units within it by inclusion (within the size indicated by its length), and also aggregate subsequent NAL units by reference (within the area indicated by the additional_bytes field within it). When the stream is scanned by an AVC or HEVC file reader, only the included NAL units are considered "within the aggregator". This permits an AVC or HEVC file reader to skip an entire set of unneeded NAL units (when they are aggregated by inclusion). This also permits an AVC or HEVC reader to not skip needed NAL units, but keep them in the stream (when they are aggregated by reference).

聚集器可用以将基础层或基础视图NAL单元分组。如果将此些聚集器用于‘avc1’、‘hvc1’或‘hev1’播放轨中，那么聚集器不应使用基础层或基础视图NAL单元的包含，而使用基础层或基础视图NAL单元的参考(聚集器的长度仅包含其标头，且由聚集器参考的NAL单元由additional_bytes指定)。Aggregators can be used to group base layer or base view NAL units. If such aggregators are used in 'avc1', 'hvc1', or 'hev1' tracks, the aggregator should not use the inclusion of base layer or base view NAL units, but rather references to them (the length of the aggregator includes only its header, and the NAL units referenced by the aggregator are specified by additional_bytes).

当聚集器由具有等于零的data_length的提取器或由映射样本群组参考时，将聚集器作为聚集包含及参考的字节对待。When an aggregator is referenced by an extractor with data_length equal to zero or by a mapped sample group, the aggregator is treated as aggregating the bytes included and referenced.

聚集器可包含或参考提取器。提取器可从聚集器提取。聚集器必须不直接包含或参考另一聚集器；然而，聚集器可包含或参考参考聚集器的提取器。Aggregators may contain or reference extractors. Extractors may extract from an aggregator. An aggregator must not directly contain or reference another aggregator; however, an aggregator may contain or reference an extractor that references an aggregator.

当扫描流时：When scanning a stream:

a)如果聚集器未经辨识(例如，由AVC或HEVC读取器或解码器)，那么其易于与其包含的内容一起被舍弃；a) If the aggregator is not recognized (e.g., by an AVC or HEVC reader or decoder), it is easily discarded together with its included content;

b)如果不需要聚集器(即，其属于不当层)，那么其及其通过包含及参考两者的内容易于被舍弃(使用其长度及additional_bytes字段)；b) If the aggregator is not needed (i.e. it belongs to the wrong layer), then it and its contents, both by inclusion and reference, can be discarded (using its length and additional_bytes fields);

c)如果需要聚集器，那么易于舍弃其标头且保留其内容。c) If an aggregator is needed, it is easy to discard its header and keep its content.
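The three scanning cases can be sketched as a small helper that reports which byte range of a sample a reader would drop. This is only an illustrative sketch: the function name `scan_aggregator`, its parameters, and the return convention are hypothetical, and it assumes that the additional_bytes field has the same width as the NAL unit length fields and that the reader has already parsed the aggregator's length field and additional_bytes.

```python
def scan_aggregator(start, length_size, nal_length, header_size,
                    additional_bytes, recognized, needed):
    """Return (drop_ranges, next_pos) for one aggregator found in a sample.

    start            -- offset of the aggregator's length field in the sample
    length_size      -- width of each length field (lengthSizeMinusOne + 1)
    nal_length       -- value of the aggregator's own length field
    header_size      -- size of NALUnitHeader() (4 for SVC/MVC, 2 for HEVC)
    additional_bytes -- value of the additional_bytes field
    Ranges are half-open (begin, end) byte offsets within the sample.
    """
    body = start + length_size            # first byte of the aggregator NAL unit
    end_included = body + nal_length      # end of content aggregated by inclusion
    end_referenced = end_included + additional_bytes

    if not recognized:
        # Case a: skipped like any unknown NAL unit, with its included content.
        return [(start, end_included)], end_included
    if not needed:
        # Case b: drop included and referenced content alike.
        return [(start, end_referenced)], end_referenced
    # Case c: drop only the length field, header, and additional_bytes field.
    keep_from = body + header_size + length_size
    return [(start, keep_from)], keep_from
```

In case a the referenced NAL units stay in the stream, which is why base layer or base view NAL units are aggregated by reference rather than by inclusion.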

将聚集器存储于如任一其它NAL单元的样本内。Aggregators are stored within samples like any other NAL unit.

所有NAL单元按解码次序保持处于聚集器内。All NAL units are kept within the aggregator in decoding order.

A.2.2语法A.2.2 Syntax

A.2.3语义A.2.3 Semantics

变量AggregatorSize的值等于聚集器NAL单元的大小，且函数sizeof(X)按字节返回字段X的大小。The value of the variable AggregatorSize is equal to the size of the aggregator NAL unit, and the function sizeof(X) returns the size of the field X in bytes.

NALUnitHeader()：SVC及MVC VCL NAL单元的首先四个字节，或ISO/IEC 23008- 2NAL单元的首先两个字节。 NALUnitHeader(): The first four bytes of the SVC and MVC VCL NAL unit, or the first two bytes of the ISO/IEC 23008-2 NAL unit.

nal_unit_type应设定到聚集器NAL单元类型(对于ISO/IEC 14496-10视频为类型 30且对于ISO/IEC 23008-2视频为类型48)。 nal_unit_type shall be set to the aggregator NAL unit type (type 30 for ISO/IEC 14496-10 video and type 48 for ISO/IEC 23008-2 video).

对于包含或参考SVC NAL单元的聚集器，以下应适用。For aggregators that contain or reference SVC NAL units, the following shall apply.

应如在ISO/IEC 14496-10中所指定设定forbidden_zero_bit及 reserved_three_2bits。The forbidden_zero_bit and reserved_three_2bits shall be set as specified in ISO/IEC 14496-10.

应如在A.4中所指定设定其它字段(nal_ref_idc、idr_flag,priority_id、 no_inter_layer_pred_flag、dependency_id、quality_id、temporal_id、 use_ref_base_pic_flag、discardable_flag及output_flag)。The other fields (nal_ref_idc, idr_flag, priority_id, no_inter_layer_pred_flag, dependency_id, quality_id, temporal_id, use_ref_base_pic_flag, discardable_flag, and output_flag) shall be set as specified in A.4.

对于包含或参考MVC NAL单元的聚集器，以下应适用。For aggregators that include or reference MVC NAL units, the following shall apply.

应如在ISO/IEC 14496-10中所指定设定forbidden_zero_bit及reserved_one_bit。The forbidden_zero_bit and reserved_one_bit shall be set as specified in ISO/IEC 14496-10.

应如在A.5中所指定设定其它字段(nal_ref_idc、non_idr_flag、priority_id、 view_id、temporal_id、anchor_pic_flag及inter_view_flag)。The other fields (nal_ref_idc, non_idr_flag, priority_id, view_id, temporal_id, anchor_pic_flag, and inter_view_flag) shall be set as specified in A.5.

对于包含或参考ISO/IEC 23008-2 NAL单元的聚集器，以下应适用。For aggregators that contain or reference ISO/IEC 23008-2 NAL units, the following shall apply.

应如在ISO/IEC 23008-2中所指定设定forbidden_zero_bit。The forbidden_zero_bit shall be set as specified in ISO/IEC 23008-2.

应如在A.6中所指定设定其它字段(nuh_layer_id及 nuh_temporal_id_plus1)。The other fields (nuh_layer_id and nuh_temporal_id_plus1) shall be set as specified in A.6.

additional_bytes：当此聚集器由具有等于零的data_length或映射样本群组的提取器参考时应被考虑为聚集的在此聚集器NAL单元后的字节的数目。additional_bytes: The number of bytes following this aggregator NAL unit that should be considered as aggregated when this aggregator is referenced by an extractor with data_length equal to zero or a mapped sample group.

NALUnitLength：指定NAL单元遵循的大小(以字节计)。此字段的大小通过lengthSizeMinusOne字段指定。NALUnitLength: Specifies the size (in bytes) of the NAL unit to follow. The size of this field is specified by the lengthSizeMinusOne field.

NALUnit：如在ISO/IEC 14496-10或ISO/IEC 23008-2中指定的NAL单元，包含NAL单元标头。NAL单元的大小由NALUnitLength指定。NALUnit: A NAL unit as specified in ISO/IEC 14496-10 or ISO/IEC 23008-2 , including a NAL unit header. The size of the NAL unit is specified by NALUnitLength.
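The semantics above suggest how a reader might split an aggregator into the NAL units it includes. The following is a minimal sketch rather than the normative syntax, which is not reproduced in this text: `parse_aggregator` and its parameters are hypothetical names, the fields are assumed to be big-endian as elsewhere in the ISO base media file format, and additional_bytes is assumed to have the same width as NALUnitLength.

```python
def parse_aggregator(nal: bytes, header_size: int, length_size: int):
    """Split an aggregator NAL unit into (additional_bytes, included NAL units).

    nal         -- the aggregator NAL unit, starting at NALUnitHeader()
    header_size -- 4 for SVC/MVC, 2 for ISO/IEC 23008-2
    length_size -- lengthSizeMinusOne + 1
    """
    pos = header_size
    additional_bytes = int.from_bytes(nal[pos:pos + length_size], "big")
    pos += length_size

    units = []
    while pos < len(nal):                  # len(nal) plays the role of AggregatorSize
        n = int.from_bytes(nal[pos:pos + length_size], "big")  # NALUnitLength
        pos += length_size
        if pos + n > len(nal):
            raise ValueError("NALUnitLength runs past the end of the aggregator")
        units.append(nal[pos:pos + n])     # NALUnit, including its own header
        pos += n
    return additional_bytes, units
```

An aggregator that only references NAL units yields an empty list and a non-zero additional_bytes value.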

A.3提取器A.3 Extractor

A.3.1定义A.3.1 Definition

此子条款描述实现通过参考从其它播放轨提取NAL单元数据的播放轨的紧密形成的提取器。This subclause describes extractors, which enable the compact formation of tracks that extract NAL unit data from other tracks by reference.

聚集器可包含或参考提取器。提取器可参考聚集器。当提取器由需要其的文件读取器处理时，提取器由其参考的字节逻辑替换。那些字节不必含有提取器；提取器不必直接或间接参考另一提取器。Aggregators may contain or reference extractors. Extractors may reference aggregators. When an extractor is processed by a file reader that requires it, the extractor is logically replaced by the bytes it references. Those bytes do not necessarily contain the extractor; an extractor does not necessarily reference another extractor, directly or indirectly.

注意参考的播放轨可含有提取器，即使由提取器参考的数据不必。Note that a referenced track may contain extractors, even though the data referenced by the extractor does not have to.

提取器含有通过类型‘scal’的播放轨参考从另一播放轨提取数据的指令，所述另一播放轨与提取器驻留于其中的播放轨有联系。An extractor contains instructions to extract data from another track via a track reference of type 'scal', the other track being related to the track in which the extractor resides.

复制的字节应为以下中的一者：The bytes copied shall be one of the following:

a)一整个NAL单元；注意，当参考聚集器时，复制包含的及参考的字节a) An entire NAL unit; note that when referencing an aggregator, the contained and referenced bytes are copied

b)一个以上整个NAL单元b) More than one entire NAL unit

在两个情况下，提取的字节开始于有效长度字段及NAL单元标头。In both cases, the extracted bytes begin with the valid length field and the NAL unit header.

仅从经由指示的‘scal’播放轨参考参考的播放轨中的单一识别的样本复制字节。对准是在解码时间，即，仅使用时间到样本表，接着为样本数目的计数的偏移。提取器为媒体级概念且因此在考虑任一编辑列表前适用于目的地播放轨。(然而，通常将预期两个播放轨中的编辑列表将相同)。Bytes are copied only from the single identified sample in the track referenced through the indicated 'scal' track reference. Alignment is on decoding time, i.e., using the time-to-sample table only, followed by a counted offset in sample number. Extractors are a media-level concept and therefore apply to the destination track before any edit list is considered. (However, the edit lists in the two tracks would normally be expected to be identical.)
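The decoding-time alignment described above can be illustrated with a short lookup. This is a sketch under stated assumptions: `referenced_sample_index` is a hypothetical helper, the decoding times of the referenced track are assumed to have been expanded from its time-to-sample table into a sorted list, and the offset argument corresponds to the extractor's sample_offset field.

```python
from bisect import bisect_right


def referenced_sample_index(ref_decode_times, extractor_decode_time, sample_offset):
    """Index, in the referenced track, of the sample an extractor reads from.

    ref_decode_times must be sorted ascending (derived from the
    time-to-sample table of the referenced track only; edit lists are ignored).
    """
    # Sample 0: same decoding time, or the nearest preceding one.
    base = bisect_right(ref_decode_times, extractor_decode_time) - 1
    if base < 0:
        raise ValueError("no sample at or before the extractor's decoding time")
    index = base + sample_offset
    if not 0 <= index < len(ref_decode_times):
        raise ValueError("sample_offset points outside the referenced track")
    return index
```

For example, with decoding times 0, 40, 80, and 120, an extractor in a sample decoded at time 90 and an offset of 0 resolves to the sample decoded at time 80.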

A.3.2语法A.3.2 Syntax

A.3.3语义A.3.3 Semantics

应将nal_unit_type设定到提取器NAL单元类型(对于ISO/IEC 14496-10视频为类型31且对于ISO/IEC 23008-2视频为类型49)。nal_unit_type shall be set to the extractor NAL unit type (type 31 for ISO/IEC 14496-10 video and type 49 for ISO/IEC 23008-2 video).

对于参考SVC NAL单元的提取器，以下应适用。For extractors that reference SVC NAL units, the following shall apply.

对于参考MVC NAL单元的提取器，以下应适用。For extractors that reference MVC NAL units, the following shall apply.

应如在ISO/IEC 14496-10中所指定设定forbidden_zero_bit及 reserved_one_bit。The forbidden_zero_bit and reserved_one_bit shall be set as specified in ISO/IEC 14496-10.

应如在A.5中所指定设定其它字段(nal_ref_idc、non_idr_flag、 priority_id、view_id、temporal_id、anchor_pic_flag及inter_view_flag)。The other fields (nal_ref_idc, non_idr_flag, priority_id, view_id, temporal_id, anchor_pic_flag, and inter_view_flag) shall be set as specified in A.5.

对于参考ISO/IEC 23008-2NAL单元的提取器，以下应适用。For extractors referencing ISO/IEC 23008-2 NAL units, the following shall apply.

应如在A.6中所指定设定其它字段(nuh_layer_id及nuh_temporal_id_plus1)。The other fields (nuh_layer_id and nuh_temporal_id_plus1) shall be set as specified in A.6.

track_ref_index指定类型‘scal’的播放轨参考的索引以用以找到提取数据所来自的播放轨。数据提取自的所述播放轨中的样本在时间上对准或在媒体解码时间线中最紧接于前(即，仅使用时间到样本表)，通过具有含有提取器的样本的 sample_offset所指定的偏移来调整。第一播放轨参考具有索引值1；值0为保留的。track_ref_index specifies the index of the track reference of type 'scal' to use to find the track from which data is extracted. The sample in that track from which data is extracted is the one temporally aligned with, or nearest preceding in the media decoding timeline (i.e., using the time-to-sample table only), the sample containing the extractor, adjusted by the offset specified by sample_offset. The first track reference has index value 1; the value 0 is reserved.

sample_offset给出应用作信息的源的有联系的播放轨中的样本的相对索引。样本0(零)为具有与含有提取器的样本的解码时间相比相同或最紧靠于前的解码时间的样本；样本1(一)为下一个样本，样本-1(负1)为先前样本，等等。sample_offset gives the relative index of the sample in the associated track that should be used as the source of the information. Sample 0 (zero) is the sample with the same or immediately preceding decode time compared to the decode time of the sample containing the extractor; sample 1 (one) is the next sample, sample -1 (minus 1) is the previous sample, and so on.

data_offset：在参考样本内的复制的第一字节的偏移。如果提取开始于所述样本中的数据的第一字节，那么偏移选取值0。偏移应参考NAL单元长度字段的开头。data_offset: The offset of the first byte of the copy within the reference sample. If the extraction starts at the first byte of data in the sample, then the offset takes the value 0. The offset shall be referenced to the start of the NAL unit length field.

data_length：复制的字节的数目。如果此字段选取值0，那么复制整个单一参考的NAL单元(即，复制的长度从由数据偏移参考的长度字段取得，在聚集器的情况下，由additional_bytes字段扩增)。data_length: Number of bytes copied. If this field takes the value 0, then the entire single referenced NAL unit is copied (i.e., the length of the copy is taken from the length field referenced by the data offset, augmented by the additional_bytes field in the case of an aggregator).

注意如果两个播放轨使用不同lengthSizeMinusOne值，那么提取的数据将需要重新格式化以符合目的地播放轨的长度字段大小。Note that if two tracks use different lengthSizeMinusOne values, the extracted data will need to be reformatted to match the length field size of the destination track.
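The data_offset and data_length semantics, together with the length-field reformatting mentioned in the note, can be sketched as follows. This is an illustrative sketch only: `resolve_extractor` and its parameters are hypothetical, length fields are assumed to be big-endian, and the aggregator case (where a data_length of zero is augmented by additional_bytes) is deliberately left out.

```python
def resolve_extractor(ref_sample: bytes, data_offset: int, data_length: int,
                      src_length_size: int, dst_length_size: int) -> bytes:
    """Return the bytes that logically replace an extractor.

    src_length_size / dst_length_size are lengthSizeMinusOne + 1 of the
    referenced track and of the destination track, respectively.
    """
    if data_length == 0:
        # Copy the single NAL unit whose length field sits at data_offset.
        n = int.from_bytes(ref_sample[data_offset:data_offset + src_length_size], "big")
        data_length = src_length_size + n
    chunk = ref_sample[data_offset:data_offset + data_length]
    if len(chunk) != data_length:
        raise ValueError("extractor reads past the end of the referenced sample")

    # Re-emit every length-prefixed NAL unit with the destination length size.
    out = bytearray()
    pos = 0
    while pos < len(chunk):
        n = int.from_bytes(chunk[pos:pos + src_length_size], "big")
        pos += src_length_size
        nal = chunk[pos:pos + n]
        if len(nal) != n:
            raise ValueError("extracted bytes do not hold whole NAL units")
        out += n.to_bytes(dst_length_size, "big") + nal
        pos += n
    return bytes(out)
```

Because the extracted bytes always begin at a length field and cover whole NAL units, the rewrite loop can walk them without inspecting the NAL unit payloads.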

A.4 SVC的NAL单元标头值A.4 SVC NAL unit header values

提取器及聚集器皆使用NAL单元标头SVC扩展。由提取器提取或由聚集器聚集的NAL单元为通过递归式检验聚集器或提取器NAL单元的内容而参考或包含的所有那些 NAL单元。Both extractors and aggregators use the NAL unit header SVC extension. The NAL units extracted by an extractor or aggregated by an aggregator are all those NAL units referenced or included by recursively checking the contents of the aggregator or extractor NAL units.

字段nal_ref_idc、idr_flag、priority_id、temporal_id、dependency_id、quality_id、 discardable_flag、output_flag、use_ref_base_pic_flag及no_inter_layer_pred_flag应选取以下值：The fields nal_ref_idc, idr_flag, priority_id, temporal_id, dependency_id, quality_id, discardable_flag, output_flag, use_ref_base_pic_flag, and no_inter_layer_pred_flag shall take the following values:

nal_ref_idc应设定到在所有提取的或聚集的NAL单元中的字段的最高值。nal_ref_idc shall be set to the highest value of the field in all extracted or aggregated NAL units.

idr_flag应设定到在所有提取的或聚集的NAL单元中的字段的最高值。The idr_flag shall be set to the highest value of the field in all extracted or aggregated NAL units.

priority_id、temporal_id、dependency_id及quality_id应分别设定到在所有提取的或聚集的NAL单元中的字段的最低值。priority_id, temporal_id, dependency_id, and quality_id shall each be set to the lowest value of the field in all extracted or aggregated NAL units.

如果且仅如果所有提取的或聚集的NAL单元具有设定到1的discardable_flag，那么应将discardable_flag设定到1，且否则，将其设定到0。The discardable_flag shall be set to 1 if and only if all extracted or aggregated NAL units have the discardable_flag set to 1, and shall be set to 0 otherwise.

如果聚集的或提取的NAL单元中的至少一者具有设定到1的output_flag，将应将此旗标设定到1，且否则，将其设定到0。The output_flag shall be set to 1 if at least one of the aggregated or extracted NAL units has output_flag set to 1, and shall be set to 0 otherwise.

如果且仅如果提取的或聚集的VCL NAL单元中的至少一者具有设定到1的 use_ref_base_pic_flag，那么应将use_ref_base_pic_flag设定到1，且否则，将其设定到0。The use_ref_base_pic_flag shall be set to 1 if and only if at least one of the extracted or aggregated VCL NAL units has the use_ref_base_pic_flag set to 1, and shall be set to 0 otherwise.

如果且仅如果所有提取的或聚集的VCL NAL单元具有设定到1的 no_inter_layer_pred_flag，那么应将no_inter_layer_pred_flag设定到1，且否则，将其设定到0。The no_inter_layer_pred_flag shall be set to 1 if and only if all extracted or aggregated VCL NAL units have the no_inter_layer_pred_flag set to 1, and shall be set to 0 otherwise.

如果提取的或聚集的NAL的组合为空，那么此些字段中的每一者选取与映射的层描述一致的值。If the combination of extracted or aggregated NALs is empty, then each of these fields takes a value consistent with the mapped layer description.
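The rules for the SVC header fields reduce to maxima, minima, and all/any tests over the aggregated or extracted NAL units. The sketch below restates them in code; `svc_header_values`, the dictionary representation of a NAL unit header, and the `is_vcl` key are hypothetical conventions chosen for illustration, and the empty case is rejected rather than resolved from the mapped layer description.

```python
def svc_header_values(units):
    """Derive aggregator/extractor header fields from the contained NAL units.

    units -- list of dicts holding the SVC header fields of each aggregated or
             extracted NAL unit, plus a boolean "is_vcl" entry (hypothetical).
    """
    if not units:
        raise ValueError("empty set: take values from the mapped layer description")
    vcl = [u for u in units if u["is_vcl"]]

    def lowest(name):
        return min(u[name] for u in units)

    def highest(name):
        return max(u[name] for u in units)

    return {
        "nal_ref_idc": highest("nal_ref_idc"),
        "idr_flag": highest("idr_flag"),
        "priority_id": lowest("priority_id"),
        "temporal_id": lowest("temporal_id"),
        "dependency_id": lowest("dependency_id"),
        "quality_id": lowest("quality_id"),
        "discardable_flag": int(all(u["discardable_flag"] == 1 for u in units)),
        "output_flag": int(any(u["output_flag"] == 1 for u in units)),
        "use_ref_base_pic_flag": int(any(u["use_ref_base_pic_flag"] == 1 for u in vcl)),
        # Note: all() is vacuously true when there are no VCL NAL units.
        "no_inter_layer_pred_flag": int(all(u["no_inter_layer_pred_flag"] == 1 for u in vcl)),
    }
```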

注意聚集器可将具有不同可缩放性信息的NAL单元分组。Note that the aggregator may group NAL units with different scalability information.

注意聚集器可用以将可不由NAL单元标头传信的属于可缩放性等级的 NAL单元(例如，属于相关区的NAL单元)分组。此些聚集器的描述可通过层描述及NAL单元映射群组进行。在此情况下，在一个实例中可出现具有相同可缩放性信息的一个以上聚集器。Note that aggregators can be used to group NAL units belonging to scalability levels that may not be signaled by the NAL unit header (e.g., NAL units belonging to related regions). Such aggregators can be described using layer descriptions and NAL unit map groups. In this case, more than one aggregator with the same scalability information can appear in one example.

注意如果多个可缩放播放轨参考相同媒体数据，那么聚集器应仅将具有相同可缩放性信息的NAL单元分组。此确保所得模式可由播放轨中的每一者存取。Note that if multiple scalable tracks reference the same media data, an aggregator should group only NAL units with identical scalability information. This ensures that the resulting pattern can be accessed by each of the tracks.

注意如果特定层中无NAL单元于存取单元中，那么可存在空聚集器(其中聚集器的长度仅包含标头，且additional_bytes为零)。Note that if there are no NAL units in a particular layer in an access unit, then there may be an empty aggregator (where the length of the aggregator includes only the header, and additional_bytes is zero).

A.5 MVC的NAL单元标头值A.5 NAL unit header values for MVC

聚集器及提取器皆使用NAL单元标头MVC扩展。由提取器提取或由聚集器聚集的NAL单元为通过递归式检验聚集器或提取器NAL单元的内容而参考或包含的所有那些 NAL单元。Both aggregators and extractors use the NAL unit header MVC extension. The NAL units extracted by an extractor or aggregated by an aggregator are all those NAL units referenced or included by recursively examining the contents of aggregator or extractor NAL units.

字段nal_ref_idc、non_idr_flag、priority_id、view_id、temporal_id、anchor_pic_flag 及inter_view_flag应选取以下值：The fields nal_ref_idc, non_idr_flag, priority_id, view_id, temporal_id, anchor_pic_flag, and inter_view_flag shall take the following values:

nal_ref_idc应设定到在所有聚集的或提取的NAL单元中的字段的最高值。nal_ref_idc shall be set to the highest value of the field in all aggregated or extracted NAL units.

non_idr_flag应设定到在所有聚集的或提取的NAL单元中的字段的最低值。The non_idr_flag shall be set to the lowest value of the fields in all aggregated or extracted NAL units.

priority_id及temporal_id应分别设定到在所有聚集的或提取的NAL单元中的字段的最低值。priority_id and temporal_id shall be set to the lowest value of the fields in all aggregated or extracted NAL units, respectively.

view_id应设定到所有聚集的或提取的VCL NAL单元当中的具有最低视图次序索引的VCL NAL单元的view_id值。view_id shall be set to the view_id value of the VCL NAL unit with the lowest view order index among all aggregated or extracted VCL NAL units.

anchor_pic_flag及inter_view_flag应分别设定到在所有聚集的或提取的VCLNAL单元中的字段的最高值。The anchor_pic_flag and inter_view_flag shall be set to the highest value of the fields in all aggregated or extracted VCL NAL units, respectively.

如果提取的或聚集的NAL的组合为空，那么此些字段中的每一者选取与映射的层描述一致的值。If the combination of extracted or aggregated NALs is empty, then each of these fields takes a value consistent with the mapped layer description.
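The MVC rules follow the same pattern, except that view_id is taken from the VCL NAL unit with the lowest view order index. In this sketch, `mvc_header_values`, the dictionary layout, the `is_vcl` key, and the `view_order_index` mapping (view_id to view order index, which a real reader would obtain from the stream's parameter sets) are all hypothetical.

```python
def mvc_header_values(units, view_order_index):
    """Derive MVC aggregator/extractor header fields.

    units            -- list of dicts with the MVC header fields and "is_vcl"
    view_order_index -- dict mapping view_id to its view order index
    """
    vcl = [u for u in units if u["is_vcl"]]
    if not units or not vcl:
        raise ValueError("empty set: take values from the mapped layer description")

    # VCL NAL unit whose view comes first in view order.
    first_view = min(vcl, key=lambda u: view_order_index[u["view_id"]])

    return {
        "nal_ref_idc": max(u["nal_ref_idc"] for u in units),
        "non_idr_flag": min(u["non_idr_flag"] for u in units),
        "priority_id": min(u["priority_id"] for u in units),
        "temporal_id": min(u["temporal_id"] for u in units),
        "view_id": first_view["view_id"],
        "anchor_pic_flag": max(u["anchor_pic_flag"] for u in vcl),
        "inter_view_flag": max(u["inter_view_flag"] for u in vcl),
    }
```

Note that the lowest view order index does not necessarily belong to the numerically lowest view_id, which is why the mapping is passed in explicitly.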

A.6用于ISO/IEC 23008-2的NAL单元标头值A.6 NAL unit header values for ISO/IEC 23008-2

聚集器及提取器皆使用如在ISO/IEC 23008-2中指定的NAL单元标头。由提取器提取或由聚集器聚集的NAL单元为通过递归式检验聚集器或提取器NAL单元的内容而参考或包含的所有那些NAL单元。Both aggregators and extractors use the NAL unit header as specified in ISO/IEC 23008-2. The NAL units extracted by an extractor or aggregated by an aggregator are all those NAL units referenced or included by recursively examining the contents of aggregator or extractor NAL units.

字段nuh_layer_id及nuh_temporal_id_plus1应如下设定：The fields nuh_layer_id and nuh_temporal_id_plus1 shall be set as follows:

nuh_layer_id应设定到所有聚集的或提取的NAL单元中的字段的最低值。nuh_layer_id shall be set to the lowest value of the fields in all aggregated or extracted NAL units.

nuh_temporal_id_plus1应设定到所有聚集的或提取的NAL单元中的字段的最低值。nuh_temporal_id_plus1 shall be set to the lowest value of the field in all aggregated or extracted NAL units.
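For ISO/IEC 23008-2 video, the two derived fields can be packed directly into the two-byte NAL unit header. The sketch below assumes the standard HEVC header layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1); the function name `hevc_header_for` and the tuple representation of a NAL unit are hypothetical.

```python
AGGREGATOR_TYPE_HEVC = 48   # aggregator NAL unit type for ISO/IEC 23008-2 video
EXTRACTOR_TYPE_HEVC = 49    # extractor NAL unit type for ISO/IEC 23008-2 video


def hevc_header_for(units, nal_unit_type=AGGREGATOR_TYPE_HEVC):
    """Build the 2-byte NAL unit header of an aggregator or extractor.

    units -- list of (nuh_layer_id, nuh_temporal_id_plus1) pairs, one per
             aggregated or extracted NAL unit.
    """
    if not units:
        raise ValueError("no aggregated or extracted NAL units")
    layer_id = min(layer for layer, _ in units)
    tid_plus1 = min(tid for _, tid in units)
    if not (0 <= layer_id < 64 and 1 <= tid_plus1 < 8 and 0 <= nal_unit_type < 64):
        raise ValueError("field value out of range")

    # forbidden_zero_bit (0) | nal_unit_type | nuh_layer_id | nuh_temporal_id_plus1
    value = (nal_unit_type << 9) | (layer_id << 3) | tid_plus1
    return value.to_bytes(2, "big")
```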

在一个替代性实例中，定义新结构、表或样本群组以文件记载所有IRAP存取单元，如在MV-HEVC WD5或SHVC WD3的附录A中所定义。替代地，定义所述新结构、表或样本群组以文件记载所有IRAP存取单元(如在MV-HEVC WD5或SHVC WD3的附录 F中所定义)，但不包含所有经译码图片为IRAP图片的那些存取单元。在另一替代性实例中，重新定义同步样本样本群组项SyncSampleEntry包含在保留的位中的指定属于此群组的样本中的所有图片为IDR图片、CRA图片或BLA图片的一者中的 aligned_sync_flag。在另一替代性实例中，定义用于SHVC及MV-HEVC的共同文件格式包含来自SHVC及MV-HEVC文件格式的所有共同方面，且仅将SHVC及MV-HEVC 文件格式重新定义为仅包含仅与所述扩展有关的方面。在另一替代性实例中，定义SHVC 元数据样本项SHVCMetadataSampleEntry及SHVCMetadataSampleConfigBox，且也定义元数据样本语句类型scalabilityInfoSHVCStatement。In one alternative example, a new structure, table, or sample group is defined to document all IRAP access units, as defined in Annex A of MV-HEVC WD5 or SHVC WD3. Alternatively, the new structure, table, or sample group is defined to document all IRAP access units (as defined in Annex F of MV-HEVC WD5 or SHVC WD3), but excluding those access units where all coded pictures are IRAP pictures. In another alternative example, the synchronization sample sample group entry SyncSampleEntry is redefined to include an aligned_sync_flag in a reserved bit specifying that all pictures in samples belonging to this group are one of IDR pictures, CRA pictures, or BLA pictures. In another alternative example, a common file format for SHVC and MV-HEVC is defined to include all common aspects from the SHVC and MV-HEVC file formats, and the SHVC and MV-HEVC file formats are redefined to include only aspects relevant only to the extension. In another alternative example, SHVC metadata sample items SHVCMetadataSampleEntry and SHVCMetadataSampleConfigBox are defined, and a metadata sample statement type scalabilityInfoSHVCStatement is also defined.

图2为说明可实施本发明中所描述的技术的实例视频编码器20的框图。视频编码器20可经配置以输出单一视图、多视图、可缩放、3D及其它类型的视频数据。视频编码器20可经配置以将视频输出到后处理实体27。后处理实体27意欲表示可处理来自视频编码器20的经编码视频数据的视频实体(例如，MANE或拼接/编辑装置)的实例。在一些情况下，后处理实体可为网络实体的实例。在一些视频编码系统，后处理实体27 及视频编码器20可为分开的装置的部分，而在其它情况下，关于后处理实体27描述的功能性可由包括视频编码器20的同一装置执行。后处理实体27可为视频装置。在一些实例中，后处理实体27可与图1的文件产生装置34相同。FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may be configured to output single-view, multi-view, scalable, 3D, and other types of video data. Video encoder 20 may be configured to output video to a post-processing entity 27. Post-processing entity 27 is intended to represent an example of a video entity (e.g., a MANE or splicing/editing device) that may process the encoded video data from video encoder 20. In some cases, the post-processing entity may be an example of a network entity. In some video coding systems, post-processing entity 27 and video encoder 20 may be parts of separate devices, while in other cases, the functionality described with respect to post-processing entity 27 may be performed by the same device that includes video encoder 20. Post-processing entity 27 may be a video device. In some examples, post-processing entity 27 may be the same as file generation device 34 of FIG. 1.

视频编码器20可执行视频切片内的视频块的帧内译码及帧间译码。帧内译码依赖于空间预测以减小或移除给定视频帧或图片内的视频中的空间冗余。帧间译码依赖于时间预测以减小或移除视频序列的邻近帧或图片内的视频的时间冗余。帧内模式(I模式) 可指若干基于空间的压缩模式中的任一者。帧间模式(例如，单向预测(P模式)或双向预测(B模式))可指若干基于时间的压缩模式中的任一者。Video encoder 20 may perform intra- and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of several spatial-based compression modes. Inter-mode (e.g., unidirectional prediction (P-mode) or bidirectional prediction (B-mode)) may refer to any of several temporal-based compression modes.

在图2的实例中，视频编码器20包含分割单元35、预测处理单元41、滤波器单元63、参考图片存储器64、求和器50、变换处理单元52、量化单元54及熵编码单元56。预测处理单元41包含运动估计单元42、运动补偿单元44及帧内预测处理单元46。为了视频块重建构，视频编码器20也包含反量化单元58、反变换处理单元60及求和器 62。滤波器单元63意欲表示一或多个环路滤波器，例如，解块滤波器、自适应环路滤波器(ALF)及样本自适应偏移(SAO)滤波器。尽管滤波器单元63在图2中展示为环路滤波器，但在其它配置中，滤波器单元63可实施为后环路滤波器。In the example of FIG. 2, video encoder 20 includes partitioning unit 35, prediction processing unit 41, filter unit 63, reference picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction processing unit 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. Filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in FIG. 2 as a loop filter, in other configurations, filter unit 63 may be implemented as a post-loop filter.

视频编码器20的视频数据存储器可存储待由视频编码器20的组件编码的视频数据。存储于视频数据存储器中的视频数据可(例如)从视频源18获得。参考图片存储器 64可为存储参考视频数据用于由视频编码器20在编码视频数据过程中使用(例如，在帧内或帧间译码模式中)的参考图片存储器。视频数据存储器及参考图片存储器64可由多种存储器装置中的任一者形成，例如，动态随机存取存储器(DRAM)(包含同步 DRAM(SDRAM))、磁阻式RAM(MRAM)、电阻式RAM(RRAM)或其它类型的存储器装置。视频数据存储器及参考图片存储器64可由相同的存储器装置或单独存储器装置来提供。在各种实例中，视频数据存储器可与视频编码器20的其它组件一起在芯片上，或相对于那些组件在芯片外。The video data memory of video encoder 20 may store video data to be encoded by the components of video encoder 20. The video data stored in the video data memory may be obtained, for example, from video source 18. Reference picture memory 64 may be a reference picture memory that stores reference video data for use by video encoder 20 in encoding video data (e.g., in intra- or inter-coding modes). Video data memory and reference picture memory 64 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory and reference picture memory 64 may be provided by the same memory device or separate memory devices. In various examples, the video data memory may be on-chip with other components of video encoder 20 or off-chip relative to those components.

如图2中所展示，视频编码器20接收视频数据，且分割单元35将数据分割成视频块。此分割也可包含分割成切片、平铺块或其它较大单元以及视频块分割，例如，根据 LCU及CU的四分树结构。视频编码器20大体上说明编码待编码视频切片内的视频块的组件。可将切片划分成多个视频块(且可能划分成被称作平铺块的视频块集合)。预测处理单元41可基于误差产生(例如，译码速率及失真的等级)选择用于当前视频块的多个可能译码模式中的一者，例如，多个帧内译码模式中的一者或多个帧间译码模式中的一者。预测处理单元41可将所得经帧内或帧间译码块提供到求和器50以产生残余块数据并提供到求和器62以重建构经编码块以用于用作参考图片。As shown in FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of multiple possible coding modes for the current video block, e.g., one of multiple intra-coding modes or one of multiple inter-coding modes, based on error results (e.g., coding rate and level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.

预测处理单元41内的帧内预测处理单元46可执行当前视频块相对于与待译码的当前块相同的帧或切片中的一或多个相邻块的帧内预测性译码以提供空间压缩。预测处理单元41内的运动估计单元42及运动补偿单元44执行当前视频块相对于一或多个参考图片中的一或多个预测性块的帧间预测性译码，以提供时间压缩。Intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 may perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

运动估计单元42可经配置以根据视频序列的预定图案来确定用于视频切片的帧间预测模式。预定图案可将序列中的视频切片指明为P切片、B切片或GPB切片。运动估计单元42及运动补偿单元44可高度集成，但为概念目的而分开来说明。由运动估计单元42执行的运动估计为产生运动向量的过程，运动向量估计视频块的运动。举例来说，运动向量可指示当前视频帧或图片内的视频块的PU相对于参考图片内的预测性块的位移。Motion estimation unit 42 may be configured to determine an inter-prediction mode for a video slice based on a predetermined pattern for a video sequence. The predetermined pattern may designate a video slice in the sequence as a P slice, a B slice, or a GPB slice. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated but are illustrated separately for conceptual purposes. Motion estimation performed by motion estimation unit 42 is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.

预测性块为就像素差而言被发现紧密地匹配待译码的视频块的PU的块，所述像素差可由绝对差和(SAD)、平方差和(SSD)或其它差度量确定。在一些实例中，视频编码器 20可计算存储于参考图片存储器64中的参考图片的次整数像素位置的值。举例来说，视频编码器20可内插参考图片的四分之一像素位置、八分之一像素位置或其它分数像素位置的值。因此，运动估计单元42可执行相对于全像素位置及分数像素位置的运动搜索且输出具有分数像素精确度的运动向量。A predictive block is a block that is found to closely match a PU of the video block to be coded, in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of squared difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of a reference picture stored in reference picture memory 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference picture. Thus, motion estimation unit 42 may perform motion searches relative to full-pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
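The two difference metrics named above are simple to state in code. The following sketch represents blocks as equally sized lists of rows of integer sample values; the function names `sad`, `ssd`, and `best_match` are illustrative choices and are not taken from the disclosure, and fractional-pixel interpolation is not modeled.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(current, candidates, metric=sad):
    """Index of the candidate block that most closely matches `current`."""
    return min(range(len(candidates)),
               key=lambda i: metric(current, candidates[i]))
```

A motion search would evaluate `best_match` over candidate blocks taken at different displacements in the reference picture; the displacement of the winning candidate is the motion vector.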

运动估计单元42通过比较PU的位置与参考图片的预测性块的位置而计算经帧间译码切片中的视频块的PU的运动向量。参考图片可从第一参考图片列表(列表0)或第二参考图片列表(列表1)选择，所述列表中的每一者识别存储于参考图片存储器64中的一或多个参考图片。运动估计单元42将经计算运动向量发送到熵编码单元56及运动补偿单元44。Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

由运动补偿单元44所执行的运动补偿可涉及基于由运动估计所确定的运动向量而提取或产生预测性块，可能执行子像素精确度的内插。在接收到当前视频块的PU的运动向量之后，运动补偿单元44可在参考图片列表中的一者中定位运动向量所指向的预测性块。视频编码器20可通过从正被译码的当前视频块的像素值减去预测性块的像素值来形成残余视频块，从而形成像素差值。像素差值形成用于块的残余数据，且可包含明度及色度差分量两者。求和器50表示执行此减法运算的所述组件。运动补偿单元44 也可产生与视频块及视频切片相关联的语法元素以供视频解码器30在解码视频切片的视频块时使用。Motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vector determined by motion estimation, possibly performing interpolation with sub-pixel precision. After receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block pointed to by the motion vector in one of the reference picture lists. Video encoder 20 may form a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block and may include both luma and chroma difference components. Summer 50 represents the component that performs this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video block and video slice for use by video decoder 30 when decoding the video block of the video slice.

如上文所描述，作为由运动估计单元42及运动补偿单元44所执行的帧间预测的替代，帧内预测处理单元46可对当前块进行帧内预测。详言地说，帧内预测处理单元46 可确定帧内预测模式以用以编码当前块。在一些实例中，帧内预测处理单元46可(例如) 在分开的编码遍次期间使用各种帧内预测模式来编码当前块，且帧内预测单元46(或在一些实例中，模式选择单元40)可从所测试的模式选择使用的适当帧内预测模式。举例来说，帧内预测处理单元46可使用用于各种所测试帧内预测模式的速率-失真分析来计算速率-失真值，并在所测试模式当中选择具有最佳速率-失真特性的帧内预测模式。速率-失真分析大体上确定经编码块与原始未经编码块(其经编码以产生经编码块)之间的失真(或误差)量，以及用以产生经编码块的位速率(即，位的数目)。帧内预测处理单元 46可从各种经编码块的失真及速率计算比率以确定哪种帧内预测模式展现所述块的最佳速率-失真值。As described above, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction processing unit 46 may intra-predict the current block. Specifically, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, for example, during separate encoding passes, and intra-prediction unit 46 (or, in some examples, mode selection unit 40) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block (which was encoded to generate the encoded block), as well as the bit rate (i.e., the number of bits) used to generate the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
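One common way to compare tested modes is a Lagrangian cost that combines distortion and rate. The text above does not specify the exact formula, so the cost J = D + lambda * R used below is an assumption, as are the function name `select_mode`, the tuple layout, and the example mode labels.

```python
def select_mode(candidates, lam):
    """Pick the mode with the lowest rate-distortion cost.

    candidates -- iterable of (mode, distortion, bits) tuples, one per tested mode
    lam        -- Lagrange multiplier trading distortion against rate (assumed)
    """
    def cost(candidate):
        _, distortion, bits = candidate
        return distortion + lam * bits

    return min(candidates, key=cost)[0]
```

With candidates ("DC", 100, 10), ("planar", 80, 30), and ("angular", 60, 60), a small multiplier such as 0.5 favors the low-distortion mode, while a larger one such as 2 favors the mode that costs the fewest bits.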

In any case, after selecting an intra-prediction mode for a block, intra-prediction processing unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream, which may include: a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables); definitions of encoding contexts for various blocks; and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

After prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 20 may form a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
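The effect of the quantization parameter can be illustrated with a short sketch. It assumes uniform scalar quantization with a step size that doubles for every increase of 6 in the quantization parameter, which approximates the behavior of H.264/AVC and HEVC; the exact integer arithmetic of those standards is not reproduced, and the function names are hypothetical.

```python
def quantization_step(qp):
    """Approximate step size (assumed mapping): doubles for every +6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coefficients, qp):
    """Map transform coefficients to integer levels; larger QP gives coarser levels."""
    step = quantization_step(qp)
    return [int(round(c / step)) for c in coefficients]


def dequantize(levels, qp):
    """Inverse operation, as a decoder would apply it; the rounding error is not recovered."""
    step = quantization_step(qp)
    return [level * step for level in levels]


coefficients = [104, -37, 13, 3]
levels = quantize(coefficients, qp=22)      # step size is 8 at QP 22
reconstructed = dequantize(levels, qp=22)
```

Small coefficients collapse to zero and the remaining levels need fewer bits, which is how raising the quantization parameter lowers the bit rate at the cost of reconstruction error.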

Following quantization, entropy encoding unit 56 may entropy encode syntax elements representing the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding methodology or technique. Following the entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and the other syntax elements for the video slice currently being coded.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in reference picture memory 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.

Video encoder 20 represents an example of a video coder configured to generate video data that may be stored using the file format techniques described in this disclosure.

FIG. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. Video decoder 30 may be configured to decode single-view, multi-view, scalable, 3D, and other types of video data. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transform processing unit 88, a summer 90, a filter unit 91, and a reference picture memory 92. Prediction processing unit 81 includes a motion compensation unit 82 and an intra-prediction processing unit 84. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 of FIG. 2.

Coded picture buffer (CPB) 79 may receive and store encoded video data (e.g., NAL units) of a bitstream. The video data stored in CPB 79 may be obtained, for example, from link 16, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. CPB 79 may form a video data memory that stores encoded video data from an encoded video bitstream. CPB 79 may be a reference picture memory that stores reference video data for use by video decoder 30 in decoding video data (e.g., in intra- or inter-coding modes). CPB 79 and reference picture memory 92 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. CPB 79 and reference picture memory 92 may be provided by the same memory device or by separate memory devices. In various examples, CPB 79 may be on-chip with other components of video decoder 30, or off-chip relative to those components.

During the decoding process, video decoder 30 receives, from video encoder 20, an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements. Video decoder 30 may receive the encoded video bitstream from network entity 29. Network entity 29 may, for example, be a server, a MANE, a video editor/splicer, or other such device configured to implement one or more of the techniques described above. Network entity 29 may or may not include a video encoder, such as video encoder 20. Some of the techniques described in this disclosure may be implemented by network entity 29 before network entity 29 transmits the encoded video bitstream to video decoder 30. In some video decoding systems, network entity 29 and video decoder 30 may be parts of separate devices, while in other instances, the functionality described with respect to network entity 29 may be performed by the same device that comprises video decoder 30. Network entity 29 may be considered to be a video device. Furthermore, in some examples, network entity 29 is file generation device 34 of FIG. 1.

Entropy decoding unit 80 of video decoder 30 entropy decodes particular syntax elements of the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in reference picture memory 92.

Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use the interpolation filters used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and may use those interpolation filters to produce the predictive blocks.

Inverse quantization unit 86 inverse quantizes (i.e., de-quantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to produce residual blocks in the pixel domain.

After motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform processing unit 88 with the corresponding predictive blocks generated by motion compensation unit 82. Summer 90 represents the component that performs this summation operation. If desired, loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve the video quality. Filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 91 is shown in FIG. 3 as being an in-loop filter, in other configurations, filter unit 91 may be implemented as a post-loop filter. The decoded video blocks in a given frame or picture are then stored in reference picture memory 92, which stores reference pictures used for subsequent motion compensation. Reference picture memory 92 also stores decoded video for later presentation on a display device (e.g., display device 32 of FIG. 1).
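The summation performed by summer 90 can be sketched as below. The sketch is illustrative only: blocks are modeled as lists of rows of integer sample values, clipping to the valid sample range is an assumption that reflects common decoder practice rather than anything stated above, and `reconstruct_block` is a hypothetical name.

```python
def reconstruct_block(predictive, residual, bit_depth=8):
    """Add a residual block to a predictive block, clipping to the sample range."""
    if len(predictive) != len(residual):
        raise ValueError("blocks must have the same number of rows")
    max_value = (1 << bit_depth) - 1
    decoded = []
    for pred_row, res_row in zip(predictive, residual):
        if len(pred_row) != len(res_row):
            raise ValueError("rows must have the same number of samples")
        # Sample-wise sum, clipped to [0, 2**bit_depth - 1] (assumed behavior).
        decoded.append([min(max(p + r, 0), max_value)
                        for p, r in zip(pred_row, res_row)])
    return decoded


predictive_block = [[250, 10], [128, 0]]
residual_block = [[10, -20], [5, 3]]
decoded_block = reconstruct_block(predictive_block, residual_block)
```

Any loop filtering by filter unit 91 would operate on the output of this step before the block is stored as part of a reference picture.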

Video decoder 30 of FIG. 3 represents an example of a video decoder configured to decode video data that may be stored using the file format techniques described in this disclosure.

FIG. 4 is a block diagram illustrating an example set of devices that form part of network 100. In this example, network 100 includes routing devices 104A, 104B (routing devices 104) and transcoding device 106. Routing devices 104 and transcoding device 106 are intended to represent a small number of devices that may form part of network 100. Other network devices, such as switches, hubs, gateways, firewalls, bridges, and other such devices, may also be included within network 100. Moreover, additional network devices may be provided along a network path between server device 102 and client device 108. In some examples, server device 102 may correspond to source device 12 (FIG. 1), while client device 108 may correspond to destination device 14 (FIG. 1).

In general, routing devices 104 implement one or more routing protocols to exchange network data through network 100. In some examples, routing devices 104 may be configured to perform proxy or cache operations. Therefore, in some examples, routing devices 104 may be referred to as proxy devices. In general, routing devices 104 execute routing protocols to discover routes through network 100. By executing such routing protocols, routing device 104B may discover a network route from itself to server device 102 via routing device 104A.

The techniques of this disclosure may be implemented by network devices such as routing devices 104 and transcoding device 106, but may also be implemented by client device 108. In this manner, routing devices 104, transcoding device 106, and client device 108 represent examples of devices configured to perform the techniques of this disclosure. Moreover, the devices of FIG. 1, encoder 20 illustrated in FIG. 2, and decoder 30 illustrated in FIG. 3 are also examples of devices that may be configured to perform one or more of the techniques of this disclosure.

FIG. 5 is a conceptual diagram illustrating an example structure of a file 300, in accordance with one or more techniques of this disclosure. In the example of FIG. 5, file 300 includes a movie box 302 and a plurality of media data boxes 304. Although illustrated in the example of FIG. 5 as being in the same file, in other examples movie box 302 and media data boxes 304 may be in separate files. As indicated above, a box may be an object-oriented building block defined by a unique type identifier and length. For instance, a box may be the elementary syntax structure in ISOBMFF, including a four-character coded box type, a byte count of the box, and a payload.
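The box structure just described (byte count, four-character type, payload) can be walked with a few lines of code. The sketch below follows the basic ISOBMFF box header layout: a 32-bit big-endian size followed by a four-character type, with size 1 signaling a 64-bit size and size 0 meaning the box runs to the end of its container. The helper names `iter_boxes` and `make_box` are hypothetical, and extended `uuid` types and full-box version/flags fields are deliberately left out.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, payload) for each box found in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the four-character type.
            if offset + 16 > end:
                raise ValueError("truncated largesize at offset %d" % offset)
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (helper for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# A toy file: a movie box containing one empty track box, then a media data box.
file_bytes = make_box("moov", make_box("trak")) + make_box("mdat", b"\x00\x01")
top_level = list(iter_boxes(file_bytes))
nested = list(iter_boxes(top_level[0][1]))
```

Because container boxes simply hold other boxes in their payload, the same function can be reapplied to a payload to descend from the movie box to the track box and further.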

Movie box 302 may contain metadata for tracks of file 300. Each track of file 300 may comprise a continuous stream of media data. Each of media data boxes 304 may include one or more samples 305. Each of samples 305 may comprise an audio or video access unit. As described elsewhere in this disclosure, in multi-view coding (e.g., MV-HEVC and 3D-HEVC) and scalable video coding (e.g., SHVC), each access unit may comprise multiple coded pictures. For instance, an access unit may include one or more coded pictures for each layer.

Furthermore, in the example of FIG. 5, movie box 302 includes a track box 306. Track box 306 may enclose metadata for a track of file 300. In other examples, movie box 302 may include multiple track boxes for different tracks of file 300. Track box 306 includes a media box 307. Media box 307 may contain all objects that declare information about the media data within the track. Media box 307 includes a media information box 308. Media information box 308 may contain all objects that declare characteristic information of the media of the track. Media information box 308 includes a sample table box 309. Sample table box 309 may specify sample-specific metadata.

In the example of FIG. 5, sample table box 309 includes a SampleToGroup box 310 and a SampleGroupDescription box 312. In other examples, sample table box 309 may include other boxes in addition to SampleToGroup box 310 and SampleGroupDescription box 312, and/or may include multiple SampleToGroup boxes and SampleGroupDescription boxes. SampleToGroup box 310 may map samples (e.g., particular ones of samples 305) to a group of samples. SampleGroupDescription box 312 may specify a property shared by the samples in the group of samples (i.e., the sample group). Furthermore, sample table box 309 may include a plurality of sample entry boxes 311. Each of sample entry boxes 311 may correspond to a sample in the group of samples. In some examples, sample entry boxes 311 are instances of a random accessible sample entry class that extends a base sample group description class (as defined in Section 9.5.5.1.2 above).

In accordance with one or more techniques of this disclosure, SampleGroupDescription box 312 may specify that each sample of a sample group contains at least one IRAP picture. In this way, file generation device 34 may generate a file that comprises a track box 306 containing metadata for a track in file 300. The media data for the track comprises a sequence of samples 305. Each of the samples may be a video access unit of multi-layer video data (e.g., SHVC, MV-HEVC, or 3D-HEVC video data). Furthermore, as part of generating file 300, file generation device 34 may generate, in file 300, an additional box (i.e., sample table box 309) that documents all of samples 305 containing at least one IRAP picture. In other words, the additional box identifies all of samples 305 containing at least one IRAP picture. In the example of FIG. 5, the additional box defines a sample group that documents (e.g., identifies) all of samples 305 containing at least one IRAP picture. In other words, the additional box specifies that the samples 305 containing at least one IRAP picture belong to the sample group.

Furthermore, in accordance with one or more techniques of this disclosure, each of sample entry boxes 311 may include a value (e.g., all_pics_are_IRAP) indicating whether all coded pictures in the corresponding sample are IRAP pictures. In some examples, the value being equal to 1 specifies that not all coded pictures in the sample are IRAP pictures. The value being equal to 0 specifies that it is not required that each coded picture in each sample of the sample group be an IRAP picture.

In some examples, when not all coded pictures in a particular sample are IRAP pictures, file generation device 34 may include, in the one of sample entry boxes 311 for the particular sample, a value (e.g., num_IRAP_pics) indicating the number of IRAP pictures in the particular sample. Additionally, file generation device 34 may include, in the sample entry for the particular sample, values indicating the layer identifiers of the IRAP pictures in the particular sample. File generation device 34 may also include, in the sample entry for the particular sample, a value indicating the NAL unit type of the VCL NAL units in the IRAP pictures of the particular sample.
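The per-sample information described above could be gathered along the following lines. This is a sketch under stated assumptions: a sample is modeled as a list of coded pictures, each reduced to its layer identifier and the NAL unit type of its VCL NAL units, and IRAP pictures are recognized by the HEVC NAL unit type range 16 through 23. The class `CodedPicture`, the function `describe_irap_content`, and the dictionary keys other than `num_IRAP_pics` are hypothetical and do not reproduce the actual sample entry syntax.

```python
from dataclasses import dataclass

# HEVC reserves NAL unit types 16..23 for IRAP pictures (BLA, IDR, CRA, reserved).
IRAP_NAL_UNIT_TYPES = range(16, 24)


@dataclass
class CodedPicture:
    layer_id: int       # nuh_layer_id of the picture's NAL units
    nal_unit_type: int  # nal_unit_type of the picture's VCL NAL units


def describe_irap_content(pictures):
    """Summarize which coded pictures of one sample (access unit) are IRAP pictures."""
    if not pictures:
        raise ValueError("a sample must contain at least one coded picture")
    irap = [p for p in pictures if p.nal_unit_type in IRAP_NAL_UNIT_TYPES]
    info = {"all_irap": len(irap) == len(pictures)}
    if irap and not info["all_irap"]:
        # Only a mixed sample needs the per-picture details.
        info["num_IRAP_pics"] = len(irap)
        info["irap_pictures"] = [(p.layer_id, p.nal_unit_type) for p in irap]
    return info


# Base layer carries an IDR picture (type 19); the enhancement layer a trailing picture (type 1).
mixed_sample = [CodedPicture(layer_id=0, nal_unit_type=19),
                CodedPicture(layer_id=1, nal_unit_type=1)]
mixed_info = describe_irap_content(mixed_sample)
```

For the mixed sample the summary lists one IRAP picture in layer 0, which is the kind of detail a player needs in order to decide at which layer random access is possible.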

Furthermore, in the example of FIG. 5, sample table box 309 includes a sub-sample information box 314. Although the example of FIG. 5 shows only one sub-sample information box, sample table box 309 may include multiple sub-sample information boxes. In general, a sub-sample information box is designed to contain sub-sample information. A sub-sample is a contiguous range of bytes of a sample. ISO/IEC 14496-12 indicates that the specific definition of a sub-sample shall be supplied for a given coding system, such as H.264/AVC or HEVC.

Section 8.4.8 of ISO/IEC 14496-15 specifies a definition of a sub-sample for HEVC. In particular, section 8.4.8 of ISO/IEC 14496-15 specifies that, for the use of the sub-sample information box (8.7.7 of ISO/IEC 14496-12) in an HEVC stream, a sub-sample is defined on the basis of the value of the flags field of the sub-sample information box. In accordance with one or more techniques of this disclosure, if the flags field in sub-sample information box 314 is equal to 5, a sub-sample corresponding to sub-sample information box 314 contains one coded picture and the associated non-VCL NAL units. The associated non-VCL NAL units may include NAL units containing SEI messages applicable to the coded picture and NAL units containing parameter sets (e.g., VPSs, SPSs, PPSs, etc.) applicable to the coded picture.

Thus, in one example, file generation device 34 may generate a file (e.g., file 300) that comprises a track box (e.g., track box 306) containing metadata for a track in the file. In this example, the media data for the track comprises a sequence of samples, each of which is a video access unit of multi-layer video data (e.g., SHVC, MV-HEVC, or 3D-HEVC video data). Furthermore, in this example, as part of generating the file, file generation device 34 may generate, in the file, a sub-sample information box (e.g., sub-sample information box 314) that contains flags specifying the type of sub-sample information given in the sub-sample information box. When the flags have a particular value, a sub-sample corresponding to the sub-sample information box contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture.

Furthermore, in accordance with one or more techniques of this disclosure, if the flags field of sub-sample information box 314 is equal to 0, sub-sample information box 314 further includes a DiscardableFlag value, a NoInterLayerPredFlag value, a LayerId value, and a TempId value. If the flags field of sub-sample information box 314 is equal to 5, sub-sample information box 314 may include a DiscardableFlag value, a VclNalUnitType value, a LayerId value, a TempId value, a NoInterLayerPredFlag value, a SubLayerRefNalUnitFlag value, and a reserved value.

SubLayerRefNalUnitFlag equal to 0 indicates that all NAL units in the sub-sample are VCL NAL units of a sub-layer non-reference picture, as specified in ISO/IEC 23008-2 (i.e., HEVC). SubLayerRefNalUnitFlag equal to 1 indicates that all NAL units in the sub-sample are VCL NAL units of a sub-layer reference picture, as specified in ISO/IEC 23008-2 (i.e., HEVC). Thus, when file generation device 34 generates sub-sample information box 314 and the flags have a particular value (e.g., 5), file generation device 34 includes, in sub-sample information box 314, an additional flag that indicates whether all NAL units in the sub-sample are VCL NAL units of a sub-layer non-reference picture.

The DiscardableFlag value indicates the discardable_flag value of the VCL NAL units in the sub-sample. As specified in section A.4 of ISO/IEC 14496-15, the discardable_flag value shall be set to 1 if and only if all of the extracted or aggregated NAL units have discardable_flag set to 1, and shall be set to 0 otherwise. A NAL unit may have discardable_flag set to 1 if a bitstream containing the NAL unit can be correctly decoded without the NAL unit. Thus, a NAL unit may be "discardable" if a bitstream containing the NAL unit can be correctly decoded without the NAL unit. All VCL NAL units in the sub-sample shall have the same discardable_flag value. Thus, when file generation device 34 generates sub-sample information box 314 and the flags have a particular value (e.g., 5), file generation device 34 includes, in sub-sample information box 314, an additional flag (e.g., discardable_flag) that indicates whether all VCL NAL units of the sub-sample are discardable.
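The "if and only if all" rule for the aggregated flag is small enough to state directly in code. The function name `aggregate_discardable_flag` is hypothetical, and returning 0 for an empty collection is an assumption made only so that the sketch is total; the text above does not address that case.

```python
def aggregate_discardable_flag(discardable_flags):
    """Return 1 only when every contributing NAL unit has discardable_flag equal to 1."""
    flags = list(discardable_flags)
    if not flags:
        return 0  # assumption: nothing to aggregate is treated as not discardable
    return 1 if all(flag == 1 for flag in flags) else 0


all_discardable = aggregate_discardable_flag([1, 1, 1])
one_needed = aggregate_discardable_flag([1, 0, 1])
```

A single NAL unit that other data depends on is enough to force the aggregated value to 0, which keeps a file reader from dropping data that is still needed for correct decoding.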

The NoInterLayerPredFlag value indicates the value of inter_layer_pred_enabled_flag of the VCL NAL units in the sub-sample. inter_layer_pred_enabled_flag shall be set to 1 if and only if all of the extracted or aggregated VCL NAL units have inter_layer_pred_enabled_flag set to 1, and shall be set to 0 otherwise. All VCL NAL units in the sub-sample shall have the same inter_layer_pred_enabled_flag value. Thus, when file generation device 34 generates sub-sample information box 314 and the flags have a particular value (e.g., 5), file generation device 34 includes, in sub-sample information box 314, an additional value (e.g., inter_layer_pred_enabled_flag) that indicates whether inter-layer prediction is enabled for all VCL NAL units of the sub-sample.

LayerId indicates the nuh_layer_id value of the NAL units in the sub-sample. All NAL units in the sub-sample shall have the same nuh_layer_id value. Thus, when file generation device 34 generates sub-sample information box 314 and the flags have a particular value (e.g., 5), file generation device 34 includes, in sub-sample information box 314, an additional value (e.g., LayerId) that indicates the layer identifier of each NAL unit of the sub-sample.

TempId indicates the TemporalId value of the NAL units in the subsample. All NAL units in the subsample shall have the same TemporalId value. Thus, when file generation device 34 generates subsample information box 314 and the flag has a particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional value (e.g., TempId) indicating the temporal identifier of each NAL unit of the subsample.

VclNalUnitType indicates the nal_unit_type syntax element of the VCL NAL units in the subsample. The nal_unit_type syntax element is a syntax element in the NAL unit header of a NAL unit, and it specifies the type of RBSP contained in the NAL unit. All VCL NAL units in the subsample shall have the same nal_unit_type value. Thus, when file generation device 34 generates subsample information box 314 and the flag has a particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional value (e.g., VclNalUnitType) indicating the NAL unit type of the VCL NAL units of the subsample, all of which have the same NAL unit type.
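The consistency rules above can be illustrated with a minimal Python sketch. It assumes the standard two-byte HEVC NAL unit header layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) and treats NAL unit types below 32 as VCL. The names `NalHeader`, `parse_nal_header`, and `subsample_fields` are hypothetical and are not part of any file format API. DiscardableFlag and NoInterLayerPredFlag are left out because they would require slice-header parsing that this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NalHeader:
    nal_unit_type: int
    nuh_layer_id: int
    temporal_id: int


def parse_nal_header(nal: bytes) -> NalHeader:
    """Parse the two-byte HEVC NAL unit header at the start of `nal`."""
    if len(nal) < 2:
        raise ValueError("NAL unit is shorter than its 2-byte header")
    nal_unit_type = (nal[0] >> 1) & 0x3F
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    temporal_id = (nal[1] & 0x07) - 1  # TemporalId = nuh_temporal_id_plus1 - 1
    return NalHeader(nal_unit_type, nuh_layer_id, temporal_id)


def _single(values: Iterable[int], name: str) -> int:
    """Return the one value shared by all items, or fail if they differ."""
    distinct = set(values)
    if len(distinct) != 1:
        raise ValueError(f"{name} differs within the subsample: {sorted(distinct)}")
    return distinct.pop()


def subsample_fields(nal_units: List[bytes]) -> Dict[str, int]:
    """Derive LayerId, TempId and VclNalUnitType for one subsample (hypothetical helper)."""
    headers = [parse_nal_header(n) for n in nal_units]
    vcl = [h for h in headers if h.nal_unit_type < 32]  # VCL types are 0..31
    if not vcl:
        raise ValueError("subsample contains no VCL NAL unit")
    return {
        "LayerId": _single((h.nuh_layer_id for h in headers), "nuh_layer_id"),
        "TempId": _single((h.temporal_id for h in headers), "TemporalId"),
        "VclNalUnitType": _single((h.nal_unit_type for h in vcl), "nal_unit_type"),
    }
```

Raising an error on a mismatch is a design choice of the sketch; a real file writer might instead split the NAL units into separate subsamples.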

FIG. 6 is a conceptual diagram illustrating an example structure of a file 300, in accordance with one or more techniques of this disclosure. As specified in Section 8.4.9 of ISO/IEC 14496-15, HEVC allows file format samples that are used only for reference and not for output. For example, HEVC allows non-displayed reference pictures in a video.

Furthermore, Section 8.4.9 of ISO/IEC 14496-15 specifies that when any such non-output sample is present in a track, the file shall be constrained as follows.

1. A non-output sample shall be given a composition time that is outside the time range of the samples that are output.

2. An edit list shall be used to exclude the composition times of the non-output samples.

3. When the track includes a CompositionOffsetBox ('ctts'),

a. version 1 of the CompositionOffsetBox shall be used,

b. the value of sample_offset shall be set equal to -2³¹ for each non-output sample,

c. the CompositionToDecodeBox ('cslg') shall be contained in the SampleTableBox ('stbl') of the track, and

d. when the CompositionToDecodeBox is present for the track, the value of the leastDecodeToDisplayDelta field in the box shall be equal to the smallest composition offset in the CompositionOffsetBox, excluding the sample_offset values for non-output samples.

Note: Thus, leastDecodeToDisplayDelta is greater than -2³¹.
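As an illustration of constraints 3a through 3d, the following Python sketch checks a track's composition offsets. It is a simplified model under stated assumptions: the box contents are passed in as plain Python values rather than parsed from a file, and the function names `least_decode_to_display_delta` and `check_non_output_constraints` are hypothetical.

```python
from typing import List, Optional, Sequence

# sample_offset value reserved for non-output samples (constraint 3b).
NON_OUTPUT_OFFSET = -(2 ** 31)


def least_decode_to_display_delta(
    sample_offsets: Sequence[int], is_output: Sequence[bool]
) -> int:
    """Smallest composition offset, ignoring non-output samples (constraint 3d)."""
    output_offsets = [o for o, out in zip(sample_offsets, is_output) if out]
    if not output_offsets:
        raise ValueError("track has no output samples")
    return min(output_offsets)


def check_non_output_constraints(
    ctts_version: int,
    sample_offsets: Sequence[int],
    is_output: Sequence[bool],
    cslg_least_delta: Optional[int],
) -> List[str]:
    """Return a list of violated constraints; an empty list means none were found."""
    problems: List[str] = []
    if ctts_version != 1:
        problems.append("CompositionOffsetBox must be version 1")
    for i, (offset, out) in enumerate(zip(sample_offsets, is_output)):
        if not out and offset != NON_OUTPUT_OFFSET:
            problems.append(f"sample {i}: non-output sample must use sample_offset -2^31")
    if cslg_least_delta is None:
        problems.append("CompositionToDecodeBox ('cslg') is missing")
    elif cslg_least_delta != least_decode_to_display_delta(sample_offsets, is_output):
        problems.append("leastDecodeToDisplayDelta does not match the smallest output offset")
    return problems
```

Because the minimum is taken over output samples only, the result is always greater than -2³¹, which matches the note above.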

As specified in ISO/IEC 14496-12, the CompositionOffsetBox provides the offset between decoding time and composition time. The CompositionOffsetBox includes a set of sample_offset values. Each of the sample_offset values is a non-negative integer that gives the offset between the composition time and the decoding time. The composition time refers to the time at which a sample is to be output. The decoding time refers to the time at which a sample is to be decoded.

As indicated above, a coded slice NAL unit may include a slice segment header. The slice segment header may be part of a coded slice segment and may contain data elements pertaining to the first or all CTUs in the slice segment. In HEVC, the slice segment header includes a pic_output_flag syntax element. In general, the pic_output_flag syntax element is included in the first slice segment header of a slice of a picture. Hence, this disclosure may refer to the pic_output_flag of the first slice segment header of a slice of a picture as the pic_output_flag of the picture.

As specified in Section 7.4.7.1 of the HEVC WD, the pic_output_flag syntax element affects the decoded picture output and removal processes specified in Annex C of the HEVC WD. In general, if the pic_output_flag syntax element of the slice segment header for a slice segment is 1, the picture that includes the slice corresponding to the slice segment header is output. Otherwise, if the pic_output_flag syntax element of the slice segment header for a slice segment is 0, the picture that includes the slice corresponding to the slice segment header may be decoded for use as a reference picture, but the picture is not output.

In accordance with one or more techniques of this disclosure, the references to HEVC in Section 8.4.9 of ISO/IEC 14496-15 may be replaced with corresponding references to SHVC, MV-HEVC, or 3D-HEVC. Furthermore, in accordance with one or more techniques of this disclosure, when an access unit contains some coded pictures that have pic_output_flag equal to 1 and some other coded pictures that have pic_output_flag equal to 0, at least two tracks must be used to store the stream. For each respective one of the tracks, all coded pictures in each sample of the respective track have the same pic_output_flag value. Thus, all coded pictures in a first one of the tracks have pic_output_flag equal to 0, and all coded pictures in a second one of the tracks have pic_output_flag equal to 1.

Accordingly, in the example of FIG. 6, file generation device 34 may generate a file 400. Similar to file 300 in the example of FIG. 5, file 400 includes a movie box 402 and one or more media data boxes 404. Each of media data boxes 404 may correspond to a different track of file 400. Movie box 402 may contain metadata for the tracks of file 400. Each track of file 400 may comprise a continuous stream of media data. Each of media data boxes 404 may include one or more samples 405. Each of samples 405 may comprise an audio or video access unit.

As indicated above, in some examples, when an access unit contains some coded pictures that have pic_output_flag equal to 1 and some other coded pictures that have pic_output_flag equal to 0, at least two tracks must be used to store the stream. Accordingly, in the example of FIG. 6, movie box 402 includes a track box 406 and a track box 408. Each of track boxes 406 and 408 encloses metadata for a different track of file 400. For instance, track box 406 may enclose metadata for a track having coded pictures with pic_output_flag equal to 0 and no pictures with pic_output_flag equal to 1. Track box 408 may enclose metadata for a track having coded pictures with pic_output_flag equal to 1 and no pictures with pic_output_flag equal to 0.

Thus, in one example, file generation device 34 may generate a file (e.g., file 400) that comprises a media data box (e.g., media data box 404) that encloses (e.g., comprises) media content. The media content comprises a sequence of samples (e.g., samples 405). Each of the samples may be an access unit of multi-layer video data. In this example, when file generation device 34 generates the file in response to a determination that at least one access unit of the bitstream includes a coded picture that has a picture output flag equal to 1 and a coded picture that has a picture output flag equal to 0, file generation device 34 may use at least two tracks to store the bitstream in the file. For each respective track from the at least two tracks, all coded pictures in each sample of the respective track have the same picture output flag value. Pictures having picture output flags equal to 1 are allowed to be output, whereas pictures having picture output flags equal to 0 are allowed to be used as reference pictures but are not allowed to be output.

FIG. 7 is a flowchart illustrating an example operation of file generation device 34, in accordance with one or more techniques of this disclosure. The operation of FIG. 7, along with the operations illustrated in the other flowcharts of this disclosure, is an example. Other example operations in accordance with the techniques of this disclosure may include more, fewer, or different actions.

In the example of FIG. 7, file generation device 34 generates a file (500). As part of generating the file, file generation device 34 generates a track box that contains metadata for a track in the file (502). In this way, file generation device 34 generates a file that comprises a track box containing metadata for a track in the file. Media data for the track comprises a sequence of samples. Each of the samples is a video access unit of multi-layer video data. In some examples, file generation device 34 encodes the multi-layer video data.

Furthermore, as part of generating the file, file generation device 34 identifies all of the samples that contain at least one IRAP picture (504). In addition, file generation device 34 may generate, in the file, an additional box that documents all of the samples containing at least one IRAP picture (506). In some examples, the additional box is a new box that is not defined in the ISOBMFF or existing extensions thereof. In some examples, the additional box defines a sample group that documents all of the samples containing at least one IRAP picture. For instance, the additional box may be or comprise a Sample Table Box that includes a SampleToGroup box and a SampleGroupDescription box. The SampleToGroup box identifies the samples that contain at least one IRAP picture. The SampleGroupDescription box indicates that the sample group is a group of samples containing at least one IRAP picture.

Furthermore, in the example of FIG. 7, file generation device 34 may generate a sample entry for a particular one of the samples that includes at least one IRAP picture (508). In some examples, file generation device 34 may generate a sample entry for each respective one of the samples that includes at least one IRAP picture. The sample entry may be a RandomAccessibleSampleEntry, as defined in Section 9.5.5.1.2 above.

As illustrated in the example of FIG. 7, as part of generating the sample entry for the particular sample, file generation device 34 may include, in the sample entry for the particular sample, a value indicating whether all coded pictures in the particular sample are IRAP pictures (510). In this way, file generation device 34 may generate, in the file, a sample entry that includes a value indicating whether all coded pictures in a particular sample in the sequence of samples are IRAP pictures. Furthermore, file generation device 34 may include, in the sample entry for the particular sample, a value indicating the NAL unit type of the VCL NAL units in the IRAP pictures of the particular sample (512).

In addition, file generation device 34 may determine whether all coded pictures in the particular sample are IRAP pictures (514). When not all coded pictures in the particular sample are IRAP pictures ("NO" of 514), file generation device 34 may include, in the sample entry for the particular sample, a value indicating the number of IRAP pictures in the particular sample (516). Additionally, file generation device 34 may include, in the sample entry for the particular sample, values indicating the layer identifiers (e.g., nuh_layer_ids) of the IRAP pictures in the particular sample.
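Steps 504 through 516 can be summarized in a short Python sketch. It relies on the HEVC convention that NAL unit types 16 through 23 denote IRAP pictures. The `CodedPicture` type, the `describe_irap_sample` function, and the dictionary keys are hypothetical illustrations, not the syntax of RandomAccessibleSampleEntry. The sketch also assumes that all IRAP pictures in a sample share one NAL unit type and simply reports the type of the first one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CodedPicture:
    nuh_layer_id: int
    nal_unit_type: int  # type of the picture's VCL NAL units


def is_irap(nal_unit_type: int) -> bool:
    """HEVC reserves NAL unit types 16..23 for IRAP pictures."""
    return 16 <= nal_unit_type <= 23


def describe_irap_sample(pictures: List[CodedPicture]) -> Optional[Dict[str, object]]:
    """Build the per-sample values of steps 510-516, or None if no IRAP picture exists."""
    irap = [p for p in pictures if is_irap(p.nal_unit_type)]
    if not irap:
        return None  # the sample is not documented by the additional box
    all_irap = len(irap) == len(pictures)
    entry: Dict[str, object] = {
        "all_pics_are_irap": all_irap,                  # step 510
        "irap_nal_unit_type": irap[0].nal_unit_type,    # step 512 (assumes one shared type)
    }
    if not all_irap:                                    # "NO" branch of step 514
        entry["num_irap_pics"] = len(irap)              # step 516
        entry["irap_layer_ids"] = [p.nuh_layer_id for p in irap]
    return entry
```

A writer could call such a helper once per access unit and emit a sample entry only when the result is not None.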

As indicated above, FIG. 7 is provided only as an example. Other examples do not include every action of FIG. 7. For instance, some examples do not include steps 502, 504, and 508. Furthermore, some examples do not include each of steps 510 through 518. Moreover, some examples include one or more additional actions. For instance, some examples include the additional action of generating, as part of generating the file, a sync sample box that includes a sync sample table documenting the sync samples of a track of the multi-layer video data. Each sync sample of the track is a random access sample of the track. In this example, a scalable video coding sample is a sync sample if each coded picture in the access unit is an IRAP picture. Furthermore, in this example, a multi-view video coding sample is a sync sample if each coded picture in the access unit is an IRAP picture without RASL pictures.

FIG. 8 is a flowchart illustrating an example operation in which a computing device performs random access and/or level switching, in accordance with one or more techniques of this disclosure. In the example of FIG. 8, the computing device receives a file (550). In the example of FIG. 8, the computing device may be an intermediate network device (e.g., a MANE or a streaming server), a decoding device (e.g., destination device 14), or another type of video device. In some examples, the computing device may be part of a content delivery network.

In the example of FIG. 8, the computing device may obtain, from the file, a track box that contains metadata for a track in the file (552). Media data for the track comprises a sequence of samples. In the example of FIG. 8, each of the samples is a video access unit of multi-layer video data.

Furthermore, in the example of FIG. 8, the computing device may obtain an additional box from the file (554). The additional box documents all of the samples containing at least one IRAP picture. Thus, the computing device may determine, based on information in the additional box, all of the samples containing at least one IRAP picture (556).

Furthermore, in some examples, the computing device may obtain, from the file, a sample entry that includes a value indicating whether all coded pictures in a particular sample in the sequence of samples are IRAP pictures. When not all coded pictures in the particular sample are IRAP pictures, the computing device may obtain, from the sample entry, a value indicating the number of IRAP pictures in the particular sample. Additionally, the computing device may obtain, from the sample entry, values indicating the layer identifiers of the IRAP pictures in the particular sample. Furthermore, in some examples, the computing device may obtain, from the sample entry, a value indicating the NAL unit type of the VCL NAL units in the IRAP pictures of the particular sample. Additionally, in some examples, the computing device may obtain, from the file, a sync sample box that includes a sync sample table documenting the sync samples of a track of the video data. In such examples, each sync sample of the track is a random access sample of the track, a scalable video coding sample is a sync sample if each coded picture in the access unit is an IRAP picture, and a multi-view video coding sample is a sync sample if each coded picture in the access unit is an IRAP picture without RASL pictures.

Additionally, in the example of FIG. 8, the computing device may start forwarding or decoding the NAL units of a sample containing at least one IRAP picture without forwarding or decoding the NAL units of the file that precede the sample in decoding order (558). In this way, the computing device may perform random access or layer switching. For instance, the computing device may start decoding the multi-layer video data at one of the one or more samples containing at least one IRAP picture.
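A minimal Python sketch of step 558 follows. It assumes that the additional box has already been reduced to a sorted list of 1-based sample numbers that contain at least one IRAP picture, and it picks the first such sample at or after a requested position. Both the function names and the "at or after" policy are assumptions of the sketch; a player might equally seek to the nearest preceding IRAP sample.

```python
from bisect import bisect_left
from typing import List, Optional


def random_access_point(irap_samples: List[int], target: int) -> Optional[int]:
    """First sample number >= target that contains an IRAP picture, if any.

    `irap_samples` must be sorted in ascending order.
    """
    i = bisect_left(irap_samples, target)
    return irap_samples[i] if i < len(irap_samples) else None


def samples_to_forward(sample_count: int, irap_samples: List[int], target: int) -> List[int]:
    """Sample numbers to forward or decode; everything before the access point is skipped."""
    start = random_access_point(sorted(irap_samples), target)
    if start is None:
        return []
    return list(range(start, sample_count + 1))
```

For example, with IRAP samples 1, 9, and 17 in a 20-sample track, a request for sample 5 starts forwarding at sample 9.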

FIG. 9 is a flowchart illustrating an example operation of file generation device 34, in accordance with one or more techniques of this disclosure. In the example of FIG. 9, file generation device 34 may generate a file that comprises a track box containing metadata for a track in the file (600). Media data for the track comprises a sequence of samples. In the example of FIG. 9, each of the samples is a video access unit of multi-layer video data. In some examples, file generation device 34 encodes the multi-layer video data.

As part of generating the file, file generation device 34 may determine whether a subsample contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture (602). In response to determining that the subsample contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture ("YES" of 602), file generation device 34 may generate, in the file, a subsample information box that contains a flag having a value (e.g., 5) indicating that the subsample contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture (604). Otherwise ("NO" of 602), file generation device 34 may generate, in the file, a subsample information box that contains a flag having another value (e.g., 0, 1, 2, 3, or 4) (606).

In this way, file generation device 34 may generate a file that comprises a track box containing metadata for a track in the file. Media data for the track comprises a sequence of samples, each of which is a video access unit of multi-layer video data. As part of generating the file, file generation device 34 generates, in the file, a subsample information box that contains a flag specifying the type of subsample information given in the subsample information box. When the flag has a particular value, a subsample corresponding to the subsample information box contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture.

FIG. 10 is a flowchart illustrating an example operation of a computing device, in accordance with one or more techniques of this disclosure. In the example of FIG. 10, the computing device receives a file (650). In the example of FIG. 10, the computing device may be an intermediate network device, such as a MANE or a streaming server. In some examples, the computing device may be part of a content delivery network. Furthermore, in the example of FIG. 10, the computing device may obtain a track box from the file (651). The track box contains metadata for a track in the file. Media data for the track comprises a sequence of samples. In the example of FIG. 10, each of the samples is a video access unit of multi-layer video data.

Furthermore, in the example of FIG. 10, the computing device may obtain a subsample information box from the file (652). The computing device uses information in the subsample information box to extract a sub-bitstream (654). The sub-bitstream may comprise each NAL unit of an operation point of the bitstream stored in the file. In other words, the NAL units of the sub-bitstream may be a subset of the NAL units stored in the file. The computing device may obtain the subsample information box from the file and may extract the sub-bitstream without parsing or interpreting the NAL units included in the sequence of samples. Not parsing or interpreting the NAL units when extracting the sub-bitstream may reduce the complexity of the computing device and/or may speed up the extraction of the sub-bitstream.

Furthermore, in some examples, when the flag has the particular value, the computing device may obtain one or more of the following from the subsample information box:

● an additional flag indicating whether all VCL NAL units of the subsample are discardable,

● an additional value indicating the NAL unit type of the VCL NAL units of the subsample, wherein all VCL NAL units of the subsample have the same NAL unit type,

● an additional value indicating the layer identifier of each NAL unit of the subsample,

● an additional value indicating the temporal identifier of each NAL unit of the subsample,

● an additional flag indicating whether inter-layer prediction is enabled for all VCL NAL units of the subsample, or

● an additional flag indicating whether all NAL units in the subsample are VCL NAL units of a sub-layer non-reference picture.

In the example of FIG. 10, as part of extracting the sub-bitstream, the computing device may determine whether the "flags" value of the subsample information box has a particular value (e.g., 5) indicating that the subsample information box corresponds to exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture (656). When the "flags" value of the subsample information box has the particular value ("YES" of 656), the computing device may determine, based on the information specified in the subsample information box, whether the coded picture is required in order to decode the operation point (658). For example, the computing device may determine whether the coded picture is required in order to decode the operation point based on the discardable flag, the VCL NAL unit type indicator, the layer identifier, the temporal identifier, the no-inter-layer-prediction flag, and/or the sub-layer reference NAL unit flag. When the coded picture is required to decode the operation point ("YES" of 658), the computing device may include the NAL units of the subsample in the sub-bitstream (660). Otherwise, in the example of FIG. 10, when the coded picture is not required to decode the operation point ("NO" of 658), the computing device does not include the NAL units of the subsample in the sub-bitstream (662).
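The decision loop of steps 656 through 662 might look like the following Python sketch. The `SubSampleInfo` record is a hypothetical in-memory stand-in for one entry of the subsample information box. The "required" test used here (layer identifier in a target set and temporal identifier not above a maximum, optionally dropping discardable pictures) is one plausible criterion chosen for illustration. Passing through entries whose flags value is not 5 is likewise an assumption, since the text above does not say how those are handled.

```python
from dataclasses import dataclass
from typing import List, Set

PICTURE_SUBSAMPLE_FLAGS = 5  # example value used in the text above


@dataclass
class SubSampleInfo:
    flags: int
    layer_id: int
    temp_id: int
    discardable: bool
    nal_units: List[bytes]


def extract_sub_bitstream(
    subsamples: List[SubSampleInfo],
    target_layers: Set[int],
    max_temporal_id: int,
    drop_discardable: bool = False,
) -> List[bytes]:
    """Select NAL units using only box metadata; NAL unit payloads are never parsed."""
    out: List[bytes] = []
    for s in subsamples:
        if s.flags == PICTURE_SUBSAMPLE_FLAGS:          # step 656
            required = (                                # step 658 (assumed criterion)
                s.layer_id in target_layers
                and s.temp_id <= max_temporal_id
                and not (drop_discardable and s.discardable)
            )
            if not required:
                continue                                # step 662
        out.extend(s.nal_units)                         # step 660, or pass-through
    return out
```

Because the function reads only the fields carried in the box, it reflects the point made above that extraction can proceed without interpreting the NAL units themselves.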

Furthermore, in the example of FIG. 10, the computing device may output the sub-bitstream (664). For instance, the computing device may store the sub-bitstream to a computer-readable storage medium or transmit the sub-bitstream to another computing device.

As indicated above, FIG. 10 is an example. Other examples may include or omit particular actions of FIG. 10. For instance, some examples omit actions 650, 651, 654, and/or 664. Furthermore, some examples omit one or more of actions 656 through 662.

FIG. 11 is a flowchart illustrating an example operation of file generation device 34, in accordance with one or more techniques of this disclosure. In the example of FIG. 11, file generation device 34 may generate a file that comprises a media data box enclosing media content (700). The media content may comprise a sequence of samples, each of which is an access unit of multi-layer video data. In various examples, the multi-layer video data may be SHVC data, MV-HEVC data, or 3D-HEVC data. In some examples, file generation device 34 encodes the multi-layer video data.

In the example of FIG. 11, as part of generating the file, file generation device 34 may determine whether at least one access unit of a bitstream of the multi-layer video data includes a coded picture that has a picture output flag equal to a first value (e.g., 1) and a coded picture that has a picture output flag equal to a second value (e.g., 0) (702). Pictures having picture output flags equal to the first value (e.g., 1) are allowed to be output, whereas pictures having picture output flags equal to the second value (e.g., 0) are allowed to be used as reference pictures but are not allowed to be output. In other examples, other devices may make the determination of whether at least one access unit of the bitstream of the multi-layer video data includes a coded picture that has a picture output flag equal to the first value and a coded picture that has a picture output flag equal to the second value.

In response to at least one access unit of the bitstream of the multi-layer video data including a coded picture that has a picture output flag equal to the first value and a coded picture that has a picture output flag equal to the second value ("YES" of 702), file generation device 34 uses at least a first track and a second track to store the bitstream in the file (704). For each respective track from the first and second tracks, all coded pictures in each sample of the respective track have the same picture output flag value.

Furthermore, in the example of FIG. 11, in response to determining that no access unit of the bitstream includes both a coded picture that has a picture output flag equal to the first value (e.g., 1) and a coded picture that has a picture output flag equal to the second value (e.g., 0) ("NO" of 702), file generation device 34 may use a single track to store the bitstream in the file (706). In other examples, file generation device 34 may generate the file with multiple tracks even when no access unit of the bitstream includes both a coded picture that has a picture output flag equal to the first value (e.g., 1) and a coded picture that has a picture output flag equal to the second value (e.g., 0).
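The branch at step 702 can be sketched in Python as follows. An access unit is modeled as a list of hypothetical `Picture` records, and the track names are placeholders invented for the sketch. When an access unit has no picture with a given flag value, the sketch simply adds no sample to the corresponding track, which is an assumption rather than something the text above prescribes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Picture:
    nuh_layer_id: int
    pic_output_flag: int  # 1: may be output; 0: reference only


def has_mixed_output_flags(access_units: List[List[Picture]]) -> bool:
    """Step 702: does any access unit mix pic_output_flag values 0 and 1?"""
    return any(len({p.pic_output_flag for p in au}) > 1 for au in access_units)


def assign_tracks(access_units: List[List[Picture]]) -> Dict[str, List[List[Picture]]]:
    """Map track names to samples; each sample holds pictures with one flag value."""
    if not has_mixed_output_flags(access_units):
        # Step 706: a single track is sufficient.
        return {"track_1": [list(au) for au in access_units]}
    # Step 704: one track per pic_output_flag value.
    tracks: Dict[str, List[List[Picture]]] = {"track_output_0": [], "track_output_1": []}
    for au in access_units:
        for flag in (0, 1):
            pictures = [p for p in au if p.pic_output_flag == flag]
            if pictures:
                tracks[f"track_output_{flag}"].append(pictures)
    return tracks
```

Splitting by flag value guarantees that every sample in each resulting track contains pictures with a single pic_output_flag value.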

As indicated above, FIG. 11 is an example. Other examples may include fewer actions. For instance, some examples omit actions 702 and 706.

FIG. 12 is a flowchart illustrating an example operation of destination device 14, in accordance with one or more techniques of this disclosure. In the example of FIG. 12, destination device 14 receives a file (750). The file may comprise a media data box that encloses media content, the media content comprising a sequence of samples. Each of the samples may be an access unit of multi-layer video data. In various examples, the multi-layer video data may be SHVC data, MV-HEVC data, or 3D-HEVC data. Furthermore, in the example of FIG. 12, destination device 14 may obtain, from the file, a first track box and a second track box (751). The first track box contains metadata for a first track in the file. The second track box contains metadata for a second track in the file. For each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same picture output flag value. Pictures having picture output flags equal to a first value (e.g., 1) are allowed to be output, whereas pictures having picture output flags equal to a second value (e.g., 0) are allowed to be used as reference pictures but are not allowed to be output.

Video decoder 30 of destination device 14 may decode the pictures in the tracks that have a picture output flag equal to the first value (e.g., 1) and may also decode the pictures in the tracks that have a picture output flag equal to the second value (e.g., 0) (752). In some cases, video decoder 30 may use pictures having a picture output flag equal to 1 to decode pictures having a picture output flag equal to 0, and vice versa. Destination device 14 may output the pictures having a picture output flag equal to the first value (754). Destination device 14 does not output the pictures having a picture output flag equal to the second value (756). In this manner, for each respective track from the first track and the second track, destination device 14 may decode the coded pictures in each sample of the respective track and output the decoded pictures having a picture output flag equal to the first value.
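The decode-then-filter behavior of actions 752 through 756 can be sketched as below. This is a simplified illustration under stated assumptions: `CodedPicture`, `decode_and_output`, and the placeholder `decode` callable are hypothetical, tracks are modeled as lists of samples, and real decoding (including the use of reference pictures) is replaced by a string stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CodedPicture:
    """Hypothetical stand-in for a coded picture carried in a sample."""
    picture_id: str
    pic_output_flag: int  # 1: may be output; 0: reference only


Sample = List[CodedPicture]
Track = List[Sample]


def decode_and_output(
    tracks: List[Track],
    decode: Callable[[CodedPicture], str] = lambda pic: f"decoded:{pic.picture_id}",
) -> List[str]:
    """Decode every picture, but collect only those allowed to be output."""
    output: List[str] = []
    for track in tracks:
        for sample in track:
            for pic in sample:
                # Every picture is decoded, since it may serve as a reference.
                decoded = decode(pic)
                if pic.pic_output_flag == 1:
                    output.append(decoded)  # corresponds to action 754
                # Flag 0: decoded but never output (action 756).
    return output
```

Because all coded pictures in a sample of a given track share one flag value, a player could in principle make the output decision once per sample rather than once per picture; the per-picture check here is kept only for clarity.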

As indicated above, FIG. 12 is provided merely as an example. Other examples may omit particular actions of FIG. 12, such as actions 752 to 756.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method for processing multi-layer video data, the method comprising:

generating a file that includes a movie box, the movie box containing metadata for continuous media streams present in the file, wherein generating the file comprises:

in response to a determination that at least one access unit of a bitstream of the multi-layer video data includes a coded picture having a picture output flag equal to a first value and a coded picture having a picture output flag equal to a second value different from the first value, storing the bitstream in the file using at least a first track and a second track, wherein:

for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag; and

pictures having a picture output flag equal to the first value are allowed to be output, and pictures having a picture output flag equal to the second value are allowed to be used as reference pictures but are not allowed to be output; and

for each respective track of the file, enclosing a respective track box in the movie box, the respective track box enclosing metadata for the respective track, wherein each of the media streams present in the file is represented in the file as a respective track of the file, the media stream for the respective track comprises a respective sequence of samples, the respective sequence of samples includes access units of the multi-layer video data, and the media stream for the respective track is enclosed in a media data box of the file.

2. The method of claim 1, wherein generating the file comprises:

in response to determining that no access unit in the bitstream includes both a coded picture having a picture output flag equal to the first value and a coded picture having a picture output flag equal to the second value, storing the bitstream in the file using a single track.

3. The method of claim 1, wherein the multi-layer video data is Scalable High Efficiency Video Coding (SHVC) data.

4. The method of claim 1, wherein the multi-layer video data is Multi-View High Efficiency Video Coding (MV-HEVC) data.

5. The method of claim 1, wherein the multi-layer video data is 3D High Efficiency Video Coding (3D-HEVC) data.

6. The method of claim 1, wherein the first value is equal to 1 and the second value is equal to 0.

7. The method of claim 1, further comprising:

encoding the multi-layer video data.

8. A method for processing multi-layer video data, the method comprising:

obtaining, from a file, a first track box and a second track box, the first track box and the second track box being in a movie box that contains metadata for continuous media streams present in the file, wherein:

for each respective track of the file, each of the media streams present in the file is represented in the file as a respective track of the file, the media stream for the respective track comprises a respective sequence of samples, the respective sequence of samples includes access units of the multi-layer video data, and the media stream for the respective track is enclosed in a media data box of the file;

the first track box contains metadata for a first track in the file, and the second track box contains metadata for a second track in the file;

for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag; and

pictures having a picture output flag equal to a first value are allowed to be output, and pictures having a picture output flag equal to a second value different from the first value are allowed to be used as reference pictures but are not allowed to be output.

9. The method of claim 8, wherein the multi-layer video data is Scalable High Efficiency Video Coding (SHVC) data.

10. The method of claim 8, wherein the multi-layer video data is Multi-View High Efficiency Video Coding (MV-HEVC) data.

11. The method of claim 8, wherein the multi-layer video data is 3D High Efficiency Video Coding (3D-HEVC) data.

12. The method of claim 8, wherein the first value is equal to 1 and the second value is equal to 0.

13. The method of claim 8, further comprising:

for each respective track from the first track and the second track:

decoding the coded pictures in each sample of the respective track; and

outputting the decoded pictures having a picture output flag equal to the first value.

14. A video apparatus for processing multi-layer video data, the video apparatus comprising:

a data storage medium configured to store the multi-layer video data; and

one or more processors configured to:

generate a file that includes a movie box, the movie box containing metadata for continuous media streams present in the file, wherein, to generate the file, the one or more processors are configured to:

for each respective track from the first track and the second track, all coded pictures in each sample of the respective track have the same value of the picture output flag;

pictures having a picture output flag equal to the first value are allowed to be output, and pictures having a picture output flag equal to the second value are allowed to be used as reference pictures but are not allowed to be output; and

for each respective track of the file, enclose a respective track box in the movie box, the respective track box enclosing metadata for the respective track, wherein each of the media streams present in the file is represented in the file as a respective track of the file, the media stream for the respective track comprises a respective sequence of samples, the respective sequence of samples includes access units of the multi-layer video data, and the media stream for the respective track is enclosed in a media data box of the file.

15. The video apparatus of claim 14, wherein, in order to generate the file, the one or more processors are configured to:

16. The video apparatus of claim 14, wherein the multi-layer video data is one of the following: Scalable High Efficiency Video Coding (SHVC) data, Multi-View High Efficiency Video Coding (MV-HEVC) data, or 3D High Efficiency Video Coding (3D-HEVC) data.

17. The video apparatus of claim 14, wherein the first value is equal to 1 and the second value is equal to 0.

18. The video apparatus of claim 14, wherein the one or more processors are configured to encode the multi-layer video data.

19. The video apparatus of claim 14, wherein the video apparatus comprises at least one of the following:

an integrated circuit;

a microprocessor; or

a wireless communication device.

20. A video apparatus for processing multi-layer video data, the video apparatus comprising:

a data storage medium configured to store the multi-layer video data; and

one or more processors configured to:

21. The video apparatus of claim 20, wherein the multi-layer video data is one of the following: Scalable High Efficiency Video Coding (SHVC) data, Multi-View High Efficiency Video Coding (MV-HEVC) data, or 3D High Efficiency Video Coding (3D-HEVC) data.

22. The video apparatus of claim 20, wherein the first value is equal to 1 and the second value is equal to 0.

23. The video apparatus of claim 20, wherein the one or more processors are configured to:

output the decoded pictures having a picture output flag equal to the first value.

24. The video apparatus of claim 20, wherein the video apparatus comprises at least one of the following:

an integrated circuit;

a microprocessor; or

a wireless communication device.

25. A video apparatus for processing multi-layer video data, the video apparatus comprising:

means for storing the multi-layer video data; and

means for generating a file that includes a movie box, the movie box containing metadata for continuous media streams present in the file, wherein the means for generating the file comprises:

means for storing, in response to a determination that at least one access unit of a bitstream of the multi-layer video data includes a coded picture having a picture output flag equal to a first value and a coded picture having a picture output flag equal to a second value different from the first value, the bitstream in the file using at least a first track and a second track, wherein:

pictures having a picture output flag equal to the first value are allowed to be output, and pictures having a picture output flag equal to the second value are allowed to be used as reference pictures but are not allowed to be output; and

means for enclosing, for each respective track of the file, a respective track box in the movie box, the respective track box enclosing metadata for the respective track, wherein each of the media streams present in the file is represented in the file as a respective track of the file, the media stream for the respective track comprises a respective sequence of samples, the respective sequence of samples includes access units of the multi-layer video data, and the media stream for the respective track is enclosed in a media data box of the file.

26. The video apparatus of claim 25, comprising:

means for storing, in response to determining that no access unit in the bitstream includes both a coded picture having a picture output flag equal to the first value and a coded picture having a picture output flag equal to the second value, the bitstream in the file using a single track.

27. The video apparatus of claim 25, wherein the multi-layer video data is one of the following: Scalable High Efficiency Video Coding (SHVC) data, Multi-View High Efficiency Video Coding (MV-HEVC) data, or 3D High Efficiency Video Coding (3D-HEVC) data.

28. The video apparatus of claim 25, wherein the first value is equal to 1 and the second value is equal to 0.

29. A video apparatus for processing multi-layer video data, the video apparatus comprising:

means for receiving a file; and

means for obtaining a first track box and a second track box from the file, the first track box and the second track box being in a movie box that contains metadata for continuous media streams present in the file, wherein:

30. The video apparatus of claim 29, wherein the multi-layer video data is one of the following: Scalable High Efficiency Video Coding (SHVC) data, Multi-View High Efficiency Video Coding (MV-HEVC) data, or 3D High Efficiency Video Coding (3D-HEVC) data.

31. The video apparatus of claim 29, wherein the first value is equal to 1 and the second value is equal to 0.

32. A non-transitory computer-readable data storage medium having instructions stored thereon, which, when executed, cause one or more processors to:

generate a file that includes a movie box, the movie box containing metadata for continuous media streams present in the file, wherein, to generate the file, the instructions cause the one or more processors to:

in response to a determination that at least one access unit of a bitstream of multi-layer video data includes a coded picture having a picture output flag equal to a first value and a coded picture having a picture output flag equal to a second value different from the first value, store the bitstream in the file using at least a first track and a second track, wherein:

33. The non-transitory computer-readable data storage medium of claim 32, wherein the instructions cause one or more processors to:

34. The non-transitory computer-readable data storage medium of claim 32, wherein the multi-layer video data is one of the following: Scalable High Efficiency Video Coding (SHVC) data, Multi-View High Efficiency Video Coding (MV-HEVC) data, or 3D High Efficiency Video Coding (3D-HEVC) data.

35. The non-transitory computer-readable data storage medium of claim 32, wherein the first value is equal to 1 and the second value is equal to 0.

36. A non-transitory computer-readable data storage medium having instructions stored thereon, which, when executed, cause one or more processors to:

for each respective track of the file, each of the media streams present in the file is represented in the file as a respective track of the file, the media stream for the respective track comprises a respective sequence of samples, the respective sequence of samples includes access units of multi-layer video data, and the media stream for the respective track is enclosed in a media data box of the file.

37. The non-transitory computer-readable data storage medium of claim 36, wherein the multi-layer video data is one of: Scalable High Efficiency Video Coding (SHVC) data, Multi-View High Efficiency Video Coding (MV-HEVC) data, or 3D High Efficiency Video Coding (3D-HEVC) data.

38. The non-transitory computer-readable data storage medium of claim 36, wherein the first value is equal to 1 and the second value is equal to 0.