
HK40020459B - Video coding with content adaptive spatially varying quantization - Google Patents


Info

Publication number
HK40020459B
Authority
HK
Hong Kong
Prior art keywords
video data
block
video
quantization parameter
decoded
Prior art date
Application number
HK62020009946.8A
Other languages
Chinese (zh)
Other versions
HK40020459A (en)
Inventor
D. Rusanovskyy (D‧鲁萨诺夫斯基)
A. K. Ramasubramonian (A‧K‧瑞玛苏布雷蒙尼安)
Original Assignee
Qualcomm Incorporated (高通股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Incorporated (高通股份有限公司)
Publication of HK40020459A
Publication of HK40020459B


Description

Video coding with content-adaptive spatially varying quantization

This application claims the benefit of U.S. Provisional Application No. 62/571,732, filed October 12, 2017, and of U.S. Application No. 16/155,344, filed October 9, 2018, the entire contents of which are incorporated herein by reference.

Technical Field

This invention relates to video coding and/or video processing.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), ITU-T H.265, High Efficiency Video Coding (HEVC), and extensions of these standards. By implementing such video coding techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents the pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
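The residual, quantization, and scanning steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the transform step is omitted for brevity, the uniform quantizer and the diagonal scan order are generic choices rather than those of any particular standard, and the function names are hypothetical.

```python
def residual_block(original, predicted):
    """Sample-wise difference between the original block and the predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def quantize(block, step):
    """Uniform, sign-symmetric, round-to-nearest quantizer (illustrative only)."""
    def q(value):
        sign = -1 if value < 0 else 1
        return sign * ((abs(value) + step // 2) // step)
    return [[q(v) for v in row] for row in block]


def diagonal_scan(block):
    """Flatten a square 2-D block into a 1-D list along anti-diagonals."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):       # s = row + column index of each anti-diagonal
        for r in range(n):
            c = s - r
            if 0 <= c < n:
                out.append(block[r][c])
    return out


original = [[10, 12], [14, 16]]
predicted = [[8, 8], [8, 8]]
levels = quantize(residual_block(original, predicted), step=4)
vector = diagonal_scan(levels)   # the 1-D vector that would be passed to entropy coding
```

In a real codec the residual would be transformed before quantization, and the scan order would be chosen by the standard or the coding mode.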

The total number of color values that may be captured, coded, and displayed may be defined by a color gamut. A color gamut refers to the range of colors that a device can capture (e.g., a camera) or reproduce (e.g., a display). Color gamuts often differ from device to device. For video coding, a predefined color gamut for the video data may be used so that each device in the video coding process can be configured to process pixel values in the same color gamut. Some color gamuts are defined with a larger range of colors than the color gamuts traditionally used for video coding. Such color gamuts with a larger range of colors may be referred to as wide color gamuts (WCG).

Another aspect of video data is dynamic range. Dynamic range is typically defined as the ratio between the maximum and minimum brightness (e.g., luminance) of a video signal. Common video data used in the past is considered to have a standard dynamic range (SDR). Other example specifications for video data define color data that has a larger ratio between the maximum and minimum brightness. Such video data may be described as having a high dynamic range (HDR).
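As a small worked example of the ratio definition above, the sketch below computes a dynamic range from a maximum and a minimum luminance. Expressing the ratio in stops (base-2 logarithm) is a common convention and is an assumption here, not something this passage states; the function names and the sample luminance values are hypothetical.

```python
import math


def dynamic_range_ratio(max_luminance, min_luminance):
    """Ratio between the maximum and minimum luminance of a signal."""
    if min_luminance <= 0 or max_luminance < min_luminance:
        raise ValueError("expected 0 < min_luminance <= max_luminance")
    return max_luminance / min_luminance


def dynamic_range_stops(max_luminance, min_luminance):
    """The same ratio on a log2 scale; each stop is a doubling of luminance."""
    return math.log2(dynamic_range_ratio(max_luminance, min_luminance))


# Illustrative values only: a signal spanning 1 to 1024 luminance units.
ratio = dynamic_range_ratio(1024, 1)
stops = dynamic_range_stops(1024, 1)
```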

Summary of the Invention

This invention describes example processing methods (and devices configured to perform the methods) applied in the coding (e.g., encoding or decoding) loop of a video coding system. The techniques of this invention are applicable to coding video data representations in which the perceived just-noticeable difference (e.g., signal-to-noise ratio) is non-uniformly distributed over the dynamic range of the video data. A video encoder may be configured to apply a multi-stage quantization process, in which a residual is first quantized using an effective quantization parameter derived from statistics of the samples of the block. The residual is then further quantized using a base quantization parameter that is uniform across the picture. A video decoder may be configured to decode the video data using the base quantization parameter. The video decoder may further be configured to estimate the effective quantization parameter from statistics of the decoded samples of the block. The video decoder may then use the estimated effective quantization parameter to determine parameters for other coding tools, including filters. In this way, signaling overhead is saved, because the effective quantization parameter is not signaled but is instead estimated at the decoder side.

In one example, this invention describes a method of decoding video data, the method comprising: receiving an encoded block of the video data, the encoded block having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; determining the base quantization parameter used to encode the encoded block; decoding the encoded block using the base quantization parameter to create a decoded block of the video data; determining an estimate of the quantization parameter offset for the decoded block based on statistics associated with the decoded block; adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter; and performing one or more filtering operations on the decoded block as a function of the estimate of the effective quantization parameter.
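The decoder-side estimation steps listed above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the statistic (mean sample value), the lookup table that maps it to an offset, and the mapping from the estimated effective quantization parameter to a filter threshold are all hypothetical assumptions, since the text here says only that the estimate is based on statistics of the decoded block.

```python
# Hypothetical table for 10-bit samples: (exclusive upper bound on mean, QP offset).
OFFSET_TABLE = [(256, 3), (512, 0), (768, -3), (1024, -6)]


def mean_sample(block):
    """Block statistic assumed for this sketch: the mean of all samples."""
    samples = [s for row in block for s in row]
    return sum(samples) / len(samples)


def estimate_qp_offset(decoded_block):
    """Estimate the QP offset from the decoded samples alone (no signaled delta)."""
    m = mean_sample(decoded_block)
    for upper, offset in OFFSET_TABLE:
        if m < upper:
            return offset
    return OFFSET_TABLE[-1][1]


def estimate_effective_qp(base_qp, decoded_block):
    """Estimated effective QP = base QP + estimated offset."""
    return base_qp + estimate_qp_offset(decoded_block)


def filter_threshold(effective_qp):
    """Hypothetical filter parameter that grows with coarser quantization."""
    return max(0, effective_qp - 16) // 2


decoded = [[600, 620], [640, 660]]   # block already decoded with the base QP
qp_est = estimate_effective_qp(32, decoded)
threshold = filter_threshold(qp_est)
```

Because the estimate uses only decoded samples and the base quantization parameter, an encoder running the same rule on its own reconstruction would arrive at the same value without sending it.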

In another example, this invention describes a method of encoding video data, the method comprising: determining a base quantization parameter for a block of the video data; determining a quantization parameter offset for the block based on statistics associated with the block; adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and encoding the block using the effective quantization parameter and the base quantization parameter.
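The encoder-side steps above can be sketched in the same spirit. The offset table and the choice of the block mean as the statistic are hypothetical, and the step-size formula is the usual HEVC-style approximation (the step doubles every 6 QP units), which is an assumption brought in for illustration rather than something this passage states.

```python
# Hypothetical table for 10-bit samples: (exclusive upper bound on mean, QP offset).
OFFSET_TABLE = [(256, 3), (512, 0), (768, -3), (1024, -6)]


def qp_offset(block):
    """Derive a QP offset from a block statistic (here, the mean sample value)."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    for upper, offset in OFFSET_TABLE:
        if mean < upper:
            return offset
    return OFFSET_TABLE[-1][1]


def effective_qp(base_qp, block):
    """Effective QP = base QP + content-derived offset."""
    return base_qp + qp_offset(block)


def qstep(qp):
    """Approximate quantizer step size; doubles for every increase of 6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)


block = [[100, 200], [300, 400]]
base_qp = 30
eff_qp = effective_qp(base_qp, block)

# Two-stage view: quantizing with the effective QP is equivalent to scaling by
# 2**(offset / 6) and then quantizing with the picture-wide base QP.
stage_scale = 2.0 ** (qp_offset(block) / 6.0)
```

Under this approximation, `qstep(eff_qp)` equals `qstep(base_qp) * stage_scale`, which is one way to read the multi-stage quantization described in the summary.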

In another example, this invention describes an apparatus configured to decode video data, the apparatus comprising a memory configured to store an encoded block of the video data, and one or more processors in communication with the memory, the one or more processors configured to: receive the encoded block of the video data, the encoded block having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; determine the base quantization parameter used to encode the encoded block; decode the encoded block using the base quantization parameter to create a decoded block of the video data; determine an estimate of the quantization parameter offset for the decoded block based on statistics associated with the decoded block; add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter; and perform one or more filtering operations on the decoded block as a function of the estimate of the effective quantization parameter.

In another example, this invention describes an apparatus configured to encode video data, the apparatus comprising a memory configured to store a block of the video data, and one or more processors in communication with the memory, the one or more processors configured to: determine a base quantization parameter for the block; determine a quantization parameter offset for the block based on statistics associated with the block; add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and encode the block using the effective quantization parameter and the base quantization parameter.

In another example, this invention describes an apparatus configured to decode video data, the apparatus comprising: means for receiving an encoded block of the video data, the encoded block having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; means for determining the base quantization parameter used to encode the encoded block; means for decoding the encoded block using the base quantization parameter to create a decoded block of the video data; means for determining an estimate of the quantization parameter offset for the decoded block based on statistics associated with the decoded block; means for adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter; and means for performing one or more filtering operations on the decoded block as a function of the estimate of the effective quantization parameter.

In another example, this invention describes an apparatus configured to encode video data, the apparatus comprising: means for determining a base quantization parameter for a block of the video data; means for determining a quantization parameter offset for the block based on statistics associated with the block; means for adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and means for encoding the block using the effective quantization parameter and the base quantization parameter.

In another example, this invention describes a non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to: receive an encoded block of video data, the encoded block having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; determine the base quantization parameter used to encode the encoded block; decode the encoded block using the base quantization parameter to create a decoded block of the video data; determine an estimate of the quantization parameter offset for the decoded block based on statistics associated with the decoded block; add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter; and perform one or more filtering operations on the decoded block as a function of the estimate of the effective quantization parameter.

In another example, this invention describes a non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to: determine a base quantization parameter for a block of video data; determine a quantization parameter offset for the block based on statistics associated with the block; add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and encode the block using the effective quantization parameter and the base quantization parameter.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

Brief Description of the Drawings

Figure 1 is a block diagram illustrating an example video encoding and decoding system configured to implement the techniques of this invention.

Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).

Figure 3 is a conceptual diagram illustrating the concepts of HDR data.

Figure 4 is a conceptual diagram illustrating example color gamuts.

Figure 5 is a flowchart illustrating an example of HDR/WCG representation conversion.

Figure 6 is a flowchart illustrating an example of HDR/WCG inverse conversion.

Figure 7 is a conceptual diagram illustrating examples of electro-optical transfer functions (EOTF) used for video data conversion (including SDR and HDR) from perceptually uniform code levels to linear luminance.

Figure 8 is a block diagram illustrating an example of a video encoder that may implement the techniques of this invention.

Figure 9 is a block diagram illustrating an example quantization unit of a video encoder that may implement the techniques of this invention.

Figure 10 is a block diagram illustrating an example of a video decoder that may implement the techniques of this invention.

Figure 11 is a flowchart illustrating an example encoding method.

Figure 12 is a flowchart illustrating an example decoding method.

Detailed Description

This invention relates to the processing and/or coding of video data with high dynamic range (HDR) and wide color gamut (WCG) representations. More specifically, the techniques of this invention include content-adaptive spatially varying quantization, performed without explicit signaling of quantization parameters (e.g., a change in a quantization parameter represented by a deltaQP syntax element), to efficiently compress HDR/WCG video signals. The techniques and devices described herein may improve the compression efficiency of video coding systems used for coding HDR and WCG video data. The techniques of this invention may be used in the context of advanced video codecs, such as extensions of HEVC or the next generation of video coding standards.

Video coding standards, including hybrid video coding standards, include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions. The design of a new video coding standard, namely High Efficiency Video Coding (HEVC, also called H.265), has been finalized by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). An HEVC draft specification referred to as HEVC Working Draft 10 (WD10), Bross et al., "High efficiency video coding (HEVC) text specification draft 10 (for FDIS & Last Call)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting: Geneva, Switzerland, January 14-23, 2013, JCTVC-L1003v34, is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip . The finalized HEVC standard is referred to as HEVC version 1. The finalized HEVC standard document was published in April 2013 as "ITU-T H.265, Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding," Telecommunication Standardization Sector of International Telecommunication Union (ITU), and another version of the finalized HEVC standard was published in October 2014. A copy of the H.265/HEVC specification text may be downloaded from http://www.itu.int/rec/T-REC-H.265-201504-I/en .

ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high dynamic range coding). The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by experts in this area. The JVET first met during October 19-21, 2015. The latest version of the reference software, Joint Exploration Model 7 (JEM7), can be downloaded from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-7.0/ . The algorithm description for JEM7 is given in J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, J. Boyce, "Algorithm description of Joint Exploration Test Model 7 (JEM7)," JVET-C1001, Torino, July 2017.

Recently, a new video coding standard, referred to as the Versatile Video Coding (VVC) standard, has been under development by the Joint Video Experts Team (JVET) of VCEG and MPEG. An early draft of VVC is available in document JVET-J1001, "Versatile Video Coding (Draft 1)," and its algorithm description is available in document JVET-J1002, "Algorithm description for Versatile Video Coding and Test Model 1 (VTM 1)."

Figure 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques of this invention. As shown in Figure 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wired or wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

In other examples, computer-readable medium 16 may include non-transitory storage media, such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.

In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by an input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this invention are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of Figure 1, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, dynamic range adjustment (DRA) unit 19, video decoder 30, and display device 32. In accordance with this invention, DRA unit 19 of source device 12 may be configured to implement the techniques of this invention, including signaling and related operations applied to video data in certain color spaces to enable more efficient compression of HDR and WCG video data. In some examples, DRA unit 19 may be separate from video encoder 20. In other examples, DRA unit 19 may be part of video encoder 20. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The illustrated system 10 of Figure 1 is merely one example. Techniques for processing and coding HDR and WCG video data may be performed by any digital video encoding and/or video decoding device. Moreover, some example techniques of the present invention may also be performed by a video preprocessor and/or a video postprocessor. A video preprocessor may be any device configured to process video data before encoding (e.g., before HEVC, VVC, or other encoding). A video postprocessor may be any device configured to process video data after decoding (e.g., after HEVC, VVC, or other decoding). Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components, as well as a video preprocessor and a video postprocessor (e.g., DRA unit 19 and inverse DRA unit 31, respectively). Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in the present invention are applicable to video coding and video processing in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements describing characteristics and/or processing of blocks and other coded units, e.g., groups of pictures (GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of the present invention. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.

DRA unit 19 and inverse DRA unit 31 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, DSPs, ASICs, FPGAs, discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of the present invention.

In some examples, video encoder 20 and video decoder 30 operate according to a video compression standard, such as ITU-T H.265/HEVC, VVC, or another next-generation video coding standard.

In HEVC and other video coding standards, a video sequence typically includes a series of pictures. Pictures may also be referred to as "frames." A picture may include three sample arrays, denoted S_L, S_Cb, and S_Cr. S_L is a two-dimensional array (i.e., a block) of luma samples. S_Cb is a two-dimensional array of Cb chrominance samples. S_Cr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as "chroma" samples. In other instances, a picture may be monochrome and may include only an array of luma samples.

Video encoder 20 may generate a set of coding tree units (CTUs). Each of the CTUs may comprise a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In a monochrome picture or a picture having three separate color planes, a CTU may comprise a single coding tree block and syntax structures used to code the samples of the coding tree block. A coding tree block may be an N×N block of samples. A CTU may also be referred to as a "tree block" or a "largest coding unit" (LCU). The CTUs of HEVC may be broadly analogous to the macroblocks of other video coding standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size and may include one or more coding units (CUs). A slice may include an integer number of CTUs ordered consecutively in raster scan order.

The present invention may use the term "video unit" or "video block" to refer to one or more blocks of samples and the syntax structures used to code the samples of the one or more blocks of samples. Example types of video units may include CTUs, CUs, PUs, and transform units (TUs) in HEVC, or macroblocks, macroblock partitions, and so on in other video coding standards.

To generate a coded CTU, video encoder 20 may recursively perform quadtree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name "coding tree units." A coding block is an N×N block of samples. A CU may comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to code the samples of the coding blocks. In a monochrome picture or a picture having three separate color planes, a CU may comprise a single coding block and syntax structures used to code the samples of the coding block.

Video encoder 20 may partition a coding block of a CU into one or more prediction blocks. A prediction block may be a rectangular (i.e., square or non-square) block of samples to which the same prediction is applied. A prediction unit (PU) of a CU may comprise a prediction block of luma samples of a picture, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction block samples. In a monochrome picture or a picture having three separate color planes, a PU may comprise a single prediction block and syntax structures used to predict the prediction block samples. Video encoder 20 may generate predictive luma, Cb, and Cr blocks for the luma, Cb, and Cr prediction blocks of each PU of the CU.

In JEM7, a quadtree binary tree (QTBT) partitioning structure may be used rather than the quadtree partitioning structure of HEVC described above. The QTBT structure removes the concept of multiple partition types. That is, the QTBT structure removes the separation of the CU, PU, and TU concepts, and supports more flexibility for CU partition shapes. In the QTBT block structure, a CU can have either a square or a rectangular shape. In one example, a CU is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure.

In some examples, there are two splitting types: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called CUs, and that segmentation (i.e., the CU) is used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In JEM, a CU sometimes consists of coding blocks (CBs) of different color components. For example, one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format. A CU sometimes consists of CBs of a single component. For example, one CU contains only one luma CB or only two chroma CBs in the case of I slices.

In some examples, video encoder 20 and video decoder 30 may be configured to operate according to JEM/VVC. According to JEM/VVC, a video coder (such as video encoder 20) partitions a picture into a plurality of CUs. An example QTBT structure of JEM includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).

In some examples, video encoder 20 and video decoder 30 may use a single QTBT structure to represent each of the luminance and chrominance components, while in other examples, video encoder 20 and video decoder 30 may use two or more QTBT structures, such as one QTBT structure for the luminance component and another QTBT structure for both chrominance components (or two QTBT structures for the respective chrominance components).

Video encoder 20 and video decoder 30 may be configured to use quadtree partitioning per HEVC, QTBT partitioning per JEM/VVC, or other partitioning structures. For purposes of explanation, the description of the techniques of the present invention is presented with respect to QTBT partitioning. However, it should be understood that the techniques of the present invention may also be applied to video coders configured to use quadtree partitioning or other types of partitioning.

Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130 and a corresponding coding tree unit (CTU) 132. The solid lines represent quadtree splitting, and the dashed lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting in this example. For the quadtree splitting, there is no need to indicate the splitting type, since quadtree nodes split a block horizontally and vertically into 4 sub-blocks of equal size. Accordingly, video encoder 20 may encode, and video decoder 30 may decode, syntax elements (such as splitting information) for the region tree level of QTBT structure 130 (i.e., the solid lines) and syntax elements (such as splitting information) for the prediction tree level of QTBT structure 130 (i.e., the dashed lines). Video encoder 20 may encode, and video decoder 30 may decode, video data, such as prediction and transform data, for CUs represented by terminal leaf nodes of QTBT structure 130.

In general, CTU 132 of Figure 2B may be associated with parameters defining the sizes of blocks corresponding to nodes of QTBT structure 130 at the first and second levels. These parameters may include a CTU size (representing the size of CTU 132 in samples), a minimum quadtree size (MinQTSize, representing the minimum allowed quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing the maximum allowed binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing the maximum allowed binary tree depth), and a minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size).

The root node of a QTBT structure corresponding to a CTU may have four child nodes at the first level of the QTBT structure, each of which may be partitioned according to quadtree partitioning. That is, nodes of the first level are either leaf nodes (having no child nodes) or have four child nodes. The example of QTBT structure 130 represents such nodes as including the parent node and child nodes with solid lines for branches. If nodes of the first level are not larger than the maximum allowed binary tree root node size (MaxBTSize), they can be further partitioned by respective binary trees. The binary tree splitting of one node can be iterated until the nodes resulting from the split reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes as having dashed lines for branches. The binary tree leaf node is referred to as a coding unit (CU), which is used for prediction (e.g., intra-picture or inter-picture prediction) and transform, without any further partitioning. As discussed above, CUs may also be referred to as "video blocks" or "blocks."

In one example of the QTBT partitioning structure, the CTU size is set as 128×128 (luma samples and two corresponding 64×64 blocks of chroma samples), the MinQTSize is set as 16×16, the MaxBTSize is set as 64×64, the MinBTSize (for both width and height) is set as 4, and the MaxBTDepth is set as 4. The quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16×16 (i.e., the MinQTSize) to 128×128 (i.e., the CTU size). If the quadtree leaf node is 128×128, the node will not be further split by the binary tree, since the size exceeds the MaxBTSize (i.e., 64×64 in this example). Otherwise, the quadtree leaf node will be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4 in this example), no further splitting is permitted. A binary tree node having a width equal to MinBTSize (4 in this example) implies that no further horizontal splitting is permitted. Similarly, a binary tree node having a height equal to MinBTSize implies that no further vertical splitting is permitted for that binary tree node. As noted above, leaf nodes of the binary tree are referred to as CUs, and are further processed according to prediction and transform without further partitioning.
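The splitting constraints of the example above can be sketched as follows. This is a minimal illustration only, not an implementation of any codec; the function names are hypothetical, and the terms "horizontal" and "vertical" follow the wording of the paragraph above (a width equal to MinBTSize rules out horizontal splitting, a height equal to MinBTSize rules out vertical splitting).

```python
# Parameter values taken from the example in the text.
CTU_SIZE = 128
MIN_QT_SIZE = 16
MAX_BT_SIZE = 64
MIN_BT_SIZE = 4
MAX_BT_DEPTH = 4


def possible_quadtree_leaf_sizes():
    """Square sizes a quadtree leaf may take, from the CTU size down to MinQTSize."""
    sizes = []
    size = CTU_SIZE
    while size >= MIN_QT_SIZE:
        sizes.append(size)
        size //= 2
    return sizes


def can_start_binary_tree(leaf_size):
    """A quadtree leaf may become a binary tree root only if it does not exceed MaxBTSize."""
    return leaf_size <= MAX_BT_SIZE


def allowed_binary_splits(width, height, bt_depth):
    """Return the binary split types still permitted for a binary tree node.

    Naming follows the text: width == MinBTSize forbids "horizontal" splitting,
    height == MinBTSize forbids "vertical" splitting.
    """
    if bt_depth >= MAX_BT_DEPTH:
        return []  # maximum binary tree depth reached
    splits = []
    if width > MIN_BT_SIZE:
        splits.append("horizontal")
    if height > MIN_BT_SIZE:
        splits.append("vertical")
    return splits
```

For instance, under these assumptions a 128×128 quadtree leaf cannot enter the binary tree stage, while a 64×64 leaf at binary tree depth 0 may still be split either way.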

Video encoder 20 may use intra prediction or inter prediction to generate the predictive blocks for a PU. If video encoder 20 uses intra prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of the picture associated with the PU.

If video encoder 20 uses inter prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of one or more pictures other than the picture associated with the PU. Inter prediction may be uni-directional inter prediction (i.e., uni-prediction) or bi-directional inter prediction (i.e., bi-prediction). To perform uni-prediction or bi-prediction, video encoder 20 may generate a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) for a current slice.

Each of the reference picture lists may include one or more reference pictures. When using uni-prediction, video encoder 20 may search the reference pictures in either or both of RefPicList0 and RefPicList1 to determine a reference location within a reference picture. Furthermore, when using uni-prediction, video encoder 20 may generate, based at least in part on samples corresponding to the reference location, the predictive sample blocks for the PU. Moreover, when using uni-prediction, video encoder 20 may generate a single motion vector that indicates a spatial displacement between a prediction block of the PU and the reference location. To indicate the spatial displacement between a prediction block of the PU and the reference location, a motion vector may include a horizontal component specifying a horizontal displacement between the prediction block of the PU and the reference location, and may include a vertical component specifying a vertical displacement between the prediction block of the PU and the reference location.
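As a simplified illustration of how a motion vector with horizontal and vertical components locates the reference samples, the sketch below copies a block from a reference picture at the displaced position. It assumes integer-sample motion vectors and a reference location that lies entirely inside the picture; sub-sample interpolation and picture-boundary padding, which practical codecs use, are omitted. The function name is hypothetical.

```python
def predict_block(reference, x, y, width, height, mv):
    """Copy a width x height predictive block from `reference`.

    (x, y) is the top-left position of the prediction block in the current
    picture, and mv = (dx, dy) holds the horizontal and vertical displacement
    to the reference location. `reference` is a list of rows of samples.
    """
    dx, dy = mv
    return [
        [reference[y + dy + r][x + dx + c] for c in range(width)]
        for r in range(height)
    ]
```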

When using bi-prediction to encode a PU, video encoder 20 may determine a first reference location in a reference picture in RefPicList0 and a second reference location in a reference picture in RefPicList1. Video encoder 20 may then generate, based at least in part on samples corresponding to the first and second reference locations, the predictive blocks for the PU. Moreover, when using bi-prediction to encode the PU, video encoder 20 may generate a first motion vector indicating a spatial displacement between a sample block of the PU and the first reference location, and a second motion vector indicating a spatial displacement between the prediction block of the PU and the second reference location.

In some examples, JEM/VVC also provides an affine motion compensation mode, which may be considered an inter prediction mode. In affine motion compensation mode, video encoder 20 may determine two or more motion vectors that represent non-translational motion, such as zooming in or out, rotation, perspective motion, or other irregular motion types.

After video encoder 20 generates predictive luma, Cb, and Cr blocks for one or more PUs of a CU, video encoder 20 may generate a luma residual block for the CU. Each sample in the CU's luma residual block indicates a difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. In addition, video encoder 20 may generate a Cb residual block for the CU. Each sample in the CU's Cb residual block may indicate a difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. Video encoder 20 may also generate a Cr residual block for the CU. Each sample in the CU's Cr residual block may indicate a difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.
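The residual computation described above amounts to a sample-wise difference, and can be sketched as follows. The function name is hypothetical, and the sign convention (original minus prediction) is an assumption chosen so that adding the residual back to the prediction recovers the original; the same function would apply to luma, Cb, and Cr blocks alike.

```python
def residual_block(original, predicted):
    """Sample-wise difference between an original coding block and a predictive block.

    Both inputs are lists of rows with identical dimensions.
    """
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, predicted)
    ]
```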

Furthermore, video encoder 20 may use quadtree partitioning to decompose the luma, Cb, and Cr residual blocks of a CU into one or more luma, Cb, and Cr transform blocks. A transform block may be a rectangular block of samples on which the same transform is applied. A transform unit (TU) of a CU may comprise a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. In a monochrome picture or a picture having three separate color planes, a TU may comprise a single transform block and syntax structures used to transform the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. The luma transform block associated with the TU may be a sub-block of the CU's luma residual block. The Cb transform block may be a sub-block of the CU's Cb residual block. The Cr transform block may be a sub-block of the CU's Cr residual block.

Video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block may be a two-dimensional array of transform coefficients. A transform coefficient may be a scalar quantity. Video encoder 20 may apply one or more transforms to a Cb transform block of a TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU.

After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. Furthermore, video encoder 20 may inverse quantize transform coefficients and apply an inverse transform to the transform coefficients in order to reconstruct transform blocks of TUs of CUs of a picture. Video encoder 20 may use the reconstructed transform blocks of TUs of a CU and the predictive blocks of PUs of the CU to reconstruct coding blocks of the CU. By reconstructing the coding blocks of each CU of a picture, video encoder 20 may reconstruct the picture. Video encoder 20 may store reconstructed pictures in a decoded picture buffer (DPB). Video encoder 20 may use reconstructed pictures in the DPB for inter prediction and intra prediction.
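To make the quantization and inverse quantization steps concrete, the following is a simplified floating-point sketch. It assumes the commonly cited H.264/HEVC-style relationship in which the quantization step size doubles for every increase of 6 in the quantization parameter (QP); real codecs use integer scaling tables and shifts instead, and nothing here is taken from the text beyond the idea of quantizing a coefficient block. All function names are hypothetical.

```python
def q_step(qp):
    """Approximate step size: doubles every 6 QP, equal to 1 at QP 4 (an assumption)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels with round-half-away-from-zero."""
    step = q_step(qp)

    def level(c):
        sign = -1 if c < 0 else 1
        return sign * int(abs(c) / step + 0.5)

    return [[level(c) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Inverse quantization: scale integer levels back to coefficient magnitudes."""
    step = q_step(qp)
    return [[lvl * step for lvl in row] for row in levels]
```

Dequantized values generally differ from the original coefficients, which is where the lossy compression comes from; a larger QP gives a coarser step and fewer distinct levels.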

After video encoder 20 quantizes a coefficient block, video encoder 20 may entropy encode syntax elements that indicate the quantized transform coefficients. For example, video encoder 20 may perform context-adaptive binary arithmetic coding (CABAC) on the syntax elements indicating the quantized transform coefficients. Video encoder 20 may output the entropy-encoded syntax elements in a bitstream.

Video encoder 20 may output a bitstream that includes a sequence of bits that forms a representation of coded pictures and associated data. The bitstream may comprise a sequence of network abstraction layer (NAL) units. Each of the NAL units includes a NAL unit header and encapsulates a raw byte sequence payload (RBSP). The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP may be a syntax structure containing an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.
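As one concrete illustration of a NAL unit header carrying a NAL unit type code, the sketch below parses the two-byte header layout used by ITU-T H.265/HEVC (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1). Treating the bitstream as HEVC is an assumption made for this example; other standards use different header layouts. The function name is hypothetical.

```python
def parse_hevc_nal_header(data):
    """Parse the 2-byte HEVC NAL unit header at the start of `data` (bytes)."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        # The layer id straddles the byte boundary: 1 bit from b0, 5 bits from b1.
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "nuh_temporal_id_plus1": b1 & 0x07,
    }
```

For example, the header bytes 0x44 0x01 yield nal_unit_type 34, which HEVC assigns to a picture parameter set.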

Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a picture parameter set (PPS), a second type of NAL unit may encapsulate an RBSP for a coded slice, a third type of NAL unit may encapsulate an RBSP for supplemental enhancement information (SEI), and so on. A PPS is a syntax structure that may contain syntax elements that apply to zero or more entire coded pictures. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as video coding layer (VCL) NAL units. A NAL unit that encapsulates a coded slice may be referred to herein as a coded slice NAL unit. An RBSP for a coded slice may include a slice header and slice data.

Video decoder 30 may receive a bitstream. In addition, video decoder 30 may parse the bitstream to decode syntax elements from the bitstream. Video decoder 30 may reconstruct the pictures of the video data based at least in part on the syntax elements decoded from the bitstream. The process to reconstruct the video data may be generally reciprocal to the process performed by video encoder 20. For instance, video decoder 30 may use motion vectors of PUs to determine predictive blocks for the PUs of a current CU. Video decoder 30 may use one or more motion vectors of a PU to generate the predictive blocks for the PU.

In addition, video decoder 30 may inverse quantize coefficient blocks associated with TUs of the current CU. Video decoder 30 may perform inverse transforms on the coefficient blocks to reconstruct transform blocks associated with the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding the samples of the predictive sample blocks for PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of a picture, video decoder 30 may reconstruct the picture. Video decoder 30 may store decoded pictures in a decoded picture buffer for output and/or for use in decoding other pictures.
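The reconstruction step described above, in which predictive samples are added to the corresponding samples of the reconstructed transform (residual) block, can be sketched as below. Clipping the sum to the valid sample range for a given bit depth is an assumption added for this example and is not stated in the text; the function name and the default bit depth of 8 are likewise hypothetical.

```python
def reconstruct_block(predicted, residual, bit_depth=8):
    """Add residual samples to predictive samples and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]
```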

Next-generation video applications are expected to operate with video data representing captured scenery with HDR and/or WCG. Parameters of the utilized dynamic range and color gamut are two independent attributes of video content, and their specification for purposes of digital television and multimedia services is defined by several international standards. For example, ITU-R Rec. BT.709, "Parameter values for the HDTV standards for production and international programme exchange," defines parameters for HDTV (high definition television), such as standard dynamic range (SDR) and standard color gamut, and ITU-R Rec. BT.2020, "Parameter values for ultra-high definition television systems for production and international programme exchange," specifies UHDTV (ultra-high definition television) parameters such as HDR and WCG. There are also other standards developing organization (SDO) documents that specify dynamic range and color gamut attributes in other systems; for example, the DCI-P3 color gamut is defined in SMPTE-231-2 (Society of Motion Picture and Television Engineers) and some parameters of HDR are defined in SMPTE-2084. A brief description of dynamic range and color gamut for video data is provided below.

Dynamic range is typically defined as the ratio between the maximum and minimum brightness (e.g., luminance) of the video signal. Dynamic range may also be measured in terms of "f-stops," where one f-stop corresponds to a doubling of the signal's dynamic range. In MPEG's definition, HDR content is content that features brightness variation of more than 16 f-stops. In some terminology, levels between 10 and 16 f-stops are considered intermediate dynamic range, but in other definitions they are considered HDR. In some examples of this disclosure, HDR video content may be any video content that has a higher dynamic range than traditionally used video content with a standard dynamic range (e.g., video content as specified by ITU-R Rec. BT.709).

The human visual system (HVS) is capable of perceiving a much larger dynamic range than SDR content and HDR content. However, the HVS includes an adaptation mechanism that narrows the dynamic range of the HVS to a so-called simultaneous range. The width of the simultaneous range may depend on current lighting conditions (e.g., current brightness). A visualization of the dynamic range provided by SDR of HDTV, the expected HDR of UHDTV, and the HVS dynamic range is shown in Figure 3, although the exact ranges may vary from person to person and from display to display.

Some example video applications and services are regulated by ITU Rec.709 and provide SDR, typically supporting a range of brightness (e.g., luminance) of around 0.1 to 100 candelas (cd) per m2 (often referred to as "nits"), leading to less than 10 f-stops. Some example next-generation video services are expected to provide a dynamic range of up to 16 f-stops. Although detailed specifications for such content are currently under development, some initial parameters have been specified in SMPTE-2084 and ITU-R Rec.2020.
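As a brief worked illustration, the f-stop figures above follow directly from the ratio definition of dynamic range: the number of f-stops is the base-2 logarithm of the ratio of maximum to minimum luminance. The following Python sketch (the function name `f_stops` is hypothetical) applies this to the 0.1 to 100 cd/m2 SDR range mentioned above.

```python
import math


def f_stops(max_luminance, min_luminance):
    """Dynamic range in f-stops: each stop is a doubling of the luminance ratio."""
    return math.log2(max_luminance / min_luminance)


# SDR range of roughly 0.1 to 100 cd/m2: a ratio of 1000:1,
# which is slightly less than 10 f-stops.
sdr_stops = f_stops(100.0, 0.1)
```

Under the same definition, a 16 f-stop signal corresponds to a luminance ratio of 2^16, i.e., 65536:1.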

Aside from HDR, another aspect of a more realistic video experience is the color dimension. The color dimension is typically defined by the color gamut. Figure 4 is a conceptual diagram showing an SDR color gamut (triangle 100, based on the BT.709 color primaries) and the wider color gamut for UHDTV (triangle 102, based on the BT.2020 color primaries). Figure 4 also depicts the so-called spectrum locus (delimited by the tongue-shaped area 104), representing the limits of natural colors. As illustrated by Figure 4, moving from the BT.709 (triangle 100) to the BT.2020 (triangle 102) color primaries aims to provide UHDTV services with about 70% more colors. D65 specifies an example white color for the BT.709 and/or BT.2020 specifications.

Examples of color gamut specifications for the DCI-P3, BT.709, and BT.2020 color spaces are shown in Table 1.

Table 1 - Color gamut parameters

As can be seen in Table 1, a color gamut may be defined by the X and Y values of a white point and by the X and Y values of the primary colors (e.g., red (R), green (G), and blue (B)). The X and Y values represent the chromaticity (X) and the brightness (Y) of the colors, as defined by the CIE 1931 color space. The CIE 1931 color space defines the links between pure colors (e.g., in terms of wavelengths) and how the human eye perceives such colors.

HDR/WCG video data is typically acquired and stored at a very high precision per component (even floating point), with the 4:4:4 chroma sub-sampling format and a very wide color space (e.g., CIE XYZ). This representation targets high precision and is almost mathematically lossless. However, such a format for storing HDR/WCG video data may include many redundancies and may not be optimal for compression purposes. A lower-precision format with HVS-based assumptions is typically utilized for state-of-the-art video applications.

One example of a video data format conversion process for purposes of compression includes three major processes, as shown in Figure 5. The techniques of Figure 5 may be performed by source device 12. Linear RGB data 110 may be HDR/WCG video data and may be stored in a floating-point representation. Linear RGB data 110 may be compacted using a non-linear transfer function (TF) 112 for dynamic range compacting. Transfer function 112 may compact linear RGB data 110 using any number of non-linear transfer functions (e.g., the PQ TF as defined in SMPTE-2084). In some examples, color conversion process 114 converts the compacted data into a more compact or robust color space (e.g., a YUV or YCrCb color space) that is more suitable for compression by a hybrid video encoder. This data is then quantized using a floating-point-to-integer-representation quantization unit 116 to produce converted HDR' data 118. In this example, HDR' data 118 is in an integer representation. The HDR' data is now in a format more suitable for compression by a hybrid video encoder (e.g., video encoder 20 applying HEVC techniques). The order of the processes depicted in Figure 5 is given as an example and may vary in other applications. For example, color conversion may precede the TF process. In some examples, additional processing, e.g., spatial subsampling, may be applied to the color components.

The inverse conversion at the decoder side is depicted in Figure 6. The techniques of Figure 6 may be performed by destination device 14. Converted HDR' data 120 may be obtained at destination device 14 by decoding video data using a hybrid video decoder (e.g., video decoder 30 applying HEVC techniques). HDR' data 120 may then be inverse quantized by inverse quantization unit 122. An inverse color conversion process 124 may then be applied to the inverse quantized HDR' data. Inverse color conversion process 124 may be the inverse of color conversion process 114. For example, inverse color conversion process 124 may convert the HDR' data from a YCrCb format back to an RGB format. Next, inverse transfer function 126 may be applied to the data to add back the dynamic range that was compacted by transfer function 112, thereby recreating linear RGB data 128.

The techniques depicted in Figure 5 will now be discussed in more detail. Mapping the digital values appearing in an image container to and from optical energy may involve the use of a "transfer function." In general, a transfer function is applied to data (e.g., HDR/WCG video data) to compact the dynamic range of the data. Such compaction allows the data to be represented with fewer bits. In one example, the transfer function may be a one-dimensional (1D) non-linear function and may reflect the inverse of the electro-optical transfer function (EOTF) of the end-user display, e.g., as specified for SDR in ITU-R BT.1886 (also defined in Rec.709). In another example, the transfer function may approximate the HVS perception of brightness changes, e.g., the PQ transfer function specified in SMPTE-2084 for HDR. The inverse process of the OETF is the EOTF (electro-optical transfer function), which maps the code levels back to luminance. Figure 7 shows several examples of non-linear transfer functions used to compact the dynamic range of certain color containers. The transfer functions may also be applied to each R, G, and B component separately.

The reference EOTF specified in Recommendation ITU-R BT.1886 is defined by the following equation:

$$L = a\,(\max[(V + b),\, 0])^{\gamma}$$

where:

$L$: screen luminance in cd/m2

$L_W$: screen luminance for white

$L_B$: screen luminance for black

$V$: input video signal level (normalized, black at $V = 0$, white at $V = 1$). For content mastered per Recommendation ITU-R BT.709, 10-bit digital code values "D" map into values of $V$ per the following equation: $V = (D - 64)/876$

$\gamma$: exponent of the power function, $\gamma = 2.404$

$a$: variable for user gain (legacy "contrast" control)

$$a = (L_W^{1/\gamma} - L_B^{1/\gamma})^{\gamma}$$

$b$: variable for user black level lift (legacy "brightness" control)

The variables $a$ and $b$ above may be derived by solving the following equations, such that $V = 1$ gives $L = L_W$ and $V = 0$ gives $L = L_B$:

$$L_B = a \cdot b^{\gamma}$$

$$L_W = a \cdot (1 + b)^{\gamma}$$
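As an illustrative sketch only, the reference EOTF above may be evaluated in Python as follows. The closed form used for b is obtained here by solving the two constraint equations above together with the stated expression for a; the default white and black luminance values (100 and 0.1 cd/m2) and all function names are assumptions chosen for illustration.

```python
GAMMA = 2.404  # exponent of the power function, as stated in the text above


def bt1886_params(l_w, l_b, gamma=GAMMA):
    """Solve L_B = a*b^gamma and L_W = a*(1+b)^gamma for a and b."""
    root_w = l_w ** (1.0 / gamma)
    root_b = l_b ** (1.0 / gamma)
    a = (root_w - root_b) ** gamma
    b = root_b / (root_w - root_b)
    return a, b


def bt1886_eotf(v, l_w=100.0, l_b=0.1, gamma=GAMMA):
    """Screen luminance L (cd/m2) for a normalized signal level V."""
    a, b = bt1886_params(l_w, l_b, gamma)
    return a * max(v + b, 0.0) ** gamma


def code_to_v(d):
    """Map a 10-bit digital code value D to the normalized level V."""
    return (d - 64) / 876.0


# Luminance produced by 10-bit code value 512 on the assumed display.
mid_luminance = bt1886_eotf(code_to_v(512))
```

With these definitions, V = 1 reproduces the white luminance and V = 0 reproduces the black luminance, which is the condition used above to derive a and b.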

To support higher dynamic range data more efficiently, SMPTE has recently standardized a new transfer function called SMPTE ST-2084. The ST-2084 specification defines the EOTF application as follows. The TF is applied to normalized linear R, G, B values, which results in a non-linear representation R'G'B'. ST-2084 defines normalization by NORM = 10000, which is associated with a peak brightness of 10000 nits (cd/m2).

R' = PQ_TF(max(0, min(R/NORM, 1)))

G' = PQ_TF(max(0, min(G/NORM, 1)))

B' = PQ_TF(max(0, min(B/NORM, 1)))

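where

$$\mathrm{PQ\_TF}(L) = \left(\frac{c_1 + c_2 L^{m_1}}{1 + c_3 L^{m_1}}\right)^{m_2}$$

with $m_1 = \frac{2610}{4096} \times \frac{1}{4} = 0.1593017578125$, $m_2 = \frac{2523}{4096} \times 128 = 78.84375$, $c_1 = c_3 - c_2 + 1 = \frac{3424}{4096} = 0.8359375$, $c_2 = \frac{2413}{4096} \times 32 = 18.8515625$, and $c_3 = \frac{2392}{4096} \times 32 = 18.6875$.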

Typically, an EOTF is defined as a function with floating-point accuracy; thus, no error is introduced to a signal with this non-linearity if the inverse TF (the so-called OETF) is applied. The inverse TF (OETF) specified in ST-2084 is defined as the inversePQ function:

R = 10000 * inversePQ_TF(R')

G = 10000 * inversePQ_TF(G')

B = 10000 * inversePQ_TF(B')

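where

$$\mathrm{inversePQ\_TF}(N) = \left(\frac{\max[(N^{1/m_2} - c_1),\, 0]}{c_2 - c_3 N^{1/m_2}}\right)^{1/m_1}$$

with $m_1 = 0.1593017578125$, $m_2 = 78.84375$, $c_1 = 0.8359375$, $c_2 = 18.8515625$, and $c_3 = 18.6875$.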
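As an illustrative sketch only, the PQ transfer function and its inverse may be written in Python as follows. The constants are those published in SMPTE ST 2084; the function names (`pq_tf`, `inverse_pq_tf`, `encode_component`, `decode_component`) are hypothetical and chosen for readability.

```python
# Constants as published in SMPTE ST 2084.
M1 = 2610.0 / 4096.0 / 4.0      # 0.1593017578125
M2 = 2523.0 / 4096.0 * 128.0    # 78.84375
C1 = 3424.0 / 4096.0            # 0.8359375
C2 = 2413.0 / 4096.0 * 32.0     # 18.8515625
C3 = 2392.0 / 4096.0 * 32.0     # 18.6875
NORM = 10000.0                  # peak luminance in cd/m2


def pq_tf(l):
    """Map normalized linear light in [0, 1] to a non-linear value in [0, 1]."""
    lp = l ** M1
    return ((C1 + C2 * lp) / (1.0 + C3 * lp)) ** M2


def inverse_pq_tf(n):
    """Map a non-linear value in [0, 1] back to normalized linear light."""
    n_root = n ** (1.0 / M2)
    return (max(n_root - C1, 0.0) / (C2 - C3 * n_root)) ** (1.0 / M1)


def encode_component(r):
    """R' = PQ_TF(max(0, min(R/NORM, 1))) for a linear value R in cd/m2."""
    return pq_tf(max(0.0, min(r / NORM, 1.0)))


def decode_component(r_prime):
    """R = 10000 * inversePQ_TF(R')."""
    return NORM * inverse_pq_tf(r_prime)


# 100 cd/m2 maps to roughly the middle of the non-linear code range.
r_prime = encode_component(100.0)
r_back = decode_component(r_prime)
```

Evaluated at floating-point accuracy, the forward and inverse functions round-trip a linear value with negligible error, which is consistent with the statement above that the non-linearity itself introduces no error before quantization.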

It should be noted that EOTFs and OETFs are subjects of very active research and standardization, and the TF utilized in some video coding systems may be different from ST-2084.

In the context of this disclosure, the terms "signal value" or "color value" may be used to describe a luminance level corresponding to the value of a specific color component (such as R, G, B, or Y) for an image element. The signal value is typically representative of a linear light level (luminance value). The terms "code level" or "digital code value" may refer to a digital representation of an image signal value. Typically, such a digital representation is representative of a non-linear signal value. An EOTF represents the relationship between the non-linear signal values provided to a display device (e.g., display device 32) and the linear color values produced by the display device.

RGB data is typically utilized as the input color space, since RGB is the type of data that is typically produced by image-capturing sensors. However, the RGB color space has high redundancy among its components and is not optimal for compact representation. To achieve a more compact and more robust representation, RGB components are typically converted (e.g., a color transform is performed) to a more uncorrelated color space that is more suitable for compression, e.g., YCbCr. A YCbCr color space separates the brightness, in the form of luminance (Y), and the color information (CrCb) into different, less correlated components. In this context, a robust representation may refer to a color space featuring higher levels of error resilience when compressed at a constrained bit rate.

For modern video coding systems, a typically used color space is YCbCr, as specified in ITU-R BT.709. The YCbCr color space in the BT.709 standard specifies the following conversion process from R'G'B' to Y'CbCr (non-constant luminance representation):

a. Y' = 0.2126 * R' + 0.7152 * G' + 0.0722 * B'

b. Cb = (B' - Y') / 1.8556

c. Cr = (R' - Y') / 1.5748

The above can also be implemented using the following approximate conversion, which avoids the division for the Cb and Cr components:

a. Y' = 0.212600 * R' + 0.715200 * G' + 0.072200 * B'

b. Cb = -0.114572 * R' - 0.385428 * G' + 0.500000 * B'

c. Cr = 0.500000 * R' - 0.454153 * G' - 0.045847 * B'

The ITU-R BT.2020 standard specifies the following conversion process from R'G'B' to Y'CbCr (non-constant luminance representation):

a. Y' = 0.2627 * R' + 0.6780 * G' + 0.0593 * B'

b. Cb = (B' - Y') / 1.8814

c. Cr = (R' - Y') / 1.4746

The above can also be implemented using the following approximate conversion, which avoids the division for the Cb and Cr components:

a. Y' = 0.262700 * R' + 0.678000 * G' + 0.059300 * B'

b. Cb = -0.139630 * R' - 0.360370 * G' + 0.500000 * B'

c. Cr = 0.500000 * R' - 0.459786 * G' - 0.040214 * B'
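As an illustrative sketch only, the R'G'B' to Y'CbCr conversions above may be expressed in Python using the luma coefficients of each standard. The sketch relies on the observation that the divisors in the difference form equal 2 * (1 - Kb) for Cb and 2 * (1 - Kr) for Cr, where Kr and Kb are the R' and B' luma weights; the names `LUMA_WEIGHTS` and `rgb_to_ycbcr` are hypothetical.

```python
# (Kr, Kb) luma weights for R' and B'; the G' weight is 1 - Kr - Kb.
LUMA_WEIGHTS = {
    "BT.709": (0.2126, 0.0722),
    "BT.2020": (0.2627, 0.0593),
}


def rgb_to_ycbcr(r, g, b, standard="BT.709"):
    """Convert non-linear R'G'B' in [0, 1] to Y'CbCr (non-constant luminance)."""
    kr, kb = LUMA_WEIGHTS[standard]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # divisor 1.8556 (BT.709) or 1.8814 (BT.2020)
    cr = (r - y) / (2.0 * (1.0 - kr))   # divisor 1.5748 (BT.709) or 1.4746 (BT.2020)
    return y, cb, cr


# Pure red in BT.2020: Cr reaches its maximum of 0.5, and Cb matches the
# -0.139630 coefficient of the approximate conversion above.
y_red, cb_red, cr_red = rgb_to_ycbcr(1.0, 0.0, 0.0, "BT.2020")
```

Feeding unit R', G', or B' inputs into the difference form reproduces the coefficients of the division-free conversions to six decimal places, which shows that the two forms are equivalent up to rounding.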

Following the color transform, input data in the target color space may still be represented at a high bit depth (e.g., floating-point accuracy). The high bit depth data may be converted to a target bit depth, for example, using a quantization process. Certain studies show that 10- to 12-bit accuracy in combination with the PQ transfer is sufficient to provide HDR data of 16 f-stops with distortion below the just-noticeable difference (JND). In general, a JND is the amount by which something (e.g., video data) must be changed in order for a difference to be noticeable (e.g., by the HVS). Data represented with 10-bit accuracy can be further coded with most state-of-the-art video coding solutions. This quantization is an element of lossy coding and is a source of inaccuracy introduced to the converted data.

An example of such quantization applied to code words in the target color space (in this example, YCbCr) is shown below. Input values YCbCr represented in floating-point accuracy are converted into a signal of fixed bit depth BitDepth_Y for the Y value and fixed bit depth BitDepth_C for the chroma values (Cb, Cr).

D_Y' = Clip1_Y(Round((1 << (BitDepth_Y - 8)) * (219 * Y' + 16)))

D_Cb = Clip1_C(Round((1 << (BitDepth_C - 8)) * (224 * Cb + 128)))

D_Cr = Clip1_C(Round((1 << (BitDepth_C - 8)) * (224 * Cr + 128)))

where

Round(x) = Sign(x) * Floor(Abs(x) + 0.5)

Sign(x) = -1 if x < 0; 0 if x = 0; 1 if x > 0

Floor(x) is the largest integer less than or equal to x

Abs(x) = x if x >= 0; -x if x < 0

Clip1_Y(x) = Clip3(0, (1 << BitDepth_Y) - 1, x)

Clip1_C(x) = Clip3(0, (1 << BitDepth_C) - 1, x)

Clip3(x, y, z) = x if z < x; y if z > y; z otherwise

A rate-distortion optimized quantizer (RDOQ) will now be described. Most state-of-the-art video coding solutions (e.g., HEVC and the developing VVC) are based on a so-called hybrid video coding scheme, which basically applies scalar quantization to transform coefficients resulting from a residual signal, which in turn is produced by applying temporal or spatial prediction between the currently coded video signal and reference pictures available at the decoder side. Scalar quantization is applied at the encoder side (e.g., video encoder 20), and inverse scalar dequantization is applied at the decoder side (e.g., video decoder 30). Lossy scalar quantization introduces distortion into the reconstructed signal and requires a certain number of bits to deliver the quantized transform coefficients, as well as the coding mode description, to the decoder side.
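Returning briefly to the fixed-bit-depth conversion of Y'CbCr values defined by the equations and helper functions above (Round, Sign, Floor, Abs, Clip1 and Clip3), the following Python sketch illustrates that conversion. The function names are hypothetical, and the default bit depth of 10 is an assumption chosen to match the 10-bit examples elsewhere in this description.

```python
import math


def sign(x):
    """Sign(x): -1 for x < 0, 0 for x == 0, 1 for x > 0."""
    return (x > 0) - (x < 0)


def round_half_away(x):
    """Round(x) = Sign(x) * Floor(Abs(x) + 0.5)."""
    return sign(x) * math.floor(abs(x) + 0.5)


def clip3(lo, hi, z):
    """Clip3(x, y, z): clamp z to the range [x, y]."""
    return lo if z < lo else hi if z > hi else z


def quantize_ycbcr(y, cb, cr, bit_depth_y=10, bit_depth_c=10):
    """Convert floating-point Y' in [0, 1] and Cb, Cr in [-0.5, 0.5] to integers."""
    max_y = (1 << bit_depth_y) - 1
    max_c = (1 << bit_depth_c) - 1
    scale_y = 1 << (bit_depth_y - 8)
    scale_c = 1 << (bit_depth_c - 8)
    d_y = clip3(0, max_y, round_half_away(scale_y * (219 * y + 16)))
    d_cb = clip3(0, max_c, round_half_away(scale_c * (224 * cb + 128)))
    d_cr = clip3(0, max_c, round_half_away(scale_c * (224 * cr + 128)))
    return d_y, d_cb, d_cr


# Black with neutral chroma, then peak luma with extreme chroma values.
black = quantize_ycbcr(0.0, 0.0, 0.0)
peak = quantize_ycbcr(1.0, 0.5, -0.5)
```

For a 10-bit signal, this maps Y' = 0 to code value 64 and Y' = 1 to code value 940, which agrees with the relation V = (D - 64)/876 given earlier for 10-bit content.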

During the evolution of video compression techniques, a number of methods targeting improvements in the computation of quantized coefficients have been developed. One approach is rate-distortion optimized quantization (RDOQ), which is based on a rough estimate of the RD cost of modifying or removing selected transform coefficients or groups of transform coefficients. The purpose of RDOQ is to find an optimal or near-optimal set of quantized transform coefficients representing the residual data in an encoded block. RDOQ calculates the image distortion in the encoded block (introduced by quantization of the transform coefficients) and the number of bits needed to encode the corresponding quantized transform coefficients. Based on these two values, the encoder chooses the better coefficient value by calculating the RD cost.

RDOQ in the encoder may include three stages: quantization of transform coefficients, elimination of coefficient groups (CGs), and selection of the last non-zero coefficient. In the first stage, the video encoder quantizes the transform coefficients with a uniform quantizer without a dead zone, which yields a Level value for the current transform coefficient. Following this, the video encoder considers two additional magnitudes for this quantized coefficient: Level-1 and 0. For each of these three options {Level, Level-1, 0}, the video encoder calculates the RD cost of encoding the coefficient with the selected magnitude and chooses the one with the lowest RD cost. In addition, some RDOQ implementations may consider zeroing out an entire group of transform coefficients, or reducing the size of the signaled group of transform coefficients by reducing the position of the last signaled coefficient for each of the groups. At the decoder side, inverse scalar quantization is applied to the quantized transform coefficients derived from syntax elements of the bitstream.
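As an illustrative sketch only, the first RDOQ stage described above (choosing among the magnitudes {Level, Level-1, 0} by RD cost) may be outlined in Python as follows. The rate model `rate_bits` is a hypothetical placeholder and does not reflect the entropy coder of any particular codec; the squared-error distortion measure, the Lagrangian cost D + lambda * R, and the function names are likewise assumptions made for illustration.

```python
import math


def rate_bits(level):
    """Hypothetical rate model: larger magnitudes cost more bits (an assumption)."""
    if level == 0:
        return 1.0
    return 2.0 + 2.0 * math.log2(1 + abs(level))


def rdoq_coefficient(coeff, step, lam):
    """Pick the quantized magnitude from {Level, Level-1, 0} with the lowest RD cost."""
    sign = -1 if coeff < 0 else 1
    # Uniform quantizer without a dead zone (round to nearest level).
    level = int(math.floor(abs(coeff) / step + 0.5))
    candidates = sorted({level, max(level - 1, 0), 0}, reverse=True)
    best_cost, best_level = None, 0
    for cand in candidates:
        distortion = (abs(coeff) - cand * step) ** 2   # squared reconstruction error
        cost = distortion + lam * rate_bits(cand)       # Lagrangian RD cost
        if best_cost is None or cost < best_cost:
            best_cost, best_level = cost, cand
    return sign * best_level


# With lambda = 0 only distortion matters; a larger lambda favors cheaper levels.
level_no_rate = rdoq_coefficient(10.4, 4.0, 0.0)
level_mid_rate = rdoq_coefficient(10.4, 4.0, 5.0)
level_high_rate = rdoq_coefficient(10.4, 4.0, 1e6)
```

The coefficient-group elimination and last-non-zero-coefficient selection stages are omitted from this sketch; they would apply the same cost comparison at the level of a group or a scan position rather than a single coefficient.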

Some of the existing transfer functions and color transforms used in video coding may result in a video data representation that features a significant variation of just-noticeable difference (JND) threshold values over the dynamic range of the signal representation. That is, some ranges of codeword values for the luma and/or chroma components may have different JND thresholds than other ranges of codeword values for the luma and/or chroma components. For such representations, a quantization scheme that is uniform over the dynamic range of luma values (e.g., uniform over all codeword values of luma) would introduce quantization error whose perceptual impact differs across signal fragments (partitions of the dynamic range). Such an impact on the signal may be interpreted as a processing system with non-uniform quantization, which results in unequal signal-to-noise ratios within the processed data range.

An example of such a representation is a video signal represented in a non-constant luminance (NCL) YCbCr color space whose color primaries are defined in ITU-R Rec. BT.2020, with the ST-2084 transfer function. As illustrated in Table 2, the NCL YCbCr color space allocates a significantly larger number of codewords to the low-intensity values of the signal; e.g., 30% of the codewords represent linear light samples below 10 nits, whereas high-intensity samples (high brightness) are represented with a much smaller number of codewords; e.g., 25% of the codewords are allocated to linear light in the range of 1000 to 10000 nits. As a result, a video coding system featuring uniform quantization over all ranges of the data (e.g., H.265/HEVC) would introduce much more severe coding artifacts to the high-intensity samples (bright regions of the signal), whereas the distortion introduced to the low-intensity samples (dark regions of the same signal) would be far below the noticeable difference.

Table 2. Relation between linear light intensity and code value in SMPTE ST 2084 (bit depth = 10)

| Linear light intensity (cd/m2) | Full range | SDI range | Narrow range |
| --- | --- | --- | --- |
| ~0.01 | 21 | 25 | 83 |
| ~0.1 | 64 | 67 | 119 |
| ~1 | 153 | 156 | 195 |
| ~10 | 307 | 308 | 327 |
| ~100 | 520 | 520 | 509 |
| ~1,000 | 769 | 767 | 723 |
| ~4,000 | 923 | 920 | 855 |
| ~10,000 | 1023 | 1019 | 940 |
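As a brief worked illustration, the codeword shares quoted above (about 30% of codewords below 10 nits and about 25% between 1000 and 10000 nits) can be recovered from the full-range column of Table 2. In the following Python sketch, the dictionary simply transcribes that column; the names `FULL_RANGE_CODE`, `share_below`, and `share_between` are hypothetical.

```python
# Full-range 10-bit code values from Table 2, keyed by linear light in cd/m2.
FULL_RANGE_CODE = {
    0.01: 21, 0.1: 64, 1: 153, 10: 307,
    100: 520, 1000: 769, 4000: 923, 10000: 1023,
}
MAX_CODE = 1023  # largest 10-bit code value


def share_below(nits):
    """Fraction of the code range used for linear light below the given level."""
    return FULL_RANGE_CODE[nits] / MAX_CODE


def share_between(low_nits, high_nits):
    """Fraction of the code range used for linear light between two levels."""
    return (FULL_RANGE_CODE[high_nits] - FULL_RANGE_CODE[low_nits]) / MAX_CODE


dark_share = share_below(10)                 # about 0.30
bright_share = share_between(1000, 10000)    # about 0.25
```

The dark range below 10 nits, which spans one thousandth of the 10000-nit peak, therefore receives a larger share of codewords than the entire range from 1000 to 10000 nits.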

Effectively, this means that a video coding system design, or the encoding algorithms, may benefit from adjustment for every selected video data representation, namely for every selected transfer function and color space. The following methods have previously been proposed to address the problem of the non-optimal perceptual-quality codeword distribution described above.

In "Dynamic Range Adjustment SEI to enable High Dynamic Range video coding with Backward-Compatible Capability," D. Rusanovskyy, A. K. Ramasubramonian, D. Bugdayci, S. Lee, J. Sole, M. Karczewicz, VCEG document COM16-C 1027-E, September 2015, the authors proposed applying a codeword redistribution to video data prior to video coding. Video data in the ST-2084/BT.2020 representation undergoes a codeword redistribution prior to video compression. The redistribution linearizes the perceived distortion (signal-to-noise ratio) within the dynamic range of the data through a dynamic range adjustment. This redistribution was found to improve visual quality under a bit rate constraint. To compensate for the redistribution and convert the data back to the original ST 2084/BT.2020 representation, an inverse process is applied to the data after video decoding.

One drawback of this approach is that the pre-processing and post-processing are generally decoupled from the rate-distortion optimization processing employed by state-of-the-art encoders on a block basis. Therefore, the techniques described in VCEG document COM16-C 1027-E do not employ information available to the codec, such as the target frame rate or the quantization distortion introduced by the quantization scheme of the video codec.

In "Performance investigation of high dynamic range and wide color gamut video coding techniques," J. Zhao, S.-H. Kim, A. Segall, K. Misra, VCEG document COM16-C 1030-E, September 2015, an intensity-dependent, spatially varying (block-based) quantization scheme was proposed to align the bit rate allocation and the visually perceived distortion between video coding applied on the Y2020 (ST2084/BT2020) and Y709 (BT1886/BT 2020) representations. It was observed that, to maintain the same level of quantization for luma, the quantization of the signal in Y2020 and in Y709 must differ by a value that depends on luma, such that:

QP_Y2020 = QP_Y709 - f(Y2020)

The function f(Y2020) was found to be linear for the intensity values (brightness levels) of the video in Y2020, and it may be approximated as:

f(Y2020) = max(0.03 * Y2020 - 3, 0)

The proposed spatially varying quantization scheme, introduced at the encoding stage, was found to improve the visually perceived signal-to-quantization-noise ratio for coded video signals in the ST 2084/BT.2020 representation.
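As an illustrative sketch only, the luma-dependent QP relation above may be expressed in Python as follows. The function names are hypothetical, and the treatment of Y2020 as a luma code value (for example, the average luma of a block) is an assumption of this sketch rather than a detail given above.

```python
def qp_offset(y_2020):
    """f(Y2020) = max(0.03 * Y2020 - 3, 0)."""
    return max(0.03 * y_2020 - 3.0, 0.0)


def qp_y2020(qp_y709, y_2020):
    """QP_Y2020 = QP_Y709 - f(Y2020)."""
    return qp_y709 - qp_offset(y_2020)


# A dark block keeps its QP; a brighter block is quantized more finely.
qp_dark = qp_y2020(30, 50)
qp_bright = qp_y2020(30, 500)
```

Because f is clamped at zero, blocks whose luma value is at or below 100 receive no QP reduction, while brighter blocks receive a progressively lower QP and therefore finer quantization.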

One drawback of this approach is the block-based granularity of the QP adaptation. Typically, the block sizes selected at the encoder side for compression are derived through a rate-distortion optimization process and may not represent the dynamic range properties of the video signal; thus, the selected QP setting will be sub-optimal for the signal inside the block. This problem may become even more important for next-generation video coding systems, which tend to employ prediction and transform block sizes of larger dimensions. Another aspect of this design is the need to signal the QP adaptation parameters to the decoder side for inverse dequantization. Additionally, spatial adaptation of quantization parameters at the encoder side increases the complexity of encoding optimization and may interfere with rate control algorithms.

In "Intensity dependent spatial quantization with application in HEVC," Matteo Naccari and Marta Mrak, Proc. of IEEE International Conference on Multimedia and Expo (ICME) 2013, July 2013, an intensity dependent spatial quantization (IDSQ) perceptual mechanism was proposed. IDSQ exploits the intensity masking of the human visual system and perceptually adjusts the quantization of the signal at the block level. The authors of this paper proposed employing in-loop pixel-domain scaling. The parameters of the in-loop scaling for the currently processed block are derived from the average value of the luma component in the predicted block. At the decoder side, inverse scaling is performed, and the decoder derives the scaling parameters from the predicted block available at the decoder side.

As with the technique in "Performance investigation of high dynamic range and wide color gamut video coding techniques," the block-based granularity of this method limits its performance, because a single scaling parameter is applied to all samples of the processed block and is therefore suboptimal. Another drawback of the proposed solution is that the scale value is derived from the predicted block and does not reflect signal fluctuations that may occur between the current coded block and the predicted block.

"De-quantization and scaling for next generation containers" (J. Zhao, A. Segall, S.-H. Kim, K. Misra (Sharp), JVET document B0054, January 2016) addresses the problem of non-uniform perceived distortion in ST.2084/BT.2020 representations. The authors propose in-loop, intensity dependent, block-based transform-domain scaling. The parameters for in-loop scaling of selected transform coefficients (the AC coefficients) of the currently processed block are derived from the average value of the luma component in the predicted block and from the DC value derived for the current block. At the decoder, inverse scaling is performed, and the decoder derives the AC coefficient scaling parameters from the predicted block available at the decoder side and from the quantized DC value signaled to the decoder.

As with the techniques in "Performance investigation of high dynamic range and wide color gamut video coding techniques" and "Intensity dependent spatial quantization with application in HEVC," the block-based granularity of this method limits its performance, because the scaling parameters applied to all samples of the processed block are suboptimal. Another drawback of the proposed solution is that the scale value is applied only to the AC transform coefficients. The signal-to-noise ratio improvement therefore does not affect the DC value, which reduces the performance of the scheme. In addition, in some video coding system designs the quantized DC value may not be available when the AC values are scaled, for example when the quantization process is followed by a cascade of transform operations. A further limitation of this proposal is that when the encoder selects transform skip or transform/quantization bypass mode for the current block, no scaling is applied (and accordingly scaling is not defined at the decoder for these modes). This is suboptimal because it excludes the potential coding gain for these two modes.

U.S. Patent Application No. 15/595,793, filed May 15, 2017, describes in-loop sample processing for video signals with non-uniformly distributed JNDs. That application describes applying scale and offset to signal samples represented in the pixel domain, residual domain, or transform domain, and proposes several algorithms for deriving the scale and offset.

This disclosure describes several video coding and processing techniques that may be applied in the video coding loop of a video coding system (for example, during the video encoding and/or decoding process rather than during pre-processing or post-processing). The techniques of this disclosure include encoder-side (e.g., video encoder 20) algorithms with content-adaptive spatially varying quantization, without explicit signaling of quantization parameters (e.g., a change in quantization parameter represented by a δQP syntax element), to compress HDR/WCG video signals more efficiently. The techniques of this disclosure also include decoder-side (e.g., video decoder 30) operations that improve the performance of video decoding tools that use quantization parameter information. Examples of such decoding tools may include deblocking filters, bilateral filters, adaptive loop filters, or other video coding tools that take quantization information as input.

Video encoder 20 and/or video decoder 30 may be configured to perform one or more of the following techniques, independently or in any combination with other techniques.

In one example of the disclosure, video encoder 20 may be configured to perform a multi-stage quantization process for each block of video data in a picture of video data. The techniques described below may be applied to both the luma and chroma components of the video data. Video encoder 20 may be configured to perform quantization using a base quantization parameter (QPb) value; that is, the QPb value is applied uniformly across all blocks. For a given QPb value provided to the transform quantization to be applied to the samples s(Cb) of a coded block Cb, video encoder 20 may be further configured to use a content-dependent QP offset as a deviation from the QPb value. That is, for each block of video data, or for a group of blocks of video data, video encoder 20 may further determine a QP offset based on the content of the block or of the blocks in the group.

In this way, video encoder 20 can take into account, in its rate-distortion optimization (RDO) selection, quantization levels LevelX that are produced by an effectively different quantization parameter (QPe). In this disclosure, QPe may be referred to as the effective quantization parameter. QPe is the base QPb value plus a QP offset (δQP). Video encoder 20 may derive QPe for the current block Cb using the following equation:

QPe(Cb) = QPb(Cb) + δQP(s(Cb)), with δQP > 0   (1)

where the δQP(Cb) variable is derived from local characteristics (e.g., statistics) of the coded block Cb. For example, video encoder 20 may be configured to derive the δQP value for block Cb as a function of the average of the sample values (e.g., luma or chroma values) of block Cb. In other examples, video encoder 20 may use other functions of the sample values of block Cb to determine the δQP value. For example, video encoder 20 may determine the δQP value using a second-order operation (e.g., variance) on the sample values of block Cb. As another example, video encoder 20 may determine the δQP value as a function of the sample values of block Cb and one or more sample values of neighboring blocks. As explained in more detail below, video encoder 20 may be configured to quantize the residual values of block Cb using both the QPb value and the QPe value. The residual data r(Cb) derived for the current coded block Cb is thus coded with the quantization parameter QPb, but the distortion introduced into the residual is first produced using the quantization parameter QPe, yielding the transform quantized coefficients tq(Cb). Because QPe can differ from block to block, video encoder 20 can adapt to the varying JND thresholds present in some color representations and provide non-uniform quantization.

At video decoder 30, the quantized transform coefficients tq(Cb) undergo inverse quantization with the base quantization parameter QPb. Video decoder 30 may derive the base quantization parameter QPb from syntax elements associated with the current block Cb, which video decoder 30 may receive in the encoded video bitstream. Video decoder 30 may then apply one or more inverse transforms to the inverse-quantized transform coefficients to create the decoded residual. Video decoder 30 may then perform a prediction process (e.g., inter prediction or intra prediction) to produce the decoded samples d(Cb) of the current block Cb.

Note that video decoder 30 does not use the effective quantization parameter QPe when reconstructing the residual values of a block. The distortion introduced by video encoder 20 when applying QPe during encoding therefore remains in the residual, which mitigates the non-uniform JND threshold problem of certain color spaces discussed above. However, because the residual signal carries the distortion introduced by QPe, which is greater than the QPb value conveyed in the bitstream and associated with the current Cb, other decoding tools that rely on the QP provided by the bitstream to govern their operation (e.g., in-loop filtering, entropy decoding) may be adjusted to improve their performance. This adjustment is made by providing the coding tool in question with an estimate of the actual QPe that video encoder 20 applied to Cb. As explained in more detail below, video decoder 30 may be configured to derive the estimate of the effective quantization parameter QPe from statistics of the decoded samples d(Cb) and from other parameters of the bitstream. Because the block-wise QPe values are not signaled in the bitstream, bit overhead is saved.
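The decoder-side estimate described above can be sketched as follows. This is a minimal illustration, assuming the estimate is the base QP plus an offset chosen from the mean of the decoded samples d(Cb); the threshold and offset values, and the name `estimate_qp_e`, are hypothetical and do not come from the source.

```python
from bisect import bisect_right

# Hypothetical 10-bit intensity thresholds and delta-QP offsets (illustrative only).
# A mean below 256 maps to offset 1, [256, 512) to 2, [512, 768) to 4, the rest to 6.
INTENSITY_THRESHOLDS = [256, 512, 768]
DELTA_QP_OFFSETS = [1, 2, 4, 6]


def estimate_qp_e(qp_base, decoded_samples):
    """Estimate the effective QP of a block from its decoded samples d(Cb).

    No per-block delta QP is read from the bitstream; the offset is inferred
    from the mean of the decoded samples, so it is only an estimate of the
    value the encoder derived from the original samples s(Cb).
    """
    mean = sum(decoded_samples) / len(decoded_samples)
    offset = DELTA_QP_OFFSETS[bisect_right(INTENSITY_THRESHOLDS, mean)]
    return qp_base + offset
```

A QP-driven tool such as a deblocking filter would then be handed `estimate_qp_e(qp_base, d_cb)` instead of `qp_base`. Because d(Cb) differs from s(Cb) by the coding distortion, the estimate can occasionally fall in a neighboring intensity range.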

The following sections provide non-limiting examples of implementations of the techniques of this disclosure. First, an example of the structure of the encoder-side algorithm of video encoder 20 is described.

FIG. 8 is a block diagram illustrating an example of video encoder 20 that may implement the techniques of this disclosure. As shown in FIG. 8, video encoder 20 receives a current video block of video data within a video frame to be encoded. In accordance with the techniques of this disclosure, the video data received by video encoder 20 may be HDR and/or WCG video data. In the example of FIG. 8, video encoder 20 includes mode select unit 40, video data memory 41, decoded picture buffer (DPB) 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra prediction processing unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. A deblocking filter (not shown in FIG. 8) may also be included to filter block boundaries and remove blockiness artifacts from the reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional filters (in-loop or post-loop) may be used in addition to the deblocking filter. Such filters are not shown for brevity but, if desired, may filter the output of summer 50 (as an in-loop filter).

Video data memory 41 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 41 may be obtained, for example, from video source 18. Decoded picture buffer 64 may be a reference picture memory that stores reference video data for use by video encoder 20 when encoding video data, for example in intra or inter coding modes. Video data memory 41 and decoded picture buffer 64 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 41 and decoded picture buffer 64 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 41 may be on-chip with other components of video encoder 20, or off-chip relative to those components.

During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra prediction processing unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded, to provide spatial prediction. Video encoder 20 may perform multiple coding passes, for example to select an appropriate coding mode for each block of video data.

Moreover, partition unit 48 may partition blocks of video data into sub-blocks, based on an evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further produce a quadtree data structure indicating the partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree may include one or more PUs and one or more TUs. In other examples, partition unit 48 may partition the input video data according to a QTBT partitioning structure.
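The quadtree partitioning described above can be sketched as a recursive split decision. This is a simplified, greedy illustration and not the encoder's actual procedure: it assumes a caller-supplied `cost(x, y, size)` function (a hypothetical stand-in for a rate-distortion cost) and splits a block whenever its four children together cost less than the block itself.

```python
def partition(x, y, size, min_size, cost):
    """Return the leaf blocks (x, y, size) of a greedy quadtree partition.

    cost(x, y, size) is a hypothetical callback that returns the cost of
    coding the square block at (x, y) as a single unit.
    """
    if size <= min_size:
        return [(x, y, size)]

    half = size // 2
    children = [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]
    split_cost = sum(cost(cx, cy, half) for cx, cy in children)

    # Keep the block whole unless splitting is strictly cheaper.
    if split_cost >= cost(x, y, size):
        return [(x, y, size)]

    leaves = []
    for cx, cy in children:
        leaves.extend(partition(cx, cy, half, min_size, cost))
    return leaves
```

A real encoder would typically compare the best achievable cost of each subtree rather than only the immediate children; the greedy one-level comparison is used here to keep the sketch short.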

Mode select unit 40 may select one of the coding modes, intra or inter, for example based on error results, and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture (or other coded unit), relative to the current block being coded within the current picture (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Motion estimation unit 42 may therefore perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
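The SAD-based matching described above can be illustrated with a full-pixel exhaustive search. This is a minimal sketch under simplifying assumptions: frames are plain lists of rows, only integer displacements are tested, and the function names are hypothetical. Fractional-pixel refinement is not shown.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_pel_search(current, reference, top, left, size, search_range):
    """Return ((dx, dy), sad) of the best integer displacement in the window."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference picture.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Replacing `sad` with a sum of squared differences gives the SSD variant mentioned in the text.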

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in decoded picture buffer 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to the luma component, and motion compensation unit 44 uses motion vectors calculated based on the luma component for both the chroma and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra prediction processing unit 46 may intra-predict a current block, as an alternative to the inter prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra prediction processing unit 46 may determine an intra prediction mode to use to encode the current block. In some examples, intra prediction processing unit 46 may encode the current block using various intra prediction modes, for example during separate encoding passes, and intra prediction processing unit 46 (or mode select unit 40, in some examples) may select an appropriate intra prediction mode to use from the tested modes.

For example, intra prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra prediction modes, and select the intra prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. Intra prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra prediction mode exhibits the best rate-distortion value for the block.
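One common way to combine distortion and rate into a single figure of merit is a Lagrangian cost J = D + λ·R. The sketch below uses that formulation as an assumption; the source only says that ratios are calculated from distortion and rate, so this is a substituted, widely used technique rather than the method the text specifies. The mode names and numbers in the usage note are made up for illustration.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the mode with the lowest Lagrangian cost J = D + lambda * R.

    candidates maps a mode name to a (distortion, bits) pair measured by
    trial-encoding the block in that mode.
    """
    def cost(mode):
        distortion, bits = candidates[mode]
        return distortion + lagrange_multiplier * bits

    return min(candidates, key=cost)
```

For instance, with `{"DC": (100, 10), "planar": (80, 30), "angular": (60, 60)}`, a small multiplier such as 0.5 favors the low-distortion "angular" mode, while a larger multiplier such as 2 favors the cheap "DC" mode.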

After selecting an intra prediction mode for a block, intra prediction processing unit 46 may provide information indicative of the selected intra prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra prediction mode. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra prediction mode index tables and a plurality of modified intra prediction mode index tables (also referred to as codeword mapping tables); definitions of encoding contexts for various blocks; and indications of a most probable intra prediction mode, an intra prediction mode index table, and a modified intra prediction mode index table to use for each of the contexts.

Video encoder 20 forms a residual video block (e.g., r1(Cb) for the current block Cb) by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients t(Cb) to quantization unit 54.

As described above, video encoder 20 may produce the residual signal r(Cb) of the current coded block Cb from the samples s(Cb) of the current coded block and the predicted samples p(Cb) (e.g., predicted samples from inter prediction or intra prediction). Video encoder 20 may apply one or more forward transforms to the residual r(Cb) to produce transform coefficients t(Cb). Video encoder 20 may then quantize the transform coefficients t(Cb) before entropy encoding. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.

In accordance with the techniques of this disclosure, quantization unit 54 may be configured to perform a multi-stage quantization process on the transform coefficients t(Cb). FIG. 9 is a block diagram illustrating an example quantization unit of a video encoder that may implement the techniques of this disclosure.

As shown in FIG. 9, in a first stage, QPe determination unit 202 may be configured to derive the quantization parameter offset δQP(s(Cb)) for the current block Cb. In one example, QPe determination unit 202 may be configured to derive δQP(s(Cb)) from a lookup table (e.g., LUT_DQP 204). LUT_DQP 204 contains δQP values and is accessed by an index derived from the average of the s(Cb) samples (e.g., luma or chroma samples) of block Cb. The following equation shows one example of deriving the quantization parameter offset:

δQP(s(Cb)) = LUT_DQP(mean(s(Cb)))   (2)

where LUT_DQP is the lookup table for δQP(s(Cb)) and mean(s(Cb)) is the average of the sample values of block Cb.
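Equations (1) and (2) can be sketched together as follows. This is a minimal illustration, assuming the lookup table is indexed by equal-width intensity buckets of the block mean; the table entries, the bucket scheme, and the default 10-bit depth are hypothetical, since the source does not list the contents of LUT_DQP.

```python
# Hypothetical table: one delta-QP entry per intensity bucket (illustrative values).
# All entries are positive, matching the constraint stated with equation (1).
LUT_DQP = [1, 1, 2, 3, 4, 5, 6, 6]


def delta_qp(samples, bit_depth=10):
    """Equation (2): look up delta QP by the mean sample value of the block."""
    mean = sum(samples) / len(samples)
    bucket_width = (1 << bit_depth) / len(LUT_DQP)
    index = min(int(mean / bucket_width), len(LUT_DQP) - 1)
    return LUT_DQP[index]


def effective_qp(qp_base, samples, bit_depth=10):
    """Equation (1): QPe(Cb) = QPb(Cb) + deltaQP(s(Cb))."""
    return qp_base + delta_qp(samples, bit_depth)
```

With these assumed values, a 10-bit block whose samples all equal 512 falls in bucket 4, so a base QP of 30 yields an effective QP of 34.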

In other examples, QPe determination unit 202 may be configured to derive the value of δQP(s(Cb)) as a function of some other characteristic of the samples of the coded block or of characteristics of the bitstream (e.g., a second-order function based on variance). QPe determination unit 202 may be configured to determine the δQP value using an algorithm or a lookup table, or may derive the δQP value explicitly by other means. In some examples, the samples used to determine δQP() may include both luma and chroma samples or, more generally, samples of one or more components of the coded block.

QPe determination unit 202 may then use the variable δQP(Cb) to derive the effective quantization parameter QPe, as shown in equation (1) above. QPe determination unit 202 may then provide the QPe value to first quantization unit 206 and inverse quantization unit 208. In a second stage, first quantization unit 206 performs forward quantization on the transform coefficients t(Cb) using the derived QPe value. Inverse quantization unit 208 then inverse quantizes the quantized transform coefficients using the QPe value, and inverse transform unit 210 performs an inverse transform (e.g., the inverse of the transform of transform processing unit 52). This produces a residual block r2(Cb) that carries the distortion introduced by QPe. The equation for the second stage of the process is shown below.

r2(Cb) = InverseTrans(InverseQuant(QPe, ForwardQuant(QPe, t(Cb))))   (3)

where InverseTrans is the inverse transform process, InverseQuant is the inverse quantization process, and ForwardQuant is the forward quantization process.
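The quantize-then-dequantize round trip inside equation (3) can be sketched with a plain scalar quantizer. This is a simplified illustration under stated assumptions: the step size follows the HEVC-style approximation in which it doubles every 6 QP units, rounding is to the nearest level, and the inverse transform is omitted so the result stays in the transform domain. All function names are hypothetical.

```python
def q_step(qp):
    """Approximate HEVC-style step size: doubles every 6 QP units, 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def forward_quant(qp, coeffs):
    """Map coefficients to integer levels, rounding half away from zero."""
    step = q_step(qp)
    levels = []
    for c in coeffs:
        magnitude = int(abs(c) / step + 0.5)
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def inverse_quant(qp, levels):
    """Scale integer levels back to coefficient values."""
    step = q_step(qp)
    return [level * step for level in levels]


def introduce_distortion(qp_e, coeffs):
    """Inner part of equation (3): the coefficients now carry QPe-level error."""
    return inverse_quant(qp_e, forward_quant(qp_e, coeffs))
```

The coarser the effective QP, the larger the step size and the more the reconstructed coefficients deviate from the originals, which is the distortion the second stage is meant to bake into r2(Cb).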

In a third stage, transform processing unit 212 applies one or more forward transforms to the residual r2(Cb) (e.g., the same transforms as transform processing unit 52). Second quantization unit 214 then performs forward quantization on the transformed residual using the base quantization parameter QPb. This produces the quantized transform coefficients tq(Cb), as shown in the following equation:

tq(Cb) = ForwardQuant(QPb, ForwardTrans(r2(Cb)))   (4)

where ForwardTrans is the forward transform process.
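The three stages of equations (1), (3), and (4) can be chained in one short sketch. It is a toy model under stated assumptions: a one-dimensional orthonormal DCT stands in for the codec's transform, the quantizer uses an HEVC-style step size of 2^((QP-4)/6) with simple rounding, and the delta QP is passed in rather than derived. The function names are hypothetical.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def forward_trans(x):
    """Orthonormal 1-D DCT-II (stand-in for ForwardTrans)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def inverse_trans(c):
    """Inverse of forward_trans (stand-in for InverseTrans)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def forward_quant(qp, coeffs):
    step = 2.0 ** ((qp - 4) / 6.0)
    return [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in coeffs]


def inverse_quant(qp, levels):
    step = 2.0 ** ((qp - 4) / 6.0)
    return [level * step for level in levels]


def multi_stage_quantize(residual, qp_base, delta_qp):
    qp_e = qp_base + delta_qp                                  # equation (1)
    t = forward_trans(residual)
    r2 = inverse_trans(inverse_quant(qp_e, forward_quant(qp_e, t)))  # equation (3)
    return forward_quant(qp_base, forward_trans(r2))           # equation (4)
```

The returned levels are expressed on the QPb grid, so a decoder that only knows QPb can dequantize them, while the reconstruction error they encode was set by the coarser QPe.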

Returning to FIG. 8, following quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients tq(Cb). For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, context may be based on neighboring blocks. Following the entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of decoded picture buffer 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in decoded picture buffer 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

Example embodiments of decoder-side processing will now be described. At the decoder side, certain coding tools depend on a quantization parameter associated with the QP value used to code the current block or group of blocks. Some non-limiting examples include deblocking filters, bilateral filters, loop filters, interpolation filters, entropy codec initialization, and others.

FIG. 10 is a block diagram illustrating an example of video decoder 30 that may implement the techniques of this disclosure. In the example of FIG. 10, video decoder 30 includes entropy decoding unit 70, video data memory 71, motion compensation unit 72, intra-prediction processing unit 74, inverse quantization unit 76, inverse transform processing unit 78, decoded picture buffer 82, summer 80, QPe estimation unit 84, LUT_DQP 86, and filter unit 88. In some examples, video decoder 30 may perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 8). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra-prediction processing unit 74 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.

Video data memory 71 may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 30. The video data stored in video data memory 71 may be obtained, for example, from computer-readable medium 16 (e.g., from a local video source, such as a camera), via wired or wireless network communication of video data, or by accessing physical data storage media. Video data memory 71 may form a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream. Decoded picture buffer 82 may be a reference picture memory that stores reference video data for use by video decoder 30 in decoding video data, e.g., in intra- or inter-coding modes. Video data memory 71 and decoded picture buffer 82 may be formed by any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. Video data memory 71 and decoded picture buffer 82 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 71 may be on-chip with other components of video decoder 30, or off-chip relative to those components.

During the decoding process, video decoder 30 receives from video encoder 20 an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements. The encoded video bitstream may have been encoded by video encoder 20 using the multi-stage quantization process described above. The encoded video bitstream may also represent video data defined by an HDR and/or WCG color format. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. In some examples, entropy decoding unit 70 may decode a syntax element that indicates the base quantization parameter QPb for the block of video data to be decoded. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in decoded picture buffer 82. Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use the interpolation filters used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use those interpolation filters to produce predictive blocks.

Inverse quantization unit 76 inverse quantizes (i.e., de-quantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include using the base quantization parameter QPb, determined by video decoder 30 for each video block in the video slice, to determine a degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform processing unit 78 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to produce residual blocks in the pixel domain.

After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform processing unit 78 with the corresponding predictive blocks generated by motion compensation unit 72. Summer 80 represents the component or components that perform this summation operation.

Filter unit 88 may be configured to apply one or more filtering operations to the decoded video data before it is output and stored in decoded picture buffer 82. The decoded video blocks in a given frame or picture are then stored in decoded picture buffer 82, which stores reference pictures used for subsequent motion compensation. Decoded picture buffer 82 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1. Example filters applied by filter unit 88 include deblocking filters, bilateral filters, adaptive loop filters, sample adaptive offset filters, and others. For example, if desired, a deblocking filter may be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve the video quality.

In some examples, the parameters of the filters applied by filter unit 88 may be based on a quantization parameter. As described above, the video data received by video decoder 30 includes distortion introduced by video encoder 20 using the effective quantization parameter QPe, which is greater than the QPb value communicated in the bitstream and associated with the current Cb. The filters applied by filter unit 88 may depend on the QP parameter provided by the bitstream to adjust their performance. Therefore, video decoder 30 may be configured to derive an estimate of the actual QPe applied to Cb by video encoder 20. To this end, video decoder 30 may include QPe estimation unit 84 to derive the QPe value.

For example, QPe estimation unit 84 may be configured to estimate the quantization parameter offset δQP(s(Cb)) of the current block Cb. In one example, QPe estimation unit 84 may be configured to estimate δQP(s(Cb)) from a lookup table (e.g., LUT_DQP 86). LUT_DQP 86 contains estimates of the δQP values and is accessed by an index derived from the average of the decoded samples s(Cb) of block Cb (e.g., luma or chroma samples). The following equation shows one example of deriving the quantization parameter offset:

δQP(s(Cb)) = LUT_DQP(mean(s(Cb)))   (2)

where LUT_DQP is the lookup table for δQP(s(Cb)) and mean(s(Cb)) is the average of the decoded sample values of block Cb.
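The lookup described above might be sketched in Python as follows. The table contents, the 10-bit sample range, and the banding of the mean into table indices are all assumptions made for illustration; the source does not specify how the index is derived from the mean.

```python
# Hypothetical table for 10-bit samples: one delta-QP estimate per 128-wide
# band of the block mean. The values are illustrative only.
LUT_DQP = [6, 5, 4, 3, 2, 1, 0, 0]

def mean_sample(block):
    # Average of the decoded sample values s(Cb) of a 2-D block.
    samples = [s for row in block for s in row]
    return sum(samples) / len(samples)

def estimate_delta_qp(block, lut=LUT_DQP, bit_depth=10):
    # Map the block mean to a table index (assumed uniform banding).
    band = (1 << bit_depth) // len(lut)
    index = min(int(mean_sample(block)) // band, len(lut) - 1)
    return lut[index]

def estimate_qp_e(block, qp_b):
    # Estimated effective QP: signaled base QP plus the estimated offset.
    return qp_b + estimate_delta_qp(block)
```

Because the index depends only on decoded samples, a decoder could repeat this derivation without any δQP being signaled in the bitstream.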

In other examples, QPe estimation unit 84 may be configured to derive the value of δQP(s(Cb)) from a function of some other characteristic of the samples of the coded block or of the bitstream (e.g., a second-order function based on variance). QPe estimation unit 84 may be configured to estimate the δQP value using an algorithm or a lookup table, or may estimate the δQP value explicitly by other means. In some examples, the samples used to determine δQP() may include both luma and chroma samples or, more generally, samples of one or more components of the decoded block. QPe estimation unit 84 may then provide the estimate of QPe to filter unit 88 for use by one or more coding tools implemented by filter unit 88.

In one example, filter unit 88 may be configured to perform deblocking filtering. In one non-limiting example of a deblocking implementation, the deblocking process is given below as a change to the HEVC specification for deblocking filtering. The introduced changes are marked with ...

8.7.2.5.3 Decision process for luma block edges

The variables QpQ and QpP are set equal to the values of the coding units Cbq and Cbp, which include the coding blocks containing the samples q0,0 and p0,0, respectively. Derived as follows

The variable qPL is derived as follows:

qPL = ((QpQ + QpP + 1) >> 1)

8.7.2.5.5 Filtering process for chroma block edges

The variables QpQ and QpP are set equal to the values of the coding units that include the coding blocks containing the samples q0,0 and p0,0, respectively. Derived as follows

If ChromaArrayType is equal to 1, the variable QpC is determined as specified in Table 8-10 based on the index qPi, which is derived as follows:

qPi = ((QpQ + QpP + 1) >> 1) + cQpPicOffset

In the above example, QpY is the same as QPb and QpY_EQ is the same as QPe.
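As a rough illustration, the two averaging formulas can be written in Python, assuming that QpQ and QpP take the estimated QPe values of the two blocks adjoining the edge. The helper names are hypothetical, and the mapping from qPi to QpC through Table 8-10 is not reproduced here.

```python
def effective_qp(qp_b, delta_qp_estimate):
    # QPe estimate: signaled base QP plus the estimated offset.
    return qp_b + delta_qp_estimate

def luma_deblock_qp(qp_q, qp_p):
    # qPL = ((QpQ + QpP + 1) >> 1): rounded average across the edge.
    return (qp_q + qp_p + 1) >> 1

def chroma_deblock_index(qp_q, qp_p, c_qp_pic_offset):
    # qPi = ((QpQ + QpP + 1) >> 1) + cQpPicOffset
    return ((qp_q + qp_p + 1) >> 1) + c_qp_pic_offset
```

For instance, two blocks with a base QP of 30 and estimated offsets of 2 and 5 would give qPL = 34 in this sketch, so the deblocking strength follows the distortion actually applied rather than the signaled QPb.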

In another example, filter unit 88 may be configured to implement a bilateral filter. A bilateral filter modifies a sample based on a weighted average of the samples in its neighborhood, where the weights are derived from the distance between each neighboring sample and the current sample and from the difference between the sample values of the current sample and the neighboring sample.

Let x be the position of the current sample, which is filtered based on the samples in its neighborhood N(x). For each sample d(y) with y belonging to N(x), let w(y,x) be the weight associated with the sample at position y in obtaining the filtered version of the sample at x. The filtered version of x, D(x), is obtained as

D(x) = Σ_{y∈N(x)} w(y,x) d(y)   (8)

The weights are derived as

w(y,x) = f(y, x, d(y), d(x), QP(Cb))   (9)

where f() is a function that computes the weights based on the sample positions and sample values. The QP used to code the block containing the samples may also be a further argument in the derivation of f(). In some examples, the QP value of the block containing x is used as the argument to f(). In this example, the QP value used as the further argument in f() is QPe(Cb), derived as follows:

QPe(Cb) = QP(Cb) + δQP(d(Cb))   (10)

where QP(Cb) is the signaled QP value of the coded block (e.g., QPb), and δQP(d(Cb)) is a QP offset based on a characteristic of the decoded coding block, e.g., the mean. The derived weights are therefore:

w(y,x) = f(y, x, d(y), d(x), QPe(Cb))   (11)
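Equations (8) and (11) might be sketched in Python as below. The source leaves f() unspecified, so the Gaussian spatial and range kernels, the normalization of the weights so that they sum to 1, and the mapping from QPe to the range parameter sigma_r are all assumptions made for illustration.

```python
import math

def bilateral_filter_sample(d, x, qp_e, radius=1, sigma_s=1.0):
    """Filter the sample at position x = (row, col) of the 2-D block d."""
    # Hypothetical mapping: a larger effective QP allows stronger smoothing.
    sigma_r = max(1.0, (qp_e - 17) / 2.0)
    rows, cols = len(d), len(d[0])
    r0, c0 = x
    num = 0.0
    den = 0.0
    # Neighborhood N(x): a square window clipped at the block boundary.
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            dist2 = (r - r0) ** 2 + (c - c0) ** 2
            diff2 = (d[r][c] - d[r0][c0]) ** 2
            # Unnormalized weight from position distance and value difference.
            w = math.exp(-dist2 / (2 * sigma_s ** 2) - diff2 / (2 * sigma_r ** 2))
            num += w * d[r][c]
            den += w
    # Dividing by the weight sum gives the normalized form of equation (8).
    return num / den
```

Under these assumptions, the same isolated outlier is pulled more strongly toward its neighbors when the estimated QPe is high than when it is low, which is the behavior that passing QPe rather than QPb into f() is meant to enable.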

In some examples, the weighting functions are derived separately for luma and chroma. The QP associated with a chroma coding block may also include the effect of a chroma offset that is derived or signaled in the bitstream, and the derived δQP() may be a function of samples of one or more components.

In some examples, the QP used as the further argument of f() may be obtained by considering the QPe() value derived for the coded block containing the sample at position x and the QPe() value derived for the coded block containing the sample at position y. For example, a value derived from the two QPe() values (e.g., their average) may be chosen as the argument of f().

In another example of this disclosure, video decoder 30 may be configured to use multiple LUT_DQP tables. In some examples, two or more LUT_DQP tables may be available at video decoder 30. Video decoder 30 may be configured to derive an index identifying the particular one of the two or more lookup tables to be used in the derivation for a particular block edge. Video decoder 30 may be configured to derive the index from a syntax element, from coding information of blocks in the same spatio-temporal neighborhood as the current sample, or from statistics of decoded picture samples.

For example:

δQP(d(Cbq)) = LUT_DQP(d(Cbq), Idx1)   (12)

δQP(d(Cbp)) = LUT_DQP(d(Cbp), Idx2)

where Idx1 and Idx2 are indices that select among the several LUT_DQP tables available at video decoder 30.
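A minimal Python sketch of table selection per block edge follows. The two tables, their values, and the use of the block mean as the lookup statistic are hypothetical; how Idx1 and Idx2 are obtained (syntax element, neighborhood coding information, or picture statistics) is left outside the sketch.

```python
# Hypothetical set of tables for 10-bit samples; values are illustrative only.
LUT_DQP_TABLES = [
    [6, 5, 4, 3, 2, 1, 0, 0],
    [3, 3, 2, 2, 1, 1, 0, 0],
]

def lut_dqp(mean_value, idx, bit_depth=10):
    # Select table idx, then index it by the band containing the block mean.
    table = LUT_DQP_TABLES[idx]
    band = (1 << bit_depth) // len(table)
    return table[min(int(mean_value) // band, len(table) - 1)]

def edge_delta_qps(mean_q, mean_p, idx1, idx2):
    # Offsets for the Q-side and P-side blocks of one edge, as in equation (12).
    return lut_dqp(mean_q, idx1), lut_dqp(mean_p, idx2)
```

With these illustrative tables, two blocks with the same mean of 512 receive different offsets when different tables are selected for them.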

In another example of this disclosure, video encoder 20 and video decoder 30 may be configured to apply spatially varying quantization with finer block granularity. In some examples, video encoder 20 may be configured to split the currently coded block Cb into sub-partitions, each of which is processed independently according to equations 2, 3, and 4 above. Once the reconstructed signal r2 has been produced for each partition, the partitions together form the r2(Cb) data that is further processed as shown in equation (5) above.

At video decoder 30, certain coding tools (e.g., deblocking) are modified to reflect this partitioning, even though it is not provided by the CU partitioning. For example, deblocking is invoked to filter these virtual block edges in addition to the currently specified TU and PU edges.

In some examples, information about the finer granularity of the block partitioning may be signaled in syntax elements of the bitstream (e.g., in the PPS, SPS, or slice header) and provided to the decoder as side information.

In some examples, the constraint on the maximum QP value, including δQP or chroma QP offset values (e.g., as enforced by a clipping process), may be removed or extended to support a wider deviation of the QPe parameter from QPb in a video coding architecture similar to HEVC.

The techniques of this disclosure described above may provide the following advantages over other techniques. They may avoid δQP signaling, thereby inherently yielding a bitrate reduction of a few percent compared with δQP-based methods of supporting HDR/WCG video data.

In contrast to the techniques in "De-quantization and scaling for next generation containers" (J. Zhao, A. Segall, S.-H. Kim, K. Misra (Sharp), JVET document B0054, January 2016), the techniques of this disclosure described above allow the same scaling to be applied to all transform coefficients of t(Cb).

Compared with the techniques in U.S. Patent Application No. 15/595,793, the techniques of this disclosure described above may provide a more accurate estimate of local brightness, because decoded values provide a better estimate than predicted samples.

The techniques of this disclosure described above may allow finer granularity in δQP derivation and application without the increase in signaling overhead associated with δQP-based solutions.

Compared with the transform-scaling-based designs of "De-quantization and scaling for next generation containers" and U.S. Patent Application No. 15/595,793, the techniques of this disclosure described above have a simpler implementation design.

FIG. 11 is a flowchart illustrating an example encoding method. Video encoder 20, including quantization unit 54, may be configured to perform the techniques of FIG. 11.

In one example of this disclosure, video encoder 20 may be configured to determine a base quantization parameter for a block of video data (1100), and to determine a quantization parameter offset for the block of video data based on a statistic associated with the block of video data (1102). Video encoder 20 may be further configured to add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter (1104), and to encode the block of video data using the effective quantization parameter and the base quantization parameter (1106). In one example, the base quantization parameter is the same for all blocks of video data. In one example, the sample values of the video data are defined by a high dynamic range video data color format.

In another example of this disclosure, to encode the block of video data, video encoder 20 may be further configured to predict the block to produce residual samples, transform the residual samples to create transform coefficients, quantize the transform coefficients with the effective quantization parameter, inverse quantize the quantized transform coefficients with the effective quantization parameter to produce distorted transform coefficients, inverse transform the distorted transform coefficients to produce distorted residual samples, transform the distorted residual samples, and quantize the transformed distorted residual samples using the base quantization parameter.

In another example of this disclosure, to determine the quantization parameter offset, video encoder 20 may be further configured to determine the quantization parameter offset from a lookup table.

FIG. 12 is a flowchart illustrating an example decoding method. Video decoder 30, including inverse quantization unit 76, QPe estimation unit 84, and filter unit 88, may be configured to perform the techniques of FIG. 12.

In one example of this disclosure, video decoder 30 may be configured to receive an encoded block of video data that has been encoded using an effective quantization parameter and a base quantization parameter, where the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter (1200). Video decoder 30 may be further configured to determine the base quantization parameter used to encode the encoded block of video data (1202), and to decode the encoded block of video data using the base quantization parameter to create a decoded block of video data (1204). Video decoder 30 may be further configured to determine an estimate of the quantization parameter offset for the decoded block of video data based on a statistic associated with the decoded block of video data (1206), and to add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter (1208). Video decoder 30 may be further configured to perform one or more filtering operations on the decoded block of video data according to the estimate of the effective quantization parameter (1210). In one example, the base quantization parameter is the same for all blocks of video data. In another example, the sample values of the video data are defined by a high dynamic range video data color format.

In another example of this disclosure, to determine the base quantization parameter, video decoder 30 may be further configured to receive a base quantization parameter syntax element in the encoded video bitstream, the value of the base quantization parameter syntax element indicating the base quantization parameter.

In another example of this disclosure, to decode the block of video data, video decoder 30 may be further configured to entropy decode the encoded block of video data to determine quantized transform coefficients, inverse quantize the quantized transform coefficients using the base quantization parameter to create transform coefficients, inverse transform the transform coefficients to create residual values, and perform a prediction process on the residual values to create the decoded block of video data.

In another example of this disclosure, to determine the estimate of the quantization parameter offset for the decoded block of video data, video decoder 30 may be further configured to determine an average of the sample values of the decoded block of video data, and to use that average to determine the estimate of the quantization parameter offset for the decoded block of video data.

In another example of this disclosure, to determine the estimate of the quantization parameter offset, video decoder 30 may be further configured to determine the estimate from a lookup table, where the average of the sample values is the input to the lookup table.

In another example of this disclosure, video decoder 30 may be further configured to determine the lookup table from among a plurality of lookup tables.

In another example of this disclosure, to perform the one or more filtering operations on the decoded block of video data, video decoder 30 may be further configured to apply a deblocking filter to the decoded block of video data using the effective quantization parameter.

In another example of this disclosure, to perform the one or more filtering operations on the decoded block of video data, video decoder 30 may be further configured to apply a bilateral filter to the decoded block of video data using the effective quantization parameter.

Certain aspects of this disclosure have been described, for purposes of illustration, with respect to HEVC, extensions of HEVC, and the example standards JEM and VVC. However, the techniques described in this disclosure may be useful for other video coding processes, including other standard or proprietary video coding processes not yet developed.

A video coder, as described in this disclosure, may refer to a video encoder or a video decoder. Similarly, a video coding unit may refer to a video encoder or a video decoder. Likewise, video coding may refer to video encoding or video decoding, as applicable.

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, or may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently rather than sequentially, e.g., through multi-threaded processing, interrupt processing, or multiple processors.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some examples, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims (34)

1. A method of decoding video data, the method comprising:
receiving an encoded block of the video data;
determining a base quantization parameter used to encode the encoded block of the video data;
decoding the encoded block of the video data using the base quantization parameter to create a decoded block of video data;
determining an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data;
adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and
performing one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

2. The method of claim 1, wherein the base quantization parameter is the same for all of the blocks of the video data.

3. The method of claim 1, wherein sample values of the video data are defined by a high dynamic range video data color format.

4. The method of claim 1, wherein determining the base quantization parameter comprises:
receiving a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.

5. The method of claim 1, wherein decoding the encoded block of the video data comprises:
entropy decoding the encoded block of the video data to determine quantized transform coefficients;
inverse quantizing the quantized transform coefficients using the base quantization parameter to create transform coefficients;
inverse transforming the transform coefficients to create residual values; and
performing a prediction process on the residual values to create the decoded block of the video data.

6. The method of claim 1, wherein determining the estimate of the quantization parameter offset for the decoded block of the video data comprises:
determining an average of sample values of the decoded block of the video data; and
determining the estimate of the quantization parameter offset for the decoded block of the video data using the average of the sample values of the decoded block of the video data.

7. The method of claim 6, wherein determining the estimate of the quantization parameter offset comprises:
determining the estimate of the quantization parameter offset from a lookup table, wherein the average of the sample values is an input to the lookup table.

8. The method of claim 7, further comprising:
determining the lookup table from among a plurality of lookup tables.

9. The method of claim 1, wherein performing the one or more filtering operations on the decoded block of the video data comprises:
applying a deblocking filter to the decoded block of video data using the effective quantization parameter.

10. The method of claim 1, wherein performing the one or more filtering operations on the decoded block of the video data comprises:
applying a bilateral filter to the decoded block of video data using the effective quantization parameter.

11. A method of encoding video data, the method comprising:
determining a base quantization parameter for a block of the video data;
determining a quantization parameter offset for the block of the video data based on content of the block of the video data;
adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and
encoding the block of the video data using the effective quantization parameter and the base quantization parameter.

12. The method of claim 11, wherein the base quantization parameter is the same for all of the blocks of the video data.

13. The method of claim 11, wherein sample values of the video data are defined by a high dynamic range video data color format.

14. The method of claim 11, wherein encoding the block of the video data comprises:
predicting the block to create residual samples;
transforming the residual samples to create transform coefficients;
quantizing the transform coefficients with the effective quantization parameter;
inverse quantizing the quantized transform coefficients with the effective quantization parameter to create distorted transform coefficients;
inverse transforming the distorted transform coefficients to create distorted residual samples;
transforming the distorted residual samples; and
quantizing the transformed distorted residual samples using the base quantization parameter.

15. The method of claim 11, wherein determining the quantization parameter offset comprises determining the quantization parameter offset from a lookup table.

16. An apparatus configured to decode video data, the apparatus comprising:
a memory configured to store an encoded block of the video data; and
one or more processors in communication with the memory, the one or more processors configured to:
receive the encoded block of the video data;
determine a base quantization parameter used to encode the encoded block of the video data;
decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data;
determine an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data;
add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and
perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

17. The apparatus of claim 16, wherein the base quantization parameter is the same for all of the blocks of the video data.

18. The apparatus of claim 16, wherein sample values of the video data are defined by a high dynamic range video data color format.

19. The apparatus of claim 16, wherein to determine the base quantization parameter, the one or more processors are further configured to:
receive a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.

20. The apparatus of claim 16, wherein to decode the block of the video data, the one or more processors are further configured to:
entropy decode the encoded block of the video data to determine quantized transform coefficients;
inverse quantize the quantized transform coefficients using the base quantization parameter to create transform coefficients;
inverse transform the transform coefficients to create residual values; and
perform a prediction process on the residual values to create the decoded block of the video data.

21. The apparatus of claim 16, wherein to determine the estimate of the quantization parameter offset for the decoded block of the video data, the one or more processors are further configured to:
determine an average of sample values of the decoded block of the video data; and
determine the estimate of the quantization parameter offset for the decoded block of the video data using the average of the sample values of the decoded block of the video data.

22. The apparatus of claim 21, wherein to determine the estimate of the quantization parameter offset, the one or more processors are further configured to:
determine the estimate of the quantization parameter offset from a lookup table, wherein the average of the sample values is an input to the lookup table.

23. The apparatus of claim 21, wherein the one or more processors are further configured to:
determine the lookup table from among a plurality of lookup tables.

24. The apparatus of claim 16, wherein to perform the one or more filtering operations on the decoded block of the video data, the one or more processors are further configured to:
apply a deblocking filter to the decoded block of video data using the effective quantization parameter.

25. The apparatus of claim 16, wherein to perform the one or more filtering operations on the decoded block of the video data, the one or more processors are further configured to:
apply a bilateral filter to the decoded block of video data using the effective quantization parameter.

26. An apparatus configured to encode video data, the apparatus comprising:
a memory configured to store a block of the video data; and
one or more processors in communication with the memory, the one or more processors configured to:
determine a base quantization parameter for the block of the video data;
determine a quantization parameter offset for the block of the video data based on content of the block of the video data;
add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and
encode the block of the video data using the effective quantization parameter and the base quantization parameter.

27. The apparatus of claim 26, wherein the base quantization parameter is the same for all of the blocks of the video data.

28. The apparatus of claim 26, wherein sample values of the video data are defined by a high dynamic range video data color format.

29. The apparatus of claim 26, wherein to encode the block of the video data, the one or more processors are further configured to:
predict the block to create residual samples;
transform the residual samples to create transform coefficients;
quantize the transform coefficients with the effective quantization parameter;
inverse quantize the quantized transform coefficients with the effective quantization parameter to create distorted transform coefficients;
inverse transform the distorted transform coefficients to create distorted residual samples;
transform the distorted residual samples; and
quantize the transformed distorted residual samples using the base quantization parameter.

30. The apparatus of claim 26, wherein to determine the quantization parameter offset, the one or more processors are further configured to determine the quantization parameter offset from a lookup table.

31. An apparatus configured to decode video data, the apparatus comprising:
means for receiving an encoded block of the video data;
means for determining a base quantization parameter used to encode the encoded block of the video data;
means for decoding the encoded block of the video data using the base quantization parameter to create a decoded block of video data;
means for determining an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data;
means for adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and
means for performing one or more filtering operations on the decoded block of the video data as a function of the estimate of the effective quantization parameter.

32. An apparatus configured to encode video data, the apparatus comprising:
means for determining a base quantization parameter for a block of the video data;
means for determining a quantization parameter offset for the block of the video data based on content of the block of the video data;
means for adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and
means for encoding the block of the video data using the effective quantization parameter and the base quantization parameter.

33. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to:
receive an encoded block of video data;
determine a base quantization parameter used to encode the encoded block of the video data;
decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data;
determine an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data;
add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and
perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

34. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to:
determine a base quantization parameter for a block of video data;
determine a quantization parameter offset for the block of the video data based on content of the block of the video data;
add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and
encode the block of the video data using the effective quantization parameter and the base quantization parameter.
HK62020009946.8A 2017-10-12 2018-10-10 Video coding with content adaptive spatially varying quantization HK40020459B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62/571,732 2017-10-12
US16/155,344 2018-10-09

Publications (2)

Publication Number Publication Date
HK40020459A HK40020459A (en) 2020-10-23
HK40020459B HK40020459B (en) 2024-11-15
