HK1217393B - Adaptive reshaping for layered coding of enhanced dynamic range signals - Google Patents
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 61/836,044, filed June 17, 2013; U.S. Provisional Patent Application No. 61/951,914, filed March 12, 2014; and U.S. Provisional Patent Application No. 62/002,631, filed May 23, 2014, the entire contents of each of which are hereby incorporated by reference.
This application is also related to International Application No. PCT/US2014/031716, filed on March 25, 2014, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates generally to video images. More particularly, embodiments of the present invention relate to the adaptive reshaping of images with high or enhanced dynamic range for layered encoding and decoding.
Background Art
As used herein, the term "dynamic range" (DR) may relate to the capability of the human psychovisual system (HVS) to perceive a range of intensity (e.g., illuminance, brightness) in an image, e.g., from the darkest darks (blacks) to the brightest brights (whites). In this sense, DR relates to "scene-referred" intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to "display-referred" intensity. Unless a particular sense is expressly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.
As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans some 14-15 orders of magnitude of the human visual system (HVS). For example, well-adapted humans with essentially normal vision (e.g., in a statistical, biometric, or ophthalmological sense) have an intensity range spanning approximately 15 orders of magnitude. Adapted humans may perceive dim light sources of as few as a mere handful of photons. Yet, these same humans may perceive the nearly painfully brilliant intensity of the noonday sun in the desert, sea, or snow (or even glance at the sun, albeit briefly, to prevent injury). This span, however, is available to "adapted" humans, e.g., those whose HVS has a time period in which to reset and adjust.
In contrast, the DR over which a human may simultaneously perceive an extensive breadth in intensity range is somewhat truncated in relation to HDR. As used herein, the terms enhanced dynamic range (EDR) or visual dynamic range (VDR) may individually or interchangeably relate to the DR that is simultaneously perceivable by an HVS. As used herein, EDR may relate to a DR that spans 5 to 6 orders of magnitude. Thus, while perhaps somewhat narrower in relation to true scene-referred HDR, EDR nonetheless represents a wide DR breadth.
In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr), wherein each color component is represented by a precision of n bits per pixel (e.g., n = 8). Although luminance dynamic range and bit depth are not equivalent entities, they are often related. Images where n ≤ 8 (e.g., color 24-bit JPEG images) are considered images of standard dynamic range, while images where n > 8 may be considered images of enhanced dynamic range. EDR and HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
A video signal may be characterized by multiple parameters, such as bit depth, color space, color gamut, and resolution. Modern televisions and video playback devices (e.g., Blu-ray players) support a variety of resolutions, including standard definition (e.g., 720×480i) and high definition (HD) (e.g., 1920×1080p). Ultra high definition (UHD) is a next-generation resolution format with at least a 3,840×2,160 resolution (referred to as 4K UHD) and options to go as high as 7,680×4,320 (referred to as 8K UHD). Ultra high definition may also be referred to as Ultra HD, UHDTV, or super high vision. As used herein, UHD denotes any resolution higher than HD resolution.
To support backward compatibility with legacy 8-bit playback devices, as well as new HDR or UHD coding and display technologies, multiple formats may be used to deliver UHD and HDR (or EDR) video data from an upstream device to downstream devices. Given an EDR stream, some decoders may use a set of 8-bit layers to reconstruct an HD SDR or EDR version of the content. Advanced decoders may use a second set of layers, coded at a higher bit depth than the traditional 8 bits, to reconstruct a UHD EDR version of the content and render it on more capable displays. As appreciated by the inventors here, improved techniques for the coding and distribution of EDR video are desirable.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
FIG. 1A depicts an example framework for EDR layered encoding according to an embodiment of the present invention;
FIG. 1B depicts an example framework for EDR layered decoding according to an embodiment of the present invention;
FIG. 2 depicts an example EDR signal reshaping function based on a power function, wherein the function parameter α is determined according to an embodiment of the present invention;
FIG. 3 depicts an example process for determining the optimal exponent of a forward reshaping function for an EDR input, according to an embodiment of the present invention;
FIG. 4 depicts an example process for determining a forward mapping of EDR codewords according to an embodiment of the present invention;
FIG. 5 depicts an example of an intermediate mapping of input EDR codewords (v_c) to block-based scaling factors (k(v_c)) according to an embodiment of the present invention;
FIG. 6 depicts an example mapping of input EDR codewords to final output reshaped symbols according to an embodiment of the present invention;
FIG. 7 depicts an example of a reverse mapping computed according to an embodiment of the present invention;
FIG. 8A and FIG. 8B depict examples of chroma range scaling according to an embodiment of the present invention; and
FIG. 9 depicts an example of an encoding and decoding pipeline according to an embodiment of the present invention.
DETAILED DESCRIPTION
Adaptive reshaping techniques for the layered coding of video images with enhanced dynamic range (EDR) are described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
Overview
Example embodiments described herein relate to the adaptive reshaping of video images with high or enhanced dynamic range for efficient layered coding. An encoder receives an input enhanced dynamic range (EDR) image to be coded in a layered representation. The input image may be gamma-coded or perceptually coded using a bit-depth format not supported by one or more of the available video encoders. The input image is remapped to one or more quantization layers to generate output codewords suitable for compression using the available video encoders.
In one embodiment, the remapping is based on a power function that uses a single function parameter. Techniques are presented to determine the optimal function parameter based on computing a block-based complexity measure for each block in the input EDR image and then evaluating the amount of quantization-induced distortion in the quantized image.
In another embodiment, the optimal mapping is generated using a block-based complexity metric (such as the standard deviation) and a block-based linear quantization model, wherein a separate optimal quantizer scaler is determined for each image block. The separate optimal scalers are combined to determine an envelope slope for each input codeword, and an optimal forward mapping function between input and output codewords is determined based on the envelope slopes. The reverse mapping function may be transmitted to a decoder as a look-up table, or it may be approximated using a piecewise polynomial approximation.
In another embodiment, given a reverse mapping look-up table, the inverse mapping function is approximated using piecewise polynomial approximation techniques.
In a decoder, the coded bitstream layers are decoded to generate decoded video layers, which are recombined to generate a single decoded signal. Then, given received parameters that define the encoder's reshaping or mapping functions, the decoded signal is inversely mapped to generate an estimate of the original EDR signal that was transmitted from the encoder to the decoder.
In another embodiment, the chroma color components of the input video signal may be translated so that the coordinates of a desired white point are shifted approximately to the center of the translated chroma range.
Example Framework for Video Signal Reshaping and Layered Decomposition
Layered Encoding and Decoding
Existing displays and playback devices, such as HDTVs, set-top boxes, or Blu-ray players, typically support signals of up to 1080p HD resolution (e.g., 1920×1080 at 60 frames per second). For consumer applications, such signals are now typically compressed using a bit depth of 8 bits per pixel per color component in a luma-chroma color format (e.g., YCbCr or YUV 4:2:0), where typically the chroma components have a lower resolution than the luma component. Because of the 8-bit depth and the corresponding low dynamic range, such signals are typically referred to as signals with standard dynamic range (SDR). As new television standards, such as ultra high definition (UHD), are being developed, it may be desirable to encode signals with enhanced resolution and/or enhanced dynamic range.
Video images are typically gamma-encoded to compensate for properties of the human visual system. For example, ITU-R Rec. BT.2020 defines the recommended gamma encoding of UHDTV signals. For EDR images, perceptual quantization (PQ) may provide a better alternative to traditional gamma coding. The human visual system responds to increasing light levels in a very non-linear way. A human's ability to see a stimulus is affected by the luminance of that stimulus, the size of the stimulus, the spatial frequencies making up the stimulus, and the luminance level that the eyes have adapted to at the particular moment one is viewing the stimulus. A perceptual quantizer function maps linear input gray levels to output gray levels that better match the contrast sensitivity thresholds in the human visual system. An example of a PQ mapping function is described in PCT Application Ser. No. PCT/US2012/068212, filed December 6, 2012, by J. S. Miller et al., titled "Perceptual luminance nonlinearity-based image data exchange across different display capabilities" (to be referred to as the '212 application), which is incorporated herein by reference in its entirety. In that application, given a fixed stimulus size, for every luminance level (i.e., stimulus level), the minimum visible contrast step at that luminance level is selected according to the most sensitive adaptation level and the most sensitive spatial frequency (according to HVS models). Compared to the traditional gamma curve, which represents the response curve of a physical cathode ray tube (CRT) device and coincidentally may bear a very rough similarity to the way the human visual system responds, the PQ curve, as determined by the '212 application, imitates the true visual response of the human visual system using a relatively simple functional model.
In U.S. Provisional Application Ser. No. 61/805,388, filed on March 26, 2013, titled "Encoding perceptually-quantized video content in multi-layer VDR coding," which is incorporated herein by reference in its entirety and from now on will be referred to as the '388 application (also filed on March 25, 2014 as PCT/US2014/031716), the inventors described image reshaping techniques for the efficient coding and transmission of PQ-coded EDR image data using a dual-layer encoder. The present application expands upon the '388 application by describing novel mapping or reshaping techniques applicable to the coding of EDR data using both single-layer and multi-layer encoders.
FIG. 1A depicts an example framework for EDR layered encoding according to an embodiment of the present invention. The input signal (102) comprises a sequence of video frames with EDR pixel values that may have been gamma- or PQ-encoded. For a total of L coding layers, the system includes at least one base layer (BL) video encoder (120-0) and may include one or more enhancement layer (EL) video encoders (120-1 up to 120-(L-1)). For example, for L = 2, the system comprises a dual-layer encoder. The video encoders (120) may all be the same or different, implementing any of the known or future coding formats for video compression, such as MPEG-2; MPEG-4, part 2; H.264 (or AVC); H.265 (or HEVC); and the like. In addition, a video encoder in one layer may support a different bit depth than a video encoder in another layer. For example, without loss of generality, embodiments may include any of the following configurations:
● A single-layer HEVC encoder supporting a bit depth of at least 10 bits, but preferably 12 bits or more;
● A dual-layer encoder, where both encoders may encode using the same format (say, H.264), and both encoders support either the same or different bit depths (say, 8 bits and 10 bits);
● A dual-layer encoder, where the two encoders may encode in different coding formats, and each one may support a different bit depth (say, 8 bits, and 10 bits or more); or
● A multi-layer encoder, where at least one encoder is an 8-bit MPEG-2 encoder and at least one other encoder is an HEVC or H.264 encoder.
The video encoders (120) may all be implemented by a single processor or by one or more processors.
According to an embodiment, a signal reshaping module (110) quantizes the input EDR signal (denoted as v) into a signal s (112) that better conforms to the characteristics of the video encoders (120), such as the maximum supported bit depth. As used herein, the terms reshaping, quantization, and (forward) mapping denote equivalent functionality of mapping an input signal from a first dynamic range into an output signal of a second dynamic range, typically lower than the first dynamic range, and may be used interchangeably.
Let B_l denote the bit depth used by the layer-l video encoder (120-l, l = 0, 1, 2, ..., L-1); then each layer can support up to N_l = 2^B_l input codewords, for a total of N_T = N_0 + N_1 + ... + N_(L-1) codewords. For example, for L = 1 (a single layer) and B_0 = 10, there are 2^10 = 1024 quantized codewords. For L = 2 (dual layer) and B_0 = B_1 = 8, there are 2^8 + 2^8 = 512 quantized codewords. For L = 2 and B_0 = 10, B_1 = 8, there are a total of 2^10 + 2^8 = 1280 quantized codewords. The system can thus accommodate any combination of video coding standards, each operating at its own bit depth.
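As a sketch of this codeword budget, the per-layer and total codeword counts can be computed as follows (illustrative helper code, not part of the patent):

```python
def codeword_budget(bit_depths):
    """Per-layer codeword counts N_l = 2**B_l and their total N_T."""
    per_layer = [2 ** b for b in bit_depths]
    return per_layer, sum(per_layer)

# The three configurations from the text:
print(codeword_budget([10]))      # single layer, B0 = 10
print(codeword_budget([8, 8]))    # dual layer, B0 = B1 = 8
print(codeword_budget([10, 8]))   # dual layer, B0 = 10, B1 = 8
```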
Let s = f(v) denote the signal reshaping/quantization function (110). Examples of such functions will be described in more detail later on. The parameters that identify the reshaping function may be included in a metadata signal (119). In some embodiments, the metadata (119) may be coded by a metadata encoder (125), and the coded metadata (127) may be signaled to a decoder (such as the decoder depicted in FIG. 1B) for proper inverse quantization and decoding. In another embodiment, the signal reshaping (110) may comprise a family of signal reshaping functions, so that a separate reshaping function is used for one or more of the layers, or for one or more of the chroma components within a layer. For example, in an embodiment, the signal reshaping function for the base layer (l = 0) may be a linear function, while the signal reshaping function for the first enhancement layer (l = 1) may comprise a non-linear or a piecewise-linear function.
Layer Decomposition
In an embodiment, let the pixel values of the quantized signal s (112) be divided into L segments, defined by segment borders {p_i, i = 0, 1, ..., L}, where p_0 typically denotes the smallest possible value of s (e.g., p_0 = 0), and

p_i = N_0 + N_1 + ... + N_(i-1), i = 1, 2, ..., L. (1)
For example, for L = 1, p_0 = 0 and p_1 = N_0. This module will encode all codewords into the base layer.
For L = 2, p_0 = 0, p_1 = N_0, and p_2 = N_0 + N_1. In an embodiment, pixels with codewords between {p_0, p_1} will be coded in layer 0, and pixels with codewords between {p_1, p_2} will be coded in layer 1. In general, given L layers, for each layer l, the s_l pixel values of that layer are coded as:
s_l = Clip3(s, p_l, p_(l+1) - 1) - p_l, l = 0, 1, 2, ..., L-1, (2)
where d = Clip3(s, a, b) denotes a clipping function, where d = s if a ≤ s ≤ b, d = a if s < a, and d = b if s > b.
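A minimal sketch of this layer decomposition (illustrative code, not from the patent; the segment borders follow equations (1) and (2)):

```python
def clip3(s, a, b):
    """d = Clip3(s, a, b): clamp s to the inclusive range [a, b]."""
    return max(a, min(s, b))

def layer_decompose(s, bit_depths):
    """Split a quantized codeword s into per-layer codewords s_l (eq. 2).

    The segment borders p_i are cumulative sums of N_l = 2**B_l (eq. 1)."""
    p = [0]
    for b in bit_depths:
        p.append(p[-1] + 2 ** b)
    return [clip3(s, p[l], p[l + 1] - 1) - p[l] for l in range(len(bit_depths))]

# Dual 8-bit layers (p = [0, 256, 512]): codeword 300 saturates layer 0
# at 255 and lands at 300 - 256 = 44 in layer 1.
print(layer_decompose(300, [8, 8]))  # → [255, 44]
```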
After the layer decomposition (115), in an embodiment, each layer s_l (117-l, l = 0, 1, ..., L-1) may be coded independently by a video encoder (120) to generate a compressed bitstream (122). As discussed in the '388 application, in some embodiments, the system depicted in FIG. 1A may be modified to also allow for inter-layer prediction. In such embodiments, a predictor may be used to estimate the pixel values of layer l based on the pixel values of layer l-1; then, instead of coding the pixel values of layer l directly, only the residual between the actual and predicted values is coded and transmitted.
In some embodiments, the coded bitstreams (122), the coded metadata (127), and other data (e.g., audio data) may be multiplexed into a single bitstream and transmitted to a decoder as a single multiplexed bitstream (not shown).
FIG. 1B depicts an example framework for EDR layered decoding according to an embodiment of the present invention. As depicted in FIG. 1B, after demultiplexing of the received bitstream, which may combine audio, video, and ancillary data (not shown), each of the received coded bitstreams (122) is fed to an array of video decoders (130). The decoders (130) correspond to the encoders (120) and generate one or more decoded video signals (132). Using a signal inverse reshaping and layer composer unit (140), the received layered signals are combined and inverse-reshaped to generate a signal (142) that represents an estimate of the original EDR signal (102). In an embodiment, the output EDR signal (142) may be generated as:

v̂ = f^(-1)(ŝ_0 + ŝ_1 + ... + ŝ_(L-1)), (3)
where f^(-1)() denotes the inverse (or a close approximation of the inverse) of the signal reshaping function (110), and ŝ_l denotes the reconstructed layer signals (132), representing very close approximations of the original s_l signals (117). As depicted in FIG. 1B, there is no inter-layer prediction among the received layers; however, as is known in the art of video coding, the system can readily be extended to decoders in which the signal is generated using received residual signals and inter-layer prediction.
EDR Signal Reshaping Using a Power Function
As described in the '388 application, for PQ-coded signals, in an embodiment, the signal reshaping function (110) may be expressed as:

s = round( (c_H - c_L) * ( (v - v_L) / (v_H - v_L) )^α + c_L ), (4)
where v_L and v_H denote the minimum and maximum values in the color channel under consideration of the input EDR signal (102), and c_L and c_H denote the corresponding minimum and maximum output values. For example, in an embodiment, c_L = 0 and c_H = p_L - 1, as defined in equation (1). The value of α is constant, but may be adapted and changed on a per-frame, per-scene, or other suitable basis. FIG. 2 depicts an example of the power reshaping function of equation (4) for α < 1. In an embodiment, α > 1 if the input (102) is PQ-coded; otherwise, if the input (102) is gamma-coded, then α < 1.
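A minimal sketch of the power-function reshaping of equation (4) (illustrative code; the round-half-up convention and the parameter names are assumptions, not taken from the patent):

```python
def forward_reshape(v, v_l, v_h, c_l, c_h, alpha):
    """Map an EDR value v in [v_l, v_h] to an output codeword in [c_l, c_h]
    via the power function of equation (4)."""
    t = (v - v_l) / (v_h - v_l)                        # normalize to [0, 1]
    return int((c_h - c_l) * t ** alpha + c_l + 0.5)   # round half up

# Map a mid-range EDR value into a 10-bit codeword range with alpha = 1.5
# (a hypothetical PQ-coded case, where alpha > 1).
print(forward_reshape(0.5, 0.0, 1.0, 0, 1023, 1.5))  # → 362
```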
In the '388 application, methods were described to determine the optimal value of α using block-complexity metrics (M_EDR) based on either the standard deviation of the pixels within a block or the difference between the maximum and minimum pixel values within a block. In addition to these metrics, other complexity measures may be applied, based on either the spatial or frequency characteristics of a block. For example, M_EDR may correspond to the variance of the pixel values in a block, the DC value of a block, or another function of its DCT coefficients or pixel values.
Consider a video scene comprising F video frames, each frame divided into N blocks (e.g., each block being 8×8 pixels). The image blocks may be overlapping, or, in a preferred embodiment, non-overlapping. FIG. 3 depicts a process for computing the optimal α based on a generalized block-complexity measure M_EDR(j, n) (e.g., without loss of generality, the standard deviation of the pixel values within a block).
As depicted in FIG. 3, after step (305), in which M_EDR(j, n) is computed for each block across all frames in the scene, in step (310) a set Φ_j is constructed, comprising all image blocks that satisfy a certain criterion (e.g., M_EDR(j, n) > T, where T is a pre-specified threshold, say T = 0).
Steps (315), (320), and (325) comprise a loop (327) of computations over various values of α_j within a predetermined range (e.g., MIN_α ≤ α_j ≤ MAX_α). For example, α_j may initially be set equal to 1, and may then be increased or decreased depending on how the original EDR video data were coded; it would be increased for PQ-coded data and decreased for gamma-coded data. In step (320), the input EDR data are quantized using the given α_j and equation (4), and a new metric M_LD(j, n, α_j) may be computed for each quantized block. In some embodiments, the complexity measure M_LD may be the same as the complexity measure M_EDR; in some other embodiments, the two complexity measures may be different. The more the input EDR data are quantized, the more the characteristics of the quantized signal (112) will change; ideally, the quantization (110) should distort the input as little as possible. In step (325), a measure of the distortion due to quantization may be applied to identify whether the selected α_j is optimal. For example, in an embodiment, if
M_LD(j, n, α_j) > T_σ for all blocks n in Φ_j, (5)

then α_j may be selected as optimal, where T_σ is another predetermined threshold (e.g., T_σ = 0).
After all the blocks in a scene have been quantized, in step (330) the overall optimal alpha value is selected. For example, in an embodiment, for α > 1, the overall optimal α is selected as the smallest among all the best α_j values. Similarly, for α < 1, the overall optimal α is selected as the largest among all the best α_j values.
In some embodiments, to adjust for quantization effects due to the lossy compression of the video encoders (120), the overall optimal alpha may be further adjusted (e.g., α = α + Δα, where Δα is negative when α > 1 and positive when α < 1). The same parameter-optimization process can readily be extended to other linear or non-linear quantization and reshaping functions characterized by more than one function parameter.
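The search of FIG. 3 can be sketched as follows (illustrative code; using the pixel standard deviation for both M_EDR and M_LD, and an assumed acceptance criterion - no textured block may collapse to zero complexity after quantization - which stands in for the patent's exact test in equation (5)):

```python
import statistics

def block_std(block):
    """Block complexity: standard deviation of the pixel values."""
    return statistics.pstdev(block)

def quantize(block, v_l, v_h, c_l, c_h, alpha):
    """Power-function reshaping of equation (4), applied per pixel."""
    return [int((c_h - c_l) * ((v - v_l) / (v_h - v_l)) ** alpha + c_l + 0.5)
            for v in block]

def best_alpha(blocks, v_l, v_h, c_l, c_h, candidates, t=0.0, t_sigma=0.0):
    """Return the candidate alphas under which every block with
    M_EDR > t keeps M_LD > t_sigma after quantization."""
    phi = [b for b in blocks if block_std(b) > t]        # step 310
    good = []
    for alpha in candidates:                             # loop 327
        q = [quantize(b, v_l, v_h, c_l, c_h, alpha) for b in phi]
        if all(block_std(qb) > t_sigma for qb in q):     # step 325
            good.append(alpha)
    return good

# A low-contrast dark block: alpha >= 1 crushes it to a single 8-bit
# codeword, while alpha = 0.5 keeps its texture (the gamma-coded case).
blocks = [[0.010, 0.011, 0.012, 0.013]]
print(best_alpha(blocks, 0.0, 1.0, 0, 255, [0.5, 1.0, 2.0]))  # → [0.5]
```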
Table 1 provides, in pseudocode, an example algorithm according to an embodiment for reshaping gamma-encoded values based on the process depicted in FIG. 3, where the desired α < 1.

Table 1 — Method for computing the optimal α for gamma-encoded EDR signals
In the decoder, the inverse reshaping operation (140) can be expressed as:

In some embodiments, the power function of equation (4) may be expressed as a piecewise linear polynomial, a piecewise second- or third-order polynomial, or a piecewise B-spline. In such implementations, it is recommended that smoothness and monotonicity constraints between the segments be enforced to avoid quantization-related artifacts. Search methods similar to those described earlier, or to those described in the '388 application, may then be applied.
Block-adaptive reshaping function

Consider again a video scene comprising F video frames, each frame divided into N blocks (e.g., each block being 8×8 pixels). The image blocks may overlap or, in a preferred embodiment, not overlap. FIG. 4 depicts an example data flow, according to an embodiment, for mapping input EDR codewords (102) into reshaped output values (112).

As noted earlier, a block-based complexity measure (M_EDR) may be defined. In an embodiment, in step (405), without loss of generality, consider a complexity measure computed from the standard deviation (std) of the pixels in a block. Note that checking whether the standard deviation of block n in frame j (j = 1, 2, …, F) is zero (e.g., M_EDR(j, n) = std_jn = 0) is equivalent to checking whether the difference between the maximum value in the block (e.g., B(j, n)) and the minimum value in the block (e.g., A(j, n)) is zero.

Assuming the reshaping function (110) is constructed from piecewise linear segments, then for input v_i ∈ [A(j, n), B(j, n)] the local quantizer can be expressed as:

where k(j, n) is a scaling factor that adjusts the slope of the quantizer at the n-th block of the j-th frame.
In step (410), let Φ denote the set of all blocks whose block metric satisfies a certain criterion. For example, let Φ denote the set of all blocks with non-zero standard deviation before quantization, or

Φ = {(j, n) | B(j, n) − A(j, n) > 0},   (7)

In an embodiment, given a threshold T_th (where, without loss of generality, T_th ≥ 1) and the minimum and maximum pixel values of a given block, in step (415) the optimal k(j, n) may be derived as follows:

Given the data {A(j, n), B(j, n), k(j, n)}, each triplet indicates that within the segment [A(j, n), B(j, n)] the quantizer should have a slope of at least k(j, n). Since a particular EDR codeword (e.g., v_c) may belong to multiple [A(j, n), B(j, n)] segments, for each EDR codeword v_c the maximum required slope must be determined so that all blocks are satisfied.
Let θ(v_c) denote the set of all segments, across all blocks, that cover codeword v_c, or

θ(v_c) = {(j, n) | A(j, n) ≤ v_c ≤ B(j, n), (j, n) ∈ Φ}.   (9)

Then, in step (420), the required slope at codeword v_c may be determined as the envelope of all the optimal slopes within the blocks belonging to the set θ(v_c), or
In step (425), let the sum of all such envelope slopes be denoted as:

Then, for each codeword v_c, without loss of generality, in step (430) a cumulative slope function may be defined as:

To guarantee that all codewords are mapped within the [c_L, c_H] bounds, the following equation may be used to compute the mapping from codewords v_c to values s_i:
Given equation (13), a forward-mapping lookup table (e.g., one mapping each input v_c value to its output s_i value) can be computed. In an embodiment, this table may be stored with the data or sent to the decoder as part of the image metadata (119) so that the decoder can reconstruct the inverse-mapping process.
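Equations (10)-(13) appear only as images in the source, so the following sketch assumes the natural forms implied by the surrounding text: a per-codeword maximum (envelope) of the block slopes, a cumulative sum, and a normalization of the cumulative slope into [c_L, c_H]. The inputs `blocks_minmax` and `k` are hypothetical names for the {A(j, n), B(j, n)} ranges and the k(j, n) slopes.

```python
import numpy as np

def build_forward_lut(blocks_minmax, k, num_codewords=256, c_lo=0, c_hi=255):
    """Sketch of the FIG. 4 flow: per-block required slopes -> per-codeword
    envelope slope k(v_c) -> cumulative slope K(v_c) -> mapping s_i."""
    env = np.zeros(num_codewords)
    for (a, b), slope in zip(blocks_minmax, k):
        vc = np.arange(a, b + 1)
        env[vc] = np.maximum(env[vc], slope)   # envelope over covering segments
    K = np.cumsum(env)                         # cumulative slope function
    if K[-1] == 0:
        return np.full(num_codewords, float(c_lo))
    s = c_lo + (c_hi - c_lo) * K / K[-1]       # assumed normalization into [c_L, c_H]
    return np.round(s)
```

Because the envelope slopes are non-negative, the resulting table is monotonically non-decreasing, which is what makes the inverse reshaping of the next subsection well defined.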
In an example embodiment, Table 2 summarizes, in pseudocode, the mapping process depicted in FIG. 4.

Table 2
In some embodiments, an alternative function may be used in place of equation (12) to compute the cumulative slope function K(v_c). For example, the k(v_c) values may be filtered or weighted before being summed, as in the following equation:

where w_i denotes the filter coefficients or predetermined weights of a filter with (2u + 1) taps (e.g., u = 2, with filter coefficients corresponding to those of a low-pass filter).
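As a sketch of this weighted variant, the envelope slopes can be low-pass filtered before accumulation; the uniform moving-average weights below are an assumption, standing in for whatever (2u + 1)-tap filter an implementation chooses.

```python
import numpy as np

def smoothed_cumulative(env, u=2):
    """Filter the envelope slopes k(v_c) with a (2u+1)-tap moving average
    (assumed low-pass weights), then accumulate as in equation (12)."""
    w = np.ones(2 * u + 1) / (2 * u + 1)
    filtered = np.convolve(env, w, mode="same")
    return np.cumsum(filtered)
```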
FIG. 5 depicts an example plot of envelope-slope values k(v_c) versus codeword v_c for a test sequence of frames, given a [c_L, c_H] range of [0, 255].

Given the data depicted in FIG. 5, FIG. 6 depicts an example of the resulting codeword mapping.
Inverse reshaping
In the decoder, given the values of equation (13), the inverse quantizer or reshaping function (140) may be determined as follows:

For each decoded codeword, let

then
In other words, for a given codeword in the quantized domain, the corresponding estimated EDR codeword is constructed by first grouping all pixels that have that quantized value, finding the corresponding EDR codewords, and then averaging all the collected EDR codewords. From equation (16), a backward lookup table can be constructed and either stored with the data or sent to the decoder, for example, as part of the metadata (119).
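The group-and-average construction just described can be sketched directly; `v` and `s` are hypothetical arrays holding the input EDR codewords and their forward-mapped values for the pixels of a scene.

```python
import numpy as np

def build_backward_lut(v, s, num_out_codewords=256):
    """For every quantized codeword, average the EDR codewords v that were
    mapped to it, per the averaging construction of equation (16)."""
    lut = np.zeros(num_out_codewords)
    for code in range(num_out_codewords):
        members = v[s == code]        # all EDR codewords mapped to this code
        if members.size:
            lut[code] = members.mean()
    return lut
```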
Given the data depicted in FIG. 6, FIG. 7 depicts an example of the inverse mapping or inverse reshaping (140).
In an embodiment, the mapping defined by this relationship may be sent to the decoder using metadata (119, 127). Such an approach may be too expensive in terms of bit-rate overhead. For example, for 8-bit data, the lookup table may comprise 255 entries to be transmitted every time there is a scene change. In other embodiments, the inverse mapping may be converted into a piecewise polynomial approximation. Such polynomials may typically comprise first-order and second-order polynomials, although higher-order polynomials or B-splines may also be used. The number of polynomials approximating the LUT for a given layer l (l = 0, 1, …, L−1) may vary according to the available bandwidth and processing complexity. In an embodiment, the base layer uses up to eight segments, while the enhancement layer uses a single segment.
Table 3 depicts an example algorithm, according to an embodiment, for approximating the decoder LUT using second-order polynomials.

Table 3 — LUT approximation using second-order polynomials
As depicted in Table 3, in an embodiment, the inputs to the approximation process include: the original lookup table (computed, say, using equation (16)); the acceptable error tolerance between the values in the LUT and those produced by the polynomial approximation; the number of available codewords; and the first codeword value (see equation (1)). The output may include the end points (also referred to as pivot points) of each polynomial, together with the polynomial coefficients.

Starting from the first pivot point, the algorithm attempts to fit the largest possible range of available codewords, without loss of generality, using a second-order polynomial. Any known polynomial-fitting algorithm may be used, such as mean-square-error polynomial fitting.

When the computed maximum error exceeds the input tolerance, the parameters of the best polynomial so far are stored, and the search for a new polynomial begins, until the entire LUT has been mapped.

In some embodiments, the number of polynomials that may be used to approximate the LUT may be constrained to a fixed value, say eight. In that case, a higher error tolerance may be incorporated into the algorithm.

The method of Table 3 may also readily be modified to accommodate other approximating functions, such as higher-order polynomials, B-splines, or combinations of approximating functions.
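A minimal sketch of this greedy segmentation (not the exact Table 3 algorithm, whose pseudocode is not reproduced in the text): grow each segment while a least-squares polynomial stays within the tolerance, then record a pivot and start a new segment.

```python
import numpy as np

def fit_piecewise_poly(lut, tol=0.5, order=2):
    """Greedy piecewise polynomial approximation of a backward LUT.
    Returns pivot indices and, per segment, np.polyfit coefficients."""
    n = len(lut)
    pivots, coeffs = [0], []
    start = 0
    while start < n - 1:
        good_end, good_c = min(start + 2, n), None
        end = good_end
        while end <= n:
            x = np.arange(start, end)
            deg = min(order, end - start - 1)
            c = np.polyfit(x, lut[start:end], deg)
            err = np.max(np.abs(np.polyval(c, x) - lut[start:end]))
            if err > tol and good_c is not None:
                break            # previous segment was the largest within tol
            good_end, good_c = end, c   # accept (even minimal over-tol fit)
            end += 1
        coeffs.append(good_c)
        pivots.append(good_end - 1)
        start = good_end - 1
    return pivots, coeffs
```

Constraining the number of segments to a fixed budget, as the text suggests, would amount to re-running this with a larger `tol` until at most eight segments remain.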
Encoding in perceptually uniform color spaces

Video signals are typically rendered in the familiar RGB color space; however, most video compression standards, such as MPEG-2, H.264 (AVC), and H.265 (HEVC), have been optimized to operate in opponent color spaces, such as YCbCr or YUV. These color spaces are adequate for encoding and transmitting 8- to 10-bit standard-dynamic-range (SDR) video; however, from the standpoint of the required bits per pixel, they may not be the most efficient when encoding and transmitting EDR video. For example, the Lu′v′ and Log(L)u′v′ color spaces have also been proposed in the past.

As the inventors have appreciated, encoding signals in a perceptually uniform space may benefit from additional processing of the u′ and v′ chroma data before they are processed by the video codec. For example, in an embodiment, such processing may be performed in the encoder on the input signal (102) as part of the signal-reshaping process (110).
White-point translation

In an embodiment, the conversion from linear XYZ to the Luma u′v′ color space may include the following steps:

a) Define the coordinates of the white point (e.g., D65)

b) Solve for Luma = f(Y), and

c) Solve for u′ and v′ from X, Y, and Z

As used herein, the function f(Y) denotes any lightness-related function, such as L (or L′), log(L), and the like. In a preferred embodiment, f(Y) may denote a perceptual quantization (PQ) mapping function as described in the '212 application.

In an embodiment, the white point may be defined as D65 (6500 K), with u′ and v′ coordinates:
Du = d65u = 0.1978300066428;

Dv = d65v = 0.4683199949388;
在实施例中,可以如下导出u’和v’:In an embodiment, u' and v' may be derived as follows:
如果(X+15Y+3Z)≠0,则If (X+15Y+3Z)≠0, then
并且如果(X+15Y+3Z)=0,则And if (X+15Y+3Z)=0, then
u'=Du (17c)u'=Du (17c)
v'=Dv (17d)v'=Dv (17d)
The inverse operation includes the following steps:

a) Define the coordinates of the white point (e.g., D65)

b) Solve for Y = f⁻¹(Luma)

c) Solve for X and Z from u′ and v′

For example, in an embodiment using a perceptual quantization function according to the '212 application, the corresponding inverse PQ mapping may be applied to produce the Y pixel values.

In an embodiment, X and Z may be derived as follows:
If v′ ≠ 0, then

X = Y(9u′)/(4v′),   (18a)
Z = Y(12 − 3u′ − 20v′)/(4v′),   (18b)

and if v′ = 0, then X = Z = Y.   (18c)
图8A描绘了u’v’色度空间中的白点(805)(例如,D65)的传统映射。如图8A中所描绘的,u’和v’色度值的范围对于u’近似为(0,0.623),对于v’近似为(0,0.587)。如图8A中所描绘的,D65白点在u’v’信号表示中不居中。这可能在对色度分量进行子采样和上采样以在原始的4:4:4颜色格式与在视频编码中通常使用的4:2:0或4:2:2颜色格式之间转化之后导致颜色偏移。为了缓解这样的颜色偏移,建议将变换函数应用于色度值。在一个实施例中,变换函数将白点近似移位到经转化的u’v’的中心;然而,在可能想要看到色度误差下降的情况下,白点可以被转化为任何其它的颜色值。例如,如果经转化的值u′t和v′t在范围(0,1)中,则可以应用以下映射:FIG8A depicts a conventional mapping of a white point (805) (e.g., D65) in the u'v' chrominance space. As depicted in FIG8A , the range of u' and v' chrominance values is approximately (0, 0.623) for u' and (0, 0.587) for v'. As depicted in FIG8A , the D65 white point is not centered in the u'v' signal representation. This can result in color shifts after subsampling and upsampling the chrominance components to convert between the original 4:4:4 color format and the 4:2:0 or 4:2:2 color formats commonly used in video encoding. To mitigate such color shifts, it is proposed to apply a transformation function to the chrominance values. In one embodiment, the transformation function shifts the white point approximately to the center of the transformed u'v'; however, in cases where it may be desirable to see a reduction in chrominance errors, the white point can be transformed to any other color value. For example, if the transformed values u′t and v′t are in the range (0, 1), the following mapping can be applied:
u′t = (u′ − Du)a1 + b1,   (19a)

v′t = (v′ − Dv)a2 + b2,   (19b)

where Du and Dv denote the original u′ and v′ coordinates of the selected white point, (b1, b2) determine the coordinates of the desired position of the white point in the translated color space, and ai (i = 1, 2) are constants computed from the desired translation point and the minimum and maximum values of u′ and v′. In one embodiment, the translation parameters (e.g., a1 and a2) may be fixed for the entire video sequence. In another embodiment, the translation parameters may be computed on a per-scene or per-frame basis to exploit variations in the chroma range of the incoming content.
FIG. 8B depicts the mapping of the white point (805) in the translated chroma space according to an example embodiment. In FIG. 8B, the original u′ and v′ chroma values are translated so that the selected point (e.g., D65) lies approximately at the center (0.5, 0.5) of the translated chroma space. For example, if the translated values u′t and v′t are in (0, 1), then for b1 = b2 = 0.5, in one embodiment, the following mapping may be applied:

u′t = (u′ − Du)1.175 + 0.5,   (20a)

v′t = (v′ − Dv)1.105 + 0.5,   (20b)

where Du and Dv denote the u′ and v′ coordinates of the selected white point. This translation causes chroma errors to appear as desaturation rather than as hue shifts. Those of ordinary skill in the art will appreciate that non-linear functions may also be applied to the u′ and v′ chroma values to achieve the same translation. Such non-linear functions may allocate higher precision to near-neutral colors to further reduce the visibility of color errors due to coding and quantization.
Reducing chroma entropy

The visibility of chroma detail may be further improved if the u′ and v′ pixel components are multiplied by a function of luminance. For example, in an embodiment, the translated chroma values may be derived as follows:

u′t = g(Luma)(u′ − Du)a1 + b1,   (21a)

v′t = g(Luma)(v′ − Dv)a2 + b2,   (21b)

where g(Luma) denotes a function of the luma channel. In an embodiment, g(Luma) = Luma.
In the decoder, the incoming signal may be represented as Luma u′t v′t. In many applications, this signal must be converted back to XYZ, RGB, or some other color space before further processing. In an example embodiment, the color-conversion process from Luma u′t v′t to XYZ may include the following steps:

a) Undo the luma encoding:

Y = f⁻¹(Luma)

b) Undo the range scaling of the u′t and v′t values to recover u′ and v′

c) Use equation (18) to recover X and Z
In some embodiments, the Luma, u′t, and v′t components of the incoming signal may be normalized to the (0, 1) range before any color transformation. In some embodiments, equations (17)-(21) may be implemented using a combination of lookup tables, multiplications, and additions. For example, in an embodiment, let

Y = f⁻¹(Luma),

B = 3u′,

C = 20v′, and

D = 1/(4v′)

denote the outputs of three lookup tables whose inputs are Luma, u′, and v′. Then, from equation (18), the X and Z values may be computed using four multiplications and two additions as follows:

Z = (Y*D)*(12 − B − C),

and

X = (Y*D)*(3*B).

For example, in an embodiment, for a 10-bit coded signal, each LUT may have 1024 entries, each entry at a precision sufficiently high for the target application (e.g., 32 bits).
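A sketch of this LUT-based decoder path follows. The table contents are illustrative assumptions: an identity luma transfer stands in for f⁻¹(Luma), and the u′ and v′ axis ranges are the approximate valid ranges mentioned earlier, with v′ = 0 excluded so that the 1/(4v′) table is well defined.

```python
import numpy as np

BITS = 10
N = 1 << BITS
u_axis = np.linspace(0.0, 0.623, N)     # approximate valid u' range
v_axis = np.linspace(1e-4, 0.587, N)    # avoid v' = 0 in the 1/(4v') table

LUT_Y = np.linspace(0.0, 1.0, N)        # identity stand-in for f^-1(Luma)
LUT_B = 3.0 * u_axis                    # B = 3u'
LUT_C = 20.0 * v_axis                   # C = 20v'
LUT_D = 1.0 / (4.0 * v_axis)            # D = 1/(4v')

def luma_uv_to_xz(luma_code, u_code, v_code):
    """Four multiplications and two additions per pixel, per the text."""
    y, b = LUT_Y[luma_code], LUT_B[u_code]
    c, d = LUT_C[v_code], LUT_D[v_code]
    yd = y * d
    return yd * (3.0 * b), yd * (12.0 - b - c)   # X, Z
```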
FIG. 9 depicts an example encoding and decoding pipeline according to an embodiment of the present invention. The input signal (902) may be in RGB 4:4:4 or any other suitable color format. In step (910), the signal (902) is converted to the perceptual Luma u′v′ format, for example by applying a perceptual quantization (PQ) mapping to the luma values and equation (17) to the chroma values. In step (915), a translation such as that described in equations (19)-(21) is applied to convert the original u′v′ chroma values into translated chroma values u′t and v′t, so that the white point is placed approximately at the center of the translated chroma space. The color-transformed and translated Luma u′t v′t 4:4:4 signal (e.g., 917) may be chroma sub-sampled (not shown) to a 4:2:0 or 4:2:2 format before being encoded by the video encoder (920). The video encoder (920) may include the signal-reshaping (110) and layer-decomposition (115) processes described earlier. At the receiver, a video decoder (930) produces a decoded signal (932). The video decoder (930) may include the inverse signal reshaping and layer composer (140). After optional chroma up-sampling (e.g., from 4:2:0 to 4:4:4), an inverse chroma-translation step (935) may convert the Luma u′t v′t signal (932) back to a Luma u′v′ signal (937) by reversing the translation operation of (915).
Finally, the Luma u'v' signal (937) may be converted to an output signal (942) in RGB or other appropriate color space for display or further processing.
Encoding in a perceptually quantized IPT color space

White-point translation may also be applied to other color spaces, such as the IPT color space or the IPT-PQ color space, a perceptually quantized color space that appears to be ideally suited for encoding video signals with enhanced or high dynamic range. The IPT-PQ color space was first described in PCT Application PCT/US2014/016304, filed on Feb. 13, 2014, by R. Atkins et al., titled "Display management for high dynamic range video," which is incorporated herein by reference in its entirety.
The IPT color space, as described in "Development and testing of a color space (IPT) with improved hue uniformity" by F. Ebner and M. D. Fairchild, Proc. 6th Color Imaging Conference: Color Science, Systems, and Applications, IS&T, Scottsdale, Arizona, Nov. 1998, pp. 8-13 (to be referred to as the Ebner paper), which is incorporated herein by reference in its entirety, is a model of the color differences between the cones of the human visual system. In this sense, it is like the YCbCr or CIE-Lab color spaces; however, it has been shown in some scientific studies to mimic human visual processing better than these spaces. Like CIE-Lab, IPT is a normalized space relative to some reference luminance. In an embodiment, the normalization may be based on the maximum luminance of the target display.

As used herein, the term "PQ" refers to perceptual quantization. The human visual system responds to increasing light levels in a very non-linear way. A human's ability to see a stimulus is affected by the luminance of the stimulus, the size of the stimulus, the spatial frequencies making up the stimulus, and the luminance level that the eyes have adapted to at the particular moment one is viewing the stimulus. In a preferred embodiment, a perceptual quantizer function maps linear input gray levels to output gray levels that better match the contrast-sensitivity thresholds of the human visual system. An example of a PQ mapping function is described in the '212 application, in which, given a fixed stimulus size, for every luminance level (i.e., stimulus level), the smallest visible contrast step at that luminance level is selected according to the most sensitive adaptation level and the most sensitive spatial frequency (according to HVS models). Compared to the traditional gamma curve, which represents the response curve of a physical cathode-ray-tube (CRT) device and which, coincidentally, may bear a very rough similarity to the way the human visual system responds, the PQ curve, as determined in the '212 application, imitates the true visual response of the human visual system using a relatively simple functional model.
Table 1 describes the computation of the perceptual-curve EOTF for converting digital video code values into absolute linear luminance levels at a point of display. Also included is the inverse OETF computation for converting absolute linear luminance into digital code values.

Table 1

Exemplary equation definitions:
D = perceptual-curve digital code value, SDI-legal unsigned integer, 10 or 12 bits

b = number of bits per component in the digital signal representation, 10 or 12

V = normalized perceptual-curve signal value, 0 ≤ V ≤ 1

Y = normalized luminance value, 0 ≤ Y ≤ 1

L = absolute luminance value, 0 ≤ L ≤ 10,000 cd/m²

Exemplary EOTF decode equation:

Exemplary OETF encode equation:

Exemplary constants:
Notes:

1. The operator INT returns the value 0 for fractional parts in the range of 0 to 0.4999… and +1 for fractional parts in the range of 0.5 to 0.9999…, i.e., it rounds up fractional parts greater than 0.5.

2. All constants are defined as exact multiples of 12-bit rationals to avoid rounding concerns.

3. The R, G, or B signal components are to be computed in the same way as the Y signal component described above.
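The EOTF and OETF equations and constants above are shown only as images in the source. The '212 application's PQ curve was later standardized as SMPTE ST 2084, whose constants are exact multiples of 12-bit rationals, matching note 2; the sketch below assumes those standardized forms rather than reproducing the patent's figures.

```python
# SMPTE ST 2084 constants (assumed to match the patent's exemplary constants).
M1 = 2610.0 / 4096 / 4       # exponent n
M2 = 2523.0 / 4096 * 128     # exponent m
C1 = 3424.0 / 4096
C2 = 2413.0 / 4096 * 32
C3 = 2392.0 / 4096 * 32

def pq_eotf(v):
    """Normalized code value V in [0, 1] -> absolute luminance L in cd/m^2."""
    p = v ** (1.0 / M2)
    y = (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)
    return 10000.0 * y

def pq_oetf(l):
    """Absolute luminance L in [0, 10000] cd/m^2 -> normalized code value V."""
    yn = (l / 10000.0) ** M1
    return ((C1 + C2 * yn) / (1.0 + C3 * yn)) ** M2
```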
Converting a signal to the IPT-PQ color space may include the following steps:

a) Convert the signal from the input color space (e.g., RGB or YCbCr) to XYZ

b) Convert the signal from XYZ to IPT-PQ as follows:

a. Apply a 3×3 XYZ-to-LMS matrix to convert the signal from XYZ to LMS

b. Convert each color component of the LMS signal into a perceptually quantized LMS signal (L′M′S′ or LMS-PQ) (e.g., by applying equation (t2))

c. Apply a 3×3 LMS-to-IPT matrix to convert the LMS-PQ signal into the IPT-PQ color space
Examples of the 3×3 XYZ-to-LMS and L′M′S′ (or LMS-PQ)-to-IPT conversion matrices may be found in the Ebner paper. Given that the chroma components of the IPT-PQ signal (e.g., P′ and T′) lie in the range (−0.5, 0.5), a bias a (e.g., a = 0.5) may be added so that the range of the chroma components lies substantially within the range (0, 1), for example:

P′ = P′ + a   (22a)

T′ = T′ + a   (22b)
The inverse color operations may include the following steps:

a) Subtract any bias values added to the chroma components

b) Apply a 3×3 I′P′T′-to-LMS conversion matrix to convert from IPT-PQ to LMS-PQ

c) Apply the inverse PQ function to convert from LMS-PQ to LMS (e.g., by using equation (t1))

d) Apply a 3×3 LMS-to-XYZ transform to convert from LMS to XYZ, and

e) Convert from XYZ to the selected device-dependent color space (e.g., RGB or YCbCr).
In practice, the color-transformation steps during encoding and/or decoding may be performed using pre-computed 1-D lookup tables (LUTs).
Reducing chroma entropy

As noted earlier, the visibility of chroma detail may be further improved if the P′ and T′ pixel components are multiplied by a function of luminance (e.g., I′). For example, in an embodiment, the translated chroma values may be derived as follows:

P′t = g(I′)(P′ − a) + a,   (23a)

T′t = g(I′)(T′ − a) + a,   (23b)

where g(I′) denotes a linear or non-linear function of the luma channel (I′). In an embodiment, g(I′) = I′.
Example computer-system implementation

Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field-programmable gate array (FPGA) or another configurable or programmable logic device (PLD), a discrete-time or digital signal processor (DSP), an application-specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices, or components. The computer and/or IC may perform, control, or execute instructions relating to adaptive reshaping techniques for the layered coding of video images with enhanced dynamic range (EDR), such as those described herein. The computer and/or IC may compute any of the various parameters or values that relate to the adaptive reshaping processes described herein. The image and video embodiments may be implemented in hardware, software, firmware, and various combinations thereof.

Certain implementations of the invention comprise computer processors that execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set-top box, a transcoder, or the like may implement methods related to adaptive reshaping techniques for the layered coding of EDR video images as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media (including floppy diskettes and hard disk drives), optical data storage media (including CD-ROMs and DVDs), electronic data storage media (including ROMs and flash RAM), or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.

Where a component (e.g., a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a "means") should be interpreted as including, as equivalents of that component, any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure but which perform the function in the illustrated example embodiments of the invention.
Equivalents, Extensions, Alternatives and Miscellaneous
Example embodiments that relate to adaptive reshaping techniques for the layered coding of video images with enhanced dynamic range (EDR) are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims (21)
Applications Claiming Priority (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361836044P | 2013-06-17 | 2013-06-17 | |
| US61/836,044 | 2013-06-17 | ||
| US201461951914P | 2014-03-12 | 2014-03-12 | |
| US61/951,914 | 2014-03-12 | ||
| US201462002631P | 2014-05-23 | 2014-05-23 | |
| US62/002,631 | 2014-05-23 | ||
| PCT/US2014/042583 WO2014204865A1 (en) | 2013-06-17 | 2014-06-16 | Adaptive reshaping for layered coding of enhanced dynamic range signals |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1217393A1 (en) | 2017-01-06 |
| HK1217393B (en) | 2019-09-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105324997B (en) | | Adaptive reshaping for layered coding of enhanced dynamic range signals |
| JP7541055B2 (en) | | System for encoding high dynamic range and wide color gamut sequences |
| CN107079137B (en) | | Encoding and decoding of perceptually quantized video content |
| EP3281409A1 (en) | | In-loop block-based image reshaping in high dynamic range video coding |
| EP3105926A1 (en) | | Piecewise inter-layer prediction for signals with enhanced dynamic range |
| RU2661309C1 (en) | | Inter-layer prediction for signals with enhanced dynamic range |
| CN116391356A (en) | | Color transformation for HDR video with a coding-efficiency constraint |
| CN108370446A (en) | | Low-complexity lookup table construction with reduced interpolation error |
| HK1217393B (en) | | Adaptive reshaping for layered coding of enhanced dynamic range signals |
| US20180035089A1 (en) | | High dynamic range color conversion correction |
| HK40090463A (en) | | Color transformation for HDR video with a coding-efficiency constraint |
| HK1236298B (en) | | Encoding and decoding perceptually-quantized video content |
| HK1236298A1 (en) | | Encoding and decoding perceptually-quantized video content |
| HK1228137B (en) | | Layer decomposition in hierarchical VDR coding |