HK1217068B - System and methods for generating scene stabilized metadata
- Publication number: HK1217068B (application HK16104981.5A)
- Authority: HK (Hong Kong)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 61/859,956, filed July 30, 2013, the entire contents of which are hereby incorporated by reference.
Technical Field
The present invention relates to video content creation and presentation and, in particular, to systems and methods for generating associated metadata for improving the presentation of video data on a target display.
Background Art
Metadata in a video file is typically generated per frame or for key frames. In many cases, however, video playback may exhibit artifacts that are objectionable to viewers of the video content. These artifacts may be noticeable between scenes, for example, between scenes that share certain common features. For example, a camera may be capturing video of a single actor moving through space and time, e.g., moving in one moment from a dimly lit room into a bright, sunlit outdoor space.
Such changes in ambient conditions may cause noticeable artifacts for the viewer (e.g., changing the facial tones of the actor mentioned above). This may be particularly true when the video content is to be displayed on a target display that has limitations on its performance (e.g., for luminance, gamut rendering, etc.). For a content creator (such as a director or a post-production professional), such artifacts may be mitigated by generating scene-based metadata.
Summary of the Invention
Methods and systems for generating and applying scene-stable metadata for a desired video data stream are disclosed herein. Systems and/or methods are given in which a video data stream is divided or partitioned into scenes, and a first set of metadata may be generated for a given scene of the video data. The first set of metadata may be any known metadata that is a desired function of the video content (e.g., luminance, color gamut, etc.). The first set of metadata may be generated on a frame-by-frame basis. In one embodiment, scene-stable metadata may be generated that may differ from the first set of metadata for the scene. The scene-stable metadata may be generated by monitoring a desired feature of the scene and may be used to keep the desired feature within an acceptable range of values. This may help to avoid noticeable and possibly objectionable visual artifacts when rendering the video data.
In one embodiment, a method for using scene-stable metadata in a video data stream comprises: dividing the video data stream into a set of scenes; generating first metadata associated with a first scene within the set of scenes; generating scene-stable metadata; and associating the scene-stable metadata with said first scene.
In another embodiment, a system for using scene-stable metadata for video data comprises: a processor; and a memory associated with said processor, the memory further comprising processor-readable instructions such that, when the processor reads the processor-readable instructions, the processor is caused to execute the following instructions: receive a video data stream, the video data stream comprising a set of scenes; for the set of scenes, generate first metadata associated with the set of scenes; generate a set of scene-stable metadata; and, for at least one scene, associate the scene-stable metadata with said at least one scene.
In yet another embodiment, a video processor comprises: a processor; and a memory associated with said processor, the memory further comprising processor-readable instructions such that, when the processor reads the processor-readable instructions, the processor is caused to execute the following instructions: receive an incoming video data stream, the video data stream comprising a set of scenes; receive a first set of metadata associated with at least one scene; receive an indication that a scene cut occurs substantially at the next frame of the incoming video data stream; receive scene-stable metadata; and associate the scene-stable metadata substantially with that next frame of the incoming video data stream.
Other features and advantages of the present system are presented in the detailed description below, when read in conjunction with the drawings presented within this application.
BRIEF DESCRIPTION OF THE DRAWINGS
Exemplary embodiments are illustrated in the referenced figures of the accompanying drawings. The embodiments and figures disclosed herein are to be considered illustrative rather than restrictive.
FIG. 1 depicts one embodiment of the environment and architecture of a video pipeline system constructed in accordance with the principles of the present invention.
FIGS. 2A and 2B depict two embodiments of video pipeline flowcharts that may be suitable for the purposes of the present application.
FIG. 3 depicts one embodiment of a high-level flowchart of video processing that may occur during display management for an exemplary target display.
FIG. 4 depicts one embodiment of video processing for the generation and association of scene-stable metadata for a video file.
FIG. 5 depicts one embodiment of a flowchart for incorporating advance notice of scene changes into a video pipeline.
FIG. 6 depicts one exemplary video file partitioned into scenes, and one frame within a scene comprising an indication of a scene change.
DETAILED DESCRIPTION
As used herein, the terms "component," "system," "interface," and the like are intended to refer to a computer-related entity, either hardware, software (e.g., in execution), and/or firmware. For example, a component can be a process running on a processor, a processor, an object, an executable, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process, and a component can be localized on one computer and/or distributed between two or more computers. A component can also be intended to refer to a communications-related entity, either hardware, software (e.g., in execution), and/or firmware, and may further comprise wired or wireless hardware sufficient to effect communications.
Throughout the following description, specific details are set forth in order to provide a more thorough understanding to persons skilled in the art. However, well-known elements may not have been shown or described in detail to avoid unnecessarily obscuring the disclosure. Accordingly, the description and drawings are to be regarded in an illustrative, rather than a restrictive, sense.
Introduction
To ensure temporal stability (e.g., no flicker, pulsing, fading in and out, etc.) of video playback on a target display, and to mitigate potentially unpleasant visual artifacts, it may be desirable that the metadata associated with the video data be generally stable over time. In several embodiments, this may be achieved by enforcing stability of the metadata over the duration of a scene. Such metadata may be allowed to change at every scene cut. In such a case, changes in the metadata to adapt to the content may not be noticeable to a viewer.
By way of just one example, it may be possible to estimate video/image data metadata on a frame-by-frame basis just prior to display. However, this may lead to unpleasant and noticeable changes to the appearance of a scene, possibly in the middle of the scene.
In several embodiments of the present application, systems and methods for generating or otherwise creating metadata that is related to, and/or associated with, video data are described. In many embodiments of the present application, the associated metadata may be generated on a scene-by-scene basis, as will be described in greater detail herein. Such metadata may be created at the front end of the video data stream, or at any other suitable part of the video data stream. The video data may be transmitted and/or sent to a user/consumer/viewer of the video data, whether in a movie theater, a home viewing environment, a video conference, or anywhere else that video data is to be viewed and/or consumed.
A number of metadata-generating and/or metadata-consuming techniques are described in the following commonly owned patents and/or patent applications:
(1) United States Patent Application 20130076763, to Messmer, published March 28, 2013, entitled "TONE AND GAMUT MAPPING METHODS AND APPARATUS";
(2) United States Patent Application 20130162666, to Messmer et al., published June 27, 2013, entitled "DATA TRANSMISSION USING OUT-OF-GAMUT COLOR COORDINATES";
(3) United States Patent Application 20130141647, to Longhurst et al., published June 6, 2013, entitled "METADATA FOR USE IN COLOR GRADING"; and
(4) United States Patent Application 20120315011, to Messmer et al., published December 13, 2012, entitled "VIDEO DELIVERY AND CONTROL BY OVERWRITING VIDEO DATA"
all of which are hereby incorporated by reference in their entirety.
FIGS. 1, 2A and 2B depict several general environmental systems (100, 200 and 206, respectively) in which systems and/or methods of the present application may reside. These systems represent possible end-to-end video generation/transmission/viewing pipelines, in which, for example, video may be captured, scene-by-scene metadata may be extracted, and the video may be placed in a video stream for distribution and sent to a target display for viewing.
In FIG. 1, system 100 (or portions thereof) may be configured to perform one or more of the methods described herein. Components of system 100 may be implemented as software, firmware, hardware, and/or a combination thereof. System 100 includes a video capture subsystem 102, a post-production subsystem 103, and a display subsystem 104. A stream of video data 123 (more specifically, 123-1, 123-2 and 123-3 for different points in the video stream pipeline) is generated by video capture subsystem 102 and is delivered to post-production subsystem 103 for processing and editing. Video images may be displayed and viewed on a reference display 111 of post-production subsystem 103 in the course of editing the video data 123. The edited video data 123 is delivered to display subsystem 104 (via encoder 127A and decoder 127B over a distribution medium 125) for further processing and display. Each of subsystems 102, 103 and 104 (and encoder 127A) may be configured to encode metadata 225 in video data 123. Downstream subsystems may be configured to receive video data 123 from an upstream device and to decode the metadata 225 that has been embedded therein. Metadata 225 may be used by downstream subsystems (e.g., subsystems 103 and 104) to guide processing and/or display of video data 123. Metadata 225 may be used by display subsystem 104, together with display characterization parameters 120, to control and/or guide video playback on a display 118 of display subsystem 104.
As seen in FIG. 1, subsystems 102, 103 and 104 may comprise processors 106, 108 and 116, respectively, and program memories 107, 109 and 117 accessible to the respective processors. Each processor (described here and elsewhere) may comprise a central processing unit (CPU), one or more microprocessors, one or more field-programmable gate arrays (FPGAs), or any combination thereof, or any other suitable processing unit(s) comprising hardware and/or software capable of functioning as described herein. In each subsystem, the processor executes instructions provided by software stored in the program memory. The software may comprise routines to perform the metadata generation, encoding, decoding and processing steps described herein, such as, for example, routines that:
(1) generate metadata 225 or receive parameters for metadata 225;
(2) encode metadata 225 in video data 123 before the video data 123 is communicated to a downstream device;
(3) decode metadata 225 from video data 123 received from an upstream device;
(4) process and apply metadata 225 to guide processing and/or display of the video data;
(5) select a method for encoding metadata 225, based upon the image and/or video data 123;
(6) and the like.
System 100 may comprise a repository 110 accessible to subsystems 102, 103 and 104. Repository 110 may include a library of metadata definitions 112 (e.g., which informs the metadata encoders and decoders as to how to generate and/or read the metadata) and a list of reserved words 114 (e.g., protected pixel values or reserved metadata words). The library of metadata definitions 112 may be accessed by subsystems 102, 103 and 104 in generating, encoding and/or processing metadata. In encoding or decoding metadata 225, the reserved words 114 may be compared against encoded/decoded metadata bits to identify sequences of guard bits to insert (or that have been inserted) in the metadata stream to prevent communication of a reserved word. While a shared repository 110 is shown in the illustrated embodiment of FIG. 1, in other embodiments each of subsystems 102, 103 and 104 may incorporate a local repository 110 stored in a storage medium accessible to that subsystem.
FIG. 2A is a flowchart showing the flow of data through a video delivery pipeline 200 according to a particular embodiment. Video delivery pipeline 200 incorporates stages similar to those depicted in the video delivery pipeline 100 of FIG. 1. At one or more of the stages of video delivery pipeline 200, metadata 225 may be generated and embedded in the stream of video data 123 for use at a downstream stage. Metadata 225 is transmitted along with video data 123 through video delivery pipeline 200 to guide downstream devices in processing the video data and/or to guide video playback at a display subsystem at block 210. At block 206, video data 123, including embedded metadata 225, may be delivered to the display subsystem using systems, apparatus and methods suitable for the type of video content delivery (e.g., television broadcast over satellite, cable, or high-definition networks; streaming multimedia over IP or wireless networks; playback from DVD or other storage media; etc.).
In the FIG. 2A embodiment, camera metadata 225A may be generated and embedded in video data 123-1 at block 202. Camera metadata 225A may be generated based on the camera settings and the video frame capture environment. Camera metadata 225A may comprise, for example, camera parameters that provide a snapshot of the camera settings during video frame capture. Such camera parameters may include aperture (f-stop), lens, shutter speed, sensitivity (ISO rating) and the like. These camera parameters may be used to guide subsequent steps in video delivery pipeline 200, such as color adjustments (e.g., color timing) during post-production editing at block 204, or display configuration at block 210.
At block 204, post-production metadata 225B is generated and embedded in video data 123-2. Post-production metadata 225B may include: reference display and environment metadata 225B1 and source video content characterization metadata 225B2. Post-production metadata 225B may be used to guide subsequent steps in video delivery pipeline 200, such as display configuration at block 210.
Reference display and environment metadata 225B1 may describe the reference display configuration and the studio or viewing environment used in the block 204 post-production editing. For example, with respect to the reference display used to display video data 123 during the block 204 post-production editing, reference display and environment metadata 225B1 may include parameters such as:
(1) a 3D color gamut mapping describing the tone and gamut boundaries of the reference display at a fine resolution;
(2) a reduced set of parameters defining the tone and gamut boundaries of the reference display (which may be used to estimate a 3D color gamut mapping);
(3) system tonal response parameters describing the tonal response of the reference display for each chrominance channel;
(4) the screen size;
(5) and the like.
Reference display and environment metadata 225B1 may also include parameters describing the studio environment in which video content was color-timed or edited on the reference display during the block 204 post-production editing. Such parameters may include ambient luminance and ambient color temperature.
Source video content characterization metadata 225B2 may describe the post-production-edited video content, including information that may identify or provide:
(1) tone mapping (e.g., customized tone mapping parameters or curves that may be used to guide tone expansion at the display), and gamut mapping (e.g., customized gamut mapping parameters that may be used to guide gamut expansion at the display);
(2) the level of the minimum black level considered important in the scene (e.g., the shadow under a car);
(3) the level corresponding to the most significant portion of the scene (e.g., an actor's face);
(4) the level of the maximum white level considered important in the scene (e.g., the center of a light bulb);
(5) the most chromatic color in the scene (e.g., a neon light, etc.);
(6) a location map of light sources in the image, or of reflective or emissive objects in the image;
(7) the gamut of the video source content;
(8) areas of the image that are intentionally color-timed out of the gamut of the reference display;
(9) protected colors that should not be altered during pre-display processing by the video processor or during display configuration;
(10) an image histogram characterizing the image in terms of luminance or gamut (e.g., such information may be used by downstream devices to determine average luminance to refine tone and gamut mapping);
(11) a scene change or reset flag to alert downstream devices that any statistics or hysteresis from previous video frames are no longer valid;
(12) a motion map characterizing the video content to identify objects in motion, which may be used by downstream devices, in combination with the light source location map, to guide tone and gamut mapping;
(13) an indication of the source of color-timed content (e.g., direct from the camera, or from post-production editing);
(14) the director's creative intent settings, which may be used to control downstream devices such as a decoder/television or other display. For example, such settings may include: a display mode control providing the ability to control the display to operate in a particular mode (e.g., vivid, cinema, standard, professional, etc.); a content type (e.g., animation, drama, sports, games, etc.) that may be used to determine an appropriate gamut or tone mapping, or the like;
(15) and the like.
At block 206, video data 123-2 is delivered to the display subsystem. As seen in FIG. 2B, delivery pipeline 206 may include an encoder stage 127A for driving distribution, broadcast or transmission of video data 123 over a video distribution medium 125 such as satellite, cable or high-definition networks; IP or wireless networks; or DVD or other storage media; etc. A decoder stage 127B may be provided at the display end of block 206 to decode video data 123 distributed over medium 125. Decoder stage 127B may be implemented, for example, by a set-top box or by a decoder within the display subsystem. At blocks 206 and/or 208, viewing environment metadata 225C and/or other metadata 225 may be embedded in video data 123. Viewing environment metadata 225C may comprise, for example:
Advanced Video Coding (AVC) VDR encoder data providing reference monitor tone mapping or gamut curves, or the ambient luminance of the reference environment. At least some of this information may be determined by the video processor with knowledge of the display characteristics (e.g., by reading the Extended Display Identification Data (EDID) of the display) and of the environment of the display subsystem. In some embodiments, at least some of this information may be determined at the studio during post-production processing of the video data.
Parameters describing the environment in which the display of the display subsystem is situated. Such parameters may include, for example, ambient luminance and/or tone or color temperature.
Viewing environment metadata 225C may be used to guide processing of the video data at block 208 and/or display configuration at block 210.
The display subsystem comprises a video processor for processing incoming video data 123-3 at block 208. The video processor of the display subsystem may perform signal processing on video data 123-3 based on metadata 225 extracted from video data 123 (e.g., metadata 225A) and/or known display characteristics associated with the display of the display subsystem. Video data 123 may be processed and adjusted for the display in accordance with display characterization parameters 226 and/or metadata 225.
Other metadata 225 that may be embedded in video data 123 at blocks 206 and/or 208, or at other stages of video delivery pipeline 200, includes housekeeping metadata 225D (for managing distribution rights and the like), such as, for example:
(1) watermarking data indicating where the video content was generated, distributed, modified, etc.;
(2) fingerprinting data providing a description of the video content for searching or indexing purposes, and the like;
(3) protection data indicating who owns the video content and/or who has access to it;
(4) and the like.
Viewing environment metadata 225C may be generated based at least in part on display characterization parameters 226 associated with the display of the display subsystem. In some embodiments, viewing environment metadata 225C, source video content characterization metadata 225B2 and/or housekeeping metadata 225D may be created or provided by analysis of video data 123 at encoder stage 127A or at decoder stage 127B, and/or may be created or provided by the video processor at block 208.
At block 210, display configuration may be performed on the display of the display subsystem. Appropriate parameters for display configuration may be determined based on display characterization parameters 226 and/or metadata 225, such as camera metadata 225A, post-production metadata 225B (including reference display and environment metadata 225B1 and source video content characterization metadata 225B2) and viewing environment metadata 225C. The display is configured in accordance with such parameters. Video data 123 is output to the display.
Metadata 225 that is used for processing of video data 123 at block 208 and for display configuration at block 210 is delivered in the video data stream such that metadata 225 is received at the display subsystem (including the video processor and display) prior to its application. In some embodiments, metadata 225 is delivered such that it is received by the display subsystem at least one video frame ahead of the frame at which the metadata 225 is to be applied. In certain embodiments, metadata 225 is delivered one video frame ahead, and application of metadata 225 at blocks 208 and/or 210 may be triggered upon detection of a new video frame in the incoming video stream.
Scene-by-scene "stable" metadata
As mentioned above, it may be desirable to capture metadata in a video file on a scene-by-scene basis. As described herein, several embodiments of the present application may capture metadata on a scene-by-scene basis (e.g., based on luminance, color gamut, etc.). In particular, one embodiment may provide a set of "stable" metadata that may be applied across same and/or similar scenes.
In one embodiment, each scene may be associated with global scene metadata, which may be generated in response to frame-dependent characteristics within the scene, such as, for example, the minimum, maximum and mid luminance values in each frame. Scenes with similar characteristics may also be forced to share the same metadata, so that during display they maintain the same look and feel. In another embodiment, a receiver may also receive "Advance Notice metadata", that is, for example, metadata for a future scene, so that it may prepare, in advance, parameters related to DM processing.
To appreciate the concept of "stable" scene metadata, the following descriptions are offered for purposes of illustration only and are not intended to limit the scope of the present application. It may be desirable to stabilize color and luminance over the course of a few scenes. In one example, suppose there are two actors in a "scene", but the camera cuts to one actor and then to the other in a sequence of video frames, e.g., in an extended dialogue between the two actors. Even though this may constitute one "scene" theatrically, the two different camera cuts may cause color and/or luminance shifts that are both noticeable and objectionable to a viewer. In some embodiments, it may be possible to have different metadata for each cut, e.g., in order to produce a stable appearance for the entire scene.
For another example, consider a single actor in a "scene", but the actor is in motion and the camera follows the actor. Again, even though this may be a single scene theatrically, there may be luminance and/or color shifts that are both noticeable and objectionable to a viewer. For yet another example, the director may utilize a "dissolve" (or "crossfade") technique in which one scene lowers its luminance (possibly to zero) while another scene starts from low (e.g., zero) luminance and rises to maximum luminance over a period of several frames. Such dissolves or fades may be used to illustrate a flashback for an actor on screen, or for other purposes.
These situations may become relevant where a director may be involved in post-production processing of the captured video. Such a director may be color grading and luminance mapping the video on a professional-grade monitor (e.g., having a luminance of up to approximately 5000 nits). However, the movie may be viewed on a home video device or some other target display that may have a much lower luminance. Knowing this in advance may allow the director, or another content creator, the opportunity to improve the viewer's experience of the content.
Based on these few examples (as well as others not mentioned herein), it may be desirable, from the viewer's perspective (if not from the perspective of the video's content creator/director), to apply metadata on a scene-by-scene basis, and/or to have a process in place that may determine when to apply "stable" metadata to a scene and/or frame sequence that might otherwise be using different, possibly frame-based, metadata for the current scene/frame.
Regarding the home video case, there may often be a display management (DM) processor that attempts to provide the "best" (or a "better") mapping of the video data to the home display. DMs often provide dynamic range mapping to deliver a good luminance match from the available video data to the target display. Dynamic range mapping may use luminance-statistics-based metadata (e.g., maximum luminance, mean luminance and/or minimum luminance) to provide the mapping.
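By way of illustration only, the following is a minimal sketch of the kind of anchor-based luminance mapping such a DM processor might perform. The piecewise-linear log-domain form, the function name and the floor constant are assumptions of this sketch, not the DM algorithm of any particular product; it assumes the anchors are ordered min < mid < max.

```python
import numpy as np

def map_luminance(frame_nits, src_min, src_mid, src_max,
                  dst_min, dst_mid, dst_max):
    """Map scene luminance (in nits) onto the target display's range.

    The (min, mid, max) anchors come from scene metadata; holding them
    fixed for the whole scene keeps this curve, and hence the rendered
    look, stable from frame to frame.
    """
    floor = 1e-4  # avoid log(0) for pure-black anchors
    src = np.log10(np.maximum([src_min, src_mid, src_max], floor))
    dst = np.log10(np.maximum([dst_min, dst_mid, dst_max], floor))
    x = np.log10(np.maximum(np.asarray(frame_nits, dtype=float), floor))
    # Piecewise-linear interpolation in log-luminance between the anchors;
    # values outside the source range clamp to the target min/max.
    return 10.0 ** np.interp(x, src, dst)
```

For example, `map_luminance(frame, 0.01, 10.0, 4000.0, 0.05, 8.0, 100.0)` would compress a 4000-nit master onto a 100-nit display while pinning the mid anchor.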
Several commonly owned patent applications disclose display management (DM) systems and techniques, and may be useful for the systems and methods of the present application:
(1) United States Patent Application 20110194618, to Gish et al., published August 11, 2011, entitled "COMPATIBLE COMPRESSION OF HIGH DYNAMIC RANGE, VISUAL DYNAMIC RANGE, AND WIDE COLOR GAMUT VIDEO";
(2) United States Patent Application 20120229495, to Longhurst, published September 13, 2012, entitled "INTERPOLATION OF COLOR GAMUT FOR DISPLAY ON TARGET DISPLAY";
(3) United States Patent Application 20120321273, to Messmer, published December 20, 2012, entitled "VIDEO DISPLAY CONTROL USING EMBEDDED METADATA"; and
(4) United States Patent Application 20130038790, to Seetzen et al., published February 14, 2013, entitled "DISPLAY MANAGEMENT METHODS AND APPARATUS"
all of which are hereby incorporated by reference in their entirety.
One embodiment of stable scene metadata
FIG. 3 depicts one embodiment of a high-level block flowchart of the present application. Video pipeline 300 may receive an encoded bitstream 301, which may further comprise video/image data together with metadata that may be in some available format, e.g., frame-by-frame or scene-by-scene, and comprising luminance-statistics-based metadata, color-mapping metadata, or the like.
This encoded bitstream 301 may be received by a decoder 302, which may further comprise a parser 304. Decoder 302 may decode the incoming bitstream, which may be encrypted, compressed or otherwise encoded in any manner known in the art. Once decoded, the incoming bitstream may be parsed by parser 304. Parser 304 may split out the metadata from the video/image data.
The extracted video/image data, together with its associated metadata, may be sent out as intermediate bitstream 303. Bitstream 303 may also comprise one or more flags (or some other indication, signal or the like) 305, which may inform the downstream processor(s) what metadata to apply, and the like, as will be further described herein.
The intermediate bitstream 303 and/or any flags 305 may be received by a display management (DM) module 306. DM module 306 may apply any desired image/video mappings before the final image/video data is sent to a target display 308. Target display 308 may be any suitable device that may display image and/or video data to a viewer. By way of just some examples, such target displays 308 may be HD televisions, movie projectors, desktop monitors, laptops, tablets, smart devices and the like.
As mentioned, several embodiments of the present application may involve the calculation and/or derivation of scene-by-scene metadata, e.g., possibly a set of "stable" scene metadata. Such stable scene metadata may be employed judiciously by the pipeline, possibly in place of other available metadata (whether scene-based or frame-based), during times when using it may mitigate artifacts that could otherwise be noticeable and/or objectionable to a viewer.
By way of just one example, consider a scene in a dark cave. The image might show all the dark details of the cave. However, if the camera pans over to the cave's opening (which is bright), adaptive mapping may adjust the image accordingly; e.g., the dark details of the cave walls may be lowered to accommodate the new, brighter pixels. With the generation and use of scene-stable metadata, the mapping may be optimized for the entire scene, e.g., so that there would be no noticeable mid-scene change.
FIG. 4 is one embodiment of a high-level flowchart 400 of stable scene metadata processing. At 402, the video data may be divided into a set of scenes. This division and/or partitioning of the video into a set of scenes may be accomplished in a number of ways. First, the partitioning may be made by a human user, e.g., a director, a film editor, someone in post-production, or the like. For example, in one embodiment, the scene cuts may already be known from the Edit Decision List (EDL), which may be used to create the movie from a number of different shots. In one embodiment, it is possible to extract that EDL and use it to demarcate the scene boundaries. This way, little or no extra effort is required. In addition, the user has the option of overwriting the automatically determined (or extracted) scene cuts.
Alternatively, the identification of scene partitions may be performed automatically by a video processor, which may make such determinations by analyzing the video data on a frame-by-frame basis. For example, if there is a measurably large variation in luminance data, color data or other image data metrics between frames, the video processor may decide that this difference likely marks the boundary between two scenes. Such automatic determination may be enhanced in a look-ahead or multi-pass process, whereby several frames may be analyzed, and if an initial difference in an image data metric is noted, and if that metric in many frames thereafter substantially conforms with such initial difference, then it may be assessed with high probability that a scene change has occurred.
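Purely as an illustration of the look-ahead confirmation just described, the sketch below flags a candidate cut on a large jump in mean frame luminance and confirms it over a short window; the threshold, the window length and the function name are assumptions of the sketch, not taken from the patent.

```python
import numpy as np

def detect_scene_cuts(frames, threshold=0.10, lookahead=5):
    """Return indices of frames where a scene cut likely occurs.

    `frames` is a sequence of 2-D luminance arrays normalized to [0, 1].
    """
    means = np.array([f.mean() for f in frames])
    cuts = []
    for i in range(1, len(means) - lookahead):
        jump = abs(means[i] - means[i - 1])
        if jump < threshold:
            continue
        # Confirm: the frames after the candidate cut should stay
        # consistent with the new level rather than with the pre-cut level.
        window = means[i:i + lookahead]
        if np.all(np.abs(window - means[i]) < threshold / 2):
            cuts.append(i)
    return cuts
```

A production detector would likely combine several metrics (color histograms, motion), but mean luminance suffices to show the two-stage flag-then-confirm structure.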
For purposes of the present application, scenes may be identified in the video data in any known manner. At 404, metadata may be calculated, measured or otherwise extracted on a scene-by-scene basis. By way of just one example, if there are 50 frames comprising a given scene, then the luminance data may be measured and extracted for the entire scene. Scene metadata such as minimum luminance, mean and/or average luminance, and maximum luminance may be calculated. Other image/video metrics may likewise be measured and/or extracted to form other scene-based metadata.
The following is one embodiment of generating scene-stable metadata within a video stream:
(1) Compute the MIN, MID and MAX luminance for each frame within the scene, then combine the results for the entire scene (see the code sketch following this list):
a. for MIN, take the minimum of all of the minima of all frames in the scene;
b. for MID, take the mid (average) of all of the mid values of all frames in the scene;
c. for MAX, take the maximum of all of the maxima of all frames in the scene.
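A minimal sketch of steps (a)-(c), assuming each frame's luminance plane is available as a normalized NumPy array; taking the per-frame mid value to be the median is an assumption of this sketch, while the scene-level combination applies the parenthetical "average" from step (b):

```python
import numpy as np

def frame_stats(luma):
    # Per-frame MIN, MID and MAX; MID is taken here as the median
    # (an assumption of this sketch).
    return float(luma.min()), float(np.median(luma)), float(luma.max())

def scene_stats(scene_frames):
    mins, mids, maxs = zip(*(frame_stats(f) for f in scene_frames))
    return {
        "min": min(mins),              # (a) minimum of the frame minima
        "mid": float(np.mean(mids)),   # (b) average of the frame mids
        "max": max(maxs),              # (c) maximum of the frame maxima
    }
```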
It will be appreciated that similar statistics may be derived for other video/image metrics, e.g., color gamut data and the like. In another embodiment, other scene-dependent metadata may be generated, e.g., how much sharpening or smoothing to apply to the image data within a scene.
At 406, a set of "stable" metadata may be calculated for a scene. The stable metadata may differ from the earlier-calculated scene-based (or frame-based) metadata, depending on the use of such metadata. Scene-stable metadata may be calculated and/or generated for a scene, possibly according to some monitored feature, aspect and/or metric that could potentially produce a noticeable and/or objectionable change in the video data, e.g., even if the earlier-calculated scene-based metadata were to be used to render the scene for viewing. For example, in the case of one actor, moving in space and time across different backgrounds (e.g., going, in one cut, from a dark enclosed room to a bright, sunlit outdoor setting) may produce noticeable and/or objectionable changes in color or tint, or in skin tone, in the actor's face. In some embodiments, the metadata for a second scene may also be replaced by the metadata calculated for a first scene (different from the second scene) if, according to the monitored features, aspects and/or metrics, the two scenes may be considered perceptually similar. The second scene may be subsequent or prior to the first scene.
Other features, aspects and/or metrics are possible, e.g., skin tone, luminous features/objects, dark features/objects, colored features/objects and the like. Such changes may be mitigated with stable scene metadata. The scene-stable metadata may be calculated and/or generated such that the feature, aspect and/or metric being monitored over the course of the scene is returned to, and/or kept within, a range of acceptable values. At 408, the process may associate this stable scene metadata with, and/or substitute it for, any other metadata that may or may not have been previously associated with the scene. This association and/or substitution of stable scene metadata may be provided to bring such features, aspects and/or metrics back within an acceptable range, e.g., where other metadata might allow such features, aspects and/or metrics to go out of the acceptable range. The range of acceptable values for a feature, aspect and/or metric may be determined manually (e.g., by the director and/or a film editor), or according to certain rules and/or heuristics involving image processing/rendering and/or film editing.
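As a toy illustration of the substitution at 408, the sketch below falls back to the stable metadata whenever a scene's own metadata would put a monitored value (here, the mid-luminance anchor) outside an acceptable band; the band limits and the single monitored key are hypothetical choices of this sketch, not values from the patent.

```python
def choose_metadata(scene_meta, stable_meta, key="mid", lo=0.10, hi=0.60):
    """Return the metadata set to apply for a scene.

    `scene_meta` and `stable_meta` are records like those produced by
    scene_stats() above; `lo`/`hi` bound the acceptable range for `key`.
    """
    if lo <= scene_meta[key] <= hi:
        return scene_meta   # the scene's own metadata stays in range
    return stable_meta      # otherwise substitute the stable set
```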
It should be appreciated that the processing set forth in FIG. 4 may occur at many different points in the video/image pipeline. For example, the partitioning of video into scenes may be done in post-production by a human, or elsewhere in the pipeline by a processor. In addition, the calculation and/or extraction of scene-based metadata may be performed in post-production, or elsewhere in the pipeline. Likewise, the association of "stable" scene metadata may occur in post-production, or may be accomplished further downstream, e.g., by the DM or another video processor, before the final video/image data is sent to the target display for rendering.
Alternative embodiments
In some embodiments, the mapping operations may be image-content dependent in order to achieve maximum performance. Such image-dependent mapping may be controlled by metadata generated from the source content. To ensure temporal stability (e.g., no flicker, pulsing, fading in and out, etc.), it may be desirable that the metadata be generally stable over time. In one embodiment, this may be achieved by enforcing stability of the metadata over the duration of a scene. The metadata may be allowed to change at every scene cut. In such a case, a sudden change in the metadata to adapt to the content may not be noticeable to a viewer.
In one embodiment, the steps for generating scene-stable metadata may comprise the following:
(1) Acquire the locations of the scene cuts within the video data. In one embodiment, this may be derived from the Edit Decision List (EDL). Alternatively, it may be entered manually by a human, or detected automatically by a processor.
(2) Calculate and/or generate the metadata for each frame in the scene:
a. Optionally, downsample the image. (This may tend to speed up processing and to minimize the effect of a few outlying pixel values.)
b. Convert the image into the desired color space (e.g., IPT-PQ).
c. Calculate the minimum of the image (e.g., the I channel).
d. Calculate the maximum of the image (e.g., the I channel).
e. Calculate the mean of the image (e.g., the I channel).
(3) Combine the per-frame results into per-scene results:
a. Calculate the minimum of the frame minima.
b. Calculate the maximum of the frame maxima.
c. Calculate the mean of the frame means.
(4) Associate the metadata with the scene, or alternatively with each frame within the scene. A code sketch of these steps follows.
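Under the stated assumptions that a luma-style weighted sum can stand in for the I channel of a true IPT-PQ conversion, and that a plain dict is an acceptable metadata record, steps (2)-(3) might look like the following; the function names and the downsampling factor are illustrative:

```python
import numpy as np

def intensity(rgb):
    # Stand-in for step (2b): a BT.2020-weighted intensity. A real
    # implementation would convert to the I channel of IPT-PQ instead.
    return rgb @ np.array([0.2627, 0.6780, 0.0593])

def scene_stable_metadata(scene_rgb_frames, downsample=4):
    mins, maxs, means = [], [], []
    for rgb in scene_rgb_frames:                  # step (2), per frame
        small = rgb[::downsample, ::downsample]   # step (2a): downsample
        i = intensity(small)                      # step (2b): color space
        mins.append(i.min())                      # step (2c)
        maxs.append(i.max())                      # step (2d)
        means.append(i.mean())                    # step (2e)
    return {                                      # step (3): per-scene combine
        "min": float(np.min(mins)),
        "max": float(np.max(maxs)),
        "mean": float(np.mean(means)),
    }

# Step (4) would attach the returned record to the scene (or to each of its
# frames); step (1), locating the cuts, would use the EDL or the detector
# sketched earlier.
```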
It will be appreciated that variations on the embodiment above are possible and are contemplated under the scope of the present application. For example, instead of analyzing each frame of the scene in step (2), a single representative frame may be selected and used to calculate and/or generate the metadata, which is then associated with the entire scene.
In addition, fades may be supported by indicating the metadata for the scenes on either side of the fade, and then interpolating for the intermediate frames. Such interpolation may be linear, or asymptotic at both ends via a cosine or similar function.
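For illustration, one way to do that interpolation, with the record format matching the sketches above; the cosine ease is one instance of the "cosine or similar function" the text allows:

```python
import math

def interpolate_metadata(meta_a, meta_b, t, mode="cosine"):
    """Blend two scene metadata records for a frame at position t in [0, 1]."""
    if mode == "cosine":
        # Ease that flattens out asymptotically at both endpoints.
        t = 0.5 - 0.5 * math.cos(math.pi * t)
    return {k: (1.0 - t) * meta_a[k] + t * meta_b[k] for k in meta_a}
```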
After step (4), the metadata may be inserted into the encoded bitstream with proper synchronization to the correct video frames. The metadata may be repeated regularly to allow random entry into the stream.
In yet another embodiment, it is possible to include in the metadata some pre-computed values to aid in converting the decoded video into the desired color space (e.g., IPT-PQ). This may be desirable since the conversion is often performed on devices with fixed-point processors, which may not handle certain mathematical operations, such as division and exponentiation, well. The use of pre-computed values, embedded in the metadata stream, may be of benefit.
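For context, the exponentiation in question includes the SMPTE ST 2084 (PQ) non-linearity on which IPT-PQ is built. The constants below are the published PQ constants; the function wrapper itself is just a sketch of the kind of term a sender might pre-compute:

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875

def pq_encode(nits: float) -> float:
    """Absolute luminance in cd/m^2 -> PQ code value in [0, 1]."""
    y = max(nits, 0.0) / 10000.0
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2
```

Carrying values such as `pq_encode(scene_max)` in the metadata would spare a fixed-point decoder the two fractional exponentiations and the division above.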
Decoding scene-stable metadata / "advance notice" metadata
At the video decoder, new scene metadata may arrive in the same frame as the first frame of a new scene. Alternatively, the metadata may arrive before the first frame of the scene, in order to provide time for decoding and interpreting the metadata in time for it to be applied to processing the video. This "advance notice metadata" and other techniques may be desirable for improving the robustness of the scene-stable metadata over bitstream transmission. Several improvements may comprise the following, either as individual improvements or as several in combination (a sketch of a metadata record carrying these fields follows the list):
(1) repeat the metadata every frame within the same scene;
(2) add, in the metadata body, an indicator/flag that a scene cut occurs substantially at the next frame;
(3) add, in the metadata body, an indicator/flag that a scene cut occurs at the current frame;
(4) add an indicator/flag that the next frame's metadata is substantially identical to (or substantially different from) the current frame's metadata; and/or
(5) add a data integrity field in the metadata body for error checking (e.g., CRC32).
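The patent does not specify a wire format, but as a hedged sketch, a per-frame metadata record carrying these robustness fields might look like the following; the JSON body and field names are purely illustrative, while `zlib.crc32` is the standard-library CRC32:

```python
import json
import zlib

def make_metadata_packet(scene_meta, cut_next_frame, cut_this_frame,
                         same_as_previous):
    body = json.dumps({
        "meta": scene_meta,                        # (1) repeated every frame
        "scene_cut_next_frame": cut_next_frame,    # improvement (2)
        "scene_cut_this_frame": cut_this_frame,    # improvement (3)
        "same_as_previous": same_as_previous,      # improvement (4)
    }).encode()
    crc = zlib.crc32(body)                         # improvement (5)
    return body + crc.to_bytes(4, "big")

def parse_metadata_packet(packet):
    body, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    if zlib.crc32(body) != crc:
        raise ValueError("metadata corrupted in transit")
    return json.loads(body)
```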
FIG. 5 depicts one embodiment of a flowchart 500 for such advance notice metadata. At 502, the system/pipeline may calculate and/or generate metadata on a per-scene basis. This metadata, whether stable scene metadata or otherwise, may be associated with the scene in the video data bitstream at 504. Then, at 506, the system/pipeline may add an indication of an upcoming scene change a desired number of frames (e.g., one or more) in advance of the actual first frame of the new scene. This indication and/or flag may comprise a part of the bitstream and be noticed by the DM (or another suitable processor in the pipeline). At 508, the system may allow the DM (or another suitable processor) time to install parameters and/or mappings in advance of the scene change. This additional time may allow the system an opportunity to avoid any noticeable artifacts that might be objectionable to a viewer of the video content.
This may tend to be an improvement over conventional methods of metadata stabilization, which may not have access to the locations of scene cuts in advance. For example, if the scene cuts are not known in advance, metadata may be estimated on the fly by analysis of the video data, or allowed to change smoothly over time. This may cause image artifacts such as flicker, pulsing, fading in and out, and the like. In another embodiment, by calculating the metadata at the source (e.g., before video compression), it is possible to reduce the computation, and hence the cost, required of less-capable consumer devices.
FIG. 6 is one example of video data 600 that is partitioned into a number of scenes (Scene 1 through Scene N), each of which in turn comprises a number of frames (e.g., frame 602a). Frame 602m of Scene 1 may have an advance notice flag associated with it, so that the DM may have time to set the parameters and/or mappings to better render the following Scene 2.
A detailed description of one or more embodiments of the invention, read along with the accompanying figures that illustrate the principles of the invention, has now been given. It is to be appreciated that the invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details have been set forth in this description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail, so that the invention is not unnecessarily obscured.
Applications Claiming Priority (3)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361859956P | 2013-07-30 | 2013-07-30 | |
| US 61/859,956 | 2013-07-30 | | |
| PCT/US2014/048387 (WO2015017314A1) | 2013-07-30 | 2014-07-28 | System and methods for generating scene stabilized metadata |

Related Applications (1)

| Application Number | Relation | Title | Priority Date | Filing Date |
|---|---|---|---|---|
| HK19101100.4A (HK1259138B) | Division | System, method and apparatus for using scene-stable metadata, and storage medium | 2013-07-30 | 2016-04-29 |

Publications (2)

| Publication Number | Publication Date |
|---|---|
| HK1217068A1 | 2016-12-16 |
| HK1217068B | 2019-10-11 |