HK1232011B - Audio processing unit and method for decoding encoded audio bitstream - Google Patents
- Publication number
- HK1232011B (HK17105459.4A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- metadata
- audio
- bitstream
- program
- payload
- Prior art date
Description
本申请是申请日为2014年6月12日、申请号为“201480008799.7”、发明名称为“使用节目信息或子流结构元数据的音频编码器和解码器”的发明专利申请的分案申请。This application is a divisional application of the invention patent application with the application date of June 12, 2014, application number "201480008799.7", and invention name "Audio encoder and decoder using program information or substream structure metadata".
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本申请要求在2013年6月19日提交的美国临时专利申请61/836,865号的优先权,其全部内容通过引用合并于此。This application claims priority to U.S. Provisional Patent Application No. 61/836,865, filed June 19, 2013, the entire contents of which are incorporated herein by reference.
技术领域Technical Field
本发明涉及音频信号处理,以及更具体地,涉及具有指示与由比特流所指示的音频内容有关的子流结构和/或节目信息的元数据的音频数据比特流的编码和解码。本发明的一些实施方式以被称为杜比数字(AC-3)、杜比数字+(增强的AC-3或E-AC-3)或杜比E的格式中的一种格式生成或解码音频数据。The present invention relates to audio signal processing, and more particularly, to encoding and decoding of an audio data bitstream having metadata indicating a substream structure and/or program information related to the audio content indicated by the bitstream. Some embodiments of the present invention generate or decode audio data in one of the formats known as Dolby Digital (AC-3), Dolby Digital Plus (Enhanced AC-3 or E-AC-3), or Dolby E.
背景技术Background Art
杜比、杜比数字、杜比数字+、和杜比E是杜比实验室特许公司的商标。杜比实验室提供分别被称为杜比数字和杜比数字+的AC-3和E-AC-3的专有实现。Dolby, Dolby Digital, Dolby Digital Plus, and Dolby E are trademarks of Dolby Laboratories Licensing Corporation. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively.
音频数据处理单元通常以盲方式(blind fashion)操作并且不关注在数据被接收之前发生的音频数据的处理历史。这可以在这样的处理框架中工作:其中单个实体进行各种目标媒体渲染装置的所有的音频数据处理和编码而目标媒体渲染装置进行编码音频数据的所有的解码和渲染。然而,该盲处理在多个音频处理单元跨多样化的网络被散布(scatter)或串联(即,链)放置并且期望它们最佳地执行其相应类型的音频处理的情形下不能很好地(或完全不)工作。例如,一些音频数据可能针对高性能媒体系统被编码,并且可能需要被转换成适合于沿着媒体处理链的移动设备的简化形式。因此,音频处理单元可能不必要地对音频数据执行已经被执行过的类型的处理。例如,音量校平(leveling)单元可能对输入音频片断执行处理,不管以前是否已经对输入音频片断执行了相同的或相似的音量校平。因此,即使当不必要时,音量校平单元也可能执行校平。该不必要的处理还可能导致当渲染音频数据的内容时具体特征的退化和/或消除。Audio data processing units typically operate in a blind fashion and do not pay attention to the processing history of the audio data that occurred before the data was received. This can work in a processing framework where a single entity performs all audio data processing and encoding for various target media rendering devices, while the target media rendering devices perform all decoding and rendering of the encoded audio data. However, this blind processing does not work well (or at all) when multiple audio processing units are scattered or placed in series (i.e., chained) across a diverse network and are expected to perform their corresponding types of audio processing optimally. For example, some audio data may be encoded for a high-performance media system and may need to be converted into a simplified form suitable for mobile devices along the media processing chain. Therefore, the audio processing unit may unnecessarily perform a type of processing that has already been performed on the audio data. For example, a volume leveling unit may perform processing on an input audio segment, regardless of whether the same or similar volume leveling has been performed on the input audio segment before. Therefore, even when it is unnecessary, the volume leveling unit may perform leveling. This unnecessary processing may also lead to the degradation and/or elimination of specific features when rendering the content of the audio data.
发明内容Summary of the Invention
在一类实施方式中,本发明是能够对编码比特流进行解码的音频处理单元,该编码比特流包括比特流的至少一个帧的至少一个段中的子流结构元数据和/或节目信息元数据(可选地还包括其他元数据,例如,响度处理状态元数据)以及帧的至少一个其他段中的音频数据。在本文中,子流结构元数据(或“SSM”)表示编码比特流(或编码比特流的集合)的元数据,其指示编码比特流的音频内容的子流结构,并且“节目信息元数据”(或“PIM”)表示编码音频比特流的元数据,其指示至少一个音频节目(例如,两个或更多个音频节目),其中节目信息元数据指示至少一个所述节目的音频内容的至少一个属性或特性(例如,指示对节目的音频数据执行的处理的类型或参数的元数据,或指示节目的哪些通道是活动通道(active channel)的元数据)。In one class of embodiments, the present invention is an audio processing unit capable of decoding a coded bitstream, the coded bitstream including substream structure metadata and/or program information metadata (optionally including other metadata, such as loudness processing state metadata) in at least one segment of at least one frame of the bitstream, and audio data in at least one other segment of the frame. Herein, substream structure metadata (or "SSM") refers to metadata of a coded bitstream (or a collection of coded bitstreams) that indicates a substream structure of the audio content of the coded bitstream, and "program information metadata" (or "PIM") refers to metadata of a coded audio bitstream that indicates at least one audio program (e.g., two or more audio programs), wherein the program information metadata indicates at least one attribute or characteristic of the audio content of at least one of the programs (e.g., metadata indicating a type or parameter of processing performed on the audio data of the program, or metadata indicating which channels of the program are active channels).
在典型的情况(例如,其中编码比特流为AC-3或E-AC-3比特流)下,节目信息元数据(PIM)指示实际上不能在比特流的其他部分中携带的节目信息。例如,PIM可以指示在编码(例如,AC-3或E-AC-3编码)之前对PCM音频所应用的处理,音频节目的哪些频带已经使用具体的音频编码技术被编码以及用于在比特流中创建动态范围压缩(DRC)数据的压缩简档(profile)。In a typical case (e.g., where the encoded bitstream is an AC-3 or E-AC-3 bitstream), program information metadata (PIM) indicates program information that cannot actually be carried in other parts of the bitstream. For example, the PIM may indicate the processing applied to the PCM audio before encoding (e.g., AC-3 or E-AC-3 encoding), which frequency bands of the audio program have been encoded using a specific audio coding technique, and the compression profile used to create dynamic range compression (DRC) data in the bitstream.
在另一类实施方式中,方法包括在比特流的每个帧(或至少一些帧中的每个帧)中将编码音频数据与SSM和/或PIM复用的步骤。在典型的解码中,解码器从比特流中提取SSM和/或PIM(包括通过对SSM和/或PIM以及音频数据进行分析和去复用),并且对音频数据进行处理以生成解码音频数据的流(以及在某些情况下还执行音频数据的自适应处理)。在一些实施方式中,解码音频数据以及SSM和/或PIM从解码器被转发至后处理器,该后处理器被配置成使用SSM和/或PIM对解码音频数据执行自适应处理。In another class of embodiments, the method includes the step of multiplexing the encoded audio data with the SSM and/or PIM in each frame (or at least each of some frames) of the bitstream. In a typical decoding, the decoder extracts the SSM and/or PIM from the bitstream (including by parsing and demultiplexing the SSM and/or PIM and the audio data), and processes the audio data to generate a stream of decoded audio data (and, in some cases, also performs adaptive processing of the audio data). In some embodiments, the decoded audio data and the SSM and/or PIM are forwarded from the decoder to a post-processor that is configured to perform adaptive processing on the decoded audio data using the SSM and/or PIM.
在一类实施方式中,本发明的编码方法生成包括音频数据段(例如,图4所示的帧的AB0至AB5段或图7所示的帧的段AB0至AB5中的全部或一些)的编码音频比特流(例如,AC-3或E-AC-3比特流),音频数据段包括编码音频数据以及与音频数据段时分复用的元数据段(包括SSM和/或PIM,可选地还包括其他元数据)。在一些实施方式中,每个元数据段(在本文中有时称为“容器”)具有包括元数据段报头(可选地还包括其他强制性的或“核心”元素)、以及在元数据段报头之后的一个或更多个元数据有效载荷。如果存在,SSM被包括在元数据有效载荷之一中(由有效载荷报头标识,并且通常具有第一类型的格式)。如果存在,PIM被包括在元数据有效载荷中的另一个中(由有效载荷报头标识,并且通常具有第二类型的格式)。类似地,元数据的每个其他类型(如果存在)被包括在元数据有效载荷中的另一个中(由有效载荷报头标识,并且通常具有特定于元数据的类型的格式)。示例性格式允许在除了比特流的解码期间之外的时间(例如,由解码之后的后处理器,或由被配置成在不执行对编码比特流的完全解码的情况下识别元数据的处理器)对SSM、PIM或其他元数据的方便的访问,并且允许在比特流的解码期间(例如,子流识别的)方便的和高效的误差检测和校正。例如,在不以示例性格式访问SSM的情况下,解码器可能错误地识别与节目相关联的子流的正确数量。元数据段中的一个元数据有效载荷可以包括SSM,元数据段中的另一元数据有效载荷可以包括PIM,并且可选地,元数据段中的至少一个其他元数据有效载荷可以包括其他元数据(例如,响度处理状态元数据或“LPSM”)。In one class of embodiments, the encoding method of the present invention generates an encoded audio bitstream (e.g., an AC-3 or E-AC-3 bitstream) comprising audio data segments (e.g., segments AB0 to AB5 of the frame shown in FIG. 4, or all or some of segments AB0 to AB5 of the frame shown in FIG. 7) that include encoded audio data, and metadata segments (including SSM and/or PIM, and optionally other metadata) that are time-division multiplexed with the audio data segments. In some embodiments, each metadata segment (sometimes referred to herein as a "container") has a format that includes a metadata segment header (and optionally other mandatory or "core" elements) and one or more metadata payloads following the metadata segment header. If present, the SSM is included in one of the metadata payloads (identified by a payload header and generally having a first type of format). If present, the PIM is included in another of the metadata payloads (identified by a payload header and generally having a second type of format). Similarly, each other type of metadata (if present) is included in another of the metadata payloads (identified by a payload header and generally having a format specific to that type of metadata). The exemplary format allows convenient access to the SSM, PIM, or other metadata at times other than during decoding of the bitstream (e.g., by a post-processor after decoding, or by a processor configured to identify metadata without performing a full decoding of the encoded bitstream), and allows convenient and efficient error detection and correction (e.g., of substream identification) during decoding of the bitstream. For example, without access to the SSM in the exemplary format, a decoder may incorrectly identify the number of substreams associated with a program. One metadata payload in the metadata segment may include the SSM, another metadata payload in the metadata segment may include the PIM, and optionally, at least one other metadata payload in the metadata segment may include other metadata (e.g., loudness processing state metadata or "LPSM").
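The container layout described above (a segment header followed by payloads, each identified by its own payload header) can be pictured with a minimal sketch. The class names, the `payload_type` strings, and the `find` helper are hypothetical and chosen only for illustration; no byte-level layout from the actual bitstream format is implied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetadataPayload:
    # Hypothetical model: the payload header is reduced to a type tag.
    payload_type: str  # e.g. "SSM", "PIM", or "LPSM"
    body: bytes        # payload contents, format specific to the type


@dataclass
class MetadataSegment:
    # Hypothetical model of a "container": header values plus payloads.
    header: Dict[str, int]
    payloads: List[MetadataPayload] = field(default_factory=list)

    def find(self, payload_type: str) -> Optional[MetadataPayload]:
        """Return the first payload of the given type, or None if absent."""
        for payload in self.payloads:
            if payload.payload_type == payload_type:
                return payload
        return None


segment = MetadataSegment(
    header={"version": 1},
    payloads=[
        MetadataPayload("SSM", b"\x01"),
        MetadataPayload("PIM", b"\x02\x03"),
    ],
)
pim = segment.find("PIM")    # located without touching the audio data
lpsm = segment.find("LPSM")  # None: this segment carries no LPSM payload
```

Because each payload is tagged by type, a post-processor in this sketch can pick out the PIM or SSM without decoding the audio data segments, which is the kind of access the paragraph describes.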
根据一个实施例,提供一种音频处理单元,其包括:缓冲存储器;以及至少一个处理子系统,其耦接至缓冲存储器,其中缓冲存储器存储编码音频比特流的至少一个帧,帧包括在帧的至少一个保留字段的至少一个元数据段中的节目信息元数据或子流结构元数据以及在帧的至少一个其他段中的音频数据,其中处理子系统被耦接并且被配置成使用比特流的元数据执行比特流的生成、音频数据的解码或音频数据的自适应处理中的至少一种,或使用比特流的元数据执行比特流的音频数据或元数据中至少之一的认证或验证中的至少一种。其中,元数据段包括至少一个元数据有效载荷,元数据有效载荷包括:报头;以及在报头之后的,节目信息元数据的至少一部分或子流结构元数据的至少一部分。并且其中,保留字段选自由跳过字段、addbsi字段、辅助数据字段或其组合构成的组。According to one embodiment, an audio processing unit is provided, comprising: a buffer memory; and at least one processing subsystem coupled to the buffer memory, wherein the buffer memory stores at least one frame of an encoded audio bitstream, the frame including program information metadata or substream structure metadata in at least one metadata segment of at least one reserved field of the frame and audio data in at least one other segment of the frame, wherein the processing subsystem is coupled and configured to use the metadata of the bitstream to perform at least one of bitstream generation, audio data decoding, or adaptive processing of the audio data, or to use the metadata of the bitstream to perform at least one of authentication or verification of at least one of the audio data or metadata of the bitstream. The metadata segment includes at least one metadata payload, the metadata payload including: a header; and, following the header, at least a portion of the program information metadata or at least a portion of the substream structure metadata. Furthermore, the reserved field is selected from the group consisting of a skip field, an addbsi field, an ancillary data field, or a combination thereof.
根据另一个实施例,提供一种用于对编码音频比特流进行解码的方法,该方法包括以下步骤:接收包括元数据和音频数据的编码音频比特流;以及从编码音频比特流中提取元数据或音频数据,其中元数据是或包括节目信息元数据或子流结构元数据。其中,编码音频比特流包括一系列帧并且指示至少一个音频节目,节目信息元数据和子流结构元数据指示节目,帧中的每个包括至少一个音频数据段,每个音频数据段包括音频数据的至少一部分,帧的至少一个子集中的每个帧包括元数据段,并且每个元数据段包括节目信息元数据的至少一部分以及子流结构元数据的至少一部分,其中,元数据段位于保留字段中,保留字段选自由跳过字段、addbsi字段、辅助数据字段或其组合构成的组。According to another embodiment, a method for decoding a coded audio bitstream is provided, the method comprising the steps of: receiving a coded audio bitstream comprising metadata and audio data; and extracting the metadata or the audio data from the coded audio bitstream, wherein the metadata is or comprises program information metadata or substream structure metadata. The coded audio bitstream comprises a series of frames and indicates at least one audio program, the program information metadata and the substream structure metadata indicate the program, each of the frames comprises at least one audio data segment, each audio data segment comprises at least a portion of the audio data, each frame in at least a subset of the frames comprises a metadata segment, each metadata segment comprises at least a portion of the program information metadata and at least a portion of the substream structure metadata, wherein the metadata segment is located in a reserved field selected from the group consisting of a skip field, an addbsi field, an ancillary data field, or a combination thereof.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1是可以被配置成执行本发明的方法的实施方式的系统的实施方式的框图。FIG. 1 is a block diagram of an embodiment of a system that may be configured to perform embodiments of the method of the present invention.
图2是作为本发明的音频处理单元的实施方式的编码器的框图。FIG. 2 is a block diagram of an encoder as an embodiment of the audio processing unit of the present invention.
图3是作为本发明的音频处理单元的实施方式的解码器以及作为本发明的音频处理单元的另一实施方式的耦接至解码器的后处理器的框图。FIG. 3 is a block diagram of a decoder that is an embodiment of the audio processing unit of the present invention, and of a post-processor coupled to the decoder that is another embodiment of the audio processing unit of the present invention.
图4是包括被划分成的段的AC-3帧的图。FIG. 4 is a diagram of an AC-3 frame, including the segments into which it is divided.
图5是包括被划分成的段的AC-3帧的同步信息(SI)段的图。FIG. 5 is a diagram of the synchronization information (SI) segment of an AC-3 frame, including the segments into which it is divided.
图6是包括被划分成的段的AC-3帧的比特流信息(BSI)段的图。FIG. 6 is a diagram of the bitstream information (BSI) segment of an AC-3 frame, including the segments into which it is divided.
图7是包括被划分成的段的E-AC-3帧的图。FIG. 7 is a diagram of an E-AC-3 frame, including the segments into which it is divided.
图8是根据本发明的实施方式生成的包括元数据段报头的编码比特流的元数据段的图,元数据段报头包括容器同步字(在图8中标识为“容器同步”)以及版本和键ID值,之后是多个元数据有效载荷以及保护位。FIG. 8 is a diagram of a metadata segment of an encoded bitstream generated according to an embodiment of the present invention, including a metadata segment header that comprises a container sync word (identified as "container sync" in FIG. 8) and version and key ID values, followed by multiple metadata payloads and protection bits.
符号和术语Symbols and terminology
贯穿包括权利要求在内的本公开内容,“对”信号或数据执行操作(例如,对信号或数据进行滤波、缩放、变换或施加增益)的表达用于广义上表示对信号或数据、或对信号或数据的已处理版本(例如,对在对信号执行操作之前已经经历了初步滤波或预处理的信号的版本)直接执行操作。Throughout this disclosure, including the claims, expressions that “perform an operation on” a signal or data (e.g., filtering, scaling, transforming, or applying a gain to the signal or data) are used broadly to refer to performing the operation directly on the signal or data, or on a processed version of the signal or data (e.g., a version of the signal that has undergone preliminary filtering or preprocessing before the operation is performed on the signal).
贯穿包括权利要求在内的本公开内容,“系统”的表达用于广义上表示设备、系统或子系统。例如,实现解码器的子系统可以称为解码器系统,并且包括这样的子系统的系统(例如,响应于多个输入生成X个输出信号的系统,在该系统中,子系统生成M个输入并且其他X-M个输入从外部源接收)也可以称为解码器系统。Throughout this disclosure, including the claims, the expression "system" is used in a broad sense to refer to a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system that includes such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, where the subsystem generates M inputs and the other X-M inputs are received from external sources) may also be referred to as a decoder system.
贯穿包括权利要求在内的本公开内容,术语“处理器”用于广义上表示可编程或以其他方式可配置成(例如,使用软件或固件)对数据(例如,音频数据或视频数据或其他图像数据)执行操作的系统或装置。处理器的示例包括现场可编程门阵列(或其他可配置的集成电路或芯片组)、被编程和/或被以其他方式配置成对音频数据或其他声音数据执行流水线处理的数字信号处理器、可编程的通用处理器或计算机以及可编程的微处理器芯片或芯片组。Throughout this disclosure, including the claims, the term "processor" is used in a broad sense to refer to a system or device that is programmable or otherwise configurable (e.g., using software or firmware) to perform operations on data (e.g., audio data or video data or other image data). Examples of processors include field programmable gate arrays (or other configurable integrated circuits or chipsets), digital signal processors that are programmed and/or otherwise configured to perform pipeline processing on audio data or other sound data, programmable general-purpose processors or computers, and programmable microprocessor chips or chipsets.
贯穿包括权利要求在内的本公开内容,“音频处理器”和“音频处理单元”的表达用于可交换地广义上表示被配置成对音频数据进行处理的系统。音频处理单元的示例包括但不限于编码器(例如,代码转换器)、解码器、编解码器、预处理系统、后处理系统以及比特流处理系统(有时称为比特流处理工具)。Throughout this disclosure, including the claims, the expressions "audio processor" and "audio processing unit" are used interchangeably and broadly to refer to a system configured to process audio data. Examples of audio processing units include, but are not limited to, encoders (e.g., transcoders), decoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools).
贯穿包括权利要求在内的本公开内容,(编码音频比特流的)“元数据”的表达指代与比特流的相应的音频数据分离的且不同的数据。Throughout this disclosure, including the claims, the expression "metadata" (of an encoded audio bitstream) refers to data that is separate and distinct from the corresponding audio data of the bitstream.
贯穿包括权利要求在内的本公开内容,“子流结构元数据”(或“SSM”)的表达表示编码音频比特流(或编码音频比特流集)的元数据,其指示编码比特流的音频内容的子流结构。Throughout this disclosure, including the claims, the expression "substream structure metadata" (or "SSM") denotes metadata of a coded audio bitstream (or set of coded audio bitstreams) that indicates the substream structure of the audio content of the coded bitstream.
贯穿包括权利要求在内的本公开内容,“节目信息元数据”(或“PIM”)的表达表示编码音频比特流的元数据,该编码音频比特流指示至少一个音频节目(例如,两个或更多个音频节目),其中所述元数据指示至少一个所述节目的音频内容的至少一个属性或特性(例如,指示对节目的音频数据执行的处理的类型或参数的元数据、或表示节目的哪些通道是活动通道的元数据)。Throughout this disclosure, including the claims, the expression "program information metadata" (or "PIM") means metadata for an encoded audio bitstream that is indicative of at least one audio program (e.g., two or more audio programs), wherein the metadata indicates at least one attribute or characteristic of the audio content of at least one of the programs (e.g., metadata indicating the type or parameters of processing performed on the program's audio data, or metadata indicating which channels of the program are active channels).
贯穿包括权利要求在内的本公开内容,“处理状态元数据”的表达(例如,如在“响度处理状态元数据”的表达中)指代与比特流的音频数据相关联的(编码音频比特流的)元数据,指示相应的(相关联的)音频数据的处理状态(例如,已经对音频数据执行了什么类型的处理),并且通常还指示音频数据的至少一个特征或特性。处理状态元数据与音频数据的关联是时间同步的。从而,当前的(最新接收或更新的)处理状态元数据指示相应的音频数据同时包括所指示的类型的音频数据处理的结果。在一些情况下,处理状态元数据可以包括处理历史和/或用于所指示的类型的处理中的和/或从所指示的类型的处理中得到的参数中的一些或全部。另外,处理状态元数据可以包括相应的音频数据的已经从音频数据中计算或提取的至少一个特征或特性。处理状态元数据还可以包括与相应的音频数据的任何处理无关的或不是从相应的音频数据的任何处理中得到的其他元数据。例如,第三方数据、跟踪信息、标识符、所有权或标准信息、用户注释数据、用户偏好数据等可以通过具体的音频处理单元被添加以传递至其他音频处理单元。Throughout this disclosure, including the claims, the expression "processing state metadata" (e.g., as in the expression "loudness processing state metadata") refers to metadata associated with audio data of a bitstream (of an encoded audio bitstream) that indicates the processing state of the corresponding (associated) audio data (e.g., what type of processing has been performed on the audio data) and typically also indicates at least one feature or characteristic of the audio data. The association of the processing state metadata with the audio data is time-synchronous. Thus, current (most recently received or updated) processing state metadata indicates that the corresponding audio data also includes the results of the indicated type of audio data processing. In some cases, the processing state metadata may include some or all of the processing history and/or parameters used in and/or derived from the indicated type of processing. Additionally, the processing state metadata may include at least one feature or characteristic of the corresponding audio data that has been calculated or extracted from the audio data. The processing state metadata may also include other metadata that is not related to or derived from any processing of the corresponding audio data. For example, third-party data, tracking information, identifiers, proprietary or standards information, user annotation data, user preference data, etc. may be added by a specific audio processing unit for delivery to other audio processing units.
贯穿包括权利要求在内的本公开内容,“响度处理状态元数据”(或“LPSM”)的表达表示处理状态元数据,处理状态元数据指示相应的音频数据的响度处理状态(例如,已经对音频数据执行了什么类型的响度处理),并且通常还指示相应的音频数据的至少一个特征或特性(例如,响度)。响度处理状态元数据可以包括不是(即,当单独考虑时)响度处理状态元数据的数据(例如,其他元数据)。Throughout this disclosure, including the claims, the expression "loudness processing state metadata" (or "LPSM") refers to processing state metadata that indicates the loudness processing state of corresponding audio data (e.g., what type of loudness processing has been performed on the audio data) and typically also indicates at least one feature or characteristic of the corresponding audio data (e.g., loudness). Loudness processing state metadata may include data that is not (i.e., when considered alone) loudness processing state metadata (e.g., other metadata).
贯穿包括权利要求在内的本公开内容,“通道”(或“音频通道”)的表达表示单通道音频信号。Throughout this disclosure including the claims, the expression "channel" (or "audio channel") refers to a single-channel audio signal.
贯穿包括权利要求在内的本公开内容,“音频节目”的表达表示一个或更多个音频通道的集合以及可选地还表示相关联的元数据(例如,描述期望的空间音频表示的元数据、和/或PIM、和/或SSM、和/或LPSM、和/或节目边界元数据)。Throughout this disclosure, including the claims, the expression "audio program" refers to a collection of one or more audio channels and optionally also associated metadata (e.g., metadata describing the desired spatial audio representation, and/or PIM, and/or SSM, and/or LPSM, and/or program boundary metadata).
贯穿包括权利要求在内的本公开内容,“节目边界元数据”的表达表示编码音频比特流的元数据,其中编码音频比特流指示至少一个音频节目(例如,两个或更多个节目),并且节目边界元数据指示至少一个所述音频节目的至少一个边界(开始和/或结束)在比特流中的位置。例如,(指示音频节目的编码音频比特流的)节目边界元数据可以包括指示节目的开始的位置(例如,比特流的第“N”帧的开始,或比特流的第“N”帧的第“M”个样本位置)的元数据,以及指示节目的结束的位置(例如,比特流的第“J”帧的开始,或比特流的第“J”帧的第“K”个样本位置)的额外元数据。Throughout this disclosure, including the claims, the expression "program boundary metadata" refers to metadata of a coded audio bitstream, wherein the coded audio bitstream indicates at least one audio program (e.g., two or more programs), and the program boundary metadata indicates the position of at least one boundary (start and/or end) of at least one of the audio programs in the bitstream. For example, the program boundary metadata (of the coded audio bitstream indicating the audio program) may include metadata indicating the position of the start of the program (e.g., the start of the "N"th frame of the bitstream, or the "M"th sample position of the "N"th frame of the bitstream), and additional metadata indicating the position of the end of the program (e.g., the start of the "J"th frame of the bitstream, or the "K"th sample position of the "J"th frame of the bitstream).
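As a small illustration of how a boundary given as "frame N, sample M" maps to a position in the audio, the following sketch converts such a pair to an absolute sample offset. It assumes zero-based indices and fixed-size frames of 1536 samples (the AC-3 frame size); the function names are hypothetical.

```python
SAMPLES_PER_FRAME = 1536  # assumption: fixed-size AC-3 frames


def boundary_sample_offset(frame_index, sample_in_frame=0,
                           samples_per_frame=SAMPLES_PER_FRAME):
    """Absolute sample offset of a boundary at (frame, sample), zero-based."""
    if frame_index < 0 or not 0 <= sample_in_frame < samples_per_frame:
        raise ValueError("boundary lies outside the frame grid")
    return frame_index * samples_per_frame + sample_in_frame


def program_length_samples(start, end):
    """Samples between a start boundary and an end boundary.

    Each boundary is a (frame_index, sample_in_frame) pair.
    """
    return boundary_sample_offset(*end) - boundary_sample_offset(*start)


offset = boundary_sample_offset(2, 10)            # 2 * 1536 + 10
length = program_length_samples((0, 0), (10, 0))  # ten whole frames
```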
贯穿包括权利要求在内的本公开内容,术语“耦接”或“被耦接”用于表示直接或间接连接。从而,如果第一设备耦接至第二设备,该连接可以是通过直接连接,或经由其他设备和连接的通过间接连接。Throughout this disclosure, including the claims, the terms "couple" or "coupled" are used to mean either a direct or indirect connection. Thus, if a first device couples to a second device, that connection may be through a direct connection or through an indirect connection via other devices and connections.
具体实施方式DETAILED DESCRIPTION
典型的音频数据流包括音频内容(例如,音频内容的一个或更多个通道)和指示音频内容的至少一个特性的元数据两者。例如,在AC-3比特流中,存在具体意在用于改变被传送至收听环境的节目的声音的若干音频元数据参数。元数据参数中的一个为DIALNORM参数,其意在指示音频节目中的对白的平均电平,并且用于确定音频回放信号电平。A typical audio data stream includes both audio content (e.g., one or more channels of audio content) and metadata indicating at least one characteristic of the audio content. For example, in an AC-3 bitstream, there are several audio metadata parameters specifically intended to alter the sound of a program delivered to a listening environment. One of these metadata parameters is the DIALNORM parameter, which is intended to indicate the average level of dialogue in an audio program and is used to determine the audio playback signal level.
在包括一系列不同的音频节目段(每个具有不同的DIALNORM参数)的比特流的回放期间,AC-3解码器使用每个段的DIALNORM参数执行一种类型的响度处理,在该响度处理中AC-3解码器修改回放电平或响度,使得该系列段的对白的感知的响度处于一致的电平。一系列编码音频项目中的每个编码音频段(项目)将(通常)具有不同的DIALNORM参数,并且解码器将对项目中的每个项目的电平进行缩放,使得每个项目的对白的回放电平或响度相同或非常相似,尽管这会要求在回放期间对项目中的不同的项目应用不同量的增益。During playback of a bitstream that includes a series of different audio program segments (each with a different DIALNORM parameter), the AC-3 decoder uses the DIALNORM parameter of each segment to perform a type of loudness processing in which it modifies the playback level or loudness so that the perceived loudness of the dialogue across the series of segments is at a consistent level. Each encoded audio segment (item) in a series of encoded audio items will (usually) have a different DIALNORM parameter, and the decoder will scale the level of each item so that the playback level or loudness of the dialogue of each item is the same or very similar, although this may require applying different amounts of gain to different items during playback.
DIALNORM通常由用户设置而不是自动生成的,然而如果用户没有设置值则存在默认的DIALNORM值。例如,内容创建者可以使用AC-3编码器外部的装置进行响度测量,然后将该结果(指示音频节目的口语对白的响度)传送至编码器以设置DIALNORM值。从而,依赖于内容创建者正确地设置DIALNORM参数。DIALNORM is typically set by the user rather than generated automatically, although a default DIALNORM value is used if the user does not set one. For example, a content creator might measure loudness using a device external to the AC-3 encoder and then pass the result (indicating the loudness of the spoken dialogue of the audio program) to the encoder to set the DIALNORM value. Thus, there is reliance on the content creator to set the DIALNORM parameter correctly.
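The per-item scaling described above can be sketched as follows. This is a minimal illustration that assumes the common AC-3 convention in which a DIALNORM value of n (1 to 31) indicates a dialogue level of -n dBFS and the decoder aims for a playback dialogue level of -31 dBFS; that convention, the constant, and the function names are assumptions for illustration and are not stated in this description.

```python
TARGET_DIALOGUE_LEVEL_DB = -31  # assumed reference dialogue level in dBFS


def dialnorm_gain_db(dialnorm):
    """Gain (in dB) that brings the indicated dialogue level to the target."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm is expected to be in the range 1..31")
    indicated_level_db = -dialnorm  # assumption: value n means -n dBFS
    return TARGET_DIALOGUE_LEVEL_DB - indicated_level_db


def apply_gain(samples, gain_db):
    """Scale linear PCM samples by a gain expressed in dB."""
    scale = 10 ** (gain_db / 20)
    return [s * scale for s in samples]


# Two items with different DIALNORM values receive different gains,
# so that their dialogue ends up at the same playback level.
gain_loud_item = dialnorm_gain_db(21)   # dialogue at -21 dBFS: attenuate 10 dB
gain_quiet_item = dialnorm_gain_db(31)  # dialogue already at -31 dBFS: 0 dB
```

Under these assumptions, an item whose DIALNORM is wrong receives the wrong gain, which is why an incorrect value affects playback loudness directly.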
对于为什么AC-3比特流中的DIALNORM参数会是错误的,存在几个不同的原因。第一,如果DIALNORM值不是由内容创建者设置的,那么每个AC-3编码器具有在比特流的生成期间使用的默认的DIALNORM值。该默认值可能与音频的实际对白响度显著不同。第二,即使内容创建者测量响度并且相应地设置DIALNORM值,可能已经使用不符合推荐的AC-3响度测量方法的响度测量算法或计量器,产生不正确的DIALNORM值。第三,即使已经使用由内容创建者正确测量和设置的DIALNORM值创建了AC-3比特流,该AC-3比特流可能在比特流的传输和/或存储期间已经被改变成错误值。例如,这在使用错误的DIALNORM元数据信息解码、修改然后重新编码AC-3比特流的电视广播应用中并非是不常见的。从而,包括在AC-3比特流中的DIALNORM值可能是错误的或不准确的,因此可能对收听体验的质量有消极的影响。There are several different reasons why the DIALNORM parameter in an AC-3 bitstream may be incorrect. First, if the DIALNORM value is not set by the content creator, each AC-3 encoder has a default DIALNORM value used during bitstream generation. This default value may differ significantly from the actual dialogue loudness of the audio. Second, even if the content creator measures loudness and sets the DIALNORM value accordingly, a loudness measurement algorithm or meter that does not conform to the recommended AC-3 loudness measurement method may have been used, resulting in an incorrect DIALNORM value. Third, even if the AC-3 bitstream was created using the DIALNORM value correctly measured and set by the content creator, the AC-3 bitstream may have been altered to an incorrect value during the bitstream's transmission and/or storage. For example, this is not uncommon in television broadcast applications where an AC-3 bitstream is decoded, modified, and then re-encoded using incorrect DIALNORM metadata information. Consequently, the DIALNORM value included in the AC-3 bitstream may be incorrect or inaccurate, potentially negatively impacting the quality of the listening experience.
此外,DIALNORM参数不指示相应的音频数据的响度处理状态(例如,已经对音频数据执行了什么类型的响度处理)。响度处理状态元数据(以其在本发明的一些实施方式中被提供的格式)有助于以尤其高效的方式便利于音频比特流的自适应响度处理和/或音频内容的响度处理状态和响度的有效性的验证。Furthermore, the DIALNORM parameter does not indicate the loudness processing state of the corresponding audio data (e.g., what type of loudness processing has been performed on the audio data). Loudness processing state metadata (in the format in which it is provided in some embodiments of the present invention) helps facilitate adaptive loudness processing of audio bitstreams and/or verification of the validity of the loudness processing state and loudness of audio content in a particularly efficient manner.
尽管本发明不限于使用AC-3比特流、E-AC-3比特流或杜比E比特流,为了方便,将在生成、解码或以其他方式处理这样的比特流的实施方式中对其进行描述。Although the present invention is not limited to use with AC-3 bitstreams, E-AC-3 bitstreams, or Dolby E bitstreams, for convenience it will be described in embodiments that generate, decode, or otherwise process such bitstreams.
AC-3编码比特流包括元数据和音频内容的1至6个通道。音频内容是已经使用感知音频编码压缩的音频数据。元数据包括意在用于改变被传送至收听环境的节目的声音的若干音频元数据参数。The AC-3 encoded bitstream includes metadata and 1 to 6 channels of audio content. The audio content is audio data that has been compressed using perceptual audio coding. The metadata includes several audio metadata parameters intended to change the sound of the program being delivered to the listening environment.
AC-3编码音频比特流的每帧包含关于数字音频的1536个样本的音频内容和元数据。对于48kHz的采样率,这表示32毫秒的数字音频或音频的每秒31.25帧的速率。Each frame of an AC-3 encoded audio bitstream contains audio content and metadata about 1536 samples of digital audio. For a 48kHz sampling rate, this represents 32 milliseconds of digital audio, or 31.25 frames per second of audio.
取决于帧是否分别包含1块、2块、3块或6块音频数据,E-AC-3编码音频比特流的每帧包含关于数字音频的256、512、768或1536个样本的音频数据和元数据。对于48kHz的采样率,这分别表示5.333、10.667、16或32毫秒的数字音频或分别表示音频的每秒187.5、93.75、62.5或31.25帧的速率。Each frame of an E-AC-3 encoded audio bitstream contains audio data and metadata for 256, 512, 768, or 1536 samples of digital audio, depending on whether the frame contains 1, 2, 3, or 6 blocks of audio data, respectively. For a sampling rate of 48 kHz, this represents 5.333, 10.667, 16, or 32 milliseconds of digital audio, respectively, or a rate of 187.5, 93.75, 62.5, or 31.25 frames per second of audio, respectively.
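The frame durations and frame rates quoted for AC-3 and E-AC-3 follow directly from the block count, at 256 samples per audio block. The short sketch below reproduces the arithmetic; the function name is hypothetical. For a single block at 48 kHz the rate works out to 48000 / 256 = 187.5 frames per second.

```python
from fractions import Fraction

SAMPLES_PER_BLOCK = 256  # one audio block carries 256 samples per channel


def frame_timing(num_blocks, sample_rate_hz=48000):
    """Return (samples, duration_ms, frames_per_second) for one frame."""
    samples = num_blocks * SAMPLES_PER_BLOCK
    duration_ms = Fraction(samples * 1000, sample_rate_hz)
    frames_per_second = Fraction(sample_rate_hz, samples)
    return samples, float(duration_ms), float(frames_per_second)


for blocks in (1, 2, 3, 6):
    samples, ms, fps = frame_timing(blocks)
    print(f"{blocks} block(s): {samples} samples, {ms:.3f} ms, {fps:.2f} frames/s")
```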
如图4所示,每个AC-3帧被划分成部分(段),包括:包含(如图5所示)同步字(SW)和两个误差校正字中的第一个误差校正字(CRC1)的同步信息(SI)部分;包含大部分元数据的比特流信息(BSI)部分;包含数据压缩音频内容(以及还可以包括元数据)的6个音频块(AB0至AB5);包含在压缩音频内容之后剩余的任意未使用的位的无用位段(W)(也称为“跳过字段”);可以包含更多元数据的辅助(AUX)信息部分;以及两个误差校正字中的第二个误差校正字(CRC2)。As shown in Figure 4, each AC-3 frame is divided into parts (segments), including: a synchronization information (SI) part containing (as shown in Figure 5) a synchronization word (SW) and the first of two error correction words (CRC1); a bit stream information (BSI) part containing most of the metadata; 6 audio blocks (AB0 to AB5) containing data compressed audio content (and may also include metadata); a useless bit segment (W) (also called a "skip field") containing any unused bits remaining after the compressed audio content; an auxiliary (AUX) information part that may contain more metadata; and the second of the two error correction words (CRC2).
如图7所示,每个E-AC-3帧被划分成部分(段),包括:包含(如图5所示)同步字(SW)的同步信息(SI)部分;包含大部分元数据的比特流信息(BSI)部分;包含数据压缩音频内容(以及还可以包括元数据)的6个音频块(AB0至AB5);包含在压缩音频内容之后剩余的任意未使用的位的无用位段(W)(也称为“跳过字段”)(尽管仅示出了一个无用位段,不同的无用位段或跳过字段段通常可以在每个音频块之后);可以包含更多元数据的辅助(AUX)信息部分;以及误差校正字(CRC)。As shown in Figure 7, each E-AC-3 frame is divided into parts (segments), including: a synchronization information (SI) part containing the synchronization word (SW) (as shown in Figure 5); a bit stream information (BSI) part containing most of the metadata; 6 audio blocks (AB0 to AB5) containing data compressed audio content (and may also include metadata); a useless bit segment (W) (also called a "skip field") containing any unused bits remaining after the compressed audio content (although only one useless bit segment is shown, a different useless bit segment or skip field segment may typically follow each audio block); an auxiliary (AUX) information part that may contain more metadata; and an error correction word (CRC).
在AC-3(或E-AC-3)比特流中,存在具体意在用于改变被传送至收听环境的节目的声音的若干音频元数据参数。元数据参数中的一个为DIALNORM参数,该DIALNORM参数被包括在BSI段中。In an AC-3 (or E-AC-3) bitstream, there are several audio metadata parameters that are specifically intended to change the sound of a program delivered to a listening environment. One of the metadata parameters is the DIALNORM parameter, which is included in the BSI segment.
如图6所示,AC-3帧的BSI段包括指示节目的DIALNORM值的5位参数(“DIALNORM”)。如果AC-3帧的音频编码模式(“acmod”)为0,则包括指示在同一AC-3帧中携带的第二音频节目的5位参数DIALNORM值的5位参数(“DIALNORM2”),指示使用双单通道或“1+1”通道配置。As shown in FIG. 6, the BSI segment of an AC-3 frame includes a 5-bit parameter ("DIALNORM") indicating the DIALNORM value of the program. If the audio coding mode ("acmod") of the AC-3 frame is 0, indicating that a dual-mono or "1+1" channel configuration is in use, the BSI segment also includes a 5-bit parameter ("DIALNORM2") indicating the DIALNORM value of a second audio program carried in the same AC-3 frame.
BSI段还包括指示在“addbsie”位之后额外的比特流信息的存在(或不存在)的标志(“addbsie”)、指示在“addbsil”值之后任何额外的比特流信息的长度的参数(“addbsil”)、以及在“addbsil”值之后高达64位的额外的比特流信息(“addbsi”)。The BSI segment also includes a flag ("addbsie") indicating the presence (or absence) of additional bitstream information following the "addbsie" bit, a parameter ("addbsil") indicating the length of any additional bitstream information following the "addbsil" value, and up to 64 bits of additional bitstream information ("addbsi") following the "addbsil" value.
BSI段包括在图6中没有具体示出的其他元数据值。The BSI segment includes other metadata values not specifically shown in FIG. 6 .
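The conditional presence of DIALNORM2 can be illustrated with a small bit-reader sketch. This is only an illustration under stated assumptions: the three-field layout below (acmod, DIALNORM, optional DIALNORM2 packed back to back) is hypothetical and deliberately simplified, since a real BSI segment carries other fields between them, and the names `BitReader` and `read_dialnorm_fields` are invented for this example.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_dialnorm_fields(reader: BitReader) -> dict:
    """Parse a SIMPLIFIED, hypothetical field sequence (not the real BSI layout):
    acmod (3 bits), DIALNORM (5 bits), then DIALNORM2 (5 bits) only if acmod == 0.
    """
    acmod = reader.read(3)
    fields = {"acmod": acmod, "dialnorm": reader.read(5)}
    if acmod == 0:
        # acmod == 0 signals the dual-mono ("1+1") configuration, so a second
        # program is present and carries its own DIALNORM value.
        fields["dialnorm2"] = reader.read(5)
    return fields


# Dual-mono example: bits 000 11011 11000 (padded with zeros to two bytes).
dual_mono = read_dialnorm_fields(BitReader(bytes([0x1B, 0xC0])))
# Single-program example: bits 010 11111.
single = read_dialnorm_fields(BitReader(bytes([0x5F])))
```

In the dual-mono case the result contains both `dialnorm` and `dialnorm2`; in the other case `dialnorm2` is absent, mirroring the rule that the second parameter exists only when acmod is 0.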
根据一类实施方式,编码比特流指示音频内容的多个子流。在一些情况下,子流指示多通道节目的音频内容,并且子流中的每个指示节目的通道中的一个或更多个。在其他情况下,编码音频比特流的多个子流指示若干音频节目——通常为“主”音频节目(可以是多通道节目)和至少一个其他音频节目(例如,为关于主音频节目的评论的节目)——的音频内容。According to one class of embodiments, an encoded bitstream indicates multiple substreams of audio content. In some cases, the substreams indicate the audio content of a multi-channel program, and each of the substreams indicates one or more of the program's channels. In other cases, the multiple substreams of the encoded audio bitstream indicate the audio content of several audio programs, typically a "main" audio program (which may be a multi-channel program) and at least one other audio program (e.g., a program that provides commentary about the main audio program).
指示至少一个音频节目的编码音频比特流需要包括音频内容的至少一个“独立”子流。独立子流指示音频节目的至少一个通道(例如,独立子流可以指示常规的5.1通道音频节目的5个全音域通道)。在本文中,该音频节目称为“主”节目。The encoded audio bitstream indicating at least one audio program needs to include at least one "independent" substream of audio content. The independent substream indicates at least one channel of the audio program (for example, the independent substream may indicate the five full-range channels of a conventional 5.1-channel audio program). In this document, this audio program is referred to as the "main" program.
在一些类型的实施方式中,编码音频比特流指示两个或更多个音频节目(“主”节目和至少一个其他音频节目)。在这样的情况下,比特流包括两个或更多个独立子流:指示主节目的至少一个通道的第一独立子流;以及指示另一音频节目(与主节目不同的节目)的至少一个通道的至少一个其他独立子流。每个独立子流可以独立地被解码,并且解码器可以操作以仅对编码比特流的独立子流的子集(不是全部)进行解码。In some types of implementations, an encoded audio bitstream indicates two or more audio programs (a "main" program and at least one other audio program). In such a case, the bitstream includes two or more independent substreams: a first independent substream indicating at least one channel of the main program; and at least one other independent substream indicating at least one channel of another audio program (a program different from the main program). Each independent substream can be decoded independently, and the decoder can be operated to decode only a subset (not all) of the independent substreams of the encoded bitstream.
在指示两个独立子流的编码音频比特流的典型示例中,独立子流中的一个指示多通道主节目的标准格式扬声器通道(例如,5.1通道主节目的左、右、中、左环绕、右环绕全音域扬声器通道),而另一独立子流指示关于主节目的单通道音频评论(例如,导演关于电影的评论,其中主节目是电影的声带(soundtrack))。在指示多个独立子流的编码音频比特流的另一示例中,独立子流中的一个指示包括第一语言的对白的多通道主节目(例如,5.1通道主节目)的标准格式扬声器通道(例如,主节目的扬声器通道中的一个可以指示对白),而每个其他独立子流指示对白的单通道翻译(翻译成不同的语言)。In a typical example of an encoded audio bitstream indicating two independent substreams, one of the independent substreams indicates standard format speaker channels of a multi-channel main program (e.g., left, right, center, left surround, and right surround full-range speaker channels of a 5.1-channel main program), while the other independent substream indicates a single-channel audio commentary about the main program (e.g., a director's commentary about a movie, where the main program is the movie's soundtrack). In another example of an encoded audio bitstream indicating multiple independent substreams, one of the independent substreams indicates standard format speaker channels of a multi-channel main program (e.g., a 5.1-channel main program) including dialogue in a first language (e.g., one of the speaker channels of the main program may indicate dialogue), while each of the other independent substreams indicates a single-channel translation (into a different language) of the dialogue.
可选地,指示主节目(可选地还指示至少一个其他音频节目)的编码音频比特流包括音频内容的至少一个“从属”子流。每个从属子流与比特流的一个独立子流相关联,并且指示其内容由相关联的独立子流指示的节目(例如,主节目)的至少一个额外的通道(即,从属子流指示节目的不是由相关联的独立子流指示的至少一个通道,而相关联的独立子流指示节目的至少一个通道)。Optionally, an encoded audio bitstream indicating a main program (and optionally also at least one other audio program) includes at least one "dependent" substream of audio content. Each dependent substream is associated with one independent substream of the bitstream and indicates at least one additional channel of the program (e.g., the main program) whose content is indicated by the associated independent substream (i.e., the dependent substream indicates at least one channel of the program that is not indicated by the associated independent substream, while the associated independent substream indicates at least one channel of the program).
在包括独立子流(指示主节目的至少一个通道)的编码比特流的示例中,比特流还包括指示主节目的一个或更多个额外的扬声器通道的(与独立子流相关联的)从属子流。这样的额外的扬声器通道对由独立子流指示的主节目通道来说是额外的。例如,如果独立子流指示7.1通道主节目的左、右、中、左环绕、右环绕全音域扬声器通道,那么从属子流可以指示主节目的其他两个全音域扬声器通道。In the example of an encoded bitstream including an independent substream (indicating at least one channel of a main program), the bitstream also includes a dependent substream (associated with the independent substream) indicating one or more additional speaker channels for the main program. Such additional speaker channels are in addition to the main program channels indicated by the independent substream. For example, if the independent substream indicates the left, right, center, left surround, and right surround full-range speaker channels of a 7.1-channel main program, the dependent substream may indicate the other two full-range speaker channels of the main program.
根据E-AC-3标准,E-AC-3比特流必须指示至少一个独立子流(例如,单个AC-3比特流),并且可以指示高达8个独立子流。E-AC-3比特流的每个独立子流可以与高达8个从属子流相关联。According to the E-AC-3 standard, an E-AC-3 bitstream must indicate at least one independent substream (e.g., a single AC-3 bitstream) and can indicate up to 8 independent substreams. Each independent substream of an E-AC-3 bitstream can be associated with up to 8 dependent substreams.
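The substream limits described above (at least one and at most 8 independent substreams, each with at most 8 dependent substreams adding channels the independent substream does not already carry) can be captured in a small data model. The following is a minimal sketch for illustration only; the class and function names and the channel labels are assumptions made for this example and are not part of the E-AC-3 syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_INDEPENDENT = 8                 # per the limits stated in the text
MAX_DEPENDENT_PER_INDEPENDENT = 8   # per the limits stated in the text


@dataclass
class IndependentSubstream:
    program: str
    channels: list[str]
    # Each dependent substream is modeled simply as its list of extra channels.
    dependents: list[list[str]] = field(default_factory=list)


def validate_structure(substreams: list[IndependentSubstream]) -> list[str]:
    """Return human-readable problems; an empty list means the structure is consistent."""
    problems = []
    if not 1 <= len(substreams) <= MAX_INDEPENDENT:
        problems.append(f"expected 1..{MAX_INDEPENDENT} independent substreams, got {len(substreams)}")
    for index, sub in enumerate(substreams):
        if not sub.channels:
            problems.append(f"independent substream {index} indicates no channel")
        if len(sub.dependents) > MAX_DEPENDENT_PER_INDEPENDENT:
            problems.append(f"independent substream {index} has too many dependent substreams")
        seen = set(sub.channels)
        for dep_index, dep in enumerate(sub.dependents):
            overlap = seen & set(dep)
            if overlap:
                # A dependent substream should only add channels not already indicated.
                problems.append(f"dependent {dep_index} of substream {index} repeats {sorted(overlap)}")
            seen |= set(dep)
    return problems


# A 7.1 main program split into a 5-channel independent substream plus one
# dependent substream, alongside a mono commentary program.
main = IndependentSubstream("main", ["L", "R", "C", "Ls", "Rs"], dependents=[["Lb", "Rb"]])
commentary = IndependentSubstream("commentary", ["M"])
ok_problems = validate_structure([main, commentary])
empty_problems = validate_structure([])
```

A check of this kind is the sort of consistency test that explicit substream structure metadata would make possible before or after decoding.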
E-AC-3比特流包括指示比特流的子流结构的元数据。例如,E-AC-3比特流的比特流信息(BSI)部分中的“chanmap”字段确定由比特流的从属子流指示的节目通道的通道映射。然而,指示子流结构的元数据常规地以如下格式包括在E-AC-3比特流中:该格式使得便于仅由E-AC-3解码器访问和使用(在编码E-AC-3比特流的解码期间);不便于在解码之后(例如,由后处理器)或解码之前(例如,由被配置成识别元数据的处理器)访问和使用。而且,存在以下风险:解码器可能使用常规地包括的元数据错误地识别常规的E-AC-3编码比特流的子流,并且在本发明之前还不知道如何以这样的格式在编码比特流(例如,编码E-AC-3比特流)中包括子流结构元数据,使得允许在比特流的解码期间方便和高效的检测和校正子流识别中的误差。An E-AC-3 bitstream includes metadata indicating the substream structure of the bitstream. For example, the "chanmap" field in the Bitstream Information (BSI) portion of an E-AC-3 bitstream determines the channel map for the program channels indicated by a dependent substream of the bitstream. However, metadata indicating the substream structure is conventionally included in an E-AC-3 bitstream in a format that is convenient for access and use only by an E-AC-3 decoder (during decoding of the encoded E-AC-3 bitstream); it is not convenient for access and use after decoding (e.g., by a post-processor) or before decoding (e.g., by a processor configured to recognize the metadata). Furthermore, there is a risk that a decoder may incorrectly identify the substreams of a conventional E-AC-3 encoded bitstream using the conventionally included metadata, and prior to the present invention it was not known how to include substream structure metadata in an encoded bitstream (e.g., an encoded E-AC-3 bitstream) in a format that allows convenient and efficient detection and correction of errors in substream identification during decoding of the bitstream.
E-AC-3比特流还可以包括关于音频节目的音频内容的元数据。例如,指示音频节目的E-AC-3比特流包括指示已经使用谱扩展处理(以及通道耦合编码)以对节目的内容进行编码的最小频率和最大频率的元数据。然而,这样的元数据通常以如下格式包括在E-AC-3比特流中,该格式使得便于仅由E-AC-3解码器访问和使用(在编码E-AC-3比特流的解码期间);不便于在解码之后(例如,由后处理器)或解码之前(例如,由被配置成识别元数据的处理器)访问和使用。而且,这样的元数据不以如下的格式包括在E-AC-3比特流中,该格式允许在比特流的解码期间这样的元数据的识别的方便和高效的误差检测和误差校正。An E-AC-3 bitstream may also include metadata about the audio content of an audio program. For example, an E-AC-3 bitstream indicating an audio program includes metadata indicating the minimum and maximum frequencies over which spectral extension processing (and channel coupling encoding) has been used to encode the program's content. However, such metadata is typically included in the E-AC-3 bitstream in a format that is convenient for access and use only by an E-AC-3 decoder (during decoding of the encoded E-AC-3 bitstream); it is not convenient for access and use after decoding (e.g., by a post-processor) or before decoding (e.g., by a processor configured to recognize the metadata). Furthermore, such metadata is not included in the E-AC-3 bitstream in a format that allows convenient and efficient error detection and error correction of the identification of such metadata during decoding of the bitstream.
根据本发明的典型的实施方式,PIM和/或SSM(以及可选地还有其他元数据,例如,响度处理状态元数据或“LPSM”)被嵌入在音频比特流的元数据段的一个或更多个保留字段(或槽(slot))中,该音频比特流还包括其他段(音频数据段)中的音频数据。通常,比特流的每个帧的至少一个段包括PIM或SSM,并且帧的至少一个其他段包括相应的音频数据(即,其数据结构由SSM指示的和/或其至少一个特性或属性由PIM指示的音频数据)。According to an exemplary embodiment of the present invention, a PIM and/or SSM (and optionally other metadata, such as loudness processing state metadata or "LPSM") is embedded in one or more reserved fields (or slots) of a metadata segment of an audio bitstream, which also includes audio data in other segments (the audio data segments). Typically, at least one segment of each frame of the bitstream includes a PIM or SSM, and at least one other segment of the frame includes corresponding audio data (i.e., audio data whose data structure is indicated by the SSM and/or whose at least one characteristic or attribute is indicated by the PIM).
在一类实施方式中,每个元数据段为可以包含一个或更多个元数据有效载荷的数据结构(在本文中有时称为容器)。每个有效载荷包括报头以提供存在于有效载荷中的元数据的类型的明确的指示,其中报头包括具体的有效载荷标识符(或有效载荷配置数据)。有效载荷在容器内的顺序未被定义,使得有效载荷可以以任何顺序存储并且分析器必须能够对整个容器进行分析以提取相关的有效载荷而忽略不相关的或不支持的有效载荷。图8(下面将要描述的)说明这样的容器和容器内的有效载荷的结构。In one class of embodiments, each metadata segment is a data structure (sometimes referred to herein as a container) that can contain one or more metadata payloads. Each payload includes a header that provides an explicit indication of the type of metadata present in the payload, wherein the header includes a specific payload identifier (or payload configuration data). The order of the payloads within the container is undefined, so that the payloads can be stored in any order and a parser must be able to parse the entire container to extract relevant payloads while ignoring irrelevant or unsupported payloads. FIG8 (described below) illustrates the structure of such a container and the payloads within the container.
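Because the payload order inside a container is undefined, a parser has to walk the whole container, keep the payloads it understands, and skip the rest. The sketch below illustrates that idea under loudly stated assumptions: the byte layout (a one-byte payload identifier, a one-byte payload size, then the payload bytes) and the identifier values are hypothetical and chosen only for this example; the actual container and payload syntax is the one illustrated in FIG. 8.

```python
from __future__ import annotations

# Hypothetical payload identifiers, for illustration only.
PIM_ID = 1
SSM_ID = 2
LPSM_ID = 3


def parse_container(container: bytes, wanted: set[int]) -> dict[int, bytes]:
    """Walk every payload in the container and return those whose ID is wanted.

    Assumed layout per payload: [id: 1 byte][size: 1 byte][size bytes of data].
    Unknown or unwanted payloads are skipped using their size field.
    """
    found: dict[int, bytes] = {}
    pos = 0
    while pos + 2 <= len(container):
        payload_id = container[pos]
        size = container[pos + 1]
        start = pos + 2
        end = start + size
        if end > len(container):
            raise ValueError("truncated payload")
        if payload_id in wanted:
            found[payload_id] = container[start:end]
        pos = end  # move to the next payload header, whatever the payload type was
    if pos != len(container):
        raise ValueError("trailing bytes after last payload")
    return found


# LPSM first, then an unsupported payload (ID 9), then PIM: order does not matter.
container = bytes([LPSM_ID, 2, 0xAA, 0xBB, 9, 1, 0x00, PIM_ID, 1, 0x7F])
payloads = parse_container(container, wanted={PIM_ID, LPSM_ID})
```

The explicit identifier in each header is what lets the parser ignore payload 9 without understanding its contents.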
当两个或更多个音频处理单元需要贯穿该处理链(或内容生命周期)彼此合作工作时,音频数据处理链中的通信元数据(例如,SSM和/或PIM和/或LPSM)尤其有用。在音频比特流中不包括元数据的情况下,例如,当在链中利用两个或更多个音频编解码器并且在媒体消耗装置的比特流路径(或比特流的音频内容的渲染点)期间多于一次地应用单端音量时,可以出现若干媒体处理问题,例如质量、电平和空间退化。Communicating metadata (e.g., SSM and/or PIM and/or LPSM) in an audio data processing chain is particularly useful when two or more audio processing units need to work cooperatively with each other throughout the processing chain (or content lifecycle). Without metadata included in the audio bitstream, for example, when two or more audio codecs are utilized in the chain and single-ended volume is applied more than once during the bitstream path (or rendering point of the audio content of the bitstream) of a media consumption device, several media processing issues, such as quality, level, and spatial degradation, can arise.
根据本发明的一些实施方式,嵌入在音频比特流中的响度处理状态元数据(LPSM)可以被认证和验证,例如以使得响度调整实体能够证明特定节目的响度是否已经在指定的范围内以及相应的音频数据本身是否未被修改(由此确保符合可适用的调节)。包括在包括响度处理状态元数据的数据块中的响度值可以被读出以对此进行验证,而不再次计算响度。响应于LPSM,管理结构可以确定相应的音频内容符合(如由LPSM指示的)响度法定的和/或管理的要求(例如,在商业广告响度缓解法下公布的规则,也称为“CALM”法)而不需要计算音频内容的响度。According to some embodiments of the present invention, loudness processing state metadata (LPSM) embedded in an audio bitstream can be authenticated and validated, for example, to enable loudness regulatory entities to verify whether the loudness of a particular program is already within a specified range and that the corresponding audio data itself has not been modified (thereby ensuring compliance with applicable regulations). A loudness value included in a data block comprising the loudness processing state metadata can be read out to verify this, instead of computing the loudness again. In response to the LPSM, a regulatory agency can determine that the corresponding audio content complies (as indicated by the LPSM) with statutory and/or regulatory loudness requirements (e.g., the regulations promulgated under the Commercial Advertisement Loudness Mitigation Act, also known as the "CALM" Act) without needing to compute the loudness of the audio content.
图1为示例性音频处理链(音频数据处理系统)的框图,在音频处理链中,系统的元件中的一个或更多个可以根据本发明的实施方式被配置。系统包括如所示耦接在一起的以下元件:预处理单元、编码器、信号分析和元数据校正单元、代码转换器、解码器和后处理单元。在所示的系统的变型中,省略元件中的一个或更多个,或包括额外的音频数据处理单元。FIG. 1 is a block diagram of an exemplary audio processing chain (audio data processing system) in which one or more of the system's elements may be configured according to embodiments of the present invention. The system includes the following elements coupled together as shown: a pre-processing unit, an encoder, a signal analysis and metadata correction unit, a transcoder, a decoder, and a post-processing unit. In variations of the illustrated system, one or more of the elements are omitted, or additional audio data processing units are included.
在一些实现中,图1的预处理单元被配置成接收包括音频内容的PCM(时域)样本作为输入,并且输出经处理PCM样本。编码器可以被配置成接收PCM样本作为输入,并且输出指示音频内容的编码的(例如,压缩的)音频比特流。指示音频内容的比特流的数据在本文中有时被称为“音频数据”。如果编码器根据本发明的典型实施方式被配置,那么从编码器输出的音频比特流包括PIM和/或SSM(可选地还包括响度处理状态元数据和/或其他元数据)以及音频数据。In some implementations, the pre-processing unit of FIG. 1 is configured to receive as input PCM (time domain) samples comprising audio content and output processed PCM samples. The encoder can be configured to receive as input the PCM samples and output an encoded (e.g., compressed) audio bitstream indicating the audio content. Data indicating the bitstream of audio content is sometimes referred to herein as "audio data." If the encoder is configured according to an exemplary embodiment of the present invention, the audio bitstream output from the encoder includes the PIM and/or SSM (optionally also including loudness processing state metadata and/or other metadata) as well as the audio data.
图1的信号分析和元数据校正单元可以接收一个或更多个编码音频比特流作为输入,并且通过执行信号分析(例如,使用编码音频比特流中的节目边界元数据)来确定(例如,验证)每个编码音频比特流中的元数据(例如,处理状态元数据)是否正确。如果信号分析和元数据校正单元发现所包括的元数据是无效的,那么通常使用从信号分析中获得的正确值替代错误值。从而,从信号分析和元数据校正单元输出的每个编码音频比特流可以包括校正的(或未校正的)处理状态元数据以及编码音频数据。The signal analysis and metadata correction unit of FIG1 can receive one or more coded audio bitstreams as input and determine (e.g., verify) whether the metadata (e.g., processing state metadata) in each coded audio bitstream is correct by performing signal analysis (e.g., using program boundary metadata in the coded audio bitstream). If the signal analysis and metadata correction unit finds that the included metadata is invalid, it typically replaces the erroneous value with the correct value obtained from the signal analysis. Thus, each coded audio bitstream output from the signal analysis and metadata correction unit can include corrected (or uncorrected) processing state metadata as well as the coded audio data.
图1的代码转换器可以接收编码音频比特流作为输入,并且作为响应(例如,通过对输入流进行解码并且以不同的编码格式对解码流进行重新编码)输出修改的(例如,不同编码的)音频比特流。如果代码转换器根据本发明的典型的实施方式被配置,那么从代码转换器输出的音频比特流包括SSM和/或PIM(通常还包括其他元数据)以及编码音频数据。元数据可以已经被包括在输入比特流中。The transcoder of FIG. 1 can receive an encoded audio bitstream as input and, in response, output a modified (e.g., differently encoded) audio bitstream (e.g., by decoding the input stream and re-encoding the decoded stream in a different encoding format). If the transcoder is configured according to an exemplary embodiment of the present invention, the audio bitstream output from the transcoder includes SSM and/or PIM (and typically also other metadata) as well as encoded audio data. The metadata may already have been included in the input bitstream.
图1的解码器可以接收编码的(例如,压缩的)音频比特流作为输入,并且输出(作为响应)解码PCM音频样本流。如果解码器根据本发明的典型的实施方式被配置,那么在典型的操作中,解码器的输出是或包括下列中的任一个:The decoder of Figure 1 can receive an encoded (e.g., compressed) audio bitstream as input and output (in response) a stream of decoded PCM audio samples. If the decoder is configured according to an exemplary embodiment of the present invention, then in typical operation, the output of the decoder is or includes any one of the following:
音频样本流,以及从输入的编码比特流中提取的SSM和/或PIM(通常还有其他元数据)的至少一个相应的流;或a stream of audio samples, and at least one corresponding stream of SSM and/or PIM (and typically other metadata) extracted from the input coded bitstream; or
音频样本流,以及根据从输入编码比特流中提取的SSM和/或PIM(通常还有其他元数据,例如LPSM)所确定的控制位的相应的流;或a stream of audio samples, and a corresponding stream of control bits determined from the SSM and/or PIM (and typically other metadata, such as LPSM) extracted from the input coded bitstream; or
音频样本流,但没有元数据或根据元数据确定的控制位的相应的流。在最后一种情况下,解码器可以从输入编码比特流中提取元数据,并且对所提取的元数据执行至少一种操作(例如,验证),即使没有输出所提取的元数据或根据元数据确定的控制位。a stream of audio samples, without a corresponding stream of metadata or of control bits determined from the metadata. In this last case, the decoder may extract the metadata from the input encoded bitstream and perform at least one operation (e.g., validation) on the extracted metadata, even though it does not output the extracted metadata or control bits determined from it.
通过根据本发明的典型的实施方式配置图1的后处理单元,后处理单元被配置成接收解码的PCM音频样本流,并且使用与样本一起接收的SSM和/或PIM(通常还有其他元数据,例如LPSM),或根据与样本一起接收的元数据确定的控制位对其执行后处理(例如,音频内容的音量校平)。后处理单元还通常被配置成对经后处理音频内容进行渲染用于由一个或更多个扬声器回放。By configuring the post-processing unit of FIG1 according to an exemplary embodiment of the present invention, the post-processing unit is configured to receive a decoded PCM audio sample stream and perform post-processing (e.g., volume leveling of the audio content) using the SSM and/or PIM (typically with other metadata, such as LPSM) received with the samples, or based on control bits determined by the metadata received with the samples. The post-processing unit is also typically configured to render the post-processed audio content for playback by one or more speakers.
本发明的典型的实施方式提供增强的音频处理链,其中音频处理单元(例如,编码器、解码器、代码转换器以及预处理单元和后处理单元)根据由通过音频处理单元分别接收的元数据所指示的媒体数据的同时期的状态来修改待应用于音频数据的其相应的处理。Typical embodiments of the present invention provide an enhanced audio processing chain in which audio processing units (e.g., encoders, decoders, transcoders, and pre-processing and post-processing units) modify their respective processing to be applied to audio data based on a contemporaneous state of the media data indicated by metadata respectively received by the audio processing units.
输入到图1系统的任何音频处理单元(例如,图1的编码器或代码转换器)的音频数据可以包括SSM和/或PIM(可选地还包括其他元数据)以及音频数据(例如,编码音频数据)。该元数据可以根据本发明的实施方式已经通过图1系统的另一元件(或另一源,在图1中未示出)而被包括在输入音频中。接收输入音频(具有元数据)的处理单元可以被配置成对元数据执行至少一种操作(例如,验证),或响应于元数据(例如,输入音频的自适应处理),并且还通常将元数据、元数据的经处理的版本、或根据元数据确定的控制位包括在其输出音频中。The audio data input to any audio processing unit of the FIG1 system (e.g., the encoder or transcoder of FIG1 ) may include the SSM and/or PIM (optionally, other metadata) along with the audio data (e.g., encoded audio data). The metadata may have been included in the input audio by another element of the FIG1 system (or another source, not shown in FIG1 ) according to an embodiment of the present invention. The processing unit that receives the input audio (with metadata) may be configured to perform at least one operation on the metadata (e.g., validation) or respond to the metadata (e.g., adaptive processing of the input audio), and may also typically include the metadata, a processed version of the metadata, or control bits determined based on the metadata in its output audio.
本发明的音频处理单元(或音频处理器)的典型的实施方式被配置成基于由对应于音频数据的元数据所指示的音频数据的状态来执行音频数据的自适应处理。在一些实施方式中,自适应处理是(或包括)响度处理(如果元数据指示还未对音频数据执行响度处理或与响度处理类似的处理),而不是(且不包括)响度处理(如果元数据指示已经对音频数据执行了这样的响度处理或与响度处理类似的处理)。在一些实施方式中,自适应处理是或包括(例如,在元数据验证子单元中执行的)元数据验证以确保音频处理单元基于由元数据所指示的音频数据的状态来执行音频数据的其他自适应处理。在一些实施方式中,该验证确定与音频数据相关联(例如,包括在具有音频数据的比特流中)的元数据的可靠性。例如,如果验证元数据是可靠的,那么来自一种先前执行的音频处理的结果可以被重新使用并且可以避免新执行相同类型的音频处理。另一方面,如果发现元数据已经被篡改(或以其他方式不可靠),那么据称先前执行的一种类型的媒体处理(如由不可靠的元数据指示的)可以由音频处理单元重复,和/或可以由音频处理单元对元数据和/或音频数据执行其他处理。如果该单元确定元数据是有效的(例如,基于所提取的加密值与参考加密值的匹配),音频处理单元还可以被配置成用信号向增强的媒体处理链下游的其他音频处理单元通知元数据(例如,存在于媒体比特流中)是有效的。Typical embodiments of the audio processing unit (or audio processor) of the present invention are configured to perform adaptive processing of audio data based on the state of the audio data indicated by metadata corresponding to the audio data. In some embodiments, the adaptive processing is (or includes) loudness processing (if the metadata indicates that loudness processing or processing similar to loudness processing has not yet been performed on the audio data), rather than (and does not include) loudness processing (if the metadata indicates that such loudness processing or processing similar to loudness processing has already been performed on the audio data). In some embodiments, the adaptive processing is or includes metadata verification (e.g., performed in a metadata verification subunit) to ensure that the audio processing unit performs other adaptive processing of the audio data based on the state of the audio data indicated by the metadata. In some embodiments, the verification determines the reliability of metadata associated with the audio data (e.g., included in a bitstream with the audio data). For example, if the metadata is verified to be reliable, the results from a previously performed audio processing can be reused and a new execution of the same type of audio processing can be avoided. 
On the other hand, if the metadata is found to have been tampered with (or is otherwise unreliable), then the type of media processing purportedly performed previously (as indicated by the unreliable metadata) may be repeated by the audio processing unit, and/or other processing may be performed by the audio processing unit on the metadata and/or the audio data. If the unit determines that the metadata is valid (e.g., based on a match between an extracted cryptographic value and a reference cryptographic value), the audio processing unit may also be configured to signal to other audio processing units downstream in the enhanced media processing chain that the metadata (e.g., present in the media bitstream) is valid.
图2是作为本发明的音频处理单元的实施方式的编码器(100)的框图。编码器100的任何部件或元件可以以硬件或软件或硬件与软件的组合被实现为一个或更多个处理和/或一个或更多个电路(例如,ASIC、FPGA或其他集成电路)。编码器100包括如所示地连接的帧缓冲器110、分析器111、解码器101、音频状态验证器102、响度处理级103、音频流选择级104、编码器105、填充器/格式器级107、元数据生成级106、对白响度测量子系统108以及帧缓冲器109。编码器100通常还包括其他处理元件(未示出)。FIG2 is a block diagram of an encoder (100) as an embodiment of an audio processing unit of the present invention. Any component or element of the encoder 100 can be implemented in hardware or software or a combination of hardware and software as one or more processes and/or one or more circuits (e.g., ASICs, FPGAs, or other integrated circuits). The encoder 100 includes a frame buffer 110, an analyzer 111, a decoder 101, an audio state validator 102, a loudness processing stage 103, an audio stream selection stage 104, an encoder 105, a filler/formatter stage 107, a metadata generation stage 106, a dialog loudness measurement subsystem 108, and a frame buffer 109, connected as shown. The encoder 100 typically also includes other processing elements (not shown).
编码器100(为代码转换器)被配置成包括通过使用包括在输入比特流中的响度处理状态元数据执行自适应和自动的响度处理来将输入音频比特流(例如,可以是AC-3比特流、E-AC-3比特流或杜比E比特流中的一个)转换成编码输出音频比特流(例如,可以是AC-3比特流、E-AC-3比特流或杜比E比特流中的另一个)。例如,编码器100可以被配置成将(通常用在生产和广播设备中,但不用在接收已经被广播的音频节目的消费者设备中的格式的)输入杜比E比特流转换成AC-3或E-AC-3格式的(适合于广播至消费者设备的)编码输出音频比特流。The encoder 100 (which is a transcoder) is configured to convert an input audio bitstream (e.g., one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream) into an encoded output audio bitstream (e.g., another of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream) by performing adaptive and automatic loudness processing using loudness processing state metadata included in the input bitstream. For example, the encoder 100 can be configured to convert an input Dolby E bitstream (which is a format commonly used in production and broadcast equipment, but not in consumer equipment that receives broadcast audio programs) into an encoded output audio bitstream in an AC-3 or E-AC-3 format (suitable for broadcast to consumer equipment).
图2的系统还包括编码音频传送子系统150(其存储和/或传送从编码器100输出的编码比特流)和解码器152。从编码器100输出的编码音频比特流可以由子系统150(例如,以DVD或蓝光光盘格式)存储,或由子系统150(可以实现传输线路或网络)传输,或可以由子系统150存储和传输。解码器152被配置成包括通过从比特流的每个帧中提取元数据(PIM和/或SSM、以及可选地还有响度处理状态元数据和/或其他元数据)(以及可选地还从比特流中提取节目边界元数据)以及生成解码音频数据,对经由子系统150接收的(由编码器100生成的)编码音频比特流进行解码。通常,解码器152被配置成使用PIM和/或SSM和/或LPSM(可选地还使用节目边界元数据)对解码音频数据执行自适应处理,和/或将解码音频数据和元数据转发至被配置成使用元数据对解码音频数据执行自适应处理的后处理器。通常,解码器152包括存储(例如,以非暂态方式)从子系统150中接收的编码音频比特流的缓冲器。The system of FIG2 also includes an encoded audio transmission subsystem 150 (which stores and/or transmits the encoded bitstream output from encoder 100) and a decoder 152. The encoded audio bitstream output from encoder 100 can be stored by subsystem 150 (e.g., in a DVD or Blu-ray Disc format), transmitted by subsystem 150 (which can implement a transmission line or network), or stored and transmitted by subsystem 150. Decoder 152 is configured to decode the encoded audio bitstream (generated by encoder 100) received via subsystem 150 by extracting metadata (PIM and/or SSM, and optionally loudness processing state metadata and/or other metadata) from each frame of the bitstream (and optionally also extracting program boundary metadata from the bitstream) and generating decoded audio data. Typically, decoder 152 is configured to perform adaptive processing on the decoded audio data using the PIM and/or SSM and/or LPSM (optionally also using program boundary metadata), and/or forward the decoded audio data and metadata to a post-processor configured to perform adaptive processing on the decoded audio data using the metadata. Typically, decoder 152 includes a buffer that stores (eg, in a non-transitory manner) an encoded audio bitstream received from subsystem 150 .
编码器100和解码器152的各种实现被配置成执行本发明的方法的不同的实施方式。Various implementations of the encoder 100 and decoder 152 are configured to perform different embodiments of the method of the present invention.
帧缓冲器110是耦接以接收编码输入音频比特流的缓冲存储器。在操作中,缓冲器110存储(例如,以非暂态方式)编码音频比特流的至少一个帧,并且编码音频比特流的帧的序列被从缓冲器110设定到分析器111。The frame buffer 110 is a buffer memory coupled to receive an encoded input audio bitstream. In operation, the buffer 110 stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream, and a sequence of frames of the encoded audio bitstream is provided from the buffer 110 to the analyzer 111.
将分析器111耦接并配置成从包括这样的元数据的编码输入音频的每个帧中提取PIM和/或SSM、以及响度处理状态元数据(LPSM)、以及可选地还有节目边界元数据(和/或其他元数据),至少将LPSM(以及可选地还有节目边界元数据和/或其他元数据)设定到音频状态验证器102、响度处理级103、级106和子系统108,以从编码输入音频中提取音频数据并且将音频数据设定到解码器101。编码器100的解码器101被配置成对音频数据进行解码以生成解码音频数据,并且将解码音频数据设定到响度处理级103、音频流选择级104、子系统108以及通常还设定到状态验证器102。The analyzer 111 is coupled and configured to extract the PIM and/or SSM, as well as loudness processing state metadata (LPSM), and optionally also program boundary metadata (and/or other metadata) from each frame of the encoded input audio including such metadata, and to provide at least the LPSM (and optionally also program boundary metadata and/or other metadata) to the audio state validator 102, the loudness processing stage 103, the stage 106, and the subsystem 108 to extract audio data from the encoded input audio and to provide the audio data to the decoder 101. The decoder 101 of the encoder 100 is configured to decode the audio data to generate decoded audio data, and to provide the decoded audio data to the loudness processing stage 103, the audio stream selection stage 104, the subsystem 108, and typically also to the state validator 102.
状态验证器102被配置成对设定到其的LPSM(可选地其他元数据)进行认证和验证。在一些实施方式中,LPSM是(或包括在)数据块(中),数据块已经包括在输入比特流中(例如,根据本发明的实施方式)。块可以包括加密散列(基于散列的消息认证代码或“HMAC”)用于对LPSM(可选地还有其他元数据)和/或(从解码器101提供至验证器102的)基本的音频数据进行处理。在这些实施方式中,数据块可以被数字地标记,使得下游的音频处理单元可以相对容易地认证和验证处理状态元数据。The state validator 102 is configured to authenticate and verify the LPSM (and optionally other metadata) set thereto. In some embodiments, the LPSM is (or is included in) a data block that is already included in the input bitstream (e.g., according to embodiments of the present invention). The block may include a cryptographic hash (a hash-based message authentication code or "HMAC") for processing the LPSM (and optionally other metadata) and/or the underlying audio data (provided to the validator 102 from the decoder 101). In these embodiments, the data block may be digitally signed so that downstream audio processing units can relatively easily authenticate and verify the processing state metadata.
例如,HMAC用于生成摘要,并且包括在本发明的比特流中的保护值可以包括该摘要。该摘要可以关于AC-3帧被如下生成:For example, HMAC is used to generate a digest, and the protection value included in the bitstream of the present invention may include the digest. The digest may be generated as follows with respect to the AC-3 frame:
1.在AC-3数据和LPSM被编码之后,帧数据字节(连接的帧数据#1和帧数据#2)和LPSM数据字节用作哈希函数HMAC的输入。没有考虑可以存在于辅助数据字段内的其他数据用于计算摘要。这样的其他数据可以是既不属于AC-3数据也不属于LPSM数据的字节。可以不考虑包括在LPSM中的保护位用于计算HMAC摘要。1. After the AC-3 data and the LPSM are encoded, the frame data bytes (concatenated frame data #1 and frame data #2) and the LPSM data bytes are used as input to the hash function HMAC. Other data that may be present within the auxiliary data field is not taken into account when computing the digest. Such other data may be bytes that belong neither to the AC-3 data nor to the LPSM data. The protection bits included in the LPSM may be excluded when computing the HMAC digest.
2.在计算摘要之后,被写入比特流中的为保护位保留的字段中。2. After the digest is calculated, it is written to the field reserved for protection bits in the bitstream.
3.生成完整的AC-3帧的最后步骤是CRC校验的计算。这被写在帧的结束处并且考虑属于该帧的所有的数据,包括LPSM位。3. The final step in generating a complete AC-3 frame is the calculation of the CRC checksum. This is written at the end of the frame and takes into account all the data belonging to the frame, including the LPSM bits.
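The three steps above (digest over the frame data and LPSM bytes, digest written into the protection field, CRC computed last over everything) can be sketched with Python's standard `hmac`, `hashlib`, and `zlib` modules. This is only an illustrative sketch: the choice of SHA-256, the secret key, the byte layout of the protected frame, and the use of CRC-32 as a stand-in for the frame's own CRC words are all assumptions made for this example and are not specified by the text.

```python
import hashlib
import hmac
import zlib

CRC_LEN = 4  # CRC-32 stand-in; an assumption, not the AC-3 CRC


def build_protected_frame(frame_data: bytes, lpsm: bytes, key: bytes) -> bytes:
    """Assumed layout: frame data | LPSM | HMAC digest | CRC."""
    # Step 1: the digest covers the frame data bytes and the LPSM bytes only.
    digest = hmac.new(key, frame_data + lpsm, hashlib.sha256).digest()
    # Step 2: the digest is written into the field reserved for protection bits.
    body = frame_data + lpsm + digest
    # Step 3: the CRC is computed last, over all data of the frame including LPSM.
    return body + zlib.crc32(body).to_bytes(CRC_LEN, "big")


def verify_protected_frame(frame: bytes, frame_len: int, lpsm_len: int, key: bytes) -> bool:
    """Check the CRC first, then recompute the digest and compare it."""
    body, crc = frame[:-CRC_LEN], frame[-CRC_LEN:]
    if zlib.crc32(body).to_bytes(CRC_LEN, "big") != crc:
        return False
    frame_data = body[:frame_len]
    lpsm = body[frame_len:frame_len + lpsm_len]
    digest = body[frame_len + lpsm_len:]
    expected = hmac.new(key, frame_data + lpsm, hashlib.sha256).digest()
    return hmac.compare_digest(digest, expected)


key = b"hypothetical-shared-key"
frame_data = b"\x01\x02\x03"
lpsm = b"\x10\x20"
frame = build_protected_frame(frame_data, lpsm, key)
is_valid = verify_protected_frame(frame, len(frame_data), len(lpsm), key)
```

The CRC alone only detects accidental corruption; anyone who alters the LPSM can recompute it. The keyed digest is what lets a downstream unit detect that the metadata or audio data was modified after the protected frame was produced.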
包括但不限于一个或更多个非HMAC加密方法中的任意一个的其他加密方法可以用于LPSM和/或其他元数据(例如,在验证器102中)的验证,以确保元数据和/或基本音频数据的安全的传输和接收。例如,可以在接收本发明的音频比特流的实施方式的每个音频处理单元中执行验证(使用这样的加密方法),以确定包括在该比特流中的元数据和相应的音频数据是否已经经历(和/或已经产生)具体的处理(由元数据指示的)并且在这样的具体的处理执行之后是否未被修改。Other encryption methods, including but not limited to any one of one or more non-HMAC encryption methods, may be used for verification of the LPSM and/or other metadata (e.g., in the verifier 102) to ensure secure transmission and reception of the metadata and/or the underlying audio data. For example, verification (using such an encryption method) may be performed in each audio processing unit of an embodiment of the present invention that receives an audio bitstream to determine whether the metadata and corresponding audio data included in the bitstream have undergone (and/or have been generated by) a specific processing (indicated by the metadata) and have not been modified after such specific processing has been performed.
状态验证器102将控制数据设定到音频流选择级104、元数据生成器106以及对白响度测量子系统108,以表示验证操作的结果。响应于控制数据,级104可以选择(以及传递至编码器105):The state validator 102 provides control data to the audio stream selection stage 104, the metadata generator 106, and the dialog loudness measurement subsystem 108 to indicate the results of the validation operation. In response to the control data, the stage 104 may select (and pass to the encoder 105) either:
响度处理级103的经自适应处理的输出(例如,当LPSM指示从解码器101输出的音频数据没有经历特定类型的响度处理,以及来自验证器102的控制位指示LPSM有效时);或the adaptively processed output of the loudness processing stage 103 (e.g., when the LPSM indicates that the audio data output from the decoder 101 has not undergone a particular type of loudness processing, and the control bit from the validator 102 indicates that the LPSM is valid); or
从解码器101输出的音频数据(例如,当LPSM指示从解码器101输出的音频数据已经经历将由级103执行的特定类型的响度处理,并且来自验证器102的控制位指示LPSM有效时)。the audio data output from the decoder 101 (e.g., when the LPSM indicates that the audio data output from the decoder 101 has already undergone the particular type of loudness processing that would be performed by the stage 103, and the control bits from the validator 102 indicate that the LPSM is valid).
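The selection made by stage 104 can be summarized as a small decision function. This is a minimal sketch: the function and parameter names are invented for illustration, and the handling of invalid LPSM (falling back to the loudness-processed output) is an assumption, since the two cases listed above only cover valid LPSM.

```python
def select_stream(lpsm_valid: bool, already_processed: bool,
                  processed_audio: bytes, decoded_audio: bytes) -> bytes:
    """Choose which audio the selection stage passes on to the encoder.

    lpsm_valid        -- control bit from the validator: LPSM passed validation
    already_processed -- LPSM says the decoded audio already underwent the
                         loudness processing that the loudness stage would apply
    """
    if lpsm_valid and already_processed:
        # Trustworthy metadata says the work is done: pass the decoded audio through.
        return decoded_audio
    # Either the audio has not been processed yet, or (ASSUMPTION) the metadata
    # cannot be trusted, so the loudness-processed output is used instead.
    return processed_audio


processed, decoded = b"processed", b"decoded"
passthrough = select_stream(True, True, processed, decoded)
reprocessed = select_stream(True, False, processed, decoded)
```

Skipping a second pass of loudness processing when valid metadata reports that it was already applied is what avoids the repeated-processing degradation discussed earlier in the description.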
编码器100的级103被配置成基于由通过解码器101所提取的LPSM指示的一个或更多个音频数据特性,对从解码器101输出的解码音频数据执行自适应响度处理。级103可以是自适应变换域实时响度和动态范围控制处理器。级103可以接收用户输入(例如,用户目标响度/动态范围值或对白归一化值)、或其他元数据输入(例如,一种或更多种类型的第三方数据、跟踪信息、标识符、所有权或标准信息、用户注释数据、用户偏好数据等)和/或其他输入(例如,来自指纹识别处理),并且使用这样的输入以对从解码器101输出的解码音频数据进行处理。级103可以对指示(由通过分析器111提取的节目边界元数据所表示的)单个音频节目的(从解码器101输出的)解码音频数据执行自适应响度处理,并且可以响应于接收到指示由通过分析器111提取的节目边界元数据所指示的不同的音频节目的(从解码器101输出的)解码音频数据将响度处理复位。The stage 103 of the encoder 100 is configured to perform adaptive loudness processing on the decoded audio data output from the decoder 101 based on one or more audio data characteristics indicated by the LPSM extracted by the decoder 101. The stage 103 may be an adaptive transform-domain real-time loudness and dynamic range control processor. The stage 103 may receive user input (e.g., a user target loudness/dynamic range value or a dialogue normalization value), or other metadata input (e.g., one or more types of third-party data, tracking information, identifiers, proprietary or standards information, user annotation data, user preference data, etc.), and/or other input (e.g., from a fingerprinting process), and use such input to process the decoded audio data output from the decoder 101. Stage 103 may perform adaptive loudness processing on decoded audio data (output from decoder 101) indicating a single audio program (represented by program boundary metadata extracted by analyzer 111), and may reset the loudness processing in response to receiving decoded audio data (output from decoder 101) indicating a different audio program indicated by program boundary metadata extracted by analyzer 111.
When the control bit from validator 102 indicates that the LPSM is invalid, dialogue loudness measurement subsystem 108 may operate to determine the loudness of segments of the decoded audio (from decoder 101) that represent dialogue (or other speech), using the LPSM (and/or other metadata) extracted by decoder 101. When the control bit from validator 102 indicates that the LPSM is valid, operation of subsystem 108 may be disabled if the LPSM indicates a previously determined loudness of the dialogue (or other speech) segments of the decoded audio (from decoder 101). Subsystem 108 may perform a loudness measurement on decoded audio data representing a single audio program (as indicated by program boundary metadata extracted by analyzer 111), and may reset the loudness measurement in response to receiving decoded audio data representing a different audio program, as indicated by such program boundary metadata.
Useful tools (e.g., the Dolby LM100 loudness meter) exist for measuring the level of dialogue in audio content conveniently and easily. Some embodiments of the APU of the present invention (e.g., stage 108 of encoder 100) are implemented to include such a tool (or to perform the functions of such a tool) in order to measure the average dialogue loudness of the audio content of an audio bitstream (e.g., a decoded AC-3 bitstream provided to stage 108 from decoder 101 of encoder 100).
If stage 108 is implemented to measure the true average dialogue loudness of the audio data, the measurement may include a step of isolating the segments of the audio content that predominantly contain speech. The predominantly-speech audio segments are then processed according to a loudness measurement algorithm. For audio data decoded from an AC-3 bitstream, the algorithm may be a standard K-weighted loudness measure (in accordance with the international standard ITU-R BS.1770). Alternatively, other loudness measures (e.g., those based on psychoacoustic models of loudness) may be used.
Isolating speech segments is not essential for measuring the average dialogue loudness of audio data. However, it improves the accuracy of the measurement and typically gives results that are more satisfactory from a listener's perspective. Because not all audio content contains dialogue (speech), a loudness measurement of the entire audio content can provide a sufficient approximation of the dialogue level the audio would have had if speech had been present.
Metadata generator 106 generates (and/or passes through to stage 107) metadata to be included by stage 107 in the encoded bitstream to be output from encoder 100. Metadata generator 106 may pass through to stage 107 the LPSM (and optionally also LIM and/or PIM and/or program boundary metadata and/or other metadata) extracted by decoder 101 and/or analyzer 111 (e.g., when the control bit from validator 102 indicates that the LPSM and/or other metadata are valid), or may generate new LIM and/or PIM and/or LPSM and/or program boundary metadata and/or other metadata and provide the new metadata to stage 107 (e.g., when the control bit from validator 102 indicates that the metadata extracted by decoder 101 is invalid), or may provide to stage 107 a combination of the metadata extracted by decoder 101 and/or analyzer 111 and newly generated metadata. Metadata generator 106 may include the loudness data generated by subsystem 108, and at least one value indicating the type of loudness processing performed by subsystem 108, in the LPSM that it provides to stage 107 for inclusion in the encoded bitstream to be output from encoder 100.
Metadata generator 106 may generate protection bits (which may consist of or include a hash-based message authentication code or "HMAC") useful for at least one of decryption, authentication, or validation of the LPSM (and optionally also other metadata) to be included in the encoded bitstream and/or of the underlying audio data to be included in the encoded bitstream. Metadata generator 106 may provide such protection bits to stage 107 for inclusion in the encoded bitstream.
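As a rough illustration of HMAC-based protection bits, the sketch below computes and checks a digest over a metadata block and its associated audio data using Python's standard `hmac` and `hashlib` modules. The choice of SHA-256, the shared key, the length-prefixed message layout, and the function names are all assumptions made for this example; the text does not specify them.

```python
import hashlib
import hmac


def make_protection_bits(key: bytes, metadata: bytes, audio: bytes) -> bytes:
    """Compute an HMAC digest over the metadata and the underlying audio data."""
    # Assumed layout: length-prefix the metadata so that the boundary between
    # the two fields is unambiguous in the authenticated message.
    message = len(metadata).to_bytes(4, "big") + metadata + audio
    return hmac.new(key, message, hashlib.sha256).digest()


def protection_bits_valid(key: bytes, metadata: bytes, audio: bytes,
                          received: bytes) -> bool:
    """Recompute the digest and compare it with the received protection bits."""
    expected = make_protection_bits(key, metadata, audio)
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(expected, received)
```

A validator in the role of element 102 could call `protection_bits_valid` and treat a `False` result as an indication that the metadata should not be trusted.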
In typical operation, dialogue loudness measurement subsystem 108 processes the audio data output from decoder 101 to generate, in response to the audio data, loudness values (e.g., gated and ungated dialogue loudness values) and dynamic range values. In response to these values, metadata generator 106 may generate loudness processing state metadata (LPSM) for inclusion (by stuffer/formatter 107) in the encoded bitstream to be output from encoder 100.
Additionally, optionally, or alternatively, subsystems 106 and/or 108 of encoder 100 may perform additional analysis of the audio data to generate metadata indicating at least one characteristic of the audio data, for inclusion in the encoded bitstream to be output from stage 107.
Encoder 105 encodes the audio data output from selection stage 104 (e.g., by performing compression on it), and provides the encoded audio to stage 107 for inclusion in the encoded bitstream to be output from stage 107.
Stage 107 multiplexes the encoded audio from encoder 105 and the metadata (including PIM and/or SSM) from generator 106 to generate the encoded bitstream to be output from stage 107, preferably such that the encoded bitstream has the format specified by a preferred embodiment of the present invention.
Frame buffer 109 is a buffer memory that stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream output from stage 107. A sequence of frames of the encoded audio bitstream is then provided from buffer 109, as the output of encoder 100, to delivery system 150.
The LPSM generated by metadata generator 106 and included in the encoded bitstream by stage 107 typically indicates the loudness processing state of the corresponding audio data (e.g., what type of loudness processing has been performed on the audio data) and the loudness of the corresponding audio data (e.g., measured dialogue loudness, gated and/or ungated loudness, and/or dynamic range).
As used herein, "gating" of loudness and/or level measurements performed on audio data refers to a specific level or loudness threshold, where only computed values that exceed the threshold are included in the final measurement (e.g., short-term loudness values below -60 dBFS are ignored in the final measured value). Gating on an absolute value refers to a fixed level or loudness, whereas gating on a relative value refers to a value that depends on the current "ungated" measurement value.
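The difference between absolute and relative gating can be illustrated with a short sketch. The two-stage structure and the default thresholds (an absolute gate at -70 dB and a relative gate 10 dB below the intermediate measurement) are assumptions modeled on ITU-R BS.1770 rather than values taken from this document, and the per-block loudness values are assumed to have been computed elsewhere.

```python
import math
from typing import Optional, Sequence


def _mean_db(values: Sequence[float]) -> float:
    # Average in the power domain, then convert back to decibels.
    return 10.0 * math.log10(sum(10.0 ** (v / 10.0) for v in values) / len(values))


def gated_loudness(block_loudness: Sequence[float],
                   absolute_gate: float = -70.0,
                   relative_offset: float = -10.0) -> Optional[float]:
    """Return the gated loudness of per-block loudness values given in dB."""
    # Absolute gating: discard blocks at or below a fixed threshold.
    above_abs = [v for v in block_loudness if v > absolute_gate]
    if not above_abs:
        return None  # nothing exceeded the absolute gate
    # Relative gating: the threshold depends on the measurement made so far.
    relative_gate = _mean_db(above_abs) + relative_offset
    above_rel = [v for v in above_abs if v > relative_gate]
    return _mean_db(above_rel)
```

For instance, with blocks at -20, -45, and -80 dB, the -80 dB block fails the absolute gate and the -45 dB block fails the relative gate, so only the -20 dB block contributes to the result.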
In some implementations of encoder 100, the encoded bitstream buffered in memory 109 (and output to delivery system 150) is an AC-3 bitstream or an E-AC-3 bitstream, and comprises audio data segments (e.g., the AB0 to AB5 segments of the frame shown in FIG. 4) and metadata segments, where the audio data segments indicate audio data, and each of at least some of the metadata segments includes PIM and/or SSM (and optionally also other metadata). Stage 107 inserts the metadata segments (including the metadata) into the bitstream in the following format. Each of the metadata segments that includes PIM and/or SSM is included in a useless bit segment of the bitstream (e.g., the useless bit segment "W" shown in FIG. 4 or FIG. 7), or in the "addbsi" field of the bitstream information ("BSI") segment of a frame of the bitstream, or in an auxiliary data field at the end of a frame of the bitstream (e.g., the AUX segment shown in FIG. 4 or FIG. 7). A frame of the bitstream may include one or two metadata segments, each of which includes metadata, and if the frame includes two metadata segments, one may be present in the addbsi field of the frame and the other in the AUX field of the frame.
In some embodiments, each metadata segment (sometimes referred to herein as a "container") inserted by stage 107 has a format that includes a metadata segment header (and optionally also other mandatory or "core" elements) and one or more metadata payloads following the metadata segment header. SSM, if present, is included in one of the metadata payloads (identified by a payload header, and typically having a format of a first type). PIM, if present, is included in another one of the metadata payloads (identified by a payload header, and typically having a format of a second type). Similarly, each other type of metadata (if present) is included in another one of the metadata payloads (identified by a payload header, and typically having a format specific to that type of metadata). The exemplary format allows convenient access to the SSM, PIM, and other metadata at times other than during decoding (e.g., by a post-processor following decoding, or by a processor configured to recognize the metadata without performing full decoding of the encoded bitstream), and allows convenient and efficient error detection and correction (e.g., of substream identification) during decoding of the bitstream. For example, without access to SSM in the exemplary format, a decoder might incorrectly identify the number of substreams associated with a program.
One metadata payload in the metadata segment may include the SSM, another metadata payload in the metadata segment may include the PIM, and optionally, at least one other metadata payload in the metadata segment may include other metadata (e.g., loudness processing state metadata or "LPSM").
In some embodiments, a substream structure metadata (SSM) payload included (by stage 107) in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicating at least one audio program) includes SSM in the following format:
a payload header, typically including at least one identification value (e.g., a 2-bit value indicating the SSM format version, and optionally also length, period, count, and substream association values); and after the header:
independent substream metadata indicating the number of independent substreams of the program indicated by the bitstream; and
dependent substream metadata indicating whether each independent substream of the program has at least one associated dependent substream (i.e., whether at least one dependent substream is associated with said each independent substream), and if so, the number of dependent substreams associated with each independent substream of the program.
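The SSM payload described above might be modeled roughly as follows. Only the 2-bit format version is taken from the text; the class name, the 4-bit count fields, and the one-bit "has dependent substreams" flag are hypothetical choices made so that the sketch can be serialized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SsmPayload:
    version: int                 # 2-bit SSM format version (from the text)
    dependent_counts: List[int]  # one entry per independent substream

    @property
    def independent_count(self) -> int:
        # Independent substream metadata: number of independent substreams.
        return len(self.dependent_counts)

    def to_bits(self) -> str:
        """Serialize to a bit string using assumed (hypothetical) field widths."""
        if not 0 <= self.version < 4:
            raise ValueError("version must fit in 2 bits")
        if self.independent_count >= 16 or any(not 0 <= n < 16
                                               for n in self.dependent_counts):
            raise ValueError("counts must fit in the assumed 4-bit fields")
        bits = format(self.version, "02b")
        bits += format(self.independent_count, "04b")
        for n in self.dependent_counts:
            # Dependent substream metadata: a presence flag, then the count
            # only when at least one dependent substream is associated.
            bits += "1" + format(n, "04b") if n > 0 else "0"
        return bits
```

A program with two independent substreams, the first of which has two dependent substreams, would be written as `SsmPayload(version=0, dependent_counts=[2, 0])`.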
It is contemplated that an independent substream of an encoded bitstream may indicate a set of speaker channels of an audio program (e.g., the speaker channels of a 5.1 speaker channel audio program), and that each of one or more dependent substreams (associated with the independent substream, as indicated by the dependent substream metadata) may indicate an object channel of the program. Typically, however, an independent substream of an encoded bitstream indicates a set of speaker channels of a program, and each dependent substream associated with the independent substream (as indicated by the dependent substream metadata) indicates at least one additional speaker channel of the program.
In some embodiments, a program information metadata (PIM) payload included (by stage 107) in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicating at least one audio program) has the following format:
a payload header, typically including at least one identification value (e.g., a value indicating the PIM format version, and optionally also length, period, count, and substream association values); and after the header, PIM in the following format:
active channel metadata indicating each silent channel and each non-silent channel of the audio program (i.e., which channels of the program contain audio information, and which channels (if any) contain only silence, typically for the duration of the frame). In embodiments in which the encoded bitstream is an AC-3 or E-AC-3 bitstream, the active channel metadata in a frame of the bitstream may be used in conjunction with additional metadata of the bitstream (e.g., the audio coding mode ("acmod") field of the frame and, if present, the chanmap field in the frame or in associated dependent substream frames) to determine which channels of the program contain audio information and which contain silence. The "acmod" field of an AC-3 or E-AC-3 frame indicates the number of full-range channels of the audio program indicated by the audio content of the frame (e.g., whether the program is a 1.0 channel mono program, a 2.0 channel stereo program, or a program comprising L, R, C, Ls, Rs full-range channels), or that the frame indicates two independent 1.0 channel mono programs. The "chanmap" field of an E-AC-3 bitstream indicates the channel map of a dependent substream indicated by the bitstream. Active channel metadata may be useful for implementing upmixing (in a post-processor) downstream of a decoder, for example to add audio to channels that contain silence at the output of the decoder;
downmix processing state metadata indicating whether the program was downmixed (before or during encoding) and, if so, the type of downmix that was applied. Downmix processing state metadata may be useful for implementing upmixing (in a post-processor) downstream of a decoder, for example to upmix the audio content of the program using parameters that most closely match the type of downmix that was applied. In embodiments in which the encoded bitstream is an AC-3 or E-AC-3 bitstream, the downmix processing state metadata may be used in conjunction with the audio coding mode ("acmod") field of the frame to determine the type of downmix (if any) applied to the channels of the program;
upmix processing state metadata indicating whether the program was upmixed (e.g., from a smaller number of channels) before or during encoding and, if so, the type of upmix that was applied. Upmix processing state metadata may be useful for implementing downmixing (in a post-processor) downstream of a decoder, for example to downmix the audio content of the program in a manner consistent with the type of upmix (e.g., Dolby Pro Logic, or Dolby Pro Logic II Movie Mode, or Dolby Pro Logic II Music Mode, or Dolby Professional Upmixer) that was applied to the program. In embodiments in which the encoded bitstream is an E-AC-3 bitstream, the upmix processing state metadata may be used in conjunction with other metadata (e.g., the value of the "strmtyp" field of the frame) to determine the type of upmix (if any) applied to the channels of the program. The value of the "strmtyp" field (in the BSI segment of a frame of an E-AC-3 bitstream) indicates whether the audio content of the frame belongs to an independent stream (which determines a program) or to an independent substream (of a program that includes or is associated with multiple substreams), and thus may be decoded independently of any other substream indicated by the E-AC-3 bitstream, or whether the audio content of the frame belongs to a dependent substream (of a program that includes or is associated with multiple substreams), and thus must be decoded in conjunction with the independent substream with which it is associated; and
pre-processing state metadata indicating whether pre-processing was performed on the audio content of the frame (before the encoding of the audio content that generated the encoded bitstream) and, if so, the type of pre-processing that was performed.
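To illustrate how active channel metadata could be combined with the "acmod" field, the sketch below maps each acmod value to its full-range channel layout (following the audio coding mode table of the AC-3 specification) and then splits the channels into active and silent sets. The bit-mask encoding of the active channel metadata and the function name are hypothetical, since the text does not define how that metadata is represented.

```python
from typing import Dict, List, Tuple

# Full-range channel layouts for each 3-bit acmod value (AC-3 audio coding modes).
ACMOD_CHANNELS: Dict[int, List[str]] = {
    0: ["Ch1", "Ch2"],               # 1+1: two independent mono programs
    1: ["C"],                        # 1/0: mono
    2: ["L", "R"],                   # 2/0: stereo
    3: ["L", "C", "R"],              # 3/0
    4: ["L", "R", "S"],              # 2/1
    5: ["L", "C", "R", "S"],         # 3/1
    6: ["L", "R", "Ls", "Rs"],       # 2/2
    7: ["L", "C", "R", "Ls", "Rs"],  # 3/2
}


def split_channels(acmod: int, active_mask: int) -> Tuple[List[str], List[str]]:
    """Return (active, silent) channel names for a frame.

    active_mask is an assumed encoding: bit i is set when the i-th channel
    of the acmod layout carries audio rather than silence.
    """
    channels = ACMOD_CHANNELS[acmod]
    active = [ch for i, ch in enumerate(channels) if (active_mask >> i) & 1]
    silent = [ch for ch in channels if ch not in active]
    return active, silent
```

A post-processor could use the `silent` list to decide which channels are candidates for receiving upmixed audio.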
In some implementations, the pre-processing state metadata indicates:
whether surround attenuation was applied (e.g., whether the surround channels of the audio program were attenuated by 3 dB before encoding),
whether a 90° phase shift was applied (e.g., to the Ls and Rs surround channels of the audio program before encoding),
whether a low-pass filter was applied to the LFE channel of the audio program before encoding,
whether the level of the LFE channel of the program was monitored during production and, if so, the monitored level of the LFE channel relative to the level of the full-range audio channels of the program,
whether dynamic range compression should be performed (e.g., in a decoder) on each block of the decoded audio content of the program and, if so, the type (and/or parameters) of dynamic range compression to be performed (e.g., this type of pre-processing state metadata may indicate which of the following compression profile types was assumed by the encoder to generate the dynamic range compression control values included in the encoded bitstream: Film Standard, Film Light, Music Standard, Music Light, or Speech. Alternatively, this type of pre-processing state metadata may indicate that heavy dynamic range compression ("compr" compression) should be performed on each frame of the decoded audio content of the program, in a manner determined by the dynamic range compression control values included in the encoded bitstream),
whether spectral extension and/or channel coupling coding was used to encode specific frequency ranges of the program content and, if so, the minimum and maximum frequencies of the frequency components of the content on which spectral extension coding was performed, and the minimum and maximum frequencies of the frequency components of the content on which channel coupling coding was performed. This type of pre-processing state metadata may be useful for performing equalization (in a post-processor) downstream of a decoder. Both channel coupling information and spectral extension information are also useful for optimizing quality during transcoding operations and applications. For example, an encoder may optimize its behavior (including the adaptation of pre-processing steps such as headphone virtualization, upmixing, etc.) based on the state of parameters such as spectral extension and channel coupling information. Moreover, the encoder may dynamically adapt its coupling and spectral extension parameters to match, and/or to take, optimal values based on the state of the incoming (and authenticated) metadata, and
whether dialogue enhancement adjustment range data is included in the encoded bitstream and, if so, the range of adjustment available during performance of dialogue enhancement processing (e.g., in a post-processor downstream of a decoder) that adjusts the level of dialogue content relative to the level of non-dialogue content in the audio program.
In some implementations, additional pre-processing state metadata (e.g., metadata indicating headphone-related parameters) is included (by stage 107) in the PIM payload of the encoded bitstream to be output from encoder 100.
In some implementations, an LPSM payload included (by stage 107) in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicating at least one audio program) includes LPSM in the following format:
a header (typically including a sync word identifying the start of the LPSM payload, followed by at least one identification value, e.g., the LPSM format version, length, period, count, and substream association values indicated in Table 2 below); and
after the header:
at least one dialogue indication value (e.g., the parameter "Dialogue Channel" of Table 2) indicating whether the corresponding audio data indicates dialogue or does not indicate dialogue (e.g., which channels of the corresponding audio data indicate dialogue);
at least one loudness regulation compliance value (e.g., the parameter "Loudness Adjustment Type" of Table 2) indicating whether the corresponding audio content complies with an indicated set of loudness regulations;
at least one loudness processing value (e.g., one or more of the parameters "Dialogue Gating Loudness Correction Flag" and "Loudness Correction Type" of Table 2) indicating at least one type of loudness processing that has been performed on the corresponding audio data; and
at least one loudness value (e.g., one or more of the parameters "ITU Relative Gated Loudness," "ITU Speech Gated Loudness," "ITU (EBU 3341) Short-Term 3s Loudness," and "True Peak" of Table 2) indicating at least one loudness characteristic (e.g., peak or average loudness) of the corresponding audio data.
In some implementations, each metadata segment that contains PIM and/or SSM (and optionally also other metadata) contains a metadata segment header (and optionally also additional core elements) and, after the metadata segment header (or the metadata segment header and other core elements), at least one metadata payload segment having the following format:
a payload header, typically including at least one identification value (e.g., SSM or PIM format version, length, period, count, and substream association values), and
after the payload header, the SSM or PIM (or metadata of another type).
In some implementations, each of the metadata segments (sometimes referred to herein as "metadata containers" or "containers") inserted by stage 107 into a useless bit segment/skip field segment (or an "addbsi" field or an auxiliary data field) of a frame of the bitstream has the following format:
a metadata segment header (typically including a sync word identifying the start of the metadata segment, followed by identification values, e.g., the version, length, period, extended element count, and substream association values indicated in Table 1 below); and
after the metadata segment header, at least one protection value (e.g., the HMAC digest and audio fingerprint values of Table 1) useful for at least one of decryption, authentication, or validation of at least one of the metadata of the metadata segment or the corresponding audio data; and
also after the metadata segment header, metadata payload identification ("ID") values and payload configuration values, which identify the type of metadata in each following metadata payload and indicate at least one aspect of the configuration (e.g., size) of each such payload.
Each metadata payload follows the corresponding payload ID value and payload configuration value.
In some embodiments, each of the metadata segments in the useless bit segment (or auxiliary data field or "addbsi" field) of a frame has three levels of structure:
a high-level structure (e.g., a metadata segment header), including a flag indicating whether the useless bit (or auxiliary data or addbsi) field includes metadata, at least one ID value indicating what type(s) of metadata are present, and typically also a value indicating how many bits of metadata (e.g., of each type) are present (if metadata is present). One type of metadata that may be present is PIM, another type that may be present is SSM, and other types that may be present are LPSM, and/or program boundary metadata, and/or media search metadata;
an intermediate-level structure, including data associated with each identified type of metadata (e.g., the metadata payload header, protection values, and payload ID and payload configuration values for each identified type of metadata); and
a low-level structure, including a metadata payload for each identified type of metadata (e.g., a sequence of PIM values, if PIM is identified as being present, and/or metadata values of another type (e.g., SSM or LPSM), if that other type of metadata is identified as being present).
The data values in such a three-level structure may be nested. For example, the protection value for each payload (e.g., each PIM, SSM, or other metadata payload) identified by the high-level and intermediate-level structures may be included after that payload (and thus after the payload's metadata payload header), or the protection value for all metadata payloads identified by the high-level and intermediate-level structures may be included after the final metadata payload in the metadata segment (and thus after the metadata payload headers of all payloads of the metadata segment).
In one example (to be described with reference to the metadata segment or "container" of FIG. 8), the metadata segment header identifies four metadata payloads. As shown in FIG. 8, the metadata segment header includes a container sync word (identified as "container sync") and version and key ID values. The metadata segment header is followed by the four metadata payloads and protection bits. The payload ID and payload configuration (e.g., payload size) values for the first payload (e.g., a PIM payload) follow the metadata segment header, and the first payload itself follows those ID and configuration values. The payload ID and payload configuration (e.g., payload size) values for the second payload (e.g., an SSM payload) follow the first payload, and the second payload itself follows those ID and configuration values. The payload ID and payload configuration (e.g., payload size) values for the third payload (e.g., an LPSM payload) follow the second payload, and the third payload itself follows those ID and configuration values. The payload ID and payload configuration (e.g., payload size) values for the fourth payload follow the third payload, and the fourth payload itself follows those ID and configuration values. The protection value(s) (identified as "protection data" in FIG. 8) for all or some of the payloads (or for the high-level and intermediate-level structures and all or some of the payloads) follow the last payload.
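The container layout just described (header, then an ID value, a configuration value, and the payload itself for each payload in turn, then protection data) can be sketched as a byte-level serializer and parser. All field widths, the sync word value `0xC0DE`, the explicit payload count in the header, and the function names are hypothetical; the actual format is bit-oriented and is defined by the tables referenced elsewhere in this document.

```python
import struct
from typing import List, Tuple

CONTAINER_SYNC = 0xC0DE  # hypothetical sync word value

Payload = Tuple[int, bytes]  # (payload ID, payload bytes)


def build_container(version: int, key_id: int, payloads: List[Payload],
                    protection: bytes) -> bytes:
    """Serialize header, then (ID, size, payload) triples, then protection data."""
    out = struct.pack(">HBBB", CONTAINER_SYNC, version, key_id, len(payloads))
    for payload_id, data in payloads:
        # Payload ID and configuration (here just the size) precede the payload.
        out += struct.pack(">BH", payload_id, len(data)) + data
    return out + protection


def parse_container(buf: bytes) -> Tuple[int, int, List[Payload], bytes]:
    """Inverse of build_container; returns (version, key_id, payloads, protection)."""
    sync, version, key_id, count = struct.unpack_from(">HBBB", buf, 0)
    if sync != CONTAINER_SYNC:
        raise ValueError("container sync word not found")
    pos = struct.calcsize(">HBBB")
    payloads: List[Payload] = []
    for _ in range(count):
        payload_id, size = struct.unpack_from(">BH", buf, pos)
        pos += struct.calcsize(">BH")
        payloads.append((payload_id, buf[pos:pos + size]))
        pos += size
    # Whatever follows the last payload is treated as the protection data.
    return version, key_id, payloads, buf[pos:]
```

Because each payload is preceded by its ID and size, a processor can skip payload types it does not understand without decoding them, which is the kind of convenient access the container format is meant to provide.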
在一些实施方式中,如果解码器101接收根据本发明的实施方式生成的具有加密散列的音频比特流,则解码器被配置成根据由比特流确定的数据块对加密散列进行分析和检索,其中所述块包括元数据。验证器102可以使用加密散列对所接收的比特流和/或相关联的元数据进行验证。例如,如果验证器102基于参考加密散列与从数据块检索到的加密散列之间的匹配发现元数据是有效的,那么可以禁止处理器103对相应的音频数据的操作,并且使得选择级104通过(未改变的)音频数据。另外,可选地或可替代地,可以使用其他类型的加密技术替代基于加密散列的方法。In some embodiments, if the decoder 101 receives an audio bitstream having a cryptographic hash generated according to an embodiment of the present invention, the decoder is configured to analyze and retrieve the cryptographic hash based on data blocks determined by the bitstream, wherein the blocks include metadata. The verifier 102 can use the cryptographic hash to verify the received bitstream and/or associated metadata. For example, if the verifier 102 finds that the metadata is valid based on a match between a reference cryptographic hash and a cryptographic hash retrieved from the data block, then the processor 103 can be prohibited from operating on the corresponding audio data and the selection stage 104 can be caused to pass the (unchanged) audio data. In addition, other types of encryption techniques can be used, optionally or alternatively, instead of the cryptographic hash-based approach.
图2的编码器100可以确定(响应于由解码器101提取的LPSM以及可选地还响应于节目边界元数据)后处理/预处理单元已经(在元件105、106和107中)对待编码的音频数据执行了一种类型的响度处理,因此可以(在生成器106中)创建包括用于先前执行的响度处理的和/或根据先前执行的响度处理得到的具体参数的响度处理状态元数据。在一些实现中,只要编码器知道已经对音频内容执行的处理的类型,编码器100就可以创建指示对音频内容的处理历史的元数据(以及将其包括在从编码器输出的编码比特流中)。The encoder 100 of FIG2 can determine (in response to the LPSM extracted by the decoder 101 and optionally also in response to program boundary metadata) that the post-processing/pre-processing unit has performed a type of loudness processing on the audio data to be encoded (in elements 105, 106, and 107), and can therefore create (in generator 106) loudness processing state metadata that includes specific parameters for and/or derived from the previously performed loudness processing. In some implementations, the encoder 100 can create metadata indicating the processing history of the audio content (and include it in the encoded bitstream output from the encoder) as long as the encoder knows the type of processing that has been performed on the audio content.
图3是为本发明的音频处理单元的实施方式的解码器(200)以及耦接至解码器(200)的后处理器(300)的框图。后处理器(300)也是本发明的音频处理单元的实施方式。解码器200和后处理器300的部件或元件中的任何一个可以以硬件、软件或硬件和软件的组合被实现为一个或更多个处理和/或一个或更多个电路(例如，ASIC、FPGA或其他集成电路)。解码器200包括如所示地连接的帧缓冲器201、分析器205、音频解码器202、音频状态验证级(验证器)203以及控制位生成级204。通常，解码器200还包括其他处理元件(未示出)。FIG. 3 is a block diagram of a decoder (200), which is an embodiment of the audio processing unit of the present invention, and of a post-processor (300) coupled to the decoder (200). The post-processor (300) is also an embodiment of the audio processing unit of the present invention. Any of the components or elements of the decoder 200 and the post-processor 300 can be implemented as one or more processes and/or one or more circuits (e.g., ASICs, FPGAs, or other integrated circuits) in hardware, software, or a combination of hardware and software. The decoder 200 includes a frame buffer 201, an analyzer 205, an audio decoder 202, an audio state verification stage (verifier) 203, and a control bit generation stage 204, connected as shown. Typically, the decoder 200 also includes other processing elements (not shown).
帧缓冲器201(缓冲存储器)存储(例如，以非暂态方式)由解码器200接收的编码音频比特流的至少一个帧。编码音频比特流的帧序列被从缓冲器201设定到分析器205。The frame buffer 201 (buffer memory) stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream received by the decoder 200. A sequence of frames of the encoded audio bitstream is passed from the buffer 201 to the analyzer 205.
耦接分析器205并且将其配置成从编码输入音频的每个帧中提取PIM和/或SSM(可选地还提取其他元数据，例如，LPSM)，将元数据中的至少一些(例如，LPSM和节目边界元数据，如果任意一个被提取的话，和/或PIM和/或SSM)设定到音频状态验证器203和级204，将所提取的元数据设定为(例如对后处理器300的)输出，从编码输入音频中提取音频数据，以及将所提取的音频数据设定到解码器202。The analyzer 205 is coupled and configured to extract the PIM and/or SSM (and optionally other metadata, e.g., LPSM) from each frame of the encoded input audio; to pass at least some of the metadata (e.g., the LPSM and program boundary metadata, if either is extracted, and/or the PIM and/or SSM) to the audio state validator 203 and stage 204; to provide the extracted metadata as output (e.g., to the post-processor 300); to extract audio data from the encoded input audio; and to pass the extracted audio data to the decoder 202.
输入至解码器200的编码音频比特流可以是AC-3比特流、E-AC-3比特流或杜比E比特流中的一个。The encoded audio bitstream input to the decoder 200 may be one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream.
图3的系统还包括后处理器300。后处理器300包括帧缓冲器301和包括耦接至缓冲器301的至少一个处理元件的其他处理元件(未示出)。帧缓冲器301存储(例如,以非暂态方式)由后处理器300从解码器200接收的解码音频比特流的至少一个帧。耦接后处理器300的处理元件并且将其配置成接收从缓冲器301输出的解码音频比特流的一系列帧并且使用从解码器200输出的元数据和/或从解码器200的级204输出的控制位对其进行自适应处理。通常,后处理器300被配置成使用来自解码器200的元数据对解码音频数据执行自适应处理(例如,使用LPSM值以及可选地还使用节目边界元数据对解码音频数据执行自适应响度处理,其中自适应处理可以基于响度处理状态、和/或由指示单个音频节目的音频数据的LPSM所指示的一个或更多个音频数据特性)。The system of FIG3 also includes a post-processor 300. Post-processor 300 includes a frame buffer 301 and other processing elements (not shown) including at least one processing element coupled to buffer 301. Frame buffer 301 stores (e.g., in a non-transitory manner) at least one frame of a decoded audio bitstream received by post-processor 300 from decoder 200. The processing elements of post-processor 300 are coupled and configured to receive a series of frames of the decoded audio bitstream output from buffer 301 and adaptively process them using metadata output from decoder 200 and/or control bits output from stage 204 of decoder 200. Generally, post-processor 300 is configured to perform adaptive processing on the decoded audio data using metadata from decoder 200 (e.g., to perform adaptive loudness processing on the decoded audio data using LPSM values and, optionally, program boundary metadata, where the adaptive processing may be based on a loudness processing state and/or one or more audio data characteristics indicated by an LPSM indicating audio data of a single audio program).
解码器200和后处理器300的各种实现被配置成执行本发明的方法的不同的实施方式。Various implementations of the decoder 200 and the post-processor 300 are configured to perform different embodiments of the method of the present invention.
解码器200的音频解码器202被配置成对由分析器205提取的音频数据进行解码以生成解码音频数据，并且将解码音频数据设定为(例如对后处理器300的)输出。The audio decoder 202 of the decoder 200 is configured to decode the audio data extracted by the analyzer 205 to generate decoded audio data, and to provide the decoded audio data as output (e.g., to the post-processor 300).
状态验证器203被配置成对设定到其的元数据进行认证和验证。在一些实施方式中,元数据为(或被包括在)已经被包括在输入比特流(例如,根据本发明的实施方式)中的数据块。块可以包括用于对元数据和/或基本音频数据(从分析器205和/或解码器202提供至验证器203)进行处理的加密散列(基于散列的消息认证代码或“HMAC”)。数据块可以在这些实施方式中被数字地标记,使得下游的音频处理单元可以相对容易地认证和验证处理状态元数据。The state validator 203 is configured to authenticate and verify metadata set thereto. In some embodiments, the metadata is (or is included in) a data block that has been included in an input bitstream (e.g., according to an embodiment of the present invention). The block may include a cryptographic hash (hash-based message authentication code or "HMAC") for processing the metadata and/or the base audio data (provided to the validator 203 from the analyzer 205 and/or decoder 202). The data block can be digitally signed in these embodiments so that downstream audio processing units can relatively easily authenticate and verify the processing state metadata.
包括但不限于一个或更多个非HMAC加密方法中的任意一个的其他加密方法可以用于元数据的验证(例如,在验证器203中)以确保元数据和/或基本的音频数据的安全的传输和接收。例如,验证(使用这样的加密方法)可以在接收本发明的音频比特流的实施方式的每个音频处理单元中被执行以确定包括在该比特流中的元数据和相应音频数据是否已经经历(和/或产生于)具体的处理(由元数据所指示的)并且在这样的具体的处理执行之后没有被修改。Other encryption methods, including but not limited to any one of one or more non-HMAC encryption methods, may be used for metadata verification (e.g., in the verifier 203) to ensure secure transmission and reception of metadata and/or underlying audio data. For example, verification (using such an encryption method) may be performed in each audio processing unit that receives an embodiment of the audio bitstream of the present invention to determine whether the metadata and corresponding audio data included in the bitstream have undergone (and/or resulted from) a specific processing (indicated by the metadata) and have not been modified after such specific processing has been performed.
状态验证器203将控制数据设定到控制位生成器204，和/或将控制数据设定为输出(例如，设定到后处理器300)以指示验证操作的结果。响应于控制数据(以及可选地从输入比特流中提取的其他元数据)，级204可以生成(以及设定到后处理器300)：The state validator 203 passes control data to the control bit generator 204, and/or provides the control data as output (e.g., to the post-processor 300), to indicate the result of the validation operation. In response to the control data (and optionally other metadata extracted from the input bitstream), the stage 204 may generate (and pass to the post-processor 300) either:
指示从解码器202输出的解码音频数据已经经历特定类型的响度处理(当LPSM指示从解码器202输出的音频数据已经经历该特定类型的响度处理,并且来自验证器203的控制位指示LPSM有效时)的控制位;或a control bit indicating that the decoded audio data output from the decoder 202 has been subjected to a particular type of loudness processing (when the LPSM indicates that the audio data output from the decoder 202 has been subjected to the particular type of loudness processing and the control bit from the validator 203 indicates that the LPSM is valid); or
指示从解码器202输出的解码音频数据应当经历特定类型的响度处理(例如,当LPSM指示从解码器202输出的音频数据没有经历具体类型的响度处理,或当LPSM指示从解码器202输出的音频数据已经经历该特定类型的响度处理但来自验证器203的控制位指示LPSM无效时)的控制位。A control bit indicating that the decoded audio data output from the decoder 202 should be subjected to a particular type of loudness processing (e.g., when the LPSM indicates that the audio data output from the decoder 202 has not been subjected to the particular type of loudness processing, or when the LPSM indicates that the audio data output from the decoder 202 has been subjected to the particular type of loudness processing but the control bit from the validator 203 indicates that the LPSM is invalid).
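The two alternatives above amount to a small decision rule, sketched below. The names `LoudnessControl` and `loudness_control_bit` are hypothetical, and the sketch makes one assumption the text does not spell out: when the LPSM is invalid and also reports no prior processing, the audio is treated as still needing loudness processing.

```python
from enum import Enum


class LoudnessControl(Enum):
    ALREADY_PROCESSED = 0   # decoded audio has undergone the loudness processing
    SHOULD_PROCESS = 1      # decoded audio should undergo the loudness processing


def loudness_control_bit(lpsm_says_processed: bool, lpsm_valid: bool) -> LoudnessControl:
    """Combine the LPSM claim with the validator result into one control value."""
    # Only a validated claim of prior processing is trusted.
    if lpsm_says_processed and lpsm_valid:
        return LoudnessControl.ALREADY_PROCESSED
    # Otherwise (no prior processing reported, or the claim failed validation),
    # request processing downstream. Treating "invalid and unprocessed" this way
    # is an assumption of this sketch.
    return LoudnessControl.SHOULD_PROCESS
```

Keeping the rule in one function makes it clear that an unvalidated "already processed" claim never suppresses downstream loudness processing.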
或者，解码器200将由解码器202从输入比特流中提取的元数据以及由分析器205从输入比特流中提取的元数据设定到后处理器300，并且后处理器300使用元数据对解码音频数据执行自适应处理，或执行元数据的验证，然后如果验证指示元数据有效，则使用元数据对解码音频数据执行自适应处理。Alternatively, the decoder 200 passes the metadata extracted from the input bitstream by the decoder 202 and the metadata extracted from the input bitstream by the analyzer 205 to the post-processor 300, and the post-processor 300 either performs adaptive processing on the decoded audio data using the metadata, or first validates the metadata and then, if the validation indicates that the metadata is valid, performs adaptive processing on the decoded audio data using the metadata.
在一些实施方式中，如果解码器200接收根据本发明的使用加密散列的实施方式生成的音频比特流，则解码器被配置成对来自由比特流所确定的数据块的加密散列进行分析和检索，所述块包括响度处理状态元数据(LPSM)。验证器203可以使用加密散列以对接收的比特流和/或相关联的元数据进行验证。例如，如果验证器203基于参考加密散列与从数据块检索的加密散列之间的匹配发现LPSM有效，那么可以用向下游的音频处理单元(例如，可以是或包括音量校平单元的后处理器300)发信号以通过(未改变的)比特流的音频数据。另外地，可选地或可替代地，可以使用其他类型的加密技术替代基于加密散列的方法。In some embodiments, if the decoder 200 receives an audio bitstream generated according to an embodiment of the present invention that uses a cryptographic hash, the decoder is configured to parse and retrieve the cryptographic hash from a data block determined from the bitstream, the block comprising loudness processing state metadata (LPSM). The validator 203 can use the cryptographic hash to validate the received bitstream and/or the associated metadata. For example, if the validator 203 finds that the LPSM is valid based on a match between a reference cryptographic hash and the cryptographic hash retrieved from the data block, it can signal a downstream audio processing unit (e.g., the post-processor 300, which may be or include a volume leveling unit) to pass the audio data of the bitstream through unchanged. Additionally, optionally, or alternatively, other types of cryptographic techniques can be used in place of the cryptographic hash-based approach.
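The hash comparison described above can be sketched with Python's standard `hmac` module. This is a minimal illustration, not the format's actual scheme: the choice of SHA-256, how the key is provisioned, and exactly which bytes of the block the digest covers are all assumptions, and `lpsm_is_valid` and `route_audio` are hypothetical names. Applying leveling when validation fails is likewise an assumption of the sketch.

```python
import hashlib
import hmac


def lpsm_is_valid(key: bytes, block: bytes, received_digest: bytes) -> bool:
    """Recompute a reference HMAC over the block and compare it with the received one."""
    reference = hmac.new(key, block, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(reference, received_digest)


def route_audio(audio, valid: bool, leveler):
    """Pass audio through unchanged when the LPSM validated; otherwise level it."""
    return audio if valid else leveler(audio)
```

A downstream unit would call `lpsm_is_valid` once per data block and use the result to decide whether its own volume leveling can be bypassed.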
在解码器200的一些实现中,所接收(以及缓存在存储器201中)的编码比特流为AC-3比特流或E-AC-3比特流,并且包括音频数据段(例如,图4所示的帧的AB0至AB5段)和元数据段,其中音频数据段指示音频数据,而元数据段中的至少一些中的每个包括PIM或SSM(或其他元数据)。解码器级202(和/或分析器205)被配置成从比特流中提取元数据。元数据段中的包括PIM和/或SSM(可选地还包括其他元数据)的每个元数据段被包括在比特流的帧的无用位段中,或比特流的帧的比特流信息(“BSI”)段的“addbsi”字段中,或比特流的帧的结束处的辅助数据字段(例如,图4所示的AUX段)中。比特流的帧可以包括一个或两个元数据段,其中每个元数据段包括元数据,并且如果帧包括两个元数据段,一个可以存在于帧的addbsi字段中而另一个存在于帧的AUX字段中。In some implementations of decoder 200, the encoded bitstream received (and buffered in memory 201) is an AC-3 bitstream or an E-AC-3 bitstream and includes an audio data segment (e.g., segments AB0 to AB5 of a frame shown in FIG. 4 ) and a metadata segment, wherein the audio data segment indicates audio data and at least some of the metadata segments each include a PIM or SSM (or other metadata). Decoder stage 202 (and/or analyzer 205) is configured to extract metadata from the bitstream. Each metadata segment, including the PIM and/or SSM (and optionally other metadata), is included in a useless bit field of a frame of the bitstream, in an "addbsi" field of a bitstream information ("BSI") segment of a frame of the bitstream, or in an auxiliary data field at the end of a frame of the bitstream (e.g., the AUX segment shown in FIG. 4 ). A frame of the bitstream may include one or two metadata segments, wherein each metadata segment includes metadata, and if a frame includes two metadata segments, one may be present in the addbsi field of the frame and the other in the AUX field of the frame.
在一些实施方式中，缓存在缓冲器201中的比特流的每个元数据段(在本文中有时称为“容器”)具有包括元数据段报头(可选地还包括其他强制的或“核心”元素)、以及在元数据段报头之后的一个或更多个元数据有效载荷的格式。如果存在，SSM被包括在元数据有效载荷中的一个有效载荷(由有效载荷报头标识，并且通常具有第一类型的格式)中。如果存在，PIM被包括在元数据有效载荷中的另一个有效载荷(由有效载荷报头标识，并且通常具有第二类型的格式)中。类似地，元数据的其他类型(如果存在)被包括在元数据有效载荷中的另一有效载荷(由有效载荷报头标识，并且通常具有针对元数据的类型的格式)中。示例性格式使得能够在除了解码期间之外的时间方便访问(例如，由解码之后的后处理器300、或由被配置成在没有对编码比特流执行完全解码的情况下识别元数据的处理器)SSM、PIM和其他元数据，并且允许在比特流的解码期间(例如，子流识别的)方便和高效的误差检测和校正。例如，在不以示例性格式访问SSM的情况下，解码器200可能错误地识别与节目相关联的子流的正确数量。元数据段中的一个元数据有效载荷可以包括SSM，元数据段中的另一个元数据有效载荷可以包括PIM，以及可选地，元数据段中的至少一个其他元数据有效载荷可以包括其他元数据(例如，响度处理状态元数据或“LPSM”)。In some embodiments, each metadata segment (sometimes referred to herein as a "container") of a bitstream buffered in buffer 201 has a format that includes a metadata segment header (and optionally other mandatory or "core" elements) and one or more metadata payloads following the metadata segment header. If present, the SSM is included in one of the metadata payloads (identified by a payload header and generally having a first type of format). If present, the PIM is included in another of the metadata payloads (identified by a payload header and generally having a second type of format). Similarly, each other type of metadata (if present) is included in another of the metadata payloads (identified by a payload header and generally having a format specific to that type of metadata). This exemplary format enables convenient access to the SSM, PIM, and other metadata at times other than during decoding (e.g., by the post-processor 300 after decoding, or by a processor configured to identify the metadata without performing a full decoding of the encoded bitstream), and allows convenient and efficient error detection and correction (e.g., of substream identification) during decoding of the bitstream. For example, without access to the SSM in the exemplary format, the decoder 200 may incorrectly identify the number of substreams associated with a program.
One metadata payload in the metadata segment may include the SSM, another metadata payload in the metadata segment may include the PIM, and optionally, at least one other metadata payload in the metadata segment may include other metadata (e.g., loudness processing state metadata or "LPSM").
在一些实施方式中,包括在缓存在缓冲器201中的编码比特流(例如,指示至少一个音频节目的E-AC-3比特流)的帧中的子流结构元数据(SSM)有效载荷包括下面的格式的SSM:In some embodiments, a substream structure metadata (SSM) payload included in a frame of a coded bitstream (e.g., an E-AC-3 bitstream indicating at least one audio program) buffered in buffer 201 includes an SSM in the following format:
有效载荷报头,通常包括至少一个标识值(例如,指示SSM格式版本的2位值,以及可选地长度、周期、计数和子流关联值);以及A payload header, typically including at least one identification value (e.g., a 2-bit value indicating the SSM format version, and optionally length, period, count, and substream association values); and
在报头之后:After the header:
指示由比特流指示的节目的独立子流的数量的独立子流元数据;以及independent substream metadata indicating the number of independent substreams for the program indicated by the bitstream; and
从属子流元数据,其指示:节目的每个独立子流是否具有至少一个与其相关联的从属子流,以及如果节目的每个独立子流具有至少一个与其相关联的从属子流,与节目的每个独立子流相关联的从属子流的数量。Dependent substream metadata indicating whether each independent substream of the program has at least one dependent substream associated with it, and if so, the number of dependent substreams associated with each independent substream of the program.
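The information carried by an SSM payload can be modelled with a small data structure. This is a sketch only: the class `SubstreamStructure` and the helper `matches_observed` are hypothetical names, and the representation (one dependent-substream count per independent substream) is an assumption that captures the content described above rather than the payload's bit-level syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubstreamStructure:
    # One entry per independent substream: the number of dependent
    # substreams associated with it (0 means it has none).
    dependent_counts: List[int]

    @property
    def independent_count(self) -> int:
        return len(self.dependent_counts)

    def has_dependents(self, index: int) -> bool:
        return self.dependent_counts[index] > 0

    def total_substreams(self) -> int:
        return self.independent_count + sum(self.dependent_counts)


def matches_observed(ssm: SubstreamStructure, observed_total: int) -> bool:
    """Error-detection aid: compare the substreams a decoder found with the SSM."""
    return ssm.total_substreams() == observed_total
```

A decoder that counted a different number of substreams than `total_substreams()` reports could use the mismatch as a cue that its substream identification went wrong.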
在一些实施方式中,缓存在缓冲器201中的编码比特流(例如,指示至少一个音频节目的E-AC-3比特流)的帧中的包括的节目信息元数据(PIM)有效载荷具有下面的格式:In some embodiments, the program information metadata (PIM) payload included in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicating at least one audio program) buffered in buffer 201 has the following format:
有效载荷报头,通常包括至少一个标识值(例如,指示PIM格式版本的值,以及可选地长度、周期、计数和子流关联值);以及在报头之后,下面的格式的PIM:A payload header, typically including at least one identification value (e.g., a value indicating the PIM format version, and optionally length, period, count, and substream association values); and, following the header, a PIM in the following format:
音频节目的每个静音通道和每个非静音通道(即,节目的哪些通道包含音频信息,而哪些通道(如果有)仅包含静音(通常关于帧的持续时间))的活动通道元数据。在编码比特流是AC-3或E-AC-3比特流的实施方式中,比特流的帧中的活动通道元数据可以结合比特流的额外的元数据(例如,帧的音频编码模式(“acmod”)字段,以及如果存在,帧或相关联的从属子流帧中的chanmap字段)以确定节目的哪些通道包含音频信息而哪些通道包含静音;active channel metadata for each silent channel and each non-silent channel of an audio program (i.e., which channels of the program contain audio information and which channels, if any, contain only silence (typically with respect to the duration of a frame)). In embodiments where the coded bitstream is an AC-3 or E-AC-3 bitstream, the active channel metadata in a frame of the bitstream may be combined with additional metadata of the bitstream (e.g., the audio coding mode ("acmod") field of the frame and, if present, the chanmap field in the frame or associated dependent substream frames) to determine which channels of the program contain audio information and which channels contain silence;
下混合处理状态元数据，其指示：节目是否被下混合(在编码之前或在编码期间)，以及如果节目被下混合，所应用的下混合的类型。下混合处理状态元数据可以有助于实现解码器的上混合(在后处理器300中)下游，例如以使用最匹配所应用的下混合的类型的参数对节目的音频内容进行上混合。在编码比特流是AC-3或E-AC-3比特流的实施方式中，下混合处理状态元数据可以结合帧的音频编码模式(“acmod”)字段以确定应用于节目的通道的下混合(如果有)的类型；Downmix processing state metadata that indicates whether the program was downmixed (before or during encoding), and if so, the type of downmix that was applied. The downmix processing state metadata may facilitate upmixing downstream of the decoder (in the post-processor 300), for example, to upmix the program's audio content using parameters that best match the type of downmix that was applied. In embodiments where the encoded bitstream is an AC-3 or E-AC-3 bitstream, the downmix processing state metadata may be used together with the frame's audio coding mode ("acmod") field to determine the type of downmix (if any) applied to the program's channels;
上混合处理状态元数据，其指示：在编码之前或在编码期间节目是否被上混合(例如，从较小数量的通道)，以及如果节目被上混合，所应用的上混合的类型。上混合处理状态元数据可以有助于实现解码器的下混合(在后处理器中)下游，例如以与应用于节目的上混合(例如，杜比定向逻辑、或杜比定向逻辑Ⅱ电影模式、或杜比定向逻辑Ⅱ音乐模式、或杜比专业上混合器)的类型一致的方式对节目的音频内容进行下混合。在编码比特流是E-AC-3比特流的实施方式中，上混合处理状态元数据可以结合其他元数据(例如，帧的“strmtyp”字段的值)以确定应用于节目的通道的上混合(如果有)的类型。(E-AC-3比特流的帧的BSI字段中的)“strmtyp”字段的值指示帧的音频内容是否属于独立流(其确定节目)或(包括多个子流或与多个子流相关联的节目的)独立子流，从而可以独立于由E-AC-3比特流所指示的任何其他子流被解码，或帧的音频内容是否属于(包括多个子流或与多个子流相关联的节目的)从属子流，从而必须结合与其相关联的独立子流而被解码；以及Upmix processing state metadata that indicates whether the program was upmixed (e.g., from a smaller number of channels) before or during encoding, and if so, the type of upmix that was applied. The upmix processing state metadata can facilitate downmixing downstream of the decoder (in a post-processor), for example, to downmix the program's audio content in a manner consistent with the type of upmix applied to the program (e.g., Dolby Pro Logic, or Dolby Pro Logic II Movie mode, or Dolby Pro Logic II Music mode, or Dolby Professional upmixer). In embodiments where the encoded bitstream is an E-AC-3 bitstream, the upmix processing state metadata can be used together with other metadata (e.g., the value of a frame's "strmtyp" field) to determine the type of upmix (if any) applied to the program's channels. The value of the "strmtyp" field (in the BSI field of a frame of the E-AC-3 bitstream) indicates whether the audio content of the frame belongs to an independent stream (which determines a program) or to an independent substream (of a program that includes or is associated with multiple substreams), and thus can be decoded independently of any other substream indicated by the E-AC-3 bitstream, or whether the audio content of the frame belongs to a dependent substream (of a program that includes or is associated with multiple substreams), and thus must be decoded in conjunction with the independent substream with which it is associated; and
预处理状态元数据,其指示:是否对帧的音频内容执行了预处理(在生成编码比特流的音频内容的编码之前),以及如果对帧音频内容执行了预处理,被执行的预处理的类型。Pre-processing status metadata that indicates whether pre-processing was performed on the audio content of the frame (prior to encoding of the audio content to generate the encoded bitstream), and if so, the type of pre-processing that was performed.
在一些实现中,预处理状态元数据指示:In some implementations, the pre-processing state metadata indicates:
是否应用了环绕衰减(例如,在编码之前,音频节目的环绕通道是否被衰减了3dB),whether surround attenuation is applied (e.g., whether the surround channels of the audio program are attenuated by 3dB before encoding),
是否(例如,在编码之前对音频节目的环绕通道Ls和Rs通道)应用了90°相移,whether a 90° phase shift is applied (e.g. to the surround channels Ls and Rs of the audio program before encoding),
在编码之前，是否对音频节目的LFE通道应用了低通滤波器，whether a low-pass filter was applied to the LFE channel of the audio program before encoding,
在生成期间，是否监视节目的LFE通道的电平，以及如果监视了节目的LFE通道的电平，相对于节目的全音域音频通道的电平的LFE通道的监视电平，whether the level of the program's LFE channel was monitored during production, and if so, the monitored level of the LFE channel relative to the level of the program's full-range audio channels,
是否应当对节目的解码音频的每个块执行(例如,在解码器中)动态范围压缩,以及如果应当对节目的解码音频的每个块执行动态范围压缩,要执行的动态范围压缩的类型(和/或参数)(例如,该类型的预处理状态元数据可以指示下面的压缩简档类型中的哪种类型由编码器假定以生成被包括在编码比特流中的动态范围压缩控制值:电影标准、电影轻度、音乐标准、音乐轻度或语音。或者,预处理状态元数据的该类型可以指示应当以由被包括在编码比特流中的动态范围压缩控制值确定的方式对节目的解码音频内容的每个帧执行重动态范围压缩(“compr”压缩)),whether dynamic range compression should be performed (e.g., in a decoder) on each block of the program's decoded audio, and if so, the type (and/or parameters) of dynamic range compression to be performed (e.g., the type of pre-processing state metadata may indicate which of the following compression profile types was assumed by the encoder to generate the dynamic range compression control values included in the encoded bitstream: Film Standard, Film Mild, Music Standard, Music Mild, or Speech. Alternatively, the type of pre-processing state metadata may indicate that heavy dynamic range compression ("compr" compression) should be performed on each frame of the program's decoded audio content in a manner determined by the dynamic range compression control values included in the encoded bitstream),
是否使用谱扩展和/或通道耦合编码以对特定频率范围的节目的内容进行编码，以及如果使用谱扩展和/或通道耦合编码以对特定频率范围的节目的内容进行编码，对其执行谱扩展编码的内容的频率分量的最小频率和最大频率，以及对其执行通道耦合编码的内容的频率分量的最小频率和最大频率。该类型的预处理状态元数据信息可以有助于执行解码器的均衡(在后处理器中)下游。通道耦合信息和谱扩展信息两者也有助于在代码转换操作和应用期间优化质量。例如，编码器可以基于参数(例如谱扩展和通道耦合信息)的状态优化其行为(包括预处理步骤例如头戴式耳机虚拟、上混合等的自适应)。而且，编码器可以基于进入的(并且认证的)元数据的状态动态地修改其耦合和谱扩展参数以匹配最佳值和/或将其耦合和谱扩展参数修改成最佳值，以及whether spectral extension and/or channel coupling coding was used to encode specific frequency ranges of the program's content, and if so, the minimum and maximum frequencies of the frequency components of the content on which spectral extension coding was performed, and the minimum and maximum frequencies of the frequency components of the content on which channel coupling coding was performed. This type of pre-processing state metadata can be useful for performing equalization (in a post-processor) downstream of the decoder. Both the channel coupling information and the spectral extension information are also useful for optimizing quality during transcoding operations and applications. For example, an encoder can optimize its behavior (including the adaptation of pre-processing steps such as headphone virtualization, upmixing, etc.) based on the state of parameters such as the spectral extension and channel coupling information. Moreover, the encoder can dynamically modify its coupling and spectral extension parameters to match, or to be set to, optimal values based on the state of the incoming (and authenticated) metadata, and
对白增强调整范围数据是否包括在编码比特流中,以及如果对白增强调整范围数据包括在编码比特流中,在相对于音频节目中的非对白内容的电平调整对白内容的电平的对白增强处理(例如,在解码器的后处理器下游)的执行期间可得到的调整范围。Whether dialogue enhancement adjustment range data is included in the encoded bitstream, and if so, the adjustment range available during performance of dialogue enhancement processing (e.g., in a post-processor downstream of a decoder) to adjust the level of dialogue content relative to the level of non-dialogue content in the audio program.
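As an illustration of how the active channel metadata in the PIM list above might be combined with the frame's `acmod` field, the sketch below maps an `acmod` value to its full-bandwidth channel layout and then splits the channels into active and silent sets. The `acmod` layouts follow the AC-3 audio coding mode table; the bitmask representation of the active channel metadata (bit i set means the i-th coded channel carries audio) and the function name `split_channels` are assumptions of this sketch. The LFE channel, which is signaled separately from `acmod`, is ignored here.

```python
# Full-bandwidth channel layouts for each AC-3 audio coding mode (acmod).
ACMOD_CHANNELS = {
    0: ("Ch1", "Ch2"),               # 1+1 (dual mono)
    1: ("C",),                       # 1/0
    2: ("L", "R"),                   # 2/0
    3: ("L", "C", "R"),              # 3/0
    4: ("L", "R", "S"),              # 2/1
    5: ("L", "C", "R", "S"),         # 3/1
    6: ("L", "R", "Ls", "Rs"),       # 2/2
    7: ("L", "C", "R", "Ls", "Rs"),  # 3/2
}


def split_channels(acmod: int, active_mask: int):
    """Return (active, silent) channel names for one frame.

    active_mask is a hypothetical encoding of the active channel metadata:
    bit i is 1 when the i-th channel of the acmod layout contains audio.
    """
    channels = ACMOD_CHANNELS[acmod]
    active = [ch for i, ch in enumerate(channels) if (active_mask >> i) & 1]
    silent = [ch for i, ch in enumerate(channels) if not (active_mask >> i) & 1]
    return active, silent
```

For a 3/2 frame whose surround channels contain only silence, `split_channels(7, 0b00111)` would separate L, C, and R from Ls and Rs, which a downstream processor could use to skip work on the silent channels.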
在一些实施方式中,包括在缓存在缓冲器201中的编码比特流(例如,指示至少一个音频节目的E-AC-3比特流)的帧中的LPSM有效载荷包括下面的格式的LPSM:In some embodiments, the LPSM payload included in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicating at least one audio program) buffered in the buffer 201 includes an LPSM in the following format:
报头(通常包括标识LPSM有效载荷的开始的同步字,在同步字之后的至少一个标识值,例如,在下面的表2中指示的LPSM格式版本、长度、周期、计数和子流关联值);以及A header (typically comprising a synchronization word identifying the start of the LPSM payload, followed by at least one identification value, such as the LPSM format version, length, period, count, and substream association value indicated in Table 2 below); and
在报头之后的:After the header:
指示相应音频数据指示对白或不指示对白(例如,相应音频数据的哪些通道指示对白)的至少一个对白表示值(例如,表2的参数“对白通道”);at least one dialogue representation value (e.g., parameter “Dialogue Channel” of Table 2) indicating whether the corresponding audio data indicates dialogue or does not indicate dialogue (e.g., which channels of the corresponding audio data indicate dialogue);
指示相应音频内容是否符合响度调整的所指示的集合的至少一个响度调整符合值(例如，表2的参数“响度调整类型”)；at least one loudness adjustment compliance value (e.g., parameter "Loudness Adjustment Type" of Table 2) indicating whether the corresponding audio content complies with an indicated set of loudness regulations;
指示已经对相应音频数据执行的至少一种类型的响度处理的至少一个响度处理值(例如,表2的参数“对白选通响度校正标志”、“响度校正类型”中的一个或更多个);以及at least one loudness processing value indicating at least one type of loudness processing that has been performed on the corresponding audio data (e.g., one or more of the parameters "Dialogue Gating Loudness Correction Flag," "Loudness Correction Type," of Table 2); and
指示相应音频数据的至少一个响度(例如,峰值或平均响度)特性的至少一个响度值(例如,表2的参数“ITU相对选通响度”、“ITU语音选通响度”、“ITU(EBU 3341)短期3s响度”和“真实峰值”中的一个或更多个)。At least one loudness value indicating at least one loudness (e.g., peak or average loudness) characteristic of corresponding audio data (e.g., one or more of the parameters “ITU Relative Gated Loudness,” “ITU Speech Gated Loudness,” “ITU (EBU 3341) Short-Term 3s Loudness,” and “True Peak” of Table 2).
在一些实现中,分析器205(和/或解码器级202)被配置成从比特流的帧的无用位段或“addbsi”字段或辅助数据段中提取具有下面的格式的每个元数据段:In some implementations, the analyzer 205 (and/or the decoder stage 202) is configured to extract each metadata segment having the following format from the useless bits segment or "addbsi" field or the auxiliary data segment of a frame of the bitstream:
元数据段报头(通常包括标识元数据段的开始的同步字,同步字之后的标识值,例如版本、长度、周期、扩展的元素计数和子流关联值);以及A metadata segment header (typically including a synchronization word that identifies the start of the metadata segment, followed by identification values such as version, length, period, extended element count, and substream association value); and
在元数据段报头之后的有助于元数据段或相应音频数据的元数据的至少一个的解密、认证或验证中的至少一种的至少一个保护值(例如，表1的HMAC摘要和音频指纹值)；以及at least one protection value (e.g., the HMAC digest and audio fingerprint values of Table 1), following the metadata segment header, that is useful for at least one of decryption, authentication, or validation of at least one of the metadata of the metadata segment or the corresponding audio data; and
也在元数据段报头之后的标识每个下面的元数据有效载荷中的元数据的类型并且表示每个这样的有效载荷的配置(例如,尺寸)的至少一个方面的元数据有效载荷标识(“ID”)值和有效载荷配置值。Also following the metadata segment header are metadata payload identification ("ID") values and payload configuration values that identify the type of metadata in each following metadata payload and represent at least one aspect of the configuration (e.g., size) of each such payload.
每个元数据有效载荷段(优选地具有上面指定的格式)在相应的元数据有效载荷ID值和元数据配置值之后。Each metadata payload segment (preferably having the format specified above) follows the corresponding metadata payload ID value and payload configuration values.
更一般地，由本发明的优选实施方式生成的编码音频比特流具有提供将元数据元素和子元素标记为核心的(强制的)或扩展的(可选的)元素或子元素的机制的结构。这使得比特流(包括其元数据)的数据速率能够扩展到大量的应用。优选的比特流语法的核心的(强制的)元素还应当能够用信号通知与音频内容相关联的扩展的(可选的)元素存在于(带中)和/或远程位置(带外)。More generally, the encoded audio bitstream generated by preferred embodiments of the present invention has a structure that provides a mechanism for labeling metadata elements and sub-elements as core (mandatory) or extended (optional) elements or sub-elements. This allows the data rate of the bitstream (including its metadata) to scale across a large number of applications. The core (mandatory) elements of the preferred bitstream syntax should also be able to signal that extended (optional) elements associated with the audio content are present in the bitstream (in-band) and/or at a remote location (out-of-band).
要求核心元素存在于比特流的每个帧中。核心元素的一些子元素是可选的,并且可以以任何组合存在。不要求扩展元素存在于每个帧中(以限制比特率总开销)。从而,扩展元素可以存在于一些帧中而不存于其他帧中。扩展元素的一些子元素是可选的,并且可以以任何组合存在,然而,扩展元素的一些子元素可以是强制的(即,如果扩展元素存在于比特流的帧中)。Core elements are required to be present in every frame of the bitstream. Some sub-elements of core elements are optional and may be present in any combination. Extension elements are not required to be present in every frame (to limit bitrate overhead). Thus, extension elements may be present in some frames and not in others. Some sub-elements of extension elements are optional and may be present in any combination, however, some sub-elements of extension elements may be mandatory (i.e., if the extension element is present in a frame of the bitstream).
In one class of embodiments, an encoded audio bitstream comprising a sequence of audio data segments and metadata segments is generated (e.g., by an audio processing unit that embodies the invention). The audio data segments are indicative of audio data, each of at least some of the metadata segments includes PIM and/or SSM (and optionally at least one other type of metadata), and the audio data segments are time-division multiplexed with the metadata segments. In preferred embodiments in this class, each of the metadata segments has a preferred format to be described herein.
In one preferred format, the encoded bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes SSM and/or PIM is included (e.g., by stage 107 of a preferred implementation of encoder 100) as additional bitstream information in the "addbsi" field (shown in FIG. 6) of the bitstream information ("BSI") segment of a frame of the bitstream, or in an auxiliary data field of a frame of the bitstream, or in a useless bit segment of a frame of the bitstream.
In the preferred format, each frame includes a metadata segment (sometimes referred to herein as a metadata container, or container) in a useless bit segment (or the addbsi field) of the frame. The metadata segment has the mandatory elements (collectively referred to as the "core elements") shown in Table 1 below, and may include the optional elements shown in Table 1. At least some of the mandatory elements shown in Table 1 are included in the metadata segment header of the metadata segment, but some may be included elsewhere in the metadata segment:
Table 1
In the preferred format, each metadata segment that contains SSM, PIM, or LPSM (in a useless bit segment, addbsi field, or auxiliary data field of a frame of the encoded bitstream) contains a metadata segment header (and optionally additional core elements), followed by one or more metadata payloads. Each metadata payload includes a metadata payload header (indicating the specific type of metadata (e.g., SSM, PIM, or LPSM) included in the payload), followed by metadata of that specific type. Typically, the metadata payload header includes the following values (parameters):
a payload ID (identifying the type of metadata, e.g., SSM, PIM, or LPSM) following the metadata segment header (which may include the values specified in Table 1);
a payload configuration value (typically indicating the size of the payload) following the payload ID;
and optionally also additional payload configuration values (e.g., an offset value indicating the number of audio samples from the start of the frame to the first audio sample to which the payload pertains, and a payload priority value, e.g., indicating a condition under which the payload may be discarded).
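The payload header described above (a payload ID, then a size, then the type-specific metadata) can be sketched as a simple walk over the bytes that follow the metadata segment header. This is a minimal illustration only: the byte-aligned layout, the 1-byte ID, the 2-byte size, the numeric ID codes, and the names `PAYLOAD_TYPES`, `MetadataPayload`, and `split_payloads` are all assumptions, since the text does not give field widths or code values.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical payload ID codes; the text does not give numeric values.
PAYLOAD_TYPES = {1: "LPSM", 2: "PIM", 3: "SSM"}


@dataclass
class MetadataPayload:
    kind: str    # metadata type named by the payload ID
    data: bytes  # type-specific metadata that follows the payload header


def split_payloads(body: bytes) -> List[MetadataPayload]:
    """Split the bytes after the metadata segment header into payloads.

    Assumed layout per payload (illustrative only):
    1-byte payload ID, 2-byte big-endian payload size, then the payload.
    """
    payloads = []
    pos = 0
    while pos < len(body):
        if pos + 3 > len(body):
            raise ValueError("truncated payload header")
        payload_id = body[pos]
        size = int.from_bytes(body[pos + 1:pos + 3], "big")
        start, end = pos + 3, pos + 3 + size
        if end > len(body):
            raise ValueError("payload size exceeds segment length")
        kind = PAYLOAD_TYPES.get(payload_id, f"unknown({payload_id})")
        payloads.append(MetadataPayload(kind, body[start:end]))
        pos = end
    return payloads
```

Carrying the size in the header lets a reader skip payload types it does not understand, which is presumably why the payload configuration value follows the payload ID.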
Typically, the metadata of a payload has one of the following formats:
the metadata of the payload is SSM, including independent substream metadata indicating the number of independent substreams of the program indicated by the bitstream; and dependent substream metadata indicating whether each independent substream of the program has at least one dependent substream associated with it and, if so, the number of dependent substreams associated with each independent substream of the program;
the metadata of the payload is PIM, including active channel metadata indicating which channels of the audio program contain audio information and which channels (if any) contain only silence (typically for the duration of the frame); downmix processing state metadata indicating whether the program was downmixed (before or during encoding) and, if so, the type of downmixing that was applied; upmix processing state metadata indicating whether the program was upmixed (e.g., from a smaller number of channels) before or during encoding and, if so, the type of upmixing that was applied; and preprocessing state metadata indicating whether preprocessing was performed on the audio data of the frame (before encoding of the audio content to generate the encoded bitstream) and, if so, the type of preprocessing that was performed; or
the metadata of the payload is LPSM, having the format indicated in the following table (Table 2):
Table 2
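The SSM and PIM contents listed above can be pictured as plain records. The sketch below is illustrative only: the class and attribute names are hypothetical, and the use of strings for the downmix, upmix, and preprocessing types is an assumption, since the text does not say how these are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubstreamStructure:
    # dependent_counts[i] is the number of dependent substreams associated
    # with independent substream i (0 means it has none).
    dependent_counts: List[int] = field(default_factory=list)

    @property
    def independent_count(self) -> int:
        return len(self.dependent_counts)

    def has_dependents(self, index: int) -> bool:
        return self.dependent_counts[index] > 0


@dataclass
class ProgramInfo:
    active_channels: List[bool]               # True = audio, False = silence
    downmix_type: Optional[str] = None        # None = program not downmixed
    upmix_type: Optional[str] = None          # None = program not upmixed
    preprocessing_type: Optional[str] = None  # None = no preprocessing

    def silent_channels(self) -> List[int]:
        """Indices of channels that carry only silence for this frame."""
        return [i for i, active in enumerate(self.active_channels) if not active]
```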
In another preferred format of an encoded bitstream generated in accordance with the invention, the bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes PIM and/or SSM (and optionally also at least one other type of metadata) is included (e.g., by stage 107 of a preferred implementation of encoder 100) in any of the following: a useless bit segment of a frame of the bitstream; or the "addbsi" field (shown in FIG. 6) of the bitstream information ("BSI") segment of a frame of the bitstream; or an auxiliary data field (e.g., the AUX segment shown in FIG. 4) at the end of a frame of the bitstream. A frame may include one or two metadata segments, each of which includes PIM and/or SSM, and (in some embodiments) if a frame includes two metadata segments, one may be present in the addbsi field of the frame and the other in the AUX field of the frame. Each metadata segment preferably has the format specified above with reference to Table 1 (i.e., it includes the core elements specified in Table 1, followed by a payload ID value (identifying the type of metadata in each payload of the metadata segment) and payload configuration values, and each metadata payload). Each metadata segment that includes LPSM preferably has the format specified above with reference to Tables 1 and 2 (i.e., it includes the core elements specified in Table 1, followed by a payload ID (identifying the metadata as LPSM) and payload configuration values, followed by the payload (LPSM data having the format indicated in Table 2)).
In another preferred format, the encoded bitstream is a Dolby E bitstream, and each metadata segment that includes PIM and/or SSM (and optionally also other metadata) occupies the first N sample positions of the Dolby E guard band interval. A Dolby E bitstream that includes such a metadata segment containing LPSM preferably includes a value indicating the LPSM payload length, signaled in the Pd word of the SMPTE 337M preamble (the SMPTE 337M Pa word repetition rate preferably remains identical to the associated video frame rate).
In a preferred format in which the encoded bitstream is an E-AC-3 bitstream, each metadata segment that includes PIM and/or SSM (and optionally also LPSM and/or other metadata) is included (e.g., by stage 107 of a preferred implementation of encoder 100) as additional bitstream information in a useless bit segment, or in the "addbsi" field of the bitstream information ("BSI") segment, of a frame of the bitstream. Additional aspects of encoding an E-AC-3 bitstream with LPSM in this preferred format are described next:
1. During generation of an E-AC-3 bitstream, while the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active", for every frame (sync frame) generated, the bitstream should include a metadata block (including LPSM) carried in the addbsi field (or useless bit segment) of the frame. The bits required to carry the metadata block should not increase the encoder bit rate (frame length);
2. Every metadata block (containing LPSM) should contain the following information:
loudness correction type flag: "1" indicates that the loudness of the corresponding audio data was corrected upstream of the encoder, and "0" indicates that the loudness was corrected by a loudness corrector embedded in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2);
speech channel: indicates which source channels contain speech (over the previous 0.5 seconds). If no speech is detected, this should be indicated as such;
speech loudness: indicates the integrated speech loudness of each corresponding audio channel that contains speech (over the previous 0.5 seconds);
ITU loudness: indicates the integrated ITU BS.1770-3 loudness of each corresponding audio channel; and
gain: loudness composite gain(s) for reversal in a decoder (to demonstrate reversibility);
3. While the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active" and is receiving an AC-3 frame with a "trust" flag, the loudness controller in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2) should be bypassed. The "trusted" source dialogue normalization and DRC values should be passed through (e.g., by generator 106 of encoder 100) to the E-AC-3 encoder component (e.g., stage 107 of encoder 100). LPSM block generation continues, and the loudness correction type flag is set to "1". The loudness controller bypass sequence must be synchronized to the start of the decoded AC-3 frame in which the "trust" flag appears. The loudness controller bypass sequence should be implemented as follows: the leveler amount control is decremented from a value of 9 to a value of 0 over 10 audio block periods (i.e., 53.3 milliseconds), and the leveler back-end meter control is placed into bypass mode (this operation should result in a seamless transition). The term "trusted" bypass of the leveler implies that the dialogue normalization value of the source bitstream is also reused at the output of the encoder (e.g., if the "trusted" source bitstream has a dialogue normalization value of -30, then the output of the encoder should use -30 as the outbound dialogue normalization value);
4. While the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active" and is receiving an AC-3 frame without the "trust" flag, the loudness controller embedded in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2) should be active. LPSM block generation continues, and the loudness correction type flag is set to "0". The loudness controller activation sequence should be synchronized to the start of the decoded AC-3 frame in which the "trust" flag disappears. The loudness controller activation sequence should be implemented as follows: the leveler amount control is incremented from a value of 0 to a value of 9 over 1 audio block period (e.g., 5.3 milliseconds), and the leveler back-end meter control is placed into "active" mode (this operation should result in a seamless transition and include a back-end meter integration reset); and
5. During encoding, a graphical user interface (GUI) should indicate the following parameters to the user: "Input Audio Program: [Trusted/Untrusted]", whose state is based on the presence of the "trust" flag within the input signal; and "Real-time Loudness Correction: [Enabled/Disabled]", whose state is based on whether the loudness controller embedded in the encoder is active.
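The timings quoted in items 3 and 4 follow from the AC-3 audio block length of 256 samples. The sketch below checks that arithmetic and generates one possible ramp of the leveler amount control. It assumes a 48 kHz sampling rate and a linear ramp rounded to integers, neither of which is stated in the text; the function names are hypothetical.

```python
AUDIO_BLOCK_SAMPLES = 256  # samples per AC-3 audio block
SAMPLE_RATE_HZ = 48_000    # assumed sampling rate


def block_period_ms(blocks: int = 1) -> float:
    """Duration of the given number of audio blocks, in milliseconds."""
    return blocks * AUDIO_BLOCK_SAMPLES / SAMPLE_RATE_HZ * 1000.0


def leveler_ramp(start: int, end: int, blocks: int) -> list:
    """Leveler amount after each audio block, assuming a linear ramp."""
    return [round(start + (end - start) * (i + 1) / blocks)
            for i in range(blocks)]


def loudness_correction_type_flag(trusted: bool) -> int:
    """Flag written to the LPSM block: 1 when the source frame is trusted
    (embedded controller bypassed), 0 when the controller is active."""
    return 1 if trusted else 0


bypass = leveler_ramp(9, 0, blocks=10)     # trusted frame: 9 down to 0
activate = leveler_ramp(0, 9, blocks=1)    # untrusted frame: 0 up to 9
```

With these assumptions, `block_period_ms(10)` is about 53.3 ms and `block_period_ms(1)` is about 5.3 ms, matching the figures in the text.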
When decoding an AC-3 or E-AC-3 bitstream that has LPSM (in the preferred format) included in a useless bit segment or skip field segment, or in the "addbsi" field of the bitstream information ("BSI") segment, of each frame of the bitstream, the decoder should parse the LPSM block data (in the useless bit segment or addbsi field) and pass all of the extracted LPSM values to a graphical user interface (GUI). The set of extracted LPSM values is refreshed every frame.
In another preferred format of an encoded bitstream generated in accordance with the invention, the encoded bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes PIM and/or SSM (and optionally also LPSM and/or other metadata) is included (e.g., by stage 107 of a preferred implementation of encoder 100) in a useless bit segment or AUX segment of a frame of the bitstream, or as additional bitstream information in the "addbsi" field (shown in FIG. 6) of the bitstream information ("BSI") segment. In this format (which is a variation on the format described above with reference to Tables 1 and 2), each addbsi (or AUX or useless bit) field that contains LPSM contains the following LPSM values:
the core elements specified in Table 1, followed by a payload ID (identifying the metadata as LPSM) and payload configuration values, followed by the payload (LPSM data), which has the following format (similar to the mandatory elements shown in Table 2 above):
version of LPSM payload: a 2-bit field indicating the version of the LPSM payload;
dialchan: a 3-bit field indicating whether the left, right, and/or center channels of the corresponding audio data contain spoken dialogue. The bit allocation of the dialchan field may be as follows: bit 0, which indicates the presence of dialogue in the left channel, is stored in the most significant bit of the dialchan field; and bit 2, which indicates the presence of dialogue in the center channel, is stored in the least significant bit of the dialchan field. Each bit of the dialchan field is set to "1" if the corresponding channel contains spoken dialogue during the preceding 0.5 seconds of the program;
loudregtyp: a 4-bit field indicating which loudness regulation standard the program loudness complies with. Setting the loudregtyp field to "0000" indicates that the LPSM does not indicate loudness regulation compliance. For example, one value of this field (e.g., 0000) may indicate that compliance with a loudness regulation standard is not indicated, another value of this field (e.g., 0001) may indicate that the audio data of the program complies with the ATSC A/85 standard, and another value of this field (e.g., 0010) may indicate that the audio data of the program complies with the EBU R128 standard. In this example, if the field is set to any value other than "0000", the loudcorrdialgat and loudcorrtyp fields should follow in the payload;
loudcorrdialgat: a 1-bit field indicating whether dialogue-gated loudness correction has been applied. If the loudness of the program has been corrected using dialogue gating, the value of the loudcorrdialgat field is set to "1". Otherwise it is set to "0";
loudcorrtyp: a 1-bit field indicating the type of loudness correction applied to the program. If the loudness of the program has been corrected with an infinite look-ahead (file-based) loudness correction process, the value of the loudcorrtyp field is set to "0". If the loudness of the program has been corrected using a combination of real-time loudness measurement and dynamic range control, the value of this field is set to "1";
loudrelgate: a 1-bit field indicating whether relative gated loudness data (ITU) exists. If the loudrelgate field is set to "1", a 7-bit ituloudrelgat field should follow in the payload;
loudrelgat: a 7-bit field indicating relative gated program loudness (ITU). This field indicates the integrated loudness of the audio program, measured according to ITU-R BS.1770-3, without any gain adjustments due to dialogue normalization and dynamic range compression (DRC) being applied. Values of 0 to 127 are interpreted as -58 LKFS to +5.5 LKFS, in 0.5 LKFS steps;
loudspchgate: a 1-bit field indicating whether speech-gated loudness data (ITU) exists. If the loudspchgate field is set to "1", a 7-bit loudspchgat field should follow in the payload;
loudspchgat: a 7-bit field indicating speech-gated program loudness. This field indicates the integrated loudness of the entire corresponding audio program, measured according to formula (2) of ITU-R BS.1770-3, without any gain adjustments due to dialogue normalization and dynamic range compression being applied. Values of 0 to 127 are interpreted as -58 LKFS to +5.5 LKFS, in 0.5 LKFS steps;
loudstrm3e: a 1-bit field indicating whether short-term (3-second) loudness data exists. If the field is set to "1", a 7-bit loudstrm3s field should follow in the payload;
loudstrm3s: a 7-bit field indicating the ungated loudness of the preceding 3 seconds of the corresponding audio program, measured according to ITU-R BS.1771-1, without any gain adjustments due to dialogue normalization and dynamic range compression being applied. Values of 0 to 256 are interpreted as -116 LKFS to +11.5 LKFS, in 0.5 LKFS steps;
truepke: a 1-bit field indicating whether true peak loudness data exists. If the truepke field is set to "1", an 8-bit truepk field should follow in the payload; and
truepk: an 8-bit field indicating the true peak sample value of the program, measured according to Annex 2 of ITU-R BS.1770-3, without any gain adjustments due to dialogue normalization and dynamic range compression being applied. Values of 0 to 255 are interpreted as -116 LKFS to +11.5 LKFS, in 0.5 LKFS steps.
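The field list above reads naturally as a bit-serial parse in which each "exists" flag gates the value that follows it. The sketch below is one hedged reading, not a definitive decoder: it assumes the fields appear in exactly the order listed, MSB first, with no padding between them, and that the right channel occupies the middle bit of dialchan (the text specifies only the left and center positions). `BitReader`, `lkfs`, and `parse_lpsm` are hypothetical names. The short-term and true-peak values are returned as raw codes.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        if self.pos + nbits > len(self.data) * 8:
            raise ValueError("not enough bits")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def lkfs(code: int) -> float:
    """Map a 7-bit code (0..127) to -58.0..+5.5 LKFS in 0.5 LKFS steps."""
    return -58.0 + 0.5 * code


def parse_lpsm(data: bytes) -> dict:
    r = BitReader(data)
    out = {"version": r.read(2)}
    dialchan = r.read(3)
    out["dialogue"] = {
        "left": bool(dialchan & 0b100),    # most significant bit
        "right": bool(dialchan & 0b010),   # assumed middle bit
        "center": bool(dialchan & 0b001),  # least significant bit
    }
    out["loudregtyp"] = r.read(4)
    if out["loudregtyp"] != 0:             # correction fields follow
        out["loudcorrdialgat"] = r.read(1)
        out["loudcorrtyp"] = r.read(1)
    if r.read(1):                          # loudrelgate
        out["loudrelgat_lkfs"] = lkfs(r.read(7))
    if r.read(1):                          # loudspchgate
        out["loudspchgat_lkfs"] = lkfs(r.read(7))
    if r.read(1):                          # loudstrm3e
        out["loudstrm3s_code"] = r.read(7)
    if r.read(1):                          # truepke
        out["truepk_code"] = r.read(8)
    return out
```

The 7-bit mapping can be checked against the stated range: code 0 gives -58 LKFS and code 127 gives -58 + 63.5 = +5.5 LKFS.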
In some embodiments, the core elements of a metadata segment in a useless bit segment or auxiliary data (or "addbsi") field of a frame of an AC-3 bitstream or an E-AC-3 bitstream include a metadata segment header (typically including identification values, e.g., a version), and after the metadata segment header: values indicating whether fingerprint data (or other protection values) are included for the metadata of the metadata segment, values indicating whether external data (related to the audio data corresponding to the metadata of the metadata segment) exists, a payload ID value and payload configuration values for each type of metadata (e.g., PIM and/or SSM and/or LPSM and/or metadata of another type) identified by the core elements, and protection values for at least one type of metadata identified by the metadata segment header (or other core elements of the metadata segment). The metadata payloads of the metadata segment follow the metadata segment header, and are (in some cases) nested within core elements of the metadata segment.
Embodiments of the present invention may be implemented in hardware, firmware, or software, or a combination of these (e.g., as a programmable logic array). Unless otherwise specified, the algorithms or processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems (e.g., an implementation of any of the elements of FIG. 1, or encoder 100 of FIG. 2 (or an element thereof), or the decoder of FIG. 3 (or an element thereof), or the post-processor of FIG. 3 (or an element thereof)), each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices in a known manner.
Each such program may be implemented in any desired computer language (including machine, assembly, or high-level procedural, logical, or object-oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or an interpreted language.
For example, when implemented by computer software instruction sequences, the various functions and steps of embodiments of the invention may be implemented by multithreaded software instruction sequences running in suitable digital signal processing hardware, in which case the various devices, steps, and functions of the embodiments may correspond to portions of the software instructions.
Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid-state memory or media, or magnetic or optical media) readable by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be implemented as a computer-readable storage medium configured with (i.e., storing) a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
A number of embodiments of the invention have been described. Nevertheless, it should be understood that various modifications may be made without departing from the spirit and scope of the invention. Numerous modifications and variations of the present invention are possible in light of the above teachings. It is to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.
In addition, the present invention also includes the following embodiments:
(1) An audio processing unit, comprising:
a buffer memory; and
at least one processing subsystem coupled to the buffer memory, wherein the buffer memory stores at least one frame of an encoded audio bitstream, the frame including program information metadata or substream structure metadata in at least one metadata segment of at least one skip field of the frame and audio data in at least one other segment of the frame, wherein the processing subsystem is coupled and configured to perform at least one of generation of the bitstream, decoding of the bitstream, or adaptive processing of audio data of the bitstream using metadata of the bitstream, or at least one of authentication or validation of at least one of audio data or metadata of the bitstream using metadata of the bitstream,
wherein the metadata segment includes at least one metadata payload, the metadata payload comprising:
a header; and
after the header, at least a portion of the program information metadata or at least a portion of the substream structure metadata.
(2) The audio processing unit according to (1), wherein the encoded audio bitstream is indicative of at least one audio program, and the metadata segment includes a program information metadata payload, the program information metadata payload comprising:
a program information metadata header; and
after the program information metadata header, program information metadata indicative of at least one property or characteristic of audio content of the program, the program information metadata including active channel metadata indicative of each non-silent channel and each silent channel of the program.
(3) The audio processing unit according to (2), wherein the program information metadata also includes at least one of the following:
downmix processing state metadata indicating whether the program was downmixed and, if so, the type of downmixing that was applied to the program;
upmix processing state metadata indicating whether the program was upmixed and, if so, the type of upmixing that was applied to the program;
preprocessing state metadata indicating whether preprocessing was performed on audio content of the frame and, if so, the type of preprocessing that was performed on the audio content; or
spectral extension processing or channel coupling metadata indicating whether spectral extension processing or channel coupling was applied to the program and, if so, the frequency range to which the spectral extension or channel coupling was applied.
(4)根据(1)所述的音频处理单元,其中,所述编码音频比特流指示具有音频内容的至少一个独立子流的至少一个音频节目,而所述元数据段包括子流结构元数据有效载荷,所述子流结构元数据有效载荷包括:(4) The audio processing unit of (1), wherein the coded audio bitstream indicates at least one audio program having at least one independent substream of audio content, and the metadata segment includes a substream structure metadata payload, the substream structure metadata payload including:
子流结构元数据有效载荷报头;以及Substream structure metadata payload header; and
在所述子流结构元数据有效载荷报头之后的,指示所述节目的独立子流的数量的独立子流元数据,以及指示所述节目的每个独立子流是否具有至少一个相关联的从属子流的从属子流元数据。Following the substream structure metadata payload header are independent substream metadata indicating the number of independent substreams of the program and dependent substream metadata indicating whether each independent substream of the program has at least one associated dependent substream.
(5)根据(1)所述的音频处理单元,其中,所述元数据段包括:(5) The audio processing unit according to (1), wherein the metadata segment includes:
元数据段报头;metadata segment header;
在所述元数据段报头之后的至少一个保护值,其用于所述节目信息元数据、或所述子流结构元数据、或与所述节目信息元数据或所述子流结构元数据相对应的所述音频数据中至少之一的解密、认证或验证中的至少一种;以及at least one protection value following the metadata segment header, used for at least one of decryption, authentication, or verification of at least one of the program information metadata, or the substream structure metadata, or the audio data corresponding to the program information metadata or the substream structure metadata; and
在所述元数据段报头之后的元数据有效载荷标识值和有效载荷配置值,其中所述元数据有效载荷在所述元数据有效载荷标识值和所述有效载荷配置值之后。A metadata payload identification value and a payload configuration value follow the metadata segment header, wherein the metadata payload follows the metadata payload identification value and the payload configuration value.
(6)根据(5)所述的音频处理单元,其中,所述元数据段报头包括标识所述元数据段的开始的同步字、以及在所述同步字之后的至少一个标识值,并且所述元数据有效载荷的所述报头包括至少一个标识值。(6) The audio processing unit of (5), wherein the metadata segment header includes a synchronization word identifying the start of the metadata segment and at least one identification value following the synchronization word, and the header of the metadata payload includes at least one identification value.
(7)根据(1)所述的音频处理单元,其中,所述编码音频比特流为AC-3比特流或E-AC-3比特流。(7) The audio processing unit according to (1), wherein the encoded audio bitstream is an AC-3 bitstream or an E-AC-3 bitstream.
(8)根据(1)所述的音频处理单元,其中,所述缓冲存储器以非暂态方式存储所述帧。(8) The audio processing unit according to (1), wherein the buffer memory stores the frames in a non-transitory manner.
(9)根据(1)所述的音频处理单元,其中,所述音频处理单元为编码器。(9) The audio processing unit according to (1), wherein the audio processing unit is an encoder.
(10) The audio processing unit according to (9), wherein the processing subsystem includes:
a decoding subsystem configured to receive an input audio bitstream and to extract input metadata and input audio data from the input audio bitstream;
an adaptive processing subsystem coupled and configured to perform adaptive processing on the input audio data using the input metadata, thereby generating processed audio data; and
an encoding subsystem coupled and configured to generate the encoded audio bitstream in response to the processed audio data, including by including the program information metadata or the substream structure metadata in the encoded audio bitstream, and to provide the encoded audio bitstream to the buffer memory.
(11) The audio processing unit according to (1), wherein the audio processing unit is a decoder.
(12) The audio processing unit according to (11), wherein the processing subsystem is a decoding subsystem coupled to the buffer memory and configured to extract the program information metadata or the substream structure metadata from the encoded audio bitstream.
(13) The audio processing unit according to (1), comprising:
a subsystem coupled to the buffer memory and configured to extract the program information metadata or the substream structure metadata from the encoded audio bitstream, and to extract the audio data from the encoded audio bitstream; and
a post-processor coupled to the subsystem and configured to perform adaptive processing on the audio data using at least one of the program information metadata or the substream structure metadata extracted from the encoded audio bitstream.
(14) The audio processing unit according to (1), wherein the audio processing unit is a digital signal processor.
(15) The audio processing unit according to (1), wherein the audio processing unit is a preprocessor configured to extract the program information metadata or the substream structure metadata, and the audio data, from the encoded audio bitstream, and to perform adaptive processing on the audio data using at least one of the program information metadata or the substream structure metadata extracted from the encoded audio bitstream.
(16) A method for decoding an encoded audio bitstream, the method comprising the steps of:
receiving an encoded audio bitstream; and
extracting metadata and audio data from the encoded audio bitstream, wherein the metadata is or includes program information metadata and substream structure metadata,
wherein the encoded audio bitstream includes a sequence of frames and indicates at least one audio program, the program information metadata and the substream structure metadata indicate the program, each of the frames includes at least one audio data segment, each audio data segment includes at least a portion of the audio data, each frame in at least a subset of the frames includes a metadata segment, and each metadata segment includes at least a portion of the program information metadata and at least a portion of the substream structure metadata.
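The extraction step of (16) can be sketched as follows, purely for illustration. The sketch assumes frames have already been demultiplexed into Python objects; the class names `Frame` and `MetadataSegment`, the function `extract`, and the representation of metadata portions as byte strings are all hypothetical and are not taken from any bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MetadataSegment:
    program_info: bytes         # a portion of the program information metadata
    substream_structure: bytes  # a portion of the substream structure metadata


@dataclass
class Frame:
    audio_segments: List[bytes]                         # at least one per frame
    metadata_segment: Optional[MetadataSegment] = None  # present in a subset of frames


def extract(frames: List[Frame]) -> Tuple[bytes, bytes, bytes]:
    """Return (audio data, program information metadata, substream structure metadata)."""
    audio = bytearray()
    program_info = bytearray()
    substream_structure = bytearray()
    for frame in frames:
        if not frame.audio_segments:
            raise ValueError("each frame must include at least one audio data segment")
        for segment in frame.audio_segments:
            audio += segment
        # Only some frames carry a metadata segment; skip those that do not.
        if frame.metadata_segment is not None:
            program_info += frame.metadata_segment.program_info
            substream_structure += frame.metadata_segment.substream_structure
    return bytes(audio), bytes(program_info), bytes(substream_structure)
```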
(17) The method according to (16), wherein the metadata segment includes a program information metadata payload, the program information metadata payload including:
a program information metadata header; and
following the program information metadata header, program information metadata indicating at least one attribute or characteristic of the audio content of the program, the program information metadata including active channel metadata indicating each non-silent channel and each silent channel of the program.
(18) The method according to (17), wherein the program information metadata further includes at least one of the following:
downmix processing state metadata indicating whether the program was downmixed and, if so, the type of downmixing applied to the program;
upmix processing state metadata indicating whether the program was upmixed and, if so, the type of upmixing applied to the program; or
preprocessing state metadata indicating whether preprocessing was performed on the audio content of the frame and, if so, the type of preprocessing performed on the audio content.
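As a rough illustration of the fields recited in (17) and (18), the program information metadata could be held in a structure such as the one below. The bitmask encoding of active channels (bit i set meaning channel i is non-silent), the use of `None` to mean "processing not applied", and all names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProgramInfoMetadata:
    channel_count: int
    active_channel_mask: int                  # assumed: bit i set => channel i is non-silent
    downmix_type: Optional[str] = None        # None => program was not downmixed
    upmix_type: Optional[str] = None          # None => program was not upmixed
    preprocessing_type: Optional[str] = None  # None => no preprocessing was performed

    def _is_active(self, channel: int) -> bool:
        return bool((self.active_channel_mask >> channel) & 1)

    def non_silent_channels(self) -> List[int]:
        return [c for c in range(self.channel_count) if self._is_active(c)]

    def silent_channels(self) -> List[int]:
        return [c for c in range(self.channel_count) if not self._is_active(c)]
```

With six channels and a mask of `0b000111`, channels 0 to 2 would be reported as non-silent and channels 3 to 5 as silent.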
(19) The method according to (16), wherein the encoded audio bitstream indicates at least one audio program having at least one independent substream of audio content, and the metadata segment includes a substream structure metadata payload, the substream structure metadata payload including:
a substream structure metadata payload header; and
following the substream structure metadata payload header, independent substream metadata indicating the number of independent substreams of the program, and dependent substream metadata indicating whether each independent substream of the program has at least one associated dependent substream.
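The substream structure metadata of (19) can likewise be sketched, under a hypothetical encoding of the bytes that follow the payload header: one byte holding the number of independent substreams, then one flag bit per independent substream (least significant bit first) indicating whether it has at least one associated dependent substream. This encoding and the name `parse_substream_structure` are assumptions, not the format defined by the embodiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubstreamStructure:
    independent_count: int
    has_dependent: List[bool]  # one entry per independent substream


def parse_substream_structure(body: bytes) -> SubstreamStructure:
    """Parse the (assumed) bytes that follow the payload header."""
    if not body:
        raise ValueError("empty substream structure metadata")
    count = body[0]
    flag_bytes = body[1:1 + (count + 7) // 8]
    if len(flag_bytes) * 8 < count:
        raise ValueError("truncated dependent substream flags")
    # Bit i of the flag field tells whether independent substream i has dependents.
    flags = [bool((flag_bytes[i // 8] >> (i % 8)) & 1) for i in range(count)]
    return SubstreamStructure(count, flags)
```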
(20) The method according to (16), wherein the metadata segment includes:
a metadata segment header;
at least one protection value, following the metadata segment header, for at least one of decryption, authentication, or verification of at least one of the program information metadata, the substream structure metadata, or the audio data corresponding to the program information metadata and the substream structure metadata; and
a metadata payload, following the metadata segment header, that includes the at least a portion of the program information metadata and the at least a portion of the substream structure metadata.
(21) The method according to (16), wherein the encoded audio bitstream is an AC-3 bitstream or an E-AC-3 bitstream.
(22) The method according to (16), further comprising the step of:
performing adaptive processing on the audio data using at least one of the program information metadata or the substream structure metadata extracted from the encoded audio bitstream.
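One possible form of the adaptive processing in (22), offered only as a hypothetical example, is a post-processing step that consults the downmix processing state before mixing. The embodiments do not prescribe this behavior; the function name `adaptive_downmix`, the equal-weight mono mix, and the use of `None` for "not downmixed" are assumptions.

```python
from typing import List, Optional


def adaptive_downmix(channels: List[List[float]],
                     downmix_type: Optional[str]) -> List[List[float]]:
    """Mix to mono unless the metadata says the program was already downmixed."""
    if downmix_type is not None or len(channels) <= 1:
        return channels  # already downmixed, or nothing to mix: pass through unchanged
    n = len(channels)
    # Equal-weight average across channels, sample by sample.
    mono = [sum(samples) / n for samples in zip(*channels)]
    return [mono]
```

The point of the sketch is that the decision is driven by metadata extracted from the bitstream rather than by analysis of the audio itself.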
Claims (20)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US61/836,865 | 2013-06-19 | | |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| HK1232011A1 (en) | 2017-12-29 |
| HK1232011A (en) | 2017-12-29 |
| HK1232011B (en) | 2020-07-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7775528B1 (en) | | Audio encoder and decoder with program information or substream structure metadata |
| HK1232011B (en) | | Audio processing unit and method for decoding encoded audio bitstream |
| HK1232012B (en) | | Audio processing unit and audio decoding method |
| HK40017633B (en) | | Audio processing unit and method for decoding an encoded audio bitstream |
| HK40017428B (en) | | Audio processing unit, method performed by an audio processing unit and storage medium |
| HK40017559B (en) | | Audio processing unit, audio decoding method and storage medium |
| HK40014483B (en) | | Audio processing unit, and method for decoding an encoded audio bitstream |
| HK40017428A (en) | | Audio processing unit, method performed by an audio processing unit and storage medium |
| HK40020068A (en) | | Audio processing unit, method performed by an audio processing unit and storage medium |
| HK40017633A (en) | | Audio processing unit and method for decoding an encoded audio bitstream |
| HK40014483A (en) | | Audio processing unit, and method for decoding an encoded audio bitstream |
| HK40017559A (en) | | Audio processing unit, audio decoding method and storage medium |
| HK1204135B (en) | | Audio encoder and decoder with program information or substream structure metadata |
| HK1232011A1 (en) | | Audio processing unit and method for decoding encoded audio bitstream |
| HK1232011A (en) | | Audio processing unit and method for decoding encoded audio bitstream |
| HK1232012A (en) | | Audio processing unit and audio decoding method |
| HK1232012A1 (en) | | Audio processing unit and audio decoding method |
| HK1214883B (en) | | Audio encoder and decoder with program information or substream structure metadata |