HK40017428B

HK40017428B - Audio processing unit, method performed by an audio processing unit and storage medium

Info

Publication number: HK40017428B
Application number: HK42020007264.3A
Authority: HK
Inventors: Jeffrey Riedmiller; Michael Ward
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2013-06-19
Filing date: 2020-05-12
Publication date: 2024-11-22

Description

Audio processing unit, method executed by audio processing unit, and storage medium

This application is a divisional of the invention patent application No. 201310329128.8, filed on July 31, 2013, and entitled "Audio Encoder and Decoder Using Program Information or Substream Structure Metadata".

Technical Field

This invention relates to audio signal processing, and more particularly, to encoding and decoding of audio data bitstreams with metadata indicative of substream structure and/or program information regarding the audio content indicated by the bitstreams. Some embodiments of the invention generate or decode audio data in one of the formats known as Dolby Digital (AC-3), Dolby Digital Plus (Enhanced AC-3 or E-AC-3), or Dolby E.

Background

Dolby, Dolby Digital, Dolby Digital Plus, and Dolby E are trademarks of Dolby Laboratories Licensing Corporation. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively.

Audio data processing units typically operate in a blind fashion and pay no attention to the processing history of audio data that occurs before the data is received. This may work in a processing framework in which a single entity does all the audio data processing and encoding for a variety of target media rendering devices while a target media rendering device does all the decoding and rendering of the encoded audio data. However, this blind processing does not work well (or at all) in situations where a plurality of audio processing units are scattered across a diverse network or are placed in tandem (i.e., in a chain) and are expected to optimally perform their respective types of audio processing. For example, some audio data may be encoded for high-performance media systems and may have to be converted to a reduced form suitable for a mobile device along a media processing chain. Accordingly, an audio processing unit may unnecessarily perform a type of processing on the audio data that has already been performed. For instance, a volume leveling unit may process an input audio clip irrespective of whether the same or similar volume leveling has already been performed on that clip. As a result, the volume leveling unit may perform leveling even when it is not necessary. This unnecessary processing may also cause degradation and/or removal of specific features when the content of the audio data is rendered.

Summary of the Invention

The invention discloses an audio processing unit, comprising: one or more processors; and memory coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving an encoded audio bitstream comprising an audio program, the encoded audio bitstream comprising encoded audio data of a set of one or more audio channels and metadata associated with the set of audio channels, wherein the metadata includes loudness processing status metadata, and wherein the loudness processing status metadata includes metadata indicating the loudness of the audio program; decoding the encoded audio data to obtain decoded audio data of the set of audio channels; obtaining the loudness processing status metadata from the metadata of the encoded audio bitstream; and performing adaptive loudness processing on the decoded audio data of the set of audio channels based on the loudness processing status metadata; wherein the metadata further includes program information metadata indicating a compression profile used to create dynamic range compression (DRC) data in the bitstream, and wherein the compression profile is a music standard compression profile.

The invention also discloses a method performed by an audio processing unit, comprising: receiving an encoded audio bitstream comprising an audio program, the encoded audio bitstream comprising encoded audio data of a set of one or more audio channels and metadata associated with the set of audio channels, wherein the metadata includes loudness processing status metadata, and wherein the loudness processing status metadata includes metadata indicating the loudness of the audio program; decoding the encoded audio data to obtain decoded audio data of the set of audio channels; obtaining the loudness processing status metadata from the metadata of the encoded audio bitstream; and performing adaptive loudness processing on the decoded audio data of the set of audio channels based on the loudness processing status metadata; wherein the metadata further includes program information metadata indicating a compression profile used to create dynamic range compression (DRC) data in the bitstream, and wherein the compression profile is a music standard compression profile.

The invention further discloses a non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving an encoded audio bitstream comprising an audio program, the encoded audio bitstream comprising encoded audio data of a set of one or more audio channels and metadata associated with the set of audio channels, wherein the metadata includes loudness processing status metadata, and wherein the loudness processing status metadata includes metadata indicating the loudness of the audio program; decoding the encoded audio data to obtain decoded audio data of the set of audio channels; obtaining the loudness processing status metadata from the metadata of the encoded audio bitstream; and performing adaptive loudness processing on the decoded audio data of the set of audio channels based on the loudness processing status metadata; wherein the metadata further includes program information metadata indicating a compression profile used to create dynamic range compression (DRC) data in the bitstream, and wherein the compression profile is a music standard compression profile.
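The following Python sketch illustrates the operations recited above. It is only an illustration. The field names (`program_loudness_db`, `loudness_corrected`), the -24 dB default target, and the single broadband gain are assumptions made for this sketch; they are not the claimed implementation or Dolby's. The sketch also assumes that decoding has already produced per-channel PCM. The program information metadata, including its compression profile, travels with the LPSM but is not acted on here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CompressionProfile(Enum):
    # Hypothetical names; the claims above only mention the music standard profile.
    FILM_STANDARD = "film standard"
    MUSIC_STANDARD = "music standard"


@dataclass
class LoudnessProcessingStatus:
    program_loudness_db: float  # hypothetical field: loudness of the audio program
    loudness_corrected: bool    # hypothetical flag: loudness already corrected upstream


@dataclass
class ProgramInfo:
    drc_profile: CompressionProfile  # profile used to create the DRC data in the bitstream


@dataclass
class StreamMetadata:
    lpsm: LoudnessProcessingStatus
    pim: ProgramInfo


def adaptive_loudness(pcm: List[List[float]], meta: StreamMetadata,
                      target_db: float = -24.0) -> List[List[float]]:
    """Apply one broadband gain to every channel so that the program loudness
    reported in the LPSM reaches target_db, unless the LPSM says the loudness
    has already been corrected (in which case the audio passes through)."""
    if meta.lpsm.loudness_corrected:
        return pcm  # skip redundant processing
    gain_db = target_db - meta.lpsm.program_loudness_db
    gain = 10.0 ** (gain_db / 20.0)
    return [[sample * gain for sample in channel] for channel in pcm]


meta = StreamMetadata(LoudnessProcessingStatus(-18.0, False),
                      ProgramInfo(CompressionProfile.MUSIC_STANDARD))
out = adaptive_loudness([[0.5, -0.5]], meta)  # -18 dB -> -24 dB: 6 dB of attenuation
```

Checking the status flag before processing is what lets a unit downstream in a chain avoid redoing loudness correction that has already been applied.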

In a class of embodiments, the invention is an audio processing unit capable of decoding an encoded bitstream that includes substream structure metadata and/or program information metadata (and optionally also other metadata, e.g., loudness processing status metadata) in at least one segment of at least one frame of the bitstream, and audio data in at least one other segment of the frame. Herein, substream structure metadata (or "SSM") denotes metadata of an encoded bitstream (or set of encoded bitstreams) indicative of the substream structure of the audio content of the encoded bitstream(s). "Program information metadata" (or "PIM") denotes metadata of an encoded audio bitstream indicative of at least one audio program (e.g., two or more audio programs), where the program information metadata is indicative of at least one property or characteristic of the audio content of at least one said program (e.g., metadata indicating a type or parameter of processing performed on audio data of the program, or metadata indicating which channels of the program are active channels).

In typical cases (e.g., in which the encoded bitstream is an AC-3 or E-AC-3 bitstream), the program information metadata (PIM) indicates program information that cannot practically be carried in other portions of the bitstream. For example, the PIM may indicate the processing applied to PCM audio prior to encoding (e.g., AC-3 or E-AC-3 encoding), which frequency bands of the audio program have been encoded using specific audio coding techniques, and the compression profile used to create dynamic range compression (DRC) data in the bitstream.

In another class of embodiments, a method includes a step of multiplexing encoded audio data with SSM and/or PIM in each frame (or each of at least some frames) of the bitstream. In typical decoding, a decoder extracts the SSM and/or PIM from the bitstream (including by parsing and demultiplexing the SSM and/or PIM and the audio data) and processes the audio data to generate a stream of decoded audio data (and in some cases also performs adaptive processing of the audio data). In some embodiments, the decoded audio data and the SSM and/or PIM are forwarded from the decoder to a post-processor configured to perform adaptive processing on the decoded audio data using the SSM and/or PIM.
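The per-frame multiplexing and the decoder-side demultiplexing can be modeled in a few lines of Python. This is an abstract sketch. Segments are Python objects rather than bit fields, and `decode_block` is a hypothetical placeholder rather than an AC-3 decoder. The `post_process` callable stands in for whatever post-processor consumes the forwarded SSM and/or PIM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class AudioSegment:
    payload: bytes  # encoded audio data, treated as opaque here


@dataclass
class MetadataSegment:
    kind: str       # e.g. "SSM" or "PIM"
    payload: bytes


Segment = Union[AudioSegment, MetadataSegment]


def multiplex(audio_blocks: List[bytes], metadata: Dict[str, bytes]) -> List[Segment]:
    """Build one frame that carries both audio data segments and metadata segments."""
    frame: List[Segment] = [AudioSegment(block) for block in audio_blocks]
    frame += [MetadataSegment(kind, data) for kind, data in metadata.items()]
    return frame


def demultiplex(frame: List[Segment]) -> Tuple[List[bytes], Dict[str, bytes]]:
    """Separate a frame back into its audio data and its metadata, keyed by kind."""
    audio = [s.payload for s in frame if isinstance(s, AudioSegment)]
    meta = {s.kind: s.payload for s in frame if isinstance(s, MetadataSegment)}
    return audio, meta


def decode_block(block: bytes) -> List[float]:
    # Hypothetical placeholder "decoder": maps each byte to a value in [0, 1).
    return [b / 256.0 for b in block]


def decode_stream(frames: List[List[Segment]],
                  post_process: Callable[[List[List[float]], Dict[str, bytes]], object]) -> list:
    """Demultiplex and decode each frame, then forward decoded audio plus metadata."""
    results = []
    for frame in frames:
        audio, meta = demultiplex(frame)
        decoded = [decode_block(block) for block in audio]
        results.append(post_process(decoded, meta))
    return results
```

Keeping metadata in segments separate from the audio means the post-processor receives the SSM and/or PIM unchanged alongside the decoded audio.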

In a class of embodiments, the inventive encoding method generates an encoded audio bitstream (e.g., an AC-3 or E-AC-3 bitstream). The bitstream includes audio data segments (e.g., all or some of segments AB0 to AB5 of the frame shown in Figure 4 or Figure 7) containing encoded audio data. It also includes metadata segments (containing SSM and/or PIM, and optionally other metadata) that are time-division multiplexed with the audio data segments. In some embodiments, each metadata segment (sometimes referred to herein as a "container") has a format that includes a metadata segment header (and optionally other mandatory or "core" elements), followed by one or more metadata payloads. SSM, if present, is included in one of the metadata payloads (identified by a payload header, and typically having a format of a first type). PIM, if present, is included in another of the metadata payloads (identified by a payload header, and typically having a format of a second type). Similarly, each other type of metadata (if present) is included in another of the metadata payloads (identified by a payload header, and typically having a format specific to that type of metadata).
The exemplary format allows convenient access to the SSM, PIM, or other metadata at times other than during decoding of the bitstream (e.g., by a post-processor following decoding, or by a processor configured to recognize the metadata without fully decoding the encoded bitstream). It also allows convenient and efficient error detection and correction during decoding of the bitstream (e.g., of substream identification). For example, without access to SSM in the exemplary format, a decoder might misidentify the number of substreams associated with a program. One metadata payload in a metadata segment may include SSM, another metadata payload in the metadata segment may include PIM, and optionally, at least one other metadata payload in the metadata segment may include other metadata (e.g., loudness processing status metadata or "LPSM").
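The header-plus-identified-payloads layout can be sketched with Python's `struct` module. Every concrete detail in this sketch is hypothetical:

- the sync word value, the field widths, the payload ID numbers, and the end marker;
- the use of CRC-32 (via `zlib.crc32`) in place of whatever protection bits a real container carries.

The point of the sketch is that a size field on each payload lets a reader skip payload types it does not recognize.

```python
import struct
import zlib
from typing import Dict

SYNC = 0x5838                                  # hypothetical container sync word
PAYLOAD_IDS = {"SSM": 1, "PIM": 2, "LPSM": 3}  # hypothetical payload IDs
END = 0                                        # hypothetical end-of-payloads marker


def build_container(payloads: Dict[str, bytes], version: int = 1, key_id: int = 0) -> bytes:
    # Header: sync word, version, key ID.
    body = struct.pack(">HBB", SYNC, version, key_id)
    for name, data in payloads.items():
        # Payload header: ID and size, then the payload bytes.
        body += struct.pack(">BH", PAYLOAD_IDS[name], len(data)) + data
    body += struct.pack(">B", END)
    # Stand-in "protection bits": a CRC-32 over everything before it.
    return body + struct.pack(">I", zlib.crc32(body))


def parse_container(blob: bytes) -> Dict[str, bytes]:
    body, (crc,) = blob[:-4], struct.unpack(">I", blob[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("protection check failed")
    sync, _version, _key_id = struct.unpack_from(">HBB", body, 0)
    if sync != SYNC:
        raise ValueError("container sync word not found")
    names = {v: k for k, v in PAYLOAD_IDS.items()}
    pos, out = 4, {}
    while True:
        (pid,) = struct.unpack_from(">B", body, pos)
        pos += 1
        if pid == END:
            return out
        (size,) = struct.unpack_from(">H", body, pos)
        pos += 2
        # Unknown IDs are still skippable thanks to the size field.
        out[names.get(pid, f"unknown-{pid}")] = body[pos:pos + size]
        pos += size
```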

Brief Description of the Drawings

Figure 1 is a block diagram of an embodiment of a system that may be configured to perform an embodiment of the inventive method.

Figure 2 is a block diagram of an encoder that is an embodiment of the inventive audio processing unit.

Figure 3 is a block diagram of a decoder that is an embodiment of the inventive audio processing unit, and of a post-processor coupled to the decoder that is another embodiment of the inventive audio processing unit.

Figure 4 is a diagram of an AC-3 frame, including the segments into which it is divided.

Figure 5 is a diagram of the Synchronization Information (SI) segment of an AC-3 frame, including the segments into which it is divided.

Figure 6 is a diagram of the Bitstream Information (BSI) segment of an AC-3 frame, including the segments into which it is divided.

Figure 7 is a diagram of an E-AC-3 frame, including the segments into which it is divided.

Figure 8 is a diagram of a metadata segment of an encoded bitstream generated in accordance with an embodiment of the invention. The segment includes a metadata segment header comprising a container sync word (identified as "container sync" in Figure 8) and version and key ID values, followed by multiple metadata payloads and protection bits.

Notation and Nomenclature

Throughout this disclosure, including in the claims, the expression performing an operation "on" a signal or data is used in a broad sense. Such operations include filtering, scaling, transforming, or applying gain to the signal or data. The expression denotes performing the operation either directly on the signal or data, or on a processed version of it (e.g., on a version of the signal that has undergone preliminary filtering or pre-processing before the operation is performed).

Throughout this disclosure, including in the claims, the expression "system" is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system. A system including such a subsystem may also be referred to as a decoder system. An example is a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X - M inputs are received from an external source.

Throughout this disclosure, including in the claims, the term "processor" is used in a broad sense to denote a system or device that is programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (e.g., audio data, or video or other image data). Examples of processors include a field-programmable gate array (or other configurable integrated circuit or chipset), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general-purpose processor or computer, and a programmable microprocessor chip or chipset.

Throughout this disclosure, including in the claims, the expressions "audio processor" and "audio processing unit" are used interchangeably, and in a broad sense, to denote a system configured to process audio data. Examples of audio processing units include, but are not limited to, encoders (e.g., transcoders), decoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools).

Throughout this disclosure, including in the claims, the expression "metadata" (of an encoded audio bitstream) refers to data that is separate and different from the corresponding audio data of the bitstream.

Throughout this disclosure, including in the claims, the expression "substream structure metadata" (or "SSM") denotes metadata of an encoded audio bitstream (or set of encoded audio bitstreams) indicative of the substream structure of the audio content of the encoded bitstream(s).

Throughout this disclosure, including in the claims, the expression "program information metadata" (or "PIM") denotes metadata of an encoded audio bitstream indicative of at least one audio program (e.g., two or more audio programs), where said metadata is indicative of at least one property or characteristic of the audio content of at least one said program (e.g., metadata indicating a type or parameter of processing performed on the audio data of the program, or metadata indicating which channels of the program are active channels).

Throughout this disclosure, including in the claims, the expression "processing status metadata" (e.g., as in the expression "loudness processing status metadata") refers to metadata (of an encoded audio bitstream) that has three characteristics. It is associated with audio data of the bitstream. It indicates the processing status of the corresponding (associated) audio data (e.g., what type(s) of processing have already been performed on the audio data). It typically also indicates at least one feature or characteristic of the audio data. The association of the processing status metadata with the audio data is time-synchronous. Thus, the present (most recently received or updated) processing status metadata indicates that the corresponding audio data contemporaneously comprises the results of the indicated type(s) of audio data processing. In some cases, processing status metadata may include processing history and/or some or all of the parameters used in and/or derived from the indicated types of processing. Additionally, processing status metadata may include at least one feature or characteristic of the corresponding audio data that has been computed or extracted from the audio data. Processing status metadata may also include other metadata that is not related to or derived from any processing of the corresponding audio data. For example, a particular audio processing unit may add third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, and so on, to pass on to other audio processing units.

Throughout this disclosure, including in the claims, the expression "loudness processing status metadata" (or "LPSM") denotes processing status metadata that indicates two things. First, it indicates the loudness processing status of the corresponding audio data (e.g., what type(s) of loudness processing have been performed on the audio data). Second, it typically also indicates at least one feature or characteristic (e.g., loudness) of the corresponding audio data. Loudness processing status metadata may include data (e.g., other metadata) that is not, when considered alone, loudness processing status metadata.

Throughout this disclosure, including in the claims, the expression "channel" (or "audio channel") denotes a monophonic audio signal.

Throughout this disclosure, including in the claims, the expression "audio program" denotes a set of one or more audio channels and optionally also associated metadata (e.g., metadata that describes a desired spatial audio presentation, and/or PIM, and/or SSM, and/or LPSM, and/or program boundary metadata).

Throughout this disclosure, including in the claims, the expression "program boundary metadata" denotes metadata of an encoded audio bitstream that is indicative of at least one audio program (e.g., two or more programs). The program boundary metadata indicates the location in the bitstream of at least one boundary (beginning and/or end) of at least one said audio program. For example, the program boundary metadata of an encoded audio bitstream indicative of an audio program may include metadata indicating the location of the beginning of the program. That location could be the start of the Nth frame of the bitstream, or the Mth sample location of the Nth frame. The program boundary metadata may also include additional metadata indicating the location of the end of the program, such as the start of the Jth frame of the bitstream, or the Kth sample location of the Jth frame.
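This sketch makes the frame-and-sample addressing concrete by converting boundary positions to absolute sample indices. It rests on three assumptions:

- frame indices start at 0;
- every frame is a fixed 1536-sample AC-3 frame;
- the sampling rate is 48 kHz.

The tuple representation is hypothetical and says nothing about how program boundary metadata is actually coded.

```python
SAMPLES_PER_FRAME = 1536  # AC-3 frame: 6 audio blocks of 256 samples


def absolute_sample(frame_index: int, sample_offset: int = 0,
                    samples_per_frame: int = SAMPLES_PER_FRAME) -> int:
    """Map a (frame, sample-within-frame) boundary position to a sample index."""
    if not 0 <= sample_offset < samples_per_frame:
        raise ValueError("sample offset must lie inside the frame")
    return frame_index * samples_per_frame + sample_offset


def program_duration(start: tuple, end: tuple, sample_rate: int = 48_000,
                     samples_per_frame: int = SAMPLES_PER_FRAME) -> float:
    """Duration in seconds between a start boundary (N, M) and an end boundary (J, K)."""
    first = absolute_sample(*start, samples_per_frame=samples_per_frame)
    last = absolute_sample(*end, samples_per_frame=samples_per_frame)
    return (last - first) / sample_rate
```

For example, a program starting at the beginning of frame 10 and ending at the beginning of frame 20 spans 10 × 1536 = 15360 samples, or 0.32 s at 48 kHz.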

Throughout this disclosure, including in the claims, the term "couples" or "coupled" is used to mean either a direct or an indirect connection. Thus, if a first device couples to a second device, that connection may be a direct connection, or an indirect connection via other devices and connections.

Detailed Description of Embodiments

A typical stream of audio data includes both audio content (e.g., one or more channels of audio content) and metadata indicative of at least one characteristic of the audio content. For example, an AC-3 bitstream contains several audio metadata parameters that are specifically intended for changing the sound of the program delivered to a listening environment. One of these metadata parameters is the DIALNORM parameter. It is intended to indicate the mean level of dialogue in an audio program and is used to determine the audio playback signal level.

During playback of a bitstream consisting of a sequence of different audio program segments (each having a different DIALNORM parameter), an AC-3 decoder uses the DIALNORM parameter of each segment to perform a type of loudness processing. In this processing, the decoder modifies the playback level or loudness so that the perceived loudness of the dialogue stays at a consistent level across the sequence of segments. Each encoded audio segment (item) in a sequence of encoded audio items would (typically) have a different DIALNORM parameter. The decoder would scale the level of each item so that the playback level or loudness of the dialogue is the same or very similar for every item, although this might require applying different amounts of gain to different items during playback.
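This sketch shows the per-item gain computation. It assumes the AC-3 convention that a DIALNORM code n denotes a dialogue level of -n dBFS. It uses -31 dBFS as the normalization target; this is a common decoder reference, but the sketch treats it as an adjustable assumption. The item names and values are invented for illustration.

```python
def dialnorm_gain_db(dialnorm: int, reference_db: float = -31.0) -> float:
    """Gain in dB that moves dialogue from the level signalled by a 5-bit
    DIALNORM code to reference_db (dB relative to full scale)."""
    if not 0 <= dialnorm <= 31:
        raise ValueError("DIALNORM is a 5-bit field (0..31)")
    # Code n means a dialogue level of -n dBFS; code 0 is reserved and is
    # treated like 31 here (an assumption of this sketch).
    dialogue_db = -31.0 if dialnorm == 0 else -float(dialnorm)
    return reference_db - dialogue_db


# Hypothetical sequence of items, each with its own DIALNORM value.
items = {"news": 24, "advert": 18, "film": 27}
gains = {name: dialnorm_gain_db(code) for name, code in items.items()}
# gains == {"news": -7.0, "advert": -13.0, "film": -4.0}
```

Each item receives a different attenuation, but after scaling, the dialogue of every item sits at the same reference level.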

DIALNORM is typically set by a user rather than generated automatically, although a default DIALNORM value applies if the user sets no value. For example, a content creator may make loudness measurements with a device external to an AC-3 encoder. The creator then transfers the result (indicative of the loudness of the spoken dialogue of an audio program) to the encoder to set the DIALNORM value. Thus, correct DIALNORM values depend on the content creator setting the parameter correctly.

There are several reasons why the DIALNORM parameter in an AC-3 bitstream may be wrong. First, if the content creator does not set a DIALNORM value, each AC-3 encoder uses a default DIALNORM value during generation of the bitstream, and this default may differ substantially from the actual dialogue loudness of the audio. Second, even if the content creator measures loudness and sets the DIALNORM value accordingly, the measurement may have used a loudness measurement algorithm or meter that does not conform to the recommended AC-3 loudness measurement method, resulting in an incorrect DIALNORM value. Third, even if an AC-3 bitstream has been created with a DIALNORM value correctly measured and set by the content creator, it may have been changed to an incorrect value during transmission and/or storage of the bitstream. For example, television broadcast applications commonly decode, modify, and then re-encode AC-3 bitstreams using incorrect DIALNORM metadata information. Thus, a DIALNORM value included in an AC-3 bitstream may be incorrect or inaccurate and may therefore have a negative impact on the quality of the listening experience.

Moreover, the DIALNORM parameter does not indicate the loudness processing status of the corresponding audio data (e.g., what type(s) of loudness processing have been performed on the audio data). Loudness processing status metadata, in the format provided in some embodiments of the invention, facilitates two tasks in a particularly efficient manner. The first is adaptive loudness processing of an audio bitstream. The second is verification of the validity of the loudness processing status and loudness of the audio content.

Although the invention is not limited to use with AC-3, E-AC-3, or Dolby E bitstreams, for convenience it will be described in embodiments in which it generates, decodes, or otherwise processes such bitstreams.

An AC-3 encoded bitstream comprises metadata and one to six channels of audio content. The audio content is audio data that has been compressed using perceptual audio coding. The metadata includes several audio metadata parameters intended for changing the sound of the program delivered to a listening environment.
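The one-to-six channel range follows from two BSI fields: the 3-bit audio coding mode (acmod) and the lfeon flag. The lookup table below follows the channel modes defined in ATSC A/52. The function and table names are this sketch's own.

```python
# Full-bandwidth channel configuration for each 3-bit acmod value (ATSC A/52).
ACMOD_MODES = {
    0: ("1+1", 2),  # dual mono
    1: ("1/0", 1),
    2: ("2/0", 2),
    3: ("3/0", 3),
    4: ("2/1", 3),
    5: ("3/1", 4),
    6: ("2/2", 4),
    7: ("3/2", 5),
}


def total_channels(acmod: int, lfeon: bool) -> int:
    """Full-bandwidth channels implied by acmod, plus one if the LFE channel is on."""
    _, full_bandwidth = ACMOD_MODES[acmod]
    return full_bandwidth + (1 if lfeon else 0)
```

The smallest case is mono (acmod 1, no LFE), with one channel. The largest is 3/2 plus LFE (acmod 7, lfeon set), with six channels, i.e. 5.1.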

Each frame of an AC-3 encoded audio bitstream contains audio content and metadata for 1536 samples of digital audio. For a sampling rate of 48 kHz, this represents 32 milliseconds of digital audio, or a rate of 31.25 frames per second of audio.

Depending on whether a frame contains 1, 2, 3, or 6 blocks of audio data, each frame of an E-AC-3 encoded audio bitstream contains audio content and metadata for 256, 512, 768, or 1536 samples of digital audio, respectively. For a sampling rate of 48 kHz, this represents 5.333, 10.667, 16, or 32 milliseconds of digital audio, respectively, or a rate of 187.5, 93.75, 62.5, or 31.25 frames per second of audio, respectively.
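The durations and frame rates follow directly from 256 samples per audio block. This short sketch reproduces the arithmetic for 48 kHz; the function name is this sketch's own.

```python
SAMPLES_PER_BLOCK = 256  # each (E-)AC-3 audio block carries 256 samples per channel


def frame_timing(blocks_per_frame: int, sample_rate: int = 48_000):
    """Return (samples per frame, frame duration in ms, frames per second)."""
    samples = blocks_per_frame * SAMPLES_PER_BLOCK
    duration_ms = 1000.0 * samples / sample_rate
    frames_per_second = sample_rate / samples
    return samples, duration_ms, frames_per_second


for blocks in (1, 2, 3, 6):
    samples, ms, fps = frame_timing(blocks)
    print(f"{blocks} block(s): {samples} samples, {ms:.3f} ms, {fps:.2f} frames/s")
# 1 block(s): 256 samples, 5.333 ms, 187.50 frames/s
# 2 block(s): 512 samples, 10.667 ms, 93.75 frames/s
# 3 block(s): 768 samples, 16.000 ms, 62.50 frames/s
# 6 block(s): 1536 samples, 32.000 ms, 31.25 frames/s
```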

As shown in Figure 4, each AC-3 frame is divided into sections (segments), including:
- a Synchronization Information (SI) section that contains (as shown in Figure 5) a synchronization word (SW) and the first of two error correction words (CRC1);
- a Bitstream Information (BSI) section that contains most of the metadata;
- six audio blocks (AB0 to AB5) that contain data-compressed audio content (and can also include metadata);
- waste bit segments (W) (also known as "skip fields") that contain any unused bits left over after the audio content is compressed;
- an Auxiliary (AUX) information section that may contain more metadata;
- the second of the two error correction words (CRC2).
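The following is a minimal Python sketch of reading the SI segment. The field layout (16-bit sync word 0x0B77, 16-bit CRC1, 2-bit fscod, 6-bit frmsizecod) comes from the ATSC A/52 AC-3 syntax rather than from this document. The function checks neither the CRC nor the frame size, and its names are its own.

```python
from typing import NamedTuple

AC3_SYNCWORD = 0x0B77
SAMPLE_RATES = {0: 48000, 1: 44100, 2: 32000}  # fscod value 3 is reserved


class SyncInfo(NamedTuple):
    crc1: int
    sample_rate: int
    frmsizecod: int


def parse_syncinfo(frame: bytes) -> SyncInfo:
    """Parse the 40-bit SI segment at the start of an AC-3 frame."""
    if len(frame) < 5:
        raise ValueError("need at least 5 bytes")
    syncword = int.from_bytes(frame[0:2], "big")
    if syncword != AC3_SYNCWORD:
        raise ValueError(f"bad sync word 0x{syncword:04X}")
    crc1 = int.from_bytes(frame[2:4], "big")
    fscod = frame[4] >> 6          # top 2 bits of the fifth byte
    frmsizecod = frame[4] & 0x3F   # bottom 6 bits
    if fscod not in SAMPLE_RATES:
        raise ValueError("reserved fscod")
    return SyncInfo(crc1, SAMPLE_RATES[fscod], frmsizecod)
```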

As shown in Figure 7, each E-AC-3 frame is divided into sections (segments), including:
- a Synchronization Information (SI) section that contains (as shown in Figure 5) a synchronization word (SW);
- a Bitstream Information (BSI) section that contains most of the metadata;
- six audio blocks (AB0 to AB5) that contain data-compressed audio content (and can also include metadata);
- a waste bit segment (W) (also known as a "skip field") that contains any unused bits left over after the audio content is compressed. Only one waste bit segment is shown, but a different waste bit or skip field segment would typically follow each audio block;
- an Auxiliary (AUX) information section that may contain more metadata;
- an error correction word (CRC).

An AC-3 (or E-AC-3) bitstream contains several audio metadata parameters that are specifically intended for changing the sound of the program delivered to a listening environment. One of these metadata parameters is the DIALNORM parameter, which is included in the BSI segment.

As shown in Figure 6, the BSI segment of an AC-3 frame includes a 5-bit parameter (“DIALNORM”) indicating the DIALNORM value for the program. If the audio coding mode (“acmod”) of the AC-3 frame is 0, indicating that a dual-mono or “1+1” channel configuration is in use, the BSI segment also includes a 5-bit parameter (“DIALNORM2”) indicating the DIALNORM value for a second audio program carried in the same AC-3 frame.
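The rule above (one DIALNORM value per program, with DIALNORM2 present only when acmod is 0) can be sketched as follows. The mapping of the 5-bit code to a dialogue level (codes 1 to 31 meaning -1 to -31 dB relative to full scale, code 0 reserved and treated here as -31 dB) is an assumption based on the AC-3 specification, not something stated in this document.

```python
from typing import List, Optional


def dialnorm_to_db(code: int) -> int:
    """Map a 5-bit DIALNORM code to a dialogue level in dB below full scale.

    Assumed convention: codes 1..31 mean -1..-31 dB; the reserved code 0 is
    treated as -31 dB.
    """
    if not 0 <= code <= 31:
        raise ValueError("DIALNORM is a 5-bit field")
    return -31 if code == 0 else -code


def dialogue_levels(acmod: int, dialnorm: int, dialnorm2: Optional[int] = None) -> List[int]:
    """Return one dialogue level per program carried in the frame.

    acmod == 0 signals dual mono ("1+1"); only then does DIALNORM2 exist,
    and it applies to the second program.
    """
    levels = [dialnorm_to_db(dialnorm)]
    if acmod == 0:
        if dialnorm2 is None:
            raise ValueError("acmod == 0 requires a DIALNORM2 value")
        levels.append(dialnorm_to_db(dialnorm2))
    return levels
```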

The BSI segment also includes a flag (“addbsie”) indicating the presence (or absence) of additional bitstream information following the “addbsie” bit, a parameter (“addbsil”) indicating the length of any additional bitstream information following the “addbsil” value, and up to 64 bytes of additional bitstream information (“addbsi”) following the “addbsil” value.
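Reading the addbsie/addbsil/addbsi group is a small bit-level parsing task. The sketch below assumes the ATSC A/52 layout, in which addbsil is a 6-bit field equal to the payload length in bytes minus one; it also assumes the reader is already positioned at the addbsie bit (in a real BSI segment the preceding fields must be parsed first). The names are illustrative.

```python
from typing import Optional


class BitReader:
    """Minimal MSB-first bit reader."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read(self, n: int) -> int:
        if self.pos + n > 8 * len(self.data):
            raise ValueError("attempt to read past the end of the data")
        value = 0
        for _ in range(n):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_addbsi(reader: BitReader) -> Optional[bytes]:
    """Read addbsie and, if it is set, addbsil and the addbsi payload."""
    if reader.read(1) == 0:             # addbsie == 0: no additional BSI follows
        return None
    addbsil = reader.read(6)            # assumed: payload length in bytes, minus one
    return bytes(reader.read(8) for _ in range(addbsil + 1))
```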

The BSI segment also includes other metadata values not specifically shown in Figure 6.

In one class of embodiments, an encoded bitstream indicates multiple substreams of audio content. In some cases, the substreams indicate the audio content of a multichannel program, and each substream indicates one or more of the program's channels. In other cases, the multiple substreams of an encoded audio bitstream indicate the audio content of several audio programs: typically a “main” audio program (which may be a multichannel program) and at least one other audio program (e.g., a program that is a commentary on the main audio program).

An encoded audio bitstream that indicates at least one audio program must include at least one “independent” substream of audio content. The independent substream indicates at least one channel of the audio program (e.g., the independent substream may indicate the five full-range channels of a conventional 5.1-channel audio program). Herein, this audio program is referred to as the “main” program.

In some classes of embodiments, an encoded audio bitstream indicates two or more audio programs (a “main” program and at least one other audio program). In such cases, the bitstream includes two or more independent substreams: a first independent substream indicating at least one channel of the main program, and at least one other independent substream indicating at least one channel of another audio program (a program distinct from the main program). Each independent substream can be decoded independently, and a decoder may operate to decode only a subset (not all) of the independent substreams of an encoded bitstream.

In a typical example of an encoded audio bitstream that indicates two independent substreams, one of the independent substreams indicates the standard-format speaker channels of a multichannel main program (e.g., the left, right, center, left surround, and right surround full-range speaker channels of a 5.1-channel main program), and the other independent substream indicates a monophonic audio commentary on the main program (e.g., a director's commentary on a movie, where the main program is the movie's soundtrack). In another example of an encoded audio bitstream that indicates multiple independent substreams, one of the independent substreams indicates the standard-format speaker channels of a multichannel main program (e.g., a 5.1-channel main program) that includes dialogue in a first language (e.g., one of the main program's speaker channels may indicate the dialogue), and each other independent substream indicates a monophonic translation of the dialogue into a different language.

Optionally, an encoded audio bitstream that indicates a main program (and optionally also at least one other audio program) includes at least one “dependent” substream of audio content. Each dependent substream is associated with one independent substream of the bitstream and indicates at least one additional channel of the program (e.g., the main program) whose content is indicated by the associated independent substream. That is, the dependent substream indicates at least one channel of the program that is not indicated by the associated independent substream, while the associated independent substream indicates at least one channel of the program.

In an example of an encoded bitstream that includes an independent substream (indicating at least one channel of the main program), the bitstream also includes a dependent substream (associated with the independent substream) that indicates one or more additional speaker channels of the main program. These speaker channels are in addition to the main program channels indicated by the independent substream. For example, if the independent substream indicates the left, right, center, left surround, and right surround full-range speaker channels of a 7.1-channel main program, the dependent substream may indicate the two other full-range speaker channels of the main program.
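The relationship described above amounts to a disjoint union: the program's channel set is the independent substream's channels plus the dependent substream's extra channels, with no channel carried twice. A minimal sketch, in which the channel labels (including an LFE channel in the independent substream and the names Lrs/Rrs for the two extra channels) are hypothetical:

```python
from typing import List, Sequence


def program_channels(independent: Sequence[str], dependent: Sequence[str]) -> List[str]:
    """Combine an independent substream's channels with one dependent substream's."""
    repeated = set(independent) & set(dependent)
    if repeated:
        raise ValueError(f"dependent substream repeats channels: {sorted(repeated)}")
    return list(independent) + list(dependent)


# Hypothetical labels: a 5.1 core in the independent substream plus two
# additional full-range channels carried by the dependent substream.
main_program = program_channels(["L", "R", "C", "LFE", "Ls", "Rs"], ["Lrs", "Rrs"])
print(len(main_program), main_program)
```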

In accordance with the E-AC-3 standard, an E-AC-3 bitstream must indicate at least one independent substream (e.g., a single AC-3 bitstream) and may indicate up to eight independent substreams. Each independent substream of an E-AC-3 bitstream may be associated with up to eight dependent substreams.
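The counting rules above can be checked mechanically once each substream's type is known. The sketch below assumes that substreams arrive in bitstream order and that each dependent substream follows the independent substream it belongs to; both the input representation and the function name are illustrative.

```python
from typing import Dict, List, Tuple

MAX_INDEPENDENT = 8
MAX_DEPENDENT_PER_INDEPENDENT = 8


def group_substreams(substreams: List[Tuple[str, int]]) -> Dict[int, List[int]]:
    """Group (kind, substream id) pairs into {independent id: [dependent ids]}.

    Each dependent substream is attached to the most recent independent
    substream, then the E-AC-3 count limits are enforced.
    """
    groups: Dict[int, List[int]] = {}
    current = None
    for kind, substream_id in substreams:
        if kind == "independent":
            if substream_id in groups:
                raise ValueError(f"duplicate independent substream {substream_id}")
            groups[substream_id] = []
            current = substream_id
        elif kind == "dependent":
            if current is None:
                raise ValueError("dependent substream precedes every independent substream")
            groups[current].append(substream_id)
        else:
            raise ValueError(f"unknown substream kind {kind!r}")
    if not groups:
        raise ValueError("at least one independent substream is required")
    if len(groups) > MAX_INDEPENDENT:
        raise ValueError("more than 8 independent substreams")
    for ind_id, deps in groups.items():
        if len(deps) > MAX_DEPENDENT_PER_INDEPENDENT:
            raise ValueError(f"independent substream {ind_id} has more than 8 dependents")
    return groups
```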

An E-AC-3 bitstream includes metadata indicating the substream structure of the bitstream. For example, the “chanmap” field in the Bitstream Information (BSI) section of an E-AC-3 bitstream determines a channel map for the program channels indicated by a dependent substream of the bitstream. However, metadata indicating substream structure is conventionally included in an E-AC-3 bitstream in a format that makes it convenient to access and use only by an E-AC-3 decoder (during decoding of the encoded E-AC-3 bitstream), and inconvenient to access and use after decoding (e.g., by a post-processor) or before decoding (e.g., by a processor configured to recognize the metadata). There is also a risk that a decoder may use the conventionally included metadata to identify the substreams of a conventional E-AC-3 encoded bitstream incorrectly, and before the present invention it was not known how to include substream structure metadata in an encoded bitstream (e.g., an encoded E-AC-3 bitstream) in a format that allows errors in substream identification to be detected and corrected conveniently and efficiently during decoding of the bitstream.

An E-AC-3 bitstream may also include metadata about the audio content of an audio program. For example, an E-AC-3 bitstream indicating an audio program includes metadata indicating the minimum and maximum frequencies at which spectral extension processing (and channel coupling encoding) has been employed to encode the program's content. However, such metadata is generally included in an E-AC-3 bitstream in a format that makes it convenient to access and use only by an E-AC-3 decoder (during decoding of the encoded E-AC-3 bitstream), and inconvenient to access and use after decoding (e.g., by a post-processor) or before decoding (e.g., by a processor configured to recognize the metadata). Moreover, such metadata is not included in an E-AC-3 bitstream in a format that allows errors in its identification to be detected and corrected conveniently and efficiently during decoding of the bitstream.

In accordance with typical embodiments of the invention, PIM and/or SSM (and optionally also other metadata, e.g., loudness processing state metadata or “LPSM”) are embedded in one or more reserved fields (or slots) of metadata segments of an audio bitstream that also includes audio data in other segments (audio data segments). Typically, at least one segment of each frame of the bitstream includes PIM or SSM, and at least one other segment of the frame includes corresponding audio data (i.e., audio data whose data structure is indicated by the SSM and/or at least one characteristic or property of which is indicated by the PIM).

In one class of embodiments, each metadata segment is a data structure (sometimes referred to herein as a container) that can contain one or more metadata payloads. Each payload includes a header that contains a specific payload identifier (or payload configuration data) and thereby provides an unambiguous indication of the type of metadata present in the payload. The order of payloads within the container is undefined, so payloads can be stored in any order, and a parser must be able to parse the entire container to extract the relevant payloads while ignoring payloads that are irrelevant or unsupported. Figure 8 (described below) illustrates the structure of such a container and of the payloads within it.
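The parsing requirement above (walk the whole container, keep known payloads, skip the rest) works as long as every payload header states its own length. The sketch below uses a hypothetical header of a 1-byte payload ID followed by a 2-byte big-endian length, and hypothetical ID values; the actual container format is described with Figure 8 and is not reproduced here.

```python
import struct
from typing import Dict, Set

# Hypothetical payload identifiers, chosen only for this illustration.
PAYLOAD_SSM = 0x01
PAYLOAD_PIM = 0x02
PAYLOAD_LPSM = 0x03

HEADER = struct.Struct(">BH")  # hypothetical header: 1-byte ID, 2-byte big-endian length


def extract_payloads(container: bytes, wanted: Set[int]) -> Dict[int, bytes]:
    """Walk every payload in the container and keep only the IDs in `wanted`.

    No payload order is assumed; unknown or unsupported payloads are skipped
    using their length field, so the parser never has to understand them.
    """
    found: Dict[int, bytes] = {}
    pos = 0
    while pos < len(container):
        if pos + HEADER.size > len(container):
            raise ValueError("truncated payload header")
        payload_id, length = HEADER.unpack_from(container, pos)
        pos += HEADER.size
        body = container[pos:pos + length]
        if len(body) != length:
            raise ValueError("truncated payload body")
        pos += length
        if payload_id in wanted:
            found[payload_id] = body
    return found
```

Relying on a length field is what lets a parser tolerate payload types that did not exist when it was written.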

Communicating metadata (e.g., SSM and/or PIM and/or LPSM) along an audio data processing chain is particularly useful when two or more audio processing units need to work in tandem with one another throughout the processing chain (or content lifecycle). Without metadata in the audio bitstream, several media processing problems, such as quality, level, and spatial degradation, can arise, for example when two or more audio codecs are used in the chain and single-ended volume leveling is applied more than once along the bitstream path to a media-consuming device (or to the point at which the audio content of the bitstream is rendered).

In accordance with some embodiments of the invention, loudness processing state metadata (LPSM) embedded in an audio bitstream can be authenticated and validated, for example to enable loudness regulatory entities to verify whether the loudness of a particular program is already within a specified range and whether the corresponding audio data itself has been left unmodified (thereby ensuring compliance with applicable regulations). A loudness value included in a data block containing the loudness processing state metadata can be read out to verify this, instead of computing the loudness again. In response to the LPSM, a regulatory body can determine that the corresponding audio content complies (as indicated by the LPSM) with statutory and/or regulatory loudness requirements (e.g., the regulations promulgated under the Commercial Advertisement Loudness Mitigation Act, also known as the “CALM” Act) without needing to compute the loudness of the audio content.
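The point of the paragraph above is that a trusted, carried loudness value can replace a fresh measurement. The sketch below shows that decision. The target of -24 LKFS and the ±2 dB tolerance are illustrative assumptions (the text names no specific values), and the function name and arguments are hypothetical.

```python
from typing import Optional


def loudness_compliant(carried_loudness_lkfs: float, metadata_validated: bool,
                       target_lkfs: float = -24.0,
                       tolerance_db: float = 2.0) -> Optional[bool]:
    """Decide compliance from the loudness value carried in the LPSM.

    Returns None when the metadata could not be validated: in that case the
    carried value cannot be relied on and the audio would have to be measured.
    """
    if not metadata_validated:
        return None
    return abs(carried_loudness_lkfs - target_lkfs) <= tolerance_db
```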

Figure 1 is a block diagram of an exemplary audio processing chain (audio data processing system) in which one or more of the system's elements may be configured in accordance with an embodiment of the present invention. The system includes the following elements, coupled together as shown: a pre-processing unit, an encoder, a signal analysis and metadata correction unit, a transcoder, a decoder, and a post-processing unit. In variations on the system shown, one or more of the elements are omitted, or additional audio data processing units are included.

In some implementations, the pre-processing unit of Figure 1 is configured to accept PCM (time-domain) samples comprising audio content as input and to output processed PCM samples. The encoder may be configured to accept the PCM samples as input and to output an encoded (e.g., compressed) audio bitstream indicative of the audio content. The data of the bitstream that are indicative of the audio content are sometimes referred to herein as “audio data.” If the encoder is configured in accordance with a typical embodiment of the invention, the audio bitstream output from the encoder includes PIM and/or SSM (and optionally also loudness processing state metadata and/or other metadata) as well as audio data.

The signal analysis and metadata correction unit of Figure 1 may accept one or more encoded audio bitstreams as input and determine (e.g., validate) whether the metadata (e.g., processing state metadata) in each encoded audio bitstream is correct by performing signal analysis (e.g., using program boundary metadata in an encoded audio bitstream). If the signal analysis and metadata correction unit finds that the included metadata is invalid, it typically replaces the incorrect value(s) with the correct value(s) obtained from the signal analysis. Thus, each encoded audio bitstream output from the signal analysis and metadata correction unit may include corrected (or uncorrected) processing state metadata as well as encoded audio data.

The transcoder of Figure 1 may accept encoded audio bitstreams as input and, in response, output modified (e.g., differently encoded) audio bitstreams (e.g., by decoding an input stream and re-encoding the decoded stream in a different encoding format). If the transcoder is configured in accordance with a typical embodiment of the invention, the audio bitstream output from the transcoder includes SSM and/or PIM (and typically also other metadata) as well as encoded audio data. The metadata may already have been included in the input bitstream.

The decoder of Figure 1 may accept encoded (e.g., compressed) audio bitstreams as input and, in response, output streams of decoded PCM audio samples. If the decoder is configured in accordance with a typical embodiment of the invention, the output of the decoder in typical operation is or includes any of the following:

A stream of audio samples, and at least one corresponding stream of SSM and/or PIM (and typically also other metadata) extracted from an input encoded bitstream; or

A stream of audio samples, and a corresponding stream of control bits determined from SSM and/or PIM (and typically also other metadata, e.g., LPSM) extracted from an input encoded bitstream; or

A stream of audio samples, without a corresponding stream of metadata or of control bits determined from metadata. In this last case, the decoder may extract metadata from the input encoded bitstream and perform at least one operation (e.g., validation) on the extracted metadata, even though it does not output the extracted metadata or control bits determined from it.

When the post-processing unit of Figure 1 is configured in accordance with a typical embodiment of the invention, it accepts a stream of decoded PCM audio samples and performs post-processing on it (e.g., volume leveling of the audio content) using SSM and/or PIM (and typically also other metadata, e.g., LPSM) received with the samples, or using control bits determined from metadata received with the samples. The post-processing unit is typically also configured to render the post-processed audio content for playback by one or more speakers.

Typical embodiments of the present invention provide an enhanced audio processing chain in which audio processing units (e.g., encoders, decoders, transcoders, and pre- and post-processing units) adapt the processing they apply to audio data according to the contemporaneous state of the media data, as indicated by the metadata that each unit receives.

The audio data input to any audio processing unit of the Figure 1 system (e.g., the encoder or transcoder of Figure 1) may include SSM and/or PIM (and optionally also other metadata) as well as audio data (e.g., encoded audio data). This metadata may have been included in the input audio by another element of the Figure 1 system (or by another source, not shown in Figure 1) in accordance with an embodiment of the invention. The processing unit that receives the input audio (with metadata) may be configured to perform at least one operation on the metadata (e.g., validation) or in response to the metadata (e.g., adaptive processing of the input audio), and typically also to include in its output audio the metadata, a processed version of the metadata, or control bits determined from the metadata.

Typical embodiments of the audio processing unit (or audio processor) of the invention are configured to perform adaptive processing of audio data based on the state of the audio data as indicated by metadata corresponding to the audio data. In some embodiments, the adaptive processing is (or includes) loudness processing if the metadata indicates that loudness processing, or processing similar to it, has not already been performed on the audio data, but is not (and does not include) loudness processing if the metadata indicates that such loudness processing, or processing similar to it, has already been performed on the audio data. In some embodiments, the adaptive processing is or includes metadata validation (e.g., performed in a metadata validation subunit) to ensure that the audio processing unit performs other adaptive processing of the audio data based on the state of the audio data as indicated by the metadata. In some embodiments, the validation determines the reliability of the metadata associated with the audio data (e.g., included in a bitstream with the audio data). For example, if the metadata is validated as reliable, results from a previously performed type of audio processing can be reused, and performing the same type of audio processing again can be avoided. On the other hand, if the metadata is found to have been tampered with (or to be otherwise unreliable), the type of media processing purportedly performed earlier (as indicated by the unreliable metadata) may be repeated by the audio processing unit, and/or the audio processing unit may perform other processing on the metadata and/or the audio data. If the unit determines that the metadata is valid (e.g., based on a match between an extracted cryptographic value and a reference cryptographic value), the audio processing unit may also be configured to signal to other audio processing units downstream in the enhanced media processing chain that the metadata (e.g., present in the media bitstream) is valid.

Figure 2 is a block diagram of an encoder (100), which is an embodiment of the audio processing unit of the present invention. Any of the components or elements of encoder 100 may be implemented as one or more processes and/or one or more circuits (e.g., ASICs, FPGAs, or other integrated circuits), in hardware, software, or a combination of hardware and software. Encoder 100 includes frame buffer 110, parser 111, decoder 101, audio state verifier 102, loudness processing stage 103, audio stream selection stage 104, encoder 105, stuffer/formatter stage 107, metadata generation stage 106, dialogue loudness measurement subsystem 108, and frame buffer 109, connected as shown. Encoder 100 typically also includes other processing elements (not shown).

Encoder 100 (which is a transcoder) is configured to convert an input audio bitstream (which, for example, may be one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream) into an encoded output audio bitstream (which, for example, may be another one of an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream), including by performing adaptive and automated loudness processing using loudness processing state metadata included in the input bitstream. For example, encoder 100 may be configured to convert an input Dolby E bitstream (a format typically used in production and broadcast facilities, but not in consumer devices that receive audio programs that have been broadcast) into an encoded output audio bitstream in AC-3 or E-AC-3 format (suitable for broadcasting to consumer devices).

The system of Figure 2 also includes encoded audio delivery subsystem 150 (which stores and/or delivers the encoded bitstreams output from encoder 100) and decoder 152. An encoded audio bitstream output from encoder 100 may be stored by subsystem 150 (e.g., in the form of a DVD or Blu-ray disc), transmitted by subsystem 150 (which may implement a transmission link or network), or both stored and transmitted by subsystem 150. Decoder 152 is configured to decode an encoded audio bitstream (generated by encoder 100) that it receives via subsystem 150, including by extracting metadata (PIM and/or SSM, and optionally also loudness processing state metadata and/or other metadata) from each frame of the bitstream (and optionally also extracting program boundary metadata from the bitstream) and generating decoded audio data. Typically, decoder 152 is configured to perform adaptive processing on the decoded audio data using the PIM and/or SSM and/or LPSM (and optionally also program boundary metadata), and/or to forward the decoded audio data and metadata to a post-processor configured to perform adaptive processing on the decoded audio data using the metadata. Typically, decoder 152 includes a buffer that stores (e.g., in a non-transitory manner) the encoded audio bitstream received from subsystem 150.

Various implementations of encoder 100 and decoder 152 are configured to perform different embodiments of the method of the invention.

Frame buffer 110 is a buffer memory coupled to receive an encoded input audio bitstream. In operation, buffer 110 stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream, and a sequence of frames of the encoded audio bitstream is asserted from buffer 110 to parser 111.

Parser 111 is coupled and configured to extract PIM and/or SSM, loudness processing state metadata (LPSM), and optionally also program boundary metadata (and/or other metadata) from each frame of the encoded input audio in which such metadata is included; to assert at least the LPSM (and optionally also program boundary metadata and/or other metadata) to audio state verifier 102, loudness processing stage 103, stage 106, and subsystem 108; to extract audio data from the encoded input audio; and to assert the audio data to decoder 101. Decoder 101 of encoder 100 is configured to decode the audio data to generate decoded audio data, and to assert the decoded audio data to loudness processing stage 103, audio stream selection stage 104, subsystem 108, and typically also to state verifier 102.

State verifier 102 is configured to authenticate and validate the LPSM (and optionally other metadata) asserted to it. In some embodiments, the LPSM is (or is included in) a data block that has been included in the input bitstream (e.g., in accordance with an embodiment of the present invention). The block may comprise a cryptographic hash (a hash-based message authentication code, or “HMAC”) for processing the LPSM (and optionally also other metadata) and/or the underlying audio data (provided from decoder 101 to verifier 102). In these embodiments, the data block may be digitally signed, so that a downstream audio processing unit can authenticate and validate the processing state metadata relatively easily.

For example, an HMAC is used to generate a digest, and the protection value(s) included in the bitstream of the invention may include this digest. For an AC-3 frame, the digest may be generated as follows:

1. After the AC-3 data and the LPSM are encoded, the frame data bytes (concatenated frame data #1 and frame data #2) and the LPSM data bytes are used as input to the hash function HMAC. Other data that may be present inside the auxiliary data field are not taken into account when computing the digest; such other data may be bytes that belong neither to the AC-3 data nor to the LPSM data. Protection bits included in the LPSM may be excluded from the HMAC digest computation.

2. After the digest is computed, it is written into the field of the bitstream reserved for protection bits.

3. The final step in generating the complete AC-3 frame is computing the CRC check. This is written at the very end of the frame, and all data belonging to the frame, including the LPSM bits, are taken into account.
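The three steps above can be sketched with Python's standard `hmac` module. Several details are assumptions made only for illustration: the HMAC uses SHA-256, the digest is simply appended after the LPSM bytes to stand in for the protection field, and the CRC is a plain MSB-first CRC-16 with generator x^16 + x^15 + x^2 + 1 computed over the whole frame. That is a simplification, not the bit-exact AC-3 CRC procedure.

```python
import hashlib
import hmac


def crc16(data: bytes, poly: int = 0x8005) -> int:
    """Bitwise MSB-first CRC-16 (generator x^16 + x^15 + x^2 + 1), initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def build_protected_frame(frame_data_1: bytes, frame_data_2: bytes,
                          lpsm: bytes, key: bytes) -> bytes:
    # Step 1: HMAC over concatenated frame data #1, frame data #2 and the LPSM
    # bytes; auxiliary data and the protection field itself are excluded.
    digest = hmac.new(key, frame_data_1 + frame_data_2 + lpsm, hashlib.sha256).digest()
    # Step 2: write the digest into the protection field (hypothetical layout:
    # placed directly after the LPSM bytes).
    body = frame_data_1 + frame_data_2 + lpsm + digest
    # Step 3: compute the CRC last, over all frame data including the LPSM bits.
    return body + crc16(body).to_bytes(2, "big")
```

Computing the CRC last matters: it must cover the digest that step 2 has just written.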

Other cryptographic methods, including but not limited to any of one or more non-HMAC cryptographic methods, may be used to validate the LPSM and/or other metadata (e.g., in verifier 102) to ensure secure transmission and receipt of the metadata and/or the underlying audio data. For example, validation (using such a cryptographic method) can be performed in each audio processing unit that receives an embodiment of the audio bitstream of the invention, to determine whether the metadata and corresponding audio data included in the bitstream have undergone (and/or resulted from) specific processing (as indicated by the metadata) and have not been modified after that processing was performed.
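On the receiving side, validation means recomputing the digest over the received data and comparing it with the digest carried in the bitstream. A minimal sketch, assuming HMAC-SHA256 and a key shared between the encoder and the validating unit (neither is specified in this document); any other cryptographic method could take its place.

```python
import hashlib
import hmac


def metadata_is_authentic(frame_data: bytes, lpsm: bytes,
                          received_digest: bytes, key: bytes) -> bool:
    """Recompute the digest over the received frame data and LPSM, then compare."""
    expected = hmac.new(key, frame_data + lpsm, hashlib.sha256).digest()
    # compare_digest is designed so its running time does not reveal where
    # the two digests first differ.
    return hmac.compare_digest(expected, received_digest)
```

A unit that gets `False` here would treat the LPSM as unreliable and, as described above, repeat the processing the metadata claims was already done.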

State verifier 102 asserts control data to audio stream selection stage 104, metadata generator 106, and dialogue loudness measurement subsystem 108 to indicate the results of the validation operation. In response to the control data, stage 104 may select (and pass through to encoder 105) either:

The adaptively processed output of loudness processing stage 103 (e.g., when the LPSM indicates that the audio data output from decoder 101 have not undergone a specific type of loudness processing, and the control bits from verifier 102 indicate that the LPSM is valid); or

从解码器102输出的音频数据(例如，当LPSM指示从解码器101输出的音频数据已经经历将由级103执行的特定类型的响度处理，并且来自验证器102的控制位指示LPSM有效时)。Audio data output from decoder 102 (e.g., when LPSM indicates that the audio data output from decoder 101 has undergone a specific type of loudness processing to be performed by stage 103, and a control bit from verifier 102 indicates that LPSM is valid).
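The selection rule of stage 104 reduces to a small decision function. In this sketch the names are hypothetical. Falling back to the output of stage 103 when the LPSM are invalid is an assumption, because the text above only specifies the two cases in which the LPSM are valid.

```python
from enum import Enum


class Source(Enum):
    PROCESSED = "output of loudness processing stage 103"
    DECODED = "audio data output from decoder 101"


def select_stage_104(lpsm_valid: bool, already_processed: bool) -> Source:
    """Choose which audio stage 104 passes to encoder 105 (sketch)."""
    if lpsm_valid and already_processed:
        # Trusted LPSM say stage 103's processing was already applied:
        # pass the decoded audio through unchanged.
        return Source.DECODED
    # Processing not yet applied, or LPSM untrusted (assumed fallback):
    # use the adaptively processed output of stage 103.
    return Source.PROCESSED
```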

Stage 103 of encoder 100 is configured to perform adaptive loudness processing on the decoded audio data output from decoder 101, based on one or more audio data characteristics indicated by the LPSM that decoder 101 extracts. Stage 103 may be an adaptive transform-domain real-time loudness and dynamic range control processor.

Stage 103 may receive several kinds of input and use them to process the decoded audio data output from decoder 101:

- user input (e.g., user target loudness/dynamic range values or dialogue normalization values);
- other metadata input (e.g., one or more types of third-party data, tracking information, identifiers, proprietary or standard information, user annotation data, user preference data, etc.); and/or
- other input (e.g., from a fingerprinting process).

Stage 103 may perform adaptive loudness processing on decoded audio data (output from decoder 101) indicative of a single audio program, as indicated by program boundary metadata extracted by analyzer 111. It may reset the loudness processing when it receives decoded audio data indicative of a different audio program, as indicated by that program boundary metadata.

When the control bits from verifier 102 indicate that the LPSM are invalid, dialogue loudness measurement subsystem 108 may determine the loudness of segments of the decoded audio (from decoder 101) that represent dialogue (or other speech), e.g., using the LPSM (and/or other metadata) extracted by decoder 101. Operation of subsystem 108 may be disabled when two conditions hold: the control bits from verifier 102 indicate that the LPSM are valid, and the LPSM indicate a previously determined loudness for the dialogue (or other speech) segments of the decoded audio. Subsystem 108 may measure loudness on decoded audio data representing a single audio program (as indicated by program boundary metadata extracted by analyzer 111). It may reset the loudness processing when it receives decoded audio data representing a different audio program, as indicated by such program boundary metadata.
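A minimal Python sketch of how the verifier result might gate subsystem 108 and how program boundaries might reset it. The class, its method names, and the practice of feeding per-block K-weighted mean squares are assumptions made for illustration. The -0.691 offset is the BS.1770 constant.

```python
import math


class DialogueLoudnessMeter:
    """Sketch of subsystem 108's enable/reset behaviour (hypothetical API)."""

    def __init__(self) -> None:
        self.program_id = None
        self.energy_sum = 0.0
        self.blocks = 0

    @staticmethod
    def should_measure(lpsm_valid: bool, lpsm_has_dialogue_loudness: bool) -> bool:
        # Skip measuring only when trusted LPSM already carry the value.
        return not (lpsm_valid and lpsm_has_dialogue_loudness)

    def push(self, program_id: str, k_weighted_mean_square: float) -> None:
        # A different program (per program boundary metadata) resets the state.
        if program_id != self.program_id:
            self.program_id = program_id
            self.energy_sum = 0.0
            self.blocks = 0
        self.energy_sum += k_weighted_mean_square
        self.blocks += 1

    def loudness(self) -> float:
        # Energy-domain average converted to LKFS.
        if self.blocks == 0 or self.energy_sum <= 0.0:
            return float("-inf")
        return -0.691 + 10 * math.log10(self.energy_sum / self.blocks)
```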

Useful tools exist (e.g., the Dolby LM100 loudness meter) for conveniently and easily measuring the level of dialogue in audio content. Some embodiments of the inventive APU (e.g., stage 108 of encoder 100) include such a tool, or perform its functions. They use it to measure the average dialogue loudness of the audio content of an audio bitstream, e.g., a decoded AC-3 bitstream asserted to stage 108 from decoder 101 of encoder 100.

If stage 108 is implemented to measure the true average dialogue loudness of audio data, the measurement may include a step of isolating the segments of the audio content that predominantly contain speech. The predominantly-speech segments are then processed in accordance with a loudness measurement algorithm. For audio data decoded from an AC-3 bitstream, this algorithm may be a standard K-weighted loudness measure (in accordance with the international standard ITU-R BS.1770). Alternatively, other loudness measures may be used (e.g., those based on psychoacoustic models of loudness).
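For reference, the ungated K-weighted measure of ITU-R BS.1770 can be sketched in pure Python. The sketch makes these assumptions:

- the audio is sampled at 48 kHz;
- it uses the two published K-weighting biquads for that rate (a high-shelf pre-filter followed by the RLB high-pass);
- channel weights are 1.0 for L, R, C and 1.41 for Ls, Rs;
- the LFE channel is left out by the caller.

It omits the gating added in later revisions of the standard, as well as the speech-isolation step described above.

```python
import math

# BS.1770 K-weighting biquads at fs = 48 kHz: ((b0, b1, b2), (a1, a2)), a0 = 1.
SHELF = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
         (-1.69065929318241, 0.73248077421585))
HIGHPASS = ((1.0, -2.0, 1.0),
            (-1.99004745483398, 0.99007225036621))


def biquad(x, coeffs):
    """Direct-form I second-order IIR filter."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def ungated_loudness(channels, weights):
    """Ungated K-weighted loudness in LKFS (LFE must be excluded by caller)."""
    total = 0.0
    for x, g in zip(channels, weights):
        y = biquad(biquad(x, SHELF), HIGHPASS)
        total += g * sum(v * v for v in y) / len(y)
    if total <= 0.0:
        return float("-inf")
    return -0.691 + 10 * math.log10(total)
```

As a usage check, one second of a full-scale 997 Hz sine in a single front channel should read close to -3.01 LKFS. That is the calibration point given in BS.1770.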

Isolating speech segments is not essential to measuring the average dialogue loudness of audio data. However, it improves the accuracy of the measurement and typically gives results that are more satisfactory from a listener's perspective. Not all audio content contains dialogue (speech). For such content, a loudness measurement of the whole audio content may provide a sufficient approximation of the dialogue level the audio would have had if speech had been present.
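The fallback described above can be expressed as a short function. It assumes that per-block K-weighted energies and per-block speech flags are produced elsewhere; the speech classifier is hypothetical and not shown. As BS.1770 does, the function averages in the energy domain before converting to LKFS.

```python
import math


def dialogue_loudness(block_energies, speech_flags):
    """Average loudness over speech blocks, falling back to all blocks.

    block_energies: K-weighted mean-square energy per block (assumed input).
    speech_flags: True where a hypothetical speech classifier detected speech.
    """
    if not block_energies:
        raise ValueError("no audio blocks to measure")
    speech = [e for e, s in zip(block_energies, speech_flags) if s]
    # With no detected speech, the whole-content measure is the approximation.
    chosen = speech if speech else list(block_energies)
    mean = sum(chosen) / len(chosen)
    return -0.691 + 10 * math.log10(mean)
```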

Metadata generator 106 generates (and/or passes through to stage 107) the metadata that stage 107 includes in the encoded bitstream output from encoder 100. Metadata generator 106 may do any of the following:

- pass through to stage 107 the LPSM (and optionally also LIM and/or PIM and/or program boundary metadata and/or other metadata) extracted by decoder 101 and/or analyzer 111 (e.g., when control bits from verifier 102 indicate that the LPSM and/or other metadata are valid);
- generate new LIM and/or PIM and/or LPSM and/or program boundary metadata and/or other metadata, and assert the new metadata to stage 107 (e.g., when control bits from verifier 102 indicate that the metadata extracted by decoder 101 are invalid); or
- assert to stage 107 a combination of metadata extracted by decoder 101 and/or analyzer 111 and newly generated metadata.

Metadata generator 106 may include in the LPSM the loudness data generated by subsystem 108, together with at least one value indicating the type of loudness processing performed by subsystem 108. It then asserts the LPSM to stage 107 for inclusion in the encoded bitstream output from encoder 100.

Metadata generator 106 may generate control bits, which may consist of or include a hash-based message authentication code ("HMAC"). These bits are useful for at least one of decryption, authentication, or validation of the LPSM (and optionally also other metadata) and/or the underlying audio data to be included in the encoded bitstream. Metadata generator 106 may provide such protection bits to stage 107 for inclusion in the encoded bitstream.

In typical operation, dialogue loudness measurement subsystem 108 processes the audio data output from decoder 101 and generates loudness values (e.g., gated and ungated dialogue loudness values) and dynamic range values from it. In response to these values, metadata generator 106 may generate loudness processing state metadata (LPSM), which filler/formatter 107 includes in the encoded bitstream output from encoder 100.

Additionally, optionally, or alternatively, subsystems 106 and/or 108 of encoder 100 may perform further analysis of the audio data. This analysis generates metadata indicative of at least one characteristic of the audio data, for inclusion in the encoded bitstream output from stage 107.

Encoder 105 encodes the audio data output from selection stage 104 (e.g., by compressing it). It asserts the encoded audio to stage 107 for inclusion in the encoded bitstream output from stage 107.

Stage 107 multiplexes the encoded audio from encoder 105 with the metadata (including PIM and/or SSM) from generator 106 to produce the encoded bitstream that stage 107 outputs. Preferably, the encoded bitstream has a format specified by a preferred embodiment of the present invention.

Frame buffer 109 is a buffer memory that stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream output from stage 107. A sequence of frames of the encoded audio bitstream is then asserted from buffer 109, as the output of encoder 100, to delivery system 150.

The LPSM that metadata generator 106 generates and stage 107 includes in the encoded bitstream typically indicate two things:

- the loudness processing state of the corresponding audio data (e.g., which type(s) of loudness processing have been performed on it); and
- the loudness of the corresponding audio data (e.g., measured dialogue loudness, gated and/or ungated loudness, and/or dynamic range).

Herein, "gating" of loudness and/or level measurements performed on audio data refers to a specific level or loudness threshold: only computed values that exceed the threshold are included in the final measurement. For example, short-term loudness values below -60 dBFS are ignored in the final measured value. Gating on an absolute value uses a fixed level or loudness threshold. Gating on a relative value uses a threshold that depends on a current "ungated" measurement value.
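Absolute and relative gating can be illustrated with a two-stage gate in the style of ITU-R BS.1770-4. That revision uses a -70 LKFS absolute gate and a relative gate 10 LU below the loudness that remains after absolute gating. Unlike the looser definition above, BS.1770 derives the relative threshold from the absolute-gated loudness, not from a fully ungated value. The parameter names are illustrative, and a fixed threshold such as the -60 figure in the example above can be passed as `absolute_gate=-60.0`.

```python
import math


def gated_loudness(block_loudness, absolute_gate=-70.0, relative_offset=-10.0):
    """Two-stage gating of per-block loudness values given in LKFS."""

    def power_mean(values):
        # Average in the energy domain, then convert back to LKFS.
        p = sum(10 ** ((v + 0.691) / 10) for v in values) / len(values)
        return -0.691 + 10 * math.log10(p)

    # Absolute gate: a fixed threshold.
    above_abs = [v for v in block_loudness if v > absolute_gate]
    if not above_abs:
        return float("-inf")
    # Relative gate: depends on the loudness measured after the absolute gate.
    relative_gate = power_mean(above_abs) + relative_offset
    kept = [v for v in above_abs if v > relative_gate]
    return power_mean(kept)
```

In the usage example `gated_loudness([-20.0, -20.0, -80.0, -35.0])`, the -80 block fails the absolute gate. The -35 block then falls below the relative gate of about -31.7 LKFS, leaving -20.0 LKFS.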

In some implementations of encoder 100, the encoded bitstream buffered in memory 109 (and output to delivery system 150) is an AC-3 or E-AC-3 bitstream. It comprises audio data segments (e.g., segments AB0 to AB5 of the frame shown in FIG. 4), which indicate audio data, and metadata segments, at least some of which each include PIM and/or SSM (and optionally other metadata). Stage 107 inserts the metadata segments (including metadata) into the bitstream in the following format. Each metadata segment that includes PIM and/or SSM is placed in one of these locations:

- a waste bit segment of the bitstream (e.g., waste bit segment "W" shown in FIG. 4 or FIG. 7);
- the "addbsi" field of the bitstream information ("BSI") segment of a frame of the bitstream; or
- an auxiliary data field at the end of a frame of the bitstream (e.g., the AUX segment shown in FIG. 4 or FIG. 7).

A frame of the bitstream may include one or two metadata segments, each of which includes metadata. If the frame includes two metadata segments, one may be present in the frame's addbsi field and the other in its AUX field.

In some embodiments, each metadata segment inserted by stage 107 (sometimes referred to herein as a "container") has a format that includes a metadata segment header, optionally followed by other mandatory or "core" elements, and then one or more metadata payloads. Each payload is identified by a payload header:

- SSM, if present, is included in one of the metadata payloads, which typically has a format of a first type.
- PIM, if present, is included in another metadata payload, which typically has a format of a second type.
- Each other type of metadata, if present, is included in another metadata payload, typically with a format specific to that type.

The exemplary format allows convenient access to the SSM, PIM, and other metadata at times other than during decoding. Such access might come, for example, from a post-processor following decoding, or from a processor configured to recognize the metadata without fully decoding the encoded bitstream. The format also allows convenient and efficient error detection and correction (e.g., of substream identification) during decoding of the bitstream. For example, without access to SSM in the exemplary format, a decoder might misidentify the number of substreams associated with a program. One metadata payload in a metadata segment may include SSM, another may include PIM, and optionally at least one other may include other metadata (e.g., loudness processing state metadata or "LPSM").

In some embodiments, stage 107 includes a substream structure metadata (SSM) payload in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicative of at least one audio program). The SSM payload includes SSM in the following format:

a payload header, typically including at least one identification value (e.g., a 2-bit value indicating the SSM format version, and optionally length, period, count, and substream association values); and, after the header:

independent substream metadata indicating the number of independent substreams of the program indicated by the bitstream; and

dependent substream metadata indicating whether each independent substream of the program has at least one associated dependent substream, and if so, how many dependent substreams are associated with each independent substream of the program.
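The information carried by an SSM payload can be modelled as a small data structure. Only the 2-bit version width comes from the text. The class and method names are hypothetical, and because the text does not specify a bit-level encoding for the substream counts, none is shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubstreamStructure:
    """Hypothetical in-memory form of an SSM payload body."""
    version: int  # 2-bit SSM format version
    # Entry i = number of dependent substreams tied to independent substream i.
    dependents_per_independent: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.version <= 3:
            raise ValueError("SSM format version must fit in 2 bits")
        if any(n < 0 for n in self.dependents_per_independent):
            raise ValueError("dependent substream counts cannot be negative")

    @property
    def independent_count(self) -> int:
        return len(self.dependents_per_independent)

    def has_dependents(self, index: int) -> bool:
        return self.dependents_per_independent[index] > 0

    def total_substreams(self) -> int:
        # The count a decoder needs so it does not misidentify the program.
        return self.independent_count + sum(self.dependents_per_independent)
```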

It is contemplated that an independent substream of an encoded bitstream may indicate a set of speaker channels of an audio program (e.g., the speaker channels of a 5.1 speaker channel audio program). Each of one or more dependent substreams associated with that independent substream (as indicated by the dependent substream metadata) may then indicate an object channel of the program. Typically, however, an independent substream indicates a set of speaker channels of a program, and each associated dependent substream indicates at least one additional speaker channel of the program.

In some embodiments, stage 107 includes a program information metadata (PIM) payload in a frame of an encoded bitstream (e.g., an E-AC-3 bitstream indicative of at least one audio program). The PIM payload has the following format:

a payload header, typically including at least one identification value (e.g., a value indicating the PIM format version, and optionally length, period, count, and substream association values); and, after the header, PIM in the following format:

active channel metadata indicating each silent channel and each non-silent channel of the audio program, i.e., which channels of the program contain audio information and which (if any) contain only silence, typically for the duration of the frame. In embodiments in which the encoded bitstream is an AC-3 or E-AC-3 bitstream, the active channel metadata in a frame may be combined with additional metadata of the bitstream to determine which channels contain audio and which contain silence. That additional metadata may include the audio coding mode ("acmod") field of the frame and, if present, the "chanmap" field in the frame or in the associated dependent substream frame(s). The "acmod" field of an AC-3 or E-AC-3 frame indicates the number of full-range channels of the audio program carried by the frame's audio content (e.g., whether the program is a 1.0-channel mono program, a 2.0-channel stereo program, or a program comprising L, R, C, Ls, Rs full-range channels), or indicates that the frame carries two independent 1.0-channel mono programs. The "chanmap" field of an E-AC-3 bitstream indicates a channel map for a dependent substream indicated by the bitstream. Active channel metadata may be useful for implementing upmixing (in a post-processor) downstream of a decoder, for example to add audio to channels that contain silence at the decoder's output;

downmix processing state metadata indicating whether the program was downmixed (prior to or during encoding), and if so, which type of downmixing was applied. Downmix processing state metadata may be useful for implementing upmixing (in a post-processor) downstream of a decoder, for example to upmix the program's audio content using parameters that most closely match the type of downmixing that was applied. In embodiments in which the encoded bitstream is an AC-3 or E-AC-3 bitstream, the downmix processing state metadata may be combined with the audio coding mode ("acmod") field of the frame to determine which type of downmixing (if any) was applied to the program's channels;

upmix processing state metadata indicating whether the program was upmixed (e.g., from a smaller number of channels) prior to or during encoding, and if so, which type of upmixing was applied. Upmix processing state metadata may be useful for implementing downmixing (in a post-processor) downstream of a decoder. For example, the program's audio content could be downmixed in a manner consistent with the type of upmixing applied to it (e.g., Dolby Pro Logic, Dolby Pro Logic II Movie Mode, Dolby Pro Logic II Music Mode, or Dolby Professional Upmixer). In embodiments in which the encoded bitstream is an E-AC-3 bitstream, the upmix processing state metadata may be combined with other metadata (e.g., the value of the frame's "strmtyp" field) to determine which type of upmixing (if any) was applied to the program's channels. The "strmtyp" field is carried in the BSI segment of each frame of an E-AC-3 bitstream. Its value indicates one of two things about the frame's audio content:

- it belongs to an independent stream (which determines a program) or to an independent substream (of a program that includes, or is associated with, multiple substreams), and may therefore be decoded independently of any other substream indicated by the E-AC-3 bitstream; or
- it belongs to a dependent substream (of a program that includes, or is associated with, multiple substreams), and must therefore be decoded together with the independent substream with which it is associated;

and

preprocessing state metadata indicating whether preprocessing was performed on the frame's audio content (before the audio content was encoded to generate the encoded bitstream), and if so, which type of preprocessing was performed.
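The interplay between active channel metadata and `acmod` can be sketched as follows. The `acmod`-to-channel table and the `strmtyp` meanings follow the AC-3/E-AC-3 specification (ATSC A/52), with channels listed in coded order. The active channel bitmask is a hypothetical encoding assumed for illustration. The LFE channel is signalled separately (by `lfeon`) and is not covered here.

```python
# Full-range channels for each acmod value, in coded order (ATSC A/52).
ACMOD_CHANNELS = {
    0: ("Ch1", "Ch2"),               # 1+1: two independent mono programs
    1: ("C",),                       # 1/0
    2: ("L", "R"),                   # 2/0
    3: ("L", "C", "R"),              # 3/0
    4: ("L", "R", "S"),              # 2/1
    5: ("L", "C", "R", "S"),         # 3/1
    6: ("L", "R", "Ls", "Rs"),       # 2/2
    7: ("L", "C", "R", "Ls", "Rs"),  # 3/2
}

# E-AC-3 strmtyp values.
STRMTYP = {
    0: "independent",
    1: "dependent",
    2: "independent (converted from AC-3)",
    3: "reserved",
}


def silent_channels(acmod: int, active_mask: int):
    """Names of full-range channels flagged as silent.

    active_mask is a hypothetical encoding of the active channel metadata:
    bit i set means the i-th channel of ACMOD_CHANNELS[acmod] carries audio.
    """
    names = ACMOD_CHANNELS[acmod]
    return [name for i, name in enumerate(names) if not (active_mask >> i) & 1]
```

For instance, a 3/2 frame (`acmod` 7) whose mask marks only L, C and R as active reports Ls and Rs as silent. A downstream upmixer could target exactly those channels.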

In some implementations, the preprocessing state metadata indicate:

whether surround attenuation was applied (e.g., whether the surround channels of the audio program were attenuated by 3 dB prior to encoding);

whether a 90° phase shift was applied (e.g., to the surround channels Ls and Rs of the audio program prior to encoding);

whether a low-pass filter was applied to the LFE channel of the audio program prior to encoding;

whether the level of the program's LFE channel was monitored during production, and if so, the monitored level of the LFE channel relative to the level of the program's full-range audio channels;

whether dynamic range compression should be performed (e.g., in the decoder) on each block of the program's decoded audio content, and if so, the type (and/or parameters) of dynamic range compression to be performed. For example, this type of preprocessing state metadata may indicate which of the following compression profile types the encoder assumed when generating the dynamic range compression control values included in the encoded bitstream: Film Standard, Film Light, Music Standard, Music Light, or Speech. Alternatively, it may indicate that heavy dynamic range compression ("compr" compression) should be performed on each frame of the program's decoded audio content, in a manner determined by the dynamic range compression control values included in the encoded bitstream;

whether spectral extension processing and/or channel coupling encoding was used to encode specific frequency ranges of the program content. If so, the metadata give the minimum and maximum frequencies of the frequency components of the content on which spectral extension encoding was performed, and the minimum and maximum frequencies of the frequency components on which channel coupling encoding was performed. This type of preprocessing state metadata may be useful for performing equalization (in a post-processor) downstream of a decoder. Both channel coupling and spectral extension information also help optimize quality during transcoding operations and applications. For example, an encoder may optimize its behavior, including the adaptation of preprocessing steps such as headphone virtualization and upmixing, based on the state of parameters such as spectral extension and channel coupling information. Moreover, the encoder may dynamically adapt its coupling and spectral extension parameters to match, and/or to take on, optimal values based on the state of incoming (and authenticated) metadata; and

whether dialogue enhancement adjustment range data are included in the encoded bitstream. If so, the data give the range of adjustment available during dialogue enhancement processing (e.g., in a post-processor downstream of a decoder), which adjusts the level of dialogue content relative to the level of non-dialogue content in the audio program.
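The preprocessing state metadata listed above could be held in memory in the form shown below. Field names, types, and units are assumptions; the five DRC profile names are those given in the text. The helper that reports the lowest frequency affected by spectral extension or coupling is a hypothetical example of how a downstream equalizer might use these ranges.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class DrcProfile(Enum):
    FILM_STANDARD = "Film Standard"
    FILM_LIGHT = "Film Light"
    MUSIC_STANDARD = "Music Standard"
    MUSIC_LIGHT = "Music Light"
    SPEECH = "Speech"


@dataclass
class PreprocessingState:
    """Hypothetical container for preprocessing state metadata."""
    surround_attenuated_3db: bool = False
    surround_phase_shift_90: bool = False
    lfe_lowpass_applied: bool = False
    lfe_monitor_level_db: Optional[float] = None  # None: LFE not monitored
    drc_profile: Optional[DrcProfile] = None
    spx_range_hz: Optional[Tuple[float, float]] = None       # spectral extension
    coupling_range_hz: Optional[Tuple[float, float]] = None  # channel coupling
    dialogue_enhancement_range_db: Optional[float] = None

    def __post_init__(self) -> None:
        # Each (min, max) frequency range must be ordered.
        for rng in (self.spx_range_hz, self.coupling_range_hz):
            if rng is not None and rng[0] > rng[1]:
                raise ValueError(f"minimum frequency exceeds maximum: {rng}")

    def processed_above_hz(self) -> Optional[float]:
        """Lowest frequency touched by spectral extension or coupling."""
        starts = [r[0] for r in (self.spx_range_hz, self.coupling_range_hz) if r]
        return min(starts) if starts else None
```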

In some implementations, stage 107 includes additional preprocessing state metadata (e.g., metadata indicative of headphone-related parameters) in a PIM payload of the encoded bitstream output from encoder 100.

In some implementations, stage 107 includes an LPSM payload in a frame of the encoded bitstream (e.g., an E-AC-3 bitstream indicative of at least one audio program). The payload contains LPSM in the following format:

a header (typically including a sync word identifying the start of the LPSM payload, followed by at least one identification value, e.g., the LPSM format version, length, period, count, and substream association values indicated in Table 2 below); and

after the header:

at least one dialogue indication value (e.g., parameter "Dialogue Channels" of Table 2) indicating whether the corresponding audio data indicate dialogue (e.g., which channels of the corresponding audio data indicate dialogue);

at least one loudness regulation compliance value (e.g., parameter "Loudness Regulation Type" of Table 2) indicating whether the corresponding audio content complies with an indicated set of loudness regulations;

at least one loudness processing value (e.g., one or more of the parameters "Dialogue Gated Loudness Correction Flag" and "Loudness Correction Type" of Table 2) indicating at least one type of loudness processing that has been performed on the corresponding audio data; and

at least one loudness value (e.g., one or more of the parameters "ITU Relative Gated Loudness," "ITU Speech Gated Loudness," "ITU (EBU 3341) Short-term 3s Loudness," and "True Peak" of Table 2) indicating at least one loudness characteristic (e.g., peak or average loudness) of the corresponding audio data.

In some implementations, each metadata segment that contains PIM and/or SSM (and optionally other metadata) contains a metadata segment header, optionally with additional core elements. The header (or the header and other core elements) is followed by at least one metadata payload segment with the following format:

a payload header, typically including at least one identification value (e.g., SSM or PIM format version, length, period, count, and substream association values); and

after the payload header, the SSM or PIM (or metadata of another type).

In some implementations, stage 107 inserts metadata segments (sometimes referred to herein as "metadata containers" or "containers") into a waste bit/skip field segment (or an "addbsi" field or an auxiliary data field) of a frame of the bitstream. Each such segment has the following format:

a metadata segment header (typically including a sync word identifying the start of the metadata segment, followed by identification values, e.g., the version, length, period, extended element count, and substream association values indicated in Table 1 below); and

after the metadata segment header, at least one protection value (e.g., the HMAC digest and Audio Fingerprint values of Table 1) useful for at least one of decryption, authentication, or validation of at least one of the metadata of the metadata segment or the corresponding audio data; and

also after the metadata segment header, metadata payload identification ("ID") values and payload configuration values. These identify the type of metadata in each following metadata payload and indicate at least one aspect of each payload's configuration (e.g., size).

Each metadata payload follows its corresponding payload ID and payload configuration values.

In some embodiments, each metadata segment in the waste bit segment (or auxiliary data field or "addbsi" field) of a frame has a three-level structure:

a high-level structure (e.g., a metadata segment header), including a flag indicating whether the waste bit (or auxiliary data or addbsi) field includes metadata and at least one ID value indicating which type(s) of metadata are present. It typically also includes a value indicating how many bits of metadata (e.g., of each type) are present, if metadata are present. One type of metadata that could be present is PIM, another is SSM, and others are LPSM, program boundary metadata, and/or media search metadata;

an intermediate-level structure, comprising data associated with each identified type of metadata (e.g., a metadata payload header, protection values, and payload ID and payload configuration values for each identified type of metadata); and

a low-level structure, comprising a metadata payload for each identified type of metadata. Examples are a sequence of PIM values, if PIM is identified as present, and/or metadata values of another type (e.g., SSM or LPSM), if metadata of that type is identified as present.

The data values in such a three-level structure can be nested. For example, the protection value(s) for each payload identified by the high- and intermediate-level structures (e.g., each PIM, SSM, or other metadata payload) can follow that payload, and thus follow the payload's metadata payload header. Alternatively, the protection value(s) for all such metadata payloads can follow the final metadata payload in the metadata segment, and thus follow the metadata payload headers of all payloads in the segment.

In one example (to be described with reference to the metadata segment, or "container", of Figure 8), the metadata segment header identifies four metadata payloads. As shown in Figure 8, the metadata segment header includes a container sync word (identified as "container sync") and version and key ID values. The header is followed by the four metadata payloads and protection bits, in this order: the payload ID value and payload configuration (e.g., payload size) value for the first payload (e.g., a PIM payload), then the first payload itself; the payload ID and configuration values for the second payload (e.g., an SSM payload), then the second payload; the payload ID and configuration values for the third payload (e.g., an LPSM payload), then the third payload; and the payload ID and configuration values for the fourth payload, then the fourth payload. Protection values (identified as "protection data" in Figure 8) for all or some of the payloads (or for the high- and intermediate-level structures and all or some of the payloads) follow the last payload.
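The following Python sketch makes the Figure 8 ordering concrete by serializing a container. Several details are assumptions made for illustration only, because the text does not specify them: the sync word value, the field widths (16-bit sync, 8-bit version and key ID, 8-bit payload ID, 16-bit size), and the use of HMAC-SHA256 over all preceding bytes as the protection data.

```python
import hashlib
import hmac
import struct
from typing import List, Tuple

CONTAINER_SYNC = 0xABCD  # hypothetical sync word value


def build_container(version: int, key_id: int,
                    payloads: List[Tuple[int, bytes]], key: bytes) -> bytes:
    """Lay out a metadata segment in the order shown in Figure 8."""
    # Metadata segment header: container sync, version, key ID.
    out = bytearray(struct.pack(">HBB", CONTAINER_SYNC, version, key_id))
    for payload_id, body in payloads:
        # The payload ID and configuration (size) precede each payload.
        out += struct.pack(">BH", payload_id, len(body))
        out += body
    # Protection data follows the last payload. Here it covers the header,
    # every payload ID/configuration pair, and every payload.
    out += hmac.new(key, bytes(out), hashlib.sha256).digest()
    return bytes(out)
```

Placing a single protection value at the end, rather than one after each payload, corresponds to the second nesting option described above.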

In some implementations, if decoder 101 receives an audio bitstream that carries a cryptographic hash and was generated in accordance with an embodiment of the invention, the decoder is configured to parse and retrieve the cryptographic hash from a data block determined from the bitstream, where the block includes metadata. Verifier 102 may use the cryptographic hash to validate the received bitstream and/or the associated metadata. For example, suppose verifier 102 finds the metadata to be valid because a reference cryptographic hash matches the cryptographic hash retrieved from the data block. It may then disable operation of processor 103 on the corresponding audio data and cause selection stage 104 to pass through the (unchanged) audio data. Additionally, optionally, or alternatively, other types of cryptographic techniques may be used in place of a method based on a cryptographic hash.
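The pass-through decision made by verifier 102 and selection stage 104 can be sketched as follows. This is a minimal illustration that assumes HMAC-SHA256 with a key shared between encoder and decoder. The function name and the returned labels are hypothetical.

```python
import hashlib
import hmac


def select_path(metadata_block: bytes, retrieved_digest: bytes, key: bytes) -> str:
    """Decide whether the selection stage should bypass the processor.

    Assumes the protection value is an HMAC-SHA256 digest over the data
    block, computed with a key shared by encoder and decoder.
    """
    reference = hmac.new(key, metadata_block, hashlib.sha256).digest()
    # compare_digest compares in constant time, avoiding timing leaks.
    if hmac.compare_digest(reference, retrieved_digest):
        return "pass_through"  # metadata valid: forward the audio unchanged
    return "process"  # mismatch: do not trust the metadata; let the processor run
```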

Encoder 100 of Figure 2 may determine (in response to LPSM extracted by decoder 101, and optionally also in response to program boundary metadata) that a post-processing/pre-processing unit has performed a type of loudness processing on the audio data to be encoded (in elements 105, 106, and 107). It may therefore create (in generator 106) loudness processing state metadata that includes the specific parameters used in and/or derived from the previously performed loudness processing. In some implementations, encoder 100 may create metadata indicating the processing history of the audio content (and include it in the encoded bitstream output from the encoder) as long as the encoder knows the types of processing that have been performed on the audio content.

Figure 3 is a block diagram of a decoder (200), which is an embodiment of the inventive audio processing unit, and of a post-processor (300) coupled to the decoder. Post-processor (300) is also an embodiment of the inventive audio processing unit. Any of the components or elements of decoder 200 and post-processor 300 may be implemented as one or more processes and/or one or more circuits (e.g., ASICs, FPGAs, or other integrated circuits), in hardware, software, or a combination of hardware and software. Decoder 200 comprises frame buffer 201, parser 205, audio decoder 202, audio state validation stage (verifier) 203, and control bit generation stage 204, connected as shown. Typically, decoder 200 also includes other processing elements (not shown).

Frame buffer 201 (a buffer memory) stores (e.g., in a non-transitory manner) at least one frame of the encoded audio bitstream received by decoder 200. A sequence of frames of the encoded audio bitstream is asserted from buffer 201 to parser 205.

Parser 205 is coupled and configured to extract PIM and/or SSM (and optionally also other metadata, e.g., LPSM) from each frame of the encoded input audio. It asserts at least some of the metadata (e.g., LPSM and program boundary metadata, if either is extracted, and/or PIM and/or SSM) to audio state verifier 203 and stage 204, and asserts the extracted metadata as output (e.g., to post-processor 300). It also extracts audio data from the encoded input audio and asserts the extracted audio data to decoder 202.

The encoded audio bitstream input to decoder 200 may be an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream.

The system of Figure 3 also includes post-processor 300. Post-processor 300 comprises frame buffer 301 and other processing elements (not shown), including at least one processing element coupled to buffer 301. Frame buffer 301 stores (e.g., in a non-transitory manner) at least one frame of the decoded audio bitstream received by post-processor 300 from decoder 200. The processing elements of post-processor 300 are coupled and configured to receive a sequence of frames of the decoded audio bitstream output from buffer 301. They process these frames adaptively using metadata output from decoder 200 and/or control bits output from stage 204 of decoder 200. Typically, post-processor 300 is configured to perform adaptive processing on the decoded audio data using metadata from decoder 200. For example, it may perform adaptive loudness processing using LPSM values and optionally also program boundary metadata. This adaptive processing may be based on the loudness processing state and/or on one or more audio data characteristics indicated by the LPSM for audio data of a single audio program.

Various implementations of decoder 200 and post-processor 300 are configured to perform different embodiments of the inventive method.

Audio decoder 202 of decoder 200 is configured to decode the audio data extracted by parser 205 to generate decoded audio data, and to assert the decoded audio data as output (e.g., to post-processor 300).

State verifier 203 is configured to authenticate and validate the metadata asserted to it. In some embodiments, the metadata is (or is included in) a data block that has been included in the input bitstream (e.g., in accordance with an embodiment of the invention). The block may include a cryptographic hash (a hash-based message authentication code, or "HMAC") for processing the metadata and/or the underlying audio data (provided to verifier 203 from parser 205 and/or decoder 202). In these embodiments the data block may be digitally signed, so that a downstream audio processing unit can relatively easily authenticate and validate the processing state metadata.
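Because the HMAC may cover both the metadata and the underlying audio data, changing either one should cause validation to fail. The Python sketch below shows one way to bind the two together. It assumes HMAC-SHA256 and a shared key, and the length-prefix framing and function names are hypothetical.

```python
import hashlib
import hmac
import struct


def sign_block(metadata: bytes, audio: bytes, key: bytes) -> bytes:
    # Prefix the metadata with its length, so that bytes cannot be shifted
    # across the metadata/audio boundary without changing the digest.
    msg = struct.pack(">I", len(metadata)) + metadata + audio
    return hmac.new(key, msg, hashlib.sha256).digest()


def validate_block(metadata: bytes, audio: bytes, digest: bytes, key: bytes) -> bool:
    """True only if neither the metadata nor the audio changed since signing."""
    return hmac.compare_digest(sign_block(metadata, audio, key), digest)
```

In this sketch an upstream unit calls `sign_block` when it writes the data block, and each downstream unit calls `validate_block` before trusting the processing state it describes.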

Other cryptographic methods, including but not limited to one or more non-HMAC cryptographic methods, may be used to validate metadata (e.g., in verifier 203) and so ensure secure transmission and reception of the metadata and/or the underlying audio data. For example, each audio processing unit that receives an embodiment of the inventive audio bitstream may perform validation using such a method. The validation determines whether the metadata and corresponding audio data in the bitstream have undergone (and/or resulted from) the specific processing indicated by the metadata, and have not been modified since that processing was performed.

State verifier 203 asserts control data to control bit generator 204, and/or asserts the control data as output (e.g., to post-processor 300), to indicate the result of the validation operation. In response to the control data (and optionally other metadata extracted from the input bitstream), stage 204 may generate (and assert to post-processor 300) either:

control bits indicating that the decoded audio data output from decoder 202 has undergone a specific type of loudness processing (when the LPSM indicates that the audio data output from decoder 202 has undergone that type of loudness processing, and the control bits from verifier 203 indicate that the LPSM is valid); or

control bits indicating that the decoded audio data output from decoder 202 should undergo a specific type of loudness processing (e.g., when the LPSM indicates that the audio data output from decoder 202 has not undergone that type of loudness processing, or when the LPSM indicates that it has but the control bits from verifier 203 indicate that the LPSM is invalid).
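The two cases above reduce to a small decision table. The Python sketch below is illustrative, and the enum and function names are hypothetical. Processing is reported as already done only when the LPSM both claims it and passes validation. In every other case the downstream unit is told to perform it.

```python
from enum import Enum


class ControlBit(Enum):
    ALREADY_PROCESSED = "already_processed"  # downstream may skip this processing
    NEEDS_PROCESSING = "needs_processing"    # downstream should apply it


def control_bit(lpsm_says_processed: bool, lpsm_valid: bool) -> ControlBit:
    """Control bits generated by stage 204 for one type of loudness processing."""
    if lpsm_says_processed and lpsm_valid:
        return ControlBit.ALREADY_PROCESSED
    # Either the LPSM reports that the processing was not performed, or it
    # claims that it was but failed validation and cannot be trusted.
    return ControlBit.NEEDS_PROCESSING
```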

Alternatively, decoder 200 asserts to post-processor 300 both the metadata extracted from the input bitstream by decoder 202 and the metadata extracted by parser 205. Post-processor 300 then either performs adaptive processing on the decoded audio data using the metadata, or first validates the metadata and then, if the validation indicates that the metadata is valid, performs adaptive processing on the decoded audio data using the metadata.

In some embodiments, if decoder 200 receives an audio bitstream that carries a cryptographic hash and was generated in accordance with an embodiment of the invention, the decoder is configured to parse and retrieve the cryptographic hash from a data block determined from the bitstream, the block comprising loudness processing state metadata (LPSM). Verifier 203 may use the cryptographic hash to validate the received bitstream and/or the associated metadata. For example, suppose verifier 203 finds the LPSM to be valid because a reference cryptographic hash matches the cryptographic hash retrieved from the data block. It may then signal a downstream audio processing unit (e.g., post-processor 300, which may be or include a volume leveling unit) to pass through the (unchanged) audio data of the bitstream. Additionally, optionally, or alternatively, other types of cryptographic techniques may be used in place of a method based on a cryptographic hash.

In some implementations of decoder 200, the received encoded bitstream (buffered in memory 201) is an AC-3 bitstream or an E-AC-3 bitstream. It comprises audio data segments (e.g., the AB0 to AB5 segments of the frame shown in Figure 4) and metadata segments. The audio data segments are indicative of audio data, and each of at least some of the metadata segments includes PIM or SSM (or other metadata). Decoder stage 202 (and/or parser 205) is configured to extract the metadata from the bitstream. Each metadata segment that includes PIM and/or SSM (and optionally also other metadata) is carried in one of three places in a frame of the bitstream: a waste bit segment, the "addbsi" field of the Bitstream Information ("BSI") segment, or an auxiliary data field at the end of the frame (e.g., the AUX segment shown in Figure 4). A frame of the bitstream may include one or two metadata segments, each of which includes metadata; if a frame includes two metadata segments, one may be present in the frame's addbsi field and the other in its AUX field.

In some embodiments, each metadata segment (sometimes referred to herein as a "container") of the bitstream buffered in buffer 201 has a format that includes a metadata segment header (and optionally also other mandatory, or "core", elements) followed by one or more metadata payloads. SSM, if present, is included in one of the metadata payloads (identified by a payload header, and typically having a format of a first type). PIM, if present, is included in another of the metadata payloads (identified by a payload header, and typically having a format of a second type). Similarly, each other type of metadata, if present, is included in another of the metadata payloads (identified by a payload header, and typically having a format specific to that type of metadata). The exemplary format allows convenient access to the SSM, PIM, and other metadata at times other than during decoding (e.g., by post-processor 300 after decoding, or by a processor configured to recognize the metadata without performing full decoding of the encoded bitstream). It also allows convenient and efficient error detection and correction (e.g., of substream identification) during decoding of the bitstream. For example, without access to SSM in the exemplary format, decoder 200 might misidentify the number of substreams associated with a program.

One metadata payload in a metadata segment may include SSM, another may include PIM, and optionally at least one other metadata payload in the segment may include other metadata (e.g., loudness processing state metadata, or "LPSM").

In some embodiments, a substream structure metadata (SSM) payload included in a frame of an encoded bitstream buffered in buffer 201 (e.g., an E-AC-3 bitstream indicative of at least one audio program) includes SSM in the following format:

a payload header, typically including at least one identification value (e.g., a 2-bit value indicating the SSM format version, and optionally length, period, count, and substream association values); and

after the header:

dependent substream metadata indicating whether each independent substream of the program has at least one dependent substream associated with it and, if so, the number of dependent substreams associated with each independent substream of the program.
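The following Python sketch shows how such an SSM payload might be read bit by bit. Only the 2-bit version field comes from the text. The 3-bit count of independent substreams and the per-substream 1-bit flag with a 3-bit dependent count are hypothetical widths chosen for illustration, and so are the names `BitReader` and `parse_ssm`.

```python
from typing import Dict, List, Union


class BitReader:
    """Reads big-endian (MSB-first) bit fields from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_ssm(data: bytes) -> Dict[str, Union[int, List[int]]]:
    r = BitReader(data)
    version = r.read(2)        # 2-bit SSM format version, as in the text
    num_indep = r.read(3) + 1  # hypothetical: number of independent substreams
    dependents: List[int] = []
    for _ in range(num_indep):
        has_dep = r.read(1)    # does this independent substream have dependents?
        # Hypothetical 3-bit count, stored minus one, read only when the flag is set.
        dependents.append(r.read(3) + 1 if has_dep else 0)
    return {"version": version, "dependents_per_independent": dependents}
```

For example, the bits `01 001 1 010 0` (padded to the bytes `0x4D 0x00`) decode as version 1, with two independent substreams carrying three and zero dependent substreams.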

In some embodiments, a program information metadata (PIM) payload included in a frame of an encoded bitstream buffered in buffer 201 (e.g., an E-AC-3 bitstream indicative of at least one audio program) has the following format:

a payload header, typically including at least one identification value (e.g., a value indicating the PIM format version, and optionally length, period, count, and substream association values); and, after the header, PIM in the following format:

active channel metadata for each silent channel and each non-silent channel of the audio program (i.e., which channels of the program contain audio information and which channels, if any, contain only silence, typically for the duration of the frame). In embodiments in which the encoded bitstream is an AC-3 or E-AC-3 bitstream, the active channel metadata in a frame of the bitstream may be used together with additional metadata of the bitstream to determine which channels of the program contain audio information and which contain silence. That additional metadata can include the frame's audio coding mode ("acmod") field and, if present, the chanmap field in the frame or in associated dependent substream frames;

downmix processing state metadata indicating whether the program was downmixed (before or during encoding) and, if so, the type of downmix that was applied. Downmix processing state metadata may be useful for implementing upmixing downstream of the decoder (e.g., in post-processor 300), for example to upmix the program's audio content using parameters that most closely match the type of downmix that was applied. In embodiments in which the encoded bitstream is an AC-3 or E-AC-3 bitstream, the downmix processing state metadata may be used together with the frame's audio coding mode ("acmod") field to determine the type of downmix (if any) applied to the program's channels;

upmix processing state metadata indicating whether the program was upmixed (e.g., from a smaller number of channels) before or during encoding and, if so, the type of upmix that was applied. Upmix processing state metadata may be useful for implementing downmixing downstream of the decoder (in a post-processor). For example, the program's audio content could be downmixed in a manner consistent with the type of upmix applied to it (e.g., Dolby Pro Logic, Dolby Pro Logic II Movie Mode, Dolby Pro Logic II Music Mode, or Dolby Professional Upmixer). In embodiments in which the encoded bitstream is an E-AC-3 bitstream, the upmix processing state metadata may be used together with other metadata (e.g., the value of the frame's "strmtyp" field) to determine the type of upmix (if any) applied to the program's channels. The value of the "strmtyp" field (in the BSI segment of a frame of an E-AC-3 bitstream) indicates one of two cases. In the first, the frame's audio content belongs to an independent stream (which determines a program) or to an independent substream (of a program that includes or is associated with multiple substreams), and can therefore be decoded independently of any other substream indicated by the E-AC-3 bitstream. In the second, the frame's audio content belongs to a dependent substream (of a program that includes or is associated with multiple substreams) and must therefore be decoded together with the independent substream with which it is associated; and

preprocessing state metadata indicating whether preprocessing was performed on the frame's audio content (before the audio content was encoded to generate the encoded bitstream) and, if so, the type of preprocessing performed. For example, the preprocessing state metadata may indicate at least one of the following:

whether surround attenuation was applied (e.g., whether the surround channels of the audio program were attenuated by 3 dB before encoding);

whether a 90° phase shift was applied (e.g., to the surround channels Ls and Rs of the audio program before encoding);

whether a low-pass filter was applied to the LFE channel of the audio program before encoding;

whether the level of the program's LFE channel was monitored during production and, if so, the monitored level of the LFE channel relative to the level of the program's full-range audio channels;

whether dynamic range compression should be performed (e.g., in the decoder) on each block of the program's decoded audio and, if so, the type (and/or parameters) of the dynamic range compression to be performed. For example, this type of preprocessing state metadata may indicate which compression profile the encoder assumed when generating the dynamic range compression control values included in the encoded bitstream: Film Standard, Film Light, Music Standard, Music Light, or Speech. Alternatively, this type of preprocessing state metadata may indicate that heavy dynamic range compression ("compr" compression) should be performed on each frame of the program's decoded audio content, in a manner determined by the dynamic range compression control values included in the encoded bitstream;

whether spectral extension and/or channel coupling encoding was used to encode content of the program in specific frequency ranges. If so, it may indicate the minimum and maximum frequencies of the frequency components of the content on which spectral extension encoding was performed, and the minimum and maximum frequencies of the frequency components of the content on which channel coupling encoding was performed. This type of preprocessing state metadata may be useful for performing equalization downstream of the decoder (in a post-processor). Both the channel coupling and the spectral extension information are also useful for optimizing quality during transcoding operations and applications. For example, an encoder may optimize its behavior (including the adaptation of preprocessing steps such as headphone virtualization, upmixing, etc.) based on the state of parameters such as the spectral extension and channel coupling information. Moreover, the encoder may dynamically adjust its coupling and spectral extension parameters to match or move toward optimal values, based on the state of the incoming (and authenticated) metadata; and

whether dialogue enhancement adjustment range data is included in the encoded bitstream and, if so, the range of adjustment available when performing dialogue enhancement processing (e.g., in a post-processor downstream of the decoder), that is, processing that adjusts the level of dialogue content relative to the level of non-dialogue content in the audio program.
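The active channel metadata and the strmtyp field described above can be interpreted together with standard AC-3/E-AC-3 BSI fields. The Python sketch below uses the AC-3 acmod channel layouts and the E-AC-3 strmtyp values (Type 0 independent, Type 1 dependent, Type 2 independent converted from AC-3, 3 reserved). The PIM active-channel encoding is a hypothetical choice made for illustration: one bit per channel, MSB first, with LFE last. The sketch also ignores chanmap.

```python
from typing import Dict, List

# AC-3 audio coding mode (acmod) -> full-range channels, in bitstream order.
ACMOD_CHANNELS: Dict[int, List[str]] = {
    0: ["Ch1", "Ch2"],  # 1+1 (dual mono)
    1: ["C"],
    2: ["L", "R"],
    3: ["L", "C", "R"],
    4: ["L", "R", "S"],
    5: ["L", "C", "R", "S"],
    6: ["L", "R", "Ls", "Rs"],
    7: ["L", "C", "R", "Ls", "Rs"],
}

# Values of the 2-bit E-AC-3 strmtyp field.
STRMTYP = {0: "independent", 1: "dependent",
           2: "independent (converted from AC-3)", 3: "reserved"}


def classify_channels(acmod: int, lfeon: bool, active_mask: int) -> Dict[str, List[str]]:
    """Split a program's channels into active and silent ones.

    active_mask is a hypothetical PIM encoding: one bit per channel in the
    order above (LFE last), MSB first; 1 means the channel carries audio.
    """
    channels = list(ACMOD_CHANNELS[acmod]) + (["LFE"] if lfeon else [])
    n = len(channels)
    active: List[str] = []
    silent: List[str] = []
    for i, name in enumerate(channels):
        bit = (active_mask >> (n - 1 - i)) & 1
        (active if bit else silent).append(name)
    return {"active": active, "silent": silent}


def must_decode_with_independent(strmtyp: int) -> bool:
    """True when the frame belongs to a dependent substream."""
    return strmtyp == 1
```

For instance, `classify_channels(7, True, 0b111100)` marks L, C, R, and Ls as active and Rs and LFE as silent.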

In some embodiments, an LPSM payload included in a frame of an encoded bitstream buffered in buffer 201 (e.g., an E-AC-3 bitstream indicative of at least one audio program) includes LPSM in the following format:

a header (typically including a sync word identifying the start of the LPSM payload, followed by at least one identification value, e.g., the LPSM format version, length, period, count, and substream association values indicated in Table 2 below); and

after the header:

at least one dialogue indication value (e.g., the parameter "Dialogue Channels" of Table 2) indicating whether or not the corresponding audio data indicates dialogue (e.g., which channels of the corresponding audio data indicate dialogue);

at least one loudness regulation compliance value (e.g., the parameter "Loudness Regulation Type" of Table 2) indicating whether the corresponding audio content complies with an indicated set of loudness regulations;

at least one loudness processing value (e.g., one or more of the parameters "Dialogue Gated Loudness Correction Flag" and "Loudness Correction Type" of Table 2) indicating at least one type of loudness processing that has been performed on the corresponding audio data; and

In some implementations, parser 205 (and/or decoder stage 202) is configured to extract each metadata segment from a waste bit segment, an "addbsi" field, or an auxiliary data segment of a frame of the bitstream, where each such metadata segment has the following format:

a metadata segment header (typically including a sync word identifying the start of the metadata segment, followed by identification values such as version, length, period, extended element count, and substream association values); and

after the metadata segment header, at least one protection value (e.g., the HMAC digest and audio fingerprint values of Table 1). This value is useful for decrypting, authenticating, or validating the metadata of the metadata segment and/or the corresponding audio data; and

also after the metadata segment header, metadata payload identification ("ID") values and payload configuration values that identify the type of metadata in each following metadata payload and indicate at least one aspect of the configuration (e.g., size) of each such payload.

Each metadata payload segment (preferably having the format specified above) follows the corresponding metadata payload ID value and payload configuration value.
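Each payload is preceded by its ID and size, so a processor can walk a segment payload by payload without decoding any audio. The following Python sketch is a minimal parser for a hypothetical layout, and every field value and width in it is an assumption made for illustration. The layout is: a 16-bit sync word, 8-bit version and key ID, then an 8-bit payload ID and a 16-bit size in front of each payload, with a 32-byte protection value at the end.

```python
import struct
from typing import Dict, Tuple

CONTAINER_SYNC = 0xABCD  # hypothetical sync word
DIGEST_LEN = 32          # hypothetical: HMAC-SHA256 protection value


def parse_container(data: bytes) -> Tuple[int, Dict[int, bytes], bytes]:
    """Return (version, {payload_id: payload}, protection_value)."""
    sync, version, _key_id = struct.unpack_from(">HBB", data, 0)
    if sync != CONTAINER_SYNC:
        raise ValueError("not a metadata segment")
    end = len(data) - DIGEST_LEN  # protection data follows the last payload
    payloads: Dict[int, bytes] = {}
    offset = 4
    while offset < end:
        payload_id, size = struct.unpack_from(">BH", data, offset)
        offset += 3
        # The size lets the parser jump straight to the next payload.
        payloads[payload_id] = data[offset:offset + size]
        offset += size
    if offset != end:
        raise ValueError("payload sizes do not match segment length")
    return version, payloads, data[end:]
```

A downstream unit that needs only SSM could look up its payload ID in the returned dictionary and ignore every other payload.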

More generally, the encoded audio bitstream generated by preferred embodiments of the invention has a structure that provides a mechanism for labeling metadata elements and sub-elements as core (mandatory) or extended (optional). This allows the data rate of the bitstream (including its metadata) to scale across a wide range of applications. The core (mandatory) elements of the preferred bitstream syntax should also be able to signal that extended (optional) elements associated with the audio content are present in-band and/or at a remote location (out-of-band).

Core elements must be present in every frame of the bitstream. Some sub-elements of the core elements are optional and may be present in any combination. Extended elements are not required in every frame (to limit bitrate overhead), so an extended element may be present in some frames and not in others. Some sub-elements of an extended element are optional and may be present in any combination. Other sub-elements of an extended element may be mandatory, meaning they must be present whenever the extended element is present in a frame of the bitstream.
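The presence rules above can be checked per frame. The Python sketch below is illustrative only. The element names are supplied by the caller because the text does not list them, and representing elements as dictionaries is an assumption.

```python
from typing import Dict, Iterable, Optional


def frame_is_well_formed(core: Optional[Dict[str, object]],
                         extension: Optional[Dict[str, object]],
                         core_required: Iterable[str],
                         ext_required: Iterable[str]) -> bool:
    """Check the core/extension presence rules for one frame.

    core_required and ext_required name the mandatory sub-elements; these
    names are hypothetical, as the text does not enumerate them.
    """
    if core is None:
        return False  # core elements must be present in every frame
    if not set(core_required).issubset(core):
        return False
    if extension is None:
        return True   # extended elements may be absent from some frames
    # Mandatory sub-elements of an extension apply only when it is present.
    return set(ext_required).issubset(extension)
```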

In a class of embodiments, an audio processing unit that embodies the invention (for example) generates an encoded audio bitstream comprising a sequence of audio data segments and metadata segments. The audio data segments are indicative of audio data, and each of at least some of the metadata segments includes PIM and/or SSM (and optionally at least one other type of metadata). The audio data segments are time-division multiplexed with the metadata segments. In preferred embodiments in this class, each of the metadata segments has a preferred format to be described herein.
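The following Python generator is a simplified picture of time-division multiplexing for one stream. It assumes that each frame contributes at most one metadata segment, emitted before that frame's audio segments. That ordering is meant to resemble carriage in the BSI's addbsi field, but it is an illustrative assumption, and real AC-3/E-AC-3 frames contain further fields that the sketch omits.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Segment = Tuple[str, bytes]  # ("audio" | "metadata", segment bytes)


def multiplex(frames: Iterable[Tuple[List[bytes], Optional[bytes]]]) -> Iterator[Segment]:
    """Interleave per-frame metadata segments with audio data segments.

    Each input item is (audio_blocks, metadata_segment_or_None) for one frame.
    """
    for audio_blocks, metadata in frames:
        if metadata is not None:
            yield ("metadata", metadata)
        for block in audio_blocks:
            yield ("audio", block)
```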

In one preferred format, the encoded bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes SSM and/or PIM is included (e.g., by stage 107 of a preferred implementation of encoder 100) as additional bitstream information in the "addbsi" field (shown in FIG. 6) of the Bitstream Information ("BSI") segment of a frame of the bitstream, in the auxiliary data field of a frame of the bitstream, or in the waste bit segment of a frame of the bitstream.

In the preferred format, each frame includes a metadata segment (sometimes referred to herein as a metadata container, or container) in the waste bit segment (or addbsi field) of the frame. The metadata segment has the mandatory elements shown in Table 1 below (collectively referred to as the "core element") and may include the optional elements shown in Table 1. At least some of the required elements shown in Table 1 are included in the metadata segment header of the metadata segment, but some may be included elsewhere in the metadata segment:

Table 1

In the preferred format, each metadata segment (in the waste bit segment, addbsi field, or auxiliary data field of a frame of an encoded bitstream) that contains SSM, PIM, or LPSM contains a metadata segment header (and optionally also additional core elements), followed by one or more metadata payloads after the metadata segment header (or after the metadata segment header and other core elements). Each metadata payload includes a metadata payload header (indicating the specific type of metadata, e.g., SSM, PIM, or LPSM, included in the payload), followed by metadata of that specific type. Typically, the metadata payload header includes the following values (parameters):

a payload ID (identifying the type of metadata, e.g., SSM, PIM, or LPSM) following the metadata segment header (which may include the values specified in Table 1);

a payload configuration value (typically indicating the size of the payload) following the payload ID;

and optionally also additional payload configuration values (e.g., an offset value indicating the number of audio samples from the start of the frame to the first audio sample to which the payload pertains, and a payload priority value, e.g., indicating a condition under which the payload may be discarded).
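To make the container layout concrete, the sketch below builds and walks a metadata segment. The byte-aligned layout, the sync word, the payload ID values, and the terminator are all hypothetical (the real field widths are given in Table 1, which is not reproduced here), and the optional offset and priority values are omitted. The point illustrated is that a payload configuration value giving the payload size lets a parser skip payload types it does not recognize.

```python
from __future__ import annotations

import struct

SYNC_WORD = 0xA5A5                               # hypothetical segment sync word
PAYLOAD_NAMES = {1: "SSM", 2: "PIM", 3: "LPSM"}  # hypothetical payload ID values
END_OF_PAYLOADS = 0                              # hypothetical terminator ID


def build_segment(version: int, payloads: list[tuple[int, bytes]]) -> bytes:
    """Serialize a segment header followed by (payload ID, size, body) records."""
    out = bytearray(struct.pack(">HB", SYNC_WORD, version))   # segment header
    for payload_id, body in payloads:
        out += struct.pack(">BH", payload_id, len(body))      # payload ID + size value
        out += body
    out += struct.pack(">B", END_OF_PAYLOADS)
    return bytes(out)


def parse_segment(data: bytes) -> tuple[int, dict[str, bytes]]:
    """Return the segment version and the bodies of all recognized payloads."""
    sync, version = struct.unpack_from(">HB", data, 0)
    if sync != SYNC_WORD:
        raise ValueError("no metadata segment sync word")
    pos, found = 3, {}
    while True:
        (payload_id,) = struct.unpack_from(">B", data, pos)
        pos += 1
        if payload_id == END_OF_PAYLOADS:
            return version, found
        (size,) = struct.unpack_from(">H", data, pos)
        pos += 2
        body, pos = data[pos:pos + size], pos + size   # size allows skipping unknown payloads
        name = PAYLOAD_NAMES.get(payload_id)
        if name is not None:
            found[name] = body
```

For example, a segment carrying a PIM payload, a payload with an unknown ID, and an SSM payload parses to the PIM and SSM bodies only; the unknown payload is stepped over using its size.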

Typically, the metadata of the payload has one of the following formats:

the metadata of the payload is SSM, including independent substream metadata indicative of the number of independent substreams of the program indicated by the bitstream, and dependent substream metadata indicative of whether each independent substream of the program has at least one dependent substream associated with it and, if so, the number of dependent substreams associated with each independent substream of the program;

the metadata of the payload is PIM, including active channel metadata indicative of which channels of an audio program contain audio information and which channels (if any) contain only silence (typically for the duration of the frame); downmix processing state metadata indicative of whether the program was downmixed (before or during encoding) and, if so, the type of downmixing that was applied; upmix processing state metadata indicative of whether the program was upmixed (e.g., from a smaller number of channels) before or during encoding and, if so, the type of upmixing that was applied; and preprocessing state metadata indicative of whether preprocessing was performed on the audio data of the frame (before the audio content was encoded to generate the encoded bitstream) and, if so, the type of preprocessing that was performed; or

the metadata of the payload is LPSM, having the format indicated in the following table (Table 2):

Table 2
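The SSM described above can be derived from the order of substreams in a frame. In this hypothetical sketch each substream is reduced to "I" (independent) or "D" (dependent), and each dependent substream is assumed to belong to the most recent independent substream; neither the representation nor the association rule is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SubstreamStructure:
    num_independent: int                   # independent substream metadata
    dependents_per_independent: list[int]  # dependent substream metadata, one count per independent

    def has_dependents(self, i: int) -> bool:
        return self.dependents_per_independent[i] > 0


def derive_ssm(substream_kinds: list[str]) -> SubstreamStructure:
    """Build SSM from a frame-order list of 'I'/'D' substream markers."""
    counts: list[int] = []
    for kind in substream_kinds:
        if kind == "I":
            counts.append(0)               # start a new independent substream
        elif kind == "D":
            if not counts:
                raise ValueError("dependent substream before any independent substream")
            counts[-1] += 1                # attach to the most recent independent substream
        else:
            raise ValueError(f"unknown substream kind {kind!r}")
    return SubstreamStructure(len(counts), counts)
```

For instance, one independent substream followed by one dependent substream and then a second independent substream yields two independent substreams with dependent counts `[1, 0]`.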

In another preferred format of an encoded bitstream generated in accordance with the invention, the bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes PIM and/or SSM (and optionally also at least one other type of metadata) is included (e.g., by stage 107 of a preferred implementation of encoder 100) in any of the following: a waste bit segment of a frame of the bitstream; the "addbsi" field (shown in FIG. 6) of the Bitstream Information ("BSI") segment of a frame of the bitstream; or an auxiliary data field at the end of a frame of the bitstream (e.g., the AUX segment shown in FIG. 4). A frame may include one or two metadata segments, each of which includes PIM and/or SSM, and (in some embodiments) if the frame includes two metadata segments, one may be present in the addbsi field of the frame and the other in the AUX field of the frame. Each metadata segment preferably has the format specified above with reference to Table 1 (i.e., it includes the core elements specified in Table 1, followed by a payload ID value identifying the type of metadata in each payload of the metadata segment and payload configuration values, followed by each metadata payload). Each metadata segment that includes LPSM preferably has the format specified above with reference to Tables 1 and 2 (i.e., it includes the core elements specified in Table 1, followed by a payload ID identifying the metadata as LPSM and payload configuration values, followed by the payload: LPSM data having the format indicated in Table 2).

In another preferred format, the encoded bitstream is a Dolby E bitstream, and each metadata segment that includes PIM and/or SSM (and optionally also other metadata) occupies the first N sample locations of the Dolby E guard band interval. A Dolby E bitstream that includes such a metadata segment containing LPSM preferably includes a value indicative of the LPSM payload length, signaled in the Pd word of the SMPTE 337M preamble (the SMPTE 337M Pa word repetition rate preferably remains identical to the associated video frame rate).

In a preferred format in which the encoded bitstream is an E-AC-3 bitstream, each metadata segment that includes PIM and/or SSM (and optionally also LPSM and/or other metadata) is included (e.g., by stage 107 of a preferred implementation of encoder 100) as additional bitstream information in the waste bit segment, or in the "addbsi" field of the Bitstream Information ("BSI") segment, of a frame of the bitstream. Additional aspects of encoding an E-AC-3 bitstream with LPSM in this preferred format are described next:

1. During generation of an E-AC-3 bitstream, while the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active," the bitstream should include, for every frame (syncframe) generated, a metadata block (including LPSM) carried in the addbsi field (or waste bit segment) of the frame. The bits required to carry the metadata block should not increase the encoder bitrate (frame length);

2. Every metadata block (containing LPSM) should contain the following information:

Loudness correction type flag: "1" indicates that the loudness of the corresponding audio data was corrected upstream of the encoder, and "0" indicates that the loudness was corrected by a loudness corrector embedded in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2);

Speech channels: indicates which source channels contain speech (over the previous 0.5 seconds). If no speech is detected, this should be indicated;

Speech loudness: indicates the integrated speech loudness of each corresponding audio channel that contains speech (over the previous 0.5 seconds);

ITU loudness: indicates the integrated ITU-R BS.1770-3 loudness of each corresponding audio channel; and

Gain: loudness composite gain(s) for reversal in a decoder (to demonstrate reversibility);

3. While the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active" and is receiving AC-3 frames with a "trust" flag, the encoder's loudness controller (e.g., loudness processor 103 of encoder 100 of FIG. 2) should be bypassed. The "trusted" source dialogue normalization and DRC values should be passed through (e.g., by generator 106 of encoder 100) to the E-AC-3 encoder component (e.g., stage 107 of encoder 100). LPSM block generation continues, and the loudness correction type flag is set to "1". The loudness controller bypass sequence must be synchronized to the start of the decoded AC-3 frame in which the "trust" flag appears. The bypass sequence should be implemented as follows: the leveler amount control is decremented from a value of 9 to a value of 0 over 10 audio block periods (i.e., 53.3 ms), and the leveler back-end meter control is placed in bypass mode (this operation should result in a seamless transition). A "trusted" bypass of the leveler implies that the source bitstream's dialogue normalization value is also reused at the encoder output (for example, if the "trusted" source bitstream has a dialogue normalization value of -30, the encoder output should use -30 as its outgoing dialogue normalization value);

4. While the E-AC-3 encoder (which inserts the LPSM values into the bitstream) is "active" and is receiving AC-3 frames without a "trust" flag, the loudness controller embedded in the encoder (e.g., loudness processor 103 of encoder 100 of FIG. 2) should be active. LPSM block generation continues, and the loudness correction type flag is set to "0". The loudness controller activation sequence should be synchronized to the start of the decoded AC-3 frame in which the "trust" flag disappears. The activation sequence should be implemented as follows: the leveler amount control is incremented from a value of 0 to a value of 9 over 1 audio block period (i.e., 5.3 ms), and the leveler back-end meter control is placed in "active" mode (this operation should result in a seamless transition and include a back-end meter integration reset); and

5. During encoding, a graphical user interface (GUI) should indicate the following parameters to the user: "Input Audio Program: [Trusted/Untrusted]", whose state is based on the presence of the "trust" flag within the input signal; and "Real-time Loudness Correction: [Enabled/Disabled]", whose state is based on whether the loudness controller embedded in the encoder is active.
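The trust-flag handling in items 3 and 4 can be sketched as a small decision function. The endpoints (9 and 0), block counts (10 and 1), flag values, and dialogue normalization pass-through come from the text; the linear shape of the ramp, the 48 kHz sample rate, and all names are assumptions made for illustration. An AC-3/E-AC-3 audio block is 256 samples, which is where the 53.3 ms and 5.3 ms figures come from.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

AC3_BLOCK_SAMPLES = 256   # samples per AC-3 / E-AC-3 audio block
SAMPLE_RATE_HZ = 48_000   # assumed sample rate


def ramp_schedule(start: float, end: float, num_blocks: int) -> list[float]:
    """Leveler amount at the end of each audio block of a transition.

    Only the endpoints and the block count come from the text; the
    linear shape of the ramp is an assumption."""
    return [round(start + (end - start) * (i + 1) / num_blocks, 3)
            for i in range(num_blocks)]


def ramp_duration_ms(num_blocks: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Length of a ramp spanning num_blocks audio blocks, in milliseconds."""
    return 1000.0 * num_blocks * AC3_BLOCK_SAMPLES / sample_rate_hz


@dataclass
class LoudnessDecision:
    loudness_correction_type_flag: int   # 1 = corrected upstream, 0 = corrected in the encoder
    loudness_controller_active: bool     # False = embedded loudness controller bypassed
    output_dialnorm: Optional[int]       # trusted value passed through; None = encoder decides
    leveler_ramp: list[float]            # leveler amount per block during the transition


def on_trust_change(trusted: bool, source_dialnorm: int) -> LoudnessDecision:
    """Decision applied at the start of the decoded AC-3 frame where the
    trust flag appears (trusted=True) or disappears (trusted=False)."""
    if trusted:
        # Bypass: ramp 9 -> 0 over 10 blocks and reuse the trusted dialnorm.
        return LoudnessDecision(1, False, source_dialnorm, ramp_schedule(9, 0, 10))
    # Activation: ramp 0 -> 9 over one block; the encoder corrects loudness itself.
    return LoudnessDecision(0, True, None, ramp_schedule(0, 9, 1))
```

With these assumptions, `ramp_duration_ms(10)` evaluates to about 53.33 ms and `ramp_duration_ms(1)` to about 5.33 ms, matching the block periods quoted above.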

When decoding an AC-3 or E-AC-3 bitstream that has LPSM (in the preferred format) included in the waste bit segment, the skip field segment, or the "addbsi" field of the Bitstream Information ("BSI") segment of each frame, the decoder should parse the LPSM block data (in the waste bit segment or addbsi field) and pass all of the extracted LPSM values to a graphical user interface (GUI). The set of extracted LPSM values is refreshed every frame.

In another preferred format of an encoded bitstream generated in accordance with the invention, the encoded bitstream is an AC-3 bitstream or an E-AC-3 bitstream, and each metadata segment that includes PIM and/or SSM (and optionally also LPSM and/or other metadata) is included (e.g., by stage 107 of a preferred implementation of encoder 100) in the waste bit segment or AUX segment of a frame of the bitstream, or as additional bitstream information in the "addbsi" field (shown in FIG. 6) of the Bitstream Information ("BSI") segment of a frame. In this format (a variation of the format described above with reference to Tables 1 and 2), each addbsi (or AUX or waste bit) field that contains LPSM contains the following LPSM values:

the core elements specified in Table 1, followed by a payload ID (identifying the metadata as LPSM) and payload configuration values, followed by the payload (LPSM data), which has the following format (similar to the mandatory elements indicated in Table 2 above):

LPSM payload version: a 2-bit field that indicates the version of the LPSM payload;

dialchan: a 3-bit field that indicates whether the left, right, and/or center channels of the corresponding audio data contain spoken dialogue. The bit allocation of the dialchan field may be as follows: bit 0, which indicates the presence of dialogue in the left channel, is stored in the most significant bit of the dialchan field, and bit 2, which indicates the presence of dialogue in the center channel, is stored in the least significant bit of the dialchan field. Each bit of the dialchan field is set to "1" if the corresponding channel contained spoken dialogue during the preceding 0.5 seconds of the program;

loudregtyp: a 4-bit field that indicates which loudness regulation standard the program loudness complies with. Setting the "loudregtyp" field to "0000" indicates that the LPSM does not indicate loudness regulation compliance. For example, one value of this field (e.g., 0000) may indicate that compliance with a loudness regulation standard is not indicated, another value (e.g., 0001) may indicate that the program's audio data complies with the ATSC A/85 standard, and another value (e.g., 0010) may indicate that the program's audio data complies with the EBU R128 standard. In this example, if the field is set to any value other than "0000", the loudcorrdialgat and loudcorrtyp fields should follow in the payload;

loudcorrdialgat: a 1-bit field that indicates whether dialogue-gated loudness correction has been applied. If the program's loudness has been corrected using dialogue gating, the value of the loudcorrdialgat field is set to "1"; otherwise, it is set to "0";

loudcorrtyp: a 1-bit field that indicates the type of loudness correction applied to the program. If the program's loudness has been corrected with an infinite look-ahead (file-based) loudness correction process, the value of the loudcorrtyp field is set to "0". If the program's loudness has been corrected using a combination of real-time loudness measurement and dynamic range control, the value of this field is set to "1";

loudrelgate: a 1-bit field that indicates whether relative gated program loudness (ITU) data is present. If the loudrelgate field is set to "1", a 7-bit loudrelgat field should follow in the payload;

loudrelgat: a 7-bit field that indicates relative gated program loudness (ITU). This field indicates the integrated loudness of the audio program, measured according to ITU-R BS.1770-3 without any gain adjustments due to dialogue normalization and dynamic range compression (DRC) being applied. Values from 0 to 127 are interpreted as -58 LKFS to +5.5 LKFS in 0.5 LKFS steps;

loudspchgate: a 1-bit field that indicates whether speech gated loudness data (ITU) is present. If the loudspchgate field is set to "1", a 7-bit loudspchgat field should follow in the payload;

loudspchgat: a 7-bit field that indicates speech gated program loudness. This field indicates the integrated loudness of the entire corresponding audio program, measured according to formula (2) of ITU-R BS.1770-3 without any gain adjustments due to dialogue normalization and dynamic range compression being applied. Values from 0 to 127 are interpreted as -58 LKFS to +5.5 LKFS in 0.5 LKFS steps;

loudstrm3e: a 1-bit field that indicates whether short-term (3-second) loudness data is present. If the field is set to "1", a 7-bit loudstrm3s field should follow in the payload;

loudstrm3s: a 7-bit field that indicates the ungated loudness of the preceding 3 seconds of the corresponding audio program, measured according to ITU-R BS.1771-1 without any gain adjustments due to dialogue normalization and dynamic range compression being applied. Values from 0 to 255 are interpreted as -116 LKFS to +11.5 LKFS in 0.5 LKFS steps;

truepke: a 1-bit field that indicates whether true peak loudness data is present. If the truepke field is set to "1", an 8-bit truepk field should follow in the payload; and

truepk: an 8-bit field that indicates the true peak sample value of the program, measured according to Annex 2 of ITU-R BS.1770-3 without any gain adjustments due to dialogue normalization and dynamic range compression being applied. Values from 0 to 255 are interpreted as -116 LKFS to +11.5 LKFS in 0.5 LKFS steps.
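As an illustration of how a decoder might walk this payload, the sketch below reads the fields in the order listed above, honoring each presence flag. Assumptions: fields are packed MSB-first with no padding between them, the middle dialchan bit is the right channel (the text only places left and center), and the loudregtyp names cover only the example values given above. loudstrm3s is read with the stated 7-bit width, even though the code range quoted for it exceeds what 7 bits can hold. `pack_fields` is a hypothetical helper for building test payloads.

```python
from __future__ import annotations


class BitReader:
    """Reads unsigned fields MSB-first from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit offset

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def pack_fields(fields: list[tuple[int, int]]) -> bytes:
    """Pack (value, width) pairs MSB-first, zero-padding the last byte."""
    acc, nbits = 0, 0
    for value, width in fields:
        acc = (acc << width) | (value & ((1 << width) - 1))
        nbits += width
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


# Example loudregtyp values taken from the text; other codes are left uninterpreted.
LOUDREGTYP = {0b0000: "not indicated", 0b0001: "ATSC A/85", 0b0010: "EBU R128"}


def parse_lpsm(data: bytes) -> dict:
    """Decode the LPSM payload fields in the order listed in the text."""
    r = BitReader(data)
    out: dict = {"version": r.read(2)}
    dialchan = r.read(3)
    out["dialog"] = {                        # MSB = left, LSB = center
        "left": bool(dialchan & 0b100),
        "right": bool(dialchan & 0b010),     # assumed position of the right channel
        "center": bool(dialchan & 0b001),
    }
    loudregtyp = r.read(4)
    out["loudregtyp"] = LOUDREGTYP.get(loudregtyp, f"other ({loudregtyp:04b})")
    if loudregtyp != 0:                      # present only when compliance is indicated
        out["loudcorrdialgat"] = r.read(1)
        out["loudcorrtyp"] = "real-time + DRC" if r.read(1) else "file-based"
    if r.read(1):                            # loudrelgate
        out["loudrelgat_lkfs"] = -58 + 0.5 * r.read(7)
    if r.read(1):                            # loudspchgate
        out["loudspchgat_lkfs"] = -58 + 0.5 * r.read(7)
    if r.read(1):                            # loudstrm3e
        out["loudstrm3s_lkfs"] = -116 + 0.5 * r.read(7)   # 7 bits as stated
    if r.read(1):                            # truepke
        out["truepk"] = -116 + 0.5 * r.read(8)
    return out
```

For example, a payload with dialchan `101`, loudregtyp `0001`, and a loudrelgat code of 92 decodes to dialogue in the left and center channels, ATSC A/85 compliance, and a relative gated loudness of -58 + 0.5 × 92 = -12 LKFS.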

In some embodiments, the core element of a metadata segment in the waste bit segment or auxiliary data (or "addbsi") field of a frame of an AC-3 or E-AC-3 bitstream comprises a metadata segment header (typically including identification values, e.g., a version) and, after the metadata segment header: a value indicating whether fingerprint data (or other protection values) is included for the metadata of the metadata segment; a value indicating whether external data (related to the audio data corresponding to the metadata of the metadata segment) exists; a payload ID value and payload configuration values for each type of metadata (e.g., PIM and/or SSM and/or LPSM and/or metadata of another type) identified by the core element; and protection values for at least one type of metadata identified by the metadata segment header (or by other core elements of the metadata segment). The metadata payloads of the metadata segment follow the metadata segment header and are (in some cases) nested within the core elements of the metadata segment.
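The text does not say how protection values are computed. As one possible scheme, assumed here purely for illustration, a keyed hash (HMAC-SHA256 from Python's standard `hmac` and `hashlib` modules) can be computed over the metadata and the audio data it describes, and a downstream unit holding the same key can check it. The function names and the length-prefix framing are hypothetical.

```python
import hashlib
import hmac


def protection_value(key: bytes, metadata: bytes, audio: bytes) -> bytes:
    """HMAC-SHA256 over the metadata and the audio data it describes.

    The metadata length is prefixed so that the boundary between the two
    inputs is unambiguous."""
    message = len(metadata).to_bytes(4, "big") + metadata + audio
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, metadata: bytes, audio: bytes, received: bytes) -> bool:
    """Recompute the protection value and compare it in constant time."""
    return hmac.compare_digest(protection_value(key, metadata, audio), received)
```

Because any change to either the metadata or the audio changes the digest, a unit that legitimately modifies the audio (for example, by re-encoding it) would need to recompute the protection value.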

Embodiments of the invention may be implemented in hardware, firmware, or software, or a combination of hardware and software (e.g., as a programmable logic array). Unless otherwise specified, the algorithms or processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems (e.g., an implementation of any of the elements of FIG. 1, of encoder 100 of FIG. 2 (or an element thereof), of the decoder of FIG. 3 (or an element thereof), or of the post-processor of FIG. 3 (or an element thereof)), each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices in known fashion.

Each such program may be implemented in any desired computer language (including machine, assembly, or high-level procedural, logical, or object-oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.

For example, when implemented by computer software instruction sequences, various functions and steps of embodiments of the invention may be implemented by multithreaded software instruction sequences running on suitable digital signal processing hardware, in which case the various devices, steps, and functions of the embodiments may correspond to portions of the software instructions.

Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid-state memory or media, or magnetic or optical media) readable by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be implemented as a computer-readable storage medium configured with (i.e., storing) a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Numerous modifications and variations of the invention are possible in light of the above teachings. It is to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.

The invention also includes the following aspects:

Aspect 1. An audio processing unit, comprising:

a buffer memory; and

at least one processing subsystem coupled to the buffer memory, wherein the buffer memory stores at least one frame of an encoded audio bitstream, the frame including program information metadata or substream structure metadata in at least one metadata segment in at least one skip field of the frame, and audio data in at least one other segment of the frame, and wherein the processing subsystem is coupled and configured to perform at least one of generation of the bitstream, decoding of the bitstream, or adaptive processing of the audio data of the bitstream using metadata of the bitstream, or to perform at least one of authentication or validation of at least one of the audio data or the metadata of the bitstream using metadata of the bitstream,

wherein the metadata segment includes at least one metadata payload, the metadata payload comprising:

a header; and

after the header, at least a portion of the program information metadata or at least a portion of the substream structure metadata.

Aspect 2. The audio processing unit of aspect 1, wherein the encoded audio bitstream is indicative of at least one audio program, and the metadata segment includes a program information metadata payload, the program information metadata payload comprising:

a program information metadata header; and

after the program information metadata header, program information metadata indicative of at least one property or characteristic of the audio content of the program, the program information metadata including active channel metadata indicative of each non-silent channel and each silent channel of the program.

Aspect 3. The audio processing unit of aspect 2, wherein the program information metadata also includes one of:

downmix processing state metadata indicative of whether the program was downmixed and, if so, the type of downmixing that was applied to the program;

upmix processing state metadata indicative of whether the program was upmixed and, if so, the type of upmixing that was applied to the program;

preprocessing state metadata indicative of whether preprocessing was performed on the audio content of the frame and, if so, the type of preprocessing that was performed on that audio content; or

spectral extension processing or channel coupling metadata indicative of whether spectral extension processing or channel coupling was applied to the program and, if so, the frequency range over which the spectral extension or channel coupling was applied.

Aspect 4. The audio processing unit of aspect 1, wherein the encoded audio bitstream is indicative of at least one audio program having at least one independent substream of audio content, and the metadata segment includes a substream structure metadata payload, the substream structure metadata payload comprising:

a substream structure metadata payload header; and

after the substream structure metadata payload header, independent substream metadata indicative of the number of independent substreams of the program, and dependent substream metadata indicative of whether each independent substream of the program has at least one associated dependent substream.

Aspect 5. The audio processing unit of aspect 1, wherein the metadata segment includes:

a metadata segment header;

after the metadata segment header, at least one protection value usable for at least one of decryption, authentication, or validation of at least one of the program information metadata, the substream structure metadata, or the audio data corresponding to the program information metadata or the substream structure metadata; and

after the metadata segment header, a metadata payload identification value and payload configuration values, wherein the metadata payload follows the metadata payload identification value and the payload configuration values.

Scheme 6. The audio processing unit according to Scheme 5, wherein the metadata segment header includes a sync word identifying the start of the metadata segment and at least one identification value following the sync word, and the header of the metadata payload includes at least one identification value.
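A decoder could locate a metadata segment by scanning the frame for its sync word and then read the identification value that follows. The sketch below assumes a hypothetical two-byte sync word and a one-byte identification value placed immediately after it.

```python
from typing import Optional, Tuple

SYNC_WORD = 0xA55A  # hypothetical sync word used only for this illustration

def locate_metadata_segment(frame: bytes,
                            sync_word: int = SYNC_WORD) -> Optional[Tuple[int, int]]:
    """Return (offset, identification value) for the first sync word in frame.

    The identification value is assumed to be the single byte right after
    the sync word. Returns None if no complete sync word + ID is found.
    """
    marker = sync_word.to_bytes(2, "big")
    pos = frame.find(marker)
    if pos < 0 or pos + 2 >= len(frame):
        return None
    return pos, frame[pos + 2]


frame = b"\x00\x11" + b"\xa5\x5a" + b"\x07" + b"payload..."
print(locate_metadata_segment(frame))  # (2, 7)
```

A sync pattern can also occur by chance inside other data, so a practical parser would confirm a candidate (for example with a length or checksum field) before trusting it.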

Scheme 7. The audio processing unit according to Scheme 1, wherein the encoded audio bitstream is an AC-3 bitstream or an E-AC-3 bitstream.

Scheme 8. The audio processing unit according to Scheme 1, wherein the buffer memory stores the frame in a non-transitory manner.

Scheme 9. The audio processing unit according to Scheme 1, wherein the audio processing unit is an encoder.

Scheme 10. The audio processing unit according to Scheme 9, wherein the processing subsystem includes:

a decoding subsystem configured to receive an input audio bitstream and to extract input metadata and input audio data from the input audio bitstream;

an adaptive processing subsystem, coupled and configured to perform adaptive processing on the input audio data using the input metadata, thereby generating processed audio data; and

an encoding subsystem, coupled and configured to generate the encoded audio bitstream in response to the processed audio data, including by inserting the program information metadata or the substream structure metadata into the encoded audio bitstream, and to assert the encoded audio bitstream to the buffer memory.
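The data flow through the three subsystems of Scheme 10 can be sketched with plain callables. The class name, the callable signatures and the use of a Python list as the buffer memory are all hypothetical; real decoding, adaptive processing and encoding are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, object]
Samples = List[float]

@dataclass
class EncoderPipeline:
    """Hypothetical wiring of the decoding, adaptive processing and encoding subsystems."""
    decode: Callable[[bytes], Tuple[Metadata, Samples]]
    adapt: Callable[[Samples, Metadata], Samples]
    encode: Callable[[Samples, Metadata], bytes]
    buffer: List[bytes] = field(default_factory=list)  # stands in for the buffer memory

    def run(self, input_bitstream: bytes) -> bytes:
        metadata, audio = self.decode(input_bitstream)  # extract input metadata and audio
        processed = self.adapt(audio, metadata)         # adaptive processing using metadata
        encoded = self.encode(processed, metadata)      # output bitstream carries metadata
        self.buffer.append(encoded)                     # assert the result to the buffer
        return encoded


pipe = EncoderPipeline(
    decode=lambda b: ({"program_info": "demo"}, [float(x) for x in b]),
    adapt=lambda s, m: [x * 0.5 for x in s],
    encode=lambda s, m: repr((m, s)).encode(),
)
out = pipe.run(bytes([2, 4]))
```

Passing the subsystems in as callables keeps the sketch independent of any particular codec library.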

Scheme 11. The audio processing unit according to Scheme 1, wherein the audio processing unit is a decoder.

Scheme 12. The audio processing unit according to Scheme 11, wherein the processing subsystem is a decoding subsystem coupled to the buffer memory and configured to extract the program information metadata or the substream structure metadata from the encoded audio bitstream.

Scheme 13. The audio processing unit according to Scheme 1, including:

a subsystem, coupled to the buffer memory and configured to extract the program information metadata or the substream structure metadata from the encoded audio bitstream, and to extract the audio data from the encoded audio bitstream; and

a post-processor, coupled to the subsystem and configured to perform adaptive processing on the audio data using at least one of the program information metadata or the substream structure metadata extracted from the encoded audio bitstream.
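One way a post-processor could use the extracted metadata adaptively is to skip any stage that the metadata reports as already applied upstream, which avoids the redundant leveling described in the background. In the sketch below, the set `already_applied` is a hypothetical summary derived from the metadata, and the stage names are placeholders.

```python
from typing import Callable, Dict, Iterable, List, Set

Samples = List[float]

def adaptive_postprocess(audio: Samples, already_applied: Set[str],
                         stages: Dict[str, Callable[[Samples], Samples]],
                         order: Iterable[str]) -> Samples:
    """Run each named stage in order unless the metadata says it was already applied."""
    for name in order:
        if name in already_applied:
            continue  # skip: metadata reports this processing was done upstream
        audio = stages[name](audio)
    return audio


stages = {
    "leveling": lambda s: [x * 2.0 for x in s],  # placeholder leveling stage
    "upmix": lambda s: s + s,                    # placeholder "upmix" (duplicates samples)
}
result = adaptive_postprocess([1.0, 2.0], {"leveling"}, stages, ["leveling", "upmix"])
print(result)  # [1.0, 2.0, 1.0, 2.0]
```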

Scheme 14. The audio processing unit according to Scheme 1, wherein the audio processing unit is a digital signal processor.

Scheme 15. The audio processing unit according to Scheme 1, wherein the audio processing unit is a preprocessor configured to extract the program information metadata or the substream structure metadata, and the audio data, from the encoded audio bitstream, and to perform adaptive processing on the audio data using at least one of the program information metadata or the substream structure metadata extracted from the encoded audio bitstream.

Scheme 16. A method for decoding an encoded audio bitstream, the method including the steps of:

receiving the encoded audio bitstream; and

extracting metadata and audio data from the encoded audio bitstream, wherein the metadata is or includes program information metadata and substream structure metadata,

wherein the encoded audio bitstream comprises a sequence of frames and is indicative of at least one audio program, the program information metadata and the substream structure metadata are indicative of the program, each of the frames includes at least one audio data segment, each audio data segment includes at least some of the audio data, each frame of at least a subset of the frames includes a metadata segment, and each metadata segment includes at least some of the program information metadata and at least some of the substream structure metadata.
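The frame structure of Scheme 16 (every frame carries audio data segments, only a subset carries a metadata segment) can be modeled directly. The `Frame` type and the `extract` helper below are hypothetical and operate on already-demultiplexed byte strings rather than a real AC-3 or E-AC-3 frame.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Frame:
    audio_segments: List[bytes]               # at least one audio data segment per frame
    metadata_segment: Optional[bytes] = None  # present only in a subset of frames

def extract(frames: List[Frame]) -> Tuple[bytes, List[Tuple[int, bytes]]]:
    """Concatenate all audio data and collect (frame_index, metadata_segment) pairs."""
    audio = bytearray()
    metadata: List[Tuple[int, bytes]] = []
    for index, frame in enumerate(frames):
        if not frame.audio_segments:
            raise ValueError(f"frame {index} has no audio data segment")
        for segment in frame.audio_segments:
            audio.extend(segment)
        if frame.metadata_segment is not None:
            metadata.append((index, frame.metadata_segment))
    return bytes(audio), metadata


frames = [Frame([b"ab"], b"M0"), Frame([b"cd", b"e"]), Frame([b"f"], b"M2")]
audio, md = extract(frames)
print(audio, md)  # b'abcdef' [(0, b'M0'), (2, b'M2')]
```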

Scheme 17. The method according to Scheme 16, wherein the metadata segment includes a program information metadata payload, the program information metadata payload including:

a program information metadata header; and

following the program information metadata header, program information metadata indicative of at least one property or characteristic of the audio content of the program, the program information metadata including active channel metadata indicating each non-silent channel and each silent channel of the program.
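Active channel metadata could, for example, be carried as a bitmask. The sketch assumes a hypothetical encoding in which bit i set means channel i carries signal; the channel order shown is also an assumption. A processor could use the result to skip work such as measurement on silent channels.

```python
from typing import List, Sequence, Tuple

def split_channels(active_mask: int,
                   channel_names: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split channels into (non-silent, silent) using a hypothetical bitmask
    in which bit i set means channel i is active (non-silent)."""
    active: List[str] = []
    silent: List[str] = []
    for i, name in enumerate(channel_names):
        if (active_mask >> i) & 1:
            active.append(name)
        else:
            silent.append(name)
    return active, silent


print(split_channels(0b001011, ["L", "R", "C", "LFE", "Ls", "Rs"]))
# (['L', 'R', 'LFE'], ['C', 'Ls', 'Rs'])
```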

Scheme 18. The method according to Scheme 17, wherein the program information metadata further includes at least one of:

upmix processing state metadata, indicating whether the program was upmixed and, if so, the type of upmixing applied to the program; or

preprocessing state metadata, indicating whether preprocessing was performed on the audio content of the frame and, if so, the type of preprocessing performed on that audio content.
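Both fields follow the same "flag, then type only if the flag is set" pattern. A minimal decoding sketch, assuming a hypothetical one-byte packing (bit 7 upmixed flag, bits 6 to 4 upmix type, bit 3 preprocessed flag, bits 2 to 0 preprocessing type), could look like this:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ProcessingState:
    upmix_type: Optional[int]          # None when the program was not upmixed
    preprocessing_type: Optional[int]  # None when no preprocessing was performed

def decode_state_byte(value: int) -> ProcessingState:
    """Decode a hypothetical one-byte field:
    bit 7 = upmixed flag, bits 6..4 = upmix type,
    bit 3 = preprocessed flag, bits 2..0 = preprocessing type."""
    if not 0 <= value <= 0xFF:
        raise ValueError("state field must fit in one byte")
    upmixed = bool(value & 0x80)
    preprocessed = bool(value & 0x08)
    return ProcessingState(
        upmix_type=((value >> 4) & 0x7) if upmixed else None,
        preprocessing_type=(value & 0x7) if preprocessed else None,
    )


print(decode_state_byte(0b1010_0000))  # ProcessingState(upmix_type=2, preprocessing_type=None)
```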

Scheme 19. The method according to Scheme 16, wherein the encoded audio bitstream is indicative of at least one audio program having at least one independent substream of audio content, and the metadata segment includes a substream structure metadata payload, the substream structure metadata payload including:

following the substream structure metadata payload header, independent substream metadata indicating the number of independent substreams of the program, and dependent substream metadata indicating whether each independent substream of the program has at least one associated dependent substream.

Scheme 20. The method according to Scheme 16, wherein the metadata segment includes:

a metadata segment header;

at least one protection value following the metadata segment header, for use in at least one of decryption, authentication, or validation of at least one of the program information metadata, the substream structure metadata, or the audio data corresponding to the program information metadata and the substream structure metadata; and

a metadata payload following the metadata segment header, the metadata payload including said at least some of the program information metadata and said at least some of the substream structure metadata.
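For the authentication and validation use of the protection value, one possible choice is a keyed hash. The sketch below assumes HMAC-SHA256 over the metadata and the corresponding audio data, with a length prefix so the boundary between the two cannot be shifted; the algorithm, key handling and input framing are assumptions, not details taken from the source.

```python
import hashlib
import hmac
import struct

def compute_protection_value(key: bytes, metadata: bytes, audio: bytes) -> bytes:
    """HMAC-SHA256 over (len(metadata), metadata, audio); algorithm choice is an assumption."""
    message = struct.pack(">I", len(metadata)) + metadata + audio
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_protection_value(key: bytes, metadata: bytes, audio: bytes,
                            protection: bytes) -> bool:
    """Recompute the protection value and compare it in constant time."""
    expected = compute_protection_value(key, metadata, audio)
    return hmac.compare_digest(expected, protection)


key = b"hypothetical-shared-key"
tag = compute_protection_value(key, b"program-info", b"\x01\x02\x03")
print(verify_protection_value(key, b"program-info", b"\x01\x02\x03", tag))  # True
```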

Scheme 21. The method according to Scheme 16, wherein the encoded audio bitstream is an AC-3 bitstream or an E-AC-3 bitstream.

Scheme 22. The method according to Scheme 16, further including the step of:

performing adaptive processing on the audio data using at least one of the program information metadata or the substream structure metadata extracted from the encoded audio bitstream.

Claims

1. An audio processing unit, comprising:

one or more processors; and

a memory coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations including:

receiving an encoded audio bitstream comprising an audio program, the encoded audio bitstream comprising encoded audio data of a set of one or more audio channels and metadata associated with the set of audio channels, wherein the metadata includes loudness processing state metadata, and wherein the loudness processing state metadata includes metadata indicating a loudness of the audio program;

decoding the encoded audio data to obtain decoded audio data of the set of audio channels;

obtaining the loudness processing state metadata from the metadata of the encoded audio bitstream; and

performing adaptive loudness processing on the decoded audio data of the set of audio channels based on the loudness processing state metadata;

wherein the metadata further includes program information metadata indicating a compression profile used to create dynamic range compression (DRC) data in the bitstream, and wherein the compression profile is a music standard compression profile.
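A minimal sketch of adaptive loudness processing driven by the program loudness carried in the metadata follows. It trusts the signaled loudness instead of re-measuring. If the program is already within a tolerance of the target, it passes the audio through unchanged. Otherwise it applies one static gain. The field name, the -24 dB default target and the 0.5 dB tolerance are assumptions, and DRC profile handling is not modeled.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class LoudnessState:
    program_loudness_db: float  # signaled program loudness (hypothetical field name)

def adaptive_loudness(samples: List[float], state: LoudnessState,
                      target_db: float = -24.0, tolerance_db: float = 0.5) -> List[float]:
    """Apply a single static gain only when the signaled loudness is off target."""
    error_db = target_db - state.program_loudness_db
    if abs(error_db) <= tolerance_db:
        return list(samples)  # already at target per the metadata: pass through
    gain = 10.0 ** (error_db / 20.0)  # convert the dB correction to a linear gain
    return [x * gain for x in samples]


out = adaptive_loudness([0.1, -0.2], LoudnessState(program_loudness_db=-30.0))
# error_db = +6 dB, so each sample is scaled by 10 ** 0.3 (about 1.995)
```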

2. A method executed by an audio processing unit, comprising:

3. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations including:

receiving an encoded audio bitstream comprising an audio program, the encoded audio bitstream comprising encoded audio data of a set of one or more audio channels and metadata associated with the set of audio channels, wherein the metadata includes loudness processing state metadata, and wherein the loudness processing state metadata includes metadata indicating a loudness of the audio program;