HK40115800A - Efficient drc profile transmission - Google Patents
- Publication number: HK40115800A (also indexed as HK42025102490.7A)
- Authority: HK (Hong Kong)
- Prior art keywords: drc, audio signal, profiles, loudness, profile
Description
This application is a divisional application of invention patent application No. 202110527052.4, filed on September 29, 2015, entitled "Efficient DRC Profile Transmission", which is itself a divisional application of invention patent application No. 201580053702.9, filed on September 29, 2015, entitled "Efficient DRC Profile Transmission".
Cross-Reference to Related Applications
This application claims priority to U.S. Provisional Patent Application No. 62/058,228, filed October 1, 2014, which is hereby incorporated by reference in its entirety.
Technical Field
This document relates to audio signal processing. In particular, it relates to a method and a corresponding system for transmitting dynamic range control (DRC) profiles in a bandwidth-efficient manner.
Background
The increasing popularity of media consumer devices has created new opportunities and challenges for the creators and distributors of media content for playback on these devices, as well as for the designers and manufacturers of the devices. Many consumer devices are capable of playing back a broad range of media content types and formats, including those often associated with high-quality, wide-bandwidth and wide-dynamic-range audio content for HDTV, Blu-ray or DVD. Media processing devices may be used to play back this type of audio content either on their own internal acoustic transducers or on external transducers such as headphones or high-quality home theater systems. However, these playback systems and environments place significantly different requirements on the dynamic range of the audio signal, because noise levels vary between environments and because a playback system may have only a limited ability to reproduce the required sound pressure levels without distortion. Limiting the dynamic range in accordance with the environment is a way of providing high quality and high intelligibility across a broad range of rendering devices with different rendering capabilities and listening environments (i.e., across a broad range of rendering modes).
This document addresses the technical problem of providing creators and distributors of media content with a bandwidth-efficient means of enabling audio signals to be reproduced with high quality and high intelligibility on a broad range of rendering devices with different rendering capabilities.
Summary of the Invention
According to an aspect, a method for generating an encoded audio signal is described. The encoded audio signal comprises a sequence of frames and is indicative of a plurality of different dynamic range control (DRC) profiles for a corresponding plurality of different rendering modes. The method comprises inserting different subsets of the plurality of DRC profiles into different frames of the sequence of frames, such that two or more frames of the sequence jointly comprise the plurality of DRC profiles.
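To make the encoder-side idea concrete, the following Python sketch assigns a different subset of profiles to each frame by cycling through the full set. The round-robin policy, the function name `distribute_profiles` and the profile labels are illustrative assumptions; the text only requires that two or more frames jointly carry all of the profiles.

```python
from typing import List, Sequence


def distribute_profiles(profile_ids: Sequence[str], num_frames: int,
                        per_frame: int) -> List[List[str]]:
    """Assign a subset of DRC profiles to each frame (hypothetical round-robin policy).

    The start position advances by the subset size from frame to frame, so any run
    of ceil(len(profile_ids) / per_frame) consecutive frames jointly carries every
    profile.
    """
    if not profile_ids or per_frame < 1:
        raise ValueError("need at least one profile and per_frame >= 1")
    n = len(profile_ids)
    size = min(per_frame, n)
    frames: List[List[str]] = []
    start = 0
    for _ in range(num_frames):
        frames.append([profile_ids[(start + k) % n] for k in range(size)])
        start = (start + size) % n
    return frames


if __name__ == "__main__":
    # Hypothetical labels for the rendering modes discussed in this document.
    modes = ["home_theater", "flat_tv", "portable_speaker", "portable_headphone"]
    for i, subset in enumerate(distribute_profiles(modes, num_frames=4, per_frame=2)):
        print(f"frame {i}: {subset}")
```

With four profiles and two profiles per frame, each frame carries half of the profile data, and any two consecutive frames together carry the complete set.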
According to a further aspect, a method for decoding an encoded audio signal is described. The encoded audio signal comprises a sequence of frames and is indicative of a plurality of different DRC profiles for a corresponding plurality of different rendering modes. Different subsets of the plurality of DRC profiles are included in different frames of the sequence, such that two or more frames of the sequence jointly comprise the plurality of DRC profiles. The method comprises determining a first rendering mode from the plurality of different rendering modes, and determining one or more DRC profiles from the subset of DRC profiles included in a current frame of the sequence. Furthermore, the method comprises determining whether at least one of the one or more DRC profiles is suitable for the first rendering mode. In addition, the method comprises selecting a default DRC profile as the current DRC profile if none of the one or more DRC profiles is suitable for the first rendering mode, wherein definition data of the default DRC profile is known at the decoder used to decode the encoded audio signal. Finally, the method comprises decoding the current frame using the current DRC profile.
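The selection step of the decoding method can be sketched in Python as below. The `DrcProfile` record, the exact-match notion of "suitable" and the fixed `DEFAULT_PROFILE` are assumptions made for illustration; the text only states that the definition data of the default profile is already known at the decoder.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DrcProfile:
    """Hypothetical profile record: an identifier and the rendering mode it targets."""
    profile_id: int
    rendering_mode: str


# Assumption: the decoder holds built-in definition data for this default profile,
# so it never has to be transmitted in the bitstream.
DEFAULT_PROFILE = DrcProfile(profile_id=0, rendering_mode="default")


def select_current_profile(frame_profiles: Sequence[DrcProfile],
                           rendering_mode: str) -> DrcProfile:
    """Pick a profile from the current frame's subset that suits the rendering mode.

    "Suitable" is modelled here as an exact rendering-mode match (an assumption).
    If no profile in the frame qualifies, fall back to the decoder's default.
    """
    for profile in frame_profiles:
        if profile.rendering_mode == rendering_mode:
            return profile
    return DEFAULT_PROFILE
```

For a frame carrying profiles for `"home_theater"` and `"flat_tv"`, a decoder in `"flat_tv"` mode gets the transmitted profile, while one in `"portable_headphone"` mode falls back to `DEFAULT_PROFILE`.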
According to a further aspect, a bitstream comprising an encoded audio signal is described. The encoded audio signal comprises a sequence of frames and is indicative of a plurality of different DRC profiles for a corresponding plurality of different rendering modes. Different subsets of the plurality of DRC profiles are included in different frames of the sequence, such that two or more frames of the sequence jointly comprise the plurality of DRC profiles.
According to another aspect, an encoder for generating an encoded audio signal is described. The encoded audio signal comprises a sequence of frames and is indicative of a plurality of different DRC profiles for a corresponding plurality of different rendering modes. The encoder is configured to insert different subsets of the plurality of DRC profiles into different frames of the sequence, such that two or more frames of the sequence jointly comprise the plurality of DRC profiles.
According to a further aspect, a decoder for decoding an encoded audio signal is described. The encoded audio signal comprises a sequence of frames and is indicative of a plurality of different DRC profiles for a corresponding plurality of different rendering modes. Different subsets of the plurality of DRC profiles are included in different frames of the sequence, such that two or more frames of the sequence jointly comprise the plurality of DRC profiles. The decoder is configured to: determine a first rendering mode from the plurality of different rendering modes; determine one or more DRC profiles from the subset of DRC profiles included in a current frame of the sequence; determine whether at least one of the one or more DRC profiles is suitable for the first rendering mode; select a default DRC profile as the current DRC profile if none of the one or more DRC profiles is suitable for the first rendering mode, wherein definition data of the default DRC profile is known at the decoder; and decode the current frame using the current DRC profile.
According to a further aspect, a software program is described. The software program may be adapted for execution on a processor and for performing the method steps outlined in the present document when carried out on the processor.
According to another aspect, a storage medium is described. The storage medium may comprise a software program adapted for execution on a processor and for performing the method steps outlined in the present document when carried out on the processor.
According to a further aspect, a computer program product is described. The computer program product may comprise executable instructions for performing the method steps outlined in the present document when executed on a computer.
It should be noted that the methods and systems, including their preferred embodiments, as outlined in the present patent application may be used stand-alone or in combination with the other methods and systems disclosed in this document. Furthermore, all aspects of the methods and systems outlined in the present patent application may be combined arbitrarily. In particular, the features of the claims may be combined with one another in an arbitrary manner.
Brief Description of the Drawings
The invention is explained below in an exemplary manner with reference to the accompanying drawings, in which:
Figures 1 and 2 illustrate an example audio decoder and an example audio encoder, respectively;
Figures 3 and 4 illustrate example dynamic range compression curves;
Figure 5 illustrates an example sequence of frames; and
Figure 6 shows a flow chart of an example method for selecting a DRC profile.
Detailed Description
As indicated above, this document addresses the technical problem of enabling designers and/or distributors of audio content to control the quality and intelligibility of the audio content for different types of rendering modes. An example rendering mode is the home theater mode, in which audio content is played back in a quiet environment using transducers that typically allow for a very wide dynamic range. Another example rendering mode is the flat-panel TV mode, in which audio content is played back using the transducers of, e.g., a television set, which typically allow for a reduced dynamic range compared to a home theater. A further example rendering mode is the portable speaker mode, in which audio content is played back using the loudspeakers of a portable electronic device (such as a smartphone). The dynamic range of this rendering mode is typically small compared to the rendering modes mentioned above, and the environment is often noisy. Another example rendering mode is the portable headphone mode, in which audio content is played back using headphones connected to a portable electronic device. Here the dynamic range is limited, but typically higher than the dynamic range provided by the loudspeakers of the portable electronic device.
In order to allow for high quality and high intelligibility in each of the different rendering modes, different DRC (dynamic range control) profiles for the different rendering modes may be provided along with the audio content. The audio content may be transmitted within a sequence of frames. The sequence of frames may comprise I (i.e., independent) frames, which can be decoded independently of preceding or succeeding frames. In addition, the sequence may comprise other types of frames (e.g., P-frames and/or B-frames), which typically exhibit dependencies on a preceding and/or a succeeding frame. At least some of the frames of the sequence may comprise a plurality of different DRC profiles for a plurality of different rendering modes. In particular, the I-frames of the sequence may comprise the plurality of DRC profiles.
Inserting a plurality of different DRC profiles into the sequence of audio frames enables an audio decoder to select an appropriate DRC profile for a particular rendering mode. As a result, it can be ensured that the rendered audio signal has high quality (notably, without clipping or distortion introduced by the transducers) and high intelligibility.
In the following, various aspects of dynamic range control are described. Without customized dynamic range control, input audio information (e.g., PCM samples, time-frequency samples in a QMF matrix, etc.) is often reproduced at a playback device at loudness levels that are inappropriate for the specific playback environment of the playback device (including the device's physical and/or mechanical playback limitations), because that playback environment may differ from the target playback environment for which the encoded audio content was encoded at the encoding device.
Techniques as described herein can be used to support dynamic range control of a wide variety of audio content, customized to any of a wide variety of playback environments, while preserving the perceptual quality of the audio content and the artist's intent regarding how the content is adapted to different listening environments.
Dynamic range control (DRC) refers to time-varying, level-dependent audio processing operations that alter (e.g., compress, cut, expand, boost) a signal in order to convert an input dynamic range of loudness levels in audio content into an output dynamic range that differs from the input dynamic range. For example, in a DRC scenario, soft sounds may be mapped (e.g., boosted) to higher loudness levels, and loud sounds may be mapped (e.g., cut) to lower loudness levels. As a result, in the loudness domain, the output range of loudness levels in this example becomes smaller than the input range of loudness levels. In some embodiments, however, the dynamic range control may be reversible, so that the original range can be restored. For example, an expansion operation may be performed to recover the original range, provided that the loudness levels mapped from the original loudness levels lie at or below the clipping level of the output dynamic range, that each unique original loudness level is mapped to a unique output loudness level, and so on.
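The reversibility condition can be illustrated with a static compressor that is strictly increasing, and therefore one-to-one, in the dB domain. The pivot of -24 dB, the 2:1 ratio and the function names are hypothetical choices for this sketch, not values from the text.

```python
def compress_db(level_db: float, pivot_db: float = -24.0, ratio: float = 2.0) -> float:
    """Static compression by `ratio` around a pivot: soft levels are boosted
    and loud levels are cut toward the pivot."""
    return pivot_db + (level_db - pivot_db) / ratio


def expand_db(level_db: float, pivot_db: float = -24.0, ratio: float = 2.0) -> float:
    """Inverse (expansion) mapping that restores the original level."""
    return pivot_db + (level_db - pivot_db) * ratio


if __name__ == "__main__":
    inputs = [-60.0, -24.0, -6.0]               # 54 dB input range
    outputs = [compress_db(x) for x in inputs]  # [-42.0, -24.0, -15.0]: 27 dB range
    # No output exceeds a 0 dBFS clipping level and the mapping is one-to-one,
    # so expansion recovers the inputs.
    restored = [expand_db(y) for y in outputs]
    print(outputs, restored)
```

If a compressed level had instead been clipped, several inputs could collapse onto the same output and the expansion could no longer tell them apart.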
DRC techniques as described herein can be used to provide a better listening experience in certain playback environments or situations. For example, soft sounds in a noisy environment may be masked by the noise, rendering them inaudible. Conversely, loud sounds may be undesirable in some situations, for example because they disturb the neighbors (e.g., in a "late-night" listening mode). Many devices, typically those with small-form-factor loudspeakers, cannot reproduce sound at high output levels, or cannot do so without perceptible distortion. In some cases, low signal levels may be reproduced below the human hearing threshold. DRC techniques may map input loudness levels to output loudness levels based on DRC gains (e.g., scaling factors that scale audio amplitudes, boost ratios, cut ratios, etc.) looked up from a dynamic range compression curve.
A dynamic range compression curve refers to a function (e.g., a lookup table, a curve, a multi-segment piecewise line, etc.) that maps individual input loudness levels (e.g., of sounds other than dialogue), determined from individual frames of audio data, to corresponding output loudness levels, and thereby to individual gains, i.e., the gains for dynamic range control that transform the input loudness levels into the corresponding output loudness levels. Each of the individual gains indicates the amount of gain to be applied to the signal in order to map a corresponding individual input loudness level to the desired output loudness level. The output loudness levels obtained after applying the individual gains represent the target loudness levels of the audio content in the individual frames of audio data for a specific playback environment.
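A multi-segment piecewise-linear curve of this kind can be sketched in Python as a list of breakpoints; the gain for an input level is the output level minus the input level, interpolated between breakpoints. The breakpoint values and the hold-at-the-ends rule are hypothetical and are not taken from any standardized profile.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical curve: (input level dB, output level dB) breakpoints, ascending input.
EXAMPLE_CURVE: List[Tuple[float, float]] = [
    (-70.0, -50.0),  # very soft content: +20 dB boost
    (-40.0, -30.0),  # +10 dB boost
    (-30.0, -30.0),  # start of the 0 dB region
    (-20.0, -20.0),  # end of the 0 dB region
    (0.0, -10.0),    # loud content: -10 dB cut
]


def drc_gain_db(curve: List[Tuple[float, float]], level_db: float) -> float:
    """Look up the DRC gain (output level minus input level) for an input level.

    Gains are interpolated linearly between breakpoints and held constant beyond
    the first and last breakpoints.
    """
    xs = [x for x, _ in curve]
    gains = [y - x for x, y in curve]
    if level_db <= xs[0]:
        return gains[0]
    if level_db >= xs[-1]:
        return gains[-1]
    i = bisect_right(xs, level_db)  # xs[i - 1] <= level_db < xs[i]
    t = (level_db - xs[i - 1]) / (xs[i] - xs[i - 1])
    return gains[i - 1] + t * (gains[i] - gains[i - 1])
```

For example, an input level of -55 dB lies halfway between the first two breakpoints and receives a +15 dB boost, while -10 dB receives a -5 dB cut.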
In addition to specifying the mapping between loudness levels and gains, a dynamic range compression curve may include, or may be provided with, specific release times and attack times for applying specific gains. An attack refers to an increase in signal energy (or loudness) between successive time samples, whereas a release refers to a decrease in energy (or loudness) between successive time samples. The attack time (e.g., 10 ms, 20 ms, etc.) is the time constant used for smoothing the DRC gains while the corresponding signal is in attack mode. The release time (e.g., 80 ms, 100 ms, etc.) is the time constant used for smoothing the DRC gains while the corresponding signal is in release mode. In some embodiments, additionally, optionally or alternatively, the time constants are used for smoothing the signal energy (or loudness) before the DRC gains are determined.
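Attack/release smoothing is often realized with a first-order (one-pole) smoother whose coefficient depends on the active mode. The Python sketch below uses the 10 ms and 100 ms example values from the text; the one-pole form, the gain update rate and the rule used to detect attack mode are assumptions.

```python
import math
from typing import List, Sequence


def one_pole_coeff(time_constant_s: float, update_rate_hz: float) -> float:
    """Per-update smoothing coefficient for a first-order (one-pole) smoother."""
    return math.exp(-1.0 / (time_constant_s * update_rate_hz))


def smooth_gains(target_gains_db: Sequence[float], update_rate_hz: float,
                 attack_s: float = 0.010, release_s: float = 0.100) -> List[float]:
    """Smooth a sequence of static DRC gains with separate attack/release constants.

    Assumption: attack mode is detected when the target gain drops below the current
    smoothed gain, which for a compressive (non-increasing) curve corresponds to a
    rising signal level; otherwise release mode is used.
    """
    a_attack = one_pole_coeff(attack_s, update_rate_hz)
    a_release = one_pole_coeff(release_s, update_rate_hz)
    smoothed: List[float] = []
    g = target_gains_db[0] if target_gains_db else 0.0
    for target in target_gains_db:
        a = a_attack if target < g else a_release
        g = a * g + (1.0 - a) * target
        smoothed.append(g)
    return smoothed
```

With gains updated every millisecond, a step to a -10 dB target is about 86 % complete after 20 ms (attack), whereas the return toward 0 dB has recovered less than a fifth of the way after the same 20 ms (release).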
Different dynamic range compression curves may correspond to different playback environments (i.e., to different rendering modes). For example, the dynamic range compression curve for the playback environment of a flat-panel TV may differ from that for the playback environment of a portable device. A playback device may have two or more playback environments. For example, a first dynamic range compression curve for a first playback environment of a portable device with loudspeakers may differ from a second dynamic range compression curve for a second playback environment of the same portable device with a headset.
Figure 1 shows a block diagram of example components of an audio decoder 100. The audio decoder 100 comprises a data extractor 104, a dynamic range controller 106 and an audio renderer 108. The data extractor 104 is configured to receive an encoded input signal 102. An encoded input signal 102 as described herein may be a bitstream that contains encoded (e.g., compressed) frames of input audio data (notably, a sequence of audio frames) and possibly metadata. The bitstream may be an AC-4 bitstream. The data extractor 104 is configured to extract/decode the frames of input audio data and the metadata from the encoded input signal 102. Each frame of input audio data comprises a plurality of blocks of encoded audio data, each of which represents a plurality of audio samples. Each frame represents a (e.g., constant) time interval comprising a certain number of audio samples. The frame size may vary with the sampling rate and the coded data rate. The audio samples are quantized audio data elements (e.g., input PCM samples, input time-frequency samples in a QMF matrix, etc.) in one, two or more (audio) frequency bands or frequency ranges. The quantized audio data elements in the frames of input audio data may represent sound pressure waves in the digital (quantized) domain. The quantized audio data elements may cover a finite range of loudness levels at or below a largest possible value (e.g., a clipping level, a maximum loudness level, etc.).
The metadata may be used by the audio decoder 100 to process the frames of input audio data. The metadata may include various operational parameters relating to one or more operations to be performed by the decoder 100, one or more dynamic range compression curves (i.e., one or more DRC profiles), normalization parameters relating to the dialogue loudness levels represented in the frames of input audio data, etc. A dialogue loudness level may refer to a (e.g., psychoacoustic, perceptual, etc.) level of dialogue loudness, program loudness, average dialogue loudness, etc., in an entire program (e.g., a movie, a TV program, a radio broadcast, etc.), in a portion of a program, in the dialogue of a program, etc.
The operations and functions of the decoder 100, or of some or all of its modules (e.g., the data extractor 104, the dynamic range controller 106, etc.), may be adapted in response to the metadata extracted from the encoded input signal 102. For example, the metadata (including but not limited to dynamic range compression curves, dialogue loudness levels, etc.) may be used by the decoder 100 to generate output audio data elements in the digital domain (e.g., output PCM samples, output time-frequency samples in a QMF matrix, etc.). The output data elements may then be used to drive audio channels or loudspeakers in order to achieve a specified loudness or reference reproduction level during playback in a specific playback environment.
The dynamic range controller 106 may be configured to receive some or all of the audio data elements in the frames of input audio data, as well as the metadata, and to perform audio processing operations (e.g., dynamic range control operations, gain smoothing operations, gain limiting operations, etc.) on these audio data elements based at least in part on the metadata extracted from the encoded audio signal 102.
In particular, the dynamic range controller 106 may comprise a selector 110, a loudness calculator 112 and/or a DRC gain unit 114. The selector 110 may be configured to determine a speaker configuration (e.g., home theater mode, flat-panel mode, portable device with speaker mode, portable device with headphone mode, 5.1 speaker configuration mode, 7.1 speaker configuration mode, etc.) relating to a specific playback environment at the decoder 100. The speaker configuration may also be referred to as a rendering mode. Furthermore, the selector 110 may be configured to select a specific dynamic range compression curve (i.e., a specific DRC profile) from the dynamic range compression curves (i.e., from the plurality of DRC profiles) extracted from the metadata of the encoded input signal 102.
The loudness calculator 112 may be configured to calculate one or more types of loudness levels represented by the audio data elements in the frames of input audio data. Examples of types of loudness levels include, but are not limited to, any of: individual loudness levels over individual frequency bands in individual channels over individual time intervals; broadband (or wideband) loudness levels over a broad (or wide) frequency range in individual channels; loudness levels determined from, or smoothed over, a block or frame of audio data; loudness levels determined from, or smoothed over, more than one block or frame of audio data; loudness levels smoothed over one or more time intervals; etc. Zero, one or more of these loudness levels may be altered for the purpose of dynamic range control by the decoder 100.
To determine the loudness levels, the loudness calculator 112 may determine one or more time-dependent physical sound wave properties represented by the audio data elements in the frames of input audio data, such as spatial and/or local pressure levels at specific audio frequencies. The loudness calculator 112 may use these time-varying physical wave properties to derive one or more types of loudness levels based on one or more psychoacoustic functions that model human loudness perception. A psychoacoustic function may be a non-linear function, constructed from a model of the human auditory system, that converts/maps specific spatial pressure levels at specific audio frequencies to specific loudness levels for those audio frequencies.
A (e.g., broadband, wideband, etc.) loudness level over multiple (audio) frequencies or multiple frequency bands may be derived by integrating the specific loudness levels over those frequencies or frequency bands. Time-averaged, smoothed, etc. loudness levels over one or more time intervals (e.g., intervals longer than the one represented by the audio data elements in a block or frame of audio data) may be obtained using one or more smoothing filters implemented as part of the audio processing operations in the decoder 100. Another example method for determining a (broadband) loudness level is specified in ITU-R BS.1770. The method specified in ITU-R BS.1770 applies time-domain filtering to the time-domain input audio signal and then computes the RMS (root mean square) level of each channel of the input audio signal, before integrating over the channels and gating the resulting loudness level.
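The gated measurement can be sketched in Python as follows. It follows the structure of ITU-R BS.1770 as revised in BS.1770-3 and later (400 ms blocks with 75 % overlap, per-channel weights, an absolute gate at -70 LKFS and a relative gate 10 LU below the absolute-gated loudness), but it assumes the K-weighting pre-filter has already been applied to the input, so it is a simplified sketch rather than a conformant meter.

```python
import math
from typing import Dict, List, Sequence

# Channel weights G_i per ITU-R BS.1770; channels not listed (e.g. LFE) get weight 0.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def _block_loudness(z: Dict[str, float]) -> float:
    """Loudness in LKFS of a set of per-channel mean-square values."""
    total = sum(CHANNEL_WEIGHTS.get(name, 0.0) * value for name, value in z.items())
    return -0.691 + 10.0 * math.log10(total) if total > 0.0 else -math.inf


def gated_loudness(channels: Dict[str, Sequence[float]], sample_rate: int) -> float:
    """Integrated, gated loudness of already K-weighted channel signals (sketch)."""
    block = (4 * sample_rate) // 10  # 400 ms gating block
    step = sample_rate // 10         # 100 ms hop, i.e. 75 % overlap
    n = min(len(x) for x in channels.values())
    blocks: List[Dict[str, float]] = [
        {name: sum(s * s for s in x[start:start + block]) / block
         for name, x in channels.items()}
        for start in range(0, n - block + 1, step)
    ]

    def mean_z(selected: List[Dict[str, float]]) -> Dict[str, float]:
        return {name: sum(b[name] for b in selected) / len(selected) for name in channels}

    above_abs = [b for b in blocks if _block_loudness(b) > -70.0]  # absolute gate
    if not above_abs:
        return -math.inf
    relative_gate = _block_loudness(mean_z(above_abs)) - 10.0       # relative gate
    above_rel = [b for b in above_abs if _block_loudness(b) > relative_gate]
    return _block_loudness(mean_z(above_rel))
```

Because the K-weighting stage is omitted here, a full-scale sine in a single front channel measures -0.691 + 10·log10(0.5), about -3.70, rather than the roughly -3.01 LKFS a conformant meter reports for a 997 Hz tone.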
Specific loudness levels for different frequency bands may be calculated for each block of audio data having a certain number of samples (e.g., 256). When integrating the specific loudness levels into a broadband (or wideband) loudness level, a pre-filter may be used to apply a frequency weighting (e.g., similar to IEC B-weighting) to the specific loudness levels. The broadband loudness levels of two or more channels (e.g., left front, right front, center, left surround, right surround, etc.) may be summed to provide an overall loudness level for the two or more channels.
An overall loudness level may refer to a broadband (wideband) loudness level in a single channel (e.g., the center channel) of a speaker configuration. An overall loudness level may also refer to a broadband (or wideband) loudness level over a plurality of channels. The plurality of channels may be all of the channels in the speaker configuration (i.e., the speaker configuration of the rendering mode). Additionally, optionally or alternatively, the plurality of channels may comprise a subset of the channels in the speaker configuration (e.g., a subset comprising left front, right front and low-frequency effects (LFE); a subset comprising left surround and right surround; a subset comprising center; etc.).
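The band integration and channel summation described in the two preceding paragraphs can be sketched in Python, assuming the specific loudness values are in a linear unit such as sones (so that summing is meaningful). The three-band layout and the per-band weights are hypothetical placeholders, not IEC B-weighting coefficients.

```python
from typing import Dict, Iterable, Sequence


def wideband_loudness(specific: Sequence[float], weights: Sequence[float]) -> float:
    """Integrate per-band specific loudness (linear units, e.g. sones) into a
    wideband value, applying a per-band frequency weighting first."""
    if len(specific) != len(weights):
        raise ValueError("need exactly one weight per band")
    return sum(w * s for w, s in zip(weights, specific))


def overall_loudness(per_channel: Dict[str, Sequence[float]],
                     weights: Sequence[float], subset: Iterable[str]) -> float:
    """Sum the wideband loudness of the channels in `subset` (e.g. all channels,
    or only a group such as L/R/LFE) to obtain an overall loudness for that group."""
    return sum(wideband_loudness(per_channel[ch], weights) for ch in subset)
```

Passing every channel name as `subset` gives the overall loudness of the full speaker configuration; passing a smaller group gives the loudness of that channel subset.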
A (e.g., broadband, wideband, overall, specific, etc.) loudness level may be used as the input for looking up a corresponding (e.g., static, pre-smoothing, pre-limiting, etc.) DRC gain from the selected dynamic range compression curve. The loudness level used as the lookup input may first be adjusted or normalized relative to the dialogue loudness level obtained from the metadata extracted from the encoded audio signal 102 and/or relative to the output reference level of the rendering mode. Adjustments and normalizations relating to the dialogue loudness level/output reference level may be performed on a portion of the audio content in the encoded audio signal 102 in a non-loudness domain (e.g., the SPL domain), before the specific spatial pressure levels represented in that portion of the audio content are converted or mapped to specific loudness levels.
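The effect of normalizing before the lookup can be shown with a short Python sketch. It assumes the curve is calibrated to the output reference level, so the lookup input is shifted by the normalization offset (reference level minus dialogue level); the function name and the -24/-31/-20 dB values are hypothetical.

```python
def lookup_input_level(level_db: float, dialogue_level_db: float,
                       output_reference_db: float) -> float:
    """Shift a measured level by the dialogue normalization offset before the
    DRC curve lookup (hypothetical convention: curve calibrated to the output
    reference level)."""
    return level_db + (output_reference_db - dialogue_level_db)


if __name__ == "__main__":
    # Program A: dialogue at -31 dB, passage at -36 dB (5 dB below dialogue).
    # Program B: dialogue at -20 dB, passage at -25 dB (5 dB below dialogue).
    print(lookup_input_level(-36.0, -31.0, -24.0))  # -29.0
    print(lookup_input_level(-25.0, -20.0, -24.0))  # -29.0
```

Both passages land on the same point of the curve even though their absolute levels differ by 11 dB, so the same profile behaves consistently across programs with different dialogue levels.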
The DRC gain unit 114 may be configured with a DRC algorithm that generates gains (e.g., for dynamic range control, gain limiting, gain smoothing, etc.) and applies the gains to one or more types of loudness levels represented by the audio data elements in the frames of input audio data, in order to achieve the target loudness levels for the specific playback environment. The application of gains (e.g., DRC gains) as described herein may take place in the loudness domain. By way of example, gains may be generated based on loudness calculations (which may be in sones, or simply, e.g., unconverted SPL values compensated for the dialogue loudness level), smoothed, and applied directly to the input signal. Techniques as described herein may apply the gains to a signal in the loudness domain, convert the signal from the loudness domain back to the (linear) SPL domain, and compute the corresponding gain to be applied to the signal by evaluating the signal in the loudness domain before and after the gain has been applied. The ratio (or, when expressed in logarithmic dB, the difference) then determines the corresponding gain for the signal.
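The loudness-domain route can be illustrated with the classic phon/sone rule of thumb (loudness doubles for every 10 phon above 40 phon). This rule is an assumption standing in for whatever psychoacoustic model an implementation uses; since loudness level in phon equals dB SPL at 1 kHz, the sketch treats its input as a 1 kHz tone.

```python
import math


def spl_to_sones(spl_db: float) -> float:
    """Loudness in sones of a 1 kHz tone (phon == dB SPL), valid roughly above 40 phon."""
    return 2.0 ** ((spl_db - 40.0) / 10.0)


def sones_to_spl(sones: float) -> float:
    """Inverse of spl_to_sones."""
    return 40.0 + 10.0 * math.log2(sones)


def gain_from_loudness_domain(spl_db: float, loudness_scale: float) -> float:
    """Apply a gain in the loudness domain, convert back to SPL and return the
    equivalent SPL-domain gain in dB (the difference of the two levels)."""
    before = spl_to_sones(spl_db)
    after = before * loudness_scale
    return sones_to_spl(after) - sones_to_spl(before)
```

Under this model, halving the loudness of a 70 dB tone (8 sones down to 4 sones) corresponds to a -10 dB gain, i.e. an amplitude factor of 10 ** (-10 / 20), about 0.316.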
The DRC algorithm can operate with multiple DRC parameters. These include the dialogue loudness level, which is calculated by the upstream encoder 150 (as described in the context of Figure 2), embedded in the encoded audio signal 102, and obtainable by the decoder 100 from metadata in the encoded audio signal 102. The dialogue loudness level from the upstream encoder 150 indicates an average dialogue loudness level (e.g., per program, relative to the energy of a full-scale 1 kHz sine wave, relative to the energy of a reference square wave, etc.). The dialogue loudness level extracted from the encoded audio signal 102 can be used to reduce loudness level differences between programs. At the decoder 100, the reference dialogue loudness level can be set to the same value for different programs within the same specific playback environment. Based on the dialogue loudness level from the metadata, the DRC gain unit 114 can apply a dialogue-loudness-related gain to each audio data block of a program, so that the output dialogue loudness level (or output reference level) averaged over multiple audio data blocks of the program is raised or lowered to the reference dialogue loudness level for the program (e.g., pre-configured, system default, user-configurable, profile-dependent, etc.).
The dialogue loudness level can also be used to calibrate the DRC algorithm; in particular, the zero band of the DRC algorithm can be aligned with the dialogue loudness level. Alternatively, the desired output reference level can be used to calibrate the DRC algorithm when the algorithm is applied to a signal to which a gain has already been applied so that its dialogue loudness level equals the desired output reference level. If speech gating has been used to determine the dialogue normalization (dialnorm) parameter, the dialogue loudness level can correspond to this so-called dialnorm parameter. In some embodiments, the dialogue loudness level corresponds to a dialnorm parameter determined not by speech gating, but by gating based on a loudness level threshold.
DRC gains can be used to address loudness level differences within a program by boosting or cutting signal portions in soft and/or loud passages according to the selected dynamic range compression curve. One or more of these DRC gains can be calculated by the DRC algorithm based on the selected dynamic range compression curve and on the (e.g., broadband, wideband, overall, specific, etc.) loudness levels determined from one or more corresponding audio data blocks, audio data frames, etc.
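The curve lookup can be sketched as piecewise-linear interpolation over curve breakpoints, with the input loudness first normalized to the dialogue loudness level. The breakpoint values in `EXAMPLE_CURVE` (a 6 dB maximum boost, a zero band of ±5 dB around dialogue level, and a 2:1 cut above it) and all names are hypothetical illustrations, not values taken from Table 1:

```python
from bisect import bisect_right

# Hypothetical breakpoints: (input level relative to dialogue loudness in dB, gain in dB).
EXAMPLE_CURVE = [
    (-40.0, 6.0),   # below this, hold the maximum boost
    (-20.0, 6.0),
    (-5.0, 0.0),    # zero band starts: no gain around dialogue level
    (5.0, 0.0),     # zero band ends
    (15.0, -5.0),   # 2:1 cut above the zero band
    (30.0, -12.5),
]


def static_drc_gain(level_db, dialogue_loudness_db, curve=EXAMPLE_CURVE):
    """Look up a static DRC gain (dB) by linear interpolation between breakpoints."""
    x = level_db - dialogue_loudness_db  # normalize to the dialogue loudness level
    xs = [point[0] for point in curve]
    if x <= xs[0]:
        return curve[0][1]
    if x >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    (x0, g0), (x1, g1) = curve[i - 1], curve[i]
    return g0 + (g1 - g0) * (x - x0) / (x1 - x0)
```

With dialogue loudness at -31 dBFS, a block measured at -21 dBFS lies 10 dB above dialogue level and receives a cut of -2.5 dB.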
The loudness levels used to look up the (e.g., static, pre-smoothed, pre-gain-limited, etc.) DRC gains in the selected dynamic range compression curve can be calculated over short intervals (e.g., approximately 5.3 milliseconds). The integration time of the human auditory system (e.g., approximately 200 milliseconds) can be much longer. The DRC gains obtained from the selected dynamic range compression curve can therefore be smoothed with a time constant that accounts for this long integration time. To achieve a fast rate of change (increase or decrease) in the loudness level, a short time constant can be used so that the loudness level changes within a correspondingly short time interval. Conversely, to achieve a slow rate of change (increase or decrease), a long time constant can be used so that the loudness level changes over a correspondingly long time interval.
The human auditory system can react to increasing and decreasing loudness levels with different integration times. Different time constants can therefore be used to smooth the static DRC gains looked up from the selected dynamic range compression curve, depending on whether the loudness level is increasing or decreasing. For example, in accordance with the characteristics of the human auditory system, attacks (increasing loudness level) can be smoothed with a relatively short time constant (e.g., an attack time), whereas releases (decreasing loudness level) can be smoothed with a relatively long time constant (e.g., a release time).
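A minimal sketch of attack/release smoothing with a one-pole filter updated once per 5.3 ms block. The 10 ms attack and 200 ms release constants are assumptions chosen for illustration, and driving the choice of constant by the direction of the target gain (a falling gain, i.e. more cut, signals rising loudness) is likewise an assumption:

```python
import math


def smoothing_coeff(time_constant_s, block_period_s):
    """One-pole smoothing coefficient for a time constant and an update period."""
    return math.exp(-block_period_s / time_constant_s)


def smooth_gains(static_gains_db, attack_s=0.010, release_s=0.200, block_period_s=0.0053):
    """Smooth per-block static DRC gains (dB) with separate attack/release constants."""
    a_attack = smoothing_coeff(attack_s, block_period_s)
    a_release = smoothing_coeff(release_s, block_period_s)
    smoothed = []
    state = static_gains_db[0] if static_gains_db else 0.0
    for target in static_gains_db:
        # A falling target gain means the signal got louder (attack): follow quickly.
        # A rising target gain means the signal got softer (release): follow slowly.
        a = a_attack if target < state else a_release
        state = a * state + (1.0 - a) * target
        smoothed.append(state)
    return smoothed
```

Feeding a step from 0 dB to -10 dB and back shows the asymmetry: the smoothed gain gets within about 1 dB of the cut after five blocks, but has recovered only about 1.2 dB five blocks after the release begins.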
The DRC gain for a portion of the audio content (e.g., one or more audio data blocks, audio data frames, etc.) can be calculated using the loudness level determined from that portion. The loudness level used for the lookup in the selected dynamic range compression curve can first be adjusted relative to the dialogue loudness level (e.g., of the program of which the audio content is a part) contained in the metadata extracted from the encoded audio signal 102.
A reference dialogue loudness level/output reference level can be specified or established for a specific playback environment at decoder 100 (e.g., -31 dBFS in "line" mode, -20 dBFS in "RF" mode, etc.). Additionally, alternatively, or optionally, in some embodiments, the user may be given control over setting or changing the reference dialogue loudness level at decoder 100.
DRC gain unit 114 can be configured to determine the dialogue-loudness-related gain to be applied to the audio content so that the input dialogue loudness level is changed to the reference dialogue loudness level, which then becomes the output dialogue loudness level.
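The dialogue-loudness-related gain reduces to the difference between the reference level and the dialogue loudness level. A sketch using the example reference levels for line and RF mode given above; the mode keys and function names are hypothetical:

```python
# Example reference levels (dBFS) from the text; the dictionary keys are hypothetical names.
REFERENCE_LEVEL_DBFS = {"line": -31.0, "rf": -20.0}


def dialogue_gain_db(dialogue_loudness_dbfs, mode, user_reference_dbfs=None):
    """Gain (dB) that moves the input dialogue loudness to the reference level.

    A user-supplied reference level, if given, overrides the per-mode default.
    """
    if user_reference_dbfs is None:
        reference = REFERENCE_LEVEL_DBFS[mode]
    else:
        reference = user_reference_dbfs
    return reference - dialogue_loudness_dbfs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

For a program whose dialogue loudness is -24 dBFS, this gives -7 dB in line mode and +4 dB in RF mode.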
The audio renderer 108 can be configured to generate channel-specific audio data 116 (e.g., multi-channel) for a specific speaker configuration after the gains determined based on DRC, gain limiting, gain smoothing, etc. have been applied to the input audio data extracted from the encoded audio signal 102. The channel-specific audio data 116 can be used to drive the speakers, headphones, etc. represented in the speaker configuration.
Additionally and/or optionally, decoder 100 may be configured to perform one or more other operations on the input audio signal, such as processing, rendering, downmixing, resampling, etc.
The techniques described herein can be used with various speaker configurations corresponding to a variety of surround sound configurations (e.g., 2.0, 3.0, 4.0, 4.1, 5.1, 6.1, 7.1, 10.2, 10-60 speaker configurations, 60+ speaker configurations, object signals or combinations of object signals, etc.) and a variety of rendering environment configurations (e.g., cinemas, parks, opera houses, concert halls, bars, homes, auditoriums, etc.).
Figure 2 illustrates an example encoder 150. Encoder 150 may include an audio content interface 152, a dialogue loudness analyzer 154, a DRC reference library 156, and an audio signal encoder 158. Encoder 150 may be part of a broadcast system, an Internet-based content server, an over-the-air network operator system, a film production system, etc.
The audio content interface 152 can be configured to receive audio content 160 and audio content control input 162, which are used to generate the encoded audio signal 102 based on at least some or all of the audio content 160 and the audio content control input 162. For example, the audio content interface 152 can be used to receive audio content 160 and audio content control input 162 from a content creator, a content provider, etc.
Audio content 160 may constitute some or all of the overall media data, which may be audio-only, audiovisual, etc. Audio content 160 may include one or more of a portion of a program, a program, several programs, one or more commercials, etc.
The dialogue loudness analyzer 154 can be configured to determine/establish one or more dialogue loudness levels for one or more portions (e.g., one or more programs, one or more commercials, etc.) of the audio content 160. The audio content can be represented by one or more sets of audio tracks. The dialogue audio content of the audio content can be in a separate audio track, and/or at least a portion of the dialogue audio content can be in an audio track that also includes non-dialogue audio content.
Audio content control input 162 may include some or all of the following: user control input, control input provided by a system/device external to the encoder 150, control input from the content creator, control input from the content provider, etc. For example, a user (such as a mixing engineer) may provide/specify one or more dynamic range compression curve identifiers; these identifiers may be used to retrieve, from a data repository such as the DRC reference library 156, one or more dynamic range compression curves best suited to the audio content 160.
DRC reference library 156 can be configured to store DRC reference parameter sets, etc. A DRC reference parameter set may include definition data for one or more dynamic range compression curves, etc. Encoder 150 can encode more than one dynamic range compression curve (e.g., concurrently) into the encoded audio signal 102. Zero, one, or more of these dynamic range compression curves can be standard-based, proprietary, custom, decoder-modifiable, etc. For example, the dynamic range compression curves of Figures 3 and 4 can be encoded (e.g., concurrently) into the encoded audio signal 102.
The audio signal encoder 158 can be configured to: receive audio content from the audio content interface 152, receive dialogue loudness levels from the dialogue loudness analyzer 154, retrieve one or more DRC reference parameter sets (i.e., DRC profiles) from the DRC reference library 156, format the audio content into audio data blocks/frames, format the dialogue loudness levels, DRC reference parameter sets, etc. into metadata (e.g., metadata containers, metadata fields, metadata structures, etc.), and encode the audio data blocks/frames and metadata into the encoded audio signal 102.
Audio content to be encoded into the encoded audio signal 102 as described herein can be received in one or more of various source audio formats, in one or more of various ways (e.g., wirelessly, via a wired connection, via a file, via an Internet download, etc.).
The encoded audio signal 102 described herein can be part of an overall media data bitstream (e.g., for audio broadcasts, audio programs, audiovisual programs, audiovisual broadcasts, etc.). The media data bitstream can be accessed from servers, computers, media storage devices, media databases, media files, etc. It can be broadcast, transmitted, or received over one or more wireless or wired network links, and it can also be conveyed via intermediate media (e.g., one or more of network connections, USB connections, wide area networks, local area networks, wireless connections, optical connections, buses, crossbar connections, serial connections, etc.).
Any of the components described (e.g., in Figures 1 and 2) can be implemented as one or more processes and/or one or more IC circuits (e.g., ASICs, FPGAs, etc.), in hardware, software, or a combination of hardware and software.
Figures 3 and 4 illustrate example dynamic range compression curves that can be used by the DRC gain unit 104 in decoder 100 to derive DRC gains from input loudness levels. As illustrated, a dynamic range compression curve can be centered on a reference loudness level of the program (e.g., the output reference level) in order to provide an overall gain appropriate for a particular playback environment. The following table shows example definition data for the dynamic range compression curves (e.g., definition data in the metadata of the encoded audio signal 102), including but not limited to any of the following: boost ratio, cut ratio, attack time, release time, etc. Different profiles (e.g., film standard, film light, music standard, music light, speech, etc.) can differ for different playback environments (e.g., at decoder 100):
Table 1
One or more compression curves may be received that are described in terms of loudness levels in dB SPL or dBFS and gains in dB relative to dB SPL, while the DRC gain calculation is performed in a different loudness representation (e.g., sones) that has a non-linear relationship with the dB SPL loudness level. The compression curves used in the DRC gain calculation can then be converted so that they are described in that different loudness representation (e.g., sones).
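As an illustration of such a conversion, each breakpoint (input level in dB SPL, gain in dB) can be re-expressed as a pair of input and output loudness values in sones. This sketch assumes the textbook relation N = 2^((L - 40)/10), which only holds above roughly 40 phon; real loudness models are more involved, and the names are hypothetical:

```python
def db_to_sone(level_db):
    """Textbook phon-to-sone relation, treating dB SPL as phon (valid above ~40 phon)."""
    return 2.0 ** ((level_db - 40.0) / 10.0)


def curve_db_to_sone(curve_db):
    """Re-express (input level in dB SPL, gain in dB) breakpoints as
    (input loudness in sones, output loudness in sones) pairs."""
    return [(db_to_sone(level), db_to_sone(level + gain)) for level, gain in curve_db]
```

Under this simple power law, a fixed gain of g dB scales loudness by 2^(g/10) at every level; with a fuller loudness model the relationship becomes level-dependent, which is why the conversion matters.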
Figure 5 illustrates an example encoded audio signal 102 comprising a sequence of frames (numbered n+1 through n+30, where n is an integer). In the illustrated example, every 5th frame is an I-frame. The I-frame n+1 comprises multiple DRC profiles, identified as AVR (audio/video receiver, for home theater), Tablet, Portable HP (headphones), and Portable SP (speakers). Each DRC profile includes a dynamic range compression curve as shown in Figures 3 and 4.
The multiple DRC profiles can be inserted repeatedly into the I-frames of the frame sequence. This allows decoder 100 to determine the DRC profile appropriate for the encoded audio signal 102 and the current rendering mode at the start of the encoded audio signal 102, when tuning into an ongoing audio program, and/or after a subsequent splice point. On the other hand, repeatedly transmitting the complete set of DRC profiles results in relatively high bitstream overhead. In view of this, it is proposed to transmit a varying subset of the DRC profiles within the I-frames of the encoded audio signal 102.
Figure 5 illustrates an example of inserting DRC profiles into the frame sequence. In the illustrated example, only a single DRC profile from the complete set of DRC profiles is inserted into each I-frame. The DRC profile inserted changes from one I-frame to the next, and as a result, after N I-frames (N=4 in the illustrated example), decoder 100 has received the complete set of N DRC profiles. This reduces the data rate used to transmit the complete set of DRC profiles, while ensuring that decoder 100 receives the complete set within a reasonable amount of time.
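The rotation can be sketched as a round-robin schedule over the four profiles of Figure 5, with an I-frame every 5 frames. Frames are indexed from 0 here (index 0 corresponding to frame n+1), and the function name is hypothetical; the disclosure does not prescribe a particular ordering:

```python
# The four profiles named in Figure 5, in an assumed transmission order.
PROFILES = ["AVR", "Tablet", "Portable HP", "Portable SP"]
I_FRAME_PERIOD = 5  # every 5th frame is an I-frame in the illustrated example


def profiles_for_frame(frame_index, profiles=PROFILES, period=I_FRAME_PERIOD):
    """Return the DRC profile subset carried by frame_index (0-based).

    Non-I-frames carry no profiles; the k-th I-frame carries profile k mod N.
    """
    if frame_index % period != 0:
        return []
    k = frame_index // period
    return [profiles[k % len(profiles)]]
```

Each I-frame then carries 1/N of the profile data that repeating the complete set would require, while a decoder joining at any I-frame has collected every profile after N consecutive I-frames.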
Figures 6a and 6b show flowcharts of an example method 600 for determining the DRC profile used for decoding the frames of the encoded audio signal 102. Method 600 can be performed by decoder 100 (in particular, by selector 110). When reception of the encoded audio signal 102 begins, the DRC profile used by decoder 100 can be initialized. The DRC profile used for decoding the current frame of the encoded audio signal 102 can be referred to as the current DRC profile; it is therefore initialized at startup. In particular, a default DRC profile (which is available at decoder 100) can be set as the current DRC profile used for rendering the current frame (method step 601), i.e., the variable "profile" can be set to the default DRC profile (profile = Default DRC Profile). Furthermore, decoder 100 can keep track of the previously used profile, which is initially set to undefined (prev_profile = undefined).
Method 600 may further include step 602 of fetching a new frame (i.e., the current frame) to be decoded from the encoded audio signal 102. In step 603, it is verified whether the new frame is an I-frame, which may include a DRC profile. If the new frame is not an I-frame, method 600 proceeds to step 604, and the new frame is processed using the current DRC profile. Furthermore, in method step 605, the previously used profile is set to the current DRC profile (prev_profile = profile).
If the new frame is an I-frame, it can be checked in method step 606 whether the I-frame includes DRC data. For example, the metadata of the I-frame may include a flag indicating whether the I-frame includes DRC data. If no DRC data is present, method 600 may proceed to steps 604 and 605. Otherwise, the method may proceed to method step 607.
In method step 607, it can be verified whether the new frame is the first frame of the encoded audio signal 102 to be decoded. As can be seen from the flowcharts of Figures 6a and 6b, this can be verified by checking the prev_profile variable: if prev_profile is undefined, the new frame is the first frame to be decoded. In that case, decoder 100 can use a predefined DRC profile other than the default DRC profile. For this purpose, the metadata of the new frame can include an identifier (ID) of such a predefined DRC profile, which can be stored in a database at decoder 100. Using a predefined DRC profile provides a bit-rate-efficient means of signaling to decoder 100 which DRC profile to use, since only the ID of the predefined profile needs to be transmitted (method step 608). A predefined DRC profile signaled by an ID can also be referred to as an implicit DRC profile.
It should be noted that in some cases it may be beneficial to use only a single predefined DRC profile in addition to the default DRC profile. In such cases, decoder 100 can be configured to set the profile variable to that predefined (i.e., implicit) DRC profile without receiving any ID within the metadata of the new frame.
Method 600 may further include verifying whether the metadata of the new frame includes one or more explicit DRC profiles (step 609). An explicit DRC profile may include an ID identifying the explicit DRC profile. Furthermore, an explicit DRC profile typically includes definition data for a dynamic range compression curve as shown in Figures 3 and 4. The dynamic range compression curve may be defined as a piecewise linear function. Additionally, an explicit DRC profile may indicate the range of output reference levels (ORLs) to which it applies. For example, the default DRC profile and/or a predefined (implicit) DRC profile may apply to output reference levels ranging from -31 dBFS to 0 dBFS.
The ORL (output reference level) of a rendering device can indicate the dynamic range capability of the rendering device. Typically, the dynamic range capability decreases as the ORL increases. With a high ORL, a compression curve with a high degree of compression should be used so that the audio signal is rendered intelligibly without clipping. With a low ORL, on the other hand, compression can be reduced so that the audio signal is rendered with a high dynamic range; owing to the high dynamic range capability of the rendering device, the intelligibility of the audio signal can still be ensured.
If the metadata of the new frame includes at least one explicit DRC profile, the profile data of the first explicit DRC profile is read (step 610). Furthermore, it is verified whether the ORL range of this DRC profile applies to the rendering device currently in use (step 611). If not, method 600 continues by searching for another explicit DRC profile within the metadata of the new frame. If, on the other hand, the explicit DRC profile applies to the rendering device, it can be set as the current DRC profile used for processing the new frame (step 614).
Method 600 may further include verifying whether a headphone rendering mode is in use and whether the explicit DRC profile applies to the headphone rendering mode (step 612). Additionally, method 600 may include verifying whether the explicit DRC profile is newer than the previously used profile (step 613). For this purpose, the ID of the explicit DRC profile can be compared with the ID of the currently used profile. This ensures that decoder 100 always uses the most recent DRC profile.
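The decision logic of method 600 can be sketched as a small state machine. This is a hedged Python sketch, not the decoder's actual implementation: the data classes, ORL ranges, implicit-profile table and headphone flag are hypothetical, and step 613 is modelled by treating any applicable explicit profile whose ID differs from the current one as newer, which is only one possible reading of the ID comparison:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DrcProfile:
    profile_id: int
    orl_min_db: float          # lowest output reference level (dBFS) the profile applies to
    orl_max_db: float          # highest output reference level (dBFS) the profile applies to
    headphone: bool = False    # hypothetical flag: profile suitable for headphone rendering


@dataclass
class Frame:
    is_i_frame: bool
    drc_present: bool = False                   # DRC data flag checked in step 606
    implicit_profile_id: Optional[int] = None   # ID of a predefined (implicit) profile
    explicit_profiles: List[DrcProfile] = field(default_factory=list)


# Hypothetical decoder-side tables; IDs and ORL ranges are illustrative only.
DEFAULT_PROFILE = DrcProfile(0, -31.0, 0.0)
IMPLICIT_PROFILES = {1: DrcProfile(1, -31.0, 0.0), 2: DrcProfile(2, -31.0, 0.0)}


class ProfileSelector:
    def __init__(self, orl_db: float, headphone_mode: bool = False):
        self.orl_db = orl_db
        self.headphone_mode = headphone_mode
        self.profile = DEFAULT_PROFILE                   # step 601
        self.prev_profile: Optional[DrcProfile] = None   # "undefined"

    def select(self, frame: Frame) -> DrcProfile:
        """Return the DRC profile used to decode this frame (step 604)."""
        if frame.is_i_frame and frame.drc_present:       # steps 603, 606
            first_frame = self.prev_profile is None       # step 607
            if first_frame and frame.implicit_profile_id in IMPLICIT_PROFILES:
                self.profile = IMPLICIT_PROFILES[frame.implicit_profile_id]  # step 608
            for candidate in frame.explicit_profiles:     # steps 609, 610
                if not (candidate.orl_min_db <= self.orl_db <= candidate.orl_max_db):
                    continue                               # step 611: ORL out of range
                if self.headphone_mode and not candidate.headphone:
                    continue                               # step 612
                if candidate.profile_id != self.profile.profile_id:  # step 613 (assumed rule)
                    self.profile = candidate               # step 614
                break
        self.prev_profile = self.profile                  # step 605
        return self.profile
```

Usage: a selector for a device with an ORL of -20 dBFS keeps the implicit profile signaled in the first I-frame until an I-frame delivers an explicit profile whose ORL range covers -20 dBFS.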
Method 600 ensures that decoder 100 can always identify a DRC profile for rendering the frames of the encoded audio signal 102, even if it has not yet received a DRC profile for the current rendering mode (i.e., for the current rendering device). Furthermore, it ensures that decoder 100 applies the DRC profile for the current rendering mode as soon as the corresponding DRC profile has been received.
Thus, a method 600 for decoding an encoded audio signal 102 has been described. The encoded audio signal 102 comprises a sequence of frames. Furthermore, the encoded audio signal 102 indicates multiple different dynamic range control (DRC) profiles for a corresponding number of different rendering modes. Examples of DRC profiles for different rendering modes (or different reproduction environments) are a first DRC profile for use in a home theater rendering mode; a second DRC profile for use in a tablet rendering mode; a third DRC profile for use in a portable-device loudspeaker rendering mode; and/or a fourth DRC profile for use in a headphone rendering mode. A DRC profile defines a particular DRC behavior, which can be described by a compression curve (and time constants) and/or by DRC gains. The DRC gains can be time-equidistant gains (i.e., gains at equally spaced time instants) that can be applied to the encoded audio signal 102 to implement DRC. A compression curve may be accompanied by time constants; together, they configure the DRC algorithm. DRC typically reduces the volume of loud sounds and amplifies quiet sounds, thereby compressing the dynamic range of the audio signal to improve the listening experience in less-than-ideal reproduction environments.
A frame sequence typically comprises multiple consecutive frames that form an audio signal. An audio program (e.g., a broadcast TV or radio program) may include multiple audio signals joined at splice points. For example, a main audio program may be repeatedly interrupted by commercial breaks. The frame sequence may correspond to the entire audio program or, alternatively, to one of the multiple audio signals that form the entire audio program.
Different subsets of the multiple DRC profiles can be included in different frames of the frame sequence, so that two or more frames of the frame sequence jointly include the multiple DRC profiles. As indicated above, distributing the DRC profiles over multiple frames of the frame sequence reduces the bitstream overhead for signaling the multiple DRC profiles.
Method 600 may include determining a first rendering mode from the multiple different rendering modes; in particular, it may be determined which rendering mode is used for rendering the encoded audio signal 102. Furthermore, method 600 may include determining 609, 610 one or more DRC profiles, out of the multiple DRC profiles, that are included within the current frame of the frame sequence. In other words, the one or more DRC profiles of the subset of DRC profiles included within the current frame may be determined. Additionally, it may be determined 611 whether at least one of the one or more DRC profiles applies to the first rendering mode. This determination 611 may include: determining a first output reference level for the first rendering mode, determining the range of output reference levels to which a DRC profile of the one or more DRC profiles applies, and determining whether the first output reference level falls within that range.
Method 600 may further include selecting 604 the default DRC profile as the current DRC profile if none of the one or more DRC profiles applies to the first rendering mode. The definition data of the default DRC profile is typically known at the decoder 100 used for decoding the encoded audio signal 102. Additionally, method 600 may include decoding (and/or rendering) the current frame using the current DRC profile. This ensures that decoder 100 can use a DRC profile (and hence a dynamic range compression curve) even if it has not yet received a DRC profile specific to the encoded audio signal 102.
Alternatively or additionally, method 600 may include selecting 604 the first DRC profile of the one or more DRC profiles as the current DRC profile if the first DRC profile is determined to apply to the first rendering mode. As a result, decoder 100 uses the first DRC profile, which is optimal for the encoded audio signal 102 and the first rendering mode, as soon as decoder 100 receives it.
Method 600 may further include determining 603, 606 whether the current frame of the frame sequence includes one or more of the multiple DRC profiles, i.e., whether the current frame includes a subset of the DRC profiles. As outlined in the context of Figure 5, a subset of DRC profiles is typically included within an I-frame of the frame sequence. Therefore, determining 603, 606 whether the current frame includes one or more of the multiple DRC profiles (i.e., a subset of the DRC profiles) may include determining 603 whether the current frame is an I-frame. As indicated above, an I-frame can be a frame that can be decoded independently of any other frame of the frame sequence. This may be because the data included in such an I-frame is transmitted independently of the data of preceding or subsequent frames; in particular, the data included within an I-frame is not differentially encoded relative to the data included within a preceding or subsequent frame.
Furthermore, determining 603, 606 whether the current frame includes one or more of the plurality of DRC profiles, i.e., a subset of DRC profiles, may include verifying 606 a DRC profile flag included within the current frame. A DRC profile flag within the bitstream of the encoded audio signal provides a bandwidth-efficient and computationally efficient means of identifying the frames that carry DRC profiles.
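A sketch of the frame check in steps 603 and 606 follows, assuming (as the text states is typical) that DRC profile subsets are carried only in I-frames. The `Frame` class, its field names and the `collect_profiles` helper are hypothetical stand-ins; the actual bitstream syntax is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """Hypothetical frame model; the actual bitstream syntax is not reproduced."""
    is_iframe: bool
    drc_profile_flag: bool = False          # assumed "profiles present" flag
    drc_profiles: List[int] = field(default_factory=list)


def carries_drc_profiles(frame: Frame) -> bool:
    """Decide whether a frame holds a DRC profile subset (steps 603 and 606)."""
    # Step 603: profile subsets are assumed to appear only in I-frames.
    if not frame.is_iframe:
        return False
    # Step 606: read the flag instead of attempting to parse profile data.
    return frame.drc_profile_flag


def collect_profiles(frames: List[Frame]) -> List[int]:
    """Gather profile IDs from the frames that signal them."""
    found: List[int] = []
    for frame in frames:
        if carries_drc_profiles(frame):
            found.extend(frame.drc_profiles)
    return found
```

Checking the flag first means frames without profile data cost the decoder a single bit test.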
Method 600 may further include determining whether the current frame indicates one of a plurality of implicit DRC profiles. An implicit DRC profile may comprise a predefined legacy compression curve and time constants, which may be used e.g. for transcoding to E-AC-3. As indicated above, the definition data of the implicit DRC profiles may be known at the decoder 100 used to decode the input audio signal 102. In contrast to the default DRC profile, the implicit DRC profiles may be specific to different types of audio signals (as specified, for example, in Table 1). The current frame of the frame sequence may indicate a particular implicit DRC profile (e.g., by means of an identifier, ID). This provides a bandwidth-efficient means of signaling a DRC profile that is appropriate for the encoded audio signal 102. If the current frame is determined to indicate an implicit DRC profile, that implicit DRC profile may be selected 608 as the current DRC profile.
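The implicit-profile mechanism amounts to a table lookup at the decoder: the frame carries only an ID, and the definition data comes from local storage. In the sketch below, the table contents, the IDs, the `ImplicitProfile` fields and the `resolve_implicit` helper are hypothetical placeholders; they are not the entries of Table 1.

```python
from typing import Dict, NamedTuple, Optional


class ImplicitProfile(NamedTuple):
    """Hypothetical definition data stored at the decoder (values in dB and ms)."""
    name: str
    max_boost_db: float
    max_cut_db: float
    release_ms: float


# Placeholder table: IDs and values are illustrative, not the entries of Table 1.
IMPLICIT_PROFILES: Dict[int, ImplicitProfile] = {
    1: ImplicitProfile("implicit_a", 6.0, 20.0, 1000.0),
    2: ImplicitProfile("implicit_b", 3.0, 10.0, 500.0),
}


def resolve_implicit(profile_id: Optional[int]) -> Optional[ImplicitProfile]:
    """Map an implicit-profile ID read from a frame to locally stored data.

    Returns None if the frame carries no ID or the ID is unknown, in which
    case the caller keeps its current profile instead of selecting (step 608).
    """
    if profile_id is None:
        return None
    return IMPLICIT_PROFILES.get(profile_id)
```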
Decoding the current frame may include adjusting the level of the frame sequence to the first output reference level of the first rendering mode. Furthermore, decoding the current frame may include modifying the loudness level of the current frame using the dynamic range compression curve specified within the current DRC profile. The loudness level may be modified as outlined in the context of Figure 1.
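The following worked example illustrates the two decoding operations, under the common assumption that gains expressed in dB add: first a normalization gain that moves the program loudness to the output reference level, then the DRC gain taken from the compression curve. The function names and numeric values are illustrative only.

```python
def total_gain_db(program_loudness_db: float,
                  output_reference_db: float,
                  drc_gain_db: float) -> float:
    """Normalization to the output reference level, followed by the DRC gain.

    Gains in dB are assumed to combine additively.
    """
    normalization_db = output_reference_db - program_loudness_db
    return normalization_db + drc_gain_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Illustrative numbers: a program at -24 dB rendered with an output reference
# level of -31 dB needs -7 dB of normalization; a -2 dB DRC gain gives -9 dB.
gain = total_gain_db(-24.0, -31.0, -2.0)
print(gain)                           # -9.0
print(round(db_to_linear(gain), 3))   # 0.355
```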
Depending on how many frames of the frame sequence have been received so far, the current DRC profile may correspond to the default DRC profile (which is typically independent of the input audio signal 102), to an implicit DRC profile (which is adapted to the input audio signal 102 to a limited extent), or to the first explicit DRC profile (which may have been designed specifically for the input audio signal 102 and/or the first rendering mode).
Typically, only a subset of the frames includes DRC profiles. Once the current DRC profile has been selected, it may be retained for decoding frames of the frame sequence that do not include any DRC profile. Furthermore, the current DRC profile may be retained even when a frame carrying DRC profiles is received, as long as that frame provides no DRC profile that is more recent than and/or more relevant to the encoded audio signal 102 than the current DRC profile (where a selected first explicit DRC profile is more relevant than a selected implicit DRC profile, which in turn is more relevant than the default DRC profile). This ensures continuity and optimality of the DRC profile in use.
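The retention rule can be sketched as a small state machine over the frame sequence. This is an illustration under simplifying assumptions: each frame is modeled as either `None` (no DRC profile data) or a hypothetical `(source, name)` pair, and the ranking default < implicit < explicit is taken from the paragraph above. A newly received profile of equal rank replaces the current one, reflecting the "more recent" condition.

```python
from enum import IntEnum
from typing import Iterable, List, Optional, Tuple


class Source(IntEnum):
    """Relevance ranking taken from the text: default < implicit < explicit."""
    DEFAULT = 0
    IMPLICIT = 1
    EXPLICIT = 2


FrameInfo = Optional[Tuple[Source, str]]  # hypothetical per-frame DRC profile info


def profiles_used(frames: Iterable[FrameInfo]) -> List[str]:
    """Return, for each frame, the name of the DRC profile used to decode it."""
    current_source, current_name = Source.DEFAULT, "default"
    used: List[str] = []
    for info in frames:
        if info is not None:
            source, name = info
            # Switch only to a profile that is at least as relevant; an equally
            # ranked profile is treated as more recent and replaces the current one.
            if source >= current_source:
                current_source, current_name = source, name
        # Frames without profile data (or with less relevant data) keep the profile.
        used.append(current_name)
    return used


sequence = [None, (Source.IMPLICIT, "implicit_3"), None,
            (Source.EXPLICIT, "explicit_tv"), (Source.IMPLICIT, "implicit_3"), None]
print(profiles_used(sequence))
# ['default', 'implicit_3', 'implicit_3', 'explicit_tv', 'explicit_tv', 'explicit_tv']
```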
Complementary to the method 600 for decoding the encoded audio signal 102, a method for generating, i.e., encoding, the encoded audio signal 102 is described. The encoded audio signal 102 includes a frame sequence. Furthermore, the encoded audio signal 102 indicates a plurality of different dynamic range control (DRC) profiles for a corresponding plurality of different rendering modes. The method may include inserting different subsets of the plurality of DRC profiles into different frames of the frame sequence, such that two or more frames of the frame sequence jointly include the plurality of DRC profiles. In other words, subsets containing fewer DRC profiles than the total number of DRC profiles may be conveyed in different frames of the frame sequence. This reduces the overhead of the encoded audio signal 102 while still providing the complete set of DRC profiles to the corresponding decoder 100. Put differently, the method gives the encoder 150 more freedom in how it transmits DRC data, and this freedom can be used to reduce the bit rate.
The frame sequence may include a subsequence of I-frames (e.g., every X-th frame of the frame sequence may be an I-frame). Different subsets of DRC profiles may be inserted into different (e.g., consecutive) I-frames of this I-frame subsequence. To further reduce the bandwidth, I-frames may be skipped, i.e., some of the I-frames may not include any DRC profile data.
A (e.g., each) subset of DRC profiles may include only a single DRC profile. In particular, the plurality of DRC profiles may comprise N DRC profiles, where N is an integer with N > 1, and the N DRC profiles may be inserted into N different frames of the frame sequence. This minimizes the per-frame bit rate required for transmitting the DRC profiles.
The method may further include inserting all of the plurality of DRC profiles into the first frame of the frame sequence of the audio signal. As a result, rendering of the encoded audio signal 102 can start directly with the correct explicit DRC profile. As indicated above, an audio program may be subdivided into a plurality of sub-programs, e.g., a main program interrupted by commercial breaks. It may be beneficial to insert all of the DRC profiles into the first frame of each sub-program, i.e., directly after each of the one or more splice points of an audio program comprising a plurality of sub-programs.
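On the encoder side, one possible insertion schedule combining the options above is sketched below: all profiles in the first frame and at splice points, and a single profile per I-frame otherwise, cycling through the set. The fixed I-frame spacing, the assumption that splice points coincide with I-frames, and the function name are hypothetical choices for illustration; skipping I-frames is not modeled.

```python
from typing import AbstractSet, Dict, List, Sequence


def schedule_profiles(num_frames: int,
                      profile_ids: Sequence[int],
                      iframe_interval: int,
                      splice_points: AbstractSet[int] = frozenset()) -> Dict[int, List[int]]:
    """Map each I-frame index to the DRC profile IDs inserted into it.

    Assumptions: every iframe_interval-th frame (starting at 0) is an I-frame,
    splice points coincide with I-frames, and other frames carry no profiles.
    """
    if not profile_ids:
        return {}
    schedule: Dict[int, List[int]] = {}
    next_idx = 0
    for n in range(0, num_frames, iframe_interval):
        if n == 0 or n in splice_points:
            # Full set at the start of the (sub-)program, so rendering can use
            # the right explicit profile immediately.
            schedule[n] = list(profile_ids)
            next_idx = 0
        else:
            # One-profile subset per I-frame, cycling through all profiles.
            schedule[n] = [profile_ids[next_idx]]
            next_idx = (next_idx + 1) % len(profile_ids)
    return schedule


print(schedule_profiles(10, [10, 11, 12], iframe_interval=2, splice_points={6}))
# {0: [10, 11, 12], 2: [10], 4: [11], 6: [10, 11, 12], 8: [10]}
```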
Different subsets of the plurality of DRC profiles may be inserted into different frames of the frame sequence such that every subsequence of M directly consecutive frames of the frame sequence jointly contains the full plurality of DRC profiles, where M is an integer with M > 1. In other words, the plurality of DRC profiles may be transmitted repeatedly within blocks of M frames. As a result, decoder 100 has to wait at most M frames before obtaining the optimal explicit DRC profile for the encoded audio signal 102.
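The M-frame property can be checked with a sliding window over an insertion schedule, represented here as a hypothetical dict mapping a frame index to the profile IDs carried in that frame. For a cyclic schedule with one profile per I-frame, I-frame spacing X and N profiles, any N·X consecutive frames contain N consecutive I-frames and hence every profile, so M = N·X suffices.

```python
from typing import Dict, List, Sequence


def every_window_complete(schedule: Dict[int, List[int]], num_frames: int,
                          profile_ids: Sequence[int], m: int) -> bool:
    """True if every run of m consecutive frames jointly carries all profiles."""
    wanted = set(profile_ids)
    for start in range(0, num_frames - m + 1):
        seen = set()
        for n in range(start, start + m):
            seen.update(schedule.get(n, []))
        if not wanted <= seen:
            return False
    return True


# One profile per I-frame, I-frames every X = 2 frames, N = 3 profiles.
cyclic = {0: [1], 2: [2], 4: [3], 6: [1], 8: [2], 10: [3]}
print(every_window_complete(cyclic, 12, [1, 2, 3], m=6))  # True  (M = N * X)
print(every_window_complete(cyclic, 12, [1, 2, 3], m=5))  # False
```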
The method may further include inserting a flag into a frame of the frame sequence, wherein the flag indicates whether the frame includes a DRC profile. Such a flag enables the corresponding decoder 100 to efficiently identify the frames that include DRC profile data.
A DRC profile of the plurality of DRC profiles may be an explicit DRC profile which includes (i.e., carries) definition data defining a dynamic range compression curve. As outlined in this document, the dynamic range compression curve provides a mapping between input loudness and output loudness and/or a gain to be applied to the audio signal. Specifically, the definition data may include one or more of the following: a boost gain, used to boost the input loudness; a boost gain range, indicating the range of input loudness to which the boost gain applies; a null band range, indicating the range of input loudness to which a gain of 0 dB applies; a cut gain, used to attenuate the input loudness; a cut gain range, indicating the range of input loudness to which the cut gain applies; a boost ratio, indicating the transition between zero gain and the boost gain; and/or a cut ratio, indicating the transition between zero gain and the cut gain.
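A minimal sketch of a static compression curve built from the definition data listed above follows. The parameterization is an assumption for illustration, not the bitstream semantics: inside the null band the gain is 0 dB; outside it, the gain follows the boost or cut ratio and is clamped at the boost or cut gain, so the boost and cut gain ranges are implied by where the clamp engages. Time constants and gain smoothing are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionCurve:
    """Hypothetical parameterization of explicit DRC definition data (dB)."""
    null_band_low: float   # lower edge of the null band (0 dB gain)
    null_band_high: float  # upper edge of the null band
    boost_ratio: float     # > 1: slope of the transition from 0 dB toward the boost gain
    max_boost: float       # boost gain (positive dB); applies where the clamp engages
    cut_ratio: float       # > 1: slope of the transition from 0 dB toward the cut gain
    max_cut: float         # cut gain magnitude (positive dB)

    def gain_db(self, input_db: float) -> float:
        """Static gain to apply for a given input loudness."""
        if input_db < self.null_band_low:
            # Quiet input: boost grows with the distance below the null band.
            excess = self.null_band_low - input_db
            return min(self.max_boost, excess * (1.0 - 1.0 / self.boost_ratio))
        if input_db > self.null_band_high:
            # Loud input: attenuation grows with the distance above the null band.
            excess = input_db - self.null_band_high
            return -min(self.max_cut, excess * (1.0 - 1.0 / self.cut_ratio))
        return 0.0  # inside the null band


curve = CompressionCurve(null_band_low=-31.0, null_band_high=-21.0,
                         boost_ratio=2.0, max_boost=6.0, cut_ratio=2.0, max_cut=10.0)
print([curve.gain_db(x) for x in (-50.0, -35.0, -25.0, -11.0, 20.0)])
# [6.0, 2.0, 0.0, -5.0, -10.0]
```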
The method may further include inserting an indication (e.g., an identifier, ID) of an implicit DRC profile, wherein the definition data of the implicit DRC profile is typically known to a decoder 100 of the encoded audio signal 102. The indication of the implicit DRC profile provides a bandwidth-efficient means of signaling a DRC profile that is adapted (to a limited extent) to the encoded audio signal 102.
As outlined above, the frames of the frame sequence typically include audio data and metadata. The subsets of DRC profiles are typically inserted as metadata.
A DRC profile may include definition data defining the range of output reference levels to which the DRC profile applies. The output reference level typically indicates the dynamic range of the rendering mode. In particular, the dynamic range of a rendering mode may decrease as the output reference level increases, and vice versa. Furthermore, the maximum boost gain and the maximum cut gain of the dynamic range compression curve of the DRC profile may increase as the output reference level increases, and vice versa. The output reference level therefore provides an efficient means of selecting an appropriate DRC profile (with an appropriate dynamic range compression curve) for a given rendering mode.
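Selecting a profile by output reference level reduces to a range lookup. The profile names and dB ranges below are hypothetical values, chosen so that a higher output reference level selects a profile intended for a smaller dynamic range, as described above; the ranges are treated as half-open intervals.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProfileRange:
    """Hypothetical DRC profile annotated with its output reference level range."""
    name: str
    min_ref_db: float  # inclusive lower bound
    max_ref_db: float  # exclusive upper bound


def pick_by_reference_level(profiles: Sequence[ProfileRange],
                            output_reference_db: float) -> Optional[ProfileRange]:
    """Return the first profile whose range contains the output reference level.

    None means no received profile applies; a decoder would then fall back
    to, e.g., an implicit or the default profile.
    """
    for profile in profiles:
        if profile.min_ref_db <= output_reference_db < profile.max_ref_db:
            return profile
    return None


# Illustrative ranges: higher reference level -> smaller dynamic range -> stronger DRC.
PROFILES = [
    ProfileRange("wide_range", -40.0, -27.0),
    ProfileRange("medium_range", -27.0, -20.0),
    ProfileRange("narrow_range", -20.0, -10.0),
]
print(pick_by_reference_level(PROFILES, -31.0).name)  # wide_range
print(pick_by_reference_level(PROFILES, -14.0).name)  # narrow_range
```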
The method may further include generating a bitstream which includes the encoded audio signal 102. The bitstream may be an AC-4 bitstream, i.e., it may be compatible with the AC-4 bitstream format.
The method may further include inserting explicit DRC gains for the encoded audio signal 102 into frames of the frame sequence. In particular, the DRC gains applicable to a particular frame of the frame sequence may be inserted into that frame. Hence, each frame of the frame sequence may include a DRC data component comprising one or more explicit DRC gains to be applied to the respective frame. In particular, each frame may include different explicit DRC gains for different rendering modes. For this purpose, DRC algorithms for the different rendering modes may be applied within the encoder 150, so that the different DRC gains for the different rendering modes are determined at the encoder 150. The different DRC gains may then be inserted explicitly into the frame sequence. As a result, the corresponding decoder 100 can apply the explicit DRC gains directly, without itself executing a DRC algorithm based on a dynamic range compression curve.
Hence, the frame sequence may include or indicate a plurality of explicit DRC profiles which signal dynamic range compression curves for a corresponding plurality of rendering modes. The plurality of DRC profiles may be inserted into some, but not all, of the frames of the frame sequence (e.g., into I-frames). Furthermore, the frame sequence may include or indicate one or more DRC profiles for a corresponding one or more rendering modes, wherein these DRC profiles indicate that explicit DRC gains for the one or more rendering modes are inserted into the frames of the frame sequence. By way of example, the one or more DRC profiles signaling explicit DRC gains may include a flag indicating whether explicit DRC gains are included in the frames of the frame sequence. The DRC gains may be inserted into every frame of the frame sequence. In particular, each frame may include the one or more DRC gains to be used for decoding that frame.
The method may include inserting a DRC profile for explicit DRC gains into a subset of the frames of the frame sequence. By way of example, a DRC profile whose DRC gains are transmitted may indicate DRC configuration data for explicit gains. In particular, such a DRC profile may be included in every one of the subsets of DRC profiles. The DRC configuration data (e.g., a flag) may indicate that the frame sequence includes explicit DRC gains for a particular rendering mode. In this way, decoder 100 is informed that, for that rendering mode, the explicit DRC gains are to be derived directly from the frames of the frame sequence.
Hence, the method may further include determining explicit DRC gains for the encoded audio signal 102 for a particular rendering mode. Additionally, the method may include inserting the explicit DRC gains into frames of the frame sequence. Each explicit DRC gain may be inserted into the frame of the frame sequence to which it applies. Furthermore, a frame of the frame sequence may include the one or more explicit DRC gains required for decoding that frame in the particular rendering mode.
The method may further include inserting a DRC profile indicating DRC configuration data for a particular rendering mode into a subset of the frames (e.g., the I-frames) of the frame sequence. The DRC configuration data (including, e.g., a flag) may indicate that, for the particular rendering mode, explicit DRC gains are included in the frames of the frame sequence. Hence, decoder 100 can efficiently determine whether to use a compression curve from one of the DRC profiles that signal dynamic range compression curves, or whether to use explicit DRC gains.
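The per-frame decision a decoder faces, explicit gains from the bitstream versus gains computed from a compression curve, can be sketched as below. The argument names, the mode labels, the toy curve and the idea of evaluating the curve once per block of input levels are assumptions for illustration; how the configuration flag is parsed from the I-frames is not shown.

```python
from typing import AbstractSet, Callable, Dict, List, Sequence


def frame_drc_gains(mode: str,
                    explicit_gain_modes: AbstractSet[str],
                    frame_explicit_gains: Dict[str, Sequence[float]],
                    curve_gain_db: Callable[[float], float],
                    input_levels_db: Sequence[float]) -> List[float]:
    """Return the DRC gains (dB) for one frame in one rendering mode.

    explicit_gain_modes: modes whose DRC configuration data (e.g. a flag sent
        in the I-frames) announce explicit gains in every frame.
    frame_explicit_gains: explicit gains carried by this frame, per mode.
    curve_gain_db: compression curve from a DRC profile, used otherwise.
    input_levels_db: hypothetical per-block input loudness estimates.
    """
    if mode in explicit_gain_modes:
        # The encoder already ran the DRC algorithm: apply its gains as sent.
        return list(frame_explicit_gains[mode])
    # Otherwise run the DRC on the decoder side by evaluating the curve.
    return [curve_gain_db(level) for level in input_levels_db]


def simple_curve(level_db: float) -> float:
    """Toy curve: cut 1 dB above -20 dB, no gain change below."""
    return -1.0 if level_db > -20.0 else 0.0


gains = {"portable": [-3.0, -2.5]}
print(frame_drc_gains("portable", {"portable"}, gains, simple_curve, [-30.0, -10.0]))
# [-3.0, -2.5]
print(frame_drc_gains("home_theater", {"portable"}, gains, simple_curve, [-30.0, -10.0]))
# [0.0, -1.0]
```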
The DRC profiles signaling dynamic range compression curves, as well as the one or more DRC profiles pointing to explicit DRC gains, may be included within a dedicated syntax element (referred to, e.g., as the DRC profile syntax element) of the I-frames of the frame sequence.
The methods and systems described in this document may be implemented as software, firmware and/or hardware. Certain components may, e.g., be implemented as software running on a digital signal processor or microprocessor. Other components may, e.g., be implemented as hardware and/or as application-specific integrated circuits (ASICs). The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks (e.g., the Internet). Typical devices making use of the methods and systems described in this document are portable electronic devices or other consumer equipment used to store and/or render audio signals.
Claims (12)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/058,228 | 2014-10-01 | 2014-10-01 | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK40115800A (en) | 2025-04-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7567093B1 (en) | | Efficient DRC Profile Transmission |
| HK40115800A (en) | | Efficient drc profile transmission |
| HK40114612A (en) | | Efficient drc profile transmission |
| HK40114251A (en) | | Efficient drc profile transmission |
| HK40057532B (en) | | Efficient drc profile transmission |
| HK40057528B (en) | | Efficient drc profile transmission |
| HK40057531B (en) | | Efficient drc profile transmission |
| HK40057532A (en) | | Efficient drc profile transmission |
| HK40057528A (en) | | Efficient drc profile transmission |
| HK40057531A (en) | | Efficient drc profile transmission |