HK1245491B - Computer-readable medium - Google Patents
Publication number: HK1245491B (application HK18104576.4A)
Authority: Hong Kong (HK)
Prior art keywords: channel, configuration, data, decoder, payload
Description
This application is a divisional application of PCT invention patent application No. 201280023547.2, filed on March 19, 2012, and entitled "Audio encoder and decoder with flexible configuration function".
Technical Field
The present invention relates to audio coding, and in particular to high-quality, low-bit-rate coding such as is known from so-called USAC coding (USAC = Unified Speech and Audio Coding).
Background Art
The USAC coder is defined in ISO/IEC CD 23003-3. This standard, entitled "Information technology - MPEG audio technologies - Part 3: Unified speech and audio coding", describes in detail the functional blocks of the reference model of a call for proposals on unified speech and audio coding.
Figures 10a and 10b show block diagrams of the encoder and the decoder. The block diagrams of the USAC encoder and decoder reflect the structure of MPEG-D USAC coding. The general structure can be described as follows. First, there is a common pre-/post-processing stage consisting of an MPEG Surround (MPEGS) functional unit, which handles stereo or multi-channel processing, and an enhanced SBR (eSBR) unit, which handles the parametric representation of the higher audio frequencies in the input signal. Then there are two branches: one consists of a modified Advanced Audio Coding (AAC) tool path, and the other consists of a linear prediction coding (LP or LPC domain) based path, which in turn features either a frequency domain representation or a time domain representation of the LPC residual. All transmitted spectra for both AAC and LPC are represented in the modified discrete cosine transform (MDCT) domain after quantization and arithmetic coding. The time domain representation uses an algebraic code excited linear prediction (ACELP) excitation coding scheme.
The basic structure of MPEG-D USAC is shown in Figures 10a and 10b. The data flow in these diagrams is from left to right and from top to bottom. The function of the decoder is to find the description of the quantized audio spectra or time domain representation in the bitstream payload and to decode the quantized values and other reconstruction information.
When spectral information is transmitted, the decoder reconstructs the quantized spectra, processes the reconstructed spectra through whatever tools are active in the bitstream payload in order to arrive at the actual signal spectra as described by the input bitstream payload, and finally converts the frequency domain spectra to the time domain. Following the initial reconstruction and scaling of the spectra, there are optional tools that modify one or more of the spectra in order to provide more efficient coding.
When a time domain signal representation is transmitted, the decoder reconstructs the quantized time signal and processes the reconstructed time signal through whatever tools are active in the bitstream payload in order to arrive at the actual time domain signal as described by the input bitstream payload.
For each optional tool that operates on the signal data, the option to "pass through" is retained, and in all cases where the processing is omitted, the spectra or time samples at its input are passed through the tool without modification.
Where the bitstream changes its signal representation from time domain to frequency domain representation or from LP domain to non-LP domain, or vice versa, the decoder facilitates the transition from one domain to the other by means of an appropriate transition overlap-add windowing.
After the transition handling, eSBR and MPEGS processing is applied to both coding paths in the same manner.
The input to the bitstream payload demultiplexer tool is the MPEG-D USAC bitstream payload. The demultiplexer separates the bitstream payload into the parts for each tool and provides each tool with the bitstream payload information related to that tool.
The outputs from the bitstream payload demultiplexer tool are:
●Depending on the core coding type of the current frame, either:
○the quantized and noiselessly coded spectra, represented by
○scale factor information
○arithmetically coded spectral lines
●or: linear prediction (LP) parameters together with an excitation signal represented by either:
○quantized and arithmetically coded spectral lines (transform coded excitation, TCX), or
○ACELP coded time domain excitation
●Spectral noise filling information (optional)
●M/S decision information (optional)
●Temporal noise shaping (TNS) information (optional)
●Filter bank control information
●Time unwarping (TW) control information (optional)
●Enhanced spectral bandwidth replication (eSBR) control information (optional)
●MPEG Surround (MPEGS) control information
The scale factor noiseless decoding tool takes information from the bitstream payload demultiplexer, parses that information, and decodes the Huffman and DPCM coded scale factors.
The input to the scale factor noiseless decoding tool is:
●The scale factor information for the noiselessly coded spectra
The output of the scale factor noiseless decoding tool is:
●The decoded integer representation of the scale factors
The spectral noiseless decoding tool takes information from the bitstream payload demultiplexer, parses that information, decodes the arithmetically coded data, and reconstructs the quantized spectra. The input to this noiseless decoding tool is:
●The noiselessly coded spectra
The output of this noiseless decoding tool is:
●The quantized values of the spectra
The inverse quantizer tool takes the quantized values of the spectra and converts the integer values into the unscaled, reconstructed spectra. This quantizer is a companding quantizer whose companding factor depends on the selected core coding mode.
The input to the inverse quantizer tool is:
●The quantized values of the spectra
The output of the inverse quantizer tool is:
●The unscaled, inversely quantized spectra
The noise filling tool is used to fill spectral gaps in the decoded spectra, which occur when spectral values are quantized to zero, for example due to a strong restriction on bit demand in the encoder. The use of the noise filling tool is optional.
The inputs to the noise filling tool are:
●The unscaled, inversely quantized spectra
●Noise filling parameters
●The decoded integer representation of the scale factors
The outputs of the noise filling tool are:
●The unscaled, inversely quantized spectral values for spectral lines that were previously quantized to zero
●The modified integer representation of the scale factors
The rescaling tool converts the integer representation of the scale factors to the actual values and multiplies the unscaled, inversely quantized spectra by the relevant scale factors.
The inputs to the rescaling tool are:
●The decoded integer representation of the scale factors
●The unscaled, inversely quantized spectra
The output from the rescaling tool is:
●The scaled, inversely quantized spectra
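By way of illustration only, the inverse quantization and rescaling steps may be sketched as follows. The sketch assumes an AAC-style power-law expansion with exponent 4/3 and a gain of 2^(0.25·(sf − 100)) per scale factor band. The exponent, the offset of 100, the band layout and all function names are assumptions of this sketch and are not taken from the present description, which states only that the companding factor depends on the core coding mode.

```python
import math

SF_OFFSET = 100  # assumed AAC-style scale factor offset; not from this document


def inverse_quantize(q):
    """Expand one quantized integer with an assumed 4/3 power law (companding)."""
    return math.copysign(abs(q) ** (4.0 / 3.0), q)


def rescale(unscaled, scale_factors, band_offsets):
    """Multiply each scale factor band by the gain derived from its integer scale factor.

    band_offsets[i] .. band_offsets[i + 1] delimits band i (hypothetical layout).
    """
    scaled = list(unscaled)
    for band, sf in enumerate(scale_factors):
        gain = 2.0 ** (0.25 * (sf - SF_OFFSET))
        for i in range(band_offsets[band], band_offsets[band + 1]):
            scaled[i] = unscaled[i] * gain
    return scaled


quantized = [0, 1, -8, 2]
unscaled = [inverse_quantize(q) for q in quantized]
scaled = rescale(unscaled, scale_factors=[100, 104], band_offsets=[0, 2, 4])
```

In this toy run the first band keeps unit gain and the second band is doubled, so the two stages stay separable: the noise filling tool can operate on the unscaled values in between.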
For an overview of the M/S tool, refer to ISO/IEC 14496-3:2009, 4.1.1.2.
For an overview of the temporal noise shaping (TNS) tool, refer to ISO/IEC 14496-3:2009, 4.1.1.2.
The filter bank / block switching tool applies the inverse of the frequency mapping that was carried out in the encoder. An inverse modified discrete cosine transform (IMDCT) is used for the filter bank tool. The IMDCT can be configured to support 120, 128, 240, 256, 480, 512, 960 or 1024 spectral coefficients.
The inputs to the filter bank tool are:
●The (inversely quantized) spectra
●The filter bank control information
The output from the filter bank tool is:
●The time domain reconstructed audio signal(s)
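As a minimal sketch of how an IMDCT-based filter bank recovers a time signal, the following toy example applies the textbook MDCT/IMDCT pair to two blocks that overlap by half and adds the overlapping halves. It uses a direct O(N²) evaluation, no window (which implies the 1/N normalization used here), and N = 8. These choices and all names are assumptions of the sketch, not the block switching logic of the actual tool.

```python
import math


def mdct(block):
    """Forward MDCT: 2N time samples -> N coefficients (direct O(N^2) form)."""
    n_coef = len(block) // 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]


def imdct(coefs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples, scaled by 1/N."""
    n_coef = len(coefs)
    return [
        sum(
            coefs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for k in range(n_coef)
        ) / n_coef
        for n in range(2 * n_coef)
    ]


N = 8  # toy size; the real tool supports 120 ... 1024 coefficients
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(3 * N)]
first = imdct(mdct(signal[0:2 * N]))
second = imdct(mdct(signal[N:3 * N]))
# Overlap-add: the time-domain aliasing in the shared region cancels.
overlap = [first[N + n] + second[n] for n in range(N)]
```

Each IMDCT output on its own contains aliasing; only the sum over the shared region reproduces the original samples, which is why transitions between coding domains need dedicated overlap-add handling.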
When the time warping mode is enabled, the time-warped filter bank / block switching tool replaces the normal filter bank / block switching tool. The filter bank is the same (IMDCT) as the normal filter bank; in addition, the windowed time domain samples are mapped from the warped time domain to the linear time domain by time-varying resampling.
The inputs to the time-warped filter bank tool are:
●The inversely quantized spectra
●The filter bank control information
●The time warping control information
The output from the time-warped filter bank tool is:
●The linear time domain reconstructed audio signal(s)
The enhanced SBR (eSBR) tool regenerates the high band of the audio signal. It is based on replication of the sequences of harmonics that were truncated during encoding. It adjusts the spectral envelope of the generated high band, applies inverse filtering, and adds noise and sinusoidal components in order to recreate the spectral characteristics of the original signal.
The inputs to the eSBR tool are:
●The quantized envelope data
●Miscellaneous control data
●A time domain signal from the frequency domain core decoder or the ACELP/TCX core decoder
The output of the eSBR tool is either:
●a time domain signal, or
●a QMF domain representation of the signal, for example when the MPEG Surround tool is used.
The MPEG Surround (MPEGS) tool produces multiple signals from one or more input signals by applying a sophisticated upmix procedure to the input signal(s), controlled by appropriate spatial parameters. In the USAC context, MPEGS is used for coding a multi-channel signal by transmitting parametric side information alongside a transmitted downmix signal.
The input to the MPEGS tool is:
●a downmixed time domain signal, or
●a QMF domain representation of a downmixed signal from the eSBR tool
The output of the MPEGS tool is:
●a multi-channel time domain signal
The signal classifier tool analyzes the original input signal and generates from it control information that triggers the selection of the different coding modes. The analysis of the input signal is implementation dependent and tries to choose the optimal core coding mode for a given input signal frame. The output of the signal classifier can (optionally) also be used to influence the behavior of other tools, for example MPEG Surround, enhanced SBR, the time-warped filter bank and others.
The inputs to the signal classifier tool are:
●The original, unmodified input signal
●Additional implementation-dependent parameters
The output of the signal classifier tool is:
●A control signal that controls the selection of the core codec (non-LP filtered frequency domain coding, LP filtered frequency domain coding, or LP filtered time domain coding)
The ACELP tool provides a way to efficiently represent a time domain excitation signal by combining a long-term predictor (adaptive codeword) with a pulse-like sequence (innovation codeword). The reconstructed excitation is sent through an LP synthesis filter to form a time domain signal.
The inputs to the ACELP tool are:
●Adaptive and innovation codebook indices
●Adaptive and innovation codebook gain values
●Other control data
●Inversely quantized and interpolated LPC filter coefficients
The output of the ACELP tool is:
●The time domain reconstructed audio signal
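The combination of the two codebook contributions and the subsequent LP synthesis filtering can be illustrated by a short sketch. It assumes that the codebook vectors have already been looked up from their indices and that the LP filter is the all-pole filter 1/A(z) with A(z) = 1 + Σ a_k z^(-k). The function names and the one-tap example filter are hypothetical.

```python
def build_excitation(adaptive, innovation, gain_pitch, gain_code):
    """Combine the adaptive (long-term) and innovation codebook vectors."""
    return [gain_pitch * a + gain_code * c for a, c in zip(adaptive, innovation)]


def lp_synthesis(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum(lpc[k] * z^-(k+1))."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]  # feedback from earlier output samples
        out.append(acc)
    return out


# A single innovation pulse and an empty adaptive contribution.
excitation = build_excitation([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], 0.8, 1.0)
speech = lp_synthesis(excitation, lpc=[-0.5])
```

With the one-tap filter the single pulse decays geometrically, which shows how the synthesis filter imposes the spectral envelope on a sparse excitation.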
The MDCT-based TCX decoding tool transforms the weighted LP residual representation from the MDCT domain back into a time domain signal and outputs a time domain signal that includes weighted LP synthesis filtering. The IMDCT can be configured to support 256, 512 or 1024 spectral coefficients.
The inputs to the TCX tool are:
●The (inversely quantized) MDCT spectra
●Inversely quantized and interpolated LPC filter coefficients
The output of the TCX tool is:
●The time domain reconstructed audio signal
The technology disclosed in ISO/IEC CD 23003-3, which is incorporated herein by reference, allows channel elements to be defined as, for example, a single channel element containing only the payload for a single channel, a channel pair element comprising the payload for two channels, or an LFE (low-frequency enhancement) channel element comprising the payload for an LFE channel.
A five-channel multi-channel audio signal can, for example, be represented by a single channel element comprising the center channel, a first channel pair element comprising the left channel and the right channel, and a second channel pair element comprising the left surround channel (Ls) and the right surround channel (Rs). These different channel elements, which together represent the multi-channel audio signal, are fed into a decoder and processed using the same decoder configuration. According to the prior art, the decoder configuration sent in the USAC specific config element is applied by the decoder to all channel elements. Consequently, elements of the configuration that are valid for all channel elements cannot be selected optimally for an individual channel element, but have to be set for all channel elements at once. On the other hand, it has been found that the channel elements describing a straightforward five-channel multi-channel signal differ greatly from one another. The center channel, being a single channel element, has characteristics significantly different from those of the channel pair elements describing the left/right channels and the left surround/right surround channels. In addition, the characteristics of the two channel pair elements also differ significantly, because the surround channels carry information that is largely different from the information carried in the left and right channels.
Selecting configuration data jointly for all channel elements makes a compromise necessary: either a configuration has to be selected that is not optimal for any individual channel element but represents a compromise between all of them, or a configuration is selected that is optimal for one channel element, which inevitably makes it non-optimal for the other channel elements. This results in an increased bit rate for the channel elements having the non-optimal configuration or, alternatively or additionally, in reduced audio quality for those channel elements that do not have optimal configuration settings.
Summary of the Invention
It is therefore an object of the present invention to provide an improved audio encoding/decoding concept.
This object is achieved by an audio decoder according to claim 1, a method of audio decoding according to claim 14, an audio encoder according to claim 15, a method of audio encoding according to claim 16, a computer program according to claim 17, and an encoded audio signal according to claim 18.
The present invention is based on the finding that an improved audio encoding/decoding concept is obtained when decoder configuration data is transmitted for each individual channel element. According to the invention, the encoded audio signal therefore comprises a first channel element and a second channel element in a payload section of a data stream, and first decoder configuration data for the first channel element and second decoder configuration data for the second channel element in a configuration section of the data stream. Hence, the payload section of the data stream, in which the payload data for the channel elements is located, is separate from the configuration section of the data stream, in which the configuration data for the channel elements is located. Preferably, the configuration section is a contiguous portion of a serial bitstream, where all bits belonging to this configuration section or contiguous portion of the bitstream are configuration data. Preferably, the configuration section is followed by the payload section of the data stream, in which the payload for the channel elements is located. The inventive audio decoder comprises a data stream reader for reading the configuration data for each channel element in the configuration section and for reading the payload data for each channel element in the payload section. In addition, the audio decoder comprises a configurable decoder for decoding the plurality of channel elements and a configuration controller for configuring the configurable decoder, so that the configurable decoder is configured in accordance with the first decoder configuration data when decoding the first channel element and in accordance with the second decoder configuration data when decoding the second channel element.
Thus, it is ensured that the optimal configuration can be selected for each channel element. This allows the different characteristics of the different channel elements to be taken into account optimally.
The audio encoder according to the present invention is arranged for encoding a multi-channel audio signal having, for example, at least two, three or preferably more than three channels. The audio encoder comprises a configuration processor for generating first configuration data for a first channel element and second configuration data for a second channel element, and a configurable encoder for encoding the multi-channel audio signal using the first configuration data and the second configuration data, respectively, to obtain the first channel element and the second channel element. In addition, the audio encoder comprises a data stream generator for generating a data stream representing the encoded audio signal, the data stream having a configuration section containing the first configuration data and the second configuration data, and a payload section comprising the first channel element and the second channel element.
The encoder and the decoder are thus in a position to determine individual, and preferably optimal, configuration data for each channel element.
This ensures that the configurable decoder is configured for each channel element in such a way that the optimum with respect to audio quality and bit rate can be obtained for each channel element, and compromises no longer have to be made.
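The interplay of configuration controller and configurable decoder can be sketched, in simplified form, as follows. The class names, the configuration fields (`use_sbr`, `stereo_tool`) and the element labels are hypothetical and serve only to illustrate that the decoder is reconfigured before each channel element; the `decode` method is a stand-in that merely reports which configuration was active.

```python
class ElementConfig:
    """Per-element decoder configuration (fields are hypothetical)."""

    def __init__(self, element_type, use_sbr, stereo_tool):
        self.element_type = element_type  # "SCE", "CPE" or "LFE" (labels for this sketch)
        self.use_sbr = use_sbr
        self.stereo_tool = stereo_tool


class ConfigurableDecoder:
    def __init__(self):
        self.active = None

    def configure(self, config):
        self.active = config

    def decode(self, payload):
        # Stand-in for real decoding: report which configuration was used.
        return (self.active.element_type, self.active.use_sbr, len(payload))


def decode_frame(configs, payloads):
    """Configuration controller: reconfigure the decoder before each channel element."""
    decoder = ConfigurableDecoder()
    results = []
    for config, payload in zip(configs, payloads):
        decoder.configure(config)
        results.append(decoder.decode(payload))
    return results


configs = [
    ElementConfig("SCE", use_sbr=False, stereo_tool="none"),    # center
    ElementConfig("CPE", use_sbr=True, stereo_tool="none"),     # left/right
    ElementConfig("CPE", use_sbr=True, stereo_tool="mps212"),   # Ls/Rs
]
payloads = [b"\x01", b"\x02\x03", b"\x04\x05\x06"]
decoded = decode_frame(configs, payloads)
```

Keeping the list of configurations separate from the list of payloads mirrors the separation of configuration section and payload section in the data stream.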
Brief Description of the Drawings
Preferred embodiments of the present invention are described below with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a decoder;
Figure 2 is a block diagram of an encoder;
Figures 3a and 3b show tables summarizing the channel configurations for different loudspeaker setups;
Figures 4a and 4b identify and graphically illustrate different loudspeaker setups;
Figures 5a to 5d show different aspects of an encoded audio signal having a configuration section and a payload section;
Figure 6a shows the syntax of the UsacConfig element;
Figure 6b shows the syntax of the UsacChannelConfig element;
Figure 6c shows the syntax of UsacDecoderConfig;
Figure 6d shows the syntax of UsacSingleChannelElementConfig;
Figure 6e shows the syntax of UsacChannelPairElementConfig;
Figure 6f shows the syntax of UsacLfeElementConfig;
Figure 6g shows the syntax of UsacCoreConfig;
Figure 6h shows the syntax of SbrConfig;
Figure 6i shows the syntax of SbrDfltHeader;
Figure 6j shows the syntax of Mps212Config;
Figure 6k shows the syntax of UsacExtElementConfig;
Figure 6l shows the syntax of UsacConfigExtension;
Figure 6m shows the syntax of escapedValue;
Figure 7 shows different alternatives for individually identifying and configuring the different encoder/decoder tools for a channel element;
Figure 8 shows a preferred embodiment of a decoder implementation having decoder instances operating in parallel for generating a 5.1 multi-channel audio signal;
Figure 9 shows a preferred implementation of the decoder of Figure 1 in the form of a flow chart;
Figure 10a shows a block diagram of the USAC encoder; and
Figure 10b shows a block diagram of the USAC decoder.
Detailed Description
High-level information about the contained audio content, such as the sampling rate and the exact channel configuration, is present in the audio bitstream. This makes the bitstream more self-contained and makes transport of the configuration and payload easier when the bitstream is embedded in transport schemes that may have no means of explicitly transmitting this information.
The configuration structure contains a combined index for the frame length and the spectral bandwidth replication (SBR) sampling rate ratio (coreSbrFrameLengthIndex). This guarantees efficient transmission of both values and ensures that meaningless combinations of frame length and SBR ratio cannot be signaled. The latter simplifies the implementation of a decoder.
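A combined index of this kind can be pictured as a small lookup table in which only meaningful pairs exist. The table entries below are illustrative assumptions of this sketch and should be checked against the USAC specification; the function name is likewise hypothetical.

```python
from fractions import Fraction

# Illustrative table: index -> (core frame length, SBR ratio or None for no SBR).
# The values are assumptions for this sketch, not taken from this document.
CORE_SBR_FRAME_LENGTH = {
    0: (768, None),
    1: (1024, None),
    2: (768, Fraction(8, 3)),
    3: (1024, Fraction(2, 1)),
    4: (1024, Fraction(4, 1)),
}


def resolve_frame_setup(index):
    """Return (core frame length, SBR ratio, output frame length) for an index."""
    if index not in CORE_SBR_FRAME_LENGTH:
        raise ValueError(f"reserved coreSbrFrameLengthIndex: {index}")
    core_length, ratio = CORE_SBR_FRAME_LENGTH[index]
    output_length = core_length if ratio is None else int(core_length * ratio)
    return core_length, ratio, output_length
```

Because the decoder only ever sees an index, it never has to validate an arbitrary pair of frame length and SBR ratio; anything outside the table is simply rejected as reserved.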
The configuration can be extended by means of a dedicated configuration extension mechanism. This prevents bulky and inefficient transmission of configuration extensions as known from the MPEG-4 AudioSpecificConfig().
The configuration allows free signaling of the loudspeaker position associated with each transmitted audio channel. Commonly used channel-to-loudspeaker mappings can be signaled efficiently by means of a channelConfigurationIndex.
The configuration of each channel element is contained in a separate structure, so that each channel element can be configured independently.
The SBR configuration data (the "SBR header") is split into an SbrInfo() and an SbrHeader(). For the SbrHeader(), a default version (SbrDfltHeader()) is defined, which can be referenced efficiently in the bitstream. This reduces the bit demand at places where retransmission of SBR configuration data is needed.
Configuration changes that are more commonly applied to SBR can be signaled efficiently by means of the SbrInfo() syntax element.
The configuration of the parametric bandwidth extension (SBR) and of the parametric stereo coding tool (MPS212, also known as MPEG Surround 2-1-2) is tightly integrated into the USAC configuration structure. This better reflects the way both technologies are actually employed in the standard.
The syntax features an extension mechanism that allows transmission of existing and future extensions to the codec.
The extensions can be placed next to (i.e. interleaved with) the channel elements in any order. This allows for extensions that need to be read before or after the particular channel element to which the extension is applied.
A default length can be defined for a syntax extension, which makes transmission of constant-length extensions very efficient, because the length of the extension payload does not have to be transmitted every time.
The common case of signaling a value by means of an escape mechanism, in order to extend the range of values where needed, has been modularized into a dedicated genuine syntax element (escapedValue()), which is flexible enough to cover all desired escape value constellations and bit field extensions.
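An escape mechanism of this kind may be sketched as follows. The sketch assumes a three-stage scheme in which an all-ones value in one bit field signals that a further, wider field follows and is added to the running value. The exact syntax is shown in Figure 6m and is not reproduced in this text, so the field widths, the `BitReader` helper and the function signature are assumptions.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (helper for this sketch)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n_bits):
        value = 0
        for _ in range(n_bits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def escaped_value(reader, n_bits1, n_bits2, n_bits3):
    """Read a value; an all-ones field signals that a further field follows."""
    value = reader.read(n_bits1)
    if value == (1 << n_bits1) - 1:
        value_add = reader.read(n_bits2)
        value += value_add
        if value_add == (1 << n_bits2) - 1:
            value += reader.read(n_bits3)
    return value


small = escaped_value(BitReader(bytes([0b01010000])), 4, 8, 16)
medium = escaped_value(BitReader(bytes([0b11110000, 0b00110000])), 4, 8, 16)
large = escaped_value(BitReader(bytes([0xFF, 0xF0, 0x00, 0x10])), 4, 8, 16)
```

Small values cost only the first field, while larger values pay for the additional fields only when they are actually needed.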
Bitstream Configuration
UsacConfig() (Figure 6a)
UsacConfig() was extended to contain information about the contained audio content as well as everything needed for the complete decoder setup. The top-level information about the audio (sampling rate, channel configuration, output frame length) is gathered at the beginning for easy access from higher (application) layers.
channelConfigurationIndex, UsacChannelConfig() (Figure 6b)
These elements give information about the contained bitstream elements and their mapping to loudspeakers. The channelConfigurationIndex provides an easy and convenient way of signaling one out of a range of predefined mono, stereo or multi-channel configurations that were considered practically relevant.
For more elaborate configurations that are not covered by the channelConfigurationIndex, UsacChannelConfig() allows free assignment of elements to loudspeaker positions out of a list of 32 loudspeaker positions, which covers all currently known loudspeaker positions in all known loudspeaker setups for home or cinema sound reproduction.
This list of loudspeaker positions is a superset of the list featured in the MPEG Surround standard (cf. Table 1 and Figure 1 in ISO/IEC 23003-1). Four additional loudspeaker positions have been added in order to cover the recently introduced 22.2 loudspeaker setup (see Figures 3a, 3b, 4a and 4b).
UsacDecoderConfig() (Figure 6c)
This element is at the heart of the decoder configuration, and as such it contains all further information required by the decoder to interpret the bitstream.
In particular, the structure of the bitstream is defined here by explicitly stating the number of elements and their order in the bitstream.
A loop over all elements then allows for the configuration of all elements of all types (single, pair, lfe, extension).
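The loop over all elements can be illustrated with a small dispatch sketch. It assumes that the element type codes have already been read from the bitstream. The numeric codes, the contents of the per-type parsers and all names are hypothetical; only the four element types and the fact that their number and order are stated explicitly are taken from the description.

```python
ELEMENT_TYPES = {0: "single", 1: "pair", 2: "lfe", 3: "extension"}  # assumed codes


def parse_decoder_config(type_codes, parsers):
    """Walk the declared elements in order and build one configuration per element."""
    configs = []
    for position, code in enumerate(type_codes):
        kind = ELEMENT_TYPES[code]
        configs.append((position, kind, parsers[kind]()))
    return configs


# Stand-ins for the per-type configuration parsers (contents are hypothetical).
parsers = {
    "single": lambda: {"core": True, "sbr": False},
    "pair": lambda: {"core": True, "sbr": True, "stereo": "mps212"},
    "lfe": lambda: {},  # static configuration, nothing to read
    "extension": lambda: {"ext_type": 0, "length": 0},
}
layout = parse_decoder_config([0, 1, 1, 2], parsers)
```

The resulting list fixes both the number and the order of the elements, so the payload section can later be parsed without repeating any type information.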
UsacConfigExtension()(图6l)UsacConfigExtension() (Figure 6l)
为了考虑到未来的扩展,配置的特征为以下的强有力机制:针对USAC的尚未存在的配置扩展而扩展该配置。To allow for future extensions, the configuration features a robust mechanism for extending the configuration for not yet existing configuration extensions of USAC.
UsacSingleChannelElementConfig()(图6d)UsacSingleChannelElementConfig() (Figure 6d)
该元素配置包含用于将解码器配置成对一个单通道进行解码所需的所有信息。这基本上为与核心编码器相关的信息,并且如果使用SBR,则为与SBR相关的信息。This element configuration contains all the information needed to configure the decoder to decode a single pass. This is basically the information related to the core encoder, and if SBR is used, the information related to SBR.
UsacChannelPairElementConfig()(图6e)UsacChannelPairElementConfig() (Figure 6e)
类似以上所述的,该元素配置包含用于将解码器配置成对一个通道对进行解码所需的所有信息。除上述的核心配置和SBR配置之外,其还包括特定于立体声的配置,例如所施加的立体声编码的确切类别(具有或不具有MPS212、残差等)。注意,该元素覆盖在USAC中可用的立体声编码选项的所有种类。Similar to the above, this element configuration contains all the information needed to configure the decoder to decode a channel pair. In addition to the core configuration and SBR configuration mentioned above, it also includes stereo-specific configurations, such as the exact type of stereo encoding applied (with or without MPS212, residual, etc.). Note that this element covers the full range of stereo encoding options available in USAC.
UsacLfeElementConfig()(图6f)UsacLfeElementConfig() (Figure 6f)
因为LFE元素具有静态配置,所以LFE元素配置不包含配置数据。Because the LFE element has a static configuration, the LFE element configuration does not contain configuration data.
UsacExtElementConfig()(图6k)UsacExtElementConfig() (Figure 6k)
该元素配置可以用于向编解码器配置任何种类的现有或未来扩展。每个扩展元素类型具有其本身的专用ID值。包括长度字段,以能够方便地跳过解码器所未知的配置扩展。默认有效载荷长度的任选定义进一步提高存在于实际比特流中的扩展有效载荷的编码效率。This element configuration can be used to configure any type of existing or future extension to the codec. Each extension element type has its own dedicated ID value. A length field is included to facilitate skipping configuration extensions unknown to the decoder. The optional definition of a default payload length further improves the coding efficiency of extension payloads present in the actual bitstream.
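The role of the length field can be illustrated with a minimal sketch. The byte-aligned layout used here (one type byte, one length byte, then the payload) and the handler table are assumptions made purely for illustration; the actual syntax is bit-oriented and is defined in the syntax figures, not by this code.

```python
# Hypothetical byte-aligned layout, for illustration only:
#   [type: 1 byte][length: 1 byte][payload: `length` bytes] ...
# The type value 0x01 and the handler below are made up for this sketch.
KNOWN_HANDLERS = {
    0x01: lambda payload: ("known-ext", payload),
}


def parse_extensions(buf: bytes):
    """Parse a run of extensions, skipping every type without a handler."""
    parsed, skipped = [], []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated extension header")
        ext_type = buf[pos]
        length = buf[pos + 1]
        payload = buf[pos + 2 : pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated extension payload")
        handler = KNOWN_HANDLERS.get(ext_type)
        if handler is None:
            # Unknown type: the length alone is enough to step over it.
            skipped.append(ext_type)
        else:
            parsed.append(handler(payload))
        pos += 2 + length
    return parsed, skipped
```

Because the length precedes the payload, a decoder that does not recognize a type never has to understand the payload's internal structure in order to continue parsing.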
Extensions already envisioned for combination with USAC include MPEG Surround, SAOC, and some sort of FIL element as known from MPEG-4 AAC.
UsacCoreConfig() (Figure 6g)
This element contains configuration data that affects the core coder setup. Currently these are switches for the time warping tool and the noise filling tool.
SbrConfig() (Figure 6h)
To reduce the bit overhead caused by frequent retransmission of sbr_header(), default values for the elements of sbr_header() that are typically kept constant are now carried in the configuration element SbrDfltHeader(). In addition, static SBR configuration elements are carried in SbrConfig(). These static bits include flags for enabling or disabling particular features of enhanced SBR, such as harmonic transposition or inter-TES (inter temporal envelope shaping).
SbrDfltHeader() (Figure 6i)
This element carries the elements of sbr_header() that are typically kept constant. Elements affecting things such as amplitude resolution, crossover band, and spectrum pre-flattening are now carried in SbrInfo(), which allows them to be changed efficiently on the fly.
Mps212Config() (Figure 6j)
Similar to the SBR configuration above, all setup parameters for the MPEG Surround 2-1-2 tools are gathered in this configuration. All elements from SpatialSpecificConfig() that are not relevant or are redundant in this context have been removed.
Bitstream Payload
UsacFrame()
This is the outermost wrapper around the USAC bitstream payload and represents a USAC access unit. It contains a loop over all contained channel elements and extension elements as signaled in the config part. This makes the bitstream format considerably more flexible in terms of what it can contain and future-proof for any future extension.
UsacSingleChannelElement()
This element contains all data needed to decode a mono stream. The content is split into a core coder related part and an eSBR related part. The eSBR related part is now much more closely connected to the core, which also reflects much better the order in which the decoder needs the data.
UsacChannelPairElement()
This element covers the data for all possible ways of coding a stereo pair. In particular, all flavors of unified stereo coding are covered, ranging from legacy M/S based coding to fully parametric stereo coding with the help of MPEG Surround 2-1-2. stereoConfigIndex indicates which flavor is actually used. Appropriate eSBR data and MPEG Surround 2-1-2 data are sent in this element.
UsacLfeElement()
The former lfe_channel_element() has only been renamed, in order to follow a consistent naming scheme.
UsacExtElement()
The extension element was carefully designed to be maximally flexible while at the same time maximally efficient, even for extensions that have a small payload (or frequently none at all). The extension payload length is signaled so that decoders unaware of the extension can skip it. User-defined extensions can be signaled by means of a reserved range of extension types. Extensions can be placed freely in the order of elements. A range of extension elements has already been considered, including a mechanism for writing fill bytes.
UsacCoreCoderData()
This new element summarizes all information affecting the core coders and hence also contains fd_channel_stream() and lpd_channel_stream().
StereoCoreToolInfo()
To improve the readability of the syntax, all stereo related information is captured in this element. It deals with the numerous dependencies of bits in the stereo coding modes.
UsacSbrData()
CRC functionality and legacy description elements of scalable audio coding were removed from what used to be the sbr_extension_data() element. To reduce the overhead caused by frequent retransmission of SBR info and header data, their presence can be signaled explicitly.
SbrInfo()
SBR configuration data that is frequently modified on the fly. This includes elements controlling things such as amplitude resolution, crossover band, and spectrum pre-flattening, which previously required the transmission of a complete sbr_header() (see 6.3, "Efficiency", in [N11660]).
SbrHeader()
To maintain the capability of SBR to change values in sbr_header() on the fly, SbrHeader() can now be carried inside UsacSbrData() whenever values other than those sent in SbrDfltHeader() should be used. The bs_header_extra mechanism was maintained to keep the overhead as low as possible for the most common cases.
sbr_data()
Here too, remnants of SBR scalable coding were removed because they are not applicable in the USAC context. Depending on the number of channels, sbr_data() contains one sbr_single_channel_element() or one sbr_channel_pair_element().
usacSamplingFrequencyIndex
This table is a superset of the table used in MPEG-4 to signal the sampling frequency of the audio codec. The table was further extended to also cover the sampling rates currently used in the USAC operating modes. Some multiples of the sampling frequencies were also added.
channelConfigurationIndex
This table is a superset of the table used in MPEG-4 to signal the channelConfiguration. It was further extended to allow signaling of commonly used and envisioned future loudspeaker setups. The index into this table is signaled with 5 bits to allow for future extensions.
usacElementType
Only four element types exist, one for each of the four basic bitstream elements: UsacSingleChannelElement(), UsacChannelPairElement(), UsacLfeElement(), and UsacExtElement(). These elements provide the necessary top-level structure while maintaining all needed flexibility.
usacExtElementType
Inside UsacExtElement(), this element allows a plethora of extensions to be signaled. To be future-proof, the bit field was chosen large enough to allow for all conceivable extensions. Of the currently known extensions, a few are already proposed for consideration: fill element, MPEG Surround, and SAOC.
usacConfigExtType
Should it at some point be necessary to extend the configuration, this can be handled by means of UsacConfigExtension(), which then allows a type to be assigned to each new configuration. Currently the only type that can be signaled is a fill mechanism for the configuration.
coreSbrFrameLengthIndex
This table signals multiple configuration aspects of the decoder. In particular, these are the output frame length, the SBR ratio, and the resulting core coder frame length (ccfl). At the same time it indicates the number of QMF analysis and synthesis bands used in SBR.
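The relationship between these three quantities can be sketched as simple arithmetic, under the assumption that the SBR ratio is the ratio of the output frame length to the core coder frame length. The function name and the numeric examples in the usage note are illustrative assumptions; the coreSbrFrameLengthIndex table itself remains authoritative for the permitted combinations.

```python
from fractions import Fraction


def core_coder_frame_length(output_frame_length: int, sbr_ratio: Fraction) -> int:
    """Derive ccfl from the output frame length and the SBR ratio.

    Assumes sbr_ratio = output_frame_length / ccfl (illustrative reading).
    """
    ccfl = output_frame_length / sbr_ratio
    if ccfl.denominator != 1:
        raise ValueError("output frame length is not compatible with this SBR ratio")
    return int(ccfl)
```

For example, an output frame length of 2048 with a ratio of 8:3 gives `core_coder_frame_length(2048, Fraction(8, 3)) == 768`, while a ratio of 2:1 gives 1024.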
stereoConfigIndex
This table determines the inner structure of a UsacChannelPairElement(). It indicates the use of a mono or stereo core, the use of MPS212, whether stereo SBR is applied, and whether residual coding is applied in MPS212.
By moving large parts of the eSBR header fields to a default header, which can be referenced by means of a default header flag, the bit demand for sending eSBR control data was greatly reduced. Former sbr_header() bit fields that were considered most likely to change in a real-world system were moved instead to the sbrInfo() element, which now consists of only 4 elements covering at most 8 bits. Compared with the sbr_header(), which consists of at least 18 bits, this saves 10 bits.
It is difficult to assess the impact of this change on the overall bitrate, because the bitrate depends heavily on the transmission rate of eSBR control data in sbrInfo(). However, for the common use case in which the SBR crossover is altered in a bitstream, the saving can be as high as 22 bits each time an sbrInfo() is sent instead of a fully transmitted sbr_header().
The output of the USAC decoder can be further processed by MPEG Surround (MPS) (ISO/IEC 23003-1) or SAOC (ISO/IEC 23003-2). If the SBR tool in USAC is active, a USAC decoder can typically be combined efficiently with a subsequent MPS/SAOC decoder by connecting them in the QMF domain in the same way as described for HE-AAC in ISO/IEC 23003-1, 4.4. If a connection in the QMF domain is not possible, they need to be connected in the time domain.
If MPS/SAOC side information is embedded into the USAC bitstream by means of the usacExtElement mechanism (with usacExtElementType being ID_EXT_ELE_MPEGS or ID_EXT_ELE_SAOC), the time alignment between the USAC data and the MPS/SAOC data assumes the most efficient connection between the USAC decoder and the MPS/SAOC decoder. If the SBR tool in USAC is active and if MPS/SAOC employs a 64-band QMF domain representation (see ISO/IEC 23003-1, 6.6.3), the most efficient connection is in the QMF domain. Otherwise, the most efficient connection is in the time domain. This corresponds to the time alignment for the combination of HE-AAC and MPS as defined in ISO/IEC 23003-1, 4.4, 4.5, and 7.2.1.
The additional delay introduced by adding MPS decoding after USAC decoding is given by ISO/IEC 23003-1, 4.5 and depends on whether HQ MPS or LP MPS is used, and on whether MPS is connected to USAC in the QMF domain or in the time domain.
ISO/IEC 23003-1, 4.4 clarifies the interface between USAC and MPEG Systems. Every access unit delivered to the audio decoder from the systems interface will result in a corresponding composition unit delivered from the audio decoder to the systems interface, i.e., the compositor. This includes start-up and shut-down conditions, i.e., when the access unit is the first or the last in a finite sequence of access units.
For an audio composition unit, the Composition Time Stamp (CTS) of ISO/IEC 14496-1, 7.1.3.5 specifies that the composition time applies to the n-th audio sample within the composition unit. For USAC, the value of n is always 1. Note that this applies to the output of the USAC decoder itself. Where a USAC decoder is combined with, for example, an MPS decoder, the composition units delivered at the output of the MPS decoder need to be taken into account.
Features of the USAC bitstream payload syntax
Table - Syntax of UsacFrame()
Table - Syntax of UsacSingleChannelElement()
Table - Syntax of UsacChannelPairElement()
Table - Syntax of UsacLfeElement()
Table - Syntax of UsacExtElement()
Features of the syntax of subsidiary payload elements
Table - Syntax of UsacCoreCoderData()
Table - Syntax of StereoCoreToolInfo()
Table - Syntax of fd_channel_stream()
Table - Syntax of lpd_channel_stream()
Table - Syntax of fac_data()
Features of the enhanced SBR payload syntax
Table - Syntax of UsacSbrData()
Table - Syntax of SbrInfo()
Table - Syntax of SbrHeader()
Table - Syntax of sbr_data()
Table - Syntax of sbr_envelope()
Table - Syntax of FramingInfo()
Short description of data elements
UsacConfig()
This element contains information about the contained audio content as well as everything needed for the complete decoder setup.
UsacChannelConfig()
This element gives information about the contained bitstream elements and their mapping to loudspeakers.
UsacDecoderConfig()
This element contains all further information required by the decoder to interpret the bitstream. In particular, the SBR resampling ratio is signaled here, and the structure of the bitstream is defined here by explicitly stating the number of elements in the bitstream and their order.
UsacConfigExtension()
Configuration extension mechanism for extending the configuration with future configuration extensions for USAC.
UsacSingleChannelElementConfig()
This contains all information needed to configure the decoder to decode one single channel. This is essentially the core coder related information and, if SBR is used, the SBR related information.
UsacChannelPairElementConfig()
Analogous to the above, this element configuration contains all information needed to configure the decoder to decode one channel pair. In addition to the core configuration and SBR configuration mentioned above, it includes stereo-specific configuration, such as the exact kind of stereo coding applied (with or without MPS212, residual, etc.). This element covers all kinds of stereo coding options currently available in USAC.
UsacLfeElementConfig()
Because the LFE element has a static configuration, the LFE element configuration contains no configuration data.
UsacExtElementConfig()
This element configuration can be used to configure any kind of existing or future extension to the codec. Each extension element type has its own dedicated type value. A length field is included so that configuration extensions unknown to the decoder can be skipped.
UsacCoreConfig()
This contains configuration data that affects the core coder setup.
SbrConfig()
This contains default values for the configuration elements of eSBR that are typically kept constant. In addition, static SBR configuration elements are carried in SbrConfig(). These static bits include flags for enabling or disabling particular features of enhanced SBR, such as harmonic transposition or inter-TES.
SbrDfltHeader()
This element carries a default version of the elements of SbrHeader(), which can be referred to if no differing values for these elements are desired.
Mps212Config()
All setup parameters for the MPEG Surround 2-1-2 tools are gathered in this configuration.
escapedValue()
This element implements a general method for transmitting an integer value using a varying number of bits. It features a two-level escape mechanism that allows the representable range of values to be extended by successive transmission of additional bits.
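A two-level escape mechanism of this kind can be sketched as follows. The parameter names (n_bits1, n_bits2, n_bits3), the all-ones escape convention, and the MSB-first BitReader helper are assumptions chosen for this illustration; the normative syntax of escapedValue() is the one given in the syntax tables.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (helper for this sketch)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def escaped_value(reader: BitReader, n_bits1: int, n_bits2: int, n_bits3: int) -> int:
    """Read an integer whose range is extended by up to two escape stages.

    Assumed convention: a field with all bits set means "more bits follow".
    """
    value = reader.read(n_bits1)
    if value == (1 << n_bits1) - 1:          # first escape
        value_add = reader.read(n_bits2)
        value += value_add
        if value_add == (1 << n_bits2) - 1:  # second escape
            value += reader.read(n_bits3)
    return value
```

Small values cost only n_bits1 bits, so the common case stays cheap, while rare large values pay for the extra stages only when they occur.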
usacSamplingFrequencyIndex
This index determines the sampling frequency of the audio signal after decoding. The values of usacSamplingFrequencyIndex and their associated sampling frequencies are described in Table C.
Table C - Values and meaning of usacSamplingFrequencyIndex
usacSamplingFrequency
The output sampling frequency of the decoder, coded as an unsigned integer value in case usacSamplingFrequencyIndex equals zero.
channelConfigurationIndex
This index determines the channel configuration. If channelConfigurationIndex > 0, the index unambiguously defines the number of channels, the channel elements, and the associated loudspeaker mapping according to Table Y. The names of the loudspeaker positions, the abbreviations used, and the general positions of the available loudspeakers can be found in Figures 3a, 3b, 4a, and 4b.
bsOutputChannelPos
This index describes the loudspeaker position associated with a given channel according to Figures 4a and 4b. Figure 4b indicates the loudspeaker positions in the 3D environment of the listener. To ease understanding, Figure 4a also lists the loudspeaker positions according to IEC 100/1706/CDV for the interested reader.
Table - Values of coreCoderFrameLength, sbrRatio, outputFrameLength, and numSlots depending on coreSbrFrameLengthIndex
usacConfigExtensionPresent
This indicates the presence of extensions to the configuration.
numOutChannels
If the value of channelConfigurationIndex indicates that none of the predefined channel configurations is used, this element determines the number of audio channels with which a specific loudspeaker position is to be associated.
numElements
This field contains the number of elements that follow in the loop over element types in UsacDecoderConfig().
usacElementType[elemIdx]
This defines the USAC channel element type of the element at position elemIdx in the bitstream. Four element types exist, one for each of the four basic bitstream elements: UsacSingleChannelElement(), UsacChannelPairElement(), UsacLfeElement(), and UsacExtElement(). These elements provide the necessary top-level structure while maintaining all needed flexibility. The meaning of usacElementType is defined in Table A.
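How a decoder might use the per-position element types can be sketched as a small dispatch table. The enum member names and their numeric values are assumptions for this sketch (Table A is authoritative for the actual values); the function merely pairs each position with the configuration element and the payload element named in the text.

```python
from enum import IntEnum


class UsacElementType(IntEnum):
    # Numeric values are assumed for illustration; see Table A for the real ones.
    SCE = 0  # single channel element
    CPE = 1  # channel pair element
    LFE = 2  # LFE element
    EXT = 3  # extension element


# (configuration element, payload element) for each type, as named in the text.
CONFIG_AND_PAYLOAD = {
    UsacElementType.SCE: ("UsacSingleChannelElementConfig", "UsacSingleChannelElement"),
    UsacElementType.CPE: ("UsacChannelPairElementConfig", "UsacChannelPairElement"),
    UsacElementType.LFE: ("UsacLfeElementConfig", "UsacLfeElement"),
    UsacElementType.EXT: ("UsacExtElementConfig", "UsacExtElement"),
}


def build_element_plan(usac_element_types):
    """Return (elemIdx, config element, payload element) for every position."""
    plan = []
    for elem_idx, raw_type in enumerate(usac_element_types):
        element_type = UsacElementType(raw_type)  # raises ValueError if unknown
        config_name, payload_name = CONFIG_AND_PAYLOAD[element_type]
        plan.append((elem_idx, config_name, payload_name))
    return plan
```

Since the order is fixed once in the configuration, each frame can walk the same plan by elemIdx without repeating the element types in the payload.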
Table A - Values of usacElementType
stereoConfigIndex
This element determines the inner structure of a UsacChannelPairElement(). According to Table ZZ, it indicates the use of a mono or stereo core, the use of MPS212, whether stereo SBR is applied, and whether residual coding is applied in MPS212. This element also defines the values of the helper elements bsStereoSbr and bsResidualCoding.
Table ZZ - Values of stereoConfigIndex, their meaning, and the implicit assignment of bsStereoSbr and bsResidualCoding
tw_mdct
This flag signals the use of the time-warped MDCT in this stream.
noiseFilling
This flag signals the use of noise filling of spectral holes in the FD core coder.
harmonicSBR
This flag signals the use of harmonic patching in SBR.
bs_interTes
This flag signals the use of the inter-TES tool in SBR.
dflt_start_freq
This is the default value for the bitstream element bs_start_freq, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_stop_freq
This is the default value for the bitstream element bs_stop_freq, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_header_extra1
This is the default value for the bitstream element bs_header_extra1, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_header_extra2
This is the default value for the bitstream element bs_header_extra2, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_freq_scale
This is the default value for the bitstream element bs_freq_scale, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_alter_scale
This is the default value for the bitstream element bs_alter_scale, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_noise_bands
This is the default value for the bitstream element bs_noise_bands, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_limiter_bands
This is the default value for the bitstream element bs_limiter_bands, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_limiter_gains
This is the default value for the bitstream element bs_limiter_gains, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_interpol_freq
This is the default value for the bitstream element bs_interpol_freq, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
dflt_smoothing_mode
This is the default value for the bitstream element bs_smoothing_mode, which is applied if the flag sbrUseDfltHeader indicates that default values for the SbrHeader() elements are to be assumed.
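The way the dflt_ values stand in for the corresponding bs_ elements can be sketched as a simple lookup. The function name, the dictionary representation, and the numeric values in the usage note are assumptions for illustration; only the dflt_/bs_ element names and the sbrUseDfltHeader flag come from the text.

```python
from typing import Dict, Optional

# Default-header elements listed in the text, in the order they appear.
DFLT_NAMES = [
    "dflt_start_freq", "dflt_stop_freq", "dflt_header_extra1",
    "dflt_header_extra2", "dflt_freq_scale", "dflt_alter_scale",
    "dflt_noise_bands", "dflt_limiter_bands", "dflt_limiter_gains",
    "dflt_interpol_freq", "dflt_smoothing_mode",
]


def resolve_sbr_header(
    sbr_use_dflt_header: bool,
    dflt_header: Dict[str, int],
    explicit_header: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Return the effective bs_* values for the current frame (sketch)."""
    if sbr_use_dflt_header:
        # Each dflt_xxx value is applied to the matching bs_xxx element.
        return {"bs_" + name[len("dflt_"):]: dflt_header[name] for name in DFLT_NAMES}
    if explicit_header is None:
        raise ValueError("an explicit SbrHeader() is needed when defaults are not used")
    return dict(explicit_header)
```

With placeholder defaults such as `{name: 0 for name in DFLT_NAMES}`, calling `resolve_sbr_header(True, defaults)` yields a dictionary keyed by bs_start_freq through bs_smoothing_mode, so no header fields need to be resent in that frame.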
usacExtElementType
This element allows the bitstream extension type to be signaled. The meaning of usacExtElementType is defined in Table B.
Table B - Values of usacExtElementType
usacExtElementConfigLength
This signals the length of the extension configuration in bytes (octets).
usacExtElementDefaultLengthPresent
This flag signals whether a usacExtElementDefaultLength is conveyed in the UsacExtElementConfig().
usacExtElementDefaultLength
This signals the default length of the extension element in bytes. Only if the extension element in a given access unit deviates from this value does an additional length need to be transmitted in the bitstream. If this element is not explicitly transmitted (usacExtElementDefaultLengthPresent == 0), the value of usacExtElementDefaultLength is set to zero.
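The default-length rule can be sketched in a few lines. The function names and the use of None to mean "no explicit length was sent in this access unit" are assumptions for this illustration; the actual field coding is defined by the syntax tables.

```python
from typing import Optional


def configured_default_length(default_length_present: bool,
                              transmitted_default: Optional[int] = None) -> int:
    """usacExtElementDefaultLength as seen by the decoder (0 if not transmitted)."""
    if not default_length_present:
        return 0
    if transmitted_default is None:
        raise ValueError("default length flagged as present but not provided")
    return transmitted_default


def needs_explicit_length(default_length: int, actual_length: int) -> bool:
    """An extra length is only needed when the payload deviates from the default."""
    return actual_length != default_length


def payload_length(default_length: int, explicit_length: Optional[int] = None) -> int:
    """Length the decoder uses for this access unit's extension payload."""
    return default_length if explicit_length is None else explicit_length
```

An extension whose payload usually has the same size therefore costs no length bits in most frames, and one that is usually empty benefits from the implicit default of zero.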
usacExtElementPayloadFrag
This flag indicates whether the payload of this extension element may be fragmented and sent as several segments in consecutive USAC frames.
numConfigExtensions
If extensions to the configuration are present in the UsacConfig(), this value indicates the number of signaled configuration extensions.
confExtIdx
Index to the configuration extensions.
usacConfigExtType
This element allows the configuration extension type to be signaled. The meaning of usacConfigExtType is defined in Table D.
Table D - Values of usacConfigExtType
usacConfigExtLength
This signals the length of the configuration extension in bytes (octets).
bsPseudoLr
This flag signals that an inverse mid/side rotation should be applied to the core signal prior to Mps212 processing.
Table - bsPseudoLr
bsStereoSbr
This flag signals the use of stereo SBR in combination with MPEG Surround decoding.
Table - bsStereoSbr
bsResidualCoding
This indicates whether residual coding is applied according to the table below. The value of bsResidualCoding is defined by stereoConfigIndex (see X).
Table - bsResidualCoding
sbrRatioIndex
This indicates the ratio between the core sampling rate and the sampling rate after eSBR processing. At the same time it indicates the number of QMF analysis and synthesis bands used in SBR according to the table below.
Table - Definition of sbrRatioIndex
elemIdx
Index to the elements present in UsacDecoderConfig() and UsacFrame().
UsacConfig()
UsacConfig() contains information about the output sampling frequency and the channel configuration. This information shall be identical to the information signalled outside this element, e.g. in an MPEG-4 AudioSpecificConfig().
Usac output sampling frequency
If the sampling rate is not one of the rates listed in the right column of Table 1, the sampling-frequency-dependent tables (code tables, scale factor band tables, etc.) must be deduced in order to parse the bitstream payload. Since a given sampling frequency is associated with only one sampling frequency table, and since maximum flexibility is desired in the range of possible sampling frequencies, the following table shall be used to associate an implied sampling frequency with the desired sampling-frequency-dependent tables.
Table 1 - Sampling frequency mapping
UsacChannelConfig()
The channel configuration table covers the most common loudspeaker positions. For further flexibility, channels can be mapped to an overall selection of 32 loudspeaker positions found in modern loudspeaker setups in various applications (see Figures 3a and 3b).
For each channel contained in the bitstream, UsacChannelConfig() specifies the associated loudspeaker position to which this particular channel shall be mapped. The loudspeaker positions indexed by bsOutputChannelPos are listed in Figure 4a. In the case of multiple channel elements, the index i of bsOutputChannelPos[i] indicates the position in which the channel appears in the bitstream. Figure Y gives an overview of the loudspeaker positions in relation to the listener.
More precisely, the channels are numbered in the order in which they appear in the bitstream, starting with 0 (zero). In the trivial case of a UsacSingleChannelElement() or UsacLfeElement(), the channel number is assigned to that channel and the channel count is increased by one. In the case of a UsacChannelPairElement(), the first channel in that element (with index ch == 0) is numbered first, whereas the second channel in the same element (with index ch == 1) receives the next higher number, and the channel count is increased by two.
It follows that numOutChannels shall be equal to or smaller than the accumulated sum of all channels contained in the bitstream. The accumulated sum of all channels equals the number of all UsacSingleChannelElement()s plus the number of all UsacLfeElement()s plus two times the number of all UsacChannelPairElement()s.
All entries in the array bsOutputChannelPos shall be mutually distinct in order to avoid double assignment of loudspeaker positions in the bitstream.
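The numbering rule and the two constraints on bsOutputChannelPos can be illustrated with a minimal sketch. The element type labels, the function names and the list-based representation are assumptions made for illustration only; they are not part of the bitstream syntax.

```python
# Hypothetical labels for the element types; the bitstream signals them via usacElementType.
SCE, CPE, LFE, EXT = "SCE", "CPE", "LFE", "EXT"

# Channels contributed by each element type (extension elements carry no audio channel).
CHANNELS_PER_ELEMENT = {SCE: 1, CPE: 2, LFE: 1, EXT: 0}


def number_channels(element_types):
    """Return ((elemIdx, ch, channel_number) triples, total channel count) in bitstream order."""
    numbering = []
    count = 0  # running channel count, starting with 0
    for elem_idx, elem_type in enumerate(element_types):
        for ch in range(CHANNELS_PER_ELEMENT[elem_type]):
            numbering.append((elem_idx, ch, count))
            count += 1
    return numbering, count


def check_output_positions(bs_output_channel_pos, total_channels):
    """Check that numOutChannels does not exceed the channel sum and that positions are distinct."""
    if len(bs_output_channel_pos) > total_channels:
        raise ValueError("numOutChannels exceeds the number of channels in the bitstream")
    if len(set(bs_output_channel_pos)) != len(bs_output_channel_pos):
        raise ValueError("a loudspeaker position is assigned twice")
```

For an element sequence SCE, CPE, LFE this yields channel numbers 0, 1, 2 and 3, with the two channels of the pair receiving consecutive numbers.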
In the special case that channelConfigurationIndex is 0 and numOutChannels is smaller than the accumulated sum of all channels contained in the bitstream, the handling of the non-assigned channels is outside the scope of this specification. Information about this can, for example, be conveyed by appropriate means in higher application layers or by specifically designed (private) extension payloads.
UsacDecoderConfig()
UsacDecoderConfig() contains all further information required by the decoder to interpret the bitstream. First, the value of sbrRatioIndex determines the ratio between the core coder frame length (ccfl) and the output frame length. Following sbrRatioIndex is the loop over all channel elements in the present bitstream. For each iteration, the type of the element is signalled in usacElementType[], immediately followed by its corresponding configuration structure. The order in which the various elements are present in UsacDecoderConfig() shall be identical to the order of the corresponding payload in UsacFrame().
Each instance of an element can be configured independently. When reading each channel element in UsacFrame(), the corresponding configuration of that instance, i.e. the one with the same elemIdx, shall be used for each element.
UsacSingleChannelElementConfig()
UsacSingleChannelElementConfig() contains all information needed to configure the decoder to decode one single channel. SBR configuration data is transmitted only if SBR is actually employed.
UsacChannelPairElementConfig()
UsacChannelPairElementConfig() contains core coder related configuration data as well as SBR configuration data, depending on the use of SBR. The exact type of stereo coding algorithm is indicated by stereoConfigIndex. In USAC, a channel pair can be encoded in various ways. These are:
1. A stereo core coder pair using traditional joint stereo coding techniques, extended by the possibility of complex prediction in the MDCT domain.
2. A mono core coder channel in combination with MPEG Surround based MPS212 for fully parametric stereo coding. Mono SBR processing is applied to the core signal.
3. A stereo core coder pair in combination with MPEG Surround based MPS212, where the first core coder channel carries a downmix signal and the second channel carries a residual signal. The residual may be band-limited to realize partial residual coding. Mono SBR processing is applied only to the downmix signal, before MPS212 processing.
4. A stereo core coder pair in combination with MPEG Surround based MPS212, where the first core coder channel carries a downmix signal and the second channel carries a residual signal. The residual may be band-limited to realize partial residual coding. Stereo SBR is applied to the reconstructed stereo signal after MPS212 processing.
Options 3 and 4 can be further combined with a pseudo-LR channel rotation after the core coder.
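The four options can be summarized as a lookup from stereoConfigIndex to the channel counts a decoder has to set up. The following is a sketch under two assumptions that the text above does not state: that index n corresponds to option n + 1, and that option 1 uses two SBR channels. The type and function names are hypothetical.

```python
from typing import NamedTuple


class StereoSetup(NamedTuple):
    nr_core_coder_channels: int  # nrCoreCoderChannels
    nr_sbr_channels: int         # nrSbrChannels, relevant only when SBR is in use
    uses_mps212: bool
    residual_coding: bool


# Assumption: stereoConfigIndex n selects option n + 1 of the list above.
STEREO_SETUPS = {
    0: StereoSetup(2, 2, False, False),  # option 1; the SBR channel count is an assumption
    1: StereoSetup(1, 1, True, False),   # option 2: mono core, mono SBR, MPS212 upmix
    2: StereoSetup(2, 1, True, True),    # option 3: downmix + residual, mono SBR
    3: StereoSetup(2, 2, True, True),    # option 4: downmix + residual, stereo SBR
}


def stereo_setup(stereo_config_index):
    """Return the decoder setup for a given stereoConfigIndex."""
    try:
        return STEREO_SETUPS[stereo_config_index]
    except KeyError:
        raise ValueError(f"unsupported stereoConfigIndex: {stereo_config_index}") from None
```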
UsacLfeElementConfig()
Since the use of the time-warped MDCT and noise filling is not allowed for LFE channels, there is no need to transmit the usual core coder flags for these tools. They shall be set to zero instead.
Likewise, the use of SBR is not allowed in an LFE context. Thus, SBR configuration data is not transmitted.
UsacCoreConfig()
UsacCoreConfig() only contains flags to enable or disable the use of the time-warped MDCT and spectral noise filling on a global bitstream level. If tw_mdct is set to zero, time warping shall not be applied. If noiseFilling is set to zero, spectral noise filling shall not be applied.
SbrConfig()
The SbrConfig() bitstream element serves the purpose of signalling the exact eSBR setup parameters. On the one hand, SbrConfig() signals the general deployment of eSBR tools. On the other hand, it contains a default version of the SbrHeader(), the SbrDfltHeader(). The values of this default header shall be assumed if no differing SbrHeader() is transmitted in the bitstream. The background of this mechanism is that typically only one set of SbrHeader() values is applied in one bitstream. The transmission of the SbrDfltHeader() then makes it possible to refer to this default set of values very efficiently, using only one bit in the bitstream. The possibility of varying the values of the SbrHeader on the fly is still retained by allowing the in-band transmission of a new SbrHeader in the bitstream itself.
SbrDfltHeader()
The SbrDfltHeader() is what may be called the basic SbrHeader() template and should contain the values for the predominantly used eSBR configuration. In the bitstream, this configuration can be referred to by setting the sbrUseDfltHeader flag. The structure of the SbrDfltHeader() is identical to that of SbrHeader(). To be able to distinguish between the values of the SbrDfltHeader() and SbrHeader(), the bit fields in the SbrDfltHeader() are prefixed with "dflt_" instead of "bs_". If the use of the SbrDfltHeader() is indicated, the SbrHeader() bit fields shall assume the values of the corresponding SbrDfltHeader(), i.e.
bs_start_freq = dflt_start_freq;
bs_stop_freq = dflt_stop_freq;
etc.
(continue for all elements in SbrHeader(), like:
bs_xxx_yyy = dflt_xxx_yyy;)
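The prefix substitution above can be written generically. The sketch below assumes that both headers are held as dictionaries keyed by field name; the function names and the dictionary representation are illustrative assumptions, not part of the syntax.

```python
def header_from_default(sbr_dflt_header):
    """Build SbrHeader() fields (bs_*) from SbrDfltHeader() fields (dflt_*)."""
    header = {}
    for name, value in sbr_dflt_header.items():
        if not name.startswith("dflt_"):
            raise ValueError(f"unexpected default-header field: {name}")
        header["bs_" + name[len("dflt_"):]] = value
    return header


def active_header(sbr_use_dflt_header, sbr_dflt_header, in_band_header=None):
    """Use the default header when the flag is set, otherwise the header sent in-band."""
    if sbr_use_dflt_header:
        return header_from_default(sbr_dflt_header)
    if in_band_header is None:
        raise ValueError("no in-band SbrHeader() available")
    return dict(in_band_header)
```

With a one-bit flag selecting the default, a full header only has to be sent in-band when its values actually change.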
Mps212Config()
Mps212Config() resembles the SpatialSpecificConfig() of MPEG Surround and is in large parts deduced from it. It is, however, reduced in extent to contain only the information relevant for mono-to-stereo upmixing in the USAC context. Consequently, MPS212 configures only one OTT box.
UsacExtElementConfig()
UsacExtElementConfig() is a general container for configuration data of extension elements for USAC. Each USAC extension has a unique type identifier, usacExtElementType, which is defined in Figure 6k. For each UsacExtElementConfig(), the length of the contained extension configuration is transmitted in the variable usacExtElementConfigLength, which allows decoders to safely skip over extension elements whose usacExtElementType is unknown.
For USAC extensions which typically have a constant payload length, UsacExtElementConfig() allows the transmission of a usacExtElementDefaultLength. Defining a default payload length in the configuration allows highly efficient signalling of the usacExtElementPayloadLength inside the UsacExtElement(), where bit consumption needs to be kept low.
In the case of USAC extensions where a larger amount of data is accumulated and transmitted not on a per-frame basis but only every second frame or even more rarely, this data may be transmitted in fragments or segments spread over several USAC frames. This can help to keep the bit reservoir more equalized. The use of this mechanism is signalled by the flag usacExtElementPayloadFrag. The fragmentation mechanism is further explained in the description of the usacExtElement in 6.2.X.
UsacConfigExtension()
UsacConfigExtension() is a general container for extensions of UsacConfig(). It provides a convenient way to amend or extend the information exchanged at the time of decoder initialization or set-up. The presence of configuration extensions is indicated by usacConfigExtensionPresent. If configuration extensions are present (usacConfigExtensionPresent == 1), the exact number of these extensions follows in the bit field numConfigExtensions. Each configuration extension has a unique type identifier, usacConfigExtType. For each UsacConfigExtension, the length of the contained configuration extension is transmitted in the variable usacConfigExtLength, which allows the configuration bitstream parser to safely skip over configuration extensions whose usacConfigExtType is unknown.
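The skip mechanism relies only on the type and the length being readable before the payload. The sketch below illustrates this with a deliberately simplified byte layout (one type byte, one length byte, then the payload); this layout, the function name and the handler table are assumptions for illustration and do not reproduce the actual bit-level syntax.

```python
def parse_config_extensions(data, num_config_extensions, handlers):
    """Walk a simplified list of extensions: [type byte][length byte][payload bytes] ...

    handlers maps a known extension type to a function taking the payload bytes.
    Returns (results of known extensions keyed by confExtIdx, list of skipped confExtIdx).
    """
    pos = 0
    handled, skipped = {}, []
    for conf_ext_idx in range(num_config_extensions):
        if pos + 2 > len(data):
            raise ValueError("truncated extension header")
        ext_type, ext_length = data[pos], data[pos + 1]
        pos += 2
        if pos + ext_length > len(data):
            raise ValueError("truncated extension payload")
        payload = data[pos:pos + ext_length]
        pos += ext_length  # advance by the signalled length, whether or not the type is known
        if ext_type in handlers:
            handled[conf_ext_idx] = handlers[ext_type](payload)
        else:
            skipped.append(conf_ext_idx)
    return handled, skipped
```

Because the parser advances by the signalled length in both branches, an unknown extension type never desynchronizes the parsing of the extensions that follow it.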
Top-level payloads for the audio object type USAC
Terms and definitions
UsacFrame()
This block of data contains audio data for the time period of one USAC frame, related information, and other data. As signalled in UsacDecoderConfig(), the UsacFrame() contains numElements elements. These elements can contain audio data for one or two channels, audio data for low frequency enhancement, or extension payload.
UsacSingleChannelElement()
Abbreviation SCE. Syntactic element of the bitstream containing coded data for a single audio channel. A single_channel_element() basically consists of the UsacCoreCoderData(), which contains data for either the FD or the LPD core coder. If SBR is active, the UsacSingleChannelElement also contains SBR data.
UsacChannelPairElement()
Abbreviation CPE. Syntactic element of the bitstream payload containing data for a pair of channels. The channel pair can be realized either by transmitting two discrete channels or by one discrete channel and the related Mps212 payload. This is signalled by means of stereoConfigIndex. If SBR is active, the UsacChannelPairElement also contains SBR data.
UsacLfeElement()
Abbreviation LFE. Syntactic element that contains a low sampling frequency enhancement channel. LFEs are always encoded using the fd_channel_stream() element.
UsacExtElement()
Syntactic element that contains extension payload. The length of an extension element is signalled either as a default length in the configuration (UsacExtElementConfig()) or in the UsacExtElement() itself. If present, the extension payload is of type usacExtElementType, as signalled in the configuration.
usacIndependencyFlag
Indicates, according to the table below, whether the current UsacFrame() can be decoded entirely without knowledge of information from previous frames.
Table - Meaning of usacIndependencyFlag
Note: Refer to X.Y for recommendations on the use of usacIndependencyFlag.
usacExtElementUseDefaultLength
Indicates whether the length of the extension element corresponds to usacExtElementDefaultLength, which was defined in UsacExtElementConfig().
usacExtElementPayloadLength
Contains the length of the extension element in bytes. This value should only be transmitted explicitly in the bitstream if the length of the extension element in the present access unit deviates from the default value, usacExtElementDefaultLength.
usacExtElementStart
Indicates whether the present usacExtElementSegmentData begins a data block.
usacExtElementStop
Indicates whether the present usacExtElementSegmentData ends a data block.
usacExtElementSegmentData
The concatenation of all usacExtElementSegmentData from the UsacExtElement()s of consecutive USAC frames, starting with the UsacExtElement() with usacExtElementStart == 1 up to and including the UsacExtElement() with usacExtElementStop == 1, forms one data block. If a complete data block is contained in one UsacExtElement(), usacExtElementStart and usacExtElementStop shall both be set to 1. The data blocks are interpreted as byte-aligned extension payload, depending on usacExtElementType, according to the following table:
Table - Interpretation of data blocks for USAC extension payload decoding
fill_byte
Octet of bits which may be used to pad the bitstream with bits that carry no information. The exact bit pattern used for fill_byte should be '10100101'.
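As a small illustration, the fill pattern '10100101' is the octet 0xA5. The helper below sketches how an encoder might pad a frame to a target size; the function name is hypothetical, and the sketch ignores the bits consumed by the enclosing extension element itself.

```python
FILL_BYTE = 0b10100101  # bit pattern from the text, equal to 0xA5


def fill_bytes_for(frame_length, target_length):
    """Return the fill bytes that pad a frame of frame_length bytes up to target_length bytes."""
    if frame_length > target_length:
        raise ValueError("frame already exceeds the target length")
    return bytes([FILL_BYTE]) * (target_length - frame_length)
```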
Auxiliary elements
nrCoreCoderChannels
In the context of a channel pair element, this variable indicates the number of core coder channels that form the basis for stereo coding. Depending on the value of stereoConfigIndex, this value shall be 1 or 2.
nrSbrChannels
In the context of a channel pair element, this variable indicates the number of channels on which SBR processing is applied. Depending on the value of stereoConfigIndex, this value shall be 1 or 2.
Ancillary payloads for USAC
Terms and definitions
UsacCoreCoderData()
This block of data contains the core coder audio data. The payload element contains data for one or two core coder channels, for either FD or LPD mode. The specific mode is signalled per channel at the beginning of the element.
StereoCoreToolInfo()
All stereo-related information is captured in this element. It deals with the numerous dependencies of bit fields in the stereo coding modes.
Auxiliary elements
commonCoreMode
In a CPE, this flag indicates whether both encoded core coder channels use the same mode.
Mps212Data()
This block of data contains the payload for the Mps212 stereo module. The presence of this data depends on stereoConfigIndex.
common_window
Indicates whether channel 0 and channel 1 of a CPE use identical window parameters.
common_tw
Indicates whether channel 0 and channel 1 of a CPE use identical parameters for the time-warped MDCT.
Decoding of UsacFrame()
One UsacFrame() forms one access unit of the USAC bitstream. Each UsacFrame decodes into 768, 1024, 2048 or 4096 output samples, according to the outputFrameLength determined from the table.
The first bit in the UsacFrame() is the usacIndependencyFlag, which determines whether a given frame can be decoded without any knowledge of the previous frame. If usacIndependencyFlag is set to 0, dependencies on the previous frame may be present in the payload of the current frame.
The UsacFrame() is further made up of one or more syntactic elements, which shall appear in the bitstream in the same order as their corresponding configuration elements in UsacDecoderConfig(). The position of each element in the series of all elements is indexed by elemIdx. For each element, the corresponding configuration of that instance, as transmitted in UsacDecoderConfig(), i.e. the one with the same elemIdx, shall be used.
These syntactic elements are of one of the four types listed in the table. The type of each of these elements is determined by usacElementType. There may be multiple elements of the same type. Elements occurring at the same position elemIdx in different frames shall belong to the same stream.
Table - Examples of simple possible bitstream payloads
If these bitstream payloads are to be transmitted over a constant-rate channel, they might include an extension payload element with a usacExtElementType of ID_EXT_ELE_FILL to adjust the instantaneous bit rate. In this case, an example of a coded stereo signal is:
Table - Example of a simple stereo bitstream with an extension payload for writing fill bits
Decoding of UsacSingleChannelElement()
The simple structure of the UsacSingleChannelElement() is made up of one instance of UsacCoreCoderData() with nrCoreCoderChannels set to 1. Depending on the sbrRatioIndex of this element, a UsacSbrData() element follows, with nrSbrChannels set to 1 as well.
Decoding of UsacExtElement()
UsacExtElement() structures in a bitstream can be decoded or skipped by a USAC decoder. Every extension is identified by a usacExtElementType, conveyed in the UsacExtElementConfig() associated with the UsacExtElement(). For each usacExtElementType, a specific decoder can be present.
If a decoder for the extension is available to the USAC decoder, the payload of the extension is forwarded to the extension decoder immediately after the UsacExtElement() has been parsed by the USAC decoder.
If no decoder for the extension is available to the USAC decoder, a minimum of structure is provided within the bitstream, so that the extension can be ignored by the USAC decoder.
The length of an extension element is specified either by a default length in octets, which can be signalled within the corresponding UsacExtElementConfig() and can be overruled in the UsacExtElement(), or by length information explicitly provided in the UsacExtElement(), which is either one or three octets long, using the syntactic element escapedValue().
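The length rule can be sketched as follows. The text names escapedValue() but does not define it here, so the escape behaviour shown (read a short field, and if it is all ones, read a longer field and add it) is an assumption, as are the field widths 8 and 16 chosen to match "one or three octets". BitReader and the function names are hypothetical helpers.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n_bits):
        value = 0
        for _ in range(n_bits):
            if self.pos >= 8 * len(self.data):
                raise EOFError("read past end of data")
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def escaped_value(reader, n_bits1, n_bits2, n_bits3):
    """Assumed escape coding: each all-ones field announces a further, added field."""
    value = reader.read(n_bits1)
    if value == (1 << n_bits1) - 1:
        value_add = reader.read(n_bits2)
        value += value_add
        if value_add == (1 << n_bits2) - 1:
            value += reader.read(n_bits3)
    return value


def ext_payload_length(reader, default_length):
    """Return the payload length in octets: the configured default or an explicit value."""
    use_default_length = reader.read(1)  # usacExtElementUseDefaultLength
    if use_default_length:
        return default_length
    return escaped_value(reader, 8, 16, 0)  # one octet, or three when escaped
```

Under this assumption, an extension with a constant payload size costs a single flag bit per frame, and short explicit lengths cost one octet.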
Extension payloads that span one or more UsacFrame()s can be fragmented, with their payload distributed among several UsacFrame()s. In this case, the usacExtElementPayloadFrag flag is set to 1, and a decoder must collect all fragments from the UsacFrame() with usacExtElementStart set to 1 up to and including the UsacFrame() with usacExtElementStop set to 1. When usacExtElementStop is set to 1, the extension is considered complete and is passed to the extension decoder.
Note that this specification does not provide integrity protection for a fragmented extension payload; other means should be used to ensure the integrity of extension payloads.
Note that all extension payload data is assumed to be byte-aligned.
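Fragment collection can be sketched as a small state machine fed once per frame. The class name and the choice to silently ignore segments that arrive without a preceding start are assumptions of this sketch; the text leaves error handling to other means.

```python
class FragmentCollector:
    """Collects usacExtElementSegmentData across frames into one data block."""

    def __init__(self):
        self._buffer = None  # None means that no data block is currently open

    def push(self, start, stop, segment_data):
        """Feed one segment; return the complete data block when stop is set, else None."""
        if start:
            self._buffer = bytearray()  # usacExtElementStart == 1 opens a new block
        elif self._buffer is None:
            return None  # segment without a preceding start; ignored in this sketch
        self._buffer += segment_data
        if stop:
            block, self._buffer = bytes(self._buffer), None
            return block  # usacExtElementStop == 1: block is complete
        return None
```

A payload that fits into a single UsacExtElement() arrives with both flags set and is returned immediately.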
Each UsacExtElement() shall obey the requirements resulting from the use of the usacIndependencyFlag. More explicitly, if the usacIndependencyFlag is set (== 1), the UsacExtElement() shall be decodable without knowledge of the previous frame (and of any extension payload that may be contained in it).
Decoding process
The stereoConfigIndex transmitted in UsacChannelPairElementConfig() determines the exact type of stereo coding applied in the given CPE. Depending on this type of stereo coding, either one or two core coder channels are actually transmitted in the bitstream, and the variable nrCoreCoderChannels needs to be set accordingly. The syntax element UsacCoreCoderData() then provides the data for one or two core coder channels.
Similarly, data may be available for one or two channels, depending on the type of stereo coding and the use of eSBR (i.e. if sbrRatioIndex > 0). The value of nrSbrChannels needs to be set accordingly, and the syntax element UsacSbrData() provides the eSBR data for one or two channels.
Finally, Mps212Data() is transmitted depending on the value of stereoConfigIndex.
Low frequency enhancement (LFE) channel element, UsacLfeElement()
General
In order to maintain a regular structure in the decoder, the UsacLfeElement() is defined as a standard fd_channel_stream(0,0,0,0,x) element, i.e. it is equal to a UsacCoreCoderData() using the frequency domain coder. Thus, it can be decoded using the standard procedure for decoding a UsacCoreCoderData() element.
However, in order to allow a more bit-rate-efficient and hardware-efficient implementation of the LFE decoder, several restrictions apply to the options used for encoding this element:
● The window_sequence field is always set to 0 (ONLY_LONG_SEQUENCE)
● Only the lowest 24 spectral coefficients of any LFE may be non-zero
● No temporal noise shaping is used, i.e. tns_data_present is set to 0
● Time warping is not active
● No noise filling is applied
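The restrictions listed above lend themselves to a simple conformance check. In the sketch below, the function name and the idea of passing already-parsed values as arguments are assumptions; only the constants and the rules come from the list.

```python
ONLY_LONG_SEQUENCE = 0
MAX_LFE_COEFFICIENTS = 24  # only the lowest 24 spectral coefficients may be non-zero


def lfe_violations(window_sequence, spectrum, tns_data_present, tw_mdct, noise_filling):
    """Return a list of human-readable violations of the LFE restrictions (empty if none)."""
    problems = []
    if window_sequence != ONLY_LONG_SEQUENCE:
        problems.append("window_sequence must be 0 (ONLY_LONG_SEQUENCE)")
    if any(coeff != 0 for coeff in spectrum[MAX_LFE_COEFFICIENTS:]):
        problems.append("non-zero spectral coefficient above the lowest 24")
    if tns_data_present:
        problems.append("tns_data_present must be 0")
    if tw_mdct:
        problems.append("time warping must not be active")
    if noise_filling:
        problems.append("noise filling must not be applied")
    return problems
```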
UsacCoreCoderData()
The UsacCoreCoderData() contains all information for decoding one or two core coder channels.
The order of decoding is:
● Get the core_mode[] for each channel
● In the case of two core coder channels (nrChannels == 2), parse the StereoCoreToolInfo() and determine all stereo-related parameters
● Depending on the signalled core_modes, an lpd_channel_stream() or an fd_channel_stream() is transmitted for each channel
As can be seen from the above list, the decoding of one core coder channel (nrChannels == 1) results in obtaining the core_mode bit, followed by one lpd_channel_stream or fd_channel_stream, depending on the core_mode.
In the case of two core coder channels, some signalling redundancies between the channels can be exploited, in particular if the core_mode of both channels is 0. See 6.2.X (Decoding of StereoCoreToolInfo()) for details.
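The decoding order can be sketched as a small dispatcher. The three parser callbacks are placeholders supplied by the caller and stand in for the real StereoCoreToolInfo(), lpd_channel_stream() and fd_channel_stream() parsers; the bit source is modelled as a plain iterator of 0/1 values. The sketch assumes that core_mode 1 selects LPD, which follows from FD being signalled by core_mode 0.

```python
def decode_core_coder_data(bits, nr_channels, parse_stereo_info, parse_lpd, parse_fd):
    """Follow the decoding order of UsacCoreCoderData() for one or two channels.

    bits: iterator yielding 0/1 values.
    parse_*: caller-supplied callbacks (hypothetical stand-ins for the real parsers).
    """
    # Step 1: one core_mode bit per channel.
    core_mode = [next(bits) for _ in range(nr_channels)]

    # Step 2: stereo-related parameters exist only for two core coder channels.
    stereo_info = parse_stereo_info(bits, core_mode) if nr_channels == 2 else None

    # Step 3: one channel stream per channel, selected by its core_mode.
    streams = []
    for ch in range(nr_channels):
        parser = parse_lpd if core_mode[ch] == 1 else parse_fd
        streams.append(parser(bits, ch, stereo_info))
    return core_mode, stereo_info, streams
```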
StereoCoreToolInfo()
The StereoCoreToolInfo() allows efficient coding of parameters whose values may be shared across the core coder channels of a CPE when both channels are coded in FD mode (core_mode[0,1] == 0). In particular, the following data elements are shared when the appropriate flag in the bitstream is set to 1.
Table - Bitstream elements shared across the channels of a core coder channel pair
If the appropriate flag is not set, the data elements are transmitted individually for each core coder channel, either in StereoCoreToolInfo() (max_sfb, max_sfb1) or in the fd_channel_stream() that follows the StereoCoreToolInfo() in the UsacCoreCoderData() element.
In the case of common_window == 1, the StereoCoreToolInfo() also contains the information about M/S stereo coding and the complex prediction data in the MDCT domain (see 7.7.2).
UsacSbrData()
This block of data contains the payload for the SBR bandwidth extension for one or two channels. The presence of this data depends on sbrRatioIndex.
SbrInfo()
This element contains SBR control parameters which do not require a decoder reset when changed.
SbrHeader()
This element contains SBR header data with SBR configuration parameters that typically do not change over the duration of a bitstream.
SBR payload for USAC
In USAC, the SBR payload is transmitted in UsacSbrData(), which is an integral part of each single channel element or channel pair element. UsacSbrData() immediately follows UsacCoreCoderData(). There is no SBR payload for LFE channels.
numSlots
The number of time slots in an Mps212Data frame.
图1示出用于对在输入端10处提供的经编码音频信号进行解码的音频解码器。在输入线10上,提供有作为例如数据流或者甚至更示例性地为串行数据流的经编码的音频信号。经编码的音频信号包括在数据流的有效载荷区段中的第一通道元素和第二通道元素,并且包括在数据流的配置区段中的用于第一通道元素的第一解码器配置数据和用于第二通道元素的第二解码器配置数据。通常,第一解码器配置数据将与第二解码器配置数据不同,原因在于第一通道元素通常也将与第二通道元素不同。FIG1 shows an audio decoder for decoding an encoded audio signal provided at an input 10. An encoded audio signal is provided on input line 10, for example, as a data stream or, more illustratively, as a serial data stream. The encoded audio signal comprises a first channel element and a second channel element in a payload section of the data stream, and comprises first decoder configuration data for the first channel element and second decoder configuration data for the second channel element in a configuration section of the data stream. Typically, the first decoder configuration data will differ from the second decoder configuration data, as the first channel element will typically also differ from the second channel element.
The data stream or encoded audio signal is input into a data stream reader 12, which reads the configuration data for each channel element and forwards it to a configuration controller 14 via connection line 13. Furthermore, the data stream reader is arranged to read the payload data for each channel element in the payload section, and this payload data, comprising the first channel element and the second channel element, is provided to a configurable decoder 16 via connection line 15. The configurable decoder 16 is arranged to decode the plurality of channel elements in order to output data for the individual channel elements, as indicated at output lines 18a, 18b. Specifically, the configurable decoder 16 is configured in accordance with the first decoder configuration data when decoding the first channel element, and in accordance with the second decoder configuration data when decoding the second channel element. This is indicated by connection lines 17a, 17b, where connection line 17a carries the first decoder configuration data from the configuration controller 14 to the configurable decoder, and connection line 17b carries the second decoder configuration data from the configuration controller to the configurable decoder. The configuration controller can be implemented in any manner that makes the configurable decoder operate in accordance with the decoder configuration signaled in the corresponding decoder configuration data or on the corresponding line 17a, 17b. Hence, the configuration controller 14 can be implemented as an interface between the data stream reader 12, which actually obtains the configuration data from the data stream, and the configurable decoder 16, which is configured by the configuration data actually read.
FIG. 2 shows a corresponding audio encoder for encoding a multi-channel input audio signal provided at an input 20. The input 20 is shown as comprising three different lines 20a, 20b, 20c, where line 20a carries, for example, a center channel audio signal, line 20b carries a left channel audio signal, and line 20c carries a right channel audio signal. All three channel signals are input into a configuration processor 22 and a configurable encoder 24. The configuration processor is adapted to generate first configuration data on line 21a for a first channel element, which for example comprises only the center channel so that the first channel element is a single channel element, and second configuration data on line 21b for a second channel element, which is for example a channel pair element carrying the left channel and the right channel. The configurable encoder 24 is adapted to encode the multi-channel audio signal 20 using the first configuration data 21a and the second configuration data 21b to obtain the first channel element 23a and the second channel element 23b. The audio encoder additionally comprises a data stream generator 26, which receives the first configuration data and the second configuration data at input lines 25a and 25b and, additionally, the first channel element 23a and the second channel element 23b. The data stream generator 26 is adapted to generate a data stream 27 representing the encoded audio signal, the data stream having a configuration section comprising the first and second configuration data and a payload section comprising the first channel element and the second channel element.
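The data flow of FIG. 2 can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the names `ElementConfig`, `configure`, `encode_element` and `generate_stream` are hypothetical, the stream is modelled as a dictionary rather than a bitstream, and the actual encoding is replaced by a placeholder that merely collects the samples of each element.

```python
from dataclasses import dataclass

@dataclass
class ElementConfig:
    element_type: str   # "SCE" (single channel element) or "CPE" (channel pair element)
    channels: tuple     # names of the input channels carried by the element

def configure(channel_names):
    """Configuration processor 22 (sketch): center -> SCE, left/right -> CPE."""
    if set(channel_names) != {"C", "L", "R"}:
        raise ValueError("this sketch only handles the three-channel example")
    return [ElementConfig("SCE", ("C",)), ElementConfig("CPE", ("L", "R"))]

def encode_element(cfg, signals):
    """Configurable encoder 24 (placeholder): collects the samples per element."""
    return {ch: list(signals[ch]) for ch in cfg.channels}

def generate_stream(configs, elements):
    """Data stream generator 26: configuration section first, then payload section."""
    return {"configuration": list(configs), "payload": list(elements)}

signals = {"C": [0.1, 0.2], "L": [0.3, 0.4], "R": [0.5, 0.6]}
configs = configure(signals.keys())
elements = [encode_element(c, signals) for c in configs]
stream = generate_stream(configs, elements)
```

The configuration list and the payload list are built in the same element order, which is the property the description relies on later.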
In this context, it is noted that the first configuration data and the second configuration data can be identical to, or different from, the first decoder configuration data and the second decoder configuration data. When they are different, i.e. when the configuration data in the data stream is encoder-directed data, the configuration controller 14 is configured to transform it into corresponding decoder-directed data by applying, for example, unique functions or lookup tables. Preferably, however, the configuration data written into the data stream is already decoder configuration data, so that the configurable encoder 24 or the configuration processor 22 has, for example, a functionality for deriving encoder configuration data from calculated decoder configuration data, or for calculating or determining decoder configuration data from calculated encoder configuration data, again by applying unique functions, lookup tables or other prior knowledge.
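The lookup-table option mentioned above might look like the following sketch. The encoder-side keys (`stereo_mode`, `fill_zero_lines`) are invented for the example, and the mapping onto the decoder-side fields `stereoConfigIndex` and `noiseFilling` is an assumption made for illustration, not a mapping taken from the description.

```python
# Hypothetical table translating encoder-directed settings into
# decoder-directed configuration fields (all values are assumptions).
ENCODER_TO_DECODER = {
    "stereo_mode": {
        "parametric": {"stereoConfigIndex": 1},
        "discrete": {"stereoConfigIndex": 0},
    },
    "fill_zero_lines": {
        True: {"noiseFilling": 1},
        False: {"noiseFilling": 0},
    },
}

def to_decoder_config(encoder_config):
    """Translate encoder-directed configuration data into decoder configuration data."""
    decoder_config = {}
    for key, value in encoder_config.items():
        decoder_config.update(ENCODER_TO_DECODER[key][value])
    return decoder_config

decoder_cfg = to_decoder_config({"stereo_mode": "parametric", "fill_zero_lines": False})
```

Because every encoder-side value maps to exactly one decoder-side value, the translation is unambiguous, which is what a configuration controller would need.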
FIG. 5a shows a general illustration of the encoded audio signal that is input into the data stream reader 12 of FIG. 1 or output by the data stream generator 26 of FIG. 2. The data stream comprises a configuration section 50 and a payload section 52. FIG. 5b shows a more detailed implementation of the configuration section 50 of FIG. 5a. The data stream shown in FIG. 5b, which is typically a serial data stream carrying one bit after the other, comprises at its first portion 50a general configuration data relating to higher layers of the transport structure, such as the MPEG-4 file format. Alternatively or additionally, the configuration data 50a, which may or may not be present, includes further general configuration data contained in the UsacChannelConfig shown at 50b.
Generally, the configuration data 50a can also comprise the data from the UsacConfig shown in FIG. 6a, and item 50b comprises the elements implemented and shown in the UsacChannelConfig of FIG. 6b. Specifically, the same configuration for all channel elements may, for example, comprise the output channel representation shown and described in the context of FIGS. 3a, 3b and FIGS. 4a, 4b.
The configuration section 50 of the bitstream is then followed by the UsacDecoderConfig element, which in this example is formed by first configuration data 50c, second configuration data 50d and third configuration data 50e. The first configuration data 50c is for the first channel element, the second configuration data 50d is for the second channel element, and the third configuration data 50e is for the third channel element.
Specifically, as shown in FIG. 5b, each configuration data for a channel element comprises an identifier, the element type index idx, whose syntax is given in FIG. 6c. The two-bit element type index idx is followed by the bits describing the channel element configuration data found in FIG. 6c and further explained in FIG. 6d for the single channel element, FIG. 6e for the channel pair element, FIG. 6f for the LFE element and FIG. 6k for the extension element, all of which are channel elements that can typically be included in a USAC bitstream.
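Reading the two-bit element type index can be sketched as below. The sketch assumes a particular assignment of the four index values to element types (0 = single channel, 1 = channel pair, 2 = LFE, 3 = extension) and, for brevity, omits the element-specific configuration bits that follow each index in a real stream, so the example data holds the indices back to back. `BitReader` and `read_element_types` are hypothetical helper names.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n_bits: int) -> int:
        value = 0
        for _ in range(n_bits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

# Assumed assignment of the 2-bit index values to element types.
ELEMENT_TYPES = {0: "SCE", 1: "CPE", 2: "LFE", 3: "EXT"}

def read_element_types(data: bytes, num_elements: int):
    """Read one 2-bit element type index per channel element."""
    reader = BitReader(data)
    return [ELEMENT_TYPES[reader.read(2)] for _ in range(num_elements)]

# Bits 00 01 01 10: one single channel element, two pairs, one LFE element.
types = read_element_types(bytes([0b00010110]), 4)
```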
FIG. 5c shows a USAC frame included in the payload section 52 of the bitstream shown in FIG. 5a. When the configuration section of FIG. 5b forms the configuration section 50 of FIG. 5a, i.e. when the payload section comprises three channel elements, the payload section 52 is implemented as shown in FIG. 5c: the payload data 52a for the first channel element is followed by the payload data 52b for the second channel element, which in turn is followed by the payload data 52c for the third channel element. Hence, in accordance with the present invention, the configuration section and the payload section are organized so that the order of the configuration data with respect to the channel elements is the same as the order of the payload data with respect to the channel elements in the payload section. When the order in the UsacDecoderConfig element is configuration data for the first channel element, configuration data for the second channel element and configuration data for the third channel element, the order in the payload section is the same, i.e. the serial data stream or bitstream contains the payload data for the first channel element, followed by the payload data for the second channel element, followed by the payload data for the third channel element.
This parallel structure in the configuration section and the payload section is advantageous because it makes it possible to signal which configuration data belongs to which channel element with very low overhead. In the prior art, no ordering was required, since individual configuration data for channel elements did not exist. In accordance with the present invention, however, individual configuration data for the individual channel elements is introduced so that the best-suited configuration data can be selected for each channel element.
Typically, a USAC frame comprises data for a time of 20 to 40 milliseconds. When a longer data stream is considered, as shown in FIG. 5d, there is a configuration section 60a followed by payload sections or frames 62a, 62b, 62c, ... 62e, and then a configuration section 62d is again included in the bitstream.
The order of the configuration data in the configuration section, as discussed with respect to FIGS. 5b and 5c, is the same as the order of the channel element payload data in each of the frames 62a to 62e. Hence, the order of the payload data for the individual channel elements is also exactly the same in each of the frames 62a to 62e.
Generally, when the encoded signal is a single file stored, for example, on a hard disk, a single configuration section 50 at the beginning of the whole audio track (for example a track of about 10 or 20 minutes) is sufficient. This single configuration section is then followed by a large number of individual frames, the configuration is valid for every frame, and the order of the channel element data (configuration or payload) is the same in every frame and in the configuration section.
However, when the encoded audio signal is a stream of data, configuration sections have to be introduced between individual frames in order to provide access points, so that a decoder can start decoding even when an earlier configuration section was transmitted but not received because the decoder had not yet been switched on to receive the data stream. The number n of frames between two configuration sections can be selected arbitrarily, but when one access point per second is desired, the number of frames between two configuration sections will be between 25 and 50.
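The range of 25 to 50 frames follows directly from the frame duration of 20 to 40 milliseconds stated above. A short calculation makes this explicit; the function name is hypothetical.

```python
def frames_between_access_points(frame_ms: float, interval_s: float = 1.0) -> int:
    """Number of frames that fit into one access-point interval."""
    return round(interval_s * 1000 / frame_ms)

longest_frames = frames_between_access_points(40)   # 40 ms frames
shortest_frames = frames_between_access_points(20)  # 20 ms frames
```

With 40 ms frames, one access point per second means a configuration section every 25 frames; with 20 ms frames, every 50 frames.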
Subsequently, FIG. 7 shows a straightforward example for encoding and decoding a 5.1 multi-channel signal.
Preferably, four channel elements are used: the first channel element is a single channel element comprising the center channel, the second channel element is a channel pair element CPE1 comprising the left channel and the right channel, and the third channel element is a second channel pair element CPE2 comprising the left surround channel and the right surround channel. Finally, the fourth channel element is an LFE channel element. In an embodiment, the configuration data for the single channel element could, for example, switch the noise filling tool on, while for the second channel pair element comprising the surround channels the noise filling tool is off and a parametric stereo coding procedure is applied. This procedure is of lower quality but yields a low bit rate, and the quality loss is not problematic because this channel pair element carries the surround channels.
The left and right channels, on the other hand, carry a significant amount of information, and therefore a high-quality stereo coding procedure is signaled by the MPS212 configuration. M/S stereo coding is advantageous in that it provides high quality, but problematic in that its bit rate is quite high. M/S stereo coding is therefore preferred for CPE1 but not for CPE2. Furthermore, depending on the implementation, the noise filling feature can be switched on or off. It is preferably switched on for CPE1, because high emphasis is placed on a good, high-quality representation of the left and right channels, just as for the center channel, where noise filling is on as well.
However, when the core bandwidth of channel element C is, for example, quite low and the number of successive lines quantized to zero in the center channel is also low, it can be useful to switch off noise filling for the single channel element carrying the center channel as well. In that case noise filling provides no additional quality gain, and since quality is not improved or only slightly improved, the bits needed to transmit the side information of the noise filling tool can be saved.
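The per-element choices described for the 5.1 example can be summarized as a small table in code. This is a sketch only: the `ElementConfig` fields are hypothetical names, and the settings shown for the LFE element are assumptions, since the text does not state them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementConfig:
    name: str
    element_type: str    # "SCE", "CPE" or "LFE"
    channels: tuple
    noise_filling: bool
    stereo: str          # "none", "M/S" or "parametric"

CONFIG_5_1 = [
    ElementConfig("SCE", "SCE", ("C",), True, "none"),
    ElementConfig("CPE1", "CPE", ("L", "R"), True, "M/S"),
    ElementConfig("CPE2", "CPE", ("Ls", "Rs"), False, "parametric"),
    # LFE settings are assumed; the description does not specify them.
    ElementConfig("LFE", "LFE", ("LFE",), False, "none"),
]

total_channels = sum(len(cfg.channels) for cfg in CONFIG_5_1)
```

Switching noise filling off for a narrow-band center channel, as discussed above, would only change the `noise_filling` field of the first entry and leave the other elements untouched.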
Generally, the tools signaled in the configuration section for a channel element are, for example, the tools mentioned in FIGS. 6d, 6e, 6f, 6g, 6h, 6i and 6j, and additionally comprise the elements for the extension element configuration in FIGS. 6k, 6l and 6m. As shown in FIG. 6e, the MPS212 configuration can be different for each channel element.
MPEG Surround uses a compact parametric representation of the human auditory cues for spatial perception to allow a bit-rate-efficient representation of a multi-channel signal. In addition to the CLD and ICC parameters, IPD parameters can be transmitted. For an efficient representation of the phase information, the OPD parameters are estimated from the given CLD and IPD parameters. The IPD and OPD parameters are used to synthesize the phase difference in order to further improve the stereo image.
In addition to the parametric mode, residual coding can be employed, with the residual having either a limited or a full bandwidth. In this procedure, two output signals are generated by mixing a mono input signal and a residual signal using the CLD, ICC and IPD parameters. Additionally, all the parameters mentioned in FIG. 6j can be selected individually for each channel element. The individual parameters are explained in detail in, for example, ISO/IEC CD 23003-3 dated September 24, 2010, which is incorporated herein by reference.
Additionally, as shown in FIGS. 6f and 6g, core features such as the time warping feature and the noise filling feature can be switched on or off for each channel element individually. The time warping tool, described under the term "time-warped filter bank and block switching" in the above reference, replaces the standard filter bank and block switching. In addition to the IMDCT, the tool contains a time-domain to time-domain mapping from an arbitrarily spaced grid to the normal linearly spaced time grid, and a corresponding adaptation of the window shapes.
Additionally, as shown in FIG. 7, the noise filling tool can be switched on or off for each channel element individually. In low bit rate coding, noise filling can be used for two purposes. Coarse quantization of spectral values in low bit rate audio coding can lead to very sparse spectra after inverse quantization, since many spectral lines may have been quantized to zero. Such sparsely populated spectra make the decoded signal sound sharp or unstable (birdies). By replacing the zeroed lines with "small" values in the decoder, these very obvious artifacts can be masked or reduced without adding obvious new noise artifacts.
If there are noise-like signal parts in the original spectrum, a perceptually equivalent representation of these noisy signal parts can be reproduced in the decoder based on only a small amount of parametric information, such as the energy of the noisy signal part. The parametric information can be transmitted with fewer bits than are needed to transmit the coded waveform. Specifically, the data elements that need to be transmitted are the noise offset element, which is an additional offset modifying the scale factor of bands quantized to zero, and the noise level, which is an integer representing the quantization noise to be added for every spectral line quantized to zero.
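The basic idea of replacing zero-quantized lines with small values can be sketched as follows. This is an illustration only, not the procedure defined in the standard: the mapping from `noise_level` to an amplitude via `step` is a placeholder chosen for the example, the noise offset is omitted, and `fill_noise` is a hypothetical name.

```python
import random

def fill_noise(quantized, noise_level, step=0.0625, seed=0):
    """Replace lines quantized to zero by small random values (illustrative only)."""
    rng = random.Random(seed)
    amplitude = noise_level * step  # placeholder mapping, not the USAC formula
    return [
        q if q != 0 else rng.choice((-1.0, 1.0)) * amplitude
        for q in quantized
    ]

filled = fill_noise([3, 0, 0, -2, 0], noise_level=4)
```

Non-zero lines pass through unchanged, while every zeroed line receives a value of fixed small magnitude and random sign, so the spectrum is no longer sparse.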
As shown in FIG. 7 and in FIGS. 6f and 6g, this feature can be switched on or off for each channel element individually.
Additionally, there are SBR features that can now be signaled individually for each channel element.
As shown in FIG. 6h, these SBR elements comprise the switching on and off of different tools in SBR. The first tool that can be switched on or off for each channel element individually is harmonic SBR. When harmonic SBR is switched on, harmonic SBR patching is performed, while when harmonic SBR is switched off, patching with consecutive lines as known from MPEG-4 (High Efficiency AAC) is used.
Furthermore, the PVC or "predictive vector coding" decoding process can be applied. In order to improve the subjective quality of the eSBR tool, in particular for speech content at low bit rates, predictive vector coding (PVC) is added to the eSBR tool. Generally, for a speech signal, there is a relatively high correlation between the spectral envelopes of the low frequency band and the high frequency band. The PVC scheme exploits this by predicting the spectral envelope of the high frequency band from the spectral envelope of the low frequency band, where the coefficient matrices used for the prediction are coded by means of vector quantization. The HF envelope adjuster is modified to process the envelopes generated by the PVC decoder.
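The prediction step can be pictured as a matrix applied to the low-band envelope. The sketch below assumes that a transmitted index selects one coefficient matrix from a codebook known to the decoder; the codebook contents, the envelope values and the name `predict_high_band` are all invented for the example and do not reproduce the actual PVC tables.

```python
def predict_high_band(low_env, codebook, index):
    """Predict the high-band envelope as (coefficient matrix) x (low-band envelope)."""
    matrix = codebook[index]  # matrix selected by the vector-quantization index
    return [sum(c * x for c, x in zip(row, low_env)) for row in matrix]

# Hypothetical codebook with two 2x2 coefficient matrices.
codebook = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.25, 0.0]],
]
high_env = predict_high_band([4.0, 2.0], codebook, index=1)
```

Only the index has to be transmitted, which is why the scheme costs few bits when the low and high band envelopes are strongly correlated.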
Hence, the PVC tool can be particularly useful for the single channel element when the center channel carries, for example, speech, whereas it is not useful for, for example, the surround channels of CPE2 or the left and right channels of CPE1.
Furthermore, the inter-subband-sample temporal envelope shaping feature (inter-TES) can be switched on or off for each channel element individually. Inter-TES processes the QMF subband samples subsequent to the envelope adjuster. This module shapes the temporal envelope of the higher frequency band with a finer temporal granularity than that of the envelope adjuster. By applying a gain factor to each QMF subband sample in an SBR envelope, inter-TES shapes the temporal envelope among the QMF subband samples. Inter-TES consists of three modules: the lower-frequency inter-subband-sample temporal envelope calculator, the inter-subband-sample temporal envelope adjuster and the inter-subband-sample temporal envelope shaper. Because this tool requires additional bits, there will be channel elements for which the additional bit consumption is not justified by the quality gain and channel elements for which it is. Therefore, in accordance with the present invention, this tool is activated or deactivated per channel element.
Furthermore, FIG. 6i shows the syntax of the SBR default header, and all SBR parameters in the SBR default header mentioned in FIG. 6i can be selected differently for each channel element. This relates, for example, to the start frequency or stop frequency that actually sets the crossover frequency, i.e. the frequency at which the signal reconstruction changes from waveform coding to the parametric mode. Other features, such as the frequency resolution and the noise band resolution, are also available for setting selectively for each channel element.
Hence, as shown in FIG. 7, the configuration data is preferably set individually for the stereo features, for the core coder features and for the SBR features. The individual setting per element applies not only to the SBR parameters in the SBR default header shown in FIG. 6i, but also to all parameters in the SbrConfig shown in FIG. 6h.
Subsequently, an implementation of the decoder of FIG. 1 is described with reference to FIG. 8.
Specifically, the functions of the data stream reader 12 and the configuration controller 14 are similar to those described in the context of FIG. 1. The configurable decoder 16, however, is now implemented, for example, as individual decoder instances, where each decoder instance has an input for the configuration data C provided by the configuration controller 14 and an input for the data D, through which it receives the corresponding channel element from the data stream reader 12.
Specifically, the functionality of FIG. 8 is such that an individual decoder instance is provided for each individual channel element. Hence, the first decoder instance is configured by the first configuration data for, for example, the single channel element carrying the center channel.
Furthermore, the second decoder instance is configured in accordance with the second decoder configuration data for the left channel and the right channel of a channel pair element. The third decoder instance 16c is configured for a further channel pair element comprising the left surround channel and the right surround channel. Finally, the fourth decoder instance is configured for the LFE channel. Hence, the first decoder instance provides the single channel C as its output. The second decoder instance 16b and the third decoder instance 16c each provide two output channels, i.e. the left and right channels on the one hand and the left surround and right surround channels on the other hand. Finally, the fourth decoder instance 16d provides the LFE channel as its output. All six channels of the multi-channel signal are forwarded by the decoder instances to the output interface 19 and are then finally sent out, for example for storage or for playback in a 5.1 loudspeaker setup. Clearly, a different loudspeaker setup requires different decoder instances and a different number of decoder instances.
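The arrangement of FIG. 8 can be sketched as one object per channel element with a configuration input C and a data input D. This is a minimal illustration under stated assumptions: `DecoderInstance` and `output_interface` are hypothetical names, and the "decoding" is a placeholder that assumes the payload already holds one list of samples per output channel.

```python
CHANNELS_PER_TYPE = {"SCE": 1, "CPE": 2, "LFE": 1}

class DecoderInstance:
    def __init__(self, config):
        self.config = config  # input C: configuration data for this element

    def decode(self, payload):
        """Input D: payload of the element; returns a dict of output channels."""
        expected = CHANNELS_PER_TYPE[self.config["type"]]
        if len(payload) != expected:
            raise ValueError("payload does not match the element type")
        # Placeholder decoding: the payload already holds the channel samples.
        return dict(zip(self.config["channels"], payload))

def output_interface(instances, frame):
    """Collect the outputs of all decoder instances (block 19 in FIG. 8)."""
    channels = {}
    for instance, payload in zip(instances, frame):
        channels.update(instance.decode(payload))
    return channels

configs = [
    {"type": "SCE", "channels": ("C",)},
    {"type": "CPE", "channels": ("L", "R")},
    {"type": "CPE", "channels": ("Ls", "Rs")},
    {"type": "LFE", "channels": ("LFE",)},
]
instances = [DecoderInstance(c) for c in configs]
frame = [[[0.0]], [[0.1], [0.2]], [[0.3], [0.4]], [[0.5]]]
channels = output_interface(instances, frame)
```

Four instances yield six output channels; a different loudspeaker setup would simply use a different `configs` list.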
FIG. 9 shows a preferred implementation of the method for decoding an encoded audio signal in accordance with an embodiment of the present invention.
In step 90, the data stream reader 12 starts reading the configuration section 50 of FIG. 5a. Then, as indicated in step 92, a channel element is identified based on the channel element identifier in the corresponding configuration data block 50c. In step 94, the configuration data for the identified channel element is read and either used to actually configure the decoder or stored so that the decoder can be configured later, when the channel element is processed.
In step 96, the next channel element is identified using the element type identifier of the second configuration data in portion 50d of FIG. 5b. Then, in step 98, the configuration data is read and either used to actually configure the decoder or decoder instance, or stored for the time when the payload for this channel element is to be decoded.
Then, in step 100, the procedure loops over the whole configuration data, i.e. the identification of channel elements and the reading of the configuration data for the channel elements continues until all configuration data has been read.
Then, in steps 102, 104 and 106, the payload data for each channel element is read, and it is finally decoded in step 108 using the configuration data C, where the payload data is denoted by D. The result of step 108 is the data output by, for example, blocks 16a to 16d, which can then be sent directly to loudspeakers, or which is synchronized, amplified, further processed or digital-to-analog converted before finally being sent to the corresponding loudspeakers.
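The two phases of FIG. 9 (store all configuration data first, then decode each frame's payload with the stored configuration) can be sketched as follows. The stream is modelled with Python lists instead of bits, and `decode_stream` and the `decode` callback are hypothetical names; the step numbers in the comments refer to FIG. 9.

```python
def decode_stream(config_section, frames, decode):
    # Steps 90 to 100: walk through the configuration section and store the
    # configuration data of every channel element in order of appearance.
    stored = []
    for element_type, config_data in config_section:  # steps 92/96: identify element
        stored.append((element_type, config_data))    # steps 94/98: read and store
    # Steps 102 to 108: read each element's payload and decode it with the
    # configuration stored at the same position.
    output = []
    for frame in frames:
        if len(frame) != len(stored):
            raise ValueError("payload order must mirror the configuration order")
        output.append([decode(cfg, payload) for cfg, payload in zip(stored, frame)])
    return output

result = decode_stream(
    [("SCE", {}), ("CPE", {})],
    [["a", "b"], ["c", "d"]],
    decode=lambda cfg, payload: (cfg[0], payload),
)
```

No element identifier is needed in the frames: the position of a payload within the frame is enough to find its configuration.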
虽然已经在设备的上下文中描述了一些方面,但是清楚的是这些方面还表示相应方法的描述,其中块或装置与方法步骤或方法步骤的特征相对应。类似地,在方法步骤的上下文中描述的方面也表示相应块的描述或相应装置的项目或特征的描述。Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of a corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or a description of an item or feature of a corresponding device.
取决于某些实现要求,可以以硬件或软件实现本发明的实施方式。可以使用如下数字储存介质来执行该实现方式:例如,软盘、数字化通用磁盘(DVD)、光盘(CD)、只读存储器(ROM)、可编程只读存储器(PROM)、可擦可编程只读存储器(EPROM)、电可擦可编程只读存储器(EEPROM)或闪存,在该数字储存介质上存储有电可读控制信号,该电可读控制信号与可编程计算机系统协作(或能够与可编程计算机系统协作)使得执行各种方法。Depending on certain implementation requirements, embodiments of the present invention can be implemented in hardware or software. The implementation can be performed using a digital storage medium such as a floppy disk, a digital versatile disk (DVD), a compact disk (CD), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory, on which electronically readable control signals are stored, which cooperate with a programmable computer system (or are capable of cooperating with a programmable computer system) to cause the execution of the various methods.
Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals which cooperate with a programmable computer system such that one of the methods described herein is performed.
The encoded audio signal may be transmitted via a wired or wireless transmission medium, or may be stored on a machine-readable carrier or a non-transitory storage medium.
Generally, embodiments of the present invention can be implemented as a computer program product with program code, the program code being operative to perform one of the described methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is therefore a computer program having program code for performing one of the methods described herein when the computer program runs on a computer.
A further embodiment of the inventive method is therefore a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functions of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is the intent, therefore, that the invention be limited only by the scope of the pending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
The present disclosure also includes the following technical solutions.
Solution 1. An audio decoder for decoding an encoded audio signal, the encoded audio signal comprising: a first channel element and a second channel element in a payload section of a data stream; and first decoder configuration data for the first channel element and second decoder configuration data for the second channel element in a configuration section of the data stream, the audio decoder comprising:
a data stream reader for reading the configuration data for each channel element in the configuration section and for reading the payload data for each channel element in the payload section;
a configurable decoder for decoding the plurality of channel elements; and
a configuration controller for configuring the configurable decoder so that the configurable decoder is configured according to the first decoder configuration data when decoding the first channel element, and according to the second decoder configuration data when decoding the second channel element.
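The interplay between the configurable decoder and the configuration controller in Solution 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the class names, the configuration fields (`element_type`, `sbr_enabled`, `stereo_tool`), and the stand-in `decode` result are all hypothetical and chosen only to show one decoder being reconfigured before each channel element is decoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical per-element decoder configuration (fields are illustrative)."""
    element_type: str   # "SCE" (single channel element) or "CPE" (channel pair element)
    sbr_enabled: bool   # whether a bandwidth-extension tool would be active
    stereo_tool: bool   # whether a stereo tool would be active


class ConfigurableDecoder:
    """Stand-in decoder: it only reports what the active configuration implies."""

    def __init__(self):
        self.active = None

    def configure(self, config):
        self.active = config

    def decode(self, payload):
        # A real decoder would run its core, SBR, and stereo tools here.
        # This sketch only derives the output channel count from the config.
        channels = 2 if self.active.element_type == "CPE" else 1
        return {"channels": channels, "sbr": self.active.sbr_enabled, "bytes": len(payload)}


def decode_elements(configs, payloads):
    """Configuration controller: reconfigure the decoder before each element."""
    decoder = ConfigurableDecoder()
    results = []
    for config, payload in zip(configs, payloads):
        decoder.configure(config)        # element-specific configuration
        results.append(decoder.decode(payload))
    return results
```

In this sketch the same decoder object is reused for both elements, and only its active configuration changes between them.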
Solution 2. The audio decoder according to Solution 1,
wherein the first channel element is a single channel element including payload data of a first output channel, and
wherein the second channel element is a channel pair element including payload data of a second output channel and a third output channel,
wherein the configurable decoder is arranged to generate a single output channel when decoding the first channel element and to generate two output channels when decoding the second channel element, and
wherein the audio decoder is configured to output the first output channel, the second output channel, and the third output channel simultaneously via three different audio output channels.
Solution 3. The audio decoder according to Solution 1 or 2,
wherein the first channel is a center channel, and wherein the second channel and the third channel are a left channel and a right channel, or a left surround channel and a right surround channel.
Solution 4. The audio decoder according to Solution 1,
wherein the first channel element is a first channel pair element including payload data of a first output channel and a second output channel, and wherein the second channel element is a second channel pair element including payload data of a third output channel and a fourth output channel,
wherein the configurable decoder is configured to generate the first output channel and the second output channel when decoding the first channel element, and to generate the third output channel and the fourth output channel when decoding the second channel element, and
wherein the audio decoder is configured to output the first output channel, the second output channel, the third output channel, and the fourth output channel for simultaneous output on separate output lines for different audio output channels.
Solution 5. The audio decoder according to Solution 4,
wherein the first channel is a left channel, the second channel is a right channel, the third channel is a left surround channel, and the fourth channel is a right surround channel.
Solution 6. The audio decoder according to one of the preceding solutions,
wherein the encoded audio signal further comprises, in the configuration section of the data stream, a general configuration section having information for the first channel element and the second channel element, and wherein the configuration controller is arranged to configure the configurable decoder for the first channel element and the second channel element with the configuration information from the general configuration section.
Solution 7. The audio decoder according to one of the preceding solutions,
wherein the first configuration section is different from the second configuration section, and
wherein the configuration controller is arranged to configure the configurable decoder for decoding the second channel element differently from the configuration used when decoding the first channel element.
Solution 8. The audio decoder according to one of the preceding solutions,
wherein the first decoder configuration data and the second decoder configuration data include information on a stereo decoding tool, a core decoding tool, or a spectral band replication (SBR) decoding tool, and
wherein the configurable decoder includes the SBR decoding tool, the core decoding tool, and the stereo decoding tool.
Solution 9. The audio decoder according to one of the preceding solutions,
wherein the payload section comprises a sequence of frames, each frame comprising the first channel element and the second channel element, and
wherein the first decoder configuration data for the first channel element and the second decoder configuration data for the second channel element are associated with the sequence of frames,
wherein the configuration controller is configured to configure the configurable decoder for each frame in the sequence of frames so that the first channel element in each frame is decoded using the first decoder configuration data, and the second channel element in each frame is decoded using the second decoder configuration data.
Solution 10. The audio decoder according to one of the preceding solutions,
wherein the data stream is a serial data stream, and the configuration section comprises, in sequence, decoder configuration data for a plurality of channel elements, and
wherein the payload section comprises the payload data of the plurality of channel elements in the same order.
Solution 11. The audio decoder according to one of the preceding solutions,
wherein the configuration section comprises a first channel element identifier followed by the first decoder configuration data, and a second channel element identifier followed by the second decoder configuration data, wherein the data stream reader is arranged to loop over all elements by sequentially passing the first channel element identifier and then reading the first decoder configuration data for that channel element, and by sequentially passing the second channel element identifier and then reading the second decoder configuration data.
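The reader loop described in Solution 11 can be sketched as follows. The byte layout used here (one count byte, then for each element one identifier byte, one length byte, and the configuration bytes) is an assumption made purely for illustration; it is not the bitstream syntax of USAC or of any standard, and the function name `read_config_section` is hypothetical.

```python
def read_config_section(section: bytes):
    """Parse an assumed layout: [count] then ([id][length][config bytes]) per element.

    Returns a list of (element_id, config_bytes) pairs in stream order.
    """
    num_elements = section[0]
    pos = 1
    entries = []
    for _ in range(num_elements):
        element_id = section[pos]            # pass the channel element identifier
        length = section[pos + 1]            # assumed length prefix of the config data
        start = pos + 2
        config = section[start:start + length]
        if len(config) != length:
            raise ValueError("truncated configuration data")
        entries.append((element_id, config))
        pos = start + length                 # continue with the next element
    return entries
```

Because the entries are returned in stream order, a payload section that lists its channel elements in the same order (as in Solution 10) can be matched to them by position.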
Solution 12. The audio decoder according to one of the preceding solutions,
wherein the configurable decoder comprises a plurality of parallel decoder instances,
wherein the configuration controller is arranged to configure a first decoder instance using the first decoder configuration data and to configure a second decoder instance using the second decoder configuration data, and
wherein the data stream reader is arranged to forward the payload data of the first channel element to the first decoder instance, and to forward the payload data of the second channel element to the second decoder instance.
Solution 13. The audio decoder according to Solution 12,
wherein the payload section comprises a sequence of payload frames, and
wherein the data stream reader is configured to forward the data of each channel element of a currently processed frame only to the corresponding decoder instance configured by the configuration data for that channel element.
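The routing in Solutions 12 and 13 can be pictured with a small sketch, given under stated assumptions: `DecoderInstance` and `decode_frames` are hypothetical names, each frame is modeled as a list of element payloads in configuration order, and the instances are called one after another rather than truly in parallel.

```python
class DecoderInstance:
    """Stand-in decoder instance that is configured once and records its inputs."""

    def __init__(self, config):
        self.config = config
        self.received = []

    def decode(self, payload):
        self.received.append(payload)
        # A real instance would produce audio samples; this returns a summary.
        return (self.config, len(payload))


def decode_frames(configs, frames):
    """Create one instance per channel element and route each payload to it."""
    instances = [DecoderInstance(config) for config in configs]
    output = []
    for frame in frames:
        if len(frame) != len(instances):
            raise ValueError("frame does not contain one payload per channel element")
        # Element i of every frame goes only to instance i.
        output.append([inst.decode(payload) for inst, payload in zip(instances, frame)])
    return instances, output
```

Keeping one configured instance per channel element means no reconfiguration is needed between elements of a frame; the trade-off is that one instance is held for every element.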
Solution 14. A method for decoding an encoded audio signal, the encoded audio signal comprising: a first channel element and a second channel element in a payload section of a data stream; and first decoder configuration data for the first channel element and second decoder configuration data for the second channel element in a configuration section of the data stream, the method comprising:
reading the configuration data for each channel element in the configuration section, and reading the payload data for each channel element in the payload section;
decoding the plurality of channel elements by a configurable decoder; and
configuring the configurable decoder so that the configurable decoder is configured according to the first decoder configuration data when decoding the first channel element, and according to the second decoder configuration data when decoding the second channel element.
Solution 15. An audio encoder for encoding a multi-channel audio signal, comprising:
a configuration processor for generating first configuration data for a first channel element and second configuration data for a second channel element;
a configurable encoder for encoding the multi-channel audio signal using the first configuration data and the second configuration data to obtain the first channel element and the second channel element; and
a data stream generator for generating a data stream representing an encoded audio signal, the data stream having a configuration section and a payload section, the configuration section having the first configuration data and the second configuration data, and the payload section including the first channel element and the second channel element.
Solution 16. A method for encoding a multi-channel audio signal, comprising:
generating first configuration data for a first channel element and second configuration data for a second channel element;
encoding the multi-channel audio signal by a configurable encoder using the first configuration data and the second configuration data to obtain the first channel element and the second channel element; and
generating a data stream representing an encoded audio signal, the data stream having a configuration section and a payload section, the configuration section having the first configuration data and the second configuration data, and the payload section including the first channel element and the second channel element.
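The last step of Solution 16, generating a data stream with a configuration section followed by a payload section, might look like the sketch below. The layout is an assumption for illustration only (one count byte; per element an identifier byte, a length byte, and the configuration bytes; then each already-encoded channel element prefixed by one length byte), and `build_data_stream` is a hypothetical name. The encoding of the audio itself is out of scope here, so the channel elements are passed in as opaque bytes.

```python
def build_data_stream(configs, elements):
    """Serialize per-element configs and encoded elements into one byte stream.

    configs  -- list of bytes objects, one configuration per channel element
    elements -- list of bytes objects, the encoded channel elements in the same order
    Each item must be shorter than 256 bytes because of the one-byte length prefix.
    """
    if len(configs) != len(elements):
        raise ValueError("need exactly one configuration per channel element")

    # Configuration section: element count, then identifier + length + config data.
    config_section = bytes([len(configs)])
    for element_id, config in enumerate(configs):
        config_section += bytes([element_id, len(config)]) + config

    # Payload section: the channel elements in the same order as their configs.
    payload_section = b""
    for element in elements:
        payload_section += bytes([len(element)]) + element

    return config_section + payload_section
```

Writing the payload in the same order as the configuration entries lets a reader associate each channel element with its configuration by position alone.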
Solution 17. A computer program which, when run on a computer, performs the method according to Solution 14 or Solution 16.
Solution 18. An encoded audio signal, comprising:
a configuration section having first decoder configuration data for a first channel element and second decoder configuration data for a second channel element, a channel element being an encoded representation of a single channel or of two channels of a multi-channel audio signal; and
a payload section comprising payload data of the first channel element and the second channel element.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201161454121P | 2011-03-18 | 2011-03-18 | |
| US61/454,121 | 2011-03-18 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1245491A1 (en) | 2018-08-24 |
| HK1245491B (en) | 2021-10-22 |