HK40102011A

HK40102011A - Apparatus and method for stereo filling in multichannel coding

Info

Publication number: HK40102011A
Application number: HK42024089493.1A
Authority: HK
Inventors: Sascha Dick; Christian Helmrich; Nikolaus Rettelbach; Florian Schuh; Richard Füg; Frederik Nagel
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2016-02-17
Filing date: 2024-04-02
Publication date: 2024-05-31

Description

Apparatus and method for stereo filling in multichannel coding

This application is a divisional application of Chinese national application No. 201780023524.4 (date of entry into the Chinese national phase: October 12, 2018), which corresponds to international application PCT/EP2017/053272, filed on February 14, 2017 and entitled "Apparatus and method for stereo filling in multichannel coding".

Technical Field

The present invention relates to audio signal coding and, in particular, to an apparatus and method for stereo filling in multichannel coding.

Background of the Invention

Audio coding is a domain of compression that deals with exploiting redundancy and irrelevance in audio signals.

In MPEG USAC (see, e.g., [3]), joint stereo coding of two channels is performed using complex prediction, MPS 2-1-2, or unified stereo with band-limited or full-band residual signals. MPEG Surround (see, e.g., [4]) hierarchically combines One-To-Two (OTT) and Two-To-Three (TTT) boxes for the joint coding of multichannel audio, with or without transmission of residual signals.

In MPEG-H, Quad Channel Elements hierarchically apply MPS 2-1-2 stereo boxes followed by complex prediction/MS stereo boxes, building a fixed 4×4 remixing tree (see, e.g., [1]).

AC-4 (see, e.g., [6]) introduces new 3-, 4- and 5-channel elements that allow the transmitted channels to be remixed using only a transmitted mixing matrix and subsequent joint stereo coding information. In addition, prior publications have proposed using orthogonal transforms such as the Karhunen-Loève transform (KLT) for enhanced multichannel audio coding (see, e.g., [7]).
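As a rough illustration of the KLT idea mentioned above (not the method of [7] or of this application), the two-channel case reduces to a rotation by the angle that diagonalizes the pair's 2×2 covariance matrix. The sketch assumes real-valued spectra and uses plain energy sums without mean removal; the function names are hypothetical.

```python
import math

def klt_rotate(x, y):
    """Rotate a channel pair onto its principal axes (2x2 KLT).

    Returns (u, v, angle): u carries the dominant component and v has a zero
    cross-product with u.
    """
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    # Angle that diagonalizes [[sxx, sxy], [sxy, syy]]
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(angle), math.sin(angle)
    u = [c * a + s * b for a, b in zip(x, y)]
    v = [-s * a + c * b for a, b in zip(x, y)]
    return u, v, angle

def klt_inverse(u, v, angle):
    """Undo klt_rotate: the rotation matrix is orthogonal, so its inverse is its transpose."""
    c, s = math.cos(angle), math.sin(angle)
    x = [c * a - s * b for a, b in zip(u, v)]
    y = [s * a + c * b for a, b in zip(u, v)]
    return x, y

x = [1.0, 2.0, 0.5, -1.0]
y = [0.9, 2.1, 0.4, -0.8]          # strongly correlated with x
u, v, angle = klt_rotate(x, y)     # most of the energy ends up in u
```

Because the rotation is orthogonal, the total energy of the pair is preserved and the decoder only needs the angle to invert it.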

For example, in 3D audio, loudspeaker channels are distributed over several height layers, resulting in horizontal and vertical channel pairs. Joint coding of only two channels, as defined in USAC, is not sufficient to take the spatial and perceptual relations between channels into account. MPEG Surround is applied in an additional pre-/post-processing step, and residual signals are transmitted individually, without the possibility of joint stereo coding that could, for example, exploit dependencies between the left and right vertical residual signals. AC-4 introduces dedicated N-channel elements that allow efficient encoding of joint coding parameters, but they fail for generic loudspeaker setups with more channels, such as those proposed for new immersive playback scenarios (7.1+4, 22.2). The MPEG-H Quad Channel Element is likewise restricted to only four channels and cannot be applied dynamically to arbitrary channels, but only to a preconfigured, fixed number of channels.

The MPEG-H Multichannel Coding Tool allows the creation of arbitrary trees of discretely coded stereo boxes, i.e., jointly coded channel pairs; see [2].

A problem that often arises in audio signal coding is caused by quantization, e.g., spectral quantization. Quantization may result in spectral holes. For example, all spectral values in a particular frequency band may be set to zero on the encoder side as a result of quantization. This can happen when the exact values of such spectral lines are fairly small before quantization: quantization may then set the spectral values of all spectral lines within a particular frequency band to zero. On the decoder side, this may result in undesired spectral holes when decoding.
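The following minimal sketch shows how small spectral values in one band can all round to zero and leave a spectral hole. It assumes a uniform rounding quantizer and a hypothetical band layout; AAC-family codecs actually use a non-uniform, power-law quantizer with per-band scale factors.

```python
def quantize(spectrum, step):
    """Uniform rounding quantizer (illustrative only)."""
    out = []
    for x in spectrum:
        q = int(abs(x) / step + 0.5)      # round the magnitude to the nearest integer
        out.append(q if x >= 0 else -q)
    return out

def zero_quantized_bands(quantized, band_offsets):
    """Indices of bands in which every spectral line was quantized to zero."""
    holes = []
    for b in range(len(band_offsets) - 1):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if all(q == 0 for q in quantized[lo:hi]):
            holes.append(b)
    return holes

spectrum = [4.0, -3.2, 0.3, -0.2, 0.1, 0.4, 2.5, 0.0]
band_offsets = [0, 2, 6, 8]               # three bands: lines 0-1, 2-5, 6-7
print(zero_quantized_bands(quantize(spectrum, 1.0), band_offsets))  # [1]
```

Band 1 holds only low-level lines, so after quantization the decoder would see an empty band there.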

Modern frequency-domain speech/audio coding systems, such as the Opus/CELT codec of the IETF [9], MPEG-4 (HE-)AAC [10] or, in particular, MPEG-D xHE-AAC (USAC) [11], offer means to code audio frames using either one long transform (a long block) or eight sequential short transforms (short blocks), depending on the temporal stationarity of the signal. In addition, for low-bitrate coding, these schemes provide tools to reconstruct the frequency coefficients of a channel using pseudorandom noise or lower-frequency coefficients of the same channel. In xHE-AAC, these tools are known as noise filling and spectral band replication, respectively.
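A minimal sketch of the noise filling idea: every band whose lines are all zero is replaced by pseudorandom noise at a given energy. The per-band energy is a hypothetical stand-in for the level a decoder would derive from the transmitted noise level and scale factors; the uniform distribution and the explicit energy normalization are assumptions of this sketch, not the xHE-AAC procedure.

```python
import math
import random

def noise_fill(quantized_spec, band_offsets, noise_energy, seed=0):
    """Fill every all-zero band with pseudorandom noise of the given per-band energy."""
    rng = random.Random(seed)             # seeded so the output is reproducible
    out = list(quantized_spec)
    for b in range(len(band_offsets) - 1):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if any(out[k] != 0 for k in range(lo, hi)):
            continue                      # at least one line survived quantization
        noise = [rng.uniform(-1.0, 1.0) for _ in range(lo, hi)]
        e = sum(v * v for v in noise)
        g = math.sqrt(noise_energy[b] / e) if e > 0 else 0.0
        for k, v in zip(range(lo, hi), noise):
            out[k] = g * v                # scale the noise to the target band energy
    return out

spec = [3.0, -2.0, 0, 0, 0, 0, 1.0, 0]
filled = noise_fill(spec, [0, 2, 6, 8], [0.0, 0.5, 0.0])
```

Only band 1 is touched; bands that still contain a nonzero line keep their decoded values.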

However, for highly tonal or transient stereo input, noise filling and/or spectral band replication alone limit the coding quality achievable at very low bitrates, mostly because many spectral coefficients of both channels need to be transmitted explicitly.

MPEG-H Stereo Filling is a parametric tool that relies on the downmix of the previous frame to improve the filling of spectral holes caused by quantization in the frequency domain. Like noise filling, Stereo Filling operates directly in the MDCT domain of the MPEG-H core coder; see [1], [5], [8].
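The sketch below is a simplified illustration of the idea, not the MPEG-H procedure of [1]. It assumes that the previous frame's downmix is the mid signal 0.5·(L+R) of the previous decoded MDCT spectra and that each empty band is filled by rescaling that downmix to a target band energy. Unlike random noise, the copied downmix keeps the spectral fine structure, which is why the approach suits tonal signals.

```python
import math

def previous_downmix(prev_left, prev_right):
    """Mid downmix of the previous frame's decoded left/right MDCT spectra."""
    return [0.5 * (l + r) for l, r in zip(prev_left, prev_right)]

def stereo_fill(spec, band_offsets, dmx_prev, target_energy):
    """Fill all-zero bands of `spec` with the previous downmix, rescaled per band."""
    out = list(spec)
    for b in range(len(band_offsets) - 1):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if any(out[k] != 0 for k in range(lo, hi)):
            continue                      # not a spectral hole
        e_dmx = sum(dmx_prev[k] ** 2 for k in range(lo, hi))
        if e_dmx == 0.0:
            continue                      # silent downmix: nothing useful to copy
        g = math.sqrt(target_energy[b] / e_dmx)
        for k in range(lo, hi):
            out[k] = g * dmx_prev[k]
    return out

dmx = previous_downmix([1.0, 1.0, 2.0, 0.0], [1.0, 3.0, 2.0, 4.0])   # [1.0, 2.0, 2.0, 2.0]
out = stereo_fill([0.7, 0.0, 0.0, 0.0], [0, 2, 4], dmx, [0.0, 2.0])
```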

However, the use of MPEG Surround and Stereo Filling in MPEG-H is restricted to fixed channel pair elements and therefore cannot exploit time-varying inter-channel dependencies.

The Multichannel Coding Tool (MCT) in MPEG-H allows adapting to varying inter-channel dependencies but, because single channel elements are used in typical operating configurations, it does not allow Stereo Filling. The prior art does not disclose perceptually optimized ways to generate the downmix of previous frames in the case of time-varying, arbitrary jointly coded channel pairs. Using noise filling instead of Stereo Filling in combination with the MCT to fill spectral holes would lead to noise artifacts, especially for tonal signals.

Summary of the Invention

The object of the present invention is to provide improved audio coding concepts. This object is achieved by an apparatus for decoding, an apparatus for encoding, a method for decoding, a method for encoding, a computer program, and an encoded multichannel signal, each according to example embodiments of this application.

An apparatus for decoding an encoded multichannel signal of a current frame to obtain three or more current audio output channels is provided. A multichannel processor is adapted to select two decoded channels from three or more decoded channels depending on first multichannel parameters. Moreover, the multichannel processor is adapted to generate a first group of two or more processed channels based on the selected channels. A noise filling module is adapted to identify, for at least one of the selected channels, one or more frequency bands within which all spectral lines are quantized to zero; to generate a mixing channel using, depending on side information, a proper subset of three or more previously decoded audio output channels; and to fill the spectral lines of the frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixing channel.

According to an embodiment, an apparatus is provided for decoding a previous encoded multichannel signal of a previous frame to obtain three or more previous audio output channels, and for decoding a current encoded multichannel signal of a current frame to obtain three or more current audio output channels.

The apparatus comprises an interface, a channel decoder, a multichannel processor for generating the three or more current audio output channels, and a noise filling module.

The interface is adapted to receive the current encoded multichannel signal and to receive side information comprising first multichannel parameters.

The channel decoder is adapted to decode the current encoded multichannel signal of the current frame to obtain a set of three or more decoded channels of the current frame.

The multichannel processor is adapted to select a first selected pair of two decoded channels from the set of three or more decoded channels depending on the first multichannel parameters.

Moreover, the multichannel processor is adapted to generate a first group of two or more processed channels based on the first selected pair of two decoded channels to obtain an updated set of three or more decoded channels.

Before the multichannel processor generates the first group of two or more processed channels based on the first selected pair of two decoded channels, the noise filling module is adapted to identify, for at least one of the two channels of the first selected pair, one or more frequency bands within which all spectral lines are quantized to zero; to generate a mixing channel using two or more, but not all, of the three or more previous audio output channels; and to fill the spectral lines of the one or more frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixing channel. The noise filling module is adapted to select, depending on the side information, the two or more previous audio output channels that are used for generating the mixing channel from the three or more previous audio output channels.
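A minimal sketch of the decoder-side order of operations described above: hole filling of the selected pair first, then the pair processing. It rests on hypothetical assumptions that are not taken from the application: the side information simply lists the indices of the previous output channels to mix, the mixing channel is their average, holes are filled by rescaling the mixing channel to a per-band target energy, and the pair processing is an inverse rotation.

```python
import math

def mixing_channel(prev_outputs, selected):
    """Average of the previous-frame output spectra named in the (hypothetical) side information."""
    if not 2 <= len(selected) < len(prev_outputs):
        raise ValueError("need two or more, but not all, previous output channels")
    n = len(prev_outputs[selected[0]])
    return [sum(prev_outputs[c][k] for c in selected) / len(selected) for k in range(n)]

def fill_from_mix(channel, band_offsets, mix, target_energy):
    """Fill all-zero bands with the mixing channel, rescaled to each band's target energy."""
    out = list(channel)
    for b in range(len(band_offsets) - 1):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if any(out[k] != 0.0 for k in range(lo, hi)):
            continue
        e_mix = sum(mix[k] ** 2 for k in range(lo, hi))
        if e_mix == 0.0:
            continue                      # nothing to copy for this band
        g = math.sqrt(target_energy[b] / e_mix)
        for k in range(lo, hi):
            out[k] = g * mix[k]
    return out

def inverse_pair_rotation(a, b, angle):
    """Turn a decoded pair back into two processed channels (rotation assumed)."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * x - s * y for x, y in zip(a, b)],
            [s * x + c * y for x, y in zip(a, b)])

# Hypothetical frame: three decoded channels, pair (0, 2) chosen by the multichannel parameters.
prev = [[1.0, 2.0, 3.0, 4.0], [100.0] * 4, [3.0, 2.0, 1.0, 0.0]]
decoded = [[2.0, 1.0, 0.5, 0.25], [9.0] * 4, [0.5, -0.5, 0.0, 0.0]]
mix = mixing_channel(prev, [0, 2])        # channel 1 is deliberately left out of the mix
filled = fill_from_mix(decoded[2], [0, 2, 4], mix, [0.0, 2.0])
out0, out2 = inverse_pair_rotation(decoded[0], filled, math.pi / 4)
```

Restricting the mix to a proper subset keeps unrelated channels (here the loud channel 1) from leaking into the filled band.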

A specific concept that embodiments of the noise filling module may employ, and which specifies how the noise is generated and filled, is referred to as stereo filling.

Moreover, an apparatus for encoding a multichannel signal having at least three channels is provided.

The apparatus comprises an iterative processor adapted to calculate, in a first iteration step, inter-channel correlation values between each pair of the at least three channels; to select, in the first iteration step, a pair having the highest value or having a value above a threshold; and to process the selected pair using a multichannel processing operation to derive initial multichannel parameters for the selected pair and to derive first processed channels.

The iterative processor is adapted to perform the calculating, the selecting and the processing in a second iteration step using at least one of the processed channels, to derive further multichannel parameters and second processed channels.
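The iteration can be sketched as a greedy loop: measure the normalized correlation of every channel pair, process the best pair if it exceeds a threshold, replace the two channels by the processed ones, and repeat. This is a sketch under stated assumptions: a KLT-style rotation stands in for the multichannel processing operation, the threshold of 0.3 is hypothetical, and pairs that were already processed are skipped to keep rounding residue from re-triggering them.

```python
import math
from itertools import combinations

def normalized_correlation(x, y):
    """|<x, y>| / (||x|| ||y||); 0 for a silent channel."""
    exx = sum(a * a for a in x)
    eyy = sum(b * b for b in y)
    if exx == 0.0 or eyy == 0.0:
        return 0.0
    return abs(sum(a * b for a, b in zip(x, y))) / math.sqrt(exx * eyy)

def rotate(x, y):
    """KLT-style rotation of a pair; returns (dominant, residual, angle)."""
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(x, y)],
            [-s * a + c * b for a, b in zip(x, y)], angle)

def iterative_pairing(channels, max_steps, threshold=0.3):
    """Each step processes the most correlated not-yet-processed pair."""
    chans = [list(c) for c in channels]
    params, done = [], set()
    for _ in range(max_steps):
        candidates = [p for p in combinations(range(len(chans)), 2) if p not in done]
        if not candidates:
            break
        i, j = max(candidates, key=lambda p: normalized_correlation(chans[p[0]], chans[p[1]]))
        if normalized_correlation(chans[i], chans[j]) < threshold:
            break                         # no remaining pair is worth processing
        chans[i], chans[j], angle = rotate(chans[i], chans[j])
        params.append((i, j, angle))      # multichannel parameters of this iteration step
        done.add((i, j))
    return chans, params

x = [1.0, 0.0, 1.0, 0.0]
out, params = iterative_pairing([x, [0.0, 1.0, 0.0, -1.0], list(x)], max_steps=3)
```

In this example, channels 0 and 2 are identical, so the first step pairs them; after the rotation no pair is correlated anymore and the loop stops.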

Moreover, the apparatus comprises a channel encoder adapted to encode the channels resulting from the iteration processing performed by the iterative processor, to obtain encoded channels.

Furthermore, the apparatus comprises an output interface adapted to generate an encoded multichannel signal having the encoded channels, the initial multichannel parameters and the further multichannel parameters, and having information indicating whether or not an apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

Moreover, a method is provided for decoding a previous encoded multichannel signal of a previous frame to obtain three or more previous audio output channels, and for decoding a current encoded multichannel signal of a current frame to obtain three or more current audio output channels. The method comprises:

- Receiving the current encoded multichannel signal, and receiving side information comprising first multichannel parameters.

- Decoding the current encoded multichannel signal of the current frame to obtain a set of three or more decoded channels of the current frame.

- Selecting a first selected pair of two decoded channels from the set of three or more decoded channels depending on the first multichannel parameters.

- Generating a first group of two or more processed channels based on the first selected pair of two decoded channels to obtain an updated set of three or more decoded channels.

Before the first group of two or more processed channels is generated based on the first selected pair of two decoded channels, the following step is conducted:

- Identifying, for at least one of the two channels of the first selected pair of two decoded channels, one or more frequency bands within which all spectral lines are quantized to zero; generating a mixing channel using two or more, but not all, of the three or more previous audio output channels; and filling the spectral lines of the one or more frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixing channel, wherein the two or more previous audio output channels used for generating the mixing channel are selected from the three or more previous audio output channels depending on the side information.

Furthermore, a method for encoding a multichannel signal having at least three channels is provided. The method comprises:

- Calculating, in a first iteration step, inter-channel correlation values between each pair of the at least three channels; selecting, in the first iteration step, a pair having the highest value or having a value above a threshold; and processing the selected pair using a multichannel processing operation to derive initial multichannel parameters for the selected pair and to derive first processed channels.

- Performing the calculating, the selecting and the processing in a second iteration step using at least one of the processed channels, to derive further multichannel parameters and second processed channels.

- Encoding the channels resulting from the iteration processing to obtain encoded channels; and

- Generating an encoded multichannel signal having the encoded channels, the initial multichannel parameters and the further multichannel parameters, and having information indicating whether or not an apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

Moreover, computer programs are provided, each of which is configured to implement one of the above-described methods when executed on a computer or signal processor, so that each of the above-described methods is implemented by one of the computer programs.

Furthermore, an encoded multichannel signal is provided. The encoded multichannel signal comprises encoded channels and multichannel parameters, as well as information indicating whether or not an apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with spectral data generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

Brief Description of the Drawings

In the following, embodiments of the present invention are described in more detail with reference to the accompanying drawings, in which:

Figure 1a illustrates an apparatus for decoding according to an embodiment;

Figure 1b illustrates an apparatus for decoding according to another embodiment;

Figure 2 shows a block diagram of a parametric frequency-domain decoder according to an embodiment of the present application;

Figure 3 shows a schematic diagram illustrating the sequence of spectra forming the spectrograms of the channels of a multichannel audio signal, to ease understanding of the description of the decoder of Figure 2;

Figure 4 shows a schematic diagram illustrating current spectra out of the spectrograms shown in Figure 3, to ease understanding of the description of Figure 2;

Figures 5a and 5b show block diagrams of a parametric frequency-domain audio decoder according to an alternative embodiment, in which the downmix of the previous frame is used as the basis for inter-channel noise filling;

Figure 6 shows a block diagram of a parametric frequency-domain audio encoder according to an embodiment;

Figure 7 shows a schematic block diagram of an apparatus for encoding a multichannel signal having at least three channels according to an embodiment;

Figure 8 shows a schematic block diagram of an apparatus for encoding a multichannel signal having at least three channels according to an embodiment;

Figure 9 shows a schematic block diagram of a stereo box according to an embodiment;

Figure 10 shows a schematic block diagram of an apparatus for decoding an encoded multichannel signal having encoded channels and at least two multichannel parameters according to an embodiment;

Figure 11 shows a flowchart of a method for encoding a multichannel signal having at least three channels according to an embodiment;

Figure 12 shows a flowchart of a method for decoding an encoded multichannel signal having encoded channels and at least two multichannel parameters according to an embodiment;

Figure 13 illustrates a system according to an embodiment;

Figure 14 illustrates the generation of a combination channel for a first frame in scenario (a), and for a second frame following the first frame in scenario (b), according to an embodiment; and

Figure 15 illustrates a retrieval scheme for multichannel parameters according to embodiments.

In the following description, equal or equivalent elements, or elements with equal or equivalent functionality, are denoted by equal or equivalent reference numerals.

Detailed Description of the Embodiments

In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the invention. In addition, features of the different embodiments described below may be combined with each other, unless specifically noted otherwise.

Before the apparatus 201 for decoding of Figure 1a is described, noise filling for multichannel audio coding is described first. In embodiments, the noise filling module 220 of Figure 1a may, for example, be configured to perform one or more of the techniques described below with respect to noise filling for multichannel audio coding.

Figure 2 shows a frequency-domain audio decoder according to an embodiment of the present application. The decoder is generally indicated using reference sign 10 and comprises a scale factor band identifier 12, a dequantizer 14, a noise filler 16 and an inverse transformer 18, as well as a spectral line extractor 20 and a scale factor extractor 22. Optional further elements that decoder 10 may comprise encompass a complex stereo predictor 24, an MS (mid-side) decoder 26 and an inverse TNS (temporal noise shaping) filter tool, two instances 28a and 28b of which are shown in Figure 2. In addition, a downmix provider is shown and outlined in more detail below using reference sign 30.

The frequency-domain audio decoder 10 of Figure 2 is a parametric decoder supporting noise filling, according to which a zero-quantized scale factor band is filled with noise, using the scale factor of that scale factor band as a means to control the level of the noise filled into it. Beyond this, the decoder 10 of Figure 2 represents a multichannel audio decoder configured to reconstruct a multichannel audio signal from an inbound data stream 30. Figure 2, however, concentrates on those elements of decoder 10 that are involved in reconstructing one of the channels of the multichannel audio signal coded into data stream 30, and that output this (output) channel at an output 32. Reference sign 34 indicates that decoder 10 may comprise further elements, or some pipeline operation control, responsible for reconstructing the other channels of the multichannel audio signal; the description below indicates how decoder 10's reconstruction of the channel of interest at output 32 interacts with the decoding of the other channels.

The multichannel audio signal represented by data stream 30 may comprise two or more channels. In the following, the description of the embodiments of the present application concentrates on the stereo case, where the multichannel audio signal comprises only two channels, but in principle the embodiments presented below may be readily transferred to alternative embodiments concerning multichannel audio signals with more than two channels, and their coding.

As will become clearer from the following description of Figure 2, the decoder 10 of Figure 2 is a transform decoder. That is, according to the coding technique underlying decoder 10, the channels are coded in a transform domain, for example using a lapped transform of the channels. Moreover, depending on the creator of the audio signal, there are time phases during which the channels of the audio signal largely represent the same audio content and deviate from each other merely by minor or deterministic changes, such as different amplitudes and/or phases, in order to represent an audio scene in which the differences between the channels enable the virtual positioning of an audio source of the audio scene with respect to virtual loudspeaker positions associated with the output channels of the multichannel audio signal. At some other time phases, however, the different channels of the audio signal may be more or less uncorrelated with each other and may even represent, for example, completely different audio sources.

In order to account for the possibly time-varying relationship between the channels of the audio signal, the audio codec underlying decoder 10 of Figure 2 allows a time-varying use of different measures to exploit inter-channel redundancies. For example, MS coding allows switching between representing the left and right channels of a stereo audio signal as they are, or as a pair of M (mid) and S (side) channels representing the downmix of the left and right channels and half their difference, respectively. That is, spectrograms of two channels are continuously (in the spectrotemporal sense) transmitted by data stream 30, but the meaning of these (transmitted) channels may change over time and relative to the output channels.
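The switching between L/R and M/S representations can be sketched as follows. The band layout and the list of per-band flags are hypothetical stand-ins for whatever the data stream signals; the signaling granularity is an assumption of this sketch.

```python
def ms_encode(left, right):
    """Mid = downmix, side = half the difference."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

def decode_pair(ch1, ch2, band_offsets, ms_flags):
    """Reconstruct L/R from two transmitted channels whose meaning is switched per band."""
    left, right = list(ch1), list(ch2)
    for b, use_ms in enumerate(ms_flags):
        if not use_ms:
            continue                      # band was transmitted as L/R already
        for k in range(band_offsets[b], band_offsets[b + 1]):
            m, s = ch1[k], ch2[k]
            left[k], right[k] = m + s, m - s
    return left, right

left, right = [1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 0.0, 2.0]
mid, side = ms_encode(left, right)
# Band 0 is sent as M/S, band 1 as L/R.
ch1 = mid[0:2] + left[2:4]
ch2 = side[0:2] + right[2:4]
l2, r2 = decode_pair(ch1, ch2, [0, 2, 4], [True, False])
```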

Complex stereo prediction, another tool for exploiting inter-channel redundancy, enables, in the spectral domain, the prediction of one channel's frequency-domain coefficients or spectral lines from spectrally co-located lines of another channel. More details on this are described below.
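Only the real-valued part of such a prediction is sketched below: S is predicted from the co-located lines of M, and only the prediction coefficient and the residual are transmitted. In complex prediction, the imaginary part additionally uses an estimate of the MDST of the downmix, which is omitted here. The least-squares choice of alpha and the quantization step of 0.1 are assumptions of this sketch.

```python
def predict_encode(mid, side, step=0.1):
    """Real-valued stereo prediction: returns (alpha, residual = side - alpha * mid)."""
    e_mid = sum(m * m for m in mid)
    # Least-squares prediction coefficient
    alpha = sum(s * m for s, m in zip(side, mid)) / e_mid if e_mid else 0.0
    alpha = round(alpha / step) * step    # coarse quantization of the coefficient
    residual = [s - alpha * m for s, m in zip(side, mid)]
    return alpha, residual

def predict_decode(mid, alpha, residual):
    """Decoder: add the prediction from the co-located mid lines back to the residual."""
    return [r + alpha * m for r, m in zip(residual, mid)]

mid = [1.0, 2.0, -1.0, 0.5]
side = [0.5, 1.1, -0.4, 0.2]
alpha, residual = predict_encode(mid, side)   # alpha is about 0.5; residual is small
```

Because the side signal is largely predictable from the mid signal here, the residual carries far less energy than the side signal itself.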

To ease understanding of the subsequent description of Figure 2 and the components shown therein, Figure 3 shows, for the exemplary case of a stereo audio signal represented by data stream 30, one possible way in which sample values of the spectral lines of the two channels might be coded into data stream 30 so as to be processed by decoder 10 of Figure 2. In particular, while the upper half of Figure 3 depicts the spectrogram 40 of a first channel of the stereo audio signal, the lower half of Figure 3 illustrates the spectrogram 42 of the other channel. It is worth noting that the "meaning" of spectrograms 40 and 42 may change over time due to, for example, time-varying switching between an MS-coded domain and a non-MS-coded domain. In the first case, spectrograms 40 and 42 relate to the M and S channels, respectively, whereas in the latter case they relate to the left and right channels. The switching between the MS-coded domain and the non-MS-coded domain may be signaled in data stream 30.

Figure 3 shows that spectrograms 40 and 42 may be coded into data stream 30 at a time-varying spectrotemporal resolution. For example, both (transmitted) channels may be subdivided, in a time-aligned manner, into a sequence of frames indicated using curly brackets 44, which may be equally long and abut each other without overlapping. As just mentioned, the spectral resolution at which spectrograms 40 and 42 are represented in data stream 30 may change over time. Initially, it is assumed that the spectrotemporal resolution changes in time equally for spectrograms 40 and 42, but an extension of this simplification is also feasible, as will become apparent from the following description. The change of the spectrotemporal resolution is, for example, signaled in data stream 30 in units of the frames 44; that is, the spectrotemporal resolution changes in units of frames 44. The change in the spectrotemporal resolution of spectrograms 40 and 42 is achieved by switching the number of transforms, and the transform length, used to describe spectrograms 40 and 42 within each frame 44.

In the example of Figure 3, frames 44a and 44b exemplify frames in which one long transform has been used to sample the channels of the audio signal, resulting in the highest spectral resolution, with one spectral line sample value per spectral line for each frame and channel. In Figure 3, the sample values of the spectral lines are indicated using small crosses within boxes; the boxes are arranged in rows and columns so as to represent a spectrotemporal grid, with each row corresponding to one spectral line and each column corresponding to a sub-interval of frames 44 that corresponds to the shortest transform involved in forming spectrograms 40 and 42. In particular, Figure 3 illustrates, for example for frame 44d, that a frame may alternatively be subjected to consecutive transforms of shorter length, resulting, for frames such as frame 44d, in several temporally succeeding spectra of reduced spectral resolution. Eight short transforms are exemplarily used for frame 44d, resulting in a spectrotemporal sampling of spectrograms 40 and 42 within frame 44d at spectral lines spaced apart from each other, so that only every eighth spectral line is populated, but with a sample value for each of the eight transform windows, or transforms of shorter length, used to transform frame 44d. For illustration purposes, Figure 3 also shows that other numbers of transforms per frame are feasible, such as two transforms whose transform length is, for example, half the transform length of the long transforms used for frames 44a and 44b. This results in a sampling of the spectrotemporal grid, or of spectrograms 40 and 42, in which two spectral line sample values are obtained for every other spectral line, one relating to the leading transform and the other to the trailing transform.

The transform windows of the transforms into which the frames are subdivided are illustrated in Fig. 3 below each spectrogram using overlapping window-like lines. The temporal overlap serves, for example, TDAC (time-domain aliasing cancellation) purposes.

Although the embodiments described below may also be implemented otherwise, Fig. 3 illustrates the case in which switching between different spectro-temporal resolutions for the individual frames 44 is performed such that, for each frame 44, the same number of spectral line values (indicated by the small crosses in Fig. 3) results for spectrogram 40 and for spectrogram 42. The only difference lies in the way these line values spectro-temporally sample the spectro-temporal tile corresponding to the respective frame 44, which spans the time of that frame 44 temporally and the frequency range from zero to the maximum frequency f_max spectrally.

Using arrows, Fig. 3 illustrates for frame 44d that similar spectra may be obtained for all frames 44 by suitably distributing the spectral line sample values that belong to the same spectral line but to different short transform windows within a frame of one channel onto the unoccupied (empty) spectral lines of that frame, up to the next occupied spectral line of the same frame. Such resulting spectra are called "interleaved spectra" in the following. In interleaving n transforms of one frame of one channel, for example, the n spectrally co-located spectral line values of the n short transforms follow each other before the set of n spectrally co-located spectral line values of the spectrally succeeding spectral line follows. Intermediate forms of interleaving are also feasible: instead of interleaving all spectral line coefficients of one frame, it would be feasible to interleave only the spectral line coefficients of a proper subset of the short transforms of frame 44d. In any case, whenever spectra of frames of the two channels corresponding to spectrograms 40 and 42 are discussed, these spectra may refer to either interleaved or non-interleaved spectra.
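The interleaving order just described can be sketched as follows. This is an illustrative sketch only: the function names and the list-of-lists representation of the n short-transform spectra of one frame are assumptions, not part of the described embodiment.

```python
def interleave(short_spectra):
    """Interleave the n short-transform spectra of one frame into one spectrum.

    The n spectrally co-located values of line k (in transform order) are
    followed by the n co-located values of line k + 1, and so on.
    """
    n = len(short_spectra)
    if n == 0:
        return []
    m = len(short_spectra[0])
    if any(len(s) != m for s in short_spectra):
        raise ValueError("all short spectra must have the same length")
    return [short_spectra[t][k] for k in range(m) for t in range(n)]


def deinterleave(spectrum, n):
    """Inverse operation: recover the n short-transform spectra."""
    if n <= 0 or len(spectrum) % n:
        raise ValueError("spectrum length must be a multiple of n")
    return [spectrum[t::n] for t in range(n)]


# Two short transforms of two lines each -> four interleaved values.
print(interleave([[1, 2], [10, 20]]))  # [1, 10, 2, 20]
```

With eight short transforms, each one eighth as long as the long transform, the interleaved spectrum has exactly as many values as a long-transform spectrum, which matches the equal number of crosses per frame in Fig. 3.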

In order to code the spectral line coefficients representing spectrograms 40 and 42 efficiently into data stream 30, which is passed to decoder 10, the coefficients are quantized. In order to control the quantization noise spectro-temporally, the quantization step size is controlled via scale factors set in a certain spectro-temporal grid. In particular, within each spectrum of each spectrogram, the spectral lines are grouped into spectrally consecutive, non-overlapping scale factor groups. Fig. 4 shows, in its upper half, a spectrum 46 of spectrogram 40 and the temporally co-located spectrum 48 of spectrogram 42. As shown there, spectra 46 and 48 are subdivided into scale factor bands along the spectral axis f so as to group the spectral lines into non-overlapping groups. The scale factor bands are illustrated in Fig. 4 using curly brackets 50. For the sake of simplicity, it is assumed that the boundaries between scale factor bands coincide between spectra 46 and 48, but this need not necessarily be the case.

That is, by way of the coding in data stream 30, spectrograms 40 and 42 are each subdivided into a temporal sequence of spectra, and each of these spectra is spectrally subdivided into scale factor bands; for each scale factor band, data stream 30 codes or conveys information about the scale factor corresponding to that band. The spectral line coefficients falling into a respective scale factor band 50 are quantized using the respective scale factor or, as far as decoder 10 is concerned, may be dequantized using the scale factor of the corresponding scale factor band.
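A band-wise dequantization can be sketched as below. The exponent rule and the offset of 100 are assumptions borrowed from AAC-style coding for illustration; the text above only states that each band's scale factor controls its step size. The `band_offsets` representation is likewise hypothetical.

```python
import math

# Assumption: AAC-style rule, x = sign(q) * |q|^(4/3) * 2^((sf - 100) / 4).
# The embodiment itself does not prescribe this particular rule.
SF_OFFSET = 100


def dequantize(q_lines, band_offsets, scale_factors):
    """Dequantize quantized line values band by band.

    band_offsets[b] and band_offsets[b + 1] delimit scale factor band b;
    scale_factors[b] is that band's scale factor.
    """
    out = [0.0] * len(q_lines)
    for b, sf in enumerate(scale_factors):
        gain = 2.0 ** (0.25 * (sf - SF_OFFSET))  # step size set by the scale factor
        for k in range(band_offsets[b], band_offsets[b + 1]):
            q = q_lines[k]
            out[k] = math.copysign(abs(q) ** (4.0 / 3.0), q) * gain
    return out
```

For example, `dequantize([1, -8, 0], [0, 1, 3], [100, 104])` yields approximately `[1.0, -32.0, 0.0]`: the second band's scale factor doubles its gain.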

Before returning to Fig. 2 and its description, it is assumed in the following that the specifically treated channel, i.e., the one whose decoding involves the specific elements of the decoder of Fig. 2 other than element 34, is the transmitted channel of spectrogram 40, which, as already stated above, may represent one of the left and right channels, the M channel or the S channel, assuming that the multichannel audio signal coded into data stream 30 is a stereo audio signal.

While spectral line extractor 20 is configured to extract the spectral line data, i.e., the spectral line coefficients of the frames 44, from data stream 30, scale factor extractor 22 is configured to extract the corresponding scale factors for each frame 44. To this end, extractors 20 and 22 may use entropy decoding. In accordance with an embodiment, scale factor extractor 22 is configured to sequentially extract the scale factors of, for example, spectrum 46 in Fig. 4, i.e., the scale factors of scale factor bands 50, from data stream 30 using context-adaptive entropy decoding. The order of the sequential decoding may follow the spectral order defined among the scale factor bands, leading, for example, from low to high frequency. Scale factor extractor 22 may determine the context for each scale factor depending on already extracted scale factors in the spectral neighborhood of the currently extracted one, such as the scale factor of the immediately preceding scale factor band. Alternatively, scale factor extractor 22 may predictively decode the scale factors from data stream 30, for example using differential decoding, predicting the currently decoded scale factor from any of the previously decoded scale factors, such as the immediately preceding one.

Notably, this scale factor extraction process is agnostic with respect to whether a scale factor belongs to a scale factor band populated exclusively by zero-quantized spectral lines or to a band in which at least one spectral line is quantized to a non-zero value. A scale factor belonging to a band populated only by zero-quantized spectral lines may both serve as a prediction basis for a subsequently decoded scale factor, which may belong to a band containing a non-zero spectral line, and itself be predicted from a previously decoded scale factor, which may likewise belong to such a band.
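The differential (predictive) variant can be sketched as follows, assuming the entropy-decoded differences are already available as plain integers and that the first scale factor is predicted from a hypothetical `start_value` (for instance, a global gain). Note that zero-quantized bands take part in the prediction chain exactly like all other bands.

```python
def decode_scale_factors(start_value, deltas):
    """Differentially decode scale factors in spectral order (low to high).

    Each scale factor is predicted from the immediately preceding one; the
    first is predicted from start_value. Bands that are entirely
    zero-quantized are treated exactly like all other bands.
    """
    scale_factors = []
    previous = start_value
    for delta in deltas:
        previous += delta
        scale_factors.append(previous)
    return scale_factors


print(decode_scale_factors(100, [0, 3, -2, 5]))  # [100, 103, 101, 106]
```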

For the sake of completeness only, note that spectral line extractor 20 extracts the spectral line coefficients with which the scale factor bands 50 are populated, likewise using, for example, entropy decoding and/or predictive decoding. The entropy decoding may use context adaptivity based on spectral line coefficients in the spectro-temporal neighborhood of the currently decoded coefficient; likewise, the prediction may be a spectral, temporal or spectro-temporal prediction of the currently decoded coefficient from previously decoded coefficients in its spectro-temporal neighborhood. To increase coding efficiency, spectral line extractor 20 may be configured to decode the spectral lines or line coefficients in tuples, which collect or group spectral lines along the frequency axis.

Thus, at the output of spectral line extractor 20, the spectral line coefficients are provided in units of spectra such as spectrum 46, each collecting, for example, all spectral line coefficients of a corresponding frame or, alternatively, all spectral line coefficients of certain short transforms of a corresponding frame. At the output of scale factor extractor 22, in turn, the corresponding scale factors of the respective spectra are output.

Scale factor band identifier 12 and dequantizer 14 have spectral line inputs coupled to the output of spectral line extractor 20, and dequantizer 14 and noise filler 16 have scale factor inputs coupled to the output of scale factor extractor 22. Scale factor band identifier 12 is configured to identify so-called zero-quantized scale factor bands within the current spectrum 46, i.e., scale factor bands within which all spectral lines are quantized to zero, such as scale factor band 50d in Fig. 4, as well as the remaining scale factor bands of the spectrum, within which at least one spectral line is quantized to a non-zero value. In Fig. 4, the spectral line coefficients are indicated by hatched areas. As can be seen there, in spectrum 46 all scale factor bands except scale factor band 50d have at least one spectral line whose coefficient is quantized to a non-zero value. It will become clear below that zero-quantized scale factor bands such as 50d form the subject of the inter-channel noise filling described further below. Before proceeding, note that scale factor band identifier 12 may restrict its identification to a proper subset of the scale factor bands 50, such as those above a certain start frequency 52. In Fig. 4, this would restrict the identification procedure to scale factor bands 50d, 50e and 50f.
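The identification performed by scale factor band identifier 12 amounts to a simple all-zero test per band. In the sketch below, the `band_offsets` representation and the use of a line index `start_line` as a stand-in for start frequency 52 are assumptions for illustration.

```python
def zero_quantized_bands(q_lines, band_offsets, start_line=0):
    """Return the indices of scale factor bands whose lines are all zero.

    Bands starting below start_line (a line-index stand-in for start
    frequency 52) are not considered.
    """
    bands = []
    for b in range(len(band_offsets) - 1):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if lo < start_line:
            continue
        if all(q == 0 for q in q_lines[lo:hi]):
            bands.append(b)
    return bands
```

For `q_lines = [1, 0, 0, 0, 0, 2, 0, 0]` with bands of two lines each, bands 1 and 3 are zero-quantized; with `start_line=4`, only band 3 remains eligible.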

Scale factor band identifier 12 informs noise filler 16 of those scale factor bands that are zero-quantized scale factor bands. Dequantizer 14 uses the scale factors associated with an inbound spectrum 46 to dequantize, or scale, the spectral line coefficients of spectrum 46 according to the associated scale factors, i.e., the scale factors associated with the scale factor bands 50. In particular, dequantizer 14 dequantizes and scales the spectral line coefficients falling into a respective scale factor band using the scale factor associated with that band. Fig. 4 shall be interpreted as showing the result of the dequantization of the spectral lines.

Noise filler 16 obtains information on the zero-quantized scale factor bands, which form the subject of the following noise filling, the dequantized spectrum, the scale factors of at least those scale factor bands identified as zero-quantized, and a signalization obtained from data stream 30 for the current frame that reveals whether inter-channel noise filling is to be performed for the current frame.

The inter-channel noise filling process described in the following example actually involves two types of noise filling, namely the insertion of a noise floor 54 pertaining to all spectral lines that have been quantized to zero, irrespective of the scale factor band they may belong to, and the actual inter-channel noise filling procedure. Although this combination is described below, it should be emphasized that the noise floor insertion may be omitted in accordance with an alternative embodiment. Moreover, the signalization obtained from data stream 30 that switches noise filling on and off for the current frame may relate to the inter-channel noise filling only, or may jointly control the combination of both noise filling types.

As far as the noise floor insertion is concerned, noise filler 16 may operate as follows. In particular, noise filler 16 may employ artificial noise generation, such as a pseudorandom number generator or some other source of randomness, in order to fill spectral lines whose spectral line coefficients are zero. The level of the noise floor 54 thus inserted at the zero-quantized spectral lines may be set according to explicit signaling within data stream 30 for the current frame or current spectrum 46. The "level" of noise floor 54 may be determined using, for example, a root-mean-square (RMS) or energy measure.
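A minimal sketch of the noise floor insertion, assuming the signaled level is given directly as an RMS value and using random-sign values of that magnitude as the artificial noise. The seeded `random.Random` generator is merely a deterministic stand-in for whatever noise source a decoder uses.

```python
import random


def insert_noise_floor(spectrum, q_lines, level, seed=0):
    """Fill every zero-quantized line with random-sign noise of RMS `level`.

    spectrum: dequantized line values; q_lines: the quantized values, used
    to find lines quantized to zero irrespective of their band.
    """
    rng = random.Random(seed)  # deterministic stand-in for the noise source
    out = list(spectrum)
    for k, q in enumerate(q_lines):
        if q == 0:
            out[k] = level * rng.choice((-1.0, 1.0))
    return out
```

Because every filled line has magnitude `level`, the RMS over the filled lines equals `level` exactly; non-zero-quantized lines are left untouched.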

The noise floor insertion thus represents a kind of pre-filling for those scale factor bands identified as zero-quantized, such as scale factor band 50d in Fig. 4. It also affects scale factor bands other than the zero-quantized ones, but only the zero-quantized bands are additionally subjected to the following inter-channel noise filling. As described below, the inter-channel noise filling process fills a zero-quantized scale factor band up to a level controlled via the scale factor of that band. The scale factor may be used directly for this purpose because all spectral lines of the respective zero-quantized band are quantized to zero, so that the scale factor is not needed for dequantization. Nevertheless, data stream 30 may contain, for each frame or each spectrum 46, additional signaling of a parameter that commonly applies to the scale factors of all zero-quantized scale factor bands of the corresponding frame or spectrum 46 and that, when applied by noise filler 16 to the scale factors of the zero-quantized bands, yields an individual fill level for each zero-quantized band.

In other words, noise filler 16 may modify, for each zero-quantized scale factor band of spectrum 46, the scale factor of that band using the same modification function, parameterized by the aforementioned parameter contained in data stream 30 for spectrum 46 of the current frame, in order to obtain a fill target level for the respective zero-quantized band, measured in terms of energy or RMS, i.e., the level up to which the inter-channel noise filling process is to fill the respective band with (optionally) additional noise on top of noise floor 54.
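The derivation of a fill target level can be sketched as follows. The modification function here (adding the per-frame parameter to the band's scale factor and mapping the result to an RMS value with an AAC-style exponent rule) is a hypothetical choice for illustration; the text only requires that the same function be applied to every zero-quantized band.

```python
# Hypothetical modification function: the per-frame parameter is added to
# the band's scale factor, which is then mapped to an RMS value with an
# assumed AAC-style exponent rule.
SF_OFFSET = 100


def fill_target_energy(scale_factor, num_lines, noise_offset):
    modified = scale_factor + noise_offset
    rms = 2.0 ** (0.25 * (modified - SF_OFFSET))
    return num_lines * rms * rms  # target energy of the whole band


# One frame-level parameter, applied identically to every zero-quantized band.
targets = {b: fill_target_energy(sf, 4, -4) for b, sf in {3: 104, 5: 108}.items()}
print(targets)  # {3: 4.0, 5: 16.0}
```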

In particular, in order to perform the inter-channel noise filling 56, noise filler 16 obtains a spectrally co-located portion of the other channel's spectrum 48, in a state already largely or fully decoded, and copies the obtained portion of spectrum 48 into the zero-quantized scale factor band to which this portion is spectrally co-located. The copy is scaled such that the resulting overall noise level within that band, derived by integrating over the spectral lines of the band, equals the aforementioned fill target level obtained from the band's scale factor. By this measure, the tonality of the noise filled into the respective zero-quantized band is improved in comparison to artificially generated noise, such as the noise forming the basis of noise floor 54, and is also better than an uncontrolled spectral copying or replication from very low-frequency lines within the same spectrum 46.

To be even more precise, for a current band such as 50d, noise filler 16 locates the spectrally co-located portion within spectrum 48 of the other channel and scales its spectral lines depending on the scale factor of zero-quantized scale factor band 50d, in the manner just described, optionally involving some additional offset or noise factor parameter contained in data stream 30 for the current frame or spectrum 46, so that the result fills zero-quantized band 50d up to the desired level defined by its scale factor. In the present embodiment, this means that the filling is performed additively relative to noise floor 54.
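The copy-and-scale step can be sketched as below. The gain rule is an assumption for illustration: it tops the band up to the target energy on the premise that the copied lines and the already present noise floor are uncorrelated, so that their energies add. The result therefore hits the target exactly only when the band was empty before.

```python
import math


def inter_channel_fill(band, source, target_energy):
    """Add a scaled copy of spectrally co-located source lines to a zero-quantized band.

    band: the band's current lines (possibly already holding noise floor 54).
    source: co-located lines of the other channel's spectrum 48 (Fig. 2) or
            of the previous frame's downmix (Figs. 5a and 5b).
    The gain assumes that copied and existing lines are uncorrelated.
    """
    if len(band) != len(source):
        raise ValueError("band and source must have the same number of lines")
    band_energy = sum(x * x for x in band)
    source_energy = sum(s * s for s in source)
    missing = target_energy - band_energy
    if source_energy == 0.0 or missing <= 0.0:
        return list(band)  # nothing to copy, or target already reached
    gain = math.sqrt(missing / source_energy)
    return [x + gain * s for x, s in zip(band, source)]


print(inter_channel_fill([0.0, 0.0], [3.0, 4.0], 100.0))  # [6.0, 8.0]
```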

In accordance with a simplified embodiment, the resulting noise-filled spectrum 46 would be input directly into inverse transformer 18 in order to obtain, for each transform window to which the spectral line coefficients of spectrum 46 belong, a time-domain portion of the respective channel's audio time signal, whereupon an overlap-add process (not shown in Fig. 2) may combine these time-domain portions. That is, if spectrum 46 is a non-interleaved spectrum whose spectral line coefficients belong to just one transform, inverse transformer 18 subjects it to the corresponding inverse transform so as to obtain one time-domain portion. The leading and trailing ends of this portion are subjected to an overlap-add process with the preceding and succeeding time-domain portions, obtained by inverse transforming the preceding and succeeding transforms, so as to realize, for example, time-domain aliasing cancellation. If, however, spectrum 46 has interleaved into it the spectral line coefficients of more than one consecutive transform, inverse transformer 18 subjects them to separate inverse transforms so as to obtain one time-domain portion per inverse transform. In accordance with the temporal order defined among them, these time-domain portions are subjected to an overlap-add process among each other as well as with respect to the preceding and succeeding time-domain portions of other spectra or frames.
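The overlap-add stage can be sketched as follows, assuming the inverse transform (for instance, an IMDCT) and windowing have already produced the time-domain portions, and that consecutive portions are spaced `hop` samples apart. The inverse transform itself is not shown.

```python
def overlap_add(portions, hop):
    """Overlap-add consecutive (already windowed) time-domain portions.

    Portion i starts at sample i * hop and overlapping samples are summed;
    in a TDAC scheme, this summation is where the aliasing terms of
    neighbouring portions cancel.
    """
    if not portions:
        return []
    total = max(i * hop + len(p) for i, p in enumerate(portions))
    out = [0.0] * total
    for i, p in enumerate(portions):
        start = i * hop
        for n, v in enumerate(p):
            out[start + n] += v
    return out


print(overlap_add([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]], 2))
# [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```

The same routine covers both cases described above: one portion per long-transform frame, or several shorter portions per frame when short transforms were used.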

However, for the sake of completeness, it must be noted that further processing may be performed on the noise-filled spectrum. As shown in Fig. 2, an inverse TNS (temporal noise shaping) filter may perform inverse TNS filtering on the noise-filled spectrum. That is, controlled via TNS filter coefficients for the current frame or spectrum 46, the spectrum obtained so far is subjected to linear filtering along the spectral direction.
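Linear filtering along frequency can be sketched as a pair of filters: an encoder-side FIR prediction-error filter and the decoder-side all-pole inverse that undoes it. The sign convention, the upward filtering direction, and the plain coefficient list are assumptions for illustration; the forward filter is included only to demonstrate the round trip.

```python
def tns_filter(spectrum, coeffs, start, stop):
    """Encoder-side TNS (FIR prediction-error filter) along frequency.

    e[k] = x[k] + sum_i a_i * x[k - i], for start <= k < stop.
    """
    out = list(spectrum)
    for k in range(start, stop):
        acc = spectrum[k]
        for i, a in enumerate(coeffs, 1):
            if k - i >= start:
                acc += a * spectrum[k - i]
        out[k] = acc
    return out


def inverse_tns_filter(spectrum, coeffs, start, stop):
    """Decoder-side inverse TNS: all-pole filter along frequency.

    y[k] = e[k] - sum_i a_i * y[k - i], for start <= k < stop.
    """
    out = list(spectrum)
    for k in range(start, stop):
        acc = spectrum[k]
        for i, a in enumerate(coeffs, 1):
            if k - i >= start:
                acc -= a * out[k - i]  # uses already reconstructed lines
        out[k] = acc
    return out
```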

With or without inverse TNS filtering, complex stereo predictor 24 may then treat the spectrum as the prediction residual of an inter-channel prediction. More specifically, inter-channel predictor 24 may use a spectrally co-located portion of the other channel to predict spectrum 46, or at least a subset of its scale factor bands 50. The complex prediction process is illustrated in Fig. 4 with dashed box 58 in relation to scale factor band 50b. That is, data stream 30 may contain inter-channel prediction parameters controlling, for example, which of the scale factor bands 50 are to be inter-channel predicted in this manner and which are not. Further, the inter-channel prediction parameters in data stream 30 may comprise complex inter-channel prediction factors applied by inter-channel predictor 24 so as to obtain the inter-channel prediction result. These factors may be contained in data stream 30 individually for each scale factor band or, alternatively, for each group of one or more scale factor bands for which inter-channel prediction is activated or signaled to be activated in data stream 30.

As indicated in Fig. 4, the source of the inter-channel prediction may be spectrum 48 of the other channel. More precisely, the source may be the spectrally co-located portion 60 of spectrum 48, co-located to scale factor band 50b, which is to be inter-channel predicted, extended by an estimate of its imaginary part. The estimation of the imaginary part may be performed based on the spectrally co-located portion 60 of spectrum 48 itself, and/or may use a downmix of the already decoded channels of the previous frame, i.e., the frame immediately preceding the currently decoded frame to which spectrum 46 belongs. In effect, inter-channel predictor 24 adds the prediction signal obtained as just described to the scale factor band to be inter-channel predicted, such as scale factor band 50b in Fig. 4.
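The addition of the prediction signal can be sketched as below. The names, the per-band factor lists, and the form "real factor times source plus imaginary factor times imaginary-part estimate" are assumptions for illustration; how the imaginary-part estimate is obtained is outside this sketch and is simply passed in.

```python
def complex_predict(residual, source_re, source_im_est, alpha_re, alpha_im,
                    band_offsets, predicted_bands):
    """Add the inter-channel prediction signal to the bands that are predicted.

    residual: spectrum 46, treated as prediction residual.
    source_re: spectrally co-located lines of the other channel (real part).
    source_im_est: estimate of the source's imaginary part (computed elsewhere).
    alpha_re, alpha_im: per-band real and imaginary prediction factors.
    """
    out = list(residual)
    for b in predicted_bands:
        for k in range(band_offsets[b], band_offsets[b + 1]):
            out[k] += alpha_re[b] * source_re[k] + alpha_im[b] * source_im_est[k]
    return out
```

Bands absent from `predicted_bands` keep their residual values unchanged, mirroring the per-band activation signaled in data stream 30.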

As already noted in the preceding description, the channel to which spectrum 46 belongs may be an MS-coded channel or a loudspeaker-related channel, such as the left or right channel of a stereo audio signal. Accordingly, an MS decoder 26 optionally subjects the optionally inter-channel-predicted spectrum 46 to MS decoding, in that it performs, per spectral line of spectrum 46, an addition or subtraction with the spectrally corresponding spectral line of the other channel's spectrum 48. For example, although not shown in Fig. 2, spectrum 48 as shown in Fig. 4 is obtained by portion 34 of decoder 10 in a manner analogous to that described above for the channel to which spectrum 46 belongs. MS decoding module 26, in performing MS decoding, subjects spectra 46 and 48 to spectral-line-wise addition or spectral-line-wise subtraction, with both spectra at the same stage of the processing chain, meaning, for example, that both have just been obtained by inter-channel prediction, or both have just been obtained by noise filling or inverse TNS filtering.

Note that, optionally, MS decoding may be performed in a manner that relates globally to the whole spectrum 46, or that is individually activatable by data stream 30 in units of, for example, scale factor bands 50. In other words, MS decoding may be switched on or off using respective signaling in data stream 30 at, for example, frame granularity or some finer spectro-temporal granularity, such as individually for the scale factor bands of spectra 46 and/or 48 of spectrograms 40 and/or 42, assuming that identical scale factor band boundaries are defined for both channels.
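Band-wise switchable MS decoding can be sketched as follows. The unnormalized rule L = M + S, R = M - S is an assumption (other scalings are possible); bands whose flag is off are taken to be left/right already and are passed through. Global MS decoding corresponds to all flags being set.

```python
def ms_decode(first, second, band_offsets, ms_flags):
    """Band-wise MS decoding with per-band on/off flags.

    Assumes the unnormalized rule L = M + S, R = M - S; bands whose flag is
    False are passed through unchanged (already left/right).
    """
    left, right = list(first), list(second)
    for b, use_ms in enumerate(ms_flags):
        if not use_ms:
            continue
        for k in range(band_offsets[b], band_offsets[b + 1]):
            left[k] = first[k] + second[k]
            right[k] = first[k] - second[k]
    return left, right


print(ms_decode([1.0, 1.0], [2.0, 2.0], [0, 1, 2], [True, False]))
# ([3.0, 1.0], [-1.0, 2.0])
```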

As illustrated in Fig. 2, the inverse TNS filtering by inverse TNS filter 28 may also be performed after any inter-channel processing, such as inter-channel prediction 58 or MS decoding by MS decoder 26. Whether it is performed upstream or downstream of the inter-channel processing may be fixed, or may be controlled via respective signaling for each frame in data stream 30 or at some other level of granularity. Wherever inverse TNS filtering is performed, respective TNS filter coefficients present in the data stream for the current spectrum 46 control a TNS filter, i.e., a linear prediction filter running along the spectral direction, so as to linearly filter the spectrum inbound into the respective inverse TNS filter module 28a and/or 28b.

Thus, spectrum 46 arriving at the input of inverse transformer 18 may have been subjected to further processing as just described. Again, the above description is not meant to be understood such that all of these optional tools are either present concurrently or absent: any subset of them may be present in decoder 10.

In any case, the resulting spectrum at the input of inverse transformer 18 represents the final reconstruction of the channel's output signal and forms the basis of the aforementioned downmix for the current frame which, as described with respect to complex prediction 58, serves as the basis for the potential imaginary-part estimation for the next frame to be decoded. It may further serve as the final reconstruction used for inter-channel predicting the other channel, i.e., the channel other than the one to which the elements in Fig. 2, except 34, relate.

The respective downmix is formed by downmix provider 31 by combining this final spectrum 46 with the respective final version of spectrum 48. The latter, i.e., the respective final version of spectrum 48, forms the basis for the complex inter-channel prediction in predictor 24.

Figs. 5a and 5b show an alternative to Fig. 2 in which the basis for inter-channel noise filling is represented by the downmix of spectrally co-located spectral lines of the previous frame, so that, in the optional case of using complex inter-channel prediction, the source of this complex inter-channel prediction is used twice: as the source for the inter-channel noise filling and as the source for the imaginary-part estimation in the complex inter-channel prediction. Figs. 5a and 5b show a decoder 10 comprising a portion 70 relating to the decoding of the first channel, to which spectrum 46 belongs, as well as the internal structure of the aforementioned other portion 34, which relates to the decoding of the other channel comprising spectrum 48. The same reference signs are used for the internal elements of portion 70 on the one hand and of portion 34 on the other; as can be seen, the structure is the same. At output 32, one channel of the stereo audio signal is output, and at the output of inverse transformer 18 of the second decoder portion 34, indicated by reference sign 74, the other (output) channel of the stereo audio signal results. Again, the embodiments described above may easily be transferred to the case of using more than two channels.

The downmix provider 31 is shared by portions 70 and 34. It receives the temporally co-located spectra 48 and 46 of spectrograms 40 and 42 and forms a downmix from them by summing these spectra line by line, possibly forming the average by dividing the sum at each spectral line by the number of downmixed channels (i.e., two in the case of Figures 5a and 5b). By this measure, the downmix of the previous frame results at the output of the downmix provider 31. It is noted in this regard that, if the previous frame contains more than one spectrum in either of spectrograms 40 and 42, there are different possibilities as to how the downmix provider 31 operates in that case. For example, the downmix provider 31 may then use the spectrum of the trailing transform of the current frame, or it may use an interleaving of all spectral line coefficients of the current frame of spectrograms 40 and 42. The delay element 74, shown in Figures 5a and 5b as connected to the output of the downmix provider 31, indicates that the downmix thus provided at the output of the downmix provider 31 forms the downmix 76 of the previous frame (see Figure 4 with respect to inter-channel noise filling 56 and complex prediction 58, respectively). Accordingly, the output of the delay element 74 is connected, on the one hand, to the inputs of the inter-channel predictors 24 of decoder portions 34 and 70 and, on the other hand, to the inputs of the noise fillers 16 of decoder portions 70 and 34.
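The line-by-line downmix formed by the downmix provider 31 and the one-frame delay of element 74 can be sketched as follows. This is a minimal Python illustration. It assumes each spectrum is a plain list of real MDCT line values of equal length and that the delay line starts out as all zeros. The names `downmix` and `OneFrameDelay` are hypothetical and are not taken from any codec implementation.

```python
from typing import List


def downmix(spectrum_a: List[float], spectrum_b: List[float], average: bool = True) -> List[float]:
    """Sum two temporally co-located spectra line by line (sketch of provider 31).

    With average=True the sum at each line is divided by the number of
    downmixed channels (two here), giving the per-line mean.
    """
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("both spectra must have the same number of lines")
    n_channels = 2
    summed = [a + b for a, b in zip(spectrum_a, spectrum_b)]
    return [s / n_channels for s in summed] if average else summed


class OneFrameDelay:
    """One-frame delay (sketch of element 74): each push returns the previous frame's downmix."""

    def __init__(self, n_lines: int) -> None:
        # Assumption: before the first frame, the "previous" downmix is all zeros.
        self._stored = [0.0] * n_lines

    def push(self, current: List[float]) -> List[float]:
        previous, self._stored = self._stored, list(current)
        return previous
```

In the decoder of Figures 5a and 5b, the value returned by `push` would feed both the inter-channel predictors 24 and the noise fillers 16 of the two decoder portions.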

That is, in Figure 2 the noise filler 16 receives the temporally co-located spectrum 48 of the final reconstruction of the other channel of the same current frame as the basis for inter-channel noise filling. In Figures 5a and 5b, by contrast, inter-channel noise filling is performed on the basis of the downmix of the previous frame provided by the downmix provider 31. The way inter-channel noise filling is performed remains the same. The inter-channel noise filler 16 grabs a spectrally co-located portion from one of two sources. In the case of Figure 2, the source is the respective spectrum of the other channel of the current frame. In the case of Figures 5a and 5b, it is the largely or fully decoded final spectrum obtained from the previous frame and representing the previous frame's downmix. The noise filler then adds this same "source" portion to the spectral lines within the scale factor band to be noise-filled (e.g., 50d in Figure 4), scaled according to a target noise level determined by the scale factor of the respective scale factor band.
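The fill operation itself can be sketched as follows: it copies a co-located "source" portion into an all-zero "target" band and scales it to a target level. The sketch makes two assumptions. First, the target noise level is given directly as a desired RMS per band; in the codec it would be derived from the band's scale factor. Second, the band is left empty if the source portion carries no energy. `fill_band_from_source` is a hypothetical helper.

```python
import math
from typing import List


def fill_band_from_source(target: List[float], source: List[float],
                          start: int, stop: int, target_rms: float) -> None:
    """Inter-channel noise filling of target[start:stop] from co-located source lines.

    Assumes target[start:stop] is entirely zero-quantized. The source may be the
    other channel's spectrum (Figure 2) or the previous frame's downmix (Figures 5a/5b).
    """
    band = source[start:stop]
    energy = sum(x * x for x in band)
    if energy == 0.0:
        return  # nothing to copy; this sketch leaves the band empty
    # Gain that makes the copied band's RMS equal target_rms.
    gain = target_rms * math.sqrt((stop - start) / energy)
    for i, x in enumerate(band, start):
        target[i] = x * gain
```

Because the same gain is applied to every copied line, the spectral shape of the source portion is preserved inside the target band. Only its level is adapted.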

This concludes the discussion of embodiments describing inter-channel noise filling in an audio decoder. Readers skilled in the art will see that some pre-processing may be applied to the "source" spectral lines without deviating from the general concept of inter-channel filling. Such pre-processing would take place before the grabbed spectrally or temporally co-located portion of the "source" spectrum is added to the spectral lines of the "target" scale factor band. In particular, a filtering operation such as spectral flattening or tilt removal may be applied to the spectral lines of the "source" region that are to be added to the "target" scale factor band (like 50d in Figure 4). This can improve the audio quality of the inter-channel noise filling process. Likewise, as an example of a largely (but not fully) decoded spectrum, the aforementioned "source" portion may be obtained from a spectrum that has not yet been filtered by an available inverse (i.e., synthesis) TNS filter.

Thus, the above embodiments concern the concept of inter-channel noise filling. In the following, a possibility is described of how this concept may be built into an existing codec, namely xHE-AAC, in a semi-backward-compatible manner. In particular, a preferred implementation of the above embodiments is described, in which a stereo filling tool is built into an xHE-AAC-based audio codec with semi-backward-compatible signaling. With this implementation, stereo filling of transform coefficients in either of the two channels of an audio codec based on MPEG-D xHE-AAC (USAC) becomes feasible for certain stereo signals. This improves the coding quality of such signals, especially at low bit rates. The stereo filling tool is signaled semi-backward-compatibly, so that legacy xHE-AAC decoders can parse and decode the bitstreams without obvious audio errors or drop-outs. As described above, better overall quality can be attained if an audio coder can use a combination of previously decoded/quantized coefficients of the two stereo channels to reconstruct zero-quantized (non-transmitted) coefficients of either of the currently decoded channels. Audio coders, especially xHE-AAC and coders based on it, already offer spectral band replication (from low- to high-frequency channel coefficients) and noise filling (from an uncorrelated pseudorandom source). It is therefore desirable to also allow such stereo filling (from previous to present channel coefficients).

To allow legacy xHE-AAC decoders to read and parse bitstreams coded with stereo filling, the stereo filling tool should be used in a semi-backward-compatible way. Its presence should not cause legacy decoders to stop decoding, or even to fail to start decoding. The fact that the xHE-AAC infrastructure can still read the bitstream may also facilitate market adoption.

To achieve this semi-backward compatibility of a stereo filling tool in the context of xHE-AAC or its potential derivatives, the following implementation covers two aspects: the functionality of stereo filling, and the ability to signal it via syntax in the data stream that actually concerns noise filling. The stereo filling tool works as described above. Consider a channel pair with a common window configuration in which the stereo filling tool is activated. There, a coefficient of a zero-quantized scale factor band is reconstructed by the sum or difference of the previous frame's coefficients in either of the two channels, preferably the right channel. This reconstruction replaces noise filling or, as described above, is added to it. Stereo filling is performed similarly to noise filling. Signaling is done via the noise filling signaling of xHE-AAC: stereo filling is conveyed by means of the 8-bit noise filling side information. This is feasible because the MPEG-D USAC standard [3] states that all 8 bits are transmitted even if the noise level to be applied is zero. In that situation, some of the noise filling bits can be reused for the stereo filling tool.

Semi-backward compatibility regarding bitstream parsing and playback by legacy xHE-AAC decoders is ensured as follows. Stereo filling is signaled via a noise level of zero, i.e., the first three noise filling bits all having the value zero. These are followed by five non-zero bits, which traditionally represent a noise offset and here carry side information for the stereo filling tool as well as the missing noise level. A legacy xHE-AAC decoder disregards the value of the 5-bit noise offset when the 3-bit noise level is zero. The stereo filling signaling therefore only affects noise filling in the legacy decoder: noise filling is turned off because the first three bits are zero, and the rest of the decoding runs as intended. In particular, stereo filling is not performed, because the legacy decoder treats it like a deactivated noise filling process. Hence, a legacy decoder still offers "graceful" decoding of the enhanced bitstream 30: it does not need to mute the output signal or abort decoding when it reaches a frame with stereo filling switched on. Naturally, however, it cannot provide the correct, intended reconstruction of the stereo-filled line coefficients. The affected frames therefore have lower quality than with a decoder that can handle the new stereo filling tool. Nonetheless, the stereo filling tool is intended only for stereo input at low bit rates. Used that way, the quality from legacy xHE-AAC decoders should be better than if the affected frames dropped out due to muting or caused other obvious playback errors.

The following describes in detail how a stereo filling tool may be built into the xHE-AAC codec as an extension.

When built into the standard, the stereo filling tool could be described as follows. Such a stereo filling (SF) tool would be a new tool in the frequency-domain (FD) part of MPEG-H 3D Audio. In line with the above discussion, its aim would be the parametric reconstruction of MDCT spectral coefficients at low bit rates. This is similar to what noise filling according to Section 7.2 of the standard described in [3] already achieves. However, noise filling uses a pseudorandom noise source to generate the MDCT spectral values of any FD channel. SF, by contrast, would also be able to reconstruct the MDCT values of the right channel of a jointly coded stereo channel pair, using a downmix of the left and right MDCT spectra of the previous frame. In the implementation set forth below, SF is signaled semi-backward-compatibly by means of the noise filling side information, which a legacy MPEG-D USAC decoder can parse correctly.

The tool description could be as follows. When SF is active in a joint-stereo FD frame, it affects the empty (i.e., fully zero-quantized) scale factor bands of the right (second) channel, such as 50d. Their MDCT coefficients are replaced by the sum or difference of the MDCT coefficients of the corresponding decoded left and right channels of the previous frame (if that frame is FD). If legacy noise filling is active for the second channel, a pseudorandom value is also added to each coefficient. The resulting coefficients of each scale factor band are then scaled so that the RMS (root of the mean coefficient square) of each band matches the value transmitted by that band's scale factor. See Section 7.3 of the standard in [3].

Some operational constraints could apply to the use of the new SF tool in the MPEG-D USAC standard. For example, the SF tool may be available only in the right FD channel of a common FD channel pair, i.e., a channel pair element transmitting StereoCoreToolInfo() with common_window == 1. In addition, because of the semi-backward-compatible signaling, the SF tool may be used only when noiseFilling == 1 in the syntax container UsacCoreConfig(). If either channel of the pair is in LPD core_mode, the SF tool may not be used, even if the right channel is in FD mode.
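The operational constraints above amount to a simple eligibility check per channel. The following Python sketch encodes them as a predicate. The parameter names are hypothetical stand-ins for the corresponding bitstream fields (common_window, noiseFilling, core_mode), not USAC syntax.

```python
def stereo_filling_allowed(common_window: int, noise_filling: int,
                           is_right_channel: bool,
                           left_core_mode: int, right_core_mode: int) -> bool:
    """Return True if SF may be used for this channel under the constraints above.

    core_mode == 1 denotes LPD; core_mode == 0 denotes FD.
    """
    if common_window != 1 or noise_filling != 1:
        return False
    if not is_right_channel:
        return False  # SF applies only to the right (second) channel of the pair
    # If either channel of the pair is LPD, SF is unavailable even if the right one is FD.
    return left_core_mode == 0 and right_core_mode == 0
```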

The following terms and definitions are used below to describe the extension of the standard described in [3] more clearly.

Specifically, the following data element is newly introduced:

stereo_filling: binary flag indicating whether SF is used in the current frame and channel

In addition, new helper elements are introduced:

noise_offset: noise filling offset used to modify the scale factors of zero-quantized bands (Section 7.2)

noise_level: noise filling level representing the amplitude of the added spectral noise (Section 7.2)

downmix_prev[]: downmix (i.e., sum or difference) of the previous frame's left and right channels

sf_index[g][sfb]: scale factor index (i.e., the transmitted integer) for window group g and scale factor band sfb

The decoding process of the standard would be extended as follows. When the SF tool is activated, a joint-stereo coded FD channel is decoded in three sequential steps:

First, the stereo_filling flag is decoded.

stereo_filling is not a separate bitstream element. It is derived from the noise filling elements noise_offset and noise_level in UsacChannelPairElement() and from the common_window flag in StereoCoreToolInfo(). If noiseFilling == 0, or common_window == 0, or the current channel is the left (first) channel in the element, stereo_filling is 0 and the stereo filling process ends. Otherwise,

if ((noiseFilling != 0) && (common_window != 0) && (noise_level == 0)) {
    stereo_filling = (noise_offset & 16) / 16;
    noise_level = (noise_offset & 14) / 2;
    noise_offset = (noise_offset & 1) * 16;
}
else {
    stereo_filling = 0;
}

In other words, if noise_level == 0, noise_offset contains the stereo_filling flag followed by 4 bits of noise filling data, which are then rearranged. This operation alters the values of noise_level and noise_offset, so it must be performed before the noise filling process of Section 7.2. Moreover, the above pseudocode is not executed for the left (first) channel of a UsacChannelPairElement() or of any other element.
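To make the bit manipulation concrete, the derivation can be written as a small Python function operating on the already parsed 3-bit noise_level and 5-bit noise_offset values. It is a sketch of the pseudocode above, not a normative implementation; the function name and argument order are assumptions. It also contrasts two behaviors. A legacy decoder sees noise_level == 0 and simply switches noise filling off. An SF-aware decoder instead unpacks the flag and the relocated noise parameters from noise_offset.

```python
from typing import Tuple


def derive_stereo_filling(noise_filling: int, common_window: int,
                          is_first_channel: bool, noise_level: int,
                          noise_offset: int) -> Tuple[int, int, int]:
    """Return (stereo_filling, noise_level, noise_offset) after the explicit SF derivation.

    noise_level is the 3-bit field and noise_offset the 5-bit field of the
    8-bit noise filling side information.
    """
    if not noise_filling or not common_window or is_first_channel:
        return 0, noise_level, noise_offset
    if noise_level == 0:
        stereo_filling = (noise_offset & 16) // 16  # bit 4: the SF flag
        new_level = (noise_offset & 14) // 2        # bits 1..3: 3-bit noise level
        new_offset = (noise_offset & 1) * 16        # bit 0: becomes the offset's MSB
        return stereo_filling, new_level, new_offset
    return 0, noise_level, noise_offset
```

For instance, noise_level = 0 with noise_offset = 0b10111 (23) in the right channel yields stereo_filling = 1, a noise level of 3 and a noise offset of 16.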

Then, downmix_prev is computed.

downmix_prev[], the spectral downmix to be used for stereo filling, is identical to dmx_re_prev[], which is used for the MDST spectrum estimation in complex stereo prediction (see Section 7.7.2.3). This means that:

• All coefficients of downmix_prev[] must be zero if any of the following holds for the element and frame with which the downmix is performed (i.e., the frame before the currently decoded one): any of its channels uses core_mode == 1 (LPD); the channels use unequal transform lengths (split_transform == 1, or block switching to window_sequence == EIGHT_SHORT_SEQUENCE in only one channel); or usacIndependencyFlag == 1.

• If the transform length of any channel in the current element changed from the last frame to the current frame, all coefficients of downmix_prev[] must be zero during the stereo filling process. This is the case when split_transform == 1 is preceded by split_transform == 0, when window_sequence == EIGHT_SHORT_SEQUENCE is preceded by window_sequence != EIGHT_SHORT_SEQUENCE, or vice versa.

• If transform splitting is applied in the channels of the previous or current frame, downmix_prev[] represents a line-by-line interleaved spectral downmix. See the transform splitting tool for details.

• If complex stereo prediction is not used in the current frame and element, pred_dir equals 0.

Consequently, the previous downmix only has to be computed once for both tools, which reduces complexity. downmix_prev[] and dmx_re_prev[] as defined in Section 7.7.2 differ only in one case: when complex stereo prediction is not currently used, or when it is active but use_prev_frame == 0. In that case, downmix_prev[] is still computed according to Section 7.7.2.3 for stereo filling decoding. This happens even though complex stereo prediction decoding does not need dmx_re_prev[], which is therefore undefined/zero.
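The zeroing rules and the sum/difference choice can be sketched as follows. This Python illustration makes several assumptions that should be checked against Section 7.7.2.3 of [3]. First, the downmix is taken as half the sum of the previous left and right spectra for pred_dir == 0 and as half their difference otherwise. Second, "unequal transform lengths" is interpreted as the two channels differing in split_transform or in EIGHT_SHORT_SEQUENCE usage. Third, the line interleaving needed for transform splitting is omitted. `FrameInfo` and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameInfo:
    """Hypothetical per-channel, per-frame side information."""
    core_mode: int        # 1 = LPD, 0 = FD
    split_transform: int  # 1 if transform splitting is used
    eight_short: bool     # True if window_sequence == EIGHT_SHORT_SEQUENCE


def downmix_prev_is_zero(prev_l: FrameInfo, prev_r: FrameInfo,
                         cur_l: FrameInfo, cur_r: FrameInfo,
                         usac_independency_flag: int) -> bool:
    """Apply the zeroing rules for downmix_prev[] listed above."""
    if usac_independency_flag == 1:
        return True
    if prev_l.core_mode == 1 or prev_r.core_mode == 1:
        return True
    # Unequal transform lengths between the channels of the previous frame.
    if (prev_l.split_transform != prev_r.split_transform
            or prev_l.eight_short != prev_r.eight_short):
        return True
    # Transform length of a channel changed from the previous to the current frame.
    for prev, cur in ((prev_l, cur_l), (prev_r, cur_r)):
        if prev.split_transform != cur.split_transform or prev.eight_short != cur.eight_short:
            return True
    return False


def compute_downmix_prev(spec_l_prev: List[float], spec_r_prev: List[float],
                         pred_dir: int, force_zero: bool) -> List[float]:
    """Sketch: half sum (pred_dir == 0) or half difference (pred_dir == 1) per line."""
    if force_zero:
        return [0.0] * len(spec_l_prev)
    sign = 1.0 if pred_dir == 0 else -1.0
    return [0.5 * (left + sign * right) for left, right in zip(spec_l_prev, spec_r_prev)]
```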

Thereafter, stereo filling of the empty scale factor bands is performed.

If stereo_filling == 1, the following procedure is carried out after the noise filling process. It applies to all initially empty scale factor bands sfb[] below max_sfb_ste, i.e., all bands in which all MDCT lines were quantized to zero. First, the energy of the given sfb[] and the energy of the corresponding lines in downmix_prev[] are computed as sums of the squared lines. Then, with sfbWidth containing the number of lines per sfb[],

if (energy[sfb] < sfbWidth[sfb]) { /* noise level isn't maximum, or band starts below noise-fill region */
    facDmx = sqrt((sfbWidth[sfb] - energy[sfb]) / energy_dmx[sfb]);
    factor = 0.0;
    /* if the previous downmix isn't empty, add the scaled downmix lines such that band reaches unity energy */
    for (index = swb_offset[sfb]; index < swb_offset[sfb+1]; index++) {
        spectrum[window][index] += downmix_prev[window][index] * facDmx;
        factor += spectrum[window][index] * spectrum[window][index];
    }
    if ((factor != sfbWidth[sfb]) && (factor > 0)) { /* unity energy isn't reached, so modify band */
        factor = sqrt(sfbWidth[sfb] / (factor + 1e-8));
        for (index = swb_offset[sfb]; index < swb_offset[sfb+1]; index++) {
            spectrum[window][index] *= factor;
        }
    }
}

This is done for the spectrum of each window group. The scale factors are then applied to the resulting spectrum as in Section 7.3, with the scale factors of the empty bands processed like regular scale factors.
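The per-band procedure above translates almost line for line into Python. The sketch operates on one window (group) and one band. Like the pseudocode, it assumes that spectral values before scale factor application are normalized to unity energy per line, and that the scale factor then sets the actual level. Unlike the pseudocode, it skips the band when the downmix energy is zero. This guard is an added assumption reflecting the comment that only a non-empty previous downmix is added.

```python
import math
from typing import List


def stereo_fill_band(spectrum: List[float], downmix_prev: List[float],
                     start: int, stop: int) -> None:
    """Stereo filling of one initially empty band spectrum[start:stop], in place.

    spectrum may already contain noise-filled values from Section 7.2.
    """
    sfb_width = stop - start
    energy = sum(x * x for x in spectrum[start:stop])
    energy_dmx = sum(x * x for x in downmix_prev[start:stop])
    # Added guard (assumption): nothing to add if the band is full or the downmix is empty.
    if energy >= sfb_width or energy_dmx == 0.0:
        return
    fac_dmx = math.sqrt((sfb_width - energy) / energy_dmx)
    factor = 0.0
    # Add the scaled downmix lines so that the band approaches unity energy per line.
    for i in range(start, stop):
        spectrum[i] += downmix_prev[i] * fac_dmx
        factor += spectrum[i] * spectrum[i]
    if factor != sfb_width and factor > 0.0:
        # Target band energy not reached exactly: rescale the whole band.
        factor = math.sqrt(sfb_width / (factor + 1e-8))
        for i in range(start, stop):
            spectrum[i] *= factor
```

The second loop is what brings the band energy back to sfbWidth whenever noise-filled lines and added downmix lines do not sum exactly to unity energy per line.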

An alternative to the above extension of the xHE-AAC standard would use an implicit semi-backward-compatible signaling method.

The above implementation in the xHE-AAC code framework uses one bit in the bitstream, contained in stereo_filling, to signal to a decoder according to Figure 2 that the new stereo filling tool is used. This could be called explicit semi-backward-compatible signaling. It allows the legacy bitstream data, here the noise filling side information, to be used independently of the SF signaling. In the present embodiment, the noise filling data does not depend on the stereo filling information, and vice versa. For example, noise filling data consisting of all zeros (noise_level = noise_offset = 0) may be transmitted while stereo_filling signals any possible value (being a binary flag, either 0 or 1).

Strict independence between the legacy and the inventive bitstream data is not always required. Where it is not, and where the inventive signal is a binary decision, the explicit transmission of a signaling bit can be avoided. The binary decision can instead be signaled by the presence or absence of what may be called implicit semi-backward-compatible signaling. Taking the above embodiment again as an example, the use of stereo filling could be transmitted simply by employing the new signaling: if noise_level is zero and, at the same time, noise_offset is not zero, the stereo_filling flag is set equal to 1. If both noise_level and noise_offset are non-zero, stereo_filling equals 0. The implicit signal depends on the legacy noise filling signal when both noise_level and noise_offset are zero. In that case, it is unclear whether legacy or new implicit SF signaling is being used. To avoid this ambiguity, the value of stereo_filling must be defined in advance. In the present example, it is appropriate to define stereo_filling = 0 when the noise filling data consists of all zeros. This is what legacy encoders without stereo filling capability signal when noise filling is not applied in a frame.

One issue remains with implicit semi-backward-compatible signaling: how to signal stereo_filling == 1 and no noise filling at the same time. As explained, the noise filling data must not be all zero. If a noise magnitude of zero is requested, noise_level ((noise_offset & 14)/2, as mentioned above) must equal 0. The only remaining option is therefore a noise_offset ((noise_offset & 1)*16, as mentioned above) greater than 0. However, in the case of stereo filling, noise_offset is taken into account when the scale factors are applied, even if noise_level is zero. Fortunately, an encoder can compensate for the fact that a noise_offset of zero might not be transmittable. It does so by altering the affected scale factors at bitstream writing so that they contain an offset, which the decoder then undoes via noise_offset. This makes the implicit signaling of the above embodiment possible at the cost of a potential increase in scale factor data rate. Hence, the stereo filling signaling in the above pseudocode could be changed as follows, using the saved SF signaling bit to transmit noise_offset with 2 bits (4 values) instead of 1 bit:

if ((noiseFilling) && (common_window) && (noise_level == 0) &&
    (noise_offset > 0)) {
    stereo_filling = 1;
    noise_level = (noise_offset & 28) / 4;
    noise_offset = (noise_offset & 3) * 8;
}
else {
    stereo_filling = 0;
}
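The implicit variant can be sketched in the same style as the explicit one. In this Python sketch (function name assumed), stereo_filling is inferred from noise_level == 0 together with a non-zero noise_offset. The 5 bits are then split into a 3-bit noise level and a 2-bit offset taking the values 0, 8, 16 or 24. Following the earlier remark that the derivation is not executed for the first channel of an element, the sketch also returns early there.

```python
from typing import Tuple


def derive_stereo_filling_implicit(noise_filling: int, common_window: int,
                                   is_first_channel: bool, noise_level: int,
                                   noise_offset: int) -> Tuple[int, int, int]:
    """Return (stereo_filling, noise_level, noise_offset) under implicit signaling."""
    if (noise_filling and common_window and not is_first_channel
            and noise_level == 0 and noise_offset > 0):
        new_level = (noise_offset & 28) // 4  # bits 2..4: 3-bit noise level
        new_offset = (noise_offset & 3) * 8   # bits 0..1: 2-bit noise offset
        return 1, new_level, new_offset
    # All-zero noise filling data (or any other case) means no stereo filling,
    # matching what a legacy encoder sends when noise filling is off.
    return 0, noise_level, noise_offset
```

Requesting stereo filling without noise filling, e.g. with noise_offset = 3, yields (1, 0, 24). The noise level is zero but the offset is not, which is why the encoder has to compensate via the scale factors as described above.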

For completeness, Figure 6 shows a parametric audio encoder according to an embodiment of the present application. The encoder of Figure 6, generally indicated by reference sign 90, comprises a transformer 92. It transforms the original, undistorted version of the audio signal that is reconstructed at output 32 of Figure 2. As described with respect to Figure 3, a lapped transform may be used, switching between different transform lengths and corresponding transform windows in units of frames. The different transform lengths and corresponding transform windows are illustrated in Figure 3 using reference sign 104. As in Figure 2, Figure 6 concentrates on the portion of encoder 90 that encodes one channel of the multichannel audio signal. The portion of encoder 90 for the other channel is generally indicated by reference sign 96 in Figure 6.

At the output of transformer 92, the spectral lines and scale factors are unquantized, and substantially no coding loss has occurred yet. The spectrogram output by transformer 92 enters a quantizer 98. The quantizer quantizes the spectral lines of this spectrogram spectrum by spectrum, setting and using preliminary scale factors of the scale factor bands. At the output of quantizer 98, preliminary scale factors and corresponding spectral line coefficients therefore result. A sequence of a noise filler 16′, an optional inverse TNS filter 28a′, an inter-channel predictor 24′, an MS decoder 26′ and an inverse TNS filter 28b′ is connected in series after it. This chain lets encoder 90 obtain, at the input of the downmix provider, the reconstructed final version of the current spectrum as it is available at the decoder side (see Figure 2). Encoder 90 also comprises a downmix provider 31′ if inter-channel prediction 24′ is used and/or if inter-channel noise filling is used in the version that forms the inter-channel noise from the downmix of the previous frame. The downmix provider 31′ forms a downmix of the reconstructed final versions of the spectra of the channels of the multichannel audio signal. To save computation, the downmix provider 31′ may instead form the downmix from the original, unquantized versions of these channel spectra.

Encoder 90 may use the information on the available reconstructed final versions of the spectra for two purposes. One is inter-frame spectral prediction, such as the aforementioned possible version of inter-channel prediction using an imaginary-part estimation. The other is rate control, i.e., determining within a rate control loop the parameters that encoder 90 finally codes into data stream 30 in a rate/distortion-optimal sense.

For example, one such parameter set in the prediction and/or rate control loop of encoder 90 is the scale factor of each zero-quantized scale factor band identified by identifier 12′. Quantizer 98 sets this scale factor only preliminarily. In the prediction and/or rate control loop of encoder 90, the scale factor of the zero-quantized scale factor bands is set in some psychoacoustic or rate/distortion-optimal sense. It determines the aforementioned target noise level, together with the optional modification parameter that, as described above, is also conveyed to the decoder side in the data stream for the corresponding frame. This scale factor may be computed using only the spectral lines of the spectrum and channel to which it belongs (i.e., the "target" spectrum, as described before). Alternatively, it may be determined using both the spectral lines of the "target" channel spectrum and the spectral lines of a "source" spectrum, as introduced above. The "source" spectrum is either the other channel's spectrum or the downmix spectrum of the previous frame obtained from downmix provider 31′. In particular, the target scale factor may be computed from the relation between an energy measure of the spectral lines in the "target" scale factor band and an energy measure of the co-located spectral lines in the corresponding "source" region. This stabilizes the target noise level and reduces temporal level fluctuations in the decoded audio channels to which inter-channel noise filling is applied. Finally, as noted above, this "source" region may originate from the reconstructed final version of another channel or from the previous frame's downmix. If encoder complexity is to be reduced, it may instead originate from the original, unquantized version of that other channel or from the downmix of the original, unquantized versions of the previous frame's spectra.

In the following, multichannel encoding and multichannel decoding according to embodiments are explained. In embodiments, the multichannel processor 204 of the apparatus 201 for decoding of Fig. 1a may, for example, be configured to perform one or more of the techniques described below for multichannel decoding.

Multichannel decoding is not described first. Instead, multichannel encoding according to embodiments is explained with reference to Figs. 7 to 9, and multichannel decoding is then explained with reference to Figs. 10 and 12.

Multichannel encoding according to embodiments is now explained with reference to Figs. 7 to 9 and Fig. 11:

Fig. 7 shows a schematic block diagram of an apparatus (encoder) 100 for encoding a multichannel signal 101 having at least three channels CH1 to CH3.

The apparatus 100 comprises an iterative processor 102, a channel encoder 104 and an output interface 106.

The iterative processor 102 is configured to carry out the following in a first iteration step:

- calculate inter-channel correlation values between each pair of the at least three channels CH1 to CH3;
- select the channel pair having the highest value or a value above a threshold;
- process the selected channel pair using a multichannel processing operation to derive multichannel parameters MCH_PAR1 for that pair and first processed channels P1 and P2.

In the following, the processed channels P1 and P2 may also be referred to as combined channels P1 and P2, respectively. Further, the iterative processor 102 is configured to perform the calculating, selecting and processing in a second iteration step using at least one of the processed channels P1 or P2. This derives multichannel parameters MCH_PAR2 and second processed channels P3 and P4.

For example, as indicated in Fig. 7, the iterative processor 102 may calculate three inter-channel correlation values in the first iteration step:

- for a first pair of the at least three channels CH1 to CH3, consisting of the first channel CH1 and the second channel CH2;
- for a second pair, consisting of the second channel CH2 and the third channel CH3;
- for a third pair, consisting of the first channel CH1 and the third channel CH3.

In Fig. 7 it is assumed that, in the first iteration step, the third pair, consisting of the first channel CH1 and the third channel CH3, has the highest inter-channel correlation value. The iterative processor 102 therefore selects the third pair in the first iteration step. It processes this pair using a multichannel processing operation to derive multichannel parameters MCH_PAR1 for the selected channel pair and the first processed channels P1 and P2.

Further, in the second iteration step, the iterative processor 102 may be configured to calculate inter-channel correlation values between each pair of the at least three channels CH1 to CH3 and the processed channels P1 and P2. It then selects the channel pair having the highest inter-channel correlation value or a value above a threshold. Here, the iterative processor 102 may be configured not to select, in the second iteration step (or in any further iteration step), the channel pair selected in the first iteration step.

Referring to the example shown in Fig. 7, the iterative processor 102 may further calculate inter-channel correlation values for seven additional channel pairs:

- a fourth pair, consisting of the first channel CH1 and the first processed channel P1;
- a fifth pair, consisting of CH1 and the second processed channel P2;
- a sixth pair, consisting of the second channel CH2 and P1;
- a seventh pair, consisting of CH2 and P2;
- an eighth pair, consisting of the third channel CH3 and P1;
- a ninth pair, consisting of CH3 and P2;
- a tenth pair, consisting of P1 and P2.

In Fig. 7 it is assumed that, in the second iteration step, the sixth channel pair, consisting of the second channel CH2 and the first processed channel P1, has the highest inter-channel correlation value. The iterative processor 102 therefore selects the sixth channel pair in the second iteration step. It processes this pair using a multichannel processing operation to derive multichannel parameters MCH_PAR2 for the selected channel pair and the second processed channels P3 and P4.

The iterative processor 102 may be configured to select a channel pair only if the level difference of the channel pair is smaller than a threshold. The threshold is smaller than 40 dB, 25 dB, 12 dB or 6 dB. Here, thresholds of 25 dB and 40 dB correspond to rotation angles of 3 and 0.5 degrees, respectively.
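The stated correspondence between level difference and rotation angle can be checked with a short calculation. The check assumes the two channels are fully correlated and differ only in gain g, and it uses the KLT angle formula given later in this section. The function name is illustrative.

```python
import math


def rotation_angle_for_level_difference(level_diff_db):
    """KLT rotation angle (degrees) for two fully correlated channels whose
    levels differ by level_diff_db.

    With I2 = g * I1 the correlations are c11 = E, c12 = g*E, c22 = g*g*E, and
    0.5 * atan2(2*g, 1 - g*g) reduces to atan(g), where g = 10**(-dB / 20).
    """
    g = 10.0 ** (-level_diff_db / 20.0)
    return math.degrees(math.atan(g))


for threshold_db in (6, 12, 25, 40):
    angle = rotation_angle_for_level_difference(threshold_db)
    print(f"{threshold_db:>2} dB -> {angle:5.2f} deg")
```

The 25 dB case yields roughly 3.2 degrees and the 40 dB case roughly 0.57 degrees, which matches the 3 and 0.5 degree figures quoted above.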

The iterative processor 102 may be configured to calculate normalized integer correlation values. It may then be configured to select a channel pair when the integer correlation value is greater than, e.g., 0.2 or preferably 0.3.
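A minimal sketch of one selection step is shown below. It assumes the normalized correlation is the absolute normalized cross-correlation of two equally long frames. The text defines neither the exact measure nor what "integer" refers to. The 0.3 default mirrors the preferred threshold, and the function names are hypothetical.

```python
import math
from itertools import combinations


def normalized_correlation(x, y):
    """Absolute normalized cross-correlation |<x, y>| / sqrt(<x, x> <y, y>), in [0, 1]."""
    cxy = sum(a * b for a, b in zip(x, y))
    cxx = sum(a * a for a in x)
    cyy = sum(b * b for b in y)
    if cxx == 0 or cyy == 0:
        return 0.0  # a silent channel is treated as uncorrelated
    return abs(cxy) / math.sqrt(cxx * cyy)


def select_pair(channels, threshold=0.3):
    """Return the index pair (i, j) with the highest correlation above threshold, else None."""
    best, best_val = None, threshold
    for i, j in combinations(range(len(channels)), 2):
        val = normalized_correlation(channels[i], channels[j])
        if val > best_val:
            best, best_val = (i, j), val
    return best


ch1, ch2, ch3 = [1, 0, 1, 0], [0, 1, 0, 1], [2, 0, 2, 0]
print(select_pair([ch1, ch2, ch3]))  # (0, 2): CH1 and CH3, as in the Fig. 7 example
```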

Further, the iterative processor 102 may provide the channels resulting from the multichannel processing to the channel encoder 104. For example, referring to Fig. 7, it may provide the third processed channel P3 and the fourth processed channel P4, which result from the multichannel processing in the second iteration step. It may also provide the second processed channel P2, which results from the multichannel processing in the first iteration step. Hence, the iterative processor 102 may provide only those processed channels that are not (further) processed in a subsequent iteration step. As shown in Fig. 7, the first processed channel P1 is not provided to the channel encoder 104, since it is further processed in the second iteration step.

The channel encoder 104 may be configured to encode the channels P2 to P4 to obtain encoded channels E1 to E3. These channels result from the iterative processing (or multichannel processing) performed by the iterative processor 102.

For example, the channel encoder 104 may be configured to encode the channels P2 to P4 using mono encoders (or mono boxes, or mono tools) 120_1 to 120_3. The mono boxes may be configured so that a channel having less energy (or a smaller amplitude) requires fewer bits to encode than a channel having more energy (or a higher amplitude). The mono boxes 120_1 to 120_3 may be, for example, transform-based audio encoders. Further, the channel encoder 104 may be configured to encode the channels P2 to P4 using a stereo encoder (e.g., a parametric stereo encoder or a lossy stereo encoder).

The output interface 106 may be configured to generate an encoded multichannel signal 107. This signal contains the encoded channels E1 to E3 and the multichannel parameters MCH_PAR1 and MCH_PAR2.

For example, the output interface 106 may be configured to generate the encoded multichannel signal 107 as a serial signal or serial bitstream. In that case, the multichannel parameters MCH_PAR2 precede the multichannel parameters MCH_PAR1 in the encoded signal 107. Thus, a decoder receives MCH_PAR2 before MCH_PAR1; an embodiment of such a decoder is described later with reference to Fig. 10.

In Fig. 7 the iterative processor 102 exemplarily performs two multichannel processing operations, one in the first iteration step and one in the second iteration step. Naturally, the iterative processor 102 may also perform further multichannel processing operations in subsequent iteration steps. The iterative processor 102 may thus be configured to perform iteration steps until an iteration termination criterion is reached. One possible criterion is a maximum number of iteration steps equal to the total number of channels of the multichannel signal 101, or exceeding that total by two. Another is that no inter-channel correlation value is greater than a threshold, the threshold preferably being greater than 0.2 or preferably being 0.3. In further embodiments, the criterion may be a maximum number of iteration steps equal to or higher than the total number of channels of the multichannel signal 101. Alternatively, it may again be the absence of any inter-channel correlation value greater than such a threshold.

For illustration purposes, processing boxes 110 and 112 in Fig. 7 depict the multichannel processing operations performed by the iterative processor 102 in the first and second iteration steps. The processing boxes 110 and 112 may be implemented in hardware or software; for example, they may be stereo boxes.

Thereby, inter-channel signal dependencies can be exploited by hierarchically applying known joint stereo coding tools. In contrast to previous MPEG approaches, the signal pairs to be processed are not predetermined by a fixed signal path (e.g., a stereo coding tree). Instead, they can be changed dynamically to adapt to the characteristics of the input signal. The inputs of the current stereo box can be:

1. unprocessed channels, such as the channels CH1 to CH3;
2. outputs of a preceding stereo box, such as the processed signals P1 to P4;
3. a combination of an unprocessed channel and an output of a preceding stereo box.

The processing inside the stereo boxes 110 and 112 can be either prediction-based (like the complex prediction box in USAC) or KLT/PCA-based. In the KLT/PCA case, the encoder rotates the input channels (e.g., by a 2×2 rotation matrix) to maximize energy compaction, i.e., to concentrate the signal energy in one channel. The decoder then transforms the rotated signals back to the original input signal directions.

A possible implementation of the encoder 100 proceeds as follows:

1. The encoder calculates the inter-channel correlation between every channel pair, selects one suitable signal pair from the input signals, and applies the stereo tool to the selected channels.
2. The encoder recalculates the inter-channel correlation between all channels (the unprocessed channels as well as the processed intermediate output channels), selects one suitable signal pair, and applies the stereo tool to the selected channels.
3. The encoder repeats step 2 until all inter-channel correlations are below a threshold or a maximum number of transformations has been applied.
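The three-step procedure can be sketched as a greedy loop under several assumptions. Every stereo box is a plain M/S box (sum and difference). The box outputs overwrite their two input slots, so the final list holds exactly the channels handed to the channel encoder. A pair of slots is never reused, which simplifies the rule of not reselecting an already processed pair. The loop stops after at most as many steps as there are channels. All names are illustrative.

```python
import math
from itertools import combinations


def ncc(x, y):
    """Absolute normalized cross-correlation of two equally long signals."""
    cxy = sum(a * b for a, b in zip(x, y))
    cxx = sum(a * a for a in x)
    cyy = sum(b * b for b in y)
    if cxx == 0 or cyy == 0:
        return 0.0
    return abs(cxy) / math.sqrt(cxx * cyy)


def ms_box(x, y):
    """Mid/side box: mid = sum of the inputs, side = difference of the inputs."""
    return [a + b for a, b in zip(x, y)], [a - b for a, b in zip(x, y)]


def build_stereo_tree(channels, threshold=0.3, max_steps=None):
    """Greedy, signal-adaptive stereo tree.

    Returns the selected slot pairs in processing order and the final channels.
    """
    chans = [list(c) for c in channels]
    if max_steps is None:
        max_steps = len(chans)  # termination: at most as many steps as channels
    tree = []
    for _ in range(max_steps):
        best, best_val = None, threshold
        for pair in combinations(range(len(chans)), 2):
            if pair in tree:  # simplification: never reuse a pair of slots
                continue
            val = ncc(chans[pair[0]], chans[pair[1]])
            if val > best_val:
                best, best_val = pair, val
        if best is None:  # no correlation above the threshold: stop
            break
        i, j = best
        chans[i], chans[j] = ms_box(chans[i], chans[j])
        tree.append(best)
    return tree, chans


tree, coded = build_stereo_tree([[1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]])
print(tree)  # [(0, 1), (0, 2)]: the second box reuses the first box's mid output
```

Because the pairs are recomputed after every step, the resulting tree depends on the signal rather than on a fixed coding path.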

As already mentioned, the signal pairs processed by the encoder 100, or more precisely by the iterative processor 102, are not predetermined by a fixed signal path (e.g., a stereo coding tree). Instead, they can be changed dynamically to adapt to the characteristics of the input signal. The encoder 100 (or the iterative processor 102) may thus be configured to construct a stereo tree depending on the at least three channels CH1 to CH3 of the multichannel (input) signal 101. In other words, it may be configured to build the stereo tree based on inter-channel correlation, for example as follows:

- In a first iteration step, it calculates inter-channel correlation values between each pair of the at least three channels CH1 to CH3 and selects the channel pair having the highest value or a value above a threshold.
- In a second iteration step, it calculates inter-channel correlation values between each pair of the at least three channels and the previously processed channels, and again selects the channel pair having the highest value or a value above a threshold.

According to a one-step approach, a correlation matrix may be calculated for each possible iteration. This matrix contains the correlations of all channels possibly processed in previous iterations.

As indicated above, the iterative processor 102 may be configured to derive multichannel parameters MCH_PAR1 for the channel pair selected in the first iteration step. It may also derive multichannel parameters MCH_PAR2 for the channel pair selected in the second iteration step. MCH_PAR1 may comprise a first channel pair identification (or index) that identifies (or signals) the channel pair selected in the first iteration step. Correspondingly, MCH_PAR2 may comprise a second channel pair identification (or index) that identifies (or signals) the channel pair selected in the second iteration step.

In the following, an efficient indexing of the input signals is described. For example, channel pairs can be signaled efficiently by using a unique index for each pair, depending on the total number of channels. For example, the indices of the channel pairs for six channels can be as shown in the following table:

For example, in the above table, index 5 may signal the channel pair consisting of the first channel and the second channel. Similarly, index 6 may signal the channel pair consisting of the first channel and the third channel.
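The index table itself is not reproduced in this text, so the layout below is an assumption. In this hypothetical layout, pairs are numbered row by row through the upper triangle of a channel-by-channel matrix, with channels labeled from 0. Under this assumption, index 5 is the pair (1, 2) and index 6 is the pair (1, 3). This agrees with the two examples if "first", "second" and "third" are read as the channel labels 1, 2 and 3.

```python
def pair_index(ch_a, ch_b, num_channels):
    """Row-major index of the unordered pair (ch_a, ch_b) in an upper-triangular
    table, with channels numbered from 0 (hypothetical layout)."""
    i, j = sorted((ch_a, ch_b))
    if i == j or i < 0 or j >= num_channels:
        raise ValueError("need two distinct channels in [0, num_channels)")
    # Pairs in rows 0..i-1, followed by the offset of j within row i.
    return i * num_channels - i * (i + 1) // 2 + (j - i - 1)


def pair_from_index(index, num_channels):
    """Inverse of pair_index."""
    if index < 0:
        raise ValueError("index out of range")
    for i in range(num_channels - 1):
        row_length = num_channels - i - 1  # pairs (i, i+1) .. (i, num_channels-1)
        if index < row_length:
            return i, i + 1 + index
        index -= row_length
    raise ValueError("index out of range")


print(pair_index(1, 2, 6), pair_index(1, 3, 6), pair_from_index(14, 6))  # 5 6 (4, 5)
```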

The total number of possible channel pair indices for n channels can be calculated as:

numPairs = numChannels*(numChannels-1)/2

Hence, the number of bits needed to signal one channel pair is:

numBits = floor(log₂(numPairs-1)) + 1
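The two formulas can be evaluated with integer arithmetic. For numPairs of at least 2, `(numPairs - 1).bit_length()` in Python equals floor(log₂(numPairs-1)) + 1, and it avoids floating-point rounding. The function names are illustrative.

```python
def num_pairs(num_channels):
    """Number of unordered channel pairs: numChannels*(numChannels-1)/2."""
    return num_channels * (num_channels - 1) // 2


def num_bits(pairs):
    """Bits needed to signal one pair index in [0, pairs - 1].

    For pairs >= 2 this equals floor(log2(pairs - 1)) + 1; it returns 0 if only
    one pair exists, a case where the logarithmic formula is undefined.
    """
    return (pairs - 1).bit_length()


print(num_bits(num_pairs(12)), num_bits(num_pairs(11)))  # 7 6: 11.1 with and without LFE
```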

Further, the encoder 100 may use a channel mask. The configuration of the multichannel tool may contain a channel mask indicating for which channels the tool is active. Thus, LFEs (LFE = low frequency effects/enhancement channels) can be removed from the channel pair indexing, allowing for a more efficient encoding. For example, for an 11.1 setup, this reduces the number of channel pair indices from 12*11/2 = 66 to 11*10/2 = 55, allowing signaling with 6 instead of 7 bits. This mechanism can also be used to exclude channels intended to be mono objects (e.g., multi-language audio tracks). When the channel mask (channelMask) is decoded, a channel map (channelMap) can be generated to allow remapping of the channel pair indices to decoder channels.
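A minimal sketch of deriving a channel map from a channel mask follows. It assumes the mask is a list of booleans, one per decoder channel, and it uses a hypothetical 5.1 channel order of L, R, C, LFE, Ls, Rs. The bitstream representation of channelMask is not shown here, and the function names are illustrative.

```python
def build_channel_map(channel_mask):
    """Return the decoder channel for each compact index used in pair indices.

    channel_mask: one boolean per decoder channel, True if the multichannel
    tool is active for that channel (e.g. False for the LFE).
    """
    return [ch for ch, active in enumerate(channel_mask) if active]


def remap_pair(compact_pair, channel_map):
    """Translate a pair of compact indices into decoder channel numbers."""
    a, b = compact_pair
    return channel_map[a], channel_map[b]


# Hypothetical 5.1 order L, R, C, LFE, Ls, Rs with the LFE masked out.
mask = [True, True, True, False, True, True]
cmap = build_channel_map(mask)
print(cmap, remap_pair((3, 4), cmap))  # [0, 1, 2, 4, 5] (4, 5)
```

With the LFE masked out, only five channels take part in the pair indexing, so 10 instead of 15 pair indices are needed.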

Furthermore, the iterative processor 102 may be configured to derive, for a first frame, a plurality of selected channel pair indications. The output interface 106 may then be configured to include in the multichannel signal 107 a keep indicator for a second frame following the first frame. This indicator signals that the second frame has the same plurality of selected channel pair indications as the first frame.

The keep indicator, or keep tree flag, can signal that no new tree is transmitted and that the last stereo tree shall be used instead. This avoids transmitting the same stereo tree configuration multiple times if the channel correlation properties stay stationary for a longer time.
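The keep-tree mechanism can be sketched as a simple per-frame signaling scheme. In this sketch, a stereo tree is represented as a list of channel pairs and each frame emits either ("new", tree) or ("keep",). This message format is a hypothetical stand-in for the actual bitstream syntax.

```python
def encode_tree_signalling(trees_per_frame):
    """Emit ("keep",) when a frame reuses the previous stereo tree, else ("new", tree)."""
    messages, previous = [], None
    for tree in trees_per_frame:
        if tree == previous:
            messages.append(("keep",))
        else:
            messages.append(("new", tree))  # the first frame always sends a tree
            previous = tree
    return messages


def decode_tree_signalling(messages):
    """Rebuild the stereo tree of every frame; "keep" reuses the last tree received."""
    trees, current = [], None
    for message in messages:
        if message[0] == "new":
            current = message[1]
        trees.append(current)
    return trees


frames = [[(0, 1)], [(0, 1)], [(0, 1), (0, 2)], [(0, 1), (0, 2)]]
print([m[0] for m in encode_tree_signalling(frames)])  # ['new', 'keep', 'new', 'keep']
```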

Fig. 8 shows a schematic block diagram of a stereo box 110, 112. The stereo box 110, 112 comprises inputs for a first input signal I1 and a second input signal I2, and outputs for a first output signal O1 and a second output signal O2. As indicated in Fig. 8, s-parameters S1 to S4 describe how the output signals O1 and O2 depend on the input signals I1 and I2.

The iterative processor 102 may use (or comprise) stereo boxes 110, 112 to perform the multichannel processing operations on the input channels and/or processed channels, thereby deriving (further) processed channels. For example, the iterative processor 102 may be configured to use generic, prediction-based or KLT (Karhunen-Loève transform)-based rotation stereo boxes 110, 112.

A generic encoder (or encoder-side stereo box) may be configured to encode the input signals I1 and I2 to obtain the output signals O1 and O2 based on the following equation:

$$\begin{bmatrix} O_1 \\ O_2 \end{bmatrix} = \begin{bmatrix} S_1 & S_2 \\ S_3 & S_4 \end{bmatrix} \begin{bmatrix} I_1 \\ I_2 \end{bmatrix}$$

A generic decoder (or decoder-side stereo box) may be configured to decode the input signals I1 and I2 to obtain the output signals O1 and O2 based on the following equation:

$$\begin{bmatrix} O_1 \\ O_2 \end{bmatrix} = \begin{bmatrix} S_1 & S_2 \\ S_3 & S_4 \end{bmatrix}^{-1} \begin{bmatrix} I_1 \\ I_2 \end{bmatrix}$$
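A minimal sketch of a generic 2×2 stereo box follows. It assumes that the encoder applies the matrix formed by the s-parameters S1 to S4 and that the decoder applies its inverse. An M/S box, with matrix ((1, 1), (1, -1)), serves as the example. The function names are illustrative.

```python
def apply_box(s, x1, x2):
    """Apply a 2x2 stereo box: [o1, o2] = S [x1, x2] with S = ((s1, s2), (s3, s4))."""
    (s1, s2), (s3, s4) = s
    o1 = [s1 * a + s2 * b for a, b in zip(x1, x2)]
    o2 = [s3 * a + s4 * b for a, b in zip(x1, x2)]
    return o1, o2


def invert_box(s):
    """Inverse of a 2x2 box matrix, as used on the decoder side."""
    (s1, s2), (s3, s4) = s
    det = s1 * s4 - s2 * s3
    if det == 0:
        raise ValueError("box matrix is singular and cannot be inverted")
    return ((s4 / det, -s2 / det), (-s3 / det, s1 / det))


# Example: an M/S box as a special case of the generic box.
S = ((1.0, 1.0), (1.0, -1.0))
enc = apply_box(S, [1.0, 2.0], [3.0, 4.0])
dec = apply_box(invert_box(S), *enc)
print(enc, dec)  # ([4.0, 6.0], [-2.0, -2.0]) ([1.0, 2.0], [3.0, 4.0])
```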

A prediction-based encoder (or encoder-side stereo box) may be configured to encode the input signals I1 and I2 to obtain the output signals O1 and O2 based on the following equation:

where p is the prediction coefficient.

A prediction-based decoder (or decoder-side stereo box) may be configured to decode the input signals I1 and I2 to obtain the output signals O1 and O2 based on the following equation:
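The prediction matrices are not given in this text, so the following sketch assumes a mid/side-style form. In this form, the difference signal is predicted from the sum signal with the real coefficient p, and only the prediction residual is transmitted. This mirrors the idea of USAC prediction stereo, but the exact matrices, the least-squares choice of p and the function names are assumptions.

```python
def prediction_coefficient(i1, i2):
    """Least-squares p minimising the energy of (I1 - I2) - p * (I1 + I2)."""
    mid = [a + b for a, b in zip(i1, i2)]
    side = [a - b for a, b in zip(i1, i2)]
    energy_mid = sum(m * m for m in mid)
    if energy_mid == 0:
        return 0.0
    return sum(s * m for s, m in zip(side, mid)) / energy_mid


def prediction_encode(i1, i2, p):
    """Assumed encoder: O1 = I1 + I2, O2 = (I1 - I2) - p * (I1 + I2)."""
    o1 = [a + b for a, b in zip(i1, i2)]
    o2 = [(a - b) - p * (a + b) for a, b in zip(i1, i2)]
    return o1, o2


def prediction_decode(o1, o2, p):
    """Inverse of prediction_encode: recovers I1 and I2 up to rounding."""
    d1 = [0.5 * ((1 + p) * m + r) for m, r in zip(o1, o2)]
    d2 = [0.5 * ((1 - p) * m - r) for m, r in zip(o1, o2)]
    return d1, d2


left, right = [1.0, -2.0, 3.0], [0.5, -1.0, 1.5]
p = prediction_coefficient(left, right)
mid, residual = prediction_encode(left, right, p)
print(round(p, 4), [round(r, 12) for r in residual])
```

In this example, the right channel is a scaled copy of the left one, so the residual vanishes and p is 1/3.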

A KLT-based rotation encoder (or encoder-side stereo box) may be configured to encode the input signals I1 and I2 to obtain the output signals O1 and O2 based on the following equation:

$$\begin{bmatrix} O_1 \\ O_2 \end{bmatrix} = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} I_1 \\ I_2 \end{bmatrix}$$

A KLT-based rotation decoder (or decoder-side stereo box) may be configured to decode the input signals I1 and I2 to obtain the output signals O1 and O2 based on the following equation (inverse rotation):

$$\begin{bmatrix} O_1 \\ O_2 \end{bmatrix} = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} I_1 \\ I_2 \end{bmatrix}$$

In the following, the calculation of the rotation angle α of the KLT-based rotation is described.

The rotation angle α of the KLT-based rotation may be defined as:

$$\alpha = \frac{1}{2}\arctan\left(\frac{2\,c_{12}}{c_{11} - c_{22}}\right)$$

Here, $c_{xy}$ are the entries of a non-normalized correlation matrix, and $c_{11}$ and $c_{22}$ are the channel energies.

This can be implemented with the atan2 function, which distinguishes a negative correlation in the numerator from a negative energy difference in the denominator:

alpha = 0.5*atan2(2*correlation[ch1][ch2], (correlation[ch1][ch1] - correlation[ch2][ch2]));
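A minimal sketch of the KLT rotation box follows. It computes α with atan2 as in the line above and assumes the rotation convention O1 = cos α · I1 + sin α · I2, O2 = −sin α · I1 + cos α · I2. Under this convention, the chosen α concentrates the energy in O1 and makes the two outputs uncorrelated. The decoder rotates by −α, which applies the transposed matrix. The function names are illustrative.

```python
import math


def klt_angle(x1, x2):
    """alpha = 0.5 * atan2(2*c12, c11 - c22) from the non-normalized correlations."""
    c11 = sum(a * a for a in x1)
    c22 = sum(b * b for b in x2)
    c12 = sum(a * b for a, b in zip(x1, x2))
    return 0.5 * math.atan2(2.0 * c12, c11 - c22)


def rotate(x1, x2, alpha):
    """Encoder rotation: O1 = cos(a) I1 + sin(a) I2, O2 = -sin(a) I1 + cos(a) I2."""
    c, s = math.cos(alpha), math.sin(alpha)
    return ([c * a + s * b for a, b in zip(x1, x2)],
            [-s * a + c * b for a, b in zip(x1, x2)])


def inverse_rotate(o1, o2, alpha):
    """Decoder: rotating by -alpha applies the transpose and restores the inputs."""
    return rotate(o1, o2, -alpha)


x1, x2 = [1.0, 2.0, 3.0], [2.0, 4.0, 6.5]
alpha = klt_angle(x1, x2)
o1, o2 = rotate(x1, x2, alpha)
print(math.degrees(alpha), sum(a * b for a, b in zip(o1, o2)))  # cross term is ~0
```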

Furthermore, the iterative processor 102 may be configured to calculate the inter-channel correlation using a frame of each channel that comprises a plurality of frequency bands. A single inter-channel correlation value is thus obtained for the plurality of frequency bands. The iterative processor 102 may then be configured to perform the multichannel processing for each of the frequency bands, so that multichannel parameters are obtained for each of them.

Thereby, the iterative processor 102 may be configured to calculate stereo parameters in the multichannel processing. It may perform stereo processing only in those bands in which a stereo parameter is higher than a quantized-to-zero threshold defined by a stereo quantizer (e.g., that of a KLT-based rotation encoder). The stereo parameters may be, for example, MS on/off flags, rotation angles or prediction coefficients.

For example, the iterative processor 102 may be configured to calculate rotation angles in the multichannel processing. It may perform rotation processing only in those bands in which the rotation angle is higher than a quantized-to-zero threshold defined by a rotation angle quantizer (e.g., that of a KLT-based rotation encoder).
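The band-wise decision can be sketched with a uniform angle quantizer. The step size of π/64 is a hypothetical choice, since the text does not specify the quantizer. A band is rotated only if its angle does not quantize to index zero.

```python
import math

ANGLE_STEP = math.pi / 64  # hypothetical uniform quantizer step for the angle


def quantize_angle(alpha, step=ANGLE_STEP):
    """Uniform quantizer index of a rotation angle (0 means 'quantized to zero')."""
    return round(alpha / step)


def bands_to_rotate(band_angles, step=ANGLE_STEP):
    """Indices of the bands whose rotation angle does not quantize to zero."""
    return [band for band, alpha in enumerate(band_angles)
            if quantize_angle(alpha, step) != 0]


print(bands_to_rotate([0.01, 0.3, -0.2, 0.02]))  # [1, 2]
```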

Thus, the encoder 100 (or the output interface 106) may be configured to transmit the transformation/rotation information in one of two ways. It may send one parameter for the complete spectrum (full-band box), or multiple frequency-dependent parameters for parts of the spectrum.

The encoder 100 may be configured to generate the bitstream 107 based on the following tables:

Table 1 - Syntax of mpegh3daExtElementConfig()

Table 2 - Syntax of MCCConfig()

Table 3 - Syntax of MultichannelCodingBoxBandWise()

Table 4 - Syntax of MultichannelCodingBoxFullband()

Table 5 - Syntax of MultichannelCodingFrame()

Table 6 - Values of usacExtElementType

Table 7 - Interpretation of data blocks for extension payload decoding

Fig. 9 shows a schematic block diagram of the iterative processor 102 according to an embodiment. In the embodiment shown in Fig. 9, the multichannel signal 101 is a 5.1 channel signal with six channels:

- a left channel L;
- a right channel R;
- a left surround channel Ls;
- a right surround channel Rs;
- a center channel C;
- a low frequency effects channel LFE.

As indicated in Fig. 9, the iterative processor 102 does not process the LFE channel. One possible reason is that the inter-channel correlation values between the LFE channel and each of the other five channels L, R, Ls, Rs and C are too small. Another is that the channel mask indicates that the LFE channel is not to be processed, which is assumed in the following.

In the first iteration step, the iterative processor 102 calculates inter-channel correlation values between each pair of the five channels L, R, Ls, Rs and C. It then selects the channel pair having the highest value or a value above a threshold. In Fig. 9 it is assumed that the left channel L and the right channel R have the highest value. The iterative processor 102 therefore processes L and R using a stereo box (or stereo tool) 110, which performs the multichannel processing operation, to derive a first processed channel P1 and a second processed channel P2.

In the second iteration step, the iterative processor 102 calculates inter-channel correlation values between each pair of the five channels L, R, Ls, Rs and C and the processed channels P1 and P2. It then selects the channel pair having the highest value or a value above a threshold. In Fig. 9 it is assumed that the left surround channel Ls and the right surround channel Rs have the highest value. The iterative processor 102 therefore processes Ls and Rs using a stereo box (or stereo tool) 112 to derive a third processed channel P3 and a fourth processed channel P4.

In the third iteration step, the iterative processor 102 calculates inter-channel correlation values between each pair of the five channels L, R, Ls, Rs and C and the processed channels P1 to P4. It then selects the channel pair having the highest value or a value above a threshold. In Fig. 9 it is assumed that the first processed channel P1 and the third processed channel P3 have the highest value. The iterative processor 102 therefore processes P1 and P3 using a stereo box (or stereo tool) 114 to derive a fifth processed channel P5 and a sixth processed channel P6.

In the fourth iteration step, the iterative processor 102 calculates inter-channel correlation values between each pair of the five channels L, R, Ls, Rs and C and the processed channels P1 to P6. It then selects the channel pair having the highest value or a value above a threshold. In Fig. 9 it is assumed that the fifth processed channel P5 and the center channel C have the highest value. The iterative processor 102 therefore processes P5 and C using a stereo box (or stereo tool) 115 to derive a seventh processed channel P7 and an eighth processed channel P8.

The stereo boxes 110 to 116 may be MS stereo boxes, i.e., mid/side stereo boxes configured to provide a mid channel and a side channel. The mid channel may be the sum of the input channels of the stereo box, and the side channel may be the difference between them. Further, the stereo boxes 110 and 116 may be rotation boxes or stereo prediction boxes.

In Fig. 9, the first processed channel P1, the third processed channel P3 and the fifth processed channel P5 may be mid channels. The second processed channel P2, the fourth processed channel P4 and the sixth processed channel P6 may be side channels.

Further, as indicated in Fig. 9, the iterative processor 102 may be configured to perform the calculating, selecting and processing using the input channels L, R, Ls, Rs and C and (only) the mid channels P1, P3 and P5 among the processed channels. This applies in the second iteration step and, where applicable, in any further iteration step. In other words, the iterative processor 102 may be configured not to use the side channels P2, P4 and P6 for the calculating, selecting and processing in these iteration steps.
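The bookkeeping behind the mid-only rule can be sketched for the Fig. 9 example. The sketch assumes the LFE is excluded by the channel mask. It also treats P7 and P8 of the last box as mid and side output; the text does not state this. The channel names follow Fig. 9, and the function name is illustrative.

```python
def run_tree(input_channels, steps):
    """Track which channels may still be paired when only mid outputs are reused.

    steps: sequence of ((a, b), mid_name, side_name) in processing order.
    Returns the channels still eligible at the end and the side channels,
    which go directly to the channel encoder.
    """
    eligible = set(input_channels)
    sides = []
    for (a, b), mid, side in steps:
        if a not in eligible or b not in eligible:
            raise ValueError(f"pair ({a}, {b}) is not eligible for processing")
        eligible -= {a, b}
        eligible.add(mid)   # the mid output may be combined again later
        sides.append(side)  # the side output is never selected again
    return sorted(eligible), sides


fig9_steps = [
    (("L", "R"), "P1", "P2"),
    (("Ls", "Rs"), "P3", "P4"),
    (("P1", "P3"), "P5", "P6"),
    (("P5", "C"), "P7", "P8"),
]
print(run_tree(["L", "R", "Ls", "Rs", "C"], fig9_steps))  # (['P7'], ['P2', 'P4', 'P6', 'P8'])
```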

Fig. 11 shows a flowchart of a method 300 for encoding a multichannel signal having at least three channels. The method 300 comprises the following steps:

- Step 302: in a first iteration step, calculate inter-channel correlation values between each pair of the at least three channels, select the channel pair having the highest value or a value above a threshold, and process the selected pair using a multichannel processing operation to derive multichannel parameters MCH_PAR1 for that pair and first processed channels.
- Step 304: perform the calculating, selecting and processing in a second iteration step using at least one of the processed channels, to derive multichannel parameters MCH_PAR2 and second processed channels.
- Step 306: encode the channels resulting from the iterative processing performed by the iterative processor to obtain encoded channels.
- Step 308: generate an encoded multichannel signal containing the encoded channels and the first and second multichannel parameters MCH_PAR1 and MCH_PAR2.

Multichannel decoding is explained in the following.

Figure 10 shows a schematic block diagram of an apparatus (decoder) 200 for decoding an encoded multichannel signal 107 having encoded channels E1 to E3 and at least two multichannel parameters MCH_PAR1 and MCH_PAR2.

The apparatus 200 comprises a channel decoder 202 and a multichannel processor 204.

The channel decoder 202 is configured to decode the encoded channels E1 to E3 to obtain decoded channels D1 to D3.

For example, the channel decoder 202 may comprise at least three mono decoders (or mono boxes, or mono tools) 206_1 to 206_3, each of which may be configured to decode one of the at least three encoded channels E1 to E3 to obtain the respective decoded channel D1 to D3. The mono decoders 206_1 to 206_3 may be, for example, transform-based audio decoders.

The multichannel processor 204 is configured to perform multichannel processing using a second pair of decoded channels, identified by the multichannel parameters MCH_PAR2, and using the multichannel parameters MCH_PAR2, to obtain processed channels, and to perform further multichannel processing using a first channel pair, identified by the multichannel parameters MCH_PAR1, and using the multichannel parameters MCH_PAR1, wherein the first channel pair comprises at least one processed channel.

As shown by way of example in Figure 10, the multichannel parameters MCH_PAR2 may indicate (or signal) that the second pair of decoded channels consists of the first decoded channel D1 and the second decoded channel D2. The multichannel processor 204 therefore performs multichannel processing on this second pair (D1, D2), identified by MCH_PAR2, using the multichannel parameters MCH_PAR2, to obtain the processed channels P1* and P2*. The multichannel parameters MCH_PAR1 may indicate that the first pair consists of the first processed channel P1* and the third decoded channel D3. The multichannel processor 204 therefore performs further multichannel processing on this first pair (P1*, D3), identified by MCH_PAR1, using the multichannel parameters MCH_PAR1, to obtain the processed channels P3* and P4*.

Furthermore, the multichannel processor 204 may provide the third processed channel P3* as the first channel CH1, the fourth processed channel P4* as the third channel CH3, and the second processed channel P2* as the second channel CH2.

Assuming that the decoder 200 shown in Figure 10 receives the encoded multichannel signal 107 from the encoder 100 shown in Figure 7, the first decoded channel D1 of the decoder 200 may be equivalent to the third processed channel P3 of the encoder 100, the second decoded channel D2 may be equivalent to the fourth processed channel P4 of the encoder 100, and the third decoded channel D3 may be equivalent to the second processed channel P2 of the encoder 100. Furthermore, the first processed channel P1* of the decoder 200 may be equivalent to the first processed channel P1 of the encoder 100.

Furthermore, the encoded multichannel signal 107 may be a serial signal in which the multichannel parameters MCH_PAR2 are received at the decoder 200 before the multichannel parameters MCH_PAR1. In this case, the multichannel processor 204 may be configured to process the decoded channels in the order in which the decoder receives the multichannel parameters MCH_PAR1 and MCH_PAR2. In the example shown in Figure 10, the decoder receives MCH_PAR2 before MCH_PAR1 and therefore performs the multichannel processing on the second pair of decoded channels identified by MCH_PAR2 (consisting of the first decoded channel D1 and the second decoded channel D2) before performing the multichannel processing on the first pair identified by MCH_PAR1 (consisting of the first processed channel P1* and the third decoded channel D3).
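The order-dependent cascade of Figure 10 can be sketched in C as a list of pair operations applied in the order received. This is a hypothetical illustration: each operation only names the two channel slots it combines, and a plain inverse mid/side step stands in for the actual stereo box (which would use per-band rotation angles or prediction coefficients). With slots 0, 1, 2 initially holding D1, D2, D3, the first operation (signaled by MCH_PAR2) combines slots 0 and 1 into P1* and P2*, and the second (signaled by MCH_PAR1) combines P1* in slot 0 with D3 in slot 2 into P3* and P4*.

```c
#include <assert.h>

/* Hypothetical per-pair side information: the two channel slots a box
 * reads from and writes back to. */
typedef struct {
    int ch_a, ch_b;
} PairOp;

/* Stand-in stereo box: inverse mid/side, (m, s) -> (m + s, m - s), in
 * place. The outputs overwrite the input slots so that later boxes can
 * consume them. */
static void inverse_ms_inplace(float *a, float *b, int n)
{
    for (int k = 0; k < n; k++) {
        float m = a[k], s = b[k];
        a[k] = m + s;
        b[k] = m - s;
    }
}

/* Apply the pair operations in the order they were received. ch[] holds
 * the decoded channels on entry and the output channels on exit. */
static void apply_tree(float **ch, int n_lines, const PairOp *ops, int n_ops)
{
    for (int i = 0; i < n_ops; i++) {
        assert(ops[i].ch_a != ops[i].ch_b);
        inverse_ms_inplace(ch[ops[i].ch_a], ch[ops[i].ch_b], n_lines);
    }
}
```

Swapping the two operations would generally give a different result, which is why the processing order must follow the order of the received parameters.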

In Figure 10, the multichannel processor 204 performs two multichannel processing operations by way of example. For illustration, these operations are shown in Figure 10 as processing boxes 208 and 210, which may be implemented in hardware or in software. The processing boxes 208 and 210 may be, for example, stereo boxes of the kind discussed above with reference to the encoder 100, such as a generic decoder (or decoder-side stereo box), a prediction-based decoder (or decoder-side stereo box), or a KLT-based rotation decoder (or decoder-side stereo box).

For example, the encoder 100 may use a KLT-based rotation encoder (or encoder-side stereo box). In this case, the encoder 100 may derive the multichannel parameters MCH_PAR1 and MCH_PAR2 such that they comprise rotation angles. The rotation angles may be differentially encoded. Accordingly, the multichannel processor 204 of the decoder 200 may comprise a differential decoder for differentially decoding the rotation angles.
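Differential decoding of per-band rotation angles could look like the following C sketch. It assumes angles are transmitted as indices into the 64-entry sin/cos tables given later in this description, that each band's index is coded as the difference to the previous band (band 0 relative to a start value), and that indices wrap modulo 64. The wrap-around, the start value and the function name are all assumptions for illustration.

```c
#include <assert.h>

#define N_ANGLE_IDX 64 /* size of the sin/cos lookup tables */

/* Hypothetical differential decoder: delta[b] is the difference between
 * the angle index of band b and that of band b-1 (band 0 is coded
 * relative to start_idx). Indices wrap modulo N_ANGLE_IDX (assumption). */
static void decode_angles_diff(const int *delta, int n_bands, int start_idx,
                               int *angle_idx)
{
    assert(start_idx >= 0 && start_idx < N_ANGLE_IDX);
    int prev = start_idx;
    for (int b = 0; b < n_bands; b++) {
        int v = (prev + delta[b]) % N_ANGLE_IDX;
        if (v < 0)
            v += N_ANGLE_IDX; /* C's % can return negative values */
        angle_idx[b] = v;
        prev = v;
    }
}
```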

The apparatus 200 may further comprise an input interface 212 configured to receive and process the encoded multichannel signal 107, to provide the encoded channels E1 to E3 to the channel decoder 202 and the multichannel parameters MCH_PAR1 and MCH_PAR2 to the multichannel processor 204.

As mentioned before, a keep indicator (or keep-tree flag) may be used to signal that no new tree is transmitted and that the last stereo tree should be used instead. This avoids repeatedly transmitting the same stereo tree configuration when the inter-channel correlation properties remain stationary for a longer time.

Accordingly, when the encoded multichannel signal 107 comprises the multichannel parameters MCH_PAR1 and MCH_PAR2 for a first frame and a keep indicator for a second frame following the first frame, the multichannel processor 204 may be configured to perform, in the second frame, the multichannel processing or the further multichannel processing on the same second channel pair or the same first channel pair as used in the first frame.
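The keep-tree behavior can be sketched in C as decoder state that is only overwritten when a frame carries a new tree. This minimal sketch covers only the identity of the channel pairs; the struct layout, MAX_PAIRS and the function name are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAIRS 8 /* hypothetical upper bound on pairs per frame */

/* Hypothetical decoder-side state for the channel-pair tree. */
typedef struct {
    int n_pairs;
    int pair_a[MAX_PAIRS];
    int pair_b[MAX_PAIRS];
} PairTree;

/* If keep_tree is set, the frame carries no new tree and the tree of the
 * previous frame stays in effect; otherwise the newly parsed tree replaces
 * it. Returns the tree to use for the current frame. */
static const PairTree *update_tree(PairTree *state, int keep_tree,
                                   const PairTree *parsed)
{
    if (!keep_tree) {
        assert(parsed != NULL && parsed->n_pairs <= MAX_PAIRS);
        memcpy(state, parsed, sizeof *state);
    }
    return state;
}
```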

The multichannel processing and the further multichannel processing may comprise stereo processing using stereo parameters, wherein, for individual scale factor bands or groups of scale factor bands of the decoded channels D1 to D3, first stereo parameters are included in the multichannel parameters MCH_PAR1 and second stereo parameters are included in the multichannel parameters MCH_PAR2. The first and second stereo parameters may be of the same type, such as rotation angles or prediction coefficients, but they may also be of different types; for example, the first stereo parameters may be rotation angles and the second stereo parameters prediction coefficients, or vice versa.

Furthermore, the multichannel parameters MCH_PAR1 and MCH_PAR2 may comprise a multichannel processing mask indicating which scale factor bands are multichannel processed and which are not. The multichannel processor 204 may then be configured to skip the multichannel processing in those scale factor bands that the mask marks as not multichannel processed.
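A per-band mask can gate the processing as in the C sketch below. It assumes the band boundaries are given as an offset array with one more entry than there are bands, and uses an inverse mid/side step as a stand-in for the actual stereo operation; both are assumptions for illustration.

```c
#include <assert.h>

/* Process only the scale factor bands whose mask entry is set; bands with
 * mask[s] == 0 pass through unchanged. sfb_offset has n_sfb + 1 entries
 * giving the first spectral line of each band (hypothetical layout). An
 * inverse mid/side step stands in for the actual stereo operation. */
static void process_masked(float *a, float *b, const int *sfb_offset,
                           const unsigned char *mask, int n_sfb)
{
    for (int s = 0; s < n_sfb; s++) {
        if (!mask[s])
            continue;
        assert(sfb_offset[s] <= sfb_offset[s + 1]);
        for (int k = sfb_offset[s]; k < sfb_offset[s + 1]; k++) {
            float m = a[k], d = b[k];
            a[k] = m + d;
            b[k] = m - d;
        }
    }
}
```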

The multichannel parameters MCH_PAR1 and MCH_PAR2 may each comprise a channel pair identification (or index), and the multichannel processor 204 may be configured to decode the channel pair identifications (or indices) using a predefined decoding rule or a decoding rule indicated in the encoded multichannel signal.

For example, as described above with reference to the encoder 100, channel pairs may be signaled efficiently by using a unique index for each pair, depending on the total number of channels.

Furthermore, the decoding rule may be a Huffman decoding rule, in which case the multichannel processor 204 may be configured to perform Huffman decoding of the channel pair identifications.

The encoded multichannel signal 107 may further comprise a multichannel processing permission indicator that indicates only the subgroup of decoded channels for which multichannel processing is permitted, and at least one decoded channel for which it is not permitted. The multichannel processor 204 may then be configured not to perform any multichannel processing on the at least one decoded channel for which, as indicated by the permission indicator, multichannel processing is not permitted.

For example, when the multichannel signal is a 5.1 signal, the multichannel processing permission indicator may indicate that multichannel processing is permitted only for the five channels right R, left L, right surround Rs, left surround Ls and center C, and not for the LFE channel.

For the decoding process (decoding of the channel pair indices), the following C code can be used. For all channel pairs, the number of channels with active KLT processing (nChannels) and the number of channel pairs in the current frame (numPairs) are required.
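As an illustration only, and not the code referred to above, the following C sketch shows one way a unique pair index can be mapped to and from two channel numbers. It assumes that the pairs (c0, c1) with c0 < c1 among the nChannels channels with active processing are numbered in the order (0,1), (0,2), (1,2), (0,3), ..., giving nChannels * (nChannels - 1) / 2 indices. The enumeration order and function names are assumptions; the actual bitstream may order pairs differently or Huffman-code the index.

```c
#include <assert.h>

/* Hypothetical enumeration: pairs (c0, c1), c0 < c1, numbered in the order
 * (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ... */
static int encode_pair_index(int c0, int c1)
{
    assert(0 <= c0 && c0 < c1);
    return c1 * (c1 - 1) / 2 + c0;
}

/* Inverse mapping by walking the same enumeration. Returns 0 on success,
 * -1 if pair_idx is out of range for nChannels. */
static int decode_pair_index(int pair_idx, int nChannels, int *c0, int *c1)
{
    int counter = 0;
    for (int hi = 1; hi < nChannels; hi++) {
        for (int lo = 0; lo < hi; lo++) {
            if (counter == pair_idx) {
                *c0 = lo;
                *c1 = hi;
                return 0;
            }
            counter++;
        }
    }
    return -1;
}
```

For a 5.1 signal with the LFE excluded (nChannels = 5), this enumeration yields 10 pair indices; a decoder would call decode_pair_index once for each of the numPairs pairs of the frame.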

To decode the prediction coefficients for non-bandwise angles, the following C code can be used.

To decode the prediction coefficients for non-bandwise KLT angles, the following C code can be used.

To avoid floating-point differences of trigonometric functions across platforms, the following lookup tables shall be used to convert angle indices directly to sin/cos values:

const float tabIndexToSinAlpha[64] = {
    -1.000000f, -0.998795f, -0.995185f, -0.989177f, -0.980785f, -0.970031f, -0.956940f, -0.941544f,
    -0.923880f, -0.903989f, -0.881921f, -0.857729f, -0.831470f, -0.803208f, -0.773010f, -0.740951f,
    -0.707107f, -0.671559f, -0.634393f, -0.595699f, -0.555570f, -0.514103f, -0.471397f, -0.427555f,
    -0.382683f, -0.336890f, -0.290285f, -0.242980f, -0.195090f, -0.146730f, -0.098017f, -0.049068f,
     0.000000f,  0.049068f,  0.098017f,  0.146730f,  0.195090f,  0.242980f,  0.290285f,  0.336890f,
     0.382683f,  0.427555f,  0.471397f,  0.514103f,  0.555570f,  0.595699f,  0.634393f,  0.671559f,
     0.707107f,  0.740951f,  0.773010f,  0.803208f,  0.831470f,  0.857729f,  0.881921f,  0.903989f,
     0.923880f,  0.941544f,  0.956940f,  0.970031f,  0.980785f,  0.989177f,  0.995185f,  0.998795f
};

const float tabIndexToCosAlpha[64] = {
    0.000000f, 0.049068f, 0.098017f, 0.146730f, 0.195090f, 0.242980f, 0.290285f, 0.336890f,
    0.382683f, 0.427555f, 0.471397f, 0.514103f, 0.555570f, 0.595699f, 0.634393f, 0.671559f,
    0.707107f, 0.740951f, 0.773010f, 0.803208f, 0.831470f, 0.857729f, 0.881921f, 0.903989f,
    0.923880f, 0.941544f, 0.956940f, 0.970031f, 0.980785f, 0.989177f, 0.995185f, 0.998795f,
    1.000000f, 0.998795f, 0.995185f, 0.989177f, 0.980785f, 0.970031f, 0.956940f, 0.941544f,
    0.923880f, 0.903989f, 0.881921f, 0.857729f, 0.831470f, 0.803208f, 0.773010f, 0.740951f,
    0.707107f, 0.671559f, 0.634393f, 0.595699f, 0.555570f, 0.514103f, 0.471397f, 0.427555f,
    0.382683f, 0.336890f, 0.290285f, 0.242980f, 0.195090f, 0.146730f, 0.098017f, 0.049068f
};

For decoding the multichannel coding with the KLT rotation approach, the following C code can be used.

For bandwise processing, the following C code can be used.

For the application of the KLT rotation, the following C code can be used.
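As an illustration only, and not the code referred to above, a bandwise inverse KLT rotation could look like the C sketch below. For each band, the caller is assumed to look up s = tabIndexToSinAlpha[idx] and c = tabIndexToCosAlpha[idx] for that band's angle index. The sign convention (encoder: dmx = c*L + s*R, res = -s*L + c*R, so the decoder computes L = c*dmx - s*res, R = s*dmx + c*res) and the band-offset layout are assumptions.

```c
#include <assert.h>

/* Inverse KLT rotation applied band by band. sin_b[b] and cos_b[b] are the
 * table values for band b's angle index; band_offset has n_bands + 1
 * entries. On entry the buffers hold (dmx, res), on exit (left, right).
 * The rotation sign convention is an assumption. */
static void inverse_klt_bands(float *dmx_l, float *res_r,
                              const float *sin_b, const float *cos_b,
                              const int *band_offset, int n_bands)
{
    for (int b = 0; b < n_bands; b++) {
        const float s = sin_b[b], c = cos_b[b];
        assert(band_offset[b] <= band_offset[b + 1]);
        for (int k = band_offset[b]; k < band_offset[b + 1]; k++) {
            const float d = dmx_l[k], r = res_r[k];
            dmx_l[k] = c * d - s * r; /* left  */
            res_r[k] = s * d + c * r; /* right */
        }
    }
}
```

For example, with an angle whose sine and cosine are 0.382683 and 0.923880 (both values appear in the tables above), a signal that the encoder rotated from (L, R) = (1, 0) into (0.923880, -0.382683) is rotated back to approximately (1, 0).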

Figure 12 shows a flowchart of a method 400 for decoding an encoded multichannel signal having encoded channels and at least two multichannel parameters MCH_PAR1 and MCH_PAR2. The method 400 comprises: step 402, decoding the encoded channels to obtain decoded channels; and step 404, performing multichannel processing using a second pair of decoded channels identified by the multichannel parameters MCH_PAR2 and using MCH_PAR2, to obtain processed channels, and performing further multichannel processing using a first channel pair identified by the multichannel parameters MCH_PAR1 and using MCH_PAR1, wherein the first channel pair comprises at least one processed channel.

In the following, stereo filling in multichannel coding according to embodiments is explained:

As already outlined, an undesired effect of spectral quantization is that it can produce spectral holes. For example, as a result of quantization, all spectral values in a particular frequency band may be set to zero on the encoder side. This can happen, for instance, when the exact values of these spectral lines were relatively small before quantization. On the decoder side, such all-zero bands can then result in undesired spectral holes.

The multichannel coding tool (MCT) in MPEG-H allows adapting to varying inter-channel dependencies but, because it uses single channel elements in typical operating configurations, it does not allow stereo filling.

As can be seen from Figure 14, the multichannel coding tool combines three or more channels that are encoded in a hierarchical fashion. However, the way in which the MCT combines the different channels during encoding varies from frame to frame, depending on the current signal properties of the channels.

For example, in scenario (a) of Figure 14, to generate a first encoded audio signal frame, the MCT may combine the first channel CH1 and the second channel CH2 to obtain a first combined channel (processed channel) P1 and a second combined channel P2. The MCT may then combine the first combined channel P1 and the third channel CH3 to obtain a third combined channel P3 and a fourth combined channel P4. Finally, the MCT may encode the second combined channel P2, the third combined channel P3 and the fourth combined channel P4 to generate the first frame.

Then, for example, in scenario (b) of Figure 14, to generate a second encoded audio signal frame that follows the first one in time, the MCT may combine the first channel CH1' and the third channel CH3' to obtain a first combined channel P1' and a second combined channel P2'. The MCT may then combine the first combined channel P1' and the second channel CH2' to obtain a third combined channel P3' and a fourth combined channel P4'. Finally, the MCT may encode the second combined channel P2', the third combined channel P3' and the fourth combined channel P4' to generate the second frame.

As can be seen from Figure 14, the second, third and fourth combined channels of the first frame in scenario (a) are generated in a significantly different way from the second, third and fourth combined channels of the second frame in scenario (b), because different channel combinations are used to generate the combined channels P2, P3 and P4 and P2', P3' and P4', respectively.

In particular, embodiments of the present invention are based on the following findings:

As can be seen in Figures 7 and 14, the combined channels P3, P4 and P2 (or P2', P3' and P4' in scenario (b) of Figure 14) are fed into the channel encoder 104. Among other things, the channel encoder 104 may perform quantization, so that spectral values of the channels P2, P3 and P4 may be set to zero as a result of quantization. Spectrally neighboring spectral samples may be encoded as spectral bands, each of which may comprise a plurality of spectral samples.

The number of spectral samples per band may differ between bands. For example, bands in a lower frequency range may contain fewer spectral samples (e.g. 4) than bands in a higher frequency range (which may contain, e.g., 16). The bands used may, for example, be defined by the Bark-scale critical bands.

A particularly undesirable situation arises when all spectral samples of a band have been set to zero by quantization. For this situation, the present invention proposes stereo filling. Furthermore, the present invention is based on the finding that generating (pseudo-)random noise alone is not sufficient.

As an alternative or in addition to adding (pseudo-)random noise, according to embodiments of the invention, if, for example in scenario (b) of Figure 14, all spectral values of a band of channel P4' have been set to zero, a combined channel generated in the same or a similar way as channel P3' would be a very suitable basis for generating noise to fill the band that has been quantized to zero.

However, according to embodiments of the invention, it is preferable not to use the spectral values of the P3' combined channel of the current frame (current point in time) as the basis for filling a band of the P4' combined channel that contains only zero spectral values. Both P3' and P4' are generated from the same two channels, P1' and CH2', so using the current P3' would merely result in panning.

For example, if P3' is the mid channel of P1' and CH2' (e.g. P3' = 0.5 * (P1' + CH2')) and P4' is the corresponding side channel (e.g. P4' = 0.5 * (P1' - CH2')), then introducing, for example, attenuated spectral values of P3' into a band of P4' would merely result in panning.
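The panning argument can be checked with a small C example. It fills an all-zero side band with g times the mid channel of the same frame (the approach the text argues against) and then inverts the mid/side combination of the example above (first = mid + side, second = mid - side). Both reconstructed channels come out as scaled copies of the mid channel, (1 + g) * mid and (1 - g) * mid, so the fill only shifts level between the two channels and adds no new signal content. The function names and the gain value are hypothetical.

```c
#include <assert.h>

/* Inverse of mid = 0.5*(x + y), side = 0.5*(x - y):
 * x = mid + side, y = mid - side. */
static void inverse_ms(const float *mid, const float *side, float *x,
                       float *y, int n)
{
    for (int k = 0; k < n; k++) {
        x[k] = mid[k] + side[k];
        y[k] = mid[k] - side[k];
    }
}

/* Fill an all-zero side band with g * mid of the same frame. */
static void fill_side_from_current_mid(const float *mid, float *side,
                                       float g, int n)
{
    for (int k = 0; k < n; k++)
        side[k] = g * mid[k];
}
```

With g = 0.25, every line of the first output is 1.25 times the mid line and every line of the second is 0.75 times it, which is a pure pan of the mid signal.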

Instead, it is preferable to use channels of a previous point in time to generate the spectral values for filling the spectral holes in the current P4' combined channel. According to the findings of the invention, a combination of channels of the previous frame that corresponds to the P3' combined channel of the current frame would be an ideal basis for generating the spectral samples that fill the spectral holes of P4'.

However, the combined channel P3 generated for the previous frame in scenario (a) of Figure 14 does not correspond to the combined channel P3' of the current frame, because it was generated in a different way.

According to the findings of embodiments of the invention, an approximation of the P3' combined channel should therefore be generated on the decoder side from the reconstructed channels of the previous frame.

Scenario (a) of Figure 14 illustrates the encoder side, where the channels CH1, CH2 and CH3 of the previous frame are encoded by generating E1, E2 and E3. The decoder receives E1, E2 and E3 and reconstructs the encoded channels CH1, CH2 and CH3. Some coding loss may have occurred, but the generated channels CH1*, CH2* and CH3* approximating CH1, CH2 and CH3 will be very similar to the original channels, so that CH1* ≈ CH1, CH2* ≈ CH2 and CH3* ≈ CH3. According to embodiments, the decoder keeps the channels CH1*, CH2* and CH3* generated for the previous frame in a buffer in order to use them for noise filling in the current frame.

Figure 1a, which shows an apparatus 201 for decoding according to an embodiment, is now described in more detail:

The apparatus 201 of Figure 1a is adapted to decode a previous encoded multichannel signal of a previous frame to obtain three or more previous audio output channels, and is configured to decode a current encoded multichannel signal 107 of a current frame to obtain three or more current audio output channels.

The apparatus comprises an interface 212, a channel decoder 202, a multichannel processor 204 for generating the three or more current audio output channels CH1, CH2, CH3, and a noise filling module 220.

The interface 212 is adapted to receive the current encoded multichannel signal 107 and to receive side information comprising first multichannel parameters MCH_PAR2.

The channel decoder 202 is adapted to decode the current encoded multichannel signal of the current frame to obtain a set of three or more decoded channels D1, D2, D3 of the current frame.

The multichannel processor 204 is adapted to select a first selected pair of two decoded channels D1, D2 from the set of three or more decoded channels D1, D2, D3 depending on the first multichannel parameters MCH_PAR2.

As an example, this is illustrated in Figure 1a by the two channels D1, D2 that are fed into the (optional) processing box 208.

Furthermore, the multichannel processor 204 is adapted to generate a first group of two or more processed channels P1*, P2* based on the first selected pair of two decoded channels D1, D2, to obtain an updated set of three or more decoded channels D3, P1*, P2*.

In this example, where the two channels D1 and D2 are fed into the (optional) box 208, two processed channels P1* and P2* are generated from the two selected channels D1 and D2. The updated set of three or more decoded channels then comprises the remaining, unmodified channel D3 together with P1* and P2*, which have been generated from D1 and D2.

Before the multichannel processor 204 generates the first group of two or more processed channels P1*, P2* based on the first selected pair of two decoded channels D1, D2, the noise filling module 220 is adapted to identify, for at least one of the two channels of the first selected pair D1, D2, one or more frequency bands within which all spectral lines are quantized to zero, to generate a mixing channel using two or more, but not all, of the three or more previous audio output channels, and to fill the spectral lines of the one or more frequency bands in which all spectral lines are quantized to zero with noise generated using spectral lines of the mixing channel. The noise filling module 220 is adapted to select, depending on the side information, which two or more of the three or more previous audio output channels are used for generating the mixing channel.
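The decoder-side steps just described (detect all-zero bands, build a mixing channel from previous-frame output channels, fill the empty bands from it) can be sketched in C as follows. This is a simplified sketch under several assumptions: the previous-frame output channels are kept as spectral coefficients on the same band grid, exactly two of them are selected by the side information, they are mixed by a plain average, and each filled band is scaled by a per-band gain. The mixing rule, the gain and all names are hypothetical, not the method specified here.

```c
#include <assert.h>

/* Returns 1 if every spectral line in [start, end) is zero. */
static int band_is_empty(const float *x, int start, int end)
{
    for (int k = start; k < end; k++)
        if (x[k] != 0.0f)
            return 0;
    return 1;
}

/* Stereo filling for one decoded channel x of the current frame.
 * prev_a and prev_b are the two previous-frame output channels selected by
 * the side information; the mixing channel is 0.5 * (prev_a + prev_b), and
 * each empty band is filled with the mixing lines scaled by gain[b].
 * band_offset has n_bands + 1 entries. Returns the number of bands filled. */
static int stereo_fill(float *x, const float *prev_a, const float *prev_b,
                       const int *band_offset, const float *gain, int n_bands)
{
    int filled = 0;
    for (int b = 0; b < n_bands; b++) {
        int lo = band_offset[b], hi = band_offset[b + 1];
        assert(lo <= hi);
        if (!band_is_empty(x, lo, hi))
            continue;
        for (int k = lo; k < hi; k++)
            x[k] = gain[b] * 0.5f * (prev_a[k] + prev_b[k]);
        filled++;
    }
    return filled;
}
```

Bands that still contain at least one nonzero quantized line are left untouched, matching the rule that only bands with all spectral lines quantized to zero are filled.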

Thus, the noise filling module 220 analyzes whether there are frequency bands whose spectral values are all zero and fills any empty bands it finds with generated noise. A band may have, for example, 4, 8 or 16 spectral lines; when all spectral lines of a band have been quantized to zero, the noise filling module 220 fills it with generated noise.

A particular concept, employed by the noise filling module 220 in embodiments, that specifies how the noise is generated and filled in is referred to as stereo filling.

In the embodiment of Figure 1a, the noise filling module 220 interacts with the multichannel processor 204. For example, in an embodiment, when the multichannel processor wants to process two channels, e.g. by a processing box, it feeds these channels to the noise filling module 220, which checks whether any bands have been quantized to zero and, if so, fills them.

In another embodiment, shown in Figure 1b, the noise filling module 220 interacts with the channel decoder 202. For example, already when the channel decoder decodes the encoded multichannel signal to obtain the three or more decoded channels D1, D2 and D3, the noise filling module may check whether bands have been quantized to zero and, if so, fill them. In this embodiment, the multichannel processor 204 can rely on all spectral holes having already been closed by noise filling.

In a further embodiment (not shown), the noise filling module 220 may interact with both the channel decoder and the multichannel processor. For example, the noise filling module 220 may check whether bands have been quantized to zero right after the channel decoder 202 has generated the decoded channels D1, D2 and D3, but generate the noise and fill the respective bands only when the multichannel processor 204 actually processes these channels.

For example, random noise, which is computationally cheap, may be inserted into every band that has been quantized to zero, whereas the noise generated from the previous audio output channels may be filled in by the noise filling module only when the multichannel processor 204 actually processes the channels. In this embodiment, however, the presence of spectral holes should be detected before the random noise is inserted and this information should be kept in memory, because once random noise has been inserted, the respective bands no longer consist of zero spectral values.
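The need to record the holes before inserting random noise can be illustrated with a C sketch. It marks all-zero bands first, then inserts cheap pseudo-random noise only into the marked bands; afterwards, re-checking the spectrum no longer finds the holes, which is why the flags must be kept. The linear congruential generator, the fixed noise amplitude and the function names are hypothetical choices for illustration.

```c
#include <assert.h>

/* Record which bands are spectral holes (all lines zero) BEFORE any random
 * noise is inserted. band_offset has n_bands + 1 entries. */
static void mark_holes(const float *x, const int *band_offset, int n_bands,
                       unsigned char *hole)
{
    for (int b = 0; b < n_bands; b++) {
        assert(band_offset[b] <= band_offset[b + 1]);
        hole[b] = 1;
        for (int k = band_offset[b]; k < band_offset[b + 1]; k++) {
            if (x[k] != 0.0f) {
                hole[b] = 0;
                break;
            }
        }
    }
}

/* Cheap pseudo-random noise (hypothetical LCG, random sign with fixed
 * amplitude level), inserted only into bands flagged as holes. */
static void insert_random_noise(float *x, const int *band_offset, int n_bands,
                                const unsigned char *hole, float level,
                                unsigned int *seed)
{
    for (int b = 0; b < n_bands; b++) {
        if (!hole[b])
            continue;
        for (int k = band_offset[b]; k < band_offset[b + 1]; k++) {
            *seed = *seed * 1664525u + 1013904223u; /* LCG step */
            x[k] = (*seed & 0x80000000u) ? level : -level;
        }
    }
}
```

The stored hole flags can later tell the stereo filling stage which bands to overwrite with noise derived from the previous audio output channels.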

在实施例中，除了基于先前音频输出信号生成的噪声之外，将随机噪声插入已被量化为零的频带中。In one embodiment, random noise is inserted into the frequency band that has been quantized to zero, in addition to the noise generated based on the previous audio output signal.
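The detect-then-remember order described above can be sketched in Python. This is a minimal illustration under assumptions that the text does not state: spectra are plain lists of floats, band boundaries are given as a hypothetical `band_offsets` list, the cheap random noise is uniform with an assumed level of 0.01, and the function names are hypothetical.

```python
import random


def find_zero_bands(spectrum, band_offsets):
    """Indices of bands whose spectral lines are all exactly zero.

    band_offsets holds band boundaries, e.g. [0, 4, 8] defines bands [0, 4) and [4, 8).
    """
    return [b for b in range(len(band_offsets) - 1)
            if all(x == 0.0 for x in spectrum[band_offsets[b]:band_offsets[b + 1]])]


def fill_bands(spectrum, band_offsets, bands, values_for_band):
    """Copy of spectrum in which each listed band is overwritten by values_for_band(lo, hi)."""
    filled = list(spectrum)
    for b in bands:
        lo, hi = band_offsets[b], band_offsets[b + 1]
        filled[lo:hi] = values_for_band(lo, hi)
    return filled


rng = random.Random(0)
offsets = [0, 4, 8, 12]
decoded = [0.5, -0.2, 0.1, 0.3, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, -0.1, 0.2]

# Stage 1, right after channel decoding: remember the holes *before* filling them.
holes = find_zero_bands(decoded, offsets)  # band 1 only; band 2 has one zero line but is not a hole
noisy = fill_bands(decoded, offsets, holes,
                   lambda lo, hi: [rng.uniform(-0.01, 0.01) for _ in range(hi - lo)])
# find_zero_bands(noisy, offsets) would now return [], so the stored list is needed.

# Stage 2, when the multichannel processor processes this channel: replace the
# remembered holes by noise from a mixing channel of previous outputs (toy values).
mixing = [0.0] * 4 + [0.02, -0.01, 0.03, 0.01] + [0.0] * 4
final = fill_bands(noisy, offsets, holes, lambda lo, hi: mixing[lo:hi])
```

Keeping the hole list separate from the spectrum is the point of the sketch: after stage 1 the spectrum alone no longer reveals which bands were quantized to zero.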

In some embodiments, the interface 212 may, for example, be adapted to receive the current encoded multichannel signal 107 and to receive side information comprising first multichannel parameters MCH_PAR2 and second multichannel parameters MCH_PAR1.

The multichannel processor 204 may, for example, be adapted to select, depending on the second multichannel parameters MCH_PAR1, a second selected pair of two decoded channels P1*, D3 from the updated set of three or more decoded channels D3, P1*, P2*, wherein at least one channel P1* of the second selected pair (P1*, D3) is one of the channels of the first group of two or more processed channels P1*, P2*.

The multichannel processor 204 may, for example, be adapted to generate a second group of two or more processed channels P3*, P4* based on the second selected pair of two decoded channels P1*, D3, so as to further update the updated set of three or more decoded channels.

An example of this embodiment can be seen in Figures 1a and 1b, where the (optional) processing box 210 receives channel D3 and the processed channel P1* and processes them to obtain the processed channels P3* and P4*, so that the further updated set of three decoded channels comprises P2*, which is not modified by processing box 210, and the generated channels P3* and P4*.

Processing boxes 208 and 210 are marked as optional in Figures 1a and 1b. This indicates that, although processing boxes 208 and 210 may be used to implement the multichannel processor 204, there are various other ways of implementing it. For example, instead of a separate processing box 208, 210 for each processing of two (or more) channels, the same processing box may be reused, or the multichannel processor 204 may process the two channels without any processing boxes 208, 210 as sub-units at all.

According to a further embodiment, the multichannel processor 204 may, for example, be adapted to generate the first group of two or more processed channels P1*, P2* by generating a first group of exactly two processed channels P1*, P2* based on the first selected pair of two decoded channels D1, D2. The multichannel processor 204 may, for example, be adapted to replace the first selected pair D1, D2 in the set of three or more decoded channels D1, D2, D3 by the first group of exactly two processed channels P1*, P2* to obtain the updated set of three or more decoded channels D3, P1*, P2*. The multichannel processor 204 may, for example, be adapted to generate the second group of two or more processed channels P3*, P4* by generating a second group of exactly two processed channels P3*, P4* based on the second selected pair of two decoded channels P1*, D3. Furthermore, the multichannel processor 204 may, for example, be adapted to replace the second selected pair P1*, D3 in the updated set of three or more decoded channels D3, P1*, P2* by the second group of exactly two processed channels P3*, P4* to further update the updated set of three or more decoded channels.

In this embodiment, exactly two processed channels are generated from the two selected channels (e.g., the two input channels of processing box 208 or 210), and these exactly two processed channels replace the selected channels in the set of three or more decoded channels. For example, processing box 208 of the multichannel processor 204 replaces the selected channels D1 and D2 by P1* and P2*.
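The replacement rule of this embodiment (two selected channels in, exactly two processed channels out, replacing the inputs in the set) can be sketched as follows. The orthonormal inverse mid/side operation merely stands in for whatever pairwise processing the processing boxes 208, 210 perform; it, the toy spectra and the function names are assumptions for illustration.

```python
import math


def inverse_mid_side(mid, side):
    """Assumed pairwise processing: orthonormal inverse mid/side."""
    s = 1.0 / math.sqrt(2.0)
    first = [s * (m + x) for m, x in zip(mid, side)]
    second = [s * (m - x) for m, x in zip(mid, side)]
    return first, second


def apply_pair_step(channels, pair, new_names, process):
    """Replace the two selected channels by exactly two processed channels.

    channels maps a channel label to its spectrum; pair names the two selected
    inputs; new_names labels the two outputs that replace them.
    """
    a, b = pair
    out_a, out_b = process(channels[a], channels[b])
    updated = {k: v for k, v in channels.items() if k not in pair}
    updated[new_names[0]] = out_a
    updated[new_names[1]] = out_b
    return updated


decoded = {"D1": [1.0, 0.0], "D2": [0.0, 1.0], "D3": [0.5, 0.5]}
# First step: D1, D2 -> P1*, P2*; the updated set is {D3, P1*, P2*}.
step1 = apply_pair_step(decoded, ("D1", "D2"), ("P1*", "P2*"), inverse_mid_side)
# Second step: P1*, D3 -> P3*, P4*; P2* passes through unmodified.
step2 = apply_pair_step(step1, ("P1*", "D3"), ("P3*", "P4*"), inverse_mid_side)
```

After both steps the set again holds three channels, P2*, P3* and P4*, matching the further updated set described for Figures 1a and 1b.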

In other embodiments, however, an upmix may take place in the apparatus 201 for decoding, so that more than two processed channels are generated from the two selected channels, or not all of the selected channels are removed from the updated set of decoded channels.

A further question is how the mixing channel is generated from which the noise filling module 220 derives its noise.

According to some embodiments, the noise filling module 220 may, for example, be adapted to generate the mixing channel using exactly two of the three or more previous audio output channels as the two or more of the three or more previous audio output channels, wherein the noise filling module 220 may, for example, be adapted to select the exactly two previous audio output channels from the three or more previous audio output channels depending on the side information.

Using only two of the three or more previous output channels helps to reduce the computational complexity of calculating the mixing channel.

In other embodiments, however, more than two of the previous audio output channels are used to generate the mixing channel, but the number of previous audio output channels considered is still smaller than the total number of the three or more previous audio output channels.

In embodiments where only two of the previous output channels are considered, the mixing channel may, for example, be calculated as follows.

In an embodiment, the noise filling module 220 is adapted to generate the mixing channel using exactly two previous audio output channels based on the formula

$$D_{ch} = (\hat{O}_1 + \hat{O}_2) \cdot d$$

or based on the formula

$$D_{ch} = (\hat{O}_1 - \hat{O}_2) \cdot d$$

where $D_{ch}$ is the mixing channel, $\hat{O}_1$ is the first of the exactly two previous audio output channels, $\hat{O}_2$ is the second of the exactly two previous audio output channels, which differs from the first, and $d$ is a real, positive scalar.

In typical situations, the mid channel $D_{ch} = (\hat{O}_1 + \hat{O}_2) \cdot d$ may be a suitable mixing channel. This approach calculates the mixing channel as the mid channel of the two previous audio output channels under consideration.

In some scenarios, however, applying $D_{ch} = (\hat{O}_1 + \hat{O}_2) \cdot d$ may yield a mixing channel close to zero, for example when $\hat{O}_1 \approx -\hat{O}_2$. It may then be preferable to use $D_{ch} = (\hat{O}_1 - \hat{O}_2) \cdot d$ as the mixing signal, i.e., the side channel is used (for out-of-phase input signals).
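The two mixing-channel formulas translate directly into Python. The text does not specify how to decide when the mid channel is "close to zero"; this sketch assumes the simple rule of taking whichever candidate carries more energy, and it assumes d = 0.5 (any positive real d fits the formulas). The function names are hypothetical.

```python
def mid_channel(o1, o2, d=0.5):
    """D_ch = (O1 + O2) * d"""
    return [(a + b) * d for a, b in zip(o1, o2)]


def side_channel(o1, o2, d=0.5):
    """D_ch = (O1 - O2) * d"""
    return [(a - b) * d for a, b in zip(o1, o2)]


def energy(x):
    return sum(v * v for v in x)


def mixing_channel(o1, o2, d=0.5):
    """Use the mid channel unless it carries less energy than the side channel
    (assumed criterion for 'close to zero', e.g. out-of-phase inputs)."""
    mid, side = mid_channel(o1, o2, d), side_channel(o1, o2, d)
    return mid if energy(mid) >= energy(side) else side


print(mixing_channel([1.0, 2.0], [0.5, 1.0]))    # in phase, mid channel: [0.75, 1.5]
print(mixing_channel([1.0, 2.0], [-1.0, -2.0]))  # out of phase, side channel: [1.0, 2.0]
```

For the out-of-phase pair the mid channel is exactly zero, which is the situation the text describes; the side channel then still provides usable spectral shape for noise filling.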

According to an alternative, the noise filling module 220 is adapted to generate the mixing channel using exactly two previous audio output channels based on the formula

$$D_{ch} = \cos\alpha \cdot \hat{O}_1 + \sin\alpha \cdot \hat{O}_2$$

or based on the formula

$$D_{ch} = -\sin\alpha \cdot \hat{O}_1 + \cos\alpha \cdot \hat{O}_2$$

where $D_{ch}$ is the mixing channel, $\hat{O}_1$ is the first of the exactly two previous audio output channels, $\hat{O}_2$ is the second of the exactly two previous audio output channels, which differs from the first, and $\alpha$ is a rotation angle.

This approach calculates the mixing channel by rotating the two previous audio output channels under consideration.

The rotation angle α may, for example, lie in the range -90° < α < 90°.

In an embodiment, the rotation angle may, for example, lie in the range 30° < α < 60°.

Moreover, in typical situations, the channel $D_{ch} = \cos\alpha \cdot \hat{O}_1 + \sin\alpha \cdot \hat{O}_2$ may be a suitable mixing channel. This approach calculates the mixing channel as a (rotation-weighted) mid channel of the two previous audio output channels under consideration.

In some scenarios, however, applying $D_{ch} = \cos\alpha \cdot \hat{O}_1 + \sin\alpha \cdot \hat{O}_2$ may yield a mixing channel close to zero, for example when $\cos\alpha \cdot \hat{O}_1 \approx -\sin\alpha \cdot \hat{O}_2$. It may then be preferable to use $D_{ch} = -\sin\alpha \cdot \hat{O}_1 + \cos\alpha \cdot \hat{O}_2$ as the mixing signal.

According to a particular embodiment, the side information may, for example, be current side information assigned to a current frame, wherein the interface 212 may, for example, be adapted to receive previous side information assigned to a previous frame, the previous side information comprising a previous angle, and to receive the current side information comprising a current angle. The noise filling module 220 may, for example, be adapted to use the current angle of the current side information as the rotation angle α, and not to use the previous angle of the previous side information as the rotation angle α.

Thus, in this embodiment, the current angle transmitted in the side information is used as the rotation angle instead of the previously received angle, even though the mixing channel is calculated from the previous audio output channels, which were generated based on the previous frame.
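The rotation-based alternative can be sketched the same way. In line with the embodiment just described, the angle would be taken from the current frame's side information while the two input channels are the previous frame's output channels; here both are simply passed in as arguments. The energy-based choice between the two rotated channels is an assumption, and the names are hypothetical. With α = 45° the first formula equals the mid channel with d = 1/√2.

```python
import math


def rotated_channels(o1, o2, alpha_deg):
    """Return (cos a*O1 + sin a*O2, -sin a*O1 + cos a*O2) for angle a in degrees."""
    a = math.radians(alpha_deg)
    c, s = math.cos(a), math.sin(a)
    first = [c * x + s * y for x, y in zip(o1, o2)]
    second = [-s * x + c * y for x, y in zip(o1, o2)]
    return first, second


def rotation_mixing_channel(prev_o1, prev_o2, current_alpha_deg):
    """Mixing channel from the previous frame's outputs, rotated by the angle
    taken from the current frame's side information."""
    first, second = rotated_channels(prev_o1, prev_o2, current_alpha_deg)
    e_first = sum(v * v for v in first)
    e_second = sum(v * v for v in second)
    # Assumed rule: fall back to the second formula when the first nearly cancels.
    return first if e_first >= e_second else second


# In-phase previous outputs at 45 degrees: approximately (O1 + O2) / sqrt(2).
mix = rotation_mixing_channel([1.0, 2.0], [1.0, 2.0], 45.0)
```

For out-of-phase inputs at 45° the first rotated channel cancels to (numerically) zero and the sketch returns the second one instead.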

Another aspect of some embodiments of the invention relates to scale factors.

For example, the frequency bands may be scale factor bands.

According to some embodiments, before the multichannel processor 204 generates the first group of two or more processed channels P1*, P2* based on the first selected pair of two decoded channels (D1, D2), the noise filling module 220 may, for example, be adapted to identify, for at least one of the two channels of the first selected pair D1, D2, one or more scale factor bands in which all spectral lines are quantized to zero. The noise filling module 220 may further be adapted to generate the mixing channel using the two or more, but not all, of the three or more previous audio output channels, and to fill the spectral lines of each of these scale factor bands with noise generated from the spectral lines of the mixing channel, depending on the scale factor of the respective scale factor band.

In these embodiments, a scale factor may, for example, be assigned to each scale factor band, and this scale factor is taken into account when the noise is generated from the mixing channel.

In a particular embodiment, the receiving interface 212 may, for example, be configured to receive the scale factor of each of the one or more scale factor bands, the scale factor of each scale factor band indicating the energy of the spectral lines of that band before quantization. The noise filling module 220 may, for example, be adapted to generate the noise for each of the one or more scale factor bands in which all spectral lines are quantized to zero such that, after the noise has been added to a band, the energy of its spectral lines corresponds to the energy indicated by the scale factor of that band.

For example, the mixing channel may indicate the following spectral values for the four spectral lines of the scale factor band into which noise is to be inserted: 0.2, 0.3, 0.5 and 0.1.

The energy of this scale factor band of the mixing channel may then, for example, be calculated as:

$$(0.2)^2 + (0.3)^2 + (0.5)^2 + (0.1)^2 = 0.04 + 0.09 + 0.25 + 0.01 = 0.39$$

The scale factor of the corresponding scale factor band of the channel to be noise-filled, however, may be, for example, only 0.0039.

An attenuation factor may, for example, be calculated as:

$$\text{attenuation factor} = \frac{\text{energy indicated by the scale factor}}{\text{energy of the mixing-channel scale factor band}}$$

Thus, in the above example,

$$\text{attenuation factor} = \frac{0.0039}{0.39} = 0.01$$

In an embodiment, each spectral value of the mixing-channel scale factor band used as noise is multiplied by the attenuation factor:

$$\text{noise value} = \text{spectral value} \cdot \text{attenuation factor}$$

Thus, each of the four spectral values of the scale factor band in the above example is multiplied by the attenuation factor, yielding the attenuated spectral values:

0.2 · 0.01 = 0.002

0.3 · 0.01 = 0.003

0.5 · 0.01 = 0.005

0.1 · 0.01 = 0.001

These attenuated spectral values can then be inserted into the scale factor band of the channel that is to be noise-filled.
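The worked example can be reproduced in a few lines of Python, which makes the arithmetic easy to check. The function names are hypothetical; the attenuation factor is computed exactly as in the example, as the ratio of the scale factor value to the mixing-channel band energy.

```python
def band_energy(values):
    """Sum of squared spectral values of one scale factor band."""
    return sum(v * v for v in values)


def attenuate_band(mixing_band, scale_factor):
    """Return (attenuation factor, attenuated values) as in the worked example."""
    factor = scale_factor / band_energy(mixing_band)
    return factor, [v * factor for v in mixing_band]


mixing_band = [0.2, 0.3, 0.5, 0.1]
factor, noise = attenuate_band(mixing_band, 0.0039)
print(round(band_energy(mixing_band), 6), round(factor, 6))  # 0.39 0.01
print([round(v, 6) for v in noise])  # [0.002, 0.003, 0.005, 0.001]
```

Note that multiplying the amplitudes by 0.01 scales the band energy by 0.01² = 0.0001, giving 0.000039. If the scale factor is read strictly as the target band energy, the amplitude gain that makes the filled band's energy equal 0.0039 would be the square root of the ratio, √0.01 = 0.1.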

The above example applies equally to logarithmic values if the above operations are replaced by their logarithmic counterparts, for example multiplication by addition.
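A sketch of the logarithmic variant: in the log domain, the multiplication by the attenuation factor becomes an addition. Because spectral values may be negative, this sketch, as an assumption beyond the text, applies the logarithm to magnitudes and restores the sign afterwards; base-10 logarithms and the function name are arbitrary choices.

```python
import math


def attenuate_log(values, log_factor):
    """Multiply by 10**log_factor by adding log_factor to each log10 magnitude."""
    out = []
    for v in values:
        if v == 0.0:
            out.append(0.0)  # log of zero is undefined; zero stays zero
            continue
        magnitude = 10.0 ** (math.log10(abs(v)) + log_factor)  # multiplication -> addition
        out.append(math.copysign(magnitude, v))
    return out


log_factor = math.log10(0.01)  # attenuation factor 0.01 from the worked example, i.e. -2
print([round(x, 6) for x in attenuate_log([0.2, -0.3, 0.5, 0.1], log_factor)])
# [0.002, -0.003, 0.005, 0.001]
```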

Moreover, besides the particular embodiments described above, other embodiments of the noise filling module 220 apply one, some or all of the concepts described with reference to Figures 2 to 6.

Another aspect of embodiments of the invention concerns the question of which of the previous audio output channels are selected to generate the mixing channel from which the noise to be inserted is obtained.

According to an embodiment, the noise filling module 220 may, for example, be adapted to select the exactly two previous audio output channels from the three or more previous audio output channels depending on the first multichannel parameters MCH_PAR2.

Thus, in this embodiment, the first multichannel parameters, which steer which channels are selected for processing, also steer which of the previous audio output channels are used to generate the mixing channel from which the noise to be inserted is derived.

In an embodiment, the first multichannel parameters MCH_PAR2 may, for example, indicate two decoded channels D1, D2 of the set of three or more decoded channels, and the multichannel processor 204 is adapted to select the first selected pair of two decoded channels D1, D2 from the set of three or more decoded channels D1, D2, D3 by selecting the two decoded channels D1, D2 indicated by the first multichannel parameters MCH_PAR2. Furthermore, the second multichannel parameters MCH_PAR1 may, for example, indicate two decoded channels P1*, D3 of the updated set of three or more decoded channels. The multichannel processor 204 may, for example, be adapted to select the second selected pair of two decoded channels P1*, D3 from the updated set of three or more decoded channels D3, P1*, P2* by selecting the two decoded channels P1*, D3 indicated by the second multichannel parameters MCH_PAR1.

Thus, in this embodiment, the channels selected for the first processing (e.g., the processing of processing box 208 in Figure 1a or Figure 1b) do not merely depend on the first multichannel parameters MCH_PAR2; the two selected channels are explicitly specified in the first multichannel parameters MCH_PAR2.

Likewise, in this embodiment, the channels selected for the second processing (e.g., the processing of processing box 210 in Figure 1a or Figure 1b) do not merely depend on the second multichannel parameters MCH_PAR1; the two selected channels are explicitly specified in the second multichannel parameters MCH_PAR1.

Embodiments of the invention introduce a sophisticated indexing scheme for the multichannel parameters, which is explained with reference to Figure 15.

Figure 15(a) illustrates the encoding of five channels on the encoder side, namely a left channel, a right channel, a center channel, a left surround channel and a right surround channel. Figure 15(b) illustrates the decoding of the encoded channels E0, E1, E2, E3, E4 to reconstruct the left, right, center, left surround and right surround channels.

Assume that an index is assigned to each of the five channels, namely:

Left channel: index 0
Right channel: index 1
Center channel: index 2
Left surround channel: index 3
Right surround channel: index 4

In Figure 15(a), on the encoder side, the first operation conducted may, for example, be the mixing of channel 0 (left channel) and channel 3 (left surround channel) in processing box 192 to obtain two processed channels. It may be assumed that one of the processed channels is a mid channel and the other a side channel. However, other concepts for forming two processed channels may also be applied, for example determining the two processed channels by conducting a rotation operation.

The two generated processed channels now receive the same indices as the channels used for the processing, i.e., the first of the processed channels has index 0 and the second has index 3. The multichannel parameters determined for this processing may, for example, be (0; 3).

The second operation on the encoder side may, for example, be the mixing of channel 1 (right channel) and channel 4 (right surround channel) in processing box 194 to obtain two further processed channels. Again, the two generated processed channels receive the indices of the channels used for the processing, i.e., the first of the further processed channels has index 1 and the second has index 4. The multichannel parameters determined for this processing may, for example, be (1; 4).

The third operation on the encoder side may, for example, be the mixing of processed channel 0 and processed channel 1 in processing box 196 to obtain two further processed channels. Again, the two generated processed channels receive the indices of the channels used for the processing, i.e., the first of these processed channels has index 0 and the second has index 1. The multichannel parameters determined for this processing may, for example, be (0; 1).

The encoded channels E0, E1, E2, E3 and E4 are distinguished by their indices, i.e., E0 has index 0, E1 has index 1, E2 has index 2, and so on.

The three operations on the encoder side thus yield the three multichannel parameters:

(0; 3), (1; 4), (0; 1).

Since the apparatus for decoding has to perform the encoder operations in reverse order, the order of the multichannel parameters may, for example, be reversed when they are transmitted to the apparatus for decoding, resulting in the multichannel parameters:

(0; 1), (1; 4), (0; 3).

For the apparatus for decoding, (0; 1) may be referred to as the first multichannel parameters, (1; 4) as the second multichannel parameters, and (0; 3) as the third multichannel parameters.

On the decoder side, shown in Figure 15(b), the apparatus for decoding concludes from the received first multichannel parameters (0; 1) that, as the first processing operation on the decoder side, channels 0 (E0) and 1 (E1) shall be processed. This is conducted in box 296 of Figure 15(b). Both generated processed channels inherit the indices of the channels E0 and E1 from which they were generated, so the generated processed channels also have the indices 0 and 1.

From the received second multichannel parameters (1; 4), the apparatus for decoding concludes that, as the second processing operation on the decoder side, processed channel 1 and channel 4 (E4) shall be processed. This is conducted in box 294 of Figure 15(b). Both generated processed channels inherit the indices of the channels 1 and 4 from which they were generated, so the generated processed channels also have the indices 1 and 4.

From the received third multichannel parameters (0; 3), the apparatus for decoding concludes that, as the third processing operation on the decoder side, processed channel 0 and channel 3 (E3) shall be processed. This is conducted in box 292 of Figure 15(b). Both generated processed channels inherit the indices of the channels 0 and 3 from which they were generated, so the generated processed channels also have the indices 0 and 3.

As a result of the processing of the apparatus for decoding, the left channel (index 0), right channel (index 1), center channel (index 2), left surround channel (index 3) and right surround channel (index 4) are reconstructed.
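The index bookkeeping of Figure 15 can be traced with a small simulation. As an assumption, every processing box applies the same orthonormal mid/side rotation, which is its own inverse, so the decoder undoes each encoder step by applying it to the same index pair; the toy spectra and the function names are hypothetical. Each step writes its two outputs back under the indices of its two inputs, which is the inheritance rule described above.

```python
import math

S = 1.0 / math.sqrt(2.0)


def mid_side(a, b):
    """Orthonormal mid/side rotation; applying it twice returns (a, b)."""
    return ([S * (x + y) for x, y in zip(a, b)],
            [S * (x - y) for x, y in zip(a, b)])


def run_pairs(channels, pairs):
    """Apply mid_side to each index pair in order; outputs inherit the indices."""
    ch = dict(channels)
    for i, j in pairs:
        ch[i], ch[j] = mid_side(ch[i], ch[j])
    return ch


# Index -> toy spectrum for L=0, R=1, C=2, Ls=3, Rs=4.
original = {k: [float(k + 1), 0.5 * k, -1.0] for k in range(5)}

encoder_params = [(0, 3), (1, 4), (0, 1)]          # boxes 192, 194, 196
encoded = run_pairs(original, encoder_params)      # E0..E4, keyed by index
decoder_params = list(reversed(encoder_params))    # (0, 1), (1, 4), (0, 3)
decoded = run_pairs(encoded, decoder_params)       # boxes 296, 294, 292
```

The center channel (index 2) is never named by a parameter, so it passes through both sides unchanged, and the reversed parameter list restores all five channels.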

Let us assume that, on the decoder side, all values of channel E1 (index 1) within a certain scale factor band have been quantized to zero. When the apparatus for decoding is about to conduct the processing in box 296, a noise-filled channel 1 (channel E1) is desired.

As already outlined, embodiments now use two previous audio output signals to noise-fill the spectral hole of channel 1.

In a particular embodiment, if a channel on which an operation is to be conducted has scale factor bands quantized to zero, the two previous audio output channels having the same index numbers as the two channels to be processed are used to generate the noise. In this example, if a spectral hole of channel 1 is detected before the processing in processing box 296, the previous audio output channels with index 0 (previous left channel) and index 1 (previous right channel) are used to generate the noise for filling the spectral hole of channel 1 on the decoder side.

Because the indices are always inherited by the processed channels resulting from a processing, the previous output channels with these indices are the ones that would have taken part in generating the channels processed at this decoder step, had they been the current audio output channels. Therefore, a good estimate for the scale factor band quantized to zero can be achieved.
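The index rule for choosing the noise source can be sketched as follows: before the pair named by a multichannel parameter is processed, any all-zero band in one of its channels is filled from the mixing channel of the previous output channels that carry the same two indices. The mid formula with d = 0.5, the band limits and the function name are assumptions; the scaling by the scale factor from the worked example above is omitted here for brevity.

```python
def stereo_fill_pair(current, prev_outputs, pair, band, d=0.5):
    """Fill all-zero lines of `band` in either channel of `pair`.

    current / prev_outputs map a channel index to its spectrum;
    pair is the index pair named by the multichannel parameter, e.g. (0, 1);
    band is the (start, stop) line range of one scale factor band.
    """
    i, j = pair
    lo, hi = band
    # Mixing channel from the previous outputs with the *same* indices i and j.
    mix = [(a + b) * d for a, b in zip(prev_outputs[i][lo:hi], prev_outputs[j][lo:hi])]
    filled = dict(current)
    for ch in pair:
        if all(v == 0.0 for v in current[ch][lo:hi]):
            spec = list(current[ch])
            spec[lo:hi] = mix
            filled[ch] = spec
    return filled


prev = {0: [1.0, 1.0, 0.4, 0.2], 1: [1.0, 0.0, 0.2, 0.2], 2: [9.0, 9.0, 9.0, 9.0]}
cur = {0: [0.3, 0.1, 0.2, 0.1], 1: [0.5, 0.2, 0.0, 0.0], 2: [0.0, 0.0, 0.0, 0.0]}
# First decoder parameter (0; 1): E1 has a hole in lines 2..3, filled from prev 0 and 1.
out = stereo_fill_pair(cur, prev, (0, 1), (2, 4))
```

Channel 2 is left untouched here even though it is all zero, because it is not part of the pair being processed; the previous output with index 2 is likewise not used.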

According to an embodiment, the apparatus may, for example, be adapted to assign an identifier from a set of identifiers to each of the three or more previous audio output channels, such that each previous audio output channel is assigned exactly one identifier of the set and each identifier of the set is assigned to exactly one previous audio output channel. Furthermore, the apparatus may, for example, be adapted to assign an identifier from the same set of identifiers to each channel of the set of three or more decoded channels, such that each decoded channel is assigned exactly one identifier of the set and each identifier of the set is assigned to exactly one decoded channel.

Furthermore, the first multichannel parameters MCH_PAR2 may, for example, indicate a first pair of two identifiers of the set of three or more identifiers. The multichannel processor 204 may, for example, be adapted to select the first selected pair of two decoded channels D1, D2 from the set of three or more decoded channels D1, D2, D3 by selecting the two decoded channels D1, D2 that are assigned to the two identifiers of the first pair of identifiers.

The apparatus may, for example, be adapted to assign the first of the two identifiers of the first pair to the first processed channel of the first group of exactly two processed channels P1*, P2*. Furthermore, the apparatus may, for example, be adapted to assign the second of the two identifiers of the first pair to the second processed channel of the first group of exactly two processed channels P1*, P2*.

The set of identifiers may, for example, be a set of indices, e.g., a set of non-negative integers (for example, the set comprising the identifiers 0, 1, 2, 3 and 4).

In a particular embodiment, the second multichannel parameters MCH_PAR1 may, for example, indicate a second pair of two identifiers of the set of three or more identifiers. The multichannel processor 204 may, for example, be adapted to select the second selected pair of two decoded channels P1*, D3 from the updated set of three or more decoded channels D3, P1*, P2* by selecting the two decoded channels (D3, P1*) that are assigned to the two identifiers of the second pair of identifiers. Furthermore, the apparatus may, for example, be adapted to assign the first of the two identifiers of the second pair to the first processed channel of the second group of exactly two processed channels P3*, P4*, and to assign the second of the two identifiers of the second pair to the second processed channel of the second group of exactly two processed channels P3*, P4*.

In a particular embodiment, the first multichannel parameters MCH_PAR2 may, for example, indicate said first pair of two identifiers of the set of three or more identifiers. The noise filling module 220 may, for example, be adapted to select the exactly two previous audio output channels from the three or more previous audio output channels by selecting the two previous audio output channels that are assigned to the two identifiers of said first pair.

As already outlined, Figure 7 illustrates an apparatus 100 according to an embodiment for encoding a multichannel signal 101 having at least three channels (CH1:CH3).

The apparatus comprises an iterative processor 102 adapted to calculate, in a first iteration step, inter-channel correlation values between each pair of the at least three channels (CH1:CH3), to select, in the first iteration step, the pair having the highest value or a value above a threshold, and to process the selected pair using a multichannel processing operation 110, 112 to derive initial multichannel parameters MCH_PAR1 for the selected pair and to derive first processed channels P1, P2.

The iterative processor 102 is adapted to perform the calculating, selecting and processing in a second iteration step using at least one processed channel P1, to derive further multichannel parameters MCH_PAR2 and second processed channels P3, P4.
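A minimal sketch of the iterative processor's pair selection: compute a normalized zero-lag correlation for every channel pair, pick the pair with the highest value, process it, and repeat on the updated channel set. The correlation measure, the use of mid/side as the processing operation, the fixed number of iterations and the names are assumptions; the threshold variant mentioned in the text is not shown. Here `params[0]` plays the role of the initial multichannel parameters MCH_PAR1 and `params[1]` that of MCH_PAR2.

```python
import itertools
import math

S = 1.0 / math.sqrt(2.0)


def correlation(x, y):
    """Normalized zero-lag inter-channel correlation in [-1, 1]."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def iterate(channels, steps):
    """Select the most correlated pair, mid/side-process it in place, repeat."""
    ch = dict(channels)
    params = []
    for _ in range(steps):
        i, j = max(itertools.combinations(sorted(ch), 2),
                   key=lambda p: correlation(ch[p[0]], ch[p[1]]))
        mid = [S * (a + b) for a, b in zip(ch[i], ch[j])]
        side = [S * (a - b) for a, b in zip(ch[i], ch[j])]
        ch[i], ch[j] = mid, side      # processed channels keep the pair's indices
        params.append((i, j))         # multichannel parameters of this iteration
    return ch, params


chans = {0: [1.0, 2.0, 3.0, 4.0], 1: [1.0, 2.0, 3.0, 4.1], 2: [4.0, -1.0, 0.0, 2.0]}
processed, params = iterate(chans, 2)
print(params)  # [(0, 1), (0, 2)]
```

In the second iteration the processed mid channel (index 0) is paired with channel 2, illustrating how a processed channel from the first step takes part in the second.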

此外，该装置包括声道编码器，该声道编码器适于对通过迭代处理器104执行的迭代处理所得的声道(P2：P4)进行编码，以获得编码的声道(E1：E3)。In addition, the device includes a channel encoder adapted to encode the channels (P2:P4) obtained by the iterative processing performed by the iterative processor 104 to obtain encoded channels (E1:E3).

Furthermore, the apparatus comprises an output interface 106 adapted to generate an encoded multichannel signal 107 having the encoded channels (E1:E3), the initial multichannel parameters and the further multichannel parameters MCH_PAR1, MCH_PAR2.

Moreover, the output interface 106 is adapted to generate the encoded multichannel signal 107 so that it comprises information indicating whether or not an apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have previously been decoded by the apparatus for decoding.

Thus, the apparatus for encoding is able to signal whether or not the apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have previously been decoded by the apparatus for decoding.

According to an embodiment, each of the initial multichannel parameters and the further multichannel parameters MCH_PAR1, MCH_PAR2 indicates exactly two channels, each of which is one of the encoded channels (E1:E3), one of the first or second processed channels P1, P2, P3, P4, or one of the at least three channels (CH1:CH3).

The output interface 106 may, for example, be adapted to generate the encoded multichannel signal 107 such that the information indicating whether or not the apparatus for decoding shall fill the spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, comprises, for each of the initial and further multichannel parameters MCH_PAR1, MCH_PAR2, information indicating whether or not, for at least one of the exactly two channels indicated by that parameter, the apparatus for decoding shall fill the spectral lines of such frequency bands with spectral data generated based on the previously decoded audio output channels that have previously been decoded by the apparatus for decoding.

A particular embodiment is described further below in which this information is transmitted using a hasStereoFilling[pair] value, which indicates whether stereo filling shall be applied in the currently processed MCT channel pair.

Figure 13 illustrates a system according to an embodiment.

The system comprises an apparatus 100 for encoding as described above and an apparatus 201 for decoding according to one of the embodiments described above.

The apparatus 201 for decoding is configured to receive, from the apparatus 100 for encoding, the encoded multichannel signal 107 generated by the apparatus 100 for encoding.

Furthermore, an encoded multichannel signal 107 is provided.

The encoded multichannel signal comprises:

- the encoded channels (E1:E3),

- the multichannel parameters MCH_PAR1, MCH_PAR2, and

- information indicating whether or not an apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with spectral data generated based on previously decoded audio output channels that have previously been decoded by the apparatus for decoding.

According to an embodiment, the encoded multichannel signal may, for example, comprise two or more multichannel parameters as the multichannel parameters MCH_PAR1, MCH_PAR2.

Each of the two or more multichannel parameters MCH_PAR1, MCH_PAR2 may, for example, indicate exactly two channels, each of which is one of the encoded channels (E1:E3), one of a plurality of processed channels P1, P2, P3, P4, or one of at least three original (e.g., unprocessed) channels (CH1:CH3).

The information indicating whether or not the apparatus for decoding shall fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, may, for example, comprise, for each of the two or more multichannel parameters MCH_PAR1, MCH_PAR2, information indicating whether or not, for at least one of the exactly two channels indicated by that parameter, the apparatus for decoding shall fill the spectral lines of such frequency bands with spectral data generated based on the previously decoded audio output channels that have previously been decoded by the apparatus for decoding.

As outlined further below, a particular embodiment is described in which this information is transmitted using a hasStereoFilling[pair] value, which indicates whether stereo filling shall be applied in the currently processed MCT channel pair.

In the following, general concepts and particular embodiments are described in more detail.

Embodiments realize a parametric low-bitrate coding mode with the flexibility of using arbitrary stereo trees (a combination of stereo filling and MCT).

Inter-channel signal dependencies are exploited by hierarchically applying known joint stereo coding tools. For lower bitrates, embodiments extend the MCT to use a combination of discrete stereo coding boxes and stereo filling boxes. Thus, semi-parametric coding can be applied, e.g., to channels with similar content (i.e., the channel pairs with the highest correlation), whereas differing channels can be coded independently or via a non-parametric representation. The MCT bitstream syntax is therefore extended so that it can signal whether stereo filling is allowed and where it is active.

Embodiments realize the generation of a previous downmix for arbitrary stereo filling pairs.

Stereo filling relies on the downmix of the previous frame to improve the filling of spectral holes caused by quantization in the frequency domain. In combination with the MCT, however, the set of jointly coded stereo pairs is now allowed to vary over time. Consequently, two jointly coded channels may not have been jointly coded in the previous frame, i.e., when the tree configuration has changed.

To estimate the previous downmix, the previously decoded output channels are stored and processed with an inverse stereo operation. For a given stereo box, this is done using the parameters of the current frame and the decoded output channels of the previous frame that correspond to the channel indices of the processed stereo box.

If a previous output channel signal is not available, e.g., due to an independent frame (a frame that can be decoded without taking previous-frame data into account) or a change of transform length, the previous channel buffer of the corresponding channel is set to zero. Thus, a non-zero previous downmix can still be computed as long as at least one of the previous channel signals is available.
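The buffer handling described above can be sketched as follows. The function name is hypothetical. A length mismatch stands in for a transform-length change, which is an assumption of this sketch. The 0.5 sum downmix at the end is only an example of a downmix that stays non-zero when one input is zeroed.

```python
import numpy as np


def previous_channel_buffer(prev_spectrum, available, n_lines):
    """Previous-frame output spectrum of one channel, or zeros if it is unavailable.

    `available` is False e.g. after an independent frame; a length mismatch is
    treated (by assumption) as a transform-length change.
    """
    if not available or prev_spectrum is None or len(prev_spectrum) != n_lines:
        return np.zeros(n_lines)
    return np.asarray(prev_spectrum, dtype=float)


n_lines = 4
l_prev = previous_channel_buffer([1.0, 2.0, 3.0, 4.0], available=True, n_lines=n_lines)
r_prev = previous_channel_buffer(None, available=False, n_lines=n_lines)
dmx_prev = 0.5 * (l_prev + r_prev)  # still non-zero: one input channel is available
print(dmx_prev)
```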

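If the MCT is configured to use prediction-based stereo boxes, the previous downmix is computed with an inverse M/S operation as specified for stereo filling pairs, preferably using one of the following two equations depending on the prediction direction flag (pred_dir in the MPEG-H syntax):

$$\mathrm{DMX}_{\mathrm{prev}} = d\,(L_{\mathrm{prev}} + R_{\mathrm{prev}}) \quad \text{for } \mathrm{pred\_dir} = 0$$

$$\mathrm{DMX}_{\mathrm{prev}} = d\,(L_{\mathrm{prev}} - R_{\mathrm{prev}}) \quad \text{for } \mathrm{pred\_dir} = 1$$

where L_prev and R_prev are the previous output channels of the pair and d is an arbitrary real, positive scalar.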
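As a sketch of the inverse M/S operation for prediction-based stereo boxes, the function below selects the sum or difference of the previous output channels according to pred_dir. The mapping of pred_dir = 0 to the sum and pred_dir = 1 to the difference is an assumption modeled on MPEG-H CPE stereo filling. The default d = 0.5 is one admissible choice of the positive scalar. The function name is hypothetical.

```python
import numpy as np


def prediction_downmix_prev(l_prev, r_prev, pred_dir, d=0.5):
    """Previous downmix for a prediction-based stereo box (inverse M/S).

    Assumed mapping: pred_dir == 0 -> d * (L + R); pred_dir == 1 -> d * (L - R).
    d = 0.5 is an assumed choice; any real positive scalar is allowed.
    """
    l_prev = np.asarray(l_prev, dtype=float)
    r_prev = np.asarray(r_prev, dtype=float)
    if pred_dir == 0:
        return d * (l_prev + r_prev)
    return d * (l_prev - r_prev)


print(prediction_downmix_prev([1.0, 2.0], [3.0, 4.0], pred_dir=0))
print(prediction_downmix_prev([1.0, 2.0], [3.0, 4.0], pred_dir=1))
```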

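If the MCT is configured to use rotation-based stereo boxes, the previous downmix is computed using a rotation by the negative rotation angle.

Thus, for a rotation given as:

$$\begin{bmatrix} L \\ R \end{bmatrix} = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} \mathrm{DMX} \\ \mathrm{RES} \end{bmatrix}$$

the inverse rotation is computed as:

$$\begin{bmatrix} \mathrm{DMX}_{\mathrm{prev}} \\ \mathrm{RES}_{\mathrm{prev}} \end{bmatrix} = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} L_{\mathrm{prev}} \\ R_{\mathrm{prev}} \end{bmatrix}$$

where DMX_prev is the desired previous downmix of the previous output channels L_prev and R_prev.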
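The rotation-based case can be sketched by rotating the previous output channels by the negative angle and keeping the first output. This sketch uses a continuous angle in radians. The MCT itself signals a quantized angle index, which is not modeled here. The round trip at the end only checks that rotating by -alpha inverts a rotation by alpha. The function names are hypothetical.

```python
import numpy as np


def rotate(x, y, alpha):
    """Rotate the signal pair (x, y) by the angle alpha (radians)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return c * x - s * y, s * x + c * y


def rotation_downmix_prev(l_prev, r_prev, alpha):
    """Previous downmix for a rotation-based stereo box: rotate by -alpha, keep the first output."""
    dmx, _residual = rotate(np.asarray(l_prev, dtype=float),
                            np.asarray(r_prev, dtype=float), -alpha)
    return dmx


# Round trip: synthesize L/R from a known downmix/residual, then recover the downmix.
alpha = 0.3
dmx = np.array([1.0, 2.0, -0.5])
res = np.array([0.1, -0.3, 0.2])
l_prev, r_prev = rotate(dmx, res, alpha)
recovered = rotation_downmix_prev(l_prev, r_prev, alpha)
```

In actual decoding, the previous L/R channels were not necessarily produced with the current frame's angle. The sketch therefore computes a downmix estimate, not an exact reconstruction.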

Embodiments realize the application of stereo filling in the MCT.

The application of stereo filling in a single stereo box is described in [1], [5]. As for a single stereo box, stereo filling is applied to the second channel of a given MCT channel pair.

In particular, stereo filling in combination with the MCT differs as follows:

The MCT tree configuration is extended by one signaling bit per frame, so that it can signal whether stereo filling is allowed in the current frame.

In the preferred embodiment, if stereo filling is allowed in the current frame, one additional bit for activating stereo filling in a stereo box is transmitted for each stereo box. This is the preferred embodiment because it gives the encoder control over which boxes shall have stereo filling applied in the decoder.

In a second embodiment, if stereo filling is allowed in the current frame, stereo filling is allowed in all stereo boxes, and no additional bit is transmitted for each individual stereo box. In this case, the decoder controls the selective application of stereo filling in the individual MCT boxes.

Further concepts and detailed embodiments are described below:

Embodiments improve the quality of low-bitrate multichannel operating points.

In frequency-domain (FD) coded channel pair elements (CPEs), the MPEG-H 3D Audio standard allows the use of the stereo filling tool described in subclause 5.5.5.4.9 of [1] to perceptually improve the filling of spectral holes caused by very coarse quantization in the encoder. This tool has been shown to be beneficial especially for two-channel stereo coded at medium and low bitrates.

The multichannel coding tool (MCT) described in Section 7 of [2] was introduced. It enables a flexible, signal-adaptive definition of jointly coded channel pairs on a per-frame basis in order to exploit time-varying inter-channel dependencies in multichannel setups. The benefit of the MCT is particularly significant when it is used for efficient dynamic joint coding of multichannel setups in which each channel resides in its own single channel element (SCE). Unlike traditional CPE+SCE(+LFE) configurations, which must be established a priori, it allows joint channel coding to be cascaded and/or reconfigured from one frame to the next.

A current drawback of coding multichannel surround sound without CPEs is that the joint stereo tools available only in CPEs, namely predictive M/S coding and stereo filling, cannot be exploited, which is especially disadvantageous at medium and low bitrates. The MCT can substitute for the M/S tool, but there is currently no substitute for the stereo filling tool.

Embodiments allow the stereo filling tool to be used within MCT channel pairs by extending the MCT bitstream syntax with a corresponding signaling bit and by generalizing the application of stereo filling to arbitrary channel pairs, regardless of their channel element types.

For example, some embodiments may implement the signaling of stereo filling in the MCT as follows:

In a CPE, the use of the stereo filling tool is signaled within the FD noise filling information of the second channel, as described in subclause 5.5.5.4.9.4 of [1]. When the MCT is used, every channel is potentially a "second channel" (because channel pairs may span elements). It is therefore proposed to signal stereo filling explicitly by means of one additional bit per MCT-coded channel pair. To avoid the need for this additional bit when stereo filling is not employed in any channel pair of a specific MCT "tree" instance, the two currently reserved entries of the MCTSignalingType element in MultichannelCodingFrame() [2] are used to signal the presence of the above additional bit per channel pair.
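The signaling scheme above can be sketched as a small parser. The sketch assumes that MCTSignalingType values 0 and 1 denote prediction and rotation without the extra bits, and that the two formerly reserved values 2 and 3 denote the same box types with one hasStereoFilling bit per pair. The flat bit iterator and the function names are hypothetical. The real bitstream interleaves this bit with the other per-pair syntax elements.

```python
from typing import Iterator, List, Tuple


def decode_signaling_type(mct_signaling_type: int) -> Tuple[str, bool]:
    """Map MCTSignalingType to (stereo box kind, stereo-filling bits present).

    Assumption: 0/1 = prediction/rotation without the extra bits,
    2/3 = the formerly reserved entries, reused for prediction/rotation with them.
    """
    if mct_signaling_type not in (0, 1, 2, 3):
        raise ValueError("MCTSignalingType must be in 0..3")
    kind = "prediction" if mct_signaling_type % 2 == 0 else "rotation"
    return kind, mct_signaling_type >= 2


def read_has_stereo_filling(bits: Iterator[int], bits_present: bool, num_pairs: int) -> List[int]:
    """Read one hasStereoFilling[pair] bit per channel pair, only if signaled as present."""
    return [next(bits) if bits_present else 0 for _ in range(num_pairs)]


kind, present = decode_signaling_type(3)
flags = read_has_stereo_filling(iter([1, 0, 1]), present, num_pairs=3)
```

When the signaling type indicates that no bits are present, no bits are consumed and all pairs default to hasStereoFilling[pair] = 0.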

下面提供详细描述。A detailed description is provided below.

Some embodiments may, for example, implement the calculation of the previous downmix as follows:

Stereo filling in the CPE fills certain "empty" scale factor bands of the second channel by adding the respective MDCT coefficients of the previous frame's downmix. These coefficients are scaled according to the transmitted scale factor of the corresponding band, which is otherwise unused because that band is entirely quantized to zero. The same process of weighted addition, controlled by the scale factor bands of the target channel, can be used in the MCT case. The source spectrum for stereo filling, i.e., the previous frame's downmix, must however be computed differently than within a CPE, in particular because the MCT "tree" configuration may be time-varying.
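A simplified sketch of the weighted addition follows. In each empty scale factor band at or above the noise filling start line, the previous downmix is added to the existing (noise-filled) content. The band is then rescaled to a target energy. The target energy is passed in directly as a hypothetical per-band value standing in for the quantity derived from the transmitted scale factor. The exact MPEG-H formula is not reproduced here.

```python
import numpy as np


def stereo_fill_bands(spectrum, dmx_prev, band_offsets, empty, start_line, target_energy):
    """Simplified stereo filling of the second channel (hypothetical sketch).

    spectrum:      second-channel spectrum after FD noise filling
    dmx_prev:      previous-frame downmix, same length as spectrum
    band_offsets:  scale factor band borders, len(empty) + 1 entries
    empty:         True for bands that were entirely quantized to zero
    target_energy: assumed per-band target energy derived from the transmitted scale factor
    """
    out = np.asarray(spectrum, dtype=float).copy()
    dmx_prev = np.asarray(dmx_prev, dtype=float)
    for b, is_empty in enumerate(empty):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if not is_empty or lo < start_line:
            continue  # only empty bands in the noise filling region are touched
        mixed = out[lo:hi] + dmx_prev[lo:hi]  # existing noise plus previous downmix
        energy = float(np.dot(mixed, mixed))
        if energy > 0.0:
            out[lo:hi] = mixed * np.sqrt(target_energy[b] / energy)
    return out


filled = stereo_fill_bands(spectrum=[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
                           dmx_prev=[0.0, 0.0, 3.0, 4.0, 3.0, 4.0],
                           band_offsets=[0, 2, 4, 6],
                           empty=[False, True, True],
                           start_line=4,
                           target_energy=[0.0, 0.0, 100.0])
```

In this example, the second band is empty but lies below the start line, so it stays at zero. The third band receives the downmix [3, 4] scaled to energy 100, giving [6, 8].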

In the MCT, the previous downmix can be derived from the decoded output channels of the last frame (stored after MCT decoding) using the current frame's MCT parameters for the given joint channel pair. For channel pairs that apply predictive M/S-based joint coding, the previous downmix equals, as in CPE stereo filling, either the sum or the difference of the appropriate channel spectra, depending on the current frame's direction indicator. For stereo pairs using joint coding based on a Karhunen-Loève rotation, the previous downmix is an inverse rotation computed with the current frame's rotation angle. Again, a detailed description is provided below.

A complexity assessment shows that stereo filling in the MCT, being a tool for medium and low bitrates, is not expected to increase the worst-case complexity when measured over both low/medium and high bitrates. Moreover, using stereo filling typically coincides with more spectral coefficients being quantized to zero, which reduces the algorithmic complexity of the context-based arithmetic decoder. Assume that at most N/3 stereo filling channels are used in an N-channel surround configuration and that each execution of stereo filling costs an additional 0.2 WMOPS. Under these assumptions, the peak complexity increases by only 0.4 WMOPS for 5.1 and by 0.8 WMOPS for 11.1 channels when the coder sampling rate is 48 kHz and the IGF tool operates only above 12 kHz. This amounts to less than 2% of the total decoder complexity.
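The peak-complexity figures quoted above follow from simple arithmetic. This sketch assumes that N counts the LFE channel (6 for 5.1, 12 for 11.1), which reproduces the stated numbers. The function name is hypothetical.

```python
def peak_complexity_increase(num_channels, wmops_per_fill=0.2):
    """Worst-case extra WMOPS if at most num_channels // 3 channels use stereo filling."""
    return (num_channels // 3) * wmops_per_fill


for layout, n in (("5.1", 6), ("11.1", 12)):
    print(layout, round(peak_complexity_increase(n), 2))
```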

Embodiments implement the MultichannelCodingFrame() element as follows:

According to some embodiments, stereo filling in the MCT may be implemented as follows:

Like IGF stereo filling in the channel pair element described in subclause 5.5.5.4.9 of [1], stereo filling in the multichannel coding tool (MCT) fills "empty" scale factor bands (entirely quantized to zero) at and above the noise filling start frequency, using a downmix of the previous frame's output spectra.

When stereo filling is active in an MCT joint channel pair (hasStereoFilling[pair] ≠ 0 in Table AMD4.4), all "empty" scale factor bands in the noise filling region (i.e., starting at or above noiseFillingStartOffset) of the pair's second channel are filled to a specific target energy using a downmix of the corresponding output spectra of the previous frame (after MCT application). This is done after FD noise filling (see subclause 7.2 of ISO/IEC 23003-3:2012) and before scale factor application and MCT joint stereo application. All output spectra after completed MCT processing are stored for potential stereo filling in the next frame.

Operational constraints may, for example, be that cascaded execution of the stereo filling algorithm (hasStereoFilling[pair] ≠ 0) in empty bands of the second channel is not supported for any subsequent MCT stereo pair with hasStereoFilling[pair] ≠ 0 if its second channel is the same. In a channel pair element, active IGF stereo filling in the second (residual) channel according to subclause 5.5.5.4.9 of [1] takes precedence over, and thus disables, any subsequent application of MCT stereo filling in the same channel of the same frame.
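The two constraints above can be expressed as bookkeeping over the channels that have already been stereo-filled in the current frame. The sketch below is hypothetical. It represents each MCT pair as a (ch1, ch2, hasStereoFilling) tuple in processing order and takes the set of channels filled by CPE IGF stereo filling as input.

```python
def pairs_receiving_stereo_filling(pairs, igf_stereo_filled=()):
    """Indices of MCT pairs whose second channel is actually stereo-filled (hypothetical sketch).

    pairs: (ch1, ch2, has_stereo_filling) tuples in processing order.
    igf_stereo_filled: channels already filled by CPE IGF stereo filling this frame.
    """
    already_filled = set(igf_stereo_filled)
    applied = []
    for index, (_ch1, ch2, has_stereo_filling) in enumerate(pairs):
        if has_stereo_filling and ch2 not in already_filled:
            applied.append(index)
            already_filled.add(ch2)  # no cascaded filling of the same second channel
    return applied


print(pairs_receiving_stereo_filling([(0, 1, 1), (2, 1, 1), (1, 3, 1), (3, 4, 0)],
                                     igf_stereo_filled={3}))
```

Here, only pair 0 is filled. Pair 1 reuses second channel 1, channel 3 was already filled by IGF stereo filling, and pair 3 has stereo filling disabled.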

Terms and definitions may, for example, be defined as follows:

hasStereoFilling[pair]  indicates the usage of stereo filling in the currently processed MCT channel pair

ch1, ch2  indices of the channels in the currently processed MCT channel pair

spectral_data[][]  spectral coefficients of the channels in the currently processed MCT channel pair

spectral_data_prev[][]  output spectra after completed MCT processing in the previous frame

downmix_prev[][]  estimated downmix of the previous frame's output channels with the indices given by the currently processed MCT channel pair

num_swb  total number of scale factor bands, see ISO/IEC 23003-3, subclause 6.2.9.4

ccfl  coreCoderFrameLength, transform length, see ISO/IEC 23003-3, subclause 6.1

noiseFillingStartOffset  noise filling start line, defined depending on ccfl in ISO/IEC 23003-3, Table 109

igf_WhiteningLevel  spectral whitening in IGF, see ISO/IEC 23008-3, subclause 5.5.5.4.7

seed[]  noise filling seed used by randomSign(), see ISO/IEC 23003-3, subclause 7.2

For some particular embodiments, the decoding process may, for example, be described as follows:

MCT stereo filling is performed in four consecutive operations, as described below:

Step 1: Preparation of the second channel's spectrum for the stereo filling algorithm

If the stereo filling indicator hasStereoFilling[pair] of the given MCT channel pair equals zero, stereo filling is not used and the following steps are not executed. Otherwise, if scale factors were previously applied to the pair's second channel spectrum, spectral_data[ch2], that scale factor application is undone.
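Undoing the scale factor application amounts to dividing each band by the gain that was applied to it. The sketch below assumes an AAC/USAC-style gain of 2^(0.25 · (sf - SF_OFFSET)) with SF_OFFSET = 100. The constant, the function names and the band layout are assumptions for illustration only.

```python
import numpy as np

SF_OFFSET = 100  # assumption: AAC/USAC-style scale factor offset


def sf_gain(sf):
    """Assumed scale factor gain 2^(0.25 * (sf - SF_OFFSET))."""
    return 2.0 ** (0.25 * (sf - SF_OFFSET))


def undo_scale_factors(spectrum, band_offsets, scale_factors):
    """Divide each scale factor band of spectral_data[ch2] by its previously applied gain."""
    out = np.asarray(spectrum, dtype=float).copy()
    for b, sf in enumerate(scale_factors):
        out[band_offsets[b]:band_offsets[b + 1]] /= sf_gain(sf)
    return out


unscaled = undo_scale_factors([4.0, 4.0, 1.0], band_offsets=[0, 2, 3], scale_factors=[108, 100])
```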

Step 2: Generation of the previous downmix spectrum for the given MCT channel pair

The previous downmix is estimated from the previous frame's output signals spectral_data_prev[][], which are stored after MCT processing has been applied. If a previous output channel signal is not available, e.g., due to an independent frame (indepFlag > 0), a transform length change, or core_mode == 1, the previous channel buffer of the corresponding channel shall be set to zero.

For prediction stereo pairs, i.e., MCTSignalingType == 0, the previous downmix is computed from the previous output channels as downmix_prev[][], defined in step 2 of subclause 5.5.5.4.9.4 of [1], where spectrum[window][] is represented by spectral_data[][window].

For rotation stereo pairs, i.e., MCTSignalingType == 1, the previous downmix is computed from the previous output channels by inverting the rotation operation defined in subclause 5.5.X.3.7.1 of [2].

For this, L = spectral_data_prev[ch1][], R = spectral_data_prev[ch2][] and dmx = downmix_prev[] of the previous frame are used, together with Idx and nSamples of the current frame and MCT pair.

Step 3: Execution of the stereo filling algorithm in empty bands of the second channel

Stereo filling is applied to the MCT pair's second channel as in step 3 of subclause 5.5.5.4.9.4 of [1], where spectrum[window] is represented by spectral_data[ch2][window] and max_sfb_ste is given by num_swb.

Step 4: Scale factor application and adaptive synchronization of noise filling seeds

After step 3 of subclause 5.5.5.4.9.4 of [1], the scale factors are applied to the resulting spectrum as in subclause 7.3 of ISO/IEC 23003-3, with the scale factors of empty bands being processed like regular scale factors. If a scale factor is not defined, e.g., because it is located above max_sfb, its value shall equal zero. Before decode_mct() is executed, the spectral energies of both channels of the MCT pair are computed over the range from index noiseFillingStartOffset to index ccfl/2 - 1, provided that three conditions hold: IGF is used, igf_WhiteningLevel equals 2 in any of the second channel's tiles, and neither channel employs eight short transforms. If the computed energy of the first channel is more than eight times the energy of the second channel, the second channel's seed[ch2] is set equal to the first channel's seed[ch1].
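The seed synchronization rule in step 4 can be sketched as follows. The function name and flag parameters are hypothetical. The energy range uses an exclusive slice end of ccfl // 2, so the last index included is ccfl/2 - 1, as stated in the text.

```python
import numpy as np


def sync_noise_seeds(spec1, spec2, seeds, ch1, ch2, start, ccfl,
                     igf_used, whitening_level_2_in_ch2, any_eight_short):
    """Copy seed[ch1] to seed[ch2] if the first channel is more than 8x stronger (sketch)."""
    seeds = list(seeds)
    if not igf_used or not whitening_level_2_in_ch2 or any_eight_short:
        return seeds
    end = ccfl // 2  # slice end is exclusive, so the last index is ccfl/2 - 1
    e1 = float(np.sum(np.square(np.asarray(spec1[start:end], dtype=float))))
    e2 = float(np.sum(np.square(np.asarray(spec2[start:end], dtype=float))))
    if e1 > 8.0 * e2:
        seeds[ch2] = seeds[ch1]
    return seeds


synced = sync_noise_seeds([3.0] * 8, [1.0] * 8, seeds=[11, 22], ch1=0, ch2=1,
                          start=0, ccfl=8, igf_used=True,
                          whitening_level_2_in_ch2=True, any_eight_short=False)
```

In this example, the first channel's energy (36) exceeds eight times the second channel's energy (4 × 8 = 32), so seed[ch2] takes the value of seed[ch1].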

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software, or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer-readable.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

In some embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The apparatus described herein may be implemented using a hardware apparatus, using a computer, or using a combination of a hardware apparatus and a computer.

The methods described herein may be performed using a hardware apparatus, using a computer, or using a combination of a hardware apparatus and a computer.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Embodiment 1: An apparatus (201) for decoding a previously encoded multichannel signal of a previous frame to obtain three or more previous audio output channels, and for decoding a currently encoded multichannel signal (107) of a current frame to obtain three or more current audio output channels,

wherein the apparatus (201) comprises an interface (212), a channel decoder (202), a multichannel processor (204) for generating the three or more current audio output channels, and a noise filling module (220),

wherein the interface (212) is adapted to receive the currently encoded multichannel signal (107) and to receive side information comprising a first multichannel parameter (MCH_PAR2),

wherein the channel decoder (202) is adapted to decode the currently encoded multichannel signal of the current frame to obtain a set of three or more decoded channels (D1, D2, D3) of the current frame,

wherein the multichannel processor (204) is adapted to select a first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) depending on the first multichannel parameter (MCH_PAR2),

wherein the multichannel processor (204) is adapted to generate a first group of two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2) to obtain an updated set of three or more decoded channels (D3, P1*, P2*),

wherein, before the multichannel processor (204) generates the first pair of channels of the two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2), the noise filling module (220) is adapted to identify, for at least one of the two channels of the first selected pair of two decoded channels (D1, D2), one or more frequency bands within which all spectral lines are quantized to zero, to generate a mixed channel using two or more, but not all, of the three or more previous audio output channels, and to fill the spectral lines of the one or more frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixed channel, wherein the noise filling module (220) is adapted to select, depending on the side information, the two or more previous audio output channels that are used for generating the mixed channel from the three or more previous audio output channels.
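As an illustration of the band-detection and filling steps described above, the following minimal Python sketch assumes that each channel's spectrum is a plain list of quantized spectral values and that band boundaries are given by a list of offsets (`band_offsets`, a hypothetical parameter). It copies the mixed channel's lines into every all-zero band without any energy adjustment; it is a sketch under these assumptions, not the codec's actual procedure.

```python
def find_empty_bands(spectrum, band_offsets):
    """Return the indices of bands in which every spectral line is zero.

    Band b covers spectrum[band_offsets[b]:band_offsets[b + 1]] (hypothetical layout).
    """
    empty = []
    for b in range(len(band_offsets) - 1):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        if all(x == 0 for x in spectrum[lo:hi]):
            empty.append(b)
    return empty


def fill_empty_bands(spectrum, mixed, band_offsets):
    """Copy the mixed channel's lines into every all-zero band (no energy scaling)."""
    out = [float(x) for x in spectrum]
    for b in find_empty_bands(spectrum, band_offsets):
        lo, hi = band_offsets[b], band_offsets[b + 1]
        out[lo:hi] = [float(m) for m in mixed[lo:hi]]
    return out
```

With `band_offsets = [0, 2, 4, 6, 8]`, the spectrum `[1, 2, 0, 0, 0, 0, 3, 0]` has bands 1 and 2 empty, so only lines 2 to 5 are taken from the mixed channel; band 3 keeps its single non-zero line and is left untouched.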

Embodiment 2: The apparatus (201) according to Embodiment 1,

wherein the noise filling module (220) is adapted to generate the mixed channel using exactly two previous audio output channels of the three or more previous audio output channels as the two or more previous audio output channels;

wherein the noise filling module (220) is adapted to select the exactly two previous audio output channels from the three or more previous audio output channels depending on the side information.

Embodiment 3: The apparatus (201) according to Embodiment 2,

wherein the noise filling module (220) is adapted to generate the mixed channel using exactly two previous audio output channels based on the equation

$$D_{ch} = (\hat{O}_1 + \hat{O}_2) \cdot d$$

or based on the equation

$$D_{ch} = (\hat{O}_1 - \hat{O}_2) \cdot d,$$

wherein $D_{ch}$ is the mixed channel,

wherein $\hat{O}_1$ is a first channel of the exactly two previous audio output channels,

wherein $\hat{O}_2$ is a second channel of the exactly two previous audio output channels, the second channel being different from the first channel of the exactly two previous audio output channels, and

wherein d is a real, positive scalar.
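The two alternatives of Embodiment 3 amount to a scaled sum (mid-like) or a scaled difference (side-like) of the two selected previous output channels. The sketch below assumes the channels are equal-length lists of spectral values and uses d = 0.5 purely for illustration; the text only requires d to be a real, positive scalar.

```python
def sum_mix(o1, o2, d=0.5):
    """Mixed channel as the scaled sum (o1 + o2) * d, computed line by line."""
    if d <= 0:
        raise ValueError("d must be a real, positive scalar")
    return [(a + b) * d for a, b in zip(o1, o2)]


def difference_mix(o1, o2, d=0.5):
    """Mixed channel as the scaled difference (o1 - o2) * d, computed line by line."""
    if d <= 0:
        raise ValueError("d must be a real, positive scalar")
    return [(a - b) * d for a, b in zip(o1, o2)]
```

With d = 0.5, the sum form is the average of the two previous channels, which keeps the mixed channel on the same level as its inputs when they are similar.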

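Embodiment 4: The apparatus (201) according to Embodiment 2,

wherein the noise filling module (220) is adapted to generate the mixed channel using exactly two previous audio output channels based on the equation

$$\hat{I}_{ch} = \cos\alpha \cdot \hat{O}_1 + \sin\alpha \cdot \hat{O}_2$$

or based on the equation

$$\hat{I}_{ch} = -\sin\alpha \cdot \hat{O}_1 + \cos\alpha \cdot \hat{O}_2,$$

wherein $\hat{I}_{ch}$ is the mixed channel, wherein $\hat{O}_1$ and $\hat{O}_2$ are the first and the second channel of the exactly two previous audio output channels, and

wherein α is a rotation angle.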
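The rotation-based mixing of Embodiment 4 can be sketched as applying one row of a 2x2 rotation matrix to the two previous output channels, line by line. This is a minimal illustration that assumes the cos/sin form and sign convention shown for the two alternatives; the exact convention used by a real decoder is not confirmed by this text.

```python
import math


def rotated_mix(o1, o2, alpha, second=False):
    """Mix two previous output channels by a rotation with angle alpha (radians).

    second=False: cos(alpha) * o1 + sin(alpha) * o2
    second=True:  -sin(alpha) * o1 + cos(alpha) * o2
    (Assumed sign convention; the text only names alpha as a rotation angle.)
    """
    c, s = math.cos(alpha), math.sin(alpha)
    if second:
        return [-s * a + c * b for a, b in zip(o1, o2)]
    return [c * a + s * b for a, b in zip(o1, o2)]
```

With α = 0 the first form returns the first channel unchanged and the second form returns the second channel, which is a quick sanity check of the rotation.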

Embodiment 5: The apparatus (201) according to Embodiment 4,

wherein the side information is current side information assigned to the current frame,

wherein the interface (212) is adapted to receive previous side information assigned to the previous frame, wherein the previous side information comprises a previous angle,

wherein the interface (212) is adapted to receive the current side information comprising a current angle, and

wherein the noise filling module (220) is adapted to use the current angle of the current side information as the rotation angle α, and is adapted to not use the previous angle of the previous side information as the rotation angle α.

Embodiment 6: The apparatus (201) according to any one of Embodiments 2 to 5, wherein the noise filling module (220) is adapted to select the exactly two previous audio output channels from the three or more previous audio output channels depending on the first multichannel parameter (MCH_PAR2).

Embodiment 7: The apparatus (201) according to any one of Embodiments 2 to 6,

wherein the interface (212) is adapted to receive the currently encoded multichannel signal (107), and to receive the side information comprising the first multichannel parameter (MCH_PAR2) and a second multichannel parameter (MCH_PAR1),

wherein the multichannel processor (204) is adapted to select a second selected pair of two decoded channels (P1*, D3) from the updated set of three or more decoded channels (D3, P1*, P2*) depending on the second multichannel parameter (MCH_PAR1), at least one channel (P1*) of the second selected pair of two decoded channels (P1*, D3) being one channel of the first pair of the two or more processed channels (P1*, P2*), and

wherein the multichannel processor (204) is adapted to generate a second group of two or more processed channels (P3*, P4*) based on the second selected pair of two decoded channels (P1*, D3) to further update the updated set of three or more decoded channels.

Embodiment 8: The apparatus (201) according to Embodiment 7,

wherein the multichannel processor (204) is adapted to generate the first group of two or more processed channels (P1*, P2*) by generating a first group of exactly two processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2);

wherein the multichannel processor (204) is adapted to replace the first selected pair of two decoded channels (D1, D2) in the set of three or more decoded channels (D1, D2, D3) by the first group of exactly two processed channels (P1*, P2*) to obtain the updated set of three or more decoded channels (D3, P1*, P2*);

wherein the multichannel processor (204) is adapted to generate the second group of two or more processed channels (P3*, P4*) by generating a second group of exactly two processed channels (P3*, P4*) based on the second selected pair of two decoded channels (P1*, D3), and

wherein the multichannel processor (204) is adapted to replace the second selected pair of two decoded channels (P1*, D3) in the updated set of three or more decoded channels (D3, P1*, P2*) by the second group of exactly two processed channels (P3*, P4*) to further update the updated set of three or more decoded channels.
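The pairwise cascade with in-place replacement described in Embodiments 7 and 8 can be sketched as a loop over signalled channel pairs, where each step overwrites the two selected channels with its two outputs. This minimal Python sketch assumes channels are stored in a dictionary keyed by a channel identifier and uses an inverse mid/side operation as a stand-in for the actual pairwise processing; both choices are illustrative assumptions, and the stereo-filling step that precedes each pair is omitted.

```python
def inverse_mid_side(mid, side):
    """Hypothetical pairwise operation: inverse of M = (L + R) / 2, S = (L - R) / 2."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def apply_pair_cascade(decoded, pair_params, pair_op=inverse_mid_side):
    """Process channel pairs in signalled order, replacing each pair by its outputs.

    decoded: dict mapping a channel identifier to that channel's spectrum.
    pair_params: identifier pairs in decoding order, e.g. [MCH_PAR2, MCH_PAR1].
    """
    channels = dict(decoded)  # leave the caller's set of decoded channels unchanged
    for id_a, id_b in pair_params:
        out_a, out_b = pair_op(channels[id_a], channels[id_b])
        # The processed channels take over the identifiers of the pair they replace,
        # so a later step can select one of them again (e.g. P1* in the second pair).
        channels[id_a], channels[id_b] = out_a, out_b
    return channels
```

For example, with decoded channels `{0: [1.0], 1: [0.5], 2: [2.0]}` and pairs `[(0, 1), (0, 2)]`, the first step turns identifiers 0 and 1 into P1* = [1.5] and P2* = [0.5], and the second step combines P1* with D3 into [3.5] and [-0.5].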

Embodiment 9: The apparatus (201) according to Embodiment 8,

wherein the first multichannel parameter (MCH_PAR2) indicates two decoded channels (D1, D2) of the set of three or more decoded channels;

wherein the multichannel processor (204) is adapted to select the first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) by selecting the two decoded channels (D1, D2) that are indicated by the first multichannel parameter (MCH_PAR2);

wherein the second multichannel parameter (MCH_PAR1) indicates two decoded channels (P1*, D3) of the updated set of three or more decoded channels;

wherein the multichannel processor (204) is adapted to select the second selected pair of two decoded channels (P1*, D3) from the updated set of three or more decoded channels (D3, P1*, P2*) by selecting the two decoded channels (P1*, D3) that are indicated by the second multichannel parameter (MCH_PAR1).

Embodiment 10: The apparatus (201) according to Embodiment 9,

wherein the apparatus (201) is adapted to assign an identifier from a set of identifiers to each previous audio output channel of the three or more previous audio output channels, so that each of the three or more previous audio output channels is assigned exactly one identifier of the set of identifiers, and so that each identifier of the set of identifiers is assigned to exactly one of the three or more previous audio output channels,

wherein the apparatus (201) is adapted to assign an identifier from the set of identifiers to each channel of the set of three or more decoded channels (D1, D2, D3), so that each channel of the set of three or more decoded channels is assigned exactly one identifier of the set of identifiers, and so that each identifier of the set of identifiers is assigned to exactly one channel of the set of three or more decoded channels (D1, D2, D3),

wherein the first multichannel parameter (MCH_PAR2) indicates a first pair of two identifiers of the set of three or more identifiers,

wherein the multichannel processor (204) is adapted to select the first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) by selecting the two decoded channels (D1, D2) to which the two identifiers of the first pair of two identifiers are assigned;

wherein the apparatus (201) is adapted to assign a first one of the two identifiers of the first pair of two identifiers to a first processed channel of the first group of exactly two processed channels (P1*, P2*), and wherein the apparatus (201) is adapted to assign a second one of the two identifiers of the first pair of two identifiers to a second processed channel of the first group of exactly two processed channels (P1*, P2*).

Embodiment 11: The apparatus (201) according to Embodiment 10,

wherein the second multichannel parameter (MCH_PAR1) indicates a second pair of two identifiers of the set of three or more identifiers,

wherein the multichannel processor (204) is adapted to select the second selected pair of two decoded channels (P1*, D3) from the updated set of three or more decoded channels (D3, P1*, P2*) by selecting the two decoded channels (D3, P1*) to which the two identifiers of the second pair of two identifiers are assigned;

wherein the apparatus (201) is adapted to assign a first one of the two identifiers of the second pair of two identifiers to a first processed channel of the second group of exactly two processed channels (P3*, P4*), and wherein the apparatus (201) is adapted to assign a second one of the two identifiers of the second pair of two identifiers to a second processed channel of the second group of exactly two processed channels (P3*, P4*).

Embodiment 12: The apparatus (201) according to Embodiment 10 or 11,

wherein the first multichannel parameter (MCH_PAR2) indicates the first pair of two identifiers of the set of three or more identifiers, and

wherein the noise filling module (220) is adapted to select the exactly two previous audio output channels from the three or more previous audio output channels by selecting the two previous audio output channels to which the two identifiers of the first pair of two identifiers are assigned.
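Because the previous output channels carry the same identifiers as the decoded channels, the pair signalled by the first multichannel parameter can be reused directly to look up the previous frame's channels. The sketch below assumes a dictionary from identifier to previous-frame spectrum and, for illustration only, forms a sum-type mix with d = 0.5; the function names are hypothetical.

```python
def select_previous_pair(prev_outputs, pair_ids):
    """Return the two previous-frame output channels whose identifiers are in pair_ids.

    prev_outputs: dict mapping identifier -> spectrum of that previous output channel.
    pair_ids: the identifier pair signalled by the first multichannel parameter.
    """
    id_a, id_b = pair_ids
    if id_a == id_b:
        raise ValueError("a channel pair must name two different identifiers")
    return prev_outputs[id_a], prev_outputs[id_b]


def previous_frame_mix(prev_outputs, pair_ids, d=0.5):
    """Hypothetical sum-type mix of the two selected previous output channels."""
    o1, o2 = select_previous_pair(prev_outputs, pair_ids)
    return [(a + b) * d for a, b in zip(o1, o2)]
```

Only the two channels named by the pair enter the mix; the remaining previous output channels are ignored, which matches the "two or more, but not all" condition of Embodiment 1.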

Embodiment 13: The apparatus (201) according to any one of the preceding embodiments, wherein, before the multichannel processor (204) generates the first pair of channels of the two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2), the noise filling module (220) is adapted to identify, for at least one of the two channels of the first selected pair of two decoded channels (D1, D2), one or more scale factor bands within which all spectral lines are quantized to zero, the one or more scale factor bands being the one or more frequency bands, to generate the mixed channel using the two or more, but not all, of the three or more previous audio output channels, and to fill the spectral lines of the one or more scale factor bands within which all spectral lines are quantized to zero with the noise generated using the spectral lines of the mixed channel, depending on a scale factor of each of the one or more scale factor bands within which all spectral lines are quantized to zero.

Embodiment 14: The apparatus (201) according to Embodiment 13,

wherein the receiving interface (212) is configured to receive the scale factor of each of the one or more scale factor bands, and

wherein the scale factor of each of the one or more scale factor bands indicates an energy of the spectral lines of that scale factor band before quantization, and

wherein the noise filling module (220) is adapted to generate the noise for each of the one or more scale factor bands within which all spectral lines are quantized to zero, so that the energy of the spectral lines after adding the noise to that frequency band corresponds to the energy indicated by the scale factor of that scale factor band.
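The energy condition of Embodiment 14 can be met by scaling the mixed channel's lines in an empty band with a single gain, chosen so that the band's energy after filling equals the target. The sketch below takes the target energy directly as a number; how a codec derives that energy from a transmitted scale factor is codec-specific and is not modelled here.

```python
import math


def fill_band_with_energy(spectrum, mixed, lo, hi, target_energy):
    """Fill the all-zero band spectrum[lo:hi] with scaled lines of the mixed channel.

    The gain is chosen so that the sum of squares in the band equals target_energy.
    """
    out = list(spectrum)
    src_energy = sum(x * x for x in mixed[lo:hi])
    if src_energy == 0.0:
        return out  # the mixed channel is silent here, so there is nothing to copy
    gain = math.sqrt(target_energy / src_energy)
    for k in range(lo, hi):
        out[k] += gain * mixed[k]  # band was zero, so its energy becomes target_energy
    return out
```

For a band of two lines where the mixed channel holds [3.0, 4.0] (energy 25) and the target energy is 100, the gain is 2 and the filled lines become [6.0, 8.0].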

Embodiment 15: An apparatus (100) for encoding a multichannel signal (101) having at least three channels (CH1:CH3), wherein the apparatus comprises:

an iterative processor (102) adapted to calculate, in a first iteration step, inter-channel correlation values between each pair of the at least three channels (CH1:CH3), to select, in the first iteration step, a pair having the highest value or having a value above a threshold, and to process the selected pair using a multichannel processing operation (110, 112) to derive initial multichannel parameters (MCH_PAR1) for the selected pair and to derive first processed channels (P1, P2),

wherein the iterative processor (102) is adapted to perform the calculating, the selecting and the processing in a second iteration step using at least one of the processed channels (P1) to derive further multichannel parameters (MCH_PAR2) and second processed channels (P3, P4);

a channel encoder adapted to encode channels (P2:P4) resulting from the iteration processing performed by the iterative processor (102) to obtain encoded channels (E1:E3); and

an output interface (106) adapted to generate an encoded multichannel signal (107) having the encoded channels (E1:E3), the initial multichannel parameters and the further multichannel parameters (MCH_PAR1, MCH_PAR2), and having information indicating whether or not an apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.
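The iterative pair selection of Embodiment 15 can be sketched as a greedy loop: compute a correlation for every channel pair, process the most correlated pair, and repeat on the updated channel set. In this minimal sketch the normalized lag-0 correlation, the mid/side operation, the threshold of 0.3 and the fixed number of iterations are all illustrative assumptions rather than the operations (110, 112) of the text.

```python
import itertools
import math


def correlation(a, b):
    """Normalized inter-channel correlation at lag 0 (one possible measure)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 0.0 if den == 0.0 else num / den


def mid_side(a, b):
    """Hypothetical pairwise processing operation: M = (a + b) / 2, S = (a - b) / 2."""
    return [(x + y) / 2 for x, y in zip(a, b)], [(x - y) / 2 for x, y in zip(a, b)]


def iterative_pairing(channels, iterations=2, threshold=0.3):
    """Greedy encoder loop: pick the most correlated pair, process it, repeat.

    Returns the processed channels and the selected identifier pairs in encoding
    order (the first entry plays the role of MCH_PAR1, the second of MCH_PAR2).
    """
    chans = dict(channels)
    params = []
    for _ in range(iterations):
        pairs = itertools.combinations(sorted(chans), 2)
        best = max(pairs, key=lambda p: abs(correlation(chans[p[0]], chans[p[1]])))
        if abs(correlation(chans[best[0]], chans[best[1]])) < threshold:
            break  # no remaining pair is correlated enough to be worth processing
        chans[best[0]], chans[best[1]] = mid_side(chans[best[0]], chans[best[1]])
        params.append(best)
    return chans, params
```

Because the decoder undoes the steps in reverse order, the pair found last by this loop is the one a decoder would process first.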

Embodiment 16: The apparatus (100) according to Embodiment 15,

wherein each of the initial multichannel parameters and the further multichannel parameters (MCH_PAR1, MCH_PAR2) indicates exactly two channels, each of the exactly two channels being one of the encoded channels (E1:E3), or one of the first or second processed channels (P1, P2, P3, P4), or one of the at least three channels (CH1:CH3), and

wherein the output interface (106) is adapted to generate the encoded multichannel signal (107) so that the information indicating whether or not an apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, comprises information that indicates, for each of the initial multichannel parameters and the further multichannel parameters (MCH_PAR1, MCH_PAR2), whether or not, for at least one of the exactly two channels indicated by that parameter, the apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with spectral data generated based on the previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

Embodiment 17: A system comprising:

an apparatus (100) for encoding according to Embodiment 15 or 16, and

an apparatus (201) for decoding according to any one of Embodiments 1 to 14,

wherein the apparatus (201) for decoding is configured to receive, from the apparatus (100) for encoding, the encoded multichannel signal (107) generated by the apparatus (100) for encoding.

Embodiment 18: A method for decoding a previously encoded multichannel signal of a previous frame to obtain three or more previous audio output channels, and for decoding a currently encoded multichannel signal (107) of a current frame to obtain three or more current audio output channels, wherein the method comprises:

receiving the currently encoded multichannel signal (107), and receiving side information comprising a first multichannel parameter (MCH_PAR2);

decoding the currently encoded multichannel signal of the current frame to obtain a set of three or more decoded channels (D1, D2, D3) of the current frame;

selecting a first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) depending on the first multichannel parameter (MCH_PAR2);

generating a first group of two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2) to obtain an updated set of three or more decoded channels (D3, P1*, P2*);

wherein, before the first pair of channels of the two or more processed channels (P1*, P2*) is generated based on the first selected pair of two decoded channels (D1, D2), the following steps are conducted:

identifying, for at least one of the two channels of the first selected pair of two decoded channels (D1, D2), one or more frequency bands within which all spectral lines are quantized to zero, generating a mixed channel using two or more, but not all, of the three or more previous audio output channels, and filling the spectral lines of the one or more frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixed channel, wherein the two or more previous audio output channels that are used for generating the mixed channel are selected from the three or more previous audio output channels depending on the side information.

Embodiment 19: A method for encoding a multichannel signal (101) having at least three channels (CH1:CH3), wherein the method comprises:

calculating, in a first iteration step, inter-channel correlation values between each pair of the at least three channels (CH1:CH3), selecting, in the first iteration step, a pair having the highest value or having a value above a threshold, and processing the selected pair using a multichannel processing operation (110, 112) to derive initial multichannel parameters (MCH_PAR1) for the selected pair and to derive first processed channels (P1, P2);

performing the calculating, the selecting and the processing in a second iteration step using at least one of the processed channels (P1) to derive further multichannel parameters (MCH_PAR2) and second processed channels (P3, P4);

encoding channels (P2:P4) resulting from the iteration processing performed by the iterative processor (102) to obtain encoded channels (E1:E3); and

generating an encoded multichannel signal (107) having the encoded channels (E1:E3), the initial multichannel parameters and the further multichannel parameters (MCH_PAR1, MCH_PAR2), and having information indicating whether or not an apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

Embodiment 20: A computer program for implementing the method according to Embodiment 18 or 19 when being executed on a computer or signal processor.

Embodiment 21: An encoded multichannel signal (107), comprising:

encoded channels (E1:E3),

multichannel parameters (MCH_PAR1, MCH_PAR2), and

information indicating whether or not an apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

Embodiment 22: The encoded multichannel signal (107) according to Embodiment 21,

wherein the encoded multichannel signal comprises two or more multichannel parameters (MCH_PAR1, MCH_PAR2) as the multichannel parameters (MCH_PAR1, MCH_PAR2),

wherein each of the two or more multichannel parameters (MCH_PAR1, MCH_PAR2) indicates exactly two channels, each of the exactly two channels being one of the encoded channels (E1:E3), or one of a plurality of processed channels (P1, P2, P3, P4), or one of at least three initial channels (CH1:CH3), and

wherein the information indicating whether or not an apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, comprises information that indicates, for each of the two or more multichannel parameters (MCH_PAR1, MCH_PAR2), whether or not, for at least one of the exactly two channels indicated by that parameter, the apparatus for decoding has to fill spectral lines of one or more frequency bands, within which all spectral lines are quantized to zero, with spectral data generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.
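The per-parameter signalling of Embodiments 21 and 22 can be pictured as a small container in which every channel-pair parameter carries its own stereo-filling flag. The class and field names below are hypothetical and do not reproduce any actual bitstream syntax; they only show how a decoder could find the pairs for which filling from previous output channels is requested.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChannelPairParam:
    """One multichannel parameter: two channel identifiers plus a filling flag."""
    channel_a: int
    channel_b: int
    stereo_filling: bool  # hypothetical flag: fill empty bands from previous outputs


@dataclass
class EncodedMultichannelSignal:
    """Hypothetical container for encoded channels and their pair parameters."""
    encoded_channels: List[bytes]
    params: List[ChannelPairParam] = field(default_factory=list)

    def pairs_needing_stereo_filling(self) -> List[Tuple[int, int]]:
        """Identifier pairs whose parameter requests filling from previous outputs."""
        return [(p.channel_a, p.channel_b) for p in self.params if p.stereo_filling]
```

Keeping the flag inside each pair parameter, rather than as one global switch, lets an encoder request stereo filling only for the channel pairs where it is expected to help.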

References

[1] ISO/IEC International Standard 23008-3:2015, "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio," March 2015.

[2] ISO/IEC Amendment 23008-3:2015/PDAM3, "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio, Amendment 3: MPEG-H 3D Audio Phase 2," July 2015.

[3] International Organization for Standardization, ISO/IEC 23003-3:2012, "Information Technology - MPEG audio - Part 3: Unified speech and audio coding," Geneva, Jan. 2012.

[4] ISO/IEC 23003-1:2007, "Information technology - MPEG audio technologies - Part 1: MPEG Surround."

[5] C. R. Helmrich, A. Niedermeier, S. Bayer, B. Edler, "Low-Complexity Semi-Parametric Joint-Stereo Audio Transform Coding," in Proc. EUSIPCO, Nice, September 2015.

[6] ETSI TS 103 190 V1.1.1 (2014-04), "Digital Audio Compression (AC-4) Standard."

[7] D. Yang, H. Ai, C. Kyriakakis, C.-C. Jay Kuo, "Adaptive Karhunen-Loeve Transform for Enhanced Multichannel Audio Coding," 2001, http://ict.usc.edu/pubs/Adaptive%20Karhunen-Loeve%20Transform%20for%20Enhanced%20Multichannel%20Audio%20Coding.pdf

[8] European Patent Application, Publication EP 2 830 060 A1, "Noise filling in multichannel audio coding," published on 28 January 2015.

[9] Internet Engineering Task Force (IETF), RFC 6716, "Definition of the Opus Audio Codec," Int. Standard, Sep. 2012. Available online at: http://tools.ietf.org/html/rfc6716

[10] International Organization for Standardization, ISO/IEC 14496-3:2009, "Information Technology - Coding of audio-visual objects - Part 3: Audio," Geneva, Switzerland, Aug. 2009.

[11] M. Neuendorf et al., "MPEG Unified Speech and Audio Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types," in Proc. 132nd AES Convention, Budapest, Hungary, Apr. 2012. Also to appear in the Journal of the AES, 2013.

Claims

1. An apparatus (201) for decoding a previously encoded multichannel signal of a previous frame to obtain three or more previous audio output channels and for decoding a currently encoded multichannel signal (107) of the current frame to obtain three or more current audio output channels.

The apparatus (201) comprises an interface (212), a channel decoder (202), a multichannel processor (204) for generating the three or more current audio output channels, and a noise filling module (220).

The interface (212) is adapted to receive the currently encoded multichannel signal (107) and to receive auxiliary information comprising a first multichannel parameter (MCH_PAR2).

The channel decoder (202) is adapted to decode the currently encoded multichannel signal of the current frame to obtain a set of three or more decoded channels (D1, D2, D3) of the current frame.

The multichannel processor (204) is adapted to select a first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) according to the first multichannel parameter (MCH_PAR2).

The multichannel processor (204) is adapted to generate a first set of two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2) to obtain an updated set of three or more decoded channels (D3, P1*, P2*).

Before the multichannel processor (204) generates the first set of two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2), the noise filling module (220) is adapted to identify, for at least one of the two channels of the first selected pair of two decoded channels (D1, D2), one or more frequency bands in which all spectral lines are quantized to zero, to generate a mixed channel using two or more, but not all, of the three or more previous audio output channels, and to fill the spectral lines of the one or more frequency bands in which all spectral lines are quantized to zero with noise generated using the spectral lines of the mixed channel, wherein the noise filling module (220) is adapted to select, according to the auxiliary information, the two or more previous audio output channels used for generating the mixed channel from the three or more previous audio output channels.
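The decoder-side noise filling of claim 1 can be illustrated with a short sketch. It is not the claimed implementation: it assumes each channel's spectrum is a plain list of quantized spectral values, that band limits are hypothetical half-open index ranges, that the auxiliary information has already been resolved into the names of the previous output channels to mix, and that the mix is a simple average whose lines are copied directly into each all-zero band (energy scaling is left out). All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Band = Tuple[int, int]  # hypothetical half-open range [start, stop) of spectral line indices


def zero_bands(spectrum: Sequence[float], bands: Sequence[Band]) -> List[Band]:
    """Return the non-empty bands in which every spectral line is quantized to zero."""
    return [(lo, hi) for lo, hi in bands
            if hi > lo and all(x == 0.0 for x in spectrum[lo:hi])]


def mix_previous(prev_outputs: Dict[str, List[float]], selected: Sequence[str]) -> List[float]:
    """Average two or more, but not all, previous output channels (one possible mix)."""
    if not 2 <= len(selected) < len(prev_outputs):
        raise ValueError("select two or more, but not all, previous output channels")
    length = len(prev_outputs[selected[0]])
    return [sum(prev_outputs[name][k] for name in selected) / len(selected)
            for k in range(length)]


def stereo_fill(spectrum: Sequence[float], bands: Sequence[Band],
                mixed: Sequence[float]) -> List[float]:
    """Copy the mixed channel's lines into every band of `spectrum` that is entirely zero."""
    filled = list(spectrum)
    for lo, hi in zero_bands(spectrum, bands):
        filled[lo:hi] = mixed[lo:hi]
    return filled


if __name__ == "__main__":
    previous = {"L": [1.0, 2.0, 3.0, 4.0], "R": [3.0, 2.0, 1.0, 0.0], "C": [9.0, 9.0, 9.0, 9.0]}
    mixed = mix_previous(previous, ["L", "R"])      # selection taken from the auxiliary information
    d1 = [0.5, 0.25, 0.0, 0.0]                      # second band was quantized to zero
    print(stereo_fill(d1, [(0, 2), (2, 4)], mixed))  # [0.5, 0.25, 2.0, 2.0]
```

Only the all-zero band receives content from the mixed channel; the band that still carries quantized lines is left untouched.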

2. The apparatus (201) according to claim 1,

The noise filling module (220) is adapted to generate the mixed channel using exactly two of the three or more previous audio output channels as the two or more previous audio output channels.

The noise filling module (220) is adapted to select the exactly two previous audio output channels from the three or more previous audio output channels based on the auxiliary information.

3. The apparatus (201) according to claim 2,

The noise filling module (220) is adapted to generate the mixed channel using exactly two of the previous audio output channels based on the equation

$$D_{ch} = (\hat{O}_1 + \hat{O}_2) \cdot d$$

or based on the equation

$$D_{ch} = (\hat{O}_1 - \hat{O}_2) \cdot d,$$

wherein $D_{ch}$ is the mixed channel, $\hat{O}_1$ is the first of the exactly two previous audio output channels, $\hat{O}_2$ is the second of the exactly two previous audio output channels and is different from the first, and $d$ is a real, positive scalar.
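Assuming the two equations of claim 3 are the scaled sum $D_{ch} = (\hat{O}_1 + \hat{O}_2) \cdot d$ and the scaled difference $D_{ch} = (\hat{O}_1 - \hat{O}_2) \cdot d$ of the two selected previous output channels, the mix reduces to one operation per spectral coefficient. In this sketch the channels are equal-length lists of floats, and the value d = 0.5 in the example is chosen only for illustration; the claim requires only that d be real and positive.

```python
from typing import List, Sequence


def mix_sum_or_difference(o1: Sequence[float], o2: Sequence[float], d: float,
                          difference: bool = False) -> List[float]:
    """Return D_ch = (O1 + O2) * d, or D_ch = (O1 - O2) * d if `difference` is set."""
    if d <= 0.0:
        raise ValueError("d must be a real, positive scalar")
    if len(o1) != len(o2):
        raise ValueError("both previous output channels need the same number of lines")
    sign = -1.0 if difference else 1.0
    return [(a + sign * b) * d for a, b in zip(o1, o2)]


if __name__ == "__main__":
    print(mix_sum_or_difference([1.0, 2.0], [3.0, -2.0], 0.5))        # [2.0, 0.0]
    print(mix_sum_or_difference([1.0, 2.0], [3.0, -2.0], 0.5, True))  # [-1.0, 2.0]
```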

4. The apparatus (201) according to claim 2,

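The noise filling module (220) is adapted to generate the mixed channel using exactly two of the previous audio output channels based on the equation

$$I_{ch} = \cos\alpha \cdot \hat{O}_1 + \sin\alpha \cdot \hat{O}_2$$

or based on the equation

$$Q_{ch} = -\sin\alpha \cdot \hat{O}_1 + \cos\alpha \cdot \hat{O}_2,$$

wherein $I_{ch}$ or $Q_{ch}$, respectively, is the mixed channel, $\hat{O}_1$ is the first of the exactly two previous audio output channels, $\hat{O}_2$ is the second of the exactly two previous audio output channels, and $\alpha$ is the rotation angle.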

5. The apparatus (201) according to claim 4,

The auxiliary information is the current auxiliary information assigned to the current frame.

The interface (212) is adapted to receive previous auxiliary information assigned to the previous frame, wherein the previous auxiliary information comprises a previous angle.

The interface (212) is adapted to receive the current auxiliary information comprising a current angle.

The noise filling module (220) is adapted to use the current angle of the current auxiliary information as the rotation angle α, and is adapted not to use the previous angle of the previous auxiliary information as the rotation angle α.
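Claims 4 and 5 replace the fixed sum or difference by a rotation through the angle α, and require that α be the angle carried in the current frame's auxiliary information rather than the previous frame's. The sketch below assumes the standard 2×2 rotation, giving $\cos\alpha \cdot \hat{O}_1 + \sin\alpha \cdot \hat{O}_2$ as the first rotated channel and $-\sin\alpha \cdot \hat{O}_1 + \cos\alpha \cdot \hat{O}_2$ as the second, with α in radians. `FrameSideInfo` is a hypothetical container, not a bitstream structure from any standard.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FrameSideInfo:
    """Hypothetical holder for the rotation angle transmitted with one frame."""
    angle: float  # radians


def mix_rotation(o1: Sequence[float], o2: Sequence[float], alpha: float,
                 second: bool = False) -> List[float]:
    """Rotate (O1, O2) by alpha; return cos*O1 + sin*O2, or -sin*O1 + cos*O2 if `second`."""
    c, s = math.cos(alpha), math.sin(alpha)
    if second:
        return [-s * a + c * b for a, b in zip(o1, o2)]
    return [c * a + s * b for a, b in zip(o1, o2)]


def mixed_channel(o1: Sequence[float], o2: Sequence[float],
                  previous: FrameSideInfo, current: FrameSideInfo) -> List[float]:
    """Build the mixed channel with the current frame's angle; `previous` is ignored on purpose."""
    return mix_rotation(o1, o2, current.angle)
```

Because the rotation is orthonormal, the two rotated channels together carry the same energy as $\hat{O}_1$ and $\hat{O}_2$; at α = 0 the first rotated channel is simply $\hat{O}_1$.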

6. The apparatus (201) according to any one of claims 2 to 5, wherein the noise filling module (220) is adapted to select exactly two previous audio output channels from the three or more previous audio output channels according to the first multichannel parameter (MCH_PAR2).

7. The apparatus (201) according to any one of claims 2 to 6,

The interface (212) is adapted to receive the currently encoded multichannel signal (107) and to receive the auxiliary information comprising the first multichannel parameter (MCH_PAR2) and a second multichannel parameter (MCH_PAR1).

The multichannel processor (204) is adapted to select a second selected pair of two decoded channels (P1*, D3) from the updated set of three or more decoded channels (D3, P1*, P2*) according to the second multichannel parameter (MCH_PAR1), wherein at least one channel (P1*) of the second selected pair of two decoded channels (P1*, D3) is one of the channels of the first set of two or more processed channels (P1*, P2*).

The multichannel processor (204) is adapted to generate a second set of two or more processed channels (P3*, P4*) based on the second selected pair of two decoded channels (P1*, D3) to further update the updated set of three or more decoded channels.

8. The apparatus (201) according to claim 7,

The multichannel processor (204) is adapted to generate the first set of two or more processed channels (P1*, P2*) by generating a first set of exactly two processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2).

The multichannel processor (204) is adapted to replace the first selected pair of two decoded channels (D1, D2) in the set of three or more decoded channels (D1, D2, D3) with the first set of exactly two processed channels (P1*, P2*) to obtain the updated set of three or more decoded channels (D3, P1*, P2*).

The multichannel processor (204) is adapted to generate the second set of two or more processed channels (P3*, P4*) by generating a second set of exactly two processed channels (P3*, P4*) based on the second selected pair of two decoded channels (P1*, D3).

The multichannel processor (204) is adapted to replace the second selected pair of two decoded channels (P1*, D3) in the updated set of three or more decoded channels (D3, P1*, P2*) with the second set of exactly two processed channels (P3*, P4*) to further update the set of three or more decoded channels.
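Claims 7 and 8 describe a decoder loop in which each multichannel parameter picks a pair from the current, possibly already updated, channel set, and the two processed channels take that pair's place. A minimal sketch, assuming channels are kept in a list whose positions are named by each parameter, and using an inverse mid/side rule (L = M + S, R = M - S) as a hypothetical stand-in for the pair operation, which the claims leave open:

```python
from typing import Callable, List, Sequence, Tuple

Spectrum = List[float]
PairOp = Callable[[Spectrum, Spectrum], Tuple[Spectrum, Spectrum]]


def inverse_mid_side(m: Spectrum, s: Spectrum) -> Tuple[Spectrum, Spectrum]:
    """Hypothetical pair operation: recover (L, R) from (M, S) via L = M + S, R = M - S."""
    return [a + b for a, b in zip(m, s)], [a - b for a, b in zip(m, s)]


def process_pairs(decoded: Sequence[Spectrum], pair_params: Sequence[Tuple[int, int]],
                  op: PairOp = inverse_mid_side) -> List[Spectrum]:
    """Apply the pair operations in decoding order, replacing each selected pair in the set.

    Every entry of `pair_params` names two positions of the current, possibly already
    updated, channel set; the two processed channels are written back to those positions.
    """
    channels = [list(ch) for ch in decoded]  # copy so the decoded input stays untouched
    for i, j in pair_params:
        channels[i], channels[j] = op(channels[i], channels[j])
    return channels


if __name__ == "__main__":
    d = [[1.0], [0.5], [2.0]]                   # D1, D2, D3
    print(process_pairs(d, [(0, 1), (0, 2)]))  # [[3.5], [0.5], [-0.5]]
```

In the claims' notation the decoder applies MCH_PAR2 (here `(0, 1)`, yielding P1* and P2*) before MCH_PAR1 (here `(0, 2)`, pairing P1* with D3), so `pair_params` is listed in decoding order.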

9. The apparatus (201) according to claim 8,

The first multichannel parameter (MCH_PAR2) indicates two decoded channels (D1, D2) of the set of three or more decoded channels.

The multichannel processor (204) is adapted to select the first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) by selecting the two decoded channels (D1, D2) indicated by the first multichannel parameter (MCH_PAR2).

The second multichannel parameter (MCH_PAR1) indicates two decoded channels (P1*, D3) of the updated set of three or more decoded channels.

The multichannel processor (204) is adapted to select the second selected pair of two decoded channels (P1*, D3) from the updated set of three or more decoded channels (D3, P1*, P2*) by selecting the two decoded channels (P1*, D3) indicated by the second multichannel parameter (MCH_PAR1).

10. The apparatus (201) according to claim 9,

The apparatus (201) is adapted to assign an identifier from an identifier set to each of the three or more previous audio output channels, such that each of the three or more previous audio output channels is assigned exactly one identifier of the identifier set, and such that each identifier of the identifier set is assigned to exactly one of the three or more previous audio output channels.

The apparatus (201) is adapted to assign an identifier from the identifier set to each channel of the set of three or more decoded channels (D1, D2, D3), such that each channel of the set of three or more decoded channels is assigned exactly one identifier of the identifier set, and such that each identifier of the identifier set is assigned to exactly one channel of the set of three or more decoded channels (D1, D2, D3).

The first multichannel parameter (MCH_PAR2) indicates a first pair of two identifiers of the set of three or more identifiers.

The multichannel processor (204) is adapted to select the first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3) by selecting the two decoded channels (D1, D2) to which the two identifiers of the first pair of two identifiers are assigned.

The apparatus (201) is adapted to assign the first of the two identifiers of the first pair of two identifiers to the first processed channel of the first set of exactly two processed channels (P1*, P2*), and is adapted to assign the second of the two identifiers of the first pair of two identifiers to the second processed channel of the first set of exactly two processed channels (P1*, P2*).

11. The apparatus (201) according to claim 10,

The second multichannel parameter (MCH_PAR1) indicates a second pair of two identifiers of the set of three or more identifiers.

The multichannel processor (204) is adapted to select the second selected pair of two decoded channels (P1*, D3) from the updated set of three or more decoded channels (D3, P1*, P2*) by selecting the two decoded channels (P1*, D3) to which the two identifiers of the second pair of two identifiers are assigned.

The apparatus (201) is adapted to assign the first of the two identifiers of the second pair of two identifiers to the first processed channel of the second set of exactly two processed channels (P3*, P4*), and is adapted to assign the second of the two identifiers of the second pair of two identifiers to the second processed channel of the second set of exactly two processed channels (P3*, P4*).

12. The apparatus (201) according to claim 10 or 11,

The first multichannel parameter (MCH_PAR2) indicates the first pair of two identifiers of the set of three or more identifiers.

The noise filling module (220) is adapted to select the exactly two previous audio output channels from the three or more previous audio output channels by selecting the two previous audio output channels to which the two identifiers of the first pair of two identifiers are assigned.
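Claims 10 to 12 link the current decoded channels and the previous output channels through one shared identifier set. A sketch, assuming integer identifiers used as dictionary keys (a hypothetical representation): the identifier pair carried by MCH_PAR2 selects both the decoded pair to be processed and the previous output channels to be mixed for noise filling, and the processed channels inherit the pair's identifiers.

```python
from typing import Dict, List, Tuple

Spectrum = List[float]
IdPair = Tuple[int, int]


def select_by_ids(channels: Dict[int, Spectrum], id_pair: IdPair) -> Tuple[Spectrum, Spectrum]:
    """Return the two channels to which the identifiers of `id_pair` are assigned."""
    first, second = id_pair
    if first == second:
        raise ValueError("a pair must name two different identifiers")
    return channels[first], channels[second]


def assign_processed(channels: Dict[int, Spectrum], id_pair: IdPair,
                     p_first: Spectrum, p_second: Spectrum) -> Dict[int, Spectrum]:
    """Give the pair's first identifier to the first processed channel, the second to the second."""
    updated = dict(channels)
    updated[id_pair[0]], updated[id_pair[1]] = p_first, p_second
    return updated


if __name__ == "__main__":
    decoded = {0: [0.0, 1.0], 1: [0.0, 0.0], 2: [5.0, 5.0]}   # D1, D2, D3
    previous = {0: [2.0, 2.0], 1: [4.0, 0.0], 2: [7.0, 7.0]}  # previous output channels
    mch_par2 = (0, 1)                                          # first pair of identifiers
    d1, d2 = select_by_ids(decoded, mch_par2)
    o1, o2 = select_by_ids(previous, mch_par2)  # same identifiers pick the inputs of the mix
    print(o1, o2)  # [2.0, 2.0] [4.0, 0.0]
```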

13. The apparatus (201) according to any one of the preceding claims, wherein, before the multichannel processor (204) generates the first set of two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2), the noise filling module (220) is adapted to identify, for at least one of the two channels of the first selected pair of two decoded channels (D1, D2), one or more scaling factor bands in which all spectral lines are quantized to zero, the one or more scaling factor bands being the one or more frequency bands, to generate the mixed channel using two or more, but not all, of the three or more previous audio output channels, and to fill the spectral lines of each of the one or more scaling factor bands in which all spectral lines are quantized to zero with noise generated using the spectral lines of the mixed channel, according to the scaling factor of that scaling factor band.

14. The apparatus (201) according to claim 13,

The interface (212) is configured to receive the scaling factor of each of the one or more scaling factor bands.

The scaling factor of each of the one or more scaling factor bands indicates the energy of the spectral lines of that scaling factor band before quantization.

The noise filling module (220) is adapted to generate the noise for each of the one or more scaling factor bands in which all spectral lines are quantized to zero such that, after the noise has been added to that scaling factor band, the energy of its spectral lines corresponds to the energy indicated by the scaling factor of that scaling factor band.
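The energy condition of claim 14 amounts to a per-band gain. The sketch below assumes the decoder has already converted each scaling factor into a target band energy (that conversion is codec-specific and not shown), takes the shape of the fill from the mixed channel, and falls back to seeded pseudo-random values if the mixed channel is silent in that band; the fallback and all names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def fill_band_to_energy(spectrum: List[float], band: Tuple[int, int], source: Sequence[float],
                        target_energy: float, rng: Optional[random.Random] = None) -> None:
    """Fill an all-zero band of `spectrum` in place so that its energy equals `target_energy`.

    The fill takes its shape from `source` (e.g. the mixed channel); if `source` is silent
    in this band, pseudo-random values are used instead (a hypothetical fallback).
    """
    lo, hi = band
    if any(x != 0.0 for x in spectrum[lo:hi]):
        raise ValueError("band is not quantized to zero")
    shape = list(source[lo:hi])
    energy = sum(x * x for x in shape)
    if energy == 0.0:
        if rng is None:
            rng = random.Random(0)
        shape = [rng.uniform(-1.0, 1.0) for _ in shape]
        energy = sum(x * x for x in shape)
    gain = math.sqrt(target_energy / energy) if energy > 0.0 else 0.0
    spectrum[lo:hi] = [gain * x for x in shape]  # band energy is now target_energy
```

For example, filling band (1, 3) of `[1.0, 0.0, 0.0]` from a source whose lines there are 3.0 and 4.0, with a target energy of 1.0, scales them by 0.2 to about 0.6 and 0.8.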

15. An apparatus (100) for encoding a multichannel signal (101) having at least three channels (CH1:CH3), wherein the apparatus comprises:

An iterative processor (102) adapted to calculate, in a first iteration step, inter-channel correlation values between each pair of the at least three channels (CH1:CH3), to select, in the first iteration step, the channel pair having the highest value or having a value above a threshold, and to process the selected channel pair using a multichannel processing operation (110, 112) to derive initial multichannel parameters (MCH_PAR1) for the selected channel pair and to derive first processed channels (P1, P2);

wherein the iterative processor (102) is adapted to perform the calculating, the selecting and the processing in a second iteration step using at least one of the processed channels (P1) to derive other multichannel parameters (MCH_PAR2) and second processed channels (P3, P4);

A channel encoder (104) adapted to encode the channels (P2:P4) obtained by the iterative processing performed by the iterative processor (102) to obtain encoded channels (E1:E3); and

An output interface (106) adapted to generate an encoded multichannel signal (107) having the encoded channels (E1:E3), the initial multichannel parameters and the other multichannel parameters (MCH_PAR1, MCH_PAR2), and having information indicating whether an apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.
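The iterative pair selection of claim 15 can be sketched as a greedy loop. Several assumptions are not taken from the claim: the inter-channel correlation is the absolute normalized correlation at lag zero, the multichannel processing operation is a plain mid/side transform, a pair is processed only if the highest correlation exceeds `threshold`, and the recorded index pairs stand in for the multichannel parameters (MCH_PAR1, MCH_PAR2, ...).

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Signal = List[float]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute normalized inter-channel correlation at lag zero (0.0 if a channel is silent)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(num) / den if den > 0.0 else 0.0


def mid_side(x: Signal, y: Signal) -> Tuple[Signal, Signal]:
    """Hypothetical pair operation: M = (x + y) / 2 and S = (x - y) / 2."""
    return [(a + b) / 2 for a, b in zip(x, y)], [(a - b) / 2 for a, b in zip(x, y)]


def iterate_pairs(channels: Dict[int, Signal], steps: int,
                  threshold: float = 0.0) -> Tuple[Dict[int, Signal], List[Tuple[int, int]]]:
    """Each step: score all pairs, process the best one in place, record it as a parameter."""
    chans = {k: list(v) for k, v in channels.items()}
    params: List[Tuple[int, int]] = []
    for _ in range(steps):
        scored = [(correlation(chans[i], chans[j]), (i, j))
                  for i, j in itertools.combinations(sorted(chans), 2)]
        best_value, best_pair = max(scored)
        if best_value <= threshold:
            break  # no remaining pair is correlated enough to be worth processing
        i, j = best_pair
        chans[i], chans[j] = mid_side(chans[i], chans[j])  # processed channels replace the pair
        params.append(best_pair)
    return chans, params
```

With three channels in which channels 0 and 1 are identical, the first step pairs 0 and 1; the second step then pairs the mid channel now at position 0 with channel 2, so the second parameter involves a channel produced by the first step, as the claim requires.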

16. The apparatus (100) according to claim 15,

Each of the initial multichannel parameters and the other multichannel parameters (MCH_PAR1, MCH_PAR2) indicates exactly two channels, each of which is one of the encoded channels (E1:E3), one of the first or second processed channels (P1, P2, P3, P4), or one of the at least three channels (CH1:CH3).

The output interface (106) is adapted to generate the encoded multichannel signal (107) such that the information indicating whether the apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero comprises information indicating, for each of the initial multichannel parameters and the other multichannel parameters (MCH_PAR1, MCH_PAR2), whether, for at least one of the exactly two channels indicated by that parameter, the apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero with spectral data generated based on previously decoded audio output channels that have previously been decoded by the apparatus for decoding.

17. A system comprising:

The apparatus (100) for encoding according to claim 15 or 16, and

The apparatus (201) for decoding according to any one of claims 1 to 14,

wherein the apparatus (201) for decoding is configured to receive, from the apparatus (100) for encoding, the encoded multichannel signal (107) generated by the apparatus (100) for encoding.

18. A method for decoding a previously encoded multichannel signal of a previous frame to obtain three or more previous audio output channels, and for decoding a currently encoded multichannel signal (107) of a current frame to obtain three or more current audio output channels, wherein the method comprises:

Receiving the currently encoded multichannel signal (107), and receiving auxiliary information comprising a first multichannel parameter (MCH_PAR2);

Decoding the currently encoded multichannel signal of the current frame to obtain a set of three or more decoded channels (D1, D2, D3) of the current frame;

Selecting, according to the first multichannel parameter (MCH_PAR2), a first selected pair of two decoded channels (D1, D2) from the set of three or more decoded channels (D1, D2, D3);

Generating a first set of two or more processed channels (P1*, P2*) based on the first selected pair of two decoded channels (D1, D2) to obtain an updated set of three or more decoded channels (D3, P1*, P2*);

Before the first set of two or more processed channels (P1*, P2*) is generated based on the first selected pair of two decoded channels (D1, D2), the following steps are performed:

Identifying, for at least one of the two channels of the first selected pair of two decoded channels (D1, D2), one or more frequency bands in which all spectral lines are quantized to zero; generating a mixed channel using two or more, but not all, of the three or more previous audio output channels, the two or more previous audio output channels being selected from the three or more previous audio output channels according to the auxiliary information; and filling the spectral lines of the one or more frequency bands in which all spectral lines are quantized to zero with noise generated using the spectral lines of the mixed channel.

19. A method for encoding a multichannel signal (101) having at least three channels (CH1:CH3), wherein the method comprises:

Calculating, in a first iteration step, inter-channel correlation values between each pair of the at least three channels (CH1:CH3), selecting, in the first iteration step, the channel pair having the highest value or having a value above a threshold, and processing the selected channel pair using a multichannel processing operation (110, 112) to derive initial multichannel parameters (MCH_PAR1) for the selected channel pair and to derive first processed channels (P1, P2);

Performing the calculating, the selecting and the processing in a second iteration step using at least one of the processed channels (P1) to derive other multichannel parameters (MCH_PAR2) and second processed channels (P3, P4);

Encoding the channels (P2:P4) obtained by the iterative processing to obtain encoded channels (E1:E3); and

Generating an encoded multichannel signal (107) having the encoded channels (E1:E3), the initial multichannel parameters and the other multichannel parameters (MCH_PAR1, MCH_PAR2), and having information indicating whether an apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

20. A computer program for implementing the method according to claim 18 or 19 when being executed on a computer or signal processor.

21. An encoded multichannel signal (107), comprising:

Encoded audio channels (E1:E3);

Multichannel parameters (MCH_PAR1, MCH_PAR2); and

Information indicating whether an apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero with noise generated based on previously decoded audio output channels that have been previously decoded by the apparatus for decoding.

22. The encoded multi-channel signal (107) according to claim 21,

The encoded multichannel signal comprises two or more multichannel parameters as the multichannel parameters (MCH_PAR1, MCH_PAR2).

Each of the two or more multichannel parameters (MCH_PAR1, MCH_PAR2) indicates exactly two channels, each of which is one of the encoded channels (E1:E3), one of a plurality of processed channels (P1, P2, P3, P4), or one of at least three initial channels (CH1:CH3).

The information indicating whether the apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero comprises information indicating, for each of the two or more multichannel parameters (MCH_PAR1, MCH_PAR2), whether, for at least one of the exactly two channels indicated by that parameter, the apparatus for decoding has to fill the spectral lines of one or more frequency bands in which all spectral lines are quantized to zero with spectral data generated based on previously decoded audio output channels that have previously been decoded by the apparatus for decoding.
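Claims 21 and 22 describe what the encoded multichannel signal carries: encoded channels, pairwise multichannel parameters, and per-parameter information on whether stereo filling applies. A minimal in-memory sketch of that content, assuming integer channel identifiers and one boolean flag per parameter; these classes are hypothetical and say nothing about the actual bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PairParameter:
    """One multichannel parameter: the two channels it names plus a stereo-filling flag."""
    channels: Tuple[int, int]     # hypothetical channel identifiers
    stereo_filling: bool = False  # fill all-zero bands from previous output channels?

    def __post_init__(self) -> None:
        if len(self.channels) != 2 or self.channels[0] == self.channels[1]:
            raise ValueError("a multichannel parameter names exactly two different channels")


@dataclass
class EncodedMultichannelSignal:
    """Hypothetical in-memory view of the encoded multichannel signal (107)."""
    encoded_channels: List[bytes]
    parameters: List[PairParameter] = field(default_factory=list)

    def pairs_needing_fill(self) -> List[Tuple[int, int]]:
        """Channel pairs for which the decoder is told to apply stereo filling."""
        return [p.channels for p in self.parameters if p.stereo_filling]
```

Keeping the flag on each parameter mirrors claim 22, where the fill information is given separately for each of the two or more multichannel parameters.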