HK40001584B

HK40001584B - Audio encoder and decoder

Info

Publication number: HK40001584B
Application number: HK19124847.5A
Authority: HK
Inventors: K. Kjörling; H. Purnhagen; H. Mundt; K. J. Rödén; L. Sehlström
Original assignee: Dolby International AB
Priority date: 2013-04-05
Filing date: 2019-06-06
Publication date: 2023-12-22

Description

Audio encoders and decoders

This application is a divisional application of invention patent application No. 201480011081.3, filed on April 4, 2014 and entitled "Audio Encoder and Decoder".

Cross-reference to related applications

This application claims priority to U.S. Provisional Patent Application No. 61/808680, filed April 5, 2013, the entire contents of which are incorporated herein by reference.

Technical Field

The disclosure herein generally relates to multichannel audio coding. In particular, it relates to encoders and decoders for hybrid coding comprising parametric coding and discrete multichannel coding.

Background

In conventional multichannel audio coding, possible coding schemes include discrete multichannel coding and parametric coding such as MPEG Surround. The scheme used depends on the bandwidth of the audio system. Parametric coding methods are known to be scalable and efficient in terms of listening quality, which makes them particularly attractive in low bit rate applications. In high bit rate applications, discrete multichannel coding is often used. From a bandwidth efficiency point of view, existing distribution or processing formats and the associated coding techniques can be improved, especially in applications with a bit rate between the low and the high bit rates.

US7292901 (Kroon et al.) relates to a hybrid coding method in which a hybrid audio signal is formed from at least one downmixed spectral component and at least one unmixed spectral component. The method proposed in that application may increase the capability of applications with a certain bit rate, but further improvements may be needed to increase the efficiency of an audio processing system.

Summary of the Invention

According to one aspect of the present invention, there is provided a method for decoding a time frame of an encoded audio bitstream in an audio processing system, the method comprising:

receiving, for the time frame, M upmix signals comprising spectral coefficients corresponding to frequencies above a first crossover frequency,

wherein the M upmix signals are the result of upmixing, for the time frame, N frequency-extended downmix signals into M upmix signals, the N frequency-extended downmix signals being obtained by performing frequency reconstruction above a second crossover frequency in a reconstruction range, wherein the second crossover frequency is higher than the first crossover frequency and the frequency reconstruction uses reconstruction parameters derived from the encoded audio bitstream;

extracting, for the time frame, a further waveform-coded signal from the encoded audio bitstream, the further waveform-coded signal comprising spectral coefficients corresponding to a subset of the frequencies above the first crossover frequency; and

interleaving, for the time frame, the further waveform-coded signal with one of the M upmix signals to produce an interleaved signal.

Brief Description of the Drawings

Example embodiments will now be described with reference to the accompanying drawings, in which:

Figure 1 is a generalized block diagram of a decoding system according to an example embodiment;

Figure 2 shows a first part of the decoding system in Figure 1;

Figure 3 shows a second part of the decoding system in Figure 1;

Figure 4 shows a third part of the decoding system in Figure 1;

Figure 5 is a generalized block diagram of an encoding system according to an example embodiment;

Figure 6 is a generalized block diagram of a decoding system according to an example embodiment;

Figure 7 shows a third part of the decoding system in Figure 6; and

Figure 8 is a generalized block diagram of an encoding system according to an example embodiment.

All the figures are schematic and generally show only the parts necessary to elucidate the disclosure, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.

Detailed Description

Overview – Decoder

As used herein, an audio signal may be a pure audio signal, the audio part of an audiovisual signal or multimedia signal, or any of these in combination with metadata.

As used herein, downmixing a plurality of signals means combining the signals, for example by forming linear combinations, such that a smaller number of signals is obtained. The reverse operation is referred to as upmixing, that is, performing an operation on a smaller number of signals to obtain a larger number of signals.
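The definition of downmixing as a linear combination can be illustrated with a minimal sketch. The channel order and the mixing weights below are hypothetical and chosen only for illustration; the text does not prescribe a particular downmix matrix.

```python
def mix(matrix, signals):
    """Apply a mixing matrix (one row per output signal) to equally long inputs."""
    n_samples = len(signals[0])
    return [
        [sum(w * sig[t] for w, sig in zip(row, signals)) for t in range(n_samples)]
        for row in matrix
    ]


# Hypothetical weights: M = 5 channels (L, R, C, Ls, Rs) -> N = 2 downmix signals.
DOWNMIX = [
    [1.0, 0.0, 0.5, 1.0, 0.0],  # left downmix: L + 0.5*C + Ls
    [0.0, 1.0, 0.5, 0.0, 1.0],  # right downmix: R + 0.5*C + Rs
]

channels = [[1.0, 2.0], [3.0, 4.0], [2.0, 2.0], [0.5, 0.5], [1.5, 1.5]]
downmix = mix(DOWNMIX, channels)
```

The same `mix` helper applied with a matrix that has more rows than columns would produce a larger number of signals from a smaller number, which corresponds to upmixing in the sense defined above.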

According to a first aspect, example embodiments propose methods, devices and computer program products for reconstructing a multichannel audio signal based on an input signal. The proposed methods, devices and computer program products may generally have the same features and advantages.

According to example embodiments, there is provided a decoder for a multichannel audio processing system for reconstructing M encoded channels, where M > 2. The decoder comprises a first receiving stage configured to receive N waveform-coded downmix signals comprising spectral coefficients corresponding to frequencies between a first and a second crossover frequency, where 1 < N < M.

The decoder further comprises a second receiving stage configured to receive M waveform-coded signals comprising spectral coefficients corresponding to frequencies up to the first crossover frequency, each of the M waveform-coded signals corresponding to a respective one of the M encoded channels.

The decoder further comprises a downmix stage downstream of the second receiving stage, configured to downmix the M waveform-coded signals into N downmix signals comprising spectral coefficients corresponding to frequencies up to the first crossover frequency.

The decoder further comprises a first combining stage downstream of the first receiving stage and the downmix stage, configured to combine each of the N downmix signals received by the first receiving stage with a corresponding one of the N downmix signals from the downmix stage into N combined downmix signals.

The decoder further comprises a high-frequency reconstruction stage downstream of the first combining stage, configured to extend each of the N combined downmix signals from the combining stage to a frequency range above the second crossover frequency by performing high-frequency reconstruction.

The decoder further comprises an upmix stage downstream of the high-frequency reconstruction stage, configured to perform a parametric upmix of the N frequency-extended signals from the high-frequency reconstruction stage into M upmix signals comprising spectral coefficients corresponding to frequencies above the first crossover frequency, each of the M upmix signals corresponding to one of the M encoded channels.

The decoder further comprises a second combining stage downstream of the upmix stage and the second receiving stage, configured to combine the M upmix signals from the upmix stage with the M waveform-coded signals received by the second receiving stage.

The M waveform-coded signals are purely waveform-coded signals with no parametric signals mixed in, that is, they are a non-downmixed discrete representation of the processed multichannel audio signal. An advantage of having the lower frequencies represented in these waveform-coded signals may be that the human ear is more sensitive to the low-frequency part of an audio signal. By coding this part with better quality, the overall impression of the decoded audio may improve.

An advantage of having at least two downmix signals is that this embodiment provides an increased dimensionality of the downmix signals compared to systems with only one downmix channel. According to this embodiment, a better decoded audio quality may thus be provided, which may outweigh the gain in bit rate provided by a system with one downmix signal.

An advantage of using hybrid coding comprising parametric downmix and discrete multichannel coding is that it may improve the quality of the decoded audio signal at certain bit rates compared to a conventional parametric coding approach, that is, MPEG Surround with HE-AAC. At bit rates around 72 kilobits per second (kbps), the conventional parametric coding model may saturate, that is, the quality of the decoded audio signal is limited by the shortcomings of the parametric model rather than by a lack of bits for coding. Consequently, for bit rates from around 72 kbps, it may be more beneficial to spend bits on discretely waveform-coding the lower frequencies. Likewise, an advantage of the hybrid approach using parametric downmix and discrete multichannel coding is that it may improve the quality of the decoded audio at certain bit rates, for example at or below 128 kbps, compared to an approach in which all bits are spent on waveform-coding the lower frequencies and spectral band replication (SBR) is used for the remaining frequencies.

An advantage of having N waveform-coded downmix signals that only comprise spectral data corresponding to frequencies between the first and the second crossover frequency is that the required bit transmission rate of the audio signal processing system may be decreased. Alternatively, the bits saved by having band-pass filtered downmix signals may be used for waveform-coding the lower frequencies; for example, the sampling frequency for those frequencies may be higher or the first crossover frequency may be increased.

As mentioned above, since the human ear is more sensitive to the low-frequency part of an audio signal, the high frequencies, that is, the part of the audio signal having frequencies above the second crossover frequency, may be recreated by high-frequency reconstruction without reducing the perceived audio quality of the decoded audio signal.

A further advantage of this embodiment may be that, since the parametric upmix performed in the upmix stage operates only on spectral coefficients corresponding to frequencies above the first crossover frequency, the complexity of the upmix is reduced.

According to another embodiment, the combining performed in the first combining stage is performed in a frequency domain. In this combining, each of the N waveform-coded downmix signals comprising spectral coefficients corresponding to frequencies between the first and the second crossover frequency is combined with the corresponding one of the N downmix signals comprising spectral coefficients corresponding to frequencies up to the first crossover frequency, yielding N combined downmix signals.
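A minimal sketch of such a frequency-domain combination for a single downmix signal and a single time slot follows. It assumes that each spectrum is a list of per-band coefficients and that the crossover frequencies are expressed as band indices `k_y` (first) and `k_x` (second); the function name and this representation are illustrative assumptions, not taken from the text.

```python
def combine_bands(low, mid, k_y, k_x):
    """Combine two band-limited spectra of one downmix signal.

    Bands [0, k_y) are taken from `low` (downmix of the M waveform-coded signals),
    bands [k_y, k_x) from `mid` (received waveform-coded downmix signal).
    Bands from k_x upward are left at zero for later high-frequency reconstruction.
    """
    if len(low) != len(mid):
        raise ValueError("spectra must have the same number of bands")
    if not 0 <= k_y <= k_x <= len(low):
        raise ValueError("expected 0 <= k_y <= k_x <= number of bands")
    return low[:k_y] + mid[k_y:k_x] + [0.0] * (len(low) - k_x)


low = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]   # content only below k_y
mid = [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]   # content only between k_y and k_x
combined = combine_bands(low, mid, k_y=2, k_x=4)
```

Because the two inputs occupy disjoint frequency ranges, selecting bands as above gives the same result as adding the two spectra band by band.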

An advantage of this embodiment may be that the M waveform-coded signals and the N waveform-coded downmix signals can be coded by a waveform coder using overlapping windowed transforms with independent windowing for the M waveform-coded signals and the N waveform-coded downmix signals, respectively, and still be decodable by the decoder.

According to another embodiment, extending each of the N combined downmix signals to a frequency range above the second crossover frequency in the high-frequency reconstruction stage is performed in a frequency domain.

According to another embodiment, the combining performed in the second combining stage, that is, the combining of the M upmix signals comprising spectral coefficients corresponding to frequencies above the first crossover frequency with the M waveform-coded signals comprising spectral coefficients corresponding to frequencies up to the first crossover frequency, is performed in a frequency domain. As mentioned above, an advantage of combining the signals in the QMF domain is that independent windowing of the overlapping windowed transforms used to code the signals in the MDCT domain may be used.

According to another embodiment, the parametric upmix of the N frequency-extended combined downmix signals into M upmix signals, performed in the upmix stage, is performed in a frequency domain.

According to yet another embodiment, downmixing the M waveform-coded signals into N downmix signals comprising spectral coefficients corresponding to frequencies up to the first crossover frequency is performed in a frequency domain.

According to an embodiment, the frequency domain is a quadrature mirror filter (QMF) domain.

According to another embodiment, the downmixing performed in the downmix stage, in which the M waveform-coded signals are downmixed into N downmix signals comprising spectral coefficients corresponding to frequencies up to the first crossover frequency, is performed in the time domain.

According to yet another embodiment, the first crossover frequency depends on the bit transmission rate of the multichannel audio processing system. This may result in the available bandwidth being used to improve the quality of the decoded audio signal, since the part of the audio signal having frequencies below the first crossover frequency is purely waveform-coded.

According to another embodiment, extending each of the N combined downmix signals to a frequency range above the second crossover frequency by performing high-frequency reconstruction in the high-frequency reconstruction stage is performed using high-frequency reconstruction parameters. The high-frequency reconstruction parameters may be received by the decoder, for example at a receiving stage, and then sent to the high-frequency reconstruction stage. The high-frequency reconstruction may, for example, comprise performing spectral band replication (SBR).

According to another embodiment, the parametric upmix in the upmix stage is performed using upmix parameters. The upmix parameters are received by the decoder, for example at a receiving stage, and sent to the upmix stage. Decorrelated versions of the N frequency-extended combined downmix signals are generated, and the N frequency-extended combined downmix signals and their decorrelated versions are subjected to a matrix operation. The parameters of the matrix operation are given by the upmix parameters.
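The matrix operation on the downmix signals and their decorrelated versions can be sketched as follows. This is only an illustration under stated assumptions: the decorrelator is replaced by a plain one-sample delay (a practical decorrelator would typically be an all-pass style filter), and the matrix entries stand in for values that would be derived from the upmix parameters. All names and numbers are hypothetical.

```python
def decorrelate(signal, delay=1):
    """Placeholder decorrelator (assumption): a plain delay, zero-padded at the start."""
    return ([0.0] * delay + list(signal))[:len(signal)]


def parametric_upmix(downmix, upmix_matrix):
    """Upmix N signals to M signals.

    `upmix_matrix` has M rows and 2N columns: the first N columns weight the
    downmix signals, the last N columns weight their decorrelated versions.
    """
    stacked = list(downmix) + [decorrelate(s) for s in downmix]
    n_samples = len(downmix[0])
    return [
        [sum(c * s[t] for c, s in zip(row, stacked)) for t in range(n_samples)]
        for row in upmix_matrix
    ]


# N = 2 downmix signals, M = 5 outputs; matrix values are purely illustrative.
downmix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
UPMIX = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, -0.5],
]
upmixed = parametric_upmix(downmix, UPMIX)
```

In the decoder described above, this operation would only be applied to spectral coefficients above the first crossover frequency.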

According to another embodiment, the N waveform-coded downmix signals received in the first receiving stage and the M waveform-coded signals received in the second receiving stage are coded using overlapping windowed transforms with independent windowing for the N waveform-coded downmix signals and the M waveform-coded signals, respectively.

An advantage of this may be that it allows for improved coding quality and thus improved quality of the decoded multichannel audio signal. For example, if a transient is detected in the higher frequency bands at a certain point in time, the waveform coder may code that particular time frame with a shorter window sequence, while the default window sequence may be kept for the lower frequency bands.

According to embodiments, the decoder may comprise a third receiving stage configured to receive a further waveform-coded signal comprising spectral coefficients corresponding to a subset of the frequencies above the first crossover frequency. The decoder may further comprise an interleaving stage downstream of the upmix stage. The interleaving stage may be configured to interleave the further waveform-coded signal with one of the M upmix signals. The third receiving stage may further be configured to receive a plurality of further waveform-coded signals, and the interleaving stage may further be configured to interleave the plurality of further waveform-coded signals with a plurality of the M upmix signals.

This is advantageous in that certain parts of the frequency range above the first crossover frequency that are difficult to reconstruct parametrically from the downmix signals may be provided in waveform-coded form for interleaving with the parametrically reconstructed upmix signals.

In one example embodiment, the interleaving is performed by adding the further waveform-coded signal to one of the M upmix signals. According to another example embodiment, the step of interleaving the further waveform-coded signal with one of the M upmix signals comprises replacing the one of the M upmix signals with the further waveform-coded signal in the subset of the frequencies above the first crossover frequency that corresponds to the spectral coefficients of the further waveform-coded signal.
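The two interleaving variants, addition and replacement, can be sketched for one channel and one time slot as follows. The per-band list representation, the `bands` argument and the function name are assumptions made for illustration.

```python
def interleave(upmix, extra, bands, mode="replace"):
    """Interleave a further waveform-coded spectrum into one upmix spectrum.

    `bands` lists the band indices (above the first crossover frequency)
    carried by `extra`. mode="replace" overwrites the upmix coefficients in
    those bands; mode="add" sums the two signals there.
    """
    if mode not in ("replace", "add"):
        raise ValueError("mode must be 'replace' or 'add'")
    out = list(upmix)
    for k in bands:
        out[k] = extra[k] if mode == "replace" else out[k] + extra[k]
    return out


upmix = [1.0, 2.0, 3.0, 4.0]
extra = [0.0, 0.0, 10.0, 20.0]
replaced = interleave(upmix, extra, bands=[2, 3], mode="replace")
added = interleave(upmix, extra, bands=[2, 3], mode="add")
```

Bands outside `bands` keep the parametrically reconstructed coefficients in both modes.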

According to example embodiments, the decoder may further be configured to receive a control signal, for example via the third receiving stage. The control signal may indicate how to interleave the further waveform-coded signal with one of the M upmix signals, and the step of interleaving the further waveform-coded signal with one of the M upmix signals is then based on the control signal. Specifically, the control signal may indicate a frequency range and a time range, such as one or more time/frequency tiles in the QMF domain, for which the further waveform-coded signal is to be interleaved with one of the M upmix signals. Accordingly, interleaving may occur in time and frequency within one channel.
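One way to picture a control signal of this kind is as a list of time/frequency tiles. The sketch below assumes a QMF-domain channel stored as `grid[time_slot][band]` and uses a hypothetical `Tile` record with half-open ranges; the actual bitstream syntax of the control signal is not specified in this passage. Replacement is used as the interleaving rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical control-signal entry: half-open time and band ranges."""
    t_start: int
    t_stop: int
    k_start: int
    k_stop: int


def interleave_tiles(upmix, extra, tiles):
    """Replace upmix coefficients by waveform-coded ones inside the given tiles."""
    out = [row[:] for row in upmix]  # copy so the input grid is left untouched
    for tile in tiles:
        for t in range(tile.t_start, tile.t_stop):
            for k in range(tile.k_start, tile.k_stop):
                out[t][k] = extra[t][k]
    return out


upmix_grid = [[0.0] * 4 for _ in range(3)]   # 3 time slots x 4 bands
extra_grid = [[1.0] * 4 for _ in range(3)]
control = [Tile(t_start=1, t_stop=3, k_start=2, k_stop=4)]
result = interleave_tiles(upmix_grid, extra_grid, control)
```

Only the coefficients inside the signalled tiles change, so interleaving stays local in both time and frequency within the channel.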

An advantage of this is that time ranges and frequency ranges may be selected which do not suffer from aliasing or start-up/fade-out problems of the overlapping windowed transform used to code the waveform-coded signals.

Overview – Encoder

According to a second aspect, example embodiments propose methods, devices and computer program products for encoding a multichannel audio signal based on an input signal.

The proposed methods, devices and computer program products may generally have the same features and advantages.

The advantages of the features and setups presented in the overview of the decoder above are generally valid for the corresponding features and setups of the encoder.

According to example embodiments, there is provided an encoder for a multichannel audio processing system for encoding M channels, where M > 2.

The encoder comprises a receiving stage configured to receive M signals corresponding to the M channels to be encoded.

The encoder further comprises a first waveform-coding stage configured to receive the M signals from the receiving stage and to generate M waveform-coded signals by individually waveform-coding the M signals for a frequency range corresponding to frequencies up to a first crossover frequency, whereby the M waveform-coded signals comprise spectral coefficients corresponding to frequencies up to the first crossover frequency.

The encoder further comprises a downmix stage configured to receive the M signals from the receiving stage and to downmix the M signals into N downmix signals, where 1 < N < M.

The encoder further comprises a high-frequency reconstruction encoding stage configured to receive the N downmix signals from the downmix stage and to subject the N downmix signals to high-frequency reconstruction encoding, whereby the high-frequency reconstruction encoding stage is configured to extract high-frequency reconstruction parameters which enable high-frequency reconstruction of the N downmix signals above a second crossover frequency.

The encoder further comprises a parametric encoding stage configured to receive the M signals from the receiving stage and the N downmix signals from the downmix stage, and to subject the M signals to parametric encoding for the frequency range corresponding to frequencies above the first crossover frequency, whereby the parametric encoding stage is configured to extract upmix parameters which enable upmixing of the N downmix signals into M reconstructed signals corresponding to the M channels for the frequency range above the first crossover frequency.

The encoder further comprises a second waveform-coding stage configured to receive the N downmix signals from the downmix stage and to generate N waveform-coded downmix signals by waveform-coding the N downmix signals for a frequency range corresponding to frequencies between the first and the second crossover frequency, whereby the N waveform-coded downmix signals comprise spectral coefficients corresponding to frequencies between the first and the second crossover frequency.

According to an embodiment, subjecting the N downmix signals to high-frequency reconstruction encoding in the high-frequency reconstruction encoding stage is performed in a frequency domain, preferably a quadrature mirror filter (QMF) domain.

According to another embodiment, subjecting the M signals to parametric encoding in the parametric encoding stage is performed in a frequency domain, preferably a quadrature mirror filter (QMF) domain.

According to yet another embodiment, generating M waveform-coded signals by individually waveform-coding the M signals in the first waveform-coding stage comprises applying an overlapping windowed transform to the M signals, wherein different overlapping window sequences are used for at least two of the M signals.

According to embodiments, the encoder may further comprise a third waveform-coding stage configured to generate a further waveform-coded signal by waveform-coding one of the M signals for a frequency range corresponding to a subset of the frequency range above the first crossover frequency.

According to embodiments, the encoder may comprise a control signal generating stage. The control signal generating stage is configured to generate a control signal indicating how to interleave the further waveform-coded signal with a parametric reconstruction of one of the M signals in a decoder. For example, the control signal may indicate a frequency range and a time range for which the further waveform-coded signal is to be interleaved with one of the M signals.

Example embodiments

Figure 1 is a generalized block diagram of a decoder 100 in a multichannel audio processing system for reconstructing M encoded channels. The decoder 100 comprises three conceptual parts 200, 300, 400, which are explained in more detail in conjunction with Figures 2-4 below. In the first conceptual part 200, the decoder receives M waveform-coded signals and N waveform-coded downmix signals representing the multichannel audio signal to be decoded, where 1 < N < M. In the illustrated example, N is set to 2. In the second conceptual part 300, the M waveform-coded signals are downmixed and combined with the N waveform-coded downmix signals. High-frequency reconstruction (HFR) is then performed on the combined downmix signals. In the third conceptual part 400, the high-frequency reconstructed signals are upmixed, and the M waveform-coded signals are combined with the upmix signals to reconstruct the M encoded channels.

The example embodiment described in conjunction with Figures 2-4 concerns the reconstruction of encoded 5.1 surround sound. Note that the low-frequency effects signal is not mentioned in the described embodiment or in the drawings. This does not mean that low-frequency effects are neglected. The low-frequency effects (Lfe) signal is added to the five reconstructed channels in any suitable way well known to a person skilled in the art. Note also that the described decoder is equally well suited for other types of encoded surround sound, such as 7.1 or 9.1 surround sound.

Figure 2 illustrates the first conceptual part 200 of the decoder 100 in Figure 1. The decoder comprises two receiving stages 212, 214. In the first receiving stage 212, a bitstream 202 is decoded and dequantized into two waveform-coded downmix signals 208a-b. Each of the two waveform-coded downmix signals 208a-b comprises spectral coefficients corresponding to frequencies between a first crossover frequency k_y and a second crossover frequency k_x.

In the second receiving stage 214, the bitstream 202 is decoded and dequantized into five waveform-coded signals 210a-e. Each of the five waveform-coded signals 210a-e comprises spectral coefficients corresponding to frequencies up to the first crossover frequency k_y.

By way of example, the signals 210a-e comprise two channel pair elements and one single channel element for the centre. The channel pair elements may, for example, be a combination of the left front and left surround signals and a combination of the right front and right surround signals. A further example is a combination of the left front and right front signals and a combination of the left surround and right surround signals. These channel pair elements may, for example, be coded in a sum-and-difference format. All five signals 210a-e may be coded using overlapping windowed transforms with independent windowing and still be decodable by the decoder. This may allow for improved coding quality and thus improved quality of the decoded signal.
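The sum-and-difference format mentioned for the channel pair elements can be sketched as follows. The 1/2 scaling is one common convention and is an assumption here; the text does not specify the normalization, and the function names are illustrative.

```python
def to_sum_difference(a, b):
    """Encode a channel pair (for example left front / left surround) as sum and difference."""
    s = [(x + y) / 2 for x, y in zip(a, b)]
    d = [(x - y) / 2 for x, y in zip(a, b)]
    return s, d


def from_sum_difference(s, d):
    """Recover the original channel pair from its sum and difference signals."""
    a = [x + y for x, y in zip(s, d)]
    b = [x - y for x, y in zip(s, d)]
    return a, b


left_front = [1.0, 3.0]
left_surround = [0.5, 1.0]
s, d = to_sum_difference(left_front, left_surround)
restored = from_sum_difference(s, d)
```

When the two channels of a pair are similar, the difference signal is small, which is the usual motivation for coding a pair in this format.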

以举例的方式，第一交越频率k_y是1.1kHz。以举例的方式，第二交越频率k_x在5.6-8kHz的范围内。应当注意，第一交越频率k_y可以变化，甚至基于逐个信号，即，编码器可以检测到特定输出信号中的信号分量可能未被立体声下混信号208a-b如实地再现，并且可以针对该特定的时刻增加相关波形编码信号(即，210a-e)的带宽，即，第一交越频率k_y，以便对信号分量进行合适的波形编码。For example, the first crossover frequency _ky is 1.1 kHz. For example, the second crossover frequency _kx is in the range of 5.6-8 kHz. It should be noted that the first crossover frequency _ky can vary, even on a signal-by-signal basis. That is, the encoder can detect that a signal component in a particular output signal may not be faithfully reproduced by the stereo downmix signal 208a-b, and can increase the bandwidth of the relevant waveform encoded signal (i.e., 210a-e), i.e., the first crossover frequency _ky , for that particular moment in order to perform appropriate waveform encoding of the signal component.

As will be described later in this description, the remaining stages of the decoder 100 typically operate in the quadrature mirror filter (QMF) domain. For this reason, each of the signals 208a-b, 210a-e received by the first and second receiving stages 212, 214 in modified discrete cosine transform (MDCT) form is transformed to the time domain by applying an inverse MDCT 216. Each signal is then transformed back to the frequency domain by applying a QMF transform 218.

In Figure 3, the five waveform-coded signals 210 are downmixed at a downmix stage 308 into two downmix signals 310, 312 comprising spectral coefficients corresponding to frequencies up to the first crossover frequency k_y. These downmix signals 310, 312 may be formed by downmixing the low-pass multichannel signals 210a-e using the same downmix scheme as was used in the encoder to create the two downmix signals 208a-b shown in Figure 2.

These two new downmix signals 310, 312 are then combined with the corresponding downmix signals 208a-b in a first combining stage 320, 322 to form combined downmix signals 302a-b. Each of the combined downmix signals 302a-b thus comprises spectral coefficients corresponding to frequencies up to the first crossover frequency k_y, originating from the downmix signals 310, 312, and spectral coefficients corresponding to frequencies between the first crossover frequency k_y and the second crossover frequency k_x, originating from the two waveform-coded downmix signals 208a-b received in the first receiving stage 212 (shown in Figure 2).
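The band-wise combination performed in the first combining stage can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the disclosure: the function name `combine_downmix`, the representation of a spectrum as a plain list of coefficients, and the treatment of k_y and k_x as bin indices (with bin k_y assigned to the upper band) are all hypothetical.

```python
def combine_downmix(low_band, mid_band, k_y, k_x):
    """Combine two spectra of one downmix channel bin by bin.

    low_band: coefficients of the re-created downmix (e.g. 310), used below k_y
    mid_band: coefficients of the received downmix (e.g. 208a), used in [k_y, k_x)
    k_y, k_x: crossover positions expressed as bin indices (assumption)
    """
    if len(low_band) != len(mid_band):
        raise ValueError("spectra must have the same number of bins")
    if not 0 <= k_y <= k_x <= len(low_band):
        raise ValueError("expected 0 <= k_y <= k_x <= number of bins")

    combined = [0.0] * len(low_band)       # bins at and above k_x stay empty for now
    combined[:k_y] = low_band[:k_y]        # below the first crossover frequency
    combined[k_y:k_x] = mid_band[k_y:k_x]  # between the two crossover frequencies
    return combined
```

The bins at and above k_x are left at zero here because, in the text, that range is filled only afterwards by high-frequency reconstruction.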

The decoder further comprises a high-frequency reconstruction (HFR) stage 314. The HFR stage is configured to extend each of the two combined downmix signals 302a-b from the combining stage to a frequency range above the second crossover frequency k_x by performing high-frequency reconstruction. According to some embodiments, the high-frequency reconstruction performed may comprise spectral band replication (SBR). The high-frequency reconstruction may be carried out using high-frequency reconstruction parameters, which may be received by the HFR stage 314 in any suitable way.

The output from the high-frequency reconstruction stage 314 is two signals 304a-b comprising the downmix signals 208a-b with the HFR extension 316, 318 applied. As described above, the HFR stage 314 performs the high-frequency reconstruction based on the frequencies present in the input signals 210a-e from the second receiving stage 214 (shown in Figure 2) combined with the two downmix signals 208a-b. Somewhat simplified, the HFR range 316, 318 comprises parts of the spectral coefficients from the downmix signals 310, 312 that have been copied up to the HFR range 316, 318. Consequently, parts of the five waveform-coded signals 210a-e will appear in the HFR range 316, 318 of the output 304 from the HFR stage 314.
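The "copy up" idea can be illustrated with a deliberately simplified sketch. It is not SBR and not the method of the disclosure: the function name `copy_up`, the wrap-around choice of source bins, and the use of one gain per reconstructed bin as a stand-in for the high-frequency reconstruction parameters are assumptions made purely for illustration.

```python
def copy_up(spectrum, k_x, gains):
    """Extend a spectrum above bin k_x by copying lower bins upward.

    spectrum: coefficients of a combined downmix signal; only bins below k_x are used
    k_x:      second crossover position as a bin index (assumption)
    gains:    one gain per reconstructed bin, standing in for HFR parameters
    """
    if not 0 < k_x <= len(spectrum):
        raise ValueError("k_x must lie within the spectrum")

    extended = list(spectrum[:k_x])
    for i, gain in enumerate(gains):
        source = extended[i % k_x]      # wrap around the source band
        extended.append(gain * source)  # scaled copy placed above k_x
    return extended
```

Because the source bins start at the lowest frequency, content that originated below k_y (that is, from the five waveform-coded signals) ends up in the reconstructed range, which is the effect the paragraph above describes.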

It should be noted that the downmixing at the downmix stage 308 and the combining in the first combining stage 320, 322, both of which precede the high-frequency reconstruction stage 314, may be carried out in the time domain, i.e. after each signal has been transformed to the time domain by applying the inverse modified discrete cosine transform (MDCT) 216 (shown in Figure 2). However, given that the waveform-coded signals 210a-e and the waveform-coded downmix signals 208a-b may be coded by a waveform coder using overlapping windowed transforms with independent windowing, the signals 210a-e and 208a-b may not be seamlessly combinable in the time domain. A better-controlled scenario is therefore obtained if at least the combining in the first combining stage 320, 322 is carried out in the QMF domain.

Figure 4 shows the third and final conceptual part 400 of the decoder 100. The output 304 from the HFR stage 314 constitutes the input to an upmix stage 402. The upmix stage 402 creates five output signals 404a-e by performing parametric upmixing on the frequency-extended signals 304a-b. For frequencies above the first crossover frequency k_y, each of the five upmix signals 404a-e corresponds to one of the five coded channels in the encoded 5.1 surround sound. According to an exemplary parametric upmixing procedure, the upmix stage 402 first receives upmix parameters. The upmix stage 402 further generates decorrelated versions of the two frequency-extended combined downmix signals 304a-b. The upmix stage 402 then subjects the two frequency-extended combined downmix signals 304a-b and their decorrelated versions to a matrix operation, the parameters of which are given by the upmix parameters. Alternatively, any other parametric upmixing procedure known in the art may be applied. Applicable parametric upmixing procedures are described, for example, in "MPEG Surround—The ISO/MPEG Standard for Efficient and Compatible Multichannel Audio Coding" (Herre et al., Journal of the Audio Engineering Society, Vol. 56, No. 11, November 2008).
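The exemplary upmix procedure (decorrelate, then apply a matrix whose entries come from the upmix parameters) can be sketched as below. This is a minimal illustration under loud assumptions: the names `decorrelate` and `parametric_upmix` are hypothetical, the decorrelator is replaced by a plain delay, and the matrix layout (M rows, each holding N coefficients for the downmix signals followed by N coefficients for their decorrelated versions) is chosen only for the example.

```python
def decorrelate(signal, delay=1):
    """Crude stand-in for a decorrelator: a pure delay (an assumption;
    practical decorrelators are usually all-pass style filters)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    padded = [0.0] * delay + list(signal)
    return padded[:len(signal)]


def parametric_upmix(downmix, upmix_matrix, delay=1):
    """Upmix N downmix signals into M output signals.

    downmix:      list of N equally long sequences (e.g. 304a and 304b)
    upmix_matrix: M rows of 2N coefficients; the first N weight the downmix
                  signals, the last N weight their decorrelated versions
    """
    if not downmix:
        raise ValueError("at least one downmix signal is required")
    inputs = [list(channel) for channel in downmix]
    inputs += [decorrelate(channel, delay) for channel in downmix]
    length = len(inputs[0])
    if any(len(channel) != length for channel in inputs):
        raise ValueError("all downmix signals must have the same length")

    outputs = []
    for row in upmix_matrix:
        if len(row) != len(inputs):
            raise ValueError("each matrix row needs 2N coefficients")
        # Each output sample is a weighted sum of the 2N input samples.
        outputs.append([
            sum(coeff * channel[t] for coeff, channel in zip(row, inputs))
            for t in range(length)
        ])
    return outputs
```

With N = 2 and M = 5, as in the described example, `upmix_matrix` has five rows of four coefficients each.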

The outputs 404a-e from the upmix stage 402 thus do not include frequencies below the first crossover frequency k_y. The remaining spectral coefficients, corresponding to frequencies up to the first crossover frequency k_y, are present in the five waveform-coded signals 210a-e, which have been delayed by a delay stage 412 to match the timing of the upmix signals 404.

The decoder 100 further comprises a second combining stage 416, 418. The second combining stage 416, 418 is configured to combine the five upmix signals 404a-e with the five waveform-coded signals 210a-e received by the second receiving stage 214 (shown in Figure 2).

Note that any Lfe signal present may be added as a separate signal to the resulting combined signals 422. Each of the signals 422 is then transformed to the time domain by applying an inverse QMF transform 420. The output from the inverse QMF transform 420 is thus the fully decoded 5.1-channel audio signal.

Figure 6 shows a decoding system 100' that is a variant of the decoding system 100 of Figure 1. The decoding system 100' has conceptual parts 200', 300' and 400' corresponding to the conceptual parts 200, 300 and 400 of Figure 1. The difference between the decoding system 100' of Figure 6 and the decoding system of Figure 1 is that there is a third receiving stage 616 in the conceptual part 200' and an interleaving stage 714 in the third conceptual part 400'.

The third receiving stage 616 is configured to receive a further waveform-coded signal. The further waveform-coded signal comprises spectral coefficients corresponding to a subset of the frequencies above the first crossover frequency. The further waveform-coded signal may be transformed to the time domain by applying the inverse MDCT 216. It may then be transformed back to the frequency domain by applying the QMF transform 218.

It is to be understood that the further waveform-coded signal may be received as a separate signal. However, the further waveform-coded signal may also form part of one or more of the five waveform-coded signals 210a-e. In other words, the further waveform-coded signal may be jointly coded with one or more of the five waveform-coded signals 210a-e, for instance using the same MDCT transform. If so, the third receiving stage 616 corresponds to the second receiving stage, i.e. the further waveform-coded signal is received together with the five waveform-coded signals 210a-e via the second receiving stage 214.

Figure 7 shows the third conceptual part 400' of the decoder 100' of Figure 6 in more detail. In addition to the high-frequency-extended downmix signals 304a-b and the five waveform-coded signals 210a-e, the further waveform-coded signal 710 is input to the third conceptual part 400'. In the illustrated example, the further waveform-coded signal 710 corresponds to the third of the five channels. The further waveform-coded signal 710 also comprises spectral coefficients corresponding to a frequency interval starting from the first crossover frequency k_y. However, the form of the subset of the frequency range above the first crossover frequency covered by the further waveform-coded signal 710 may of course vary between embodiments. Note also that a plurality of further waveform-coded signals 710a-e may be received, where different further waveform-coded signals may correspond to different output channels. The subset of the frequency range covered by the plurality of further waveform-coded signals 710a-e may vary between the individual signals 710a-e.

The further waveform-coded signal 710 may be delayed by a delay stage 712 to match the timing of the upmix signals 404 output from the upmix stage 402. The upmix signals 404 and the further waveform-coded signal 710 are then input to the interleaving stage 714. The interleaving stage 714 interleaves, i.e. combines, the upmix signals 404 with the further waveform-coded signal 710 to generate an interleaved signal 704. In the present example, the interleaving stage 714 thus interleaves the third upmix signal 404c with the further waveform-coded signal 710. The interleaving may be performed by adding the two signals together. Typically, however, the interleaving is performed by replacing the upmix signal 404 with the further waveform-coded signal 710 in the frequency range and time range where the signals overlap.
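The two interleaving options, adding or replacing, can be contrasted in a short sketch. The function name `interleave`, the `mode` argument and the boolean `overlap` list marking the bins covered by the further waveform-coded signal are assumptions introduced for illustration; the disclosure does not prescribe this interface.

```python
def interleave(upmix, further, overlap, mode="replace"):
    """Interleave one upmix signal with a further waveform-coded signal.

    upmix:   spectral coefficients of one upmix signal (e.g. 404c)
    further: coefficients of the further waveform-coded signal (e.g. 710)
    overlap: one flag per bin, true where the further signal covers that bin
    mode:    "replace" (the typical case in the text) or "add"
    """
    if mode not in ("replace", "add"):
        raise ValueError("mode must be 'replace' or 'add'")
    if not len(upmix) == len(further) == len(overlap):
        raise ValueError("all inputs must have the same number of bins")

    interleaved = []
    for u, w, covered in zip(upmix, further, overlap):
        if not covered:
            interleaved.append(u)      # outside the overlap: keep the upmix
        elif mode == "replace":
            interleaved.append(w)      # waveform-coded content wins
        else:
            interleaved.append(u + w)  # sum of both contributions
    return interleaved
```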

The interleaved signal 704 is then input to the second combining stage 416, 418, where it is combined with the waveform-coded signals 210a-e to generate an output signal 722 in the same manner as described with reference to Figure 4. Note that the order of the interleaving stage 714 and the second combining stage 416, 418 may be reversed, so that the combining is performed before the interleaving.

Furthermore, where the further waveform-coded signal 710 forms part of one or more of the five waveform-coded signals 210a-e, the second combining stage 416, 418 and the interleaving stage 714 may be merged into a single stage. Specifically, such a merged stage would use the spectral content of the five waveform-coded signals 210a-e for frequencies up to the first crossover frequency k_y. For frequencies above the first crossover frequency, the merged stage would use the upmix signals 404 interleaved with the further waveform-coded signal 710.

The interleaving stage 714 may operate under the control of a control signal. For this purpose, the decoder 100' may receive, for example via the third receiving stage 616, a control signal indicating how to interleave the further waveform-coded signal with one of the M upmix signals. For example, the control signal may indicate the frequency range and the time range in which the further waveform-coded signal 710 is to be interleaved with one of the upmix signals 404. The frequency range and the time range may, for instance, be expressed in terms of the time/frequency tiles in which interleaving is to take place. The time/frequency tiles may be tiles of the time/frequency grid of the QMF domain in which the interleaving takes place.

The control signal may use vectors, such as binary vectors, to indicate the time/frequency tiles in which interleaving is to take place. Specifically, there may be a first vector relating to the frequency direction, indicating the frequencies for which interleaving is to be performed. The indication may, for example, be made by setting a logic one for the corresponding frequency interval in the first vector. There may also be a second vector relating to the time direction, indicating the time intervals for which interleaving is to be performed. The indication may, for example, be made by setting a logic one for the corresponding time interval in the second vector. For this purpose, a time frame is typically divided into a plurality of time slots, so that the time indication can be made on a sub-frame basis. By intersecting the first and second vectors, a time/frequency matrix can be constructed. For example, the time/frequency matrix may be a binary matrix containing a logic one for each time/frequency tile for which both the first and the second vector indicate a logic one. The interleaving stage 714 may then use the time/frequency matrix when performing interleaving, for instance such that one or more of the upmix signals 404 are replaced by the further waveform-coded signal 710 for the time/frequency tiles indicated, for example by a logic one, in the time/frequency matrix.
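Intersecting the two binary vectors amounts to a logical AND of every frequency entry with every time entry. The sketch below shows this and the subsequent tile-wise replacement; the function names `build_tf_matrix` and `apply_interleaving`, and the layout of tiles as `tiles[band][slot]`, are assumptions made for illustration.

```python
def build_tf_matrix(freq_vector, time_vector):
    """Intersect a frequency vector and a time vector into a binary matrix.

    The result has one row per frequency band and one column per time slot;
    an entry is 1 only where both vectors indicate a logic one.
    """
    return [[1 if f and t else 0 for t in time_vector] for f in freq_vector]


def apply_interleaving(upmix_tiles, further_tiles, tf_matrix):
    """Replace upmix tiles by further waveform-coded tiles where flagged."""
    return [
        [w if flag else u for u, w, flag in zip(u_row, w_row, flag_row)]
        for u_row, w_row, flag_row in zip(upmix_tiles, further_tiles, tf_matrix)
    ]
```

Signalling two vectors instead of the full matrix needs one flag per band plus one per slot rather than one flag per tile, at the cost of only being able to describe rectangular patterns.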

Note that the vectors may use schemes other than a binary scheme to indicate the time/frequency tiles in which interleaving is to take place. For example, a vector may indicate by means of a first value, such as zero, that no interleaving is to be performed, and by means of a second value that interleaving is to be performed with respect to a certain channel identified by the second value.

Figure 5 shows, by way of example, a generalized block diagram of an encoding system 500 of a multichannel audio processing system for encoding M channels, in accordance with an embodiment.

In the exemplary embodiment described in Figure 5, the encoding of 5.1 surround sound is described. Thus, in the illustrated example, M is set to five. Note that the low-frequency effects signal is not mentioned in the described embodiment or in the drawings. This does not mean that any low-frequency effects are neglected. The low-frequency effects (Lfe) are added to the bitstream 552 in any suitable way well known to a person skilled in the art. Note also that the described encoder is equally well suited for encoding other types of surround sound, such as 7.1 or 9.1 surround sound. In the encoder 500, five signals 502, 504 are received at a receiving stage (not shown). The encoder 500 comprises a first waveform-coding stage 506 configured to receive the five signals 502, 504 from the receiving stage and to generate five waveform-coded signals 518 by individually waveform coding the five signals 502, 504. The waveform-coding stage 506 may, for example, subject each of the five received signals 502, 504 to an MDCT transform. As discussed with respect to the decoder, the encoder may choose to encode each of the five signals 502, 504 using an MDCT transform with independent windowing. This may allow for an improved coding quality and thus an improved quality of the decoded signal.

The five waveform-coded signals 518 are waveform coded for a frequency range corresponding to frequencies up to the first crossover frequency. The five waveform-coded signals 518 thus comprise spectral coefficients corresponding to frequencies up to the first crossover frequency. This may be achieved by subjecting each of the five waveform-coded signals 518 to a low-pass filter. The five waveform-coded signals 518 are then quantized 520 according to a psychoacoustic model. The psychoacoustic model is configured to reproduce, as accurately as possible given the bit rate available in the multichannel audio processing system, the encoded signals as perceived by a listener when decoded on the decoder side of the system.

As discussed above, the encoder 500 performs hybrid coding comprising discrete multichannel coding and parametric coding. As described above, the discrete multichannel coding is performed in the waveform-coding stage 506 on each of the input signals 502, 504 for frequencies up to the first crossover frequency. The parametric coding is performed for frequencies above the first crossover frequency, so that the five input signals 502, 504 can be reconstructed on the decoder side from N downmix signals. In the example illustrated in Figure 5, N is set to 2. The downmixing of the five input signals 502, 504 is performed in a downmix stage 534. The downmix stage 534 advantageously operates in the QMF domain. Therefore, before being input to the downmix stage 534, the five signals 502, 504 are transformed to the QMF domain by a QMF analysis stage 526. The downmix stage performs a linear downmixing operation on the five signals 502, 504 and outputs two downmix signals 544, 546.
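A linear downmix of five channels into two is a weighted sum per output channel. The sketch below illustrates the idea only: the function name `downmix_5_to_2`, the assignment of channels to the left and right outputs, and the default gains of 0.5 are assumptions, since the text does not specify the downmix coefficients.

```python
def downmix_5_to_2(lf, ls, rf, rs, c, centre_gain=0.5, surround_gain=0.5):
    """Linear 5-to-2 downmix of equally long channel signals.

    lf, ls, rf, rs, c: left front, left surround, right front,
                       right surround and centre signals
    centre_gain, surround_gain: illustrative weights (assumptions)
    """
    if not len(lf) == len(ls) == len(rf) == len(rs) == len(c):
        raise ValueError("all channels must have the same length")

    left = [f + surround_gain * s + centre_gain * m
            for f, s, m in zip(lf, ls, c)]
    right = [f + surround_gain * s + centre_gain * m
             for f, s, m in zip(rf, rs, c)]
    return left, right
```

As the decoder-side description notes, the decoder re-creates its low-band downmix with the same scheme, so whatever coefficients are used must be known on both sides.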

The two downmix signals 544, 546 are received by a second waveform-coding stage 508 after they have been transformed back to the time domain by being subjected to an inverse QMF transform 554. The second waveform-coding stage 508 generates two waveform-coded downmix signals by waveform coding the two downmix signals 544, 546 for a frequency range corresponding to frequencies between the first and the second crossover frequency. The waveform-coding stage 508 may, for example, subject each of the two downmix signals to an MDCT transform. The two waveform-coded downmix signals thus comprise spectral coefficients corresponding to frequencies between the first crossover frequency and the second crossover frequency. The two waveform-coded downmix signals are then quantized 522 according to the psychoacoustic model.

To make it possible to reconstruct the frequencies above the second crossover frequency on the decoder side, high-frequency reconstruction (HFR) parameters 538 are extracted from the two downmix signals 544, 546. These parameters are extracted at an HFR encoding stage 532.

To make it possible to reconstruct the five signals from the two downmix signals 544, 546 on the decoder side, the five input signals 502, 504 are received by a parametric encoding stage 530. The five signals 502, 504 are subjected to parametric encoding for the frequency range corresponding to frequencies above the first crossover frequency. The parametric encoding stage 530 is configured to extract upmix parameters 536 which enable the two downmix signals 544, 546 to be upmixed, for the frequency range above the first crossover frequency, into five reconstructed signals corresponding to the five input signals 502, 504 (i.e. the five channels in the encoded 5.1 surround sound). Note that the upmix parameters 536 are extracted only for the frequency range above the first crossover frequency. This may reduce the complexity of the parametric encoding stage 530 and the bit rate of the corresponding parametric data.

Note that the downmixing 534 may be carried out in the time domain. In that case, the QMF analysis stage 526 should be located downstream of the downmix stage 534 and before the HFR encoding stage 532, since the HFR encoding stage 532 typically operates in the QMF domain. In that case, the inverse QMF stage 554 may be omitted.

The encoder 500 further comprises a bitstream generating stage, i.e. a bitstream multiplexer, 524. According to the exemplary embodiment of the encoder 500, the bitstream generating stage is configured to receive the five encoded and quantized signals 548, the two parameter signals 536, 538 and the two encoded and quantized downmix signals 550. These are converted by the bitstream generating stage 524 into a bitstream 552 for further distribution in the multichannel audio system.

In the described multichannel audio system, there is often a maximum available bit rate, for example when streaming audio over the Internet. Since the characteristics of each time frame of the input signals 502, 504 differ, the exact same allocation of bits between the five waveform-coded signals 548 and the two waveform-coded downmix signals 550 cannot be used for every frame. Furthermore, each individual signal 548 and 550 may need more or fewer allocated bits in order for the signal to be reconstructed according to the psychoacoustic model. According to an exemplary embodiment, the first and the second waveform-coding stages 506, 508 share a common bit reservoir. The bits available per encoded frame are first distributed between the first and the second waveform-coding stages 506, 508, depending on the characteristics of the signals to be encoded and the current psychoacoustic model. The bits are then distributed between the individual signals 548, 550 as described above. The number of bits used for the high-frequency reconstruction parameters 538 and the upmix parameters 536 is of course taken into account when distributing the available bits. With regard to the number of bits allocated in a particular time frame, care is taken to adjust the psychoacoustic models used for the first and second waveform-coding stages 506, 508 so that the transition around the first crossover frequency is perceptually smooth.
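The two-step distribution from a shared bit reservoir can be sketched as follows. Everything specific here is hypothetical: the function names `split` and `allocate`, the use of integer "demand" weights as a stand-in for what the psychoacoustic model would report, and the proportional rule itself. The text only states that parameter bits are accounted for and that bits are distributed first between the two stages and then between the individual signals.

```python
def split(total_bits, demands):
    """Split total_bits proportionally to integer demand weights."""
    total_demand = sum(demands.values())
    if total_bits < 0 or total_demand <= 0:
        raise ValueError("need non-negative bits and positive total demand")
    shares = {name: total_bits * d // total_demand for name, d in demands.items()}
    leftover = total_bits - sum(shares.values())
    # Hand bits lost to rounding to the most demanding entries first.
    for name in sorted(demands, key=demands.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


def allocate(frame_bits, parameter_bits, stage_demands, signal_demands):
    """Distribute one frame's bits: parameters, then stages, then signals.

    stage_demands:  weight per waveform-coding stage (e.g. "506", "508")
    signal_demands: per stage, weight per individual signal
    """
    available = frame_bits - parameter_bits  # HFR and upmix parameters come first
    per_stage = split(available, stage_demands)
    return {stage: split(bits, signal_demands[stage])
            for stage, bits in per_stage.items()}
```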

Figure 8 shows an alternative embodiment of an encoding system 800. The difference between the encoding system 800 of Figure 8 and the encoding system 500 of Figure 5 is that the encoder 800 is arranged to generate a further waveform-coded signal by waveform coding one or more of the input signals 502, 504 for a frequency range corresponding to a subset of the frequency range above the first crossover frequency.

For this purpose, the encoder 800 comprises an interleave detecting stage 802. The interleave detecting stage 802 is configured to identify parts of the input signals 502, 504 that are not well reconstructed by the parametric reconstruction as encoded by the parametric encoding stage 530 and the high-frequency reconstruction encoding stage 532. For example, the interleave detecting stage 802 may compare the input signals 502, 504 with a parametric reconstruction of the input signals 502, 504 as defined by the parametric encoding stage 530 and the high-frequency reconstruction encoding stage 532. Based on the comparison, the interleave detecting stage 802 may identify a subset 804 of the frequency range above the first crossover frequency which is to be waveform coded. The interleave detecting stage 802 may also identify the time range 806 during which the identified subset 804 of the frequency range above the first crossover frequency is to be waveform coded. The identified frequency and time subsets 804, 806 may be input to the first waveform-coding stage 506. Based on the received frequency and time subsets 804, 806, the first waveform-coding stage 506 generates a further waveform-coded signal 808 by waveform coding one or more of the input signals 502, 504 for the time and frequency ranges identified by the subsets 804, 806. The further waveform-coded signal 808 may then be encoded and quantized by stage 520 and added to the bitstream 846.
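One way to picture the comparison made by the interleave detecting stage is a per-tile error check. The sketch below is an assumption-laden illustration: the function name `detect_interleaving`, the relative-error criterion and the threshold of 0.5 are hypothetical, as the text does not say how "not well reconstructed" is measured.

```python
def detect_interleaving(original, reconstruction, threshold=0.5):
    """Flag bands and time slots whose parametric reconstruction is poor.

    original, reconstruction: tile magnitudes laid out as tiles[band][slot]
    threshold: relative error above which a tile counts as poorly
               reconstructed (illustrative value)
    Returns (freq_vector, time_vector) of 0/1 flags.
    """
    flags = [
        [1 if abs(o - r) > threshold * abs(o) else 0
         for o, r in zip(o_band, r_band)]
        for o_band, r_band in zip(original, reconstruction)
    ]
    freq_vector = [1 if any(band) else 0 for band in flags]       # frequency subset
    time_vector = [1 if any(slot) else 0 for slot in zip(*flags)]  # time range
    return freq_vector, time_vector
```

The two returned vectors play the role of the frequency subset 804 and the time range 806 in this simplified picture.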

The interleave detecting stage 802 may further comprise a control signal generating stage. The control signal generating stage is configured to generate a control signal 810 indicating how the further waveform-coded signal is to be interleaved, in a decoder, with a parametric reconstruction of one of the input signals 502, 504. For example, as described with reference to Figure 7, the control signal may indicate the frequency range and the time range in which the further waveform-coded signal is to be interleaved with the parametric reconstruction. The control signal may be added to the bitstream 846.

Equivalents, extensions, alternatives and miscellaneous

Further embodiments of the present disclosure will become apparent to a person skilled in the art after studying the description above. Even though the present description and drawings disclose embodiments and examples, the disclosure is not restricted to these specific examples. Numerous modifications and variations can be made without departing from the scope of the present disclosure, which is defined by the appended claims. Any reference signs appearing in the claims are not to be understood as limiting their scope.

Additionally, variations of the disclosed embodiments can be understood and effected by the skilled person in practicing the disclosure, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

The systems and methods disclosed above may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks between the functional units referred to in the above description does not necessarily correspond to the division into physical units; on the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term "computer storage media" includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.

Claims

1. A method for decoding time frames of an encoded audio bitstream in an audio processing system, the method comprising:

receiving, for each time frame, M upmixed signals (404), wherein the M upmixed signals include spectral coefficients corresponding to frequencies higher than a first crossover frequency,

wherein the M upmixed signals are the result of upmixing, for the time frame, N frequency-extended downmix signals into the M upmixed signals, the N frequency-extended downmix signals being obtained by performing frequency reconstruction in a reconstruction range above a second crossover frequency, wherein the second crossover frequency is higher than the first crossover frequency and the frequency reconstruction uses reconstruction parameters derived from the encoded audio bitstream;

extracting, for the time frame, another waveform-coded signal from the encoded audio bitstream, the other waveform-coded signal including spectral coefficients corresponding to a subset of frequencies higher than the first crossover frequency; and

interleaving, for the time frame, the other waveform-coded signal with one of the M upmixed signals to generate an interleaved signal.

2. The method of claim 1, wherein the first crossover frequency depends on the bit rate of the audio processing system.

3. The method of claim 1, wherein the interleaving comprises: (i) adding the other waveform-coded signal to one of the M upmixed signals, (ii) combining the other waveform-coded signal with one of the M upmixed signals, or (iii) replacing one of the M upmixed signals with the other waveform-coded signal.

4. The method of claim 1, wherein the frequency reconstruction is performed in the frequency domain.

5. The method of claim 1, further comprising receiving a control signal that is used during the interleaving to generate the interleaved signal.

6. The method of claim 5, wherein the control signal indicates how to interleave the other waveform-coded signal with one of the M upmixed signals by specifying a frequency range or a time range of the interleaving.

7. The method of claim 5, wherein a first value of the control signal indicates that interleaving is performed for the corresponding frequency region.

8. The method of claim 1, wherein the audio processing system is a hybrid decoder that performs waveform decoding and parametric decoding.

9. An audio decoder for decoding time frames of an encoded audio bitstream, the audio decoder comprising:

an input configured to receive, for a time frame, M upmixed signals (404), wherein the M upmixed signals include spectral coefficients corresponding to frequencies higher than a first crossover frequency;

a demultiplexer configured to extract, for the time frame, another waveform-coded signal from the encoded audio bitstream, the other waveform-coded signal comprising spectral coefficients corresponding to a subset of frequencies higher than the first crossover frequency; and

an interleaver configured to interleave, for the time frame, the other waveform-coded signal with one of the M upmixed signals to produce an interleaved signal.

10. A non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-8.

11. An apparatus for decoding time frames of an encoded audio bitstream in an audio processing system, comprising:

a processor; and

a memory storing instructions that, when executed by the processor, cause the processor to perform the method of any one of claims 1-8.

12. An apparatus for decoding time frames of an encoded audio bitstream in an audio processing system, comprising units for performing each step of the method as described in any one of claims 1-8.
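
As a non-limiting illustration of the interleaving recited in claims 1, 3, 6 and 7, the following minimal Python sketch may be considered. It is not taken from the disclosure. The function name `interleave_frame`, the representation of one time frame as a list of spectral coefficients indexed by frequency bin, and the representation of the control signal as a list of half-open bin ranges are all assumptions made for illustration only. Only the "adding" and "replacing" options of claim 3 are sketched; the "combining" option is left out because the claims do not specify how the combination is formed.

```python
from typing import List, Sequence, Tuple


def interleave_frame(
    upmixed: Sequence[float],
    waveform: Sequence[float],
    ranges: Sequence[Tuple[int, int]],
    mode: str = "replace",
) -> List[float]:
    """Interleave one upmixed signal with a waveform-coded signal for one frame.

    upmixed:  spectral coefficients of one of the M upmixed signals
              (hypothetical layout: index = frequency bin).
    waveform: spectral coefficients of the other waveform-coded signal,
              same length as `upmixed`.
    ranges:   hypothetical control signal, a list of half-open bin ranges
              (start, stop) in which interleaving is performed.
    mode:     "add" or "replace" (options (i) and (iii) of claim 3).
    """
    if len(upmixed) != len(waveform):
        raise ValueError("both signals must cover the same frequency bins")
    if mode not in ("add", "replace"):
        raise ValueError("mode must be 'add' or 'replace'")

    out = list(upmixed)  # bins outside the signalled ranges stay unchanged
    for start, stop in ranges:
        if not 0 <= start <= stop <= len(out):
            raise ValueError("range outside the frame")
        for k in range(start, stop):
            if mode == "add":
                # computed from the inputs, so overlapping ranges add only once
                out[k] = upmixed[k] + waveform[k]
            else:
                out[k] = waveform[k]
    return out


# Example frame with six bins; bins 2 and above are assumed to lie above
# the first crossover frequency, and the control signal selects bins 3-4.
upmixed = [0.0, 0.0, 0.5, 0.5, 0.25, 0.25]
waveform = [0.0, 0.0, 0.0, 0.25, 0.5, 0.0]
ranges = [(3, 5)]

replaced = interleave_frame(upmixed, waveform, ranges, mode="replace")
added = interleave_frame(upmixed, waveform, ranges, mode="add")
print(replaced)  # [0.0, 0.0, 0.5, 0.25, 0.5, 0.25]
print(added)     # [0.0, 0.0, 0.5, 0.75, 0.75, 0.25]
```

In this sketch, a time range as mentioned in claim 6 could be handled by calling the function only for the time frames selected by the control signal.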