CN1848690A

CN1848690A - Multi-channel digital audio encoding device and method thereof

Info

Publication number: CN1848690A
Application number: CNA2005100958986A
Authority: CN
Inventors: 游余立
Original assignee: Digital Rise Technology Co Ltd
Current assignee: Digital Rise Technology Co Ltd
Priority date: 2004-09-17
Filing date: 2005-09-07
Publication date: 2006-10-18
Anticipated expiration: 2025-09-07
Also published as: CN101247129B; CN101246689A; CN100364235C; CN101247129A; CN101055719A; CN101246689B; CN101312041B; CN101055719B; CN101046963B; CN101312041A; CN101055721A; CN101046963A; CN101055721B; CN101241701B; CN101241701A

Abstract

A low-bit-rate digital audio coding system includes an encoder that assigns codebooks to groups of quantization indices based on the local characteristics of those indices, so that the codebook application ranges are independent of the quantization boundaries. The invention also includes a switchable-resolution filter bank, or a three-mode one, that can be selectively switched between high and low frequency-resolution modes (or among high, medium and low modes), e.g., when transients are detected in a frame. The resulting multi-channel audio signal has a greatly reduced bit rate for efficient transmission or storage. The decoder essentially inverts the structure and method of the encoder, producing a restored audio signal that is audibly indistinguishable from the original.

Description

Multi-channel digital audio encoding device and method thereof

Related application

This application claims priority to U.S. Provisional Application 60/610,674, filed September 17, 2004.

Background of the invention

The present invention generally relates to methods and systems for encoding and decoding multi-channel digital audio signals. More specifically, it relates to a low-bit-rate digital audio coding system that greatly reduces the bit rate of multi-channel audio signals for efficient transmission or storage while achieving transparent reproduction: even expert listeners cannot distinguish the audio signal restored at the decoder from the original.

A multi-channel digital audio coding system usually includes the following elements: a time-frequency analysis filter bank, which produces a frequency representation of the input PCM (pulse code modulation) samples, called subband samples or subband signals; a psychoacoustic model, which uses the auditory characteristics of the human ear to calculate a masking threshold below which quantization noise is unlikely to be audible; a global bit allocator, which allocates bit resources to each group of subband samples so that the resulting quantization noise power is below the masking threshold; multiple quantizers, which quantize the subband samples according to the allocated bits; multiple entropy encoders, which reduce the statistical redundancy of the quantization indices; and finally a multiplexer, which packs the entropy codes of the quantization indices and other side information into a complete bitstream.

For example, Dolby AC-3 maps the input PCM samples into the frequency domain with a high-frequency-resolution MDCT (modified discrete cosine transform) filter bank with a switchable window size. Stationary signals are analyzed with a 512-point window, while transient signals are analyzed with a 256-point window. The subband signals from the MDCT are represented as exponents and mantissas and then quantized. A forward-backward adaptive psychoacoustic model is used to optimize quantization and to reduce the bits needed to encode bit-allocation information. Entropy coding is not used, in order to reduce decoder complexity. Finally, the quantization indices and other side information are multiplexed into a complete AC-3 bitstream. The frequency resolution of the adaptive MDCT used in AC-3 is not well matched to the characteristics of the input signal, which greatly limits its compression performance. The lack of entropy coding is another limiting factor.

MPEG-1 & 2 Layer III (MP3) uses a 32-band polyphase filter bank in which each subband filter is followed by an adaptive MDCT that switches between 6 and 18 points. An advanced psychoacoustic model guides its bit allocation and non-uniform scalar quantization. Huffman codes are used to encode the quantization indices and most of the other side information. The poor frequency isolation of the hybrid filter bank greatly limits its compression performance, and the hybrid structure also has high algorithmic complexity.

DTS Coherent Acoustics employs a 32-band polyphase filter bank to obtain a low-resolution frequency representation of the input signal. To compensate for this poor frequency resolution, ADPCM (adaptive differential pulse code modulation) is selectively applied in each subband. Uniform scalar quantization is applied either directly to the subband samples or, if ADPCM yields a good coding gain, to the prediction residual. Vector quantization can optionally be applied to high-frequency subbands, and Huffman codes can optionally be applied to the scalar quantization indices and other side information. Because the polyphase filter bank + ADPCM structure cannot provide good time and frequency resolution, its compression performance is low.

MPEG-2 AAC and MPEG-4 AAC use an adaptive MDCT filter bank whose window size can be switched between 256 and 2048. The masking threshold produced by the psychoacoustic model guides its non-uniform scalar quantization and bit allocation. Huffman codes are used to encode the quantization indices and most other side information. Many other tools are used to further improve compression performance, such as TNS (temporal noise shaping), gain control (a hybrid filter bank similar to that of MP3) and spectral prediction (linear prediction within subbands). These tools come at the cost of greatly increased algorithmic complexity.

Therefore, there remains a need for a low-bit-rate audio coding system that greatly reduces the bit rate of multi-channel audio signals for efficient transmission or storage while also achieving transparent reproduction. The present invention fulfills this need and provides other related advantages.

Summary of the invention

In the following discussion, the term "analysis/synthesis filter bank" and similar terms refer to devices or methods that perform time-frequency analysis/synthesis. Without limitation, these include:

● Unitary transforms;

● Time-varying or time-invariant banks of critically sampled, uniform or non-uniform bandpass filters;

● Harmonic or sinusoidal analyzers/synthesizers.

Polyphase filter banks, the DFT (discrete Fourier transform), the DCT (discrete cosine transform) and the MDCT are some widely used filter banks. The terms "subband signal" and "subband sample" refer to the signals or samples that come out of an analysis filter bank and go into a synthesis filter bank.

It is an object of the present invention to provide, for low-bit-rate coding of multi-channel audio signals, the same level of compression performance as the prior art with reduced algorithmic complexity.

On the encoding side, this is accomplished by an encoder comprising:

1) A framer that groups the input PCM samples into quasi-stationary frames. The size of each frame is an integer multiple of the number of subbands of the analysis filter bank, and its duration ranges from 2 to 50 ms.

2) A transient detector that detects the presence of transients in the frame. One embodiment thresholds a subband distance measure computed from the subband samples that the analysis filter bank produces in its low-frequency-resolution mode.

3) A variable-resolution analysis filter bank that converts the input PCM samples into subband samples. It can be implemented with either of the following:

a) A filter bank that can switch its operation among high-, medium- and low-frequency-resolution modes. The high-frequency-resolution mode is used for stationary frames, while the medium- and low-frequency-resolution modes are used for frames containing transients. Within a transient frame, the low-frequency-resolution mode is used for the transient segment and the medium-resolution mode for the rest of the frame. This architecture gives three types of frames:

i) stationary frames, processed by the filter bank operating only in high-frequency-resolution mode;

ii) transient frames, processed by the filter bank operating in the medium- and high-time-resolution (that is, medium- and low-frequency-resolution) modes;

iii) slow-transient frames, processed by the filter bank operating only in medium-resolution mode.

Two preferred embodiments are as follows:

i) a DCT implementation, in which the three resolution levels correspond to three DCT block lengths;

ii) an MDCT implementation, in which the three resolution levels correspond to three MDCT block (window) lengths, and multiple window types are defined to bridge the transitions between these windows.

b) A hybrid filter bank based on a filter bank that can switch its operation between high- and low-resolution modes:

i) when there is no transient in the current frame, it switches to the high-frequency-resolution mode to ensure high compression performance for stationary segments;

ii) when a transient is present in the current frame, it switches to the low-frequency-resolution/high-time-resolution mode to avoid pre-echo. This low-frequency-resolution mode is followed by a transient segmentation stage that divides the subband samples into stationary segments. Each subband can then optionally be followed by an arbitrary-resolution filter bank or by ADPCM, which, if selected, provides a suitable frequency resolution for each stationary segment.

Two embodiments are given, one based on the DCT and the other on the MDCT. Two transient segmentation embodiments are also given, one based on thresholding and the other on the k-means algorithm; both use a subband distance measure.

4) A psychoacoustic model that calculates the masking threshold.

5) An optional sum/difference encoder that converts the subband samples of a left/right channel pair into a sum/difference channel pair.

6) An optional joint intensity encoder that extracts the intensity scale factors (steering vector) of a joint channel relative to a source channel, merges the joint channel into the source channel, and discards the individual subband samples of the joint channel.

7) A global bit allocator that allocates bit resources to groups of subband samples so that their quantization noise power is below the masking threshold.

8) A scalar quantizer that quantizes all subband samples with the step sizes provided by the bit allocator.

9) An optional interleaver that is selectively used to rearrange the quantization indices when a transient is present in the frame, in order to reduce the total number of bits.

10) An entropy codebook selector. Based on the local statistical characteristics of the quantization indices, it assigns the best codebooks from a codebook library to groups of quantization indices through the following steps:

a) Assign the best codebook to each quantization index, which essentially converts the quantization indices into codebook indices.

b) Segment these codebook indices into large segments whose boundaries define the application ranges of the codebooks.

A preferred embodiment comprises:

c) Group the quantization indices into granules, each containing a fixed number of quantization indices.

d) Determine the maximum codebook requirement of each granule.

e) Assign to each granule the smallest codebook that can accommodate its maximum codebook requirement.

f) Eliminate isolated small regions whose codebook indices are smaller than those of their neighbors. Isolated small regions whose codebook indices correspond to zero quantization indices may be exempted from this processing.

A preferred embodiment for encoding the codebook application ranges uses run-length codes.

11) An entropy encoder that encodes all quantization indices using the codebooks, and their application ranges, determined by the entropy codebook selector.

12) A multiplexer that packs all entropy codes of the quantization indices and the side information into a complete bitstream. The bitstream is structured so that the quantization indices appear before the indices of the quantization step sizes. With this structure, the number of quantization units in each transient segment need not be packed into the bitstream, because the decoder can recover it from the unpacked quantization indices.

The decoder of the present invention comprises:

1) A demultiplexer that unpacks the various codewords from the bitstream;

2) A quantization-index codebook decoder that decodes, from the bitstream, the entropy codebooks used for the quantization indices and their respective application ranges;

3) An entropy decoder that decodes the quantization indices from the bitstream;

4) An optional deinterleaver that selectively rearranges the quantization indices when a transient is present in the current frame;

5) A quantization-unit count reconstructor that reconstructs the number of quantization units in each transient segment from the quantization indices, as follows:

a) for each transient segment, find the highest subband that has a non-zero quantization index;

b) find the smallest critical band that can accommodate this subband; this gives the number of quantization units of the transient segment;

6) A step-size unpacker that unpacks the quantization step sizes of all quantization units;

7) An inverse quantizer that reconstructs the subband samples from the quantization indices and step sizes;

8) An optional joint intensity decoder that reconstructs the subband samples of a joint channel from those of its source channel using the joint intensity scale factors (steering vector);

9) An optional sum/difference decoder that reconstructs the subband samples of the left and right channels from those of the sum/difference channels;

10) A variable-resolution synthesis filter bank that reconstructs audio PCM samples from the subband samples. It can be implemented with either of the following:

a) a synthesis filter bank that can switch its operation among high-, medium- and low-resolution modes;

b) a hybrid synthesis filter bank based on a synthesis filter bank that can switch between high- and low-resolution modes:

i) When the bitstream indicates that the current frame was coded with the switchable-resolution analysis filter bank in low-frequency-resolution mode, the synthesis filter bank is a two-stage hybrid filter bank. Its first stage is an arbitrary-resolution synthesis filter bank or an inverse ADPCM. Its second stage is the adaptive synthesis filter bank, switchable between high- and low-frequency-resolution modes, operating in low-frequency-resolution mode;

ii) When the bitstream indicates that the current frame was coded with the switchable-resolution analysis filter bank in high-frequency-resolution mode, the synthesis filter bank is simply the switchable-resolution synthesis filter bank operating in high-frequency-resolution mode.
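Decoder step 5) can be sketched as follows. This is a minimal illustration under stated assumptions. The critical-band table `BAND_EDGES` is hypothetical, since the patent gives no table here. "The smallest critical band that can accommodate the subband" is read as the number of critical bands up to and including the band that contains the highest non-zero subband. The function name is illustrative.

```python
from bisect import bisect_right

# Hypothetical critical-band partition of 64 subbands (not from the patent):
# band b covers subbands [BAND_EDGES[b], BAND_EDGES[b + 1]).
BAND_EDGES = [0, 4, 8, 12, 16, 20, 24, 32, 40, 48, 64]


def num_quantization_units(indices: list[int]) -> int:
    """Count the critical bands needed to cover the highest subband that
    carries a non-zero quantization index (0 if every index is zero)."""
    highest = -1
    for m, q in enumerate(indices):
        if q != 0:
            highest = m
    if highest < 0:
        return 0
    # bisect_right counts the band lower edges <= highest, i.e. the index of
    # the band containing `highest`, plus one.
    return bisect_right(BAND_EDGES, highest)
```

Because this count can be derived from the indices alone, the encoder's multiplexer does not need to transmit it. This is why the bitstream places the quantization indices before the step-size indices.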

Finally, the present invention provides a low-coding-delay mode. This mode is enabled when the encoder disables the high-frequency-resolution mode of the switchable-resolution analysis filter bank. The frame length is then reduced to the block length of the switchable-resolution filter bank in low-frequency-resolution mode, or to an integer multiple of it.
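In the DCT embodiment, the resolution levels correspond to DCT block lengths. Switching modes therefore amounts to choosing how a frame is cut into blocks, and the low-delay mode shrinks the frame to one short block or a few. The sketch below assumes an orthonormal DCT-II computed directly in O(M²) time for clarity. The names `dct_ii` and `analyze_frame` are hypothetical, and the overlapping windows and window switching of the MDCT embodiment are not modelled.

```python
import math


def dct_ii(block: list[float]) -> list[float]:
    """Orthonormal DCT-II of one block (direct O(M^2) form, illustration only)."""
    m_len = len(block)
    out = []
    for k in range(m_len):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / m_len)
        acc = sum(x * math.cos(math.pi * (n + 0.5) * k / m_len)
                  for n, x in enumerate(block))
        out.append(scale * acc)
    return out


def analyze_frame(frame: list[float], block_len: int) -> list[list[float]]:
    """Cut a frame into consecutive blocks of `block_len` samples and
    transform each one. A long block gives fine frequency resolution;
    short blocks give fine time resolution."""
    if len(frame) % block_len:
        raise ValueError("frame length must be a multiple of block_len")
    return [dct_ii(frame[i:i + block_len])
            for i in range(0, len(frame), block_len)]
```

With `block_len` equal to the frame length, the frame yields one block of narrow frequency bins. With a short `block_len`, it yields several coarse spectra that localize a transient in time. Signal energy is preserved in both cases because the transform is orthonormal.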

According to the invention, a method of encoding a multi-channel digital audio signal generally comprises creating PCM samples from the multi-channel digital audio signal and converting the PCM samples into subband samples. Quantization indices with boundaries are created by quantizing the subband samples. The quantization indices are converted into codebook indices by assigning to each quantization index the smallest codebook in a pre-designed codebook library that can accommodate it. The codebook indices are segmented and encoded before an encoded data stream is created for storage or transmission.
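The granule-based codebook assignment (steps c to e of the entropy codebook selector) can be sketched as follows. This is a minimal illustration, not the patent's implementation. The codebook library `CODEBOOK_MAX_ABS` is hypothetical: it assumes nested codebooks, where codebook k can code any quantization index whose magnitude does not exceed `CODEBOOK_MAX_ABS[k]`. The granule size of 4 is also an assumption.

```python
# Hypothetical nested codebook library: codebook k can represent any
# quantization index q with |q| <= CODEBOOK_MAX_ABS[k]. Codebook 0 codes zeros.
CODEBOOK_MAX_ABS = [0, 1, 2, 4, 8, 16, 32, 1 << 15]


def smallest_codebook(q: int) -> int:
    """Index of the smallest codebook that can accommodate index q."""
    for k, max_abs in enumerate(CODEBOOK_MAX_ABS):
        if abs(q) <= max_abs:
            return k
    raise ValueError(f"quantization index {q} exceeds every codebook")


def granule_codebooks(indices: list[int], granule_size: int = 4) -> list[int]:
    """Assign one codebook per granule of `granule_size` indices.

    A granule's maximum codebook requirement is set by its largest magnitude,
    so the smallest codebook covering that magnitude is chosen.
    """
    books = []
    for start in range(0, len(indices), granule_size):
        granule = indices[start:start + granule_size]
        peak = max(abs(q) for q in granule)
        books.append(smallest_codebook(peak))
    return books
```

For example, `granule_codebooks([0, 0, 1, -1, 3, -2, 0, 1, 0, 0, 0, 0])` returns `[1, 3, 0]`: the peaks of the three granules are 1, 3 and 0.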

In general, the PCM samples are grouped into quasi-stationary frames with durations between 2 and 50 milliseconds (ms). The masking threshold can be calculated with, for example, a psychoacoustic model. A bit allocator allocates bit resources to groups of subband samples so that the quantization noise power stays below the masking threshold.
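For a uniform scalar quantizer, one common way to keep quantization noise below the masking threshold uses the high-resolution approximation that a step size Δ produces noise power of about Δ²/12. Solving Δ²/12 ≤ T gives Δ ≤ √(12T). The sketch below assumes this approximation, which the patent does not state. It also ignores the overall bit budget that a real global bit allocator must satisfy. Function names are illustrative.

```python
import math


def step_size_for_threshold(masking_threshold: float) -> float:
    """Largest uniform step whose approximate noise power (step**2 / 12)
    does not exceed the masking threshold (a per-sample power)."""
    return math.sqrt(12.0 * masking_threshold)


def quantize(samples: list[float], step: float) -> list[int]:
    """Mid-tread uniform quantization q = round(x / step); Python's round()
    breaks exact .5 ties to the nearest even integer."""
    return [round(x / step) for x in samples]


def dequantize(indices: list[int], step: float) -> list[float]:
    """Inverse quantization: x_hat = q * step."""
    return [q * step for q in indices]
```

With a threshold of 3.0 the step is 6.0, and every reconstruction error is at most half a step.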

The converting step includes using a resolution filter bank that selectively switches between high- and low-frequency-resolution modes. Transients are detected. When no transient is detected, the high-frequency-resolution mode is used. When a transient is detected, the filter bank is switched to the low-frequency-resolution mode. Once the filter bank is in low-frequency-resolution mode, the subband samples are divided into stationary segments. The frequency resolution of each stationary segment is then adjusted with an arbitrary-resolution filter bank or with adaptive differential pulse code modulation.
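Dividing the subband samples into stationary segments can be illustrated with a thresholding sketch. The patent defines its own transient detection distance in the detailed description. The distance used here, the summed change in subband magnitudes between adjacent time indices, is an assumption and not the patent's measure. The function names and threshold value are also illustrative.

```python
def transient_distance(s: list[list[float]]) -> list[float]:
    """Hypothetical per-time-index distance for subband samples s[m][n]
    (m = subband, n = time): sum over m of | |s[m][n]| - |s[m][n-1]| |.
    d[0] is defined as 0."""
    num_bands, num_times = len(s), len(s[0])
    d = [0.0] * num_times
    for n in range(1, num_times):
        d[n] = sum(abs(abs(s[m][n]) - abs(s[m][n - 1]))
                   for m in range(num_bands))
    return d


def segment_by_threshold(s: list[list[float]],
                         threshold: float) -> list[tuple[int, int]]:
    """Split time indices into stationary segments [start, end); a new
    segment starts wherever the distance exceeds `threshold`."""
    d = transient_distance(s)
    starts = [0] + [n for n in range(1, len(d)) if d[n] > threshold]
    ends = starts[1:] + [len(d)]
    return list(zip(starts, ends))


def needs_low_resolution(segments: list[tuple[int, int]]) -> bool:
    """A frame with more than one segment contains a transient."""
    return len(segments) > 1
```

A frame whose magnitudes jump at time index 3 splits into two segments and triggers the low-frequency-resolution mode. A flat frame stays as one segment.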

When transients are present in a frame, the quantization indices can be rearranged to reduce the total number of bits. A run-length encoder can be used to encode the application boundaries of the optimal entropy codebooks, and a clustering (segmentation) algorithm can be used to determine them.
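The cleanup of isolated small regions (step f of the entropy codebook selector) and the run-length coding of codebook application ranges can be sketched as follows. Three choices are assumptions not fixed by the patent:

- "small" means a run of at most `max_len` granules;
- an isolated region is raised to the smaller of its two neighboring codebooks;
- zero-codebook regions are exempted.

Raising a codebook is always safe when codebooks are nested, because a larger codebook accommodates every index a smaller one does. Names are illustrative.

```python
def runs(books: list[int]) -> list[tuple[int, int]]:
    """Collapse a per-granule codebook list into (codebook, run_length) pairs."""
    out: list[list[int]] = []
    for b in books:
        if out and out[-1][0] == b:
            out[-1][1] += 1
        else:
            out.append([b, 1])
    return [(b, n) for b, n in out]


def merge_isolated(books: list[int], max_len: int = 1,
                   keep_zero: bool = True) -> list[int]:
    """Raise short interior runs whose codebook is below both neighbors to the
    smaller neighboring codebook, so they merge into a larger segment.
    Neighbor values are taken from the original runs, not updated ones."""
    segs = runs(books)
    out: list[int] = []
    for i, (book, length) in enumerate(segs):
        if 0 < i < len(segs) - 1 and length <= max_len:
            left, right = segs[i - 1][0], segs[i + 1][0]
            exempt = keep_zero and book == 0
            if book < left and book < right and not exempt:
                book = min(left, right)
        out.extend([book] * length)
    return out
```

For example, `[3, 3, 1, 3, 3, 0, 4, 4, 2, 5]` becomes `[3, 3, 3, 3, 3, 0, 4, 4, 4, 5]`. Its run-length form is then only four (codebook, length) pairs instead of seven.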

A sum/difference encoder may be used to convert the subband samples of a left/right channel pair into a sum/difference channel pair. In addition, a joint intensity encoder can be used to extract the intensity scale factors of a joint channel relative to a source channel, merge the joint channel into the source channel, and discard all the corresponding subband samples of the joint channel.
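Below is a minimal sketch of sum/difference coding and its inverse. It assumes the half-sum/half-difference convention M = (L + R)/2 and S = (L - R)/2. Other normalizations, such as dividing by √2, are also common, and the patent does not fix one here. Function names are illustrative.

```python
def sum_difference_encode(left: list[float],
                          right: list[float]) -> tuple[list[float], list[float]]:
    """Left/right subband samples -> sum (M) and difference (S) channels,
    using the assumed convention M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def sum_difference_decode(mid: list[float],
                          side: list[float]) -> tuple[list[float], list[float]]:
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When the left and right channels are highly correlated, the difference channel is small and quantizes mostly to zero indices. That is where the bit-rate saving comes from.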

In general, the combining step that creates a complete bitstream is performed with a multiplexer before the encoded digital audio signal is stored or sent to a decoder.

A method of decoding an audio bitstream includes receiving the encoded audio data stream and unpacking it, for example with a demultiplexer. The entropy codebook indices and their respective application ranges are decoded, possibly with run-length and entropy decoders. They are then used to decode the quantization indices.

When a transient is detected in the current frame, the quantization indices are rearranged, for example with a deinterleaver. The subband samples are then reconstructed from the decoded quantization indices. Audio PCM samples are reconstructed from these subband samples with a variable-resolution synthesis filter bank that can switch between low- and high-frequency-resolution modes.

When the data stream indicates that the current frame was coded with the switchable-resolution analysis filter bank in low-frequency-resolution mode, the variable-resolution synthesis filter bank acts as a two-stage hybrid filter bank. The first stage is an arbitrary-resolution synthesis filter bank or an inverse adaptive differential pulse code modulation. The second stage is the variable-resolution synthesis filter bank in its low-frequency-resolution mode. When the data stream indicates that the current frame was coded in high-frequency-resolution mode, the variable-resolution synthesis filter bank operates in high-frequency-resolution mode.
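This section does not spell out the interleaving order, so the sketch below rests on an assumption. It assumes a transient frame holds several short blocks stored block by block. The interleaver regroups them so that coefficients of the same frequency from successive blocks become adjacent, and the deinterleaver undoes this. `interleave`, `deinterleave` and `num_blocks` are hypothetical names.

```python
def interleave(indices: list[int], num_blocks: int) -> list[int]:
    """Block-major order (all M indices of block 0, then block 1, ...) ->
    frequency-major order (index 0 of every block, then index 1, ...)."""
    if len(indices) % num_blocks:
        raise ValueError("length must be a multiple of num_blocks")
    m = len(indices) // num_blocks
    return [indices[b * m + k] for k in range(m) for b in range(num_blocks)]


def deinterleave(indices: list[int], num_blocks: int) -> list[int]:
    """Inverse of interleave(): restore block-major order."""
    if len(indices) % num_blocks:
        raise ValueError("length must be a multiple of num_blocks")
    m = len(indices) // num_blocks
    out = [0] * len(indices)
    pos = 0
    for k in range(m):
        for b in range(num_blocks):
            out[b * m + k] = indices[pos]
            pos += 1
    return out
```

Grouping same-frequency coefficients tends to gather the near-zero high-frequency indices of all short blocks into one long run. This suits the segment-based codebook assignment described earlier.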

A joint intensity decoder may be used to reconstruct the subband samples of a joint channel from those of its source channel using the joint intensity scale factors. In addition, a sum/difference decoder can be used to reconstruct the left and right channel subband samples from the sum/difference channel subband samples.
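Joint intensity coding can be sketched per band as follows. The scale factor is assumed to be the square root of the energy ratio between the joint and source channels within that band. This section does not give the actual form of the patent's scale factors (steering vector). The encoder-side merging of the joint channel into the source channel is also not modelled. Names are illustrative.

```python
import math


def intensity_scale(source: list[float], joint: list[float]) -> float:
    """Encoder side (assumed form): sqrt(E_joint / E_source) for one band."""
    e_src = sum(x * x for x in source)
    e_joint = sum(x * x for x in joint)
    return math.sqrt(e_joint / e_src) if e_src > 0 else 0.0


def intensity_decode(source: list[float], scale: float) -> list[float]:
    """Decoder side: rebuild the joint channel's band as a scaled copy of
    the source channel's band."""
    return [scale * x for x in source]
```

Only the scale factor is transmitted for the band, so the decoded joint channel keeps the band energy but not the original waveform.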

The result of the present invention is a low-bit-rate digital audio coding system that greatly reduces the bit rate of multi-channel audio signals for efficient transmission. Its reproduction is so transparent that the decoded signal is difficult to distinguish from the original.

Other features and advantages of the invention will become apparent from the following detailed description, taken with the accompanying drawings, which illustrate the principles of the invention by way of example.

Description of the drawings

The following drawings illustrate the invention. In these drawings:

Figure 1 is a schematic diagram of the encoding and decoding of a multi-channel digital audio signal according to the present invention;

Figure 2 is a schematic diagram of an exemplary encoder used in accordance with the present invention;

Figure 3 is a schematic diagram of a variable-resolution analysis filter bank with an arbitrary-resolution filter bank;

Figure 4 is a schematic diagram of a variable-resolution analysis filter bank with ADPCM;

Figure 5 is a schematic diagram of the window types for the switchable MDCT according to the present invention;

Figure 6 is a schematic diagram of transient segmentation according to the present invention;

Figure 7 is a schematic diagram of an application of a switchable filter bank with two resolution modes according to the present invention;

Figure 8 is a schematic diagram of an application of a switchable filter bank with three resolution modes according to the present invention;

Figure 9, similar to Figure 5, is a schematic diagram of further window types for the switchable MDCT with three resolution modes according to the present invention;

Figure 10 shows a set of example window sequences for the switchable MDCT with three resolution modes according to the present invention;

Figure 11 illustrates the determination of entropy codebooks according to the present invention, compared with the prior art;

Figure 12 illustrates segmenting codebook indices into large segments, or eliminating isolated small regions of codebook indices, according to the present invention;

Figure 13 is a schematic diagram of the decoder of the present invention;

Figure 14 is a schematic diagram of a variable-resolution synthesis filter bank with an arbitrary-resolution filter bank according to the present invention;

Figure 15 is a schematic diagram of a variable-resolution synthesis filter bank with inverse ADPCM; and

Figure 16 is a schematic diagram of the bitstream structure when a semi-hybrid filter bank or a switchable filter bank + ADPCM is used according to the present invention.

图17是在处理只间隔一帧的暂态时，短到短转换的长窗口的优点示意图。Figure 17 is a schematic illustration of the advantages of long windows for short-to-short transitions when dealing with transients only one frame apart.

图18是根据本发明当使用三模式可切换滤波器组时的比特流的结构示意图。FIG. 18 is a schematic diagram of the structure of a bitstream when a three-mode switchable filter bank is used according to the present invention.

Detailed description of the preferred embodiments

As shown in the drawings for purposes of illustration, the present invention relates to a low-bit-rate digital audio encoding and decoding system. The system greatly reduces the bit rate of multi-channel audio signals for efficient transmission or storage while achieving transparent audio reproduction. That is, it reduces the bit rate of the encoded multi-channel audio signal with low algorithmic complexity, and even expert listeners cannot distinguish the audio signal restored at the decoder from the original.

如图1中所示，本发明的编码器5将多声道音频信号作为输入并将其编码成比特流，并且极大地降低了比特率以适于在声道容量有限的媒介上发送或存储。只要接收到由编码器5产生的比特流，解码器10就对其进行解码并重建甚至听测专家也不能将其与原始信号区别的多声道音频信号。As shown in Figure 1, the encoder 5 of the present invention takes a multi-channel audio signal as input and encodes it into a bit stream, and greatly reduces the bit rate to be suitable for transmission or storage on a medium with limited channel capacity . As soon as the bitstream produced by the encoder 5 is received, the decoder 10 decodes it and reconstructs a multi-channel audio signal that even audiologists cannot distinguish from the original signal.

在编码器5和解码器10内部，多声道音频信号被作为离散声道来处理。即，每个声道与其它声道同样地来对待，除非清楚地指定了联合声道编码2。这在图1中用极度简化的编码器和解码器结构做出了说明。Inside the encoder 5 and decoder 10, the multi-channel audio signal is processed as discrete channels. That is, each channel is treated the same as other channels, unless joint channel coding 2 is explicitly specified. This is illustrated in Figure 1 with an extremely simplified encoder and decoder structure.

利用这种极度简化的编码器结构，其编码处理过程说明如下。来自每个声道的音频信号首先在分析滤波器组的第一级1中被分解成子带信号。来自所有声道的子带信号被选择性地送到联合声道编码器2，其通过组合对应于来自不同声道的相同频带的子带信号，采用人耳的听觉特性来降低比特率。可以在2中联合编码的子带信号然后被量化并在3中被编码。量化指数或它们的熵编码以及来自所有声道的辅助信息然后在4中被多路复用成一个完整的比特流以用于发送或存储。Utilizing this extremely simplified encoder structure, its encoding process is explained as follows. The audio signal from each channel is first decomposed into subband signals in the first stage 1 of the analysis filter bank. The subband signals from all channels are selectively sent to a joint channel encoder 2, which exploits the auditory characteristics of the human ear to reduce the bit rate by combining subband signals corresponding to the same frequency band from different channels. The subband signals, which can be jointly coded in 2, are then quantized and coded in 3. The quantization indices or their entropy encoding and side information from all channels are then multiplexed in 4 into a complete bitstream for transmission or storage.

在解码端上，比特流首先在6中被多路解复用为辅助信息和量化指数或其熵编码。熵编码在7中被解码(注意：诸如哈夫曼码之类的前缀码的熵解码和多路解复用通常在一个单个步骤中执行)。子带信号在7中利用量化指数和由辅助信息携带的步长被重建。如果在编码器中使用联合声道编码，则联合声道解码在8中被执行。然后，每个声道的音频信号在合成级9中利用子带信号被重建。On the decoding end, the bitstream is first demultiplexed in 6 into side information and quantization indices or its entropy encoding. The entropy codes are decoded in 7 (note: entropy decoding and demultiplexing of prefix codes such as Huffman codes are usually performed in one single step). The subband signal is reconstructed in 7 using the quantization index and the step size carried by the side information. If joint channel coding is used in the encoder, joint channel decoding is performed in 8 . The audio signal for each channel is then reconstructed in a synthesis stage 9 using the subband signals.

The simplified encoder and decoder structures above serve only to illustrate that the encoding and decoding methods of the present invention operate on discrete channels. The methods actually applied to each channel of the audio signal are considerably more elaborate. Unless otherwise stated, they are described below for a single channel of the audio signal.

Encoder

Figure 2 shows a general method of encoding one channel of an audio signal. The method is described below.

The framer 11 divides the input PCM samples into quasi-stationary frames lasting between 2 and 50 ms. The exact number of PCM samples in a frame must be an integer multiple of the largest number of subbands among the filter banks used in the variable-resolution time-frequency analysis filter bank 13. If the maximum number of subbands is N, the number of PCM samples in a frame is

$L = k \cdot N$

where k is a positive integer.
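The frame-length rule $L = k \cdot N$ can be sketched as follows. This is a minimal illustration, not the framer 11 itself. The helper names, the 48 kHz rate and the `target_ms` parameter are hypothetical. The sketch picks the multiple of N closest to a target duration and zero-pads the final partial frame, which is also an assumption.

```python
def frame_length(sample_rate: int, target_ms: float, max_subbands: int) -> int:
    """Return L = k * N closest to the target duration, with k >= 1 (hypothetical helper)."""
    target_samples = sample_rate * target_ms / 1000.0
    k = max(1, round(target_samples / max_subbands))
    return k * max_subbands


def split_frames(pcm, frame_len):
    """Split PCM samples into frames of frame_len samples, zero-padding the last one."""
    padded = list(pcm) + [0] * (-len(pcm) % frame_len)
    return [padded[i:i + frame_len] for i in range(0, len(padded), frame_len)]


# About 21.33 ms at 48 kHz with N = 128 subbands gives L = 8 * 128 = 1024 samples.
print(frame_length(48000, 21.33, 128))
```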

The transient analyzer 12 detects whether a transient is present in the current input frame and passes this information to the variable-resolution analysis filter bank 13.

Any known transient detection method can be used here. In one embodiment, the input frame of PCM samples is fed to the variable-resolution analysis filter bank operating in its low-frequency-resolution mode. Let s(m, n) denote the output samples of this filter bank, where m is the subband index and n is the time index in the subband domain. In the following discussion, the term "transient detection distance" refers to the distance measure defined for each time index:

$E(n) = \sum_{m=0}^{M-1} |s(m, n)|$

or

$E(n) = \sum_{m=0}^{M-1} s^2(m, n)$

where M is the number of subbands of the filter bank. Other distance measures can be used in a similar way. Let $E_{\max} = \max_n E(n)$ and $E_{\min} = \min_n E(n)$ be the maximum and minimum of this distance. If

$\frac{E_{\max} - E_{\min}}{E_{\max} + E_{\min}} > \text{Threshold}$

then a transient is declared. The threshold can, for example, be set to 0.5.
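The detection rule above fits in a few lines. This is a minimal sketch, assuming the low-frequency-resolution subband samples are given as a list `s[m][n]` with M subbands and a fixed number of time indices. The function names are hypothetical. Treating an all-zero frame as transient-free is also an assumption, made so that a silent frame does not cause a division by zero.

```python
def detection_distance(s, squared=False):
    """E(n) = sum over m of |s(m, n)| (or s(m, n)**2) for subband samples s[m][n]."""
    num_subbands, num_times = len(s), len(s[0])
    if squared:
        return [sum(s[m][n] ** 2 for m in range(num_subbands)) for n in range(num_times)]
    return [sum(abs(s[m][n]) for m in range(num_subbands)) for n in range(num_times)]


def has_transient(s, threshold=0.5, squared=False):
    """Declare a transient when (Emax - Emin) / (Emax + Emin) exceeds the threshold."""
    E = detection_distance(s, squared)
    e_max, e_min = max(E), min(E)
    if e_max + e_min == 0:  # silent frame: nothing to detect
        return False
    return (e_max - e_min) / (e_max + e_min) > threshold
```

A flat frame gives a ratio of 0. A frame whose energy jumps tenfold at a single time index gives a ratio of about 0.82, so a transient is declared.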

The present invention uses a variable-resolution analysis filter bank 13, and there are many known ways to implement one. A prominent approach is a filter bank that can switch between two modes. The high-frequency-resolution mode processes stationary segments of the audio signal, and the low-frequency-resolution mode processes transients. However, theoretical and practical constraints prevent the resolution from switching at arbitrary times. Switching usually happens at frame boundaries, so each frame is processed entirely in one mode or the other. As shown in Figure 7, the filter bank has switched to the low-frequency-resolution mode for the transient frame 131 to avoid pre-echo. The transient 132 itself is very short, while the pre-transient segment 133 and post-transient segment 134 of the frame are much longer. The low-frequency-resolution filter bank is therefore a poor match for these stationary segments. This severely limits the total coding gain achievable for the frame.

The present invention proposes three methods to solve this problem. The common idea is to give the stationary majority of a transient frame a higher frequency resolution within a switchable-resolution structure.

Semi-hybrid filter bank

As shown in Figure 3, this is essentially a hybrid filter bank. Its first component is a switchable-resolution analysis filter bank 28 that can switch between high- and low-frequency-resolution modes. In the low-frequency-resolution mode 24, it is followed by a transient clustering segmentation unit 25 and then by an optional arbitrary-resolution analysis filter bank 26 in each subband.

When the transient detector 12 detects no transient, the switchable-resolution analysis filter bank 28 enters the low-time-resolution mode 27. This mode provides the high frequency resolution needed for high coding gain on audio signals with strong tonal components.

When the transient detector 12 detects a transient, the switchable-resolution analysis filter bank 28 enters the high-time-resolution mode 24. This ensures that the transient is processed with good time resolution, which prevents pre-echo. The transient clustering segmentation unit 25 then divides the resulting subband samples into quasi-stationary segments, as shown in Figure 6. In the following discussion, the term "transient segment" refers to these quasi-stationary segments. Next comes an arbitrary-resolution analysis filter bank 26 in each subband. Its number of subbands equals the number of subband samples in each transient segment of that subband.

The switchable-resolution analysis filter bank 28 can be implemented with any filter bank whose operation can switch between high- and low-frequency-resolution modes. One embodiment uses a pair of DCTs with short and long transform lengths, corresponding to low and high frequency resolution respectively. For a transform length of M, the subband samples of a type-IV DCT are obtained as:

$s(m, n) = \sqrt{\frac{2}{M}} \sum_{k=0}^{M-1} \cos\left[\frac{\pi}{M}(k + 0.5)(n + 0.5)\right] \cdot x(mM + k)$

where x(·) denotes the input PCM samples. Other forms of the DCT can be used instead of the type-IV DCT.
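The type-IV DCT formula above can be evaluated directly. This is a minimal O(M²) sketch for illustration, not an optimized implementation, and the function names are hypothetical. With the $\sqrt{2/M}$ scaling, the DCT-IV matrix is symmetric and orthogonal. Applying the transform twice therefore returns the input, which makes a convenient sanity check.

```python
import math


def dct4(block):
    """Orthonormal type-IV DCT of one block of M samples, evaluated directly."""
    M = len(block)
    scale = math.sqrt(2.0 / M)
    return [scale * sum(math.cos(math.pi / M * (k + 0.5) * (n + 0.5)) * block[k]
                        for k in range(M))
            for n in range(M)]


def dct4_frame(x, M):
    """s[m][n]: DCT-IV of the m-th block x(mM), ..., x(mM + M - 1)."""
    return [dct4(x[m * M:(m + 1) * M]) for m in range(len(x) // M)]
```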

Because the DCT tends to cause blocking artifacts, a preferred embodiment uses the modified DCT (MDCT):

$s(m, n) = \sqrt{\frac{2}{M}} \sum_{k=0}^{2M-1} \cos\left[\frac{\pi}{M}\left(k + 0.5 + \frac{M}{2}\right)(n + 0.5)\right] \cdot w(k) \cdot x(mM - M + k)$

where w(·) is the window function.

The window function must be power-complementary within each half of the window:

$w^2(k) + w^2(M-1-k) = 1, \quad k = 0, \ldots, M-1$

$w^2(k+M) + w^2(2M-1-k) = 1, \quad k = 0, \ldots, M-1$

in order to guarantee perfect reconstruction.

Any window that satisfies these conditions can be used, but only the following sine window

$w(k) = \pm\sin\left[(k + 0.5)\frac{\pi}{2M}\right], \quad k = 0, \ldots, 2M-1$

has the desirable property that the DC component of the input signal is concentrated in the first transform coefficient.
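The MDCT formula, the sine window and the power-complementary conditions can be checked together numerically. This is a minimal sketch of fixed-length analysis and synthesis only. Window switching is not shown, and the helper names are hypothetical. The inverse transform is an assumption: it uses the same $\sqrt{2/M}$ scaling and applies the window again at synthesis. Overlap-adding consecutive blocks at a hop of M then cancels the time-domain aliasing and returns the input.

```python
import math


def sine_window(M):
    """w(k) = sin((k + 0.5) * pi / (2M)), k = 0, ..., 2M - 1."""
    return [math.sin((k + 0.5) * math.pi / (2 * M)) for k in range(2 * M)]


def mdct(block, w):
    """MDCT of one windowed 2M-sample block, with sqrt(2/M) scaling."""
    M = len(block) // 2
    scale = math.sqrt(2.0 / M)
    return [scale * sum(math.cos(math.pi / M * (k + 0.5 + M / 2) * (n + 0.5)) * w[k] * block[k]
                        for k in range(2 * M))
            for n in range(M)]


def imdct(coeffs, w):
    """Inverse MDCT (assumed same scaling) followed by the synthesis window."""
    M = len(coeffs)
    scale = math.sqrt(2.0 / M)
    return [scale * w[k] * sum(math.cos(math.pi / M * (k + 0.5 + M / 2) * (n + 0.5)) * coeffs[n]
                               for n in range(M))
            for k in range(2 * M)]


def analysis_synthesis(x, M):
    """Transform with hop M, then overlap-add the inverse (time-domain aliasing cancellation)."""
    w = sine_window(M)
    padded = [0.0] * M + list(x) + [0.0] * (M + (-len(x)) % M)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - M, M):
        y = imdct(mdct(padded[start:start + 2 * M], w), w)
        for k in range(2 * M):
            out[start + k] += y[k]
    return out[M:M + len(x)]
```

Every input sample is covered by exactly two overlapping blocks. With a window that satisfies the two conditions, their aliasing terms cancel and their gains sum to one.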

To keep perfect reconstruction when the MDCT switches between the high- and low-frequency-resolution modes (that is, between long and short windows), the overlapping parts of the long and short windows must have the same shape.

Depending on the transient characteristics of the input PCM samples, the encoder can select a long window (first window 61 in Figure 5), switch to a sequence of short windows (fourth window 64 in Figure 5), and switch back. The long-to-short transition long window 62 and the short-to-long transition long window 63 in Figure 5 bridge such switches. The short-to-short transition long window 65 in Figure 5 helps when two transients are close together but not close enough to justify continuous use of short windows. The encoder must signal the window type used for each frame, so that the decoder applies the same window when reconstructing the PCM samples.

The advantage of the short-to-short transition long window is that it can handle adjacent transients only one frame apart. As shown at the top 67 of Figure 17, the prior-art MDCT can only handle transients that are at least two frames apart. As shown at the bottom 68 of Figure 17, this window reduces that spacing to one frame.

The invention then performs transient segmentation 25. The transient segments can be represented by a binary function that indicates where the transients lie, or equivalently by the segmentation boundaries. Each change of the function value from 0 to 1 or from 1 to 0 marks a boundary. For example, the quasi-stationary segments in Figure 6 can be represented as:

$T(n) = \begin{cases} 0, & n = 0, 1, 2, 3, 4 \\ 1, & n = 5, 6, 7, 8, 9 \\ 0, & n = 10, 11, \ldots, 16 \end{cases}$

Note that T(n) = 0 does not necessarily mean that the energy of the audio signal is high at time index n, or vice versa. Throughout the following discussion, T(n) is called the "transient segmentation function". Its information must be conveyed to the decoder, directly or indirectly. Run-length coding of the alternating runs of zeros and ones is an efficient choice. In the example above, T(n) can be conveyed to the decoder as the run lengths 5, 5 and 7. The run-length codes can themselves be entropy coded.
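The run-length representation of T(n) can be sketched as below. Starting the first run with an explicit value is an assumption. The text's example begins with a run of zeros, so a codec could make that implicit. Entropy coding of the run lengths is omitted, and the function names are hypothetical.

```python
from itertools import groupby


def encode_runs(T):
    """Return (first_value, run_lengths) for a 0/1 transient segmentation function."""
    runs = [len(list(group)) for _, group in groupby(T)]
    return (T[0] if T else 0), runs


def decode_runs(first_value, runs):
    """Rebuild T(n): the runs alternate between 0 and 1, starting at first_value."""
    T, value = [], first_value
    for length in runs:
        T.extend([value] * length)
        value ^= 1
    return T


T = [0] * 5 + [1] * 5 + [0] * 7
print(encode_runs(T))  # the example above: runs 5, 5 and 7
```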

The transient clustering segmentation unit 25 can use any known segmentation method. In one embodiment, the segmentation is obtained simply by thresholding the transient detection distance.

The threshold can be set to

$\text{Threshold} = k \cdot \frac{E_{\max} + E_{\min}}{2}$

where k is an adjustable constant.

A more advanced embodiment is based on the k-means clustering algorithm and consists of the following steps:

1) Initialize the transient segmentation function T(n) with the result of the thresholding method above.

2) Compute the centroid of each class:

for the class associated with T(n) = 0;

for the class associated with T(n) = 1.

3) Reassign the transient segmentation function T(n) according to the following rule:

4) Return to step 2.
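The thresholding initialization and the k-means refinement can be sketched together. The centroid and reassignment formulas are not reproduced in this text, so this sketch makes standard assumptions. Each centroid is the mean of E(n) over its class, and each time index joins the class with the nearer centroid. Labelling indices above the threshold as T(n) = 1 is also an assumed convention. The loop stops when T(n) no longer changes or after `max_iter` passes, a stopping rule the text does not specify.

```python
def threshold_segmentation(E, k=1.0):
    """Initial T(n): 1 where E(n) exceeds k * (Emax + Emin) / 2, else 0 (assumed labelling)."""
    thr = k * (max(E) + min(E)) / 2
    return [1 if e > thr else 0 for e in E]


def kmeans_segmentation(E, k=1.0, max_iter=20):
    """Two-class k-means on the transient detection distance E(n)."""
    T = threshold_segmentation(E, k)
    for _ in range(max_iter):
        zeros = [e for e, t in zip(E, T) if t == 0]
        ones = [e for e, t in zip(E, T) if t == 1]
        if not zeros or not ones:  # one class is empty: nothing to refine
            break
        c0 = sum(zeros) / len(zeros)  # assumed centroid: class mean
        c1 = sum(ones) / len(ones)
        new_T = [1 if abs(e - c1) < abs(e - c0) else 0 for e in E]  # nearest centroid
        if new_T == T:
            break
        T = new_T
    return T
```

Take E = [1, 1, 1, 7, 10, 10] with k = 1.3. The threshold alone labels only the two largest values. The k-means pass then moves the value 7 into the transient class, because 7 is closer to that class's centroid.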

The arbitrary-resolution analysis filter bank 26 is essentially a transform, such as a DCT, whose block length equals the number of samples in each subband segment. Suppose each subband holds 32 subband samples per frame, divided into segments of (9, 3, 20) samples. Three transforms with block lengths 9, 3 and 20 are then applied to the subband samples of the three segments, respectively. In the following discussion, the term "subband segment" refers to the subband samples of one transient segment within one subband. Using a type-IV DCT, the transform of the last of the (9, 3, 20) segments in the m-th subband can be written as

$u(m, n) = \sqrt{\frac{2}{20}} \sum_{k=0}^{20-1} \cos\left[\frac{\pi}{20}(k + 0.5)(n + 0.5)\right] \cdot s(m, 12 + k)$
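The per-segment transform can be sketched as follows, using the (9, 3, 20) example. Each segment of one subband gets a DCT-IV of its own length, so the last segment starts at offset 9 + 3 = 12, matching s(m, 12 + k) above. The function names are hypothetical, and the keep/discard coding-gain decision is not modelled. The transforms are orthonormal, so the total energy of each subband is preserved.

```python
import math


def dct4(block):
    """Orthonormal type-IV DCT of a block of any length."""
    M = len(block)
    scale = math.sqrt(2.0 / M)
    return [scale * sum(math.cos(math.pi / M * (k + 0.5) * (n + 0.5)) * block[k]
                        for k in range(M))
            for n in range(M)]


def segment_transform(subband, lengths):
    """Apply a DCT-IV whose length matches each subband segment of one subband."""
    if sum(lengths) != len(subband):
        raise ValueError("segment lengths must cover the subband exactly")
    out, start = [], 0
    for length in lengths:
        out.append(dct4(subband[start:start + length]))
        start += length
    return out


subband = [float(i % 5) for i in range(32)]
u = segment_transform(subband, (9, 3, 20))  # u[2] corresponds to u(m, n) above
```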

This transform increases the frequency resolution within each transient segment, so a good coding gain can be expected. In many cases, however, the coding gain is below 1 or too small to be worthwhile. It is then better to discard the transform result and inform the decoder of this decision through side information. Because this side information has a cost, making the keep/discard decision for a group of subband segments can improve the overall coding gain. One bit then conveys the decision for the whole group rather than for each subband segment.

In the following discussion, the term "quantization unit" refers to a group of contiguous subband samples within the same psychoacoustic critical band and the same transient segment. A quantization unit is a natural grouping of subband segments for the decision above. In that case, the total coding gain is computed over all subband segments in the quantization unit. If it exceeds 1, or some other higher threshold, the transform results are kept for all subband segments in the unit. Otherwise they are discarded. A single bit conveys this decision to the decoder for all subband segments of the quantization unit.

Switchable filter bank + ADPCM

As shown in Figure 4, this arrangement is essentially the same as that of Figure 3, except that ADPCM 29 replaces the arbitrary-resolution analysis filter bank 26. The decision whether to apply ADPCM is again made per group of subband segments, such as a quantization unit, to reduce the cost of side information. The subband segments of such a group can even share one set of prediction coefficients. Known methods for quantizing the prediction coefficients can be used here, such as LAR (log area ratio), IS (inverse sine) and LSP (line spectral pair).
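ADPCM on a subband segment can be sketched as closed-loop prediction, as below. This sketch makes several assumptions. It is forward-adaptive: the group's shared prediction coefficients are taken as given, as if already chosen by the encoder and sent as side information. Their quantization (LAR, IS or LSP) is not shown. The residual quantizer is a plain uniform one with step `step`. The encoder predicts from reconstructed samples rather than original ones, so the decoder tracks it exactly. The reconstruction error then stays within half a step.

```python
def dpcm_encode(samples, coeffs, step):
    """Closed-loop predictive coding with a linear predictor shared by the group."""
    history = [0.0] * len(coeffs)  # past reconstructed samples, most recent first
    indices = []
    for x in samples:
        prediction = sum(a * h for a, h in zip(coeffs, history))
        q = round((x - prediction) / step)  # uniform quantization of the residual
        indices.append(q)
        history = [prediction + q * step] + history[:-1]
    return indices


def dpcm_decode(indices, coeffs, step):
    """Mirror of the encoder loop: prediction plus dequantized residual."""
    history = [0.0] * len(coeffs)
    out = []
    for q in indices:
        prediction = sum(a * h for a, h in zip(coeffs, history))
        recon = prediction + q * step
        out.append(recon)
        history = [recon] + history[:-1]
    return out
```

A sinusoid sin(ωn) obeys x(n) = 2cos(ω)x(n−1) − x(n−2). With coefficients [2cos(ω), −1], the residual indices therefore stay close to zero after the first two samples.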

Tri-mode switchable filter bank

Common switchable filter banks have only high- and low-resolution modes. This filter bank can instead switch among high-, medium- and low-frequency-resolution modes. The high- and low-frequency-resolution modes serve stationary and transient frames, respectively, following the same principles as the dual-mode switchable filter bank. The medium-resolution mode mainly provides better frequency resolution for the stationary segments within a transient frame. Within a transient frame, the low-frequency-resolution mode is used for the transient segment and the medium-resolution mode for the rest of the frame. Unlike the prior art, the switchable filter bank of the present invention can thus operate in two resolution modes within a single frame of audio data. The medium-resolution mode can also be used for frames with smooth transients.

In the following discussion, three block terms are used:
- A "long block" is the block of samples the filter bank outputs at each time instant in the high-frequency-resolution mode.
- A "medium block" is the block it outputs at each instant in the medium-frequency-resolution mode.
- A "short block" is the block it outputs at each instant in the low-frequency-resolution mode.
These definitions describe three kinds of frames:

· Stationary frames, processed by the filter bank in the high-frequency-resolution mode. Each such frame usually consists of one or more long blocks.

· Frames with transients, processed by the filter bank in the high- and medium-time-resolution modes. Each such frame consists of several medium blocks and several short blocks, and the short blocks together contain as many samples as one medium block.

· Frames with smooth transients, processed by the filter bank in the medium-resolution mode. Each such frame consists of several medium blocks.

Figure 8 shows the advantage of this new approach. Figure 8 is essentially the same as Figure 7, except that segments 141, 142 and 143 are now processed in the medium-frequency-resolution mode. Figure 7 processed them in the low-frequency-resolution mode. Because these segments are stationary, the medium-frequency-resolution mode matches them clearly better than the low-frequency-resolution mode, so a higher coding gain can be expected.

One embodiment uses a triplet of DCTs with short, medium and long block lengths, corresponding to the low-, medium- and high-frequency-resolution modes, respectively.

A preferred embodiment, free of blocking artifacts, uses a triplet of MDCTs with short, medium and long block lengths. The medium-resolution mode requires the window types shown in Figure 9 in addition to those of Figure 5. These windows are:

· Medium window 151;

· Long-to-medium transition long window 152: a long window that bridges the transition from a long window to a medium window;

· Medium-to-long transition long window 153: a long window that bridges the transition from a medium window to a long window;

· Medium-to-medium transition long window 154: a long window that bridges the transition from one medium window to another;

· Medium-to-short transition medium window 155: a medium window that bridges the transition from a medium window to a short window;

· Short-to-medium transition medium window 156: a medium window that bridges the transition from a short window to a medium window;

· Medium-to-short transition long window 157: a long window that bridges the transition from a medium window to a short window;

· Short-to-medium transition long window 158: a long window that bridges the transition from a short window to a medium window.

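Note: the medium-to-medium transition long window 154, the medium-to-short transition long window 157 and the short-to-medium transition long window 158 work like the short-to-short transition long window 65 in Figure 5. They let the tri-mode MDCT handle transients only one frame apart.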

Figure 10 shows some example window sequences. Sequence 161 shows how this embodiment handles slow transients with medium resolution 167. Sequences 162 to 166 show the other allocations: fine time resolution 168 for transients, medium time resolution 169 for stationary segments in the same frame, and high frequency resolution 170 for stationary frames.

Conventional sum/difference coding 14 can be applied here. A simple form, for example, is:

Sum channel = 0.5 (left channel + right channel)
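Sum/difference coding of one subband can be sketched as below. Only the sum channel is defined above. The difference channel 0.5 (left − right) used here is an assumption, chosen as the usual counterpart. With it, the decoder recovers left = sum + difference and right = sum − difference exactly (apart from quantization). The function names are hypothetical.

```python
def ms_encode(left, right):
    """Sum channel 0.5*(L + R) as in the text; difference 0.5*(L - R) is assumed."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the sum/difference pair: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```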

Conventional joint intensity coding 15 can be applied here. A simple method is:

· Replace the source channel with the sum of the source and joint channels;

· Scale this sum to the energy of the original source channel within the quantization unit;

· Discard the subband samples of the joint channel within the quantization unit. Transmit only the quantization index of the scale factor to the decoder; the present invention calls this scale factor the "steering vector" and defines it as:

The steering vector should be quantized non-uniformly, for example logarithmically, to match the auditory characteristics of the human ear. Entropy coding can be applied to the quantization indices of the steering vector.

If the source and joint channels are nearly 180 degrees out of phase, summing them would cancel much of the signal. To avoid this, a polarity can be applied when they are summed:

Sum channel = source channel + polarity · joint channel

The polarity must also be transmitted to the decoder.
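The intensity-coding steps for one quantization unit can be sketched as below. The steering-vector formula is not reproduced in this text. This sketch therefore assumes a hypothetical definition: the square root of the joint-channel energy over the source-channel energy. The decoder rebuilds the joint channel as polarity × steering × sum channel. The polarity is chosen from the sign of the correlation between the two channels, which is also an assumption. Quantizing the steering vector is not shown.

```python
import math


def intensity_encode(source, joint):
    """Return (sum_channel, polarity, steering) for one quantization unit."""
    dot = sum(s * j for s, j in zip(source, joint))
    polarity = 1 if dot >= 0 else -1  # flip the joint channel if nearly out of phase
    combined = [s + polarity * j for s, j in zip(source, joint)]
    e_src = sum(s * s for s in source)
    e_comb = sum(c * c for c in combined)
    gain = math.sqrt(e_src / e_comb) if e_comb > 0 else 0.0
    combined = [gain * c for c in combined]  # rescale to the source-channel energy
    e_joint = sum(j * j for j in joint)
    steering = math.sqrt(e_joint / e_src) if e_src > 0 else 0.0  # hypothetical definition
    return combined, polarity, steering


def intensity_decode(combined, polarity, steering):
    """Source channel = sum channel; joint channel rebuilt from polarity and steering."""
    return combined, [polarity * steering * c for c in combined]
```

If the joint channel is an exact scaled or inverted copy of the source, this round trip is lossless. Otherwise only the per-unit energy of the joint channel is preserved.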

The psychoacoustic model 23 computes the masking threshold for the current input frame of audio samples, based on the auditory characteristics of the human ear. Quantization noise below the masking threshold is unlikely to be audible. Any conventional psychoacoustic model can be used, but the present invention requires it to output one masking threshold per quantization unit.

The global bit allocator 16 distributes the bits available to a frame among all quantization units. Its goal is to keep the quantization noise power in each unit below that unit's masking threshold. It controls each unit's quantization noise power by adjusting that unit's quantization step size. All subband samples within a quantization unit are quantized with the same step size.

Any known bit allocation method can be used here. One such method is the well-known water-filling algorithm. Its basic idea is to find the quantization unit with the highest QNMR (quantization-noise-to-mask ratio) and reduce that unit's step size, which lowers its quantization noise. This repeats until every unit's QNMR is below 1 (or another threshold) or the frame's bit budget is exhausted.
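The water-filling loop can be sketched with a simple cost model. This is a minimal sketch, and the model is an assumption rather than part of the text. Noise power is taken as step²/12, the usual approximation for a uniform quantizer. Each refinement multiplies the step by 1/√2, which halves the noise power (about 3 dB). At roughly 6 dB per bit, that costs about 0.5 bit per sample in the unit. `masks` and `sizes` are hypothetical inputs: the masking threshold (as a power) and the sample count of each quantization unit.

```python
def water_filling(masks, sizes, budget, start_step=1.0, ratio=2 ** -0.5, target=1.0):
    """Repeatedly refine the step size of the unit with the worst QNMR."""
    steps = [start_step] * len(masks)
    used = 0.0
    while True:
        qnmr = [(s * s / 12.0) / m for s, m in zip(steps, masks)]
        worst = max(range(len(masks)), key=lambda u: qnmr[u])
        if qnmr[worst] < target:  # every unit is now masked
            break
        cost = 0.5 * sizes[worst]  # assumed: 0.5 bit per sample per refinement
        if used + cost > budget:  # frame bit budget exhausted
            break
        steps[worst] *= ratio
        used += cost
    return steps, used
```

With masks of 1e-3 and 1e-2 and four samples per unit, the two units need seven and four refinements, for 22 bits in total. With a 10-bit budget, the loop stops early, after spending most of the bits on the noisier unit.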

The quantization step size itself must be quantized so that it can be packed into the bitstream. A non-uniform quantization, such as a logarithmic one, should be used to match the auditory characteristics of the human ear. Entropy coding can be applied to the quantization indices of the step sizes.

The quantizer 17 quantizes all subband samples within each quantization unit, using the step size provided by the global bit allocator 16. Any quantization scheme can be used here, linear or nonlinear, uniform or non-uniform.
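Logarithmic step-size quantization and uniform quantization of a unit can be sketched together. The resolution of four step-size indices per octave (about 1.5 dB apart) is a hypothetical choice, as are the helper names. The encoder quantizes the samples with the decoded step size rather than the raw one, so encoder and decoder use exactly the same step.

```python
import math

STEPS_PER_OCTAVE = 4  # hypothetical: step sizes spaced by a factor of 2 ** (1 / 4)


def quantize_step(step):
    """Logarithmic quantization of a step size to an integer index."""
    return round(STEPS_PER_OCTAVE * math.log2(step))


def dequantize_step(index):
    return 2.0 ** (index / STEPS_PER_OCTAVE)


def quantize_unit(samples, step_index):
    """Uniform mid-tread quantization of all samples in a unit with one shared step."""
    step = dequantize_step(step_index)
    return [round(x / step) for x in samples]


def dequantize_unit(indices, step_index):
    step = dequantize_step(step_index)
    return [q * step for q in indices]
```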

Interleaving 18 is optional and may be invoked only when the current frame contains a transient. Let x(m, n, k) be the k-th quantization index in the m-th quasi-stationary segment and the n-th subband. The quantization indices are normally arranged in (m, n, k) order. The interleaving unit 18 reorders them into (n, m, k) order. The reason is that the reordered indices may need fewer bits to encode than indices that are not interleaved. The decision whether to interleave must be sent to the decoder as side information.
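The (m, n, k) to (n, m, k) reordering can be sketched as below. The data layout is an assumption: `x[m][n]` is the list of quantization indices of subband n in segment m, and segments may hold different numbers of samples per subband. The decoder needs the segment lengths to undo the reordering. It already has them, because the quasi-stationary segment lengths travel in the side information.

```python
def interleave(x):
    """Flatten x[m][n][k] in (n, m, k) order instead of the natural (m, n, k)."""
    num_subbands = len(x[0])
    return [q for n in range(num_subbands) for m in range(len(x)) for q in x[m][n]]


def deinterleave(flat, seg_lengths, num_subbands):
    """Inverse of interleave(); seg_lengths[m] = samples per subband in segment m."""
    x = [[None] * num_subbands for _ in seg_lengths]
    pos = 0
    for n in range(num_subbands):
        for m, length in enumerate(seg_lengths):
            x[m][n] = flat[pos:pos + length]
            pos += length
    return x
```

For two segments holding two and one samples per subband, the natural order 1, 2, 3, 4, 5, 6 becomes 1, 2, 5, 3, 4, 6.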

In earlier audio coding algorithms, an entropy codebook applies to exactly one quantization unit, so the quantization indices within that unit determine the codebook (see the top of Figure 11). This leaves no room for optimization.

The present invention takes an entirely different approach here. It ignores the quantization units when selecting codebooks. Instead, it assigns the best codebook to each quantization index 19, in effect converting the quantization indices into codebook indices. It then groups these codebook indices into larger segments, and the segment boundaries define the ranges over which each codebook applies. These application ranges can differ greatly from those set by the quantization units. They depend only on the properties of the quantization indices, so the selected codebooks fit the indices better. Fewer bits are therefore needed to convey the quantization indices to the decoder.

Figure 11 illustrates the advantage of this approach over the prior art. Consider the largest quantization index in the figure. It belongs to quantization unit d, so the prior method must select a large codebook for that whole unit. This is clearly suboptimal, because most indices in unit d are much smaller. The new method places the same index in segment C, where it shares a codebook with other large quantization indices. All quantization indices in segment D are small, so a small codebook is selected for them. As a result, fewer bits are needed to encode the quantization indices.

Referring now to Figure 12, prior-art systems only need to send the codebook indices as side information, because the codebook ranges coincide with the predetermined quantization units. The method of the present invention must also send the codebook application ranges as side information, because these are independent of the quantization units. If handled poorly, this overhead could make the side information and quantization indices together cost more bits than before. Grouping the codebook indices into large segments is therefore critical for controlling this overhead. Larger segments mean fewer codebook indices and application ranges to send to the decoder.

One embodiment implements this new codebook selection scheme with the following steps:

1) Divide the quantization indices into blocks of P quantization indices each.

2) Determine the maximum codebook requirement of each block. For a symmetric quantizer, this is usually the largest absolute quantization index within the block:

$\max_{k \in \text{block}} |I(k)|$

where I(·) is the quantization index;

3) Assign to each block the smallest codebook that can accommodate its maximum codebook requirement:

4) Remove isolated small islands: when an island's codebook index is smaller than those of its neighbors, raise it to the smaller of the neighbors' codebook indices. Figure 12 shows this with the mappings 71 to 72, 73 to 74, 77 to 78 and 79 to 80. Islands whose codebook corresponds to zero quantization indices can be exempted from this process, because that codebook indicates that no codes need to be transmitted. Figure 12 shows this as the mapping 75 to 76. This step significantly reduces the number of codebook indices and application ranges that must be sent to the decoder.

One embodiment encodes the codebook application ranges with run-length codes, which can in turn be entropy coded.
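The four selection steps and the run-length description of the application ranges can be sketched as follows. The codebook table `CAPACITY` is hypothetical: codebook c is assumed to code indices with |q| ≤ CAPACITY[c], and codebook 0 means "all zeros, nothing transmitted". Also hypothetical is the rule that an island is a run of at most `max_island` blocks. Islands are raised only when they lie between two neighbors with larger codebooks, and codebook-0 islands are left alone as the text allows. The result is a list of [codebook, run length in blocks] pairs, which is the side information that would then be run-length and entropy coded.

```python
# Hypothetical codebook table: codebook c can code indices with |q| <= CAPACITY[c].
CAPACITY = [0, 1, 2, 4, 8, 16, 32, 64]


def block_codebooks(indices, P):
    """Steps 1-3: smallest codebook able to hold the largest |index| in each block of P."""
    books = []
    for start in range(0, len(indices), P):
        need = max(abs(q) for q in indices[start:start + P])
        book = next((c for c, cap in enumerate(CAPACITY) if cap >= need), None)
        if book is None:
            raise ValueError("index too large; recursive indexing would be needed")
        books.append(book)
    return books


def runs_of(books):
    """Collapse codebook indices into [codebook, run_length] pairs."""
    runs = []
    for b in books:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs


def remove_islands(books, max_island=1):
    """Step 4: raise short islands lying below both neighbours, skipping codebook 0."""
    runs = runs_of(books)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(runs) - 1):
            book, length = runs[i]
            left, right = runs[i - 1][0], runs[i + 1][0]
            if book != 0 and length <= max_island and book < left and book < right:
                runs[i][0] = min(left, right)
                changed = True
        runs = runs_of([b for b, n in runs for _ in range(n)])  # merge equal neighbours
    return runs
```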

All quantization indices are encoded 20 with the codebooks that the entropy codebook selector 19 chose, each over its own application range.

Entropy coding can be implemented with a variety of Huffman codebooks. When a codebook has few quantization levels, several quantization indices are grouped (blocked) together to form a larger Huffman codebook. When the number of quantization levels is too large (for example, more than 200), recursive indexing is used. In that case, a large quantization index q is represented as

$q = m \cdot M + r$

where M is the modulus, m is the quotient, and r is the remainder. Only m and r need to be transmitted to the decoder; either or both of them can be encoded with Huffman codes.
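Both techniques can be sketched in a few lines of Python. The sketch is illustrative only: it assumes an odd number of levels with signed indices centered on zero, zero-padding of a short final group, and a sign transmitted separately from the recursive-index magnitude; the group size and modulus values are hypothetical.

```python
def block_indices(q, levels, group):
    """Combine `group` signed indices from a small codebook into one symbol.

    Assumes `levels` is odd and every index lies in [-(levels // 2), levels // 2].
    A short final group is padded with zeros.
    """
    offset = levels // 2
    symbols = []
    for i in range(0, len(q), group):
        chunk = list(q[i:i + group])
        chunk += [0] * (group - len(chunk))
        s = 0
        for x in chunk:
            s = s * levels + (x + offset)  # base-`levels` digit per index
        symbols.append(s)
    return symbols


def unblock_symbol(s, levels, group):
    """Inverse of block_indices for a single symbol."""
    offset = levels // 2
    out = []
    for _ in range(group):
        s, digit = divmod(s, levels)
        out.append(digit - offset)
    return out[::-1]


def recursive_index(q, modulus):
    """Split |q| into quotient m and remainder r so that |q| = m * modulus + r."""
    return divmod(abs(q), modulus)
```

With three-level indices (-1, 0, 1) grouped four at a time, the blocked Huffman codebook needs 3^4 = 81 entries, and `block_indices([1, -1, 0, 1], 3, 4)` gives the single symbol 59. With a modulus of 200, an index of 437 is sent as m = 2 and r = 37.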

Entropy coding can also be implemented with various arithmetic codebooks. When the number of quantization levels is too large (for example, more than 200), recursive indexing is likewise used.

Types of entropy coding other than the Huffman and arithmetic coding described above can also be used.

Packing all or some of the quantization indices directly, without entropy coding, is also a viable option.
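Direct packing can be as simple as writing each index as a fixed-width field. The sketch below assumes a two's-complement field width `bits` chosen by the encoder and MSB-first bit order; both are illustrative assumptions rather than the bitstream format of this invention.

```python
def pack_fixed(q, bits):
    """Pack signed indices into bytes as `bits`-bit two's-complement fields, MSB first."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    acc, nacc, out = 0, 0, bytearray()
    for x in q:
        if not lo <= x <= hi:
            raise ValueError(f"{x} does not fit in {bits} bits")
        acc = (acc << bits) | (x & mask)
        nacc += bits
        while nacc >= 8:              # flush whole bytes
            nacc -= 8
            out.append((acc >> nacc) & 0xFF)
        acc &= (1 << nacc) - 1        # keep only the unflushed bits
    if nacc:                          # zero-pad the final partial byte
        out.append((acc << (8 - nacc)) & 0xFF)
    return bytes(out)


def unpack_fixed(data, bits, count):
    """Read back `count` signed `bits`-bit fields written by pack_fixed."""
    value = int.from_bytes(data, "big")
    total = len(data) * 8
    out = []
    for i in range(count):
        field = (value >> (total - (i + 1) * bits)) & ((1 << bits) - 1)
        out.append(field - (1 << bits) if field >> (bits - 1) else field)
    return out
```

Four 3-bit indices `[1, -1, 0, 1]` occupy 12 bits, so `pack_fixed` emits two bytes, the last one zero-padded.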

Because the statistical properties of the quantization indices differ significantly depending on whether the variable-resolution filter bank is in low or high resolution mode, one embodiment of the present invention uses two entropy codebook libraries to encode the quantization indices in these two modes, respectively. A third library can be used for the intermediate resolution mode, or that mode can share the library of the high or low resolution mode.

The present invention multiplexes 21 all quantization indices and other side information into a complete bitstream. The side information includes the quantization step sizes, sampling rate, speaker configuration, frame length, lengths of the quasi-stationary segments, codes of the entropy codebooks, and so on. Other ancillary information, such as time codes, can also be packed into the bitstream.

Prior-art systems must transmit the number of quantization units in each transient segment to the decoder, because unpacking the quantization step sizes, the codebooks of the quantization indices, and the quantization indices themselves all depend on it. In the present invention, however, the selection of the quantization index codebooks and their application ranges is decoupled from the quantization units by the dedicated entropy codebook selection method 19, so the bitstream can be structured such that the quantization indices are unpacked before the number of quantization units is needed. Once unpacked, the quantization indices can be used to reconstruct the number of quantization units, as explained in the description of the decoder.

With these considerations in mind, one embodiment of the present invention uses the bitstream structure shown in Figure 16 when the semi-hybrid filter bank or the switchable filter bank + ADPCM is used. It essentially comprises the following parts:

· Synchronization word 81: indicates the beginning of an audio data frame;

· Frame header 82: contains information about the audio signal, such as the sampling rate, the number of normal channels, the number of LFE (low-frequency effects) channels, the speaker configuration, and so on;

· Channels 1, 2, ..., N (83, 84, 85): all audio data for each channel is packed here;

· Ancillary data 86: contains ancillary data such as time codes;

· Error detection 87: an error detection code is inserted here to detect errors in the current frame, so that an error-handling routine can be invoked when a bitstream error is detected.

The audio data for each channel is further structured as follows:

· Window type 90: indicates which window, such as those shown in Figure 5, was used by the encoder, so that the decoder can use the same window;

· Transient position 91: present only in frames with transients; indicates the position of each transient segment. If run-length codes are used, this is where the length of each transient segment is packed;

· Interleaving decision 92: one bit, present only in frames with transients, indicating whether the quantization indices of each transient segment were interleaved, so that the decoder knows whether to deinterleave them;

· Codebook indices and application ranges 93: conveys all information about the entropy codebooks and their application ranges over the quantization indices. It comprises the following parts:

○ Number of codebooks 101: conveys the number of entropy codebooks for each transient segment of the current channel;

○ Application ranges 102: conveys the application range of each entropy codebook in units of quantization indices or blocks; the ranges can also be entropy coded;

○ Codebook indices 103: conveys the index of each entropy codebook; these indices can be further entropy coded;

· Quantization indices 94: conveys the entropy codes for all quantization indices of the current channel;

· Quantization step sizes 95: conveys the index of the quantization step size for each quantization unit; these indices can also be entropy coded. As explained previously, the number of step-size indices, that is, the number of quantization units, is reconstructed by the decoder from the quantization indices, as shown at 49;

· Arbitrary-resolution filter bank decision 96: one bit per quantization unit, present only when the switchable-resolution analysis filter bank 28 is in low frequency resolution mode, indicating whether the decoder should perform arbitrary-resolution filter bank reconstruction (51 or 55) on all subband segments within the quantization unit;

· Sum/difference coding decision 97: one bit per sum/difference-coded quantization unit. It is optional and present only when sum/difference coding is used; it indicates whether the decoder should perform sum/difference decoding 47;

· Joint intensity coding decision and steering vector 98: conveys whether the decoder should perform joint intensity decoding. It is optional, applies only to the quantization units of joint channels that are joint-intensity coded, and is present only when the encoder uses joint intensity coding. It comprises the following parts:

○ Decision 121: one bit per joint quantization unit, indicating to the decoder whether to perform joint channel decoding on the subband samples in the quantization unit;

○ Polarity 122: one bit per joint quantization unit, indicating the polarity of the joint channel relative to the source channel;

○ Steering vector 123: one scale factor per joint quantization unit, which can be entropy coded;

· Ancillary data 99: contains ancillary information such as dynamic range control.
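The conditional presence of the per-channel fields can be summarized as a small Python function. It is only a sketch: the boolean flags are hypothetical names for the conditions stated above, bit widths are omitted, and ancillary data 99 is assumed to be always present. It does make visible that quantization indices 94 precede step sizes 95, which is what lets the decoder count the quantization units before unpacking the step sizes.

```python
def channel_fields(transient, low_freq_res, sum_diff, joint_intensity):
    """List the per-channel fields present in a frame, in bitstream order."""
    fields = ["window type 90"]
    if transient:
        fields += ["transient position 91", "interleaving decision 92"]
    fields += [
        "codebook indices and application ranges 93",
        "quantization indices 94",
        "quantization step sizes 95",  # count recovered from field 94
    ]
    if low_freq_res:
        fields.append("arbitrary-resolution filter bank decision 96")
    if sum_diff:
        fields.append("sum/difference coding decision 97")
    if joint_intensity:
        fields.append("joint intensity decision and steering vector 98")
    fields.append("ancillary data 99")
    return fields
```

For a stationary frame in high frequency resolution mode with no stereo tools, only fields 90, 93, 94, 95, and 99 are present.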

When the three-mode switchable filter bank is used, the bitstream structure is essentially the same as above, except for the following:

· Window type 90: indicates which window, such as those shown in Figures 5 and 9, was used by the encoder, so that the decoder can use the same window. Note that for frames with transients, this window type refers only to the last window in the frame, since the remaining windows can be inferred from this window type, the transient positions, and the last window used in the previous frame;

· Transient position 91: present only for frames with transients. It first indicates whether the frame has a slow transient 171. If not, it indicates the transient positions in units of medium blocks 172 and then in units of short blocks 173;

· Arbitrary-resolution filter bank decision 96: not applicable, and therefore not used.

Decoder

The decoder of the present invention essentially implements the inverse of the encoder's processing. It is shown in Figure 13 and explained below.

A demultiplexer 41 decodes the quantization indices from the bitstream, together with side information such as quantization step sizes, sampling rate, speaker configuration, and time codes. When prefix entropy codes such as Huffman codes are used, this step is combined with entropy decoding into a single step.

The quantization index codebook decoder 42 decodes from the bitstream the entropy codebooks of the quantization indices and their respective application ranges.

The entropy decoder 43 decodes the quantization indices from the bitstream based on the entropy codebooks and their respective application ranges provided by the quantization index codebook decoder 42.

Deinterleaving 44 is applied only when a transient is present in the current frame. If the decision bit unpacked from the bitstream indicates that interleaving 18 was invoked in the encoder, the quantization indices are deinterleaved; otherwise, they are passed through unmodified.

The present invention reconstructs the number of quantization units for each transient segment 49 from the nonzero quantization indices. Let q(m, n) be the quantization index of the n-th subband in the m-th transient segment (if the frame has no transient, there is only one transient segment). For each transient segment m, find the highest subband with a nonzero quantization index:

$\mathrm{Band}_{\max}(m) = \max_{n} \{\, n \mid q(m, n) \neq 0 \,\}$

Recall that quantization units are defined by critical bands in frequency and by transient segments in time, so the number of quantization units in each transient segment equals the index of the lowest critical band that can accommodate Band_max(m). Let Band(Cb) be the highest subband of the Cb-th critical band; the number of quantization units for each transient segment m can then be expressed as:

$N(m) = \min_{Cb} \{\, Cb \mid \mathrm{Band}(Cb) \geq \mathrm{Band}_{\max}(m) \,\}$
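The two formulas for Band_max(m) and N(m) can be sketched in Python as follows. The sketch assumes that critical bands are numbered from 1, that `band_top[cb - 1]` holds Band(cb) as a 0-based subband index, and that a segment with no nonzero index has zero quantization units; the band layout used in the example is hypothetical.

```python
def count_quantization_units(q, band_top):
    """Return N(m) for each transient segment m.

    q[m][n]: quantization index of subband n in transient segment m.
    band_top[i]: highest subband of critical band i + 1 (increasing).
    """
    counts = []
    for segment in q:
        nonzero = [n for n, value in enumerate(segment) if value != 0]
        if not nonzero:
            counts.append(0)  # assumption: an all-zero segment needs no units
            continue
        band_max = max(nonzero)  # Band_max(m)
        # N(m): the lowest critical band whose top subband reaches Band_max(m).
        counts.append(next(cb for cb, top in enumerate(band_top, start=1)
                           if top >= band_max))
    return counts
```

With hypothetical critical-band tops `[1, 3, 7, 15]`, a segment whose highest nonzero index lies in subband 5 needs N = 3 quantization units, because critical band 3 is the first to reach subband 5.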

Quantization step-size unpacking 50 unpacks the quantization step size of each quantization unit from the bitstream.

Inverse quantization 45 reconstructs the subband samples from the quantization indices using the quantization step size of each quantization unit.
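A minimal sketch of inverse quantization for one transient segment, assuming a uniform quantizer (sample = index × step). The quantizer law is an assumption, since this section does not specify it, and so is the unit layout: quantization unit u is taken to span the subbands of critical band u + 1, described by the hypothetical `band_top` list.

```python
def dequantize(q_segment, step_sizes, band_top):
    """Reconstruct the subband samples of one transient segment.

    step_sizes[u]: step size of quantization unit u (one per unit, N(m) in all).
    band_top[u]: highest subband covered by quantization unit u.
    Subbands above the last quantization unit are left at zero.
    """
    samples = [0.0] * len(q_segment)
    low = 0
    for unit, step in enumerate(step_sizes):
        high = min(band_top[unit], len(q_segment) - 1)
        for n in range(low, high + 1):
            samples[n] = q_segment[n] * step  # assumed uniform quantizer
        low = band_top[unit] + 1
    return samples
```

Because only N(m) step sizes are transmitted, subbands beyond the last quantization unit, whose indices are all zero, need no step size at all.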

If the bitstream indicates that joint intensity coding 15 was invoked in the encoder, joint intensity decoding 46 copies the subband samples of the source channel and multiplies them by the polarity and the steering vector to reconstruct the subband samples of the joint channel:

$\text{joint channel} = \text{polarity} \cdot \text{steering vector} \cdot \text{source channel}$
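Per quantization unit, the formula above can be sketched as follows. The mapping of polarity bit 1 to an inverted sign, the `band_top` unit layout, and the choice to keep the joint channel's own decoded samples in units whose decision bit 121 is 0 are assumptions made for illustration.

```python
def joint_intensity_decode(joint, source, decisions, polarities, steering, band_top):
    """Rebuild joint-intensity-coded units of a joint channel from its source channel.

    joint: the joint channel's own decoded subband samples (kept where not coded).
    decisions[u]: decision bit 121 for unit u (1 = joint-intensity coded).
    polarities[u]: polarity bit 122 (assumed: 0 = in phase, 1 = inverted).
    steering[u]: steering-vector scale factor 123 for unit u.
    band_top[u]: highest subband covered by unit u.
    """
    out = list(joint)
    low = 0
    for u, decided in enumerate(decisions):
        high = min(band_top[u], len(source) - 1)
        if decided:
            sign = -1.0 if polarities[u] else 1.0
            for n in range(low, high + 1):
                out[n] = sign * steering[u] * source[n]
        low = band_top[u] + 1
    return out
```

Only one bit of polarity and one scale factor per unit are transmitted for the joint channel, which is where the bit saving of joint intensity coding comes from.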

If the bitstream indicates that sum/difference coding 14 was invoked in the encoder, the sum/difference decoder 47 reconstructs the left and right channels from the sum and difference channels. Corresponding to the sum/difference coding example explained under sum/difference coding 14, the left and right channels can be reconstructed as:

Left channel = sum channel + difference channel

Right channel = sum channel - difference channel
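These reconstruction equations exactly invert an encoder that forms sum = (left + right) / 2 and difference = (left - right) / 2. That scaling is an assumption here, since the encoder example is given elsewhere in the description; the sketch below pairs it with the decoder equations above to show the round trip.

```python
def sum_diff_encode(left, right):
    """Assumed encoder example: S = (L + R) / 2, D = (L - R) / 2."""
    return ([(l + r) / 2 for l, r in zip(left, right)],
            [(l - r) / 2 for l, r in zip(left, right)])


def sum_diff_decode(sum_ch, diff_ch):
    """Decoder equations from the text: L = S + D, R = S - D."""
    return ([s + d for s, d in zip(sum_ch, diff_ch)],
            [s - d for s, d in zip(sum_ch, diff_ch)])
```

For highly correlated left and right channels the difference channel is small, so it quantizes to fewer bits.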

The decoder of the present invention incorporates a variable-resolution synthesis filter bank 48, which is essentially the inverse of the analysis filter bank used to encode the signal.

If the three-mode switchable-resolution analysis filter bank was used in the encoder, the operation of its corresponding synthesis filter bank is uniquely determined and requires the same window sequence for synthesis.

If the semi-hybrid filter bank or the switchable filter bank + ADPCM was used in the encoder, decoding proceeds as follows:

· If the bitstream indicates that the current frame was coded by the switchable-resolution analysis filter bank 28 in high frequency resolution mode, the switchable-resolution synthesis filter bank 54 accordingly enters high frequency resolution mode and reconstructs the PCM samples from the subband samples (see Figures 14 and 15).

· If the bitstream indicates that the current frame was coded by the switchable-resolution analysis filter bank 28 in low frequency resolution mode, the subband samples are first passed to the arbitrary-resolution synthesis filter bank 51 (Figure 14) or the inverse ADPCM 55 (Figure 15), depending on which was used in the encoder, which completes its respective synthesis. The PCM samples are then reconstructed from these synthesized subband samples by the switchable-resolution synthesis filter bank in low frequency resolution mode 53.

Synthesis filter banks 52, 51 and 55 are the inverses of analysis filter banks 28, 26 and 29, respectively. Their structure and operation are uniquely determined by the corresponding analysis filter banks. Therefore, whichever analysis filter bank is used in the encoder, its corresponding synthesis filter bank must be used in the decoder.

Low Coding Delay Mode

When the encoder disables the high frequency resolution mode of the switchable-resolution analysis filter bank, the frame length can be reduced to the block length of the switchable-resolution filter bank in low frequency resolution mode, or an integer multiple of it. This yields a much shorter frame and hence a much smaller delay for encoder and decoder operation. This is the low coding delay mode of the present invention.

Although several embodiments have been described in detail for purposes of illustration, various modifications may be made without departing from the scope and spirit of the invention. Accordingly, the invention is limited only by the appended claims.

Claims

1. A method for encoding and decoding a multi-channel digital audio signal, comprising the steps of:

Clustering the input PCM samples into quasi-stationary frames;

Converting the PCM samples into subband samples;

Block-quantizing the subband samples into a plurality of quantization indices;

Providing a library of pre-designed codebooks;

Assigning codebooks to groups of quantization indices based on local characteristics of the quantization indices, such that the application ranges of the codebooks are independent of the block quantization boundaries;

Encoding the codebook indices and their respective application ranges;

Creating a complete encoded data stream;

Sending the complete encoded data stream;

Receiving the encoded data stream and unpacking the data stream;

Decoding the quantization indices from the data stream;

Reconstructing subband samples from the decoded quantization indices; and

Reconstructing audio PCM samples from the reconstructed subband samples.

2. The method of claim 1, wherein said codebook assigning step comprises: converting the quantization indices into codebook indices by assigning to each quantization index the index of the smallest available codebook capable of accommodating said quantization index, and clustering and partitioning the codebook indices into a plurality of application ranges.

3. The method of claim 1, wherein the duration of the quasi-stationary frame is between 2 and 50 milliseconds.

4. The method of claim 1, wherein said step of converting comprises using a resolution filter bank selectively switchable between high and low frequency resolution modes.

5. The method of claim 4, including the step of detecting a transient, using the high frequency resolution mode when no transient is detected, and switching to the low frequency resolution mode when a transient is detected.

6. The method of claim 5, wherein the subband samples are clustered and partitioned into quasi-stationary segments when the resolution filter bank is switched to the low frequency resolution mode.

7. The method of claim 4, wherein the resolution filter bank is configured to include a long window capable of bridging an immediate transition from one short window to another short window, so that transients separated by only one long window can be processed.

8. The method of claim 1, wherein said converting step includes using a resolution filter bank selectively switchable between a high-resolution mode, a low-resolution mode, and an intermediate-resolution mode, so that multiple resolutions can be used in a single frame.

9. The method of claim 8, wherein the resolution filter bank is configured to include a window that bridges an immediate transition from one shorter window to another shorter window, so that transients separated by only one such window can be processed.

10. The method of claim 6, comprising adjusting the frequency resolution of each quasi-stationary segment using an arbitrary-resolution filter bank or adaptive differential pulse code modulation (ADPCM).

11. The method of claim 1, including the step of calculating a noise masking threshold.

12. The method of claim 11, wherein the calculating step is performed using a psychoacoustic model.

13. The method of claim 1, wherein the step of creating a plurality of quantization indices comprises quantizing the subband samples using a quantization step size provided by a bit allocator that allocates bit resources to groups of subband samples so as to keep the quantization noise power below a masking threshold.

14. The method of claim 1, comprising converting subband samples of left and right channel pairs into sum and difference channel pairs.

15. The method of claim 14, wherein said converting step is performed using a sum/difference encoder.

16. The method of claim 1, comprising: extracting an intensity scaling factor of the joint channel compared to the source channel, merging the joint channel to the source channel, and discarding all subband samples in the joint channel.

17. The method of claim 16, wherein said extracting and combining steps are performed using a joint intensity encoder.

18. The method of claim 1, comprising rearranging the quantization index to reduce the total number of bits when a transient is present in a frame.

19. The method of claim 1, comprising: providing a run length encoder for encoding the application range of the codebook.

20. The method of claim 1, comprising: applying a transient clustering segmentation algorithm when a transient is detected.

21. The method of claim 1, wherein the step of creating the complete encoded data stream is performed using a multiplexer.

22. The method of claim 1, wherein the encoded data stream includes codebook indices and their application ranges, including the number of codebooks, application ranges and codebook indices.

23. The method of claim 1, wherein a variable-resolution synthesis filter bank is used as a two-stage hybrid filter bank, wherein the first stage consists of an arbitrary-resolution synthesis filter bank or inverse adaptive differential pulse code modulation (ADPCM), and the second stage is the variable-resolution synthesis filter bank in its low frequency resolution mode.

24. The method of claim 1, wherein, when the data stream indicates that the current frame was coded by a switchable-resolution analysis filter bank in the high frequency resolution mode, the variable-resolution synthesis filter bank operates in the high frequency resolution mode.

25. The method of claim 1, wherein said step of unpacking the data stream is performed using a demultiplexer.

26. The method of claim 1, wherein said decoding step uses an entropy decoder to decode the entropy-coded codebooks and a run-length decoder to decode their respective application ranges from the data stream.

27. The method of claim 1, wherein said decoding step further comprises decoding the subband quantization indices from the data stream using an entropy decoder.

28. The method of claim 27, comprising reconstructing the number of quantization units using the decoded quantization index.

29. The method of claim 1, comprising rearranging the quantization indices when a transient is detected in the current frame.

30. The method of claim 29, wherein said rearranging step is performed using a deinterleaver.

31. The method of claim 1, comprising: reconstructing the subband samples of the joint channel from the subband samples of the source channels using the joint intensity scale factor.

32. The method of claim 31, wherein said reconstructing step is performed using a joint intensity decoder.

33. The method of claim 1, comprising reconstructing subband samples of the left and right channels from the sum and difference subband channels.

34. The method of claim 33, wherein said reconstructing step is performed using a sum/difference decoder.

35. A method for encoding a multi-channel digital audio signal comprising the steps of:

Dividing the input PCM samples into quasi-stationary frames;

Converting the PCM samples into subband samples;

Creating a plurality of quantization indices by creating block quantization boundaries in the subband samples;

Providing a library of pre-designed codebooks;

Encoding the codebook indices and their respective application ranges; and

Creating a complete encoded data stream for storage or transmission.

36. The method of claim 35, wherein said step of assigning codebooks includes converting quantization indices to codebook indices by assigning to each quantization index the smallest available codebook that can accommodate the quantization indices.

37. The method of claim 36, wherein the duration of the quasi-stationary frame is between 2 and 50 milliseconds.

38. The method of claim 35, wherein said converting step comprises using a resolution filter bank selectively switchable between high and low frequency resolution modes.

39. The method of claim 38, including the steps of detecting a transient, and using the high frequency resolution mode when a transient is not detected; and switching to the low frequency resolution mode when a transient is detected.

40. The method of claim 39, wherein the subband samples are clustered and partitioned into quasi-stationary segments when the resolution filter bank is switched to the low frequency resolution mode.

41. The method of claim 40, comprising adjusting the frequency resolution of each quasi-stationary segment using an arbitrary-resolution filter bank or adaptive differential pulse code modulation (ADPCM).

42. The method of claim 41, wherein the resolution filter bank is configured to include a long window for bridging an immediate transition from one short window to another short window, so that transients separated by only one long window can be processed.

43. The method of claim 35, wherein said converting step includes using a resolution filter bank selectively switchable between high, low, and medium frequency resolution modes, so that multiple resolutions can be used in a single frame when a transient is detected.

44. The method of claim 43, wherein the resolution filter bank is configured to include a window for bridging an immediate transition from one shorter window to another shorter window, so that transients separated by only one such window can be processed.

45. The method of claim 35, wherein the step of creating a plurality of quantization indices comprises using a quantization step size provided by a bit allocator that allocates bit resources to groups of subband samples such that the quantization noise power is below a masking threshold.

46. The method of claim 35, including a step of calculating a noise masking threshold.

47. The method of claim 46, wherein the calculating step is performed using a psychoacoustic model.

48. The method of claim 35, comprising converting subband samples of left and right channel pairs to sum and difference channel pairs.

49. The method of claim 48, wherein the converting step is performed using a sum/difference encoder.

50. The method of claim 35, comprising extracting an intensity scaling factor of the joint channel compared to the source channel, merging the joint channel to the source channel, and discarding all associated subband samples of the joint channel.

51. The method of claim 50, wherein the steps of extracting and combining are performed with a joint intensity encoder.

52. The method of claim 35, comprising rearranging the quantization indices to reduce the total number of bits when a transient is present in the frame.

53. The method of claim 35, comprising: providing a run length encoder for encoding application boundaries of the codebook.

54. The method of claim 35, comprising: applying a transient clustering segmentation algorithm when a transient is detected.

55. The method of claim 35, wherein the step of creating the complete data stream is performed using a multiplexer.

56. A method for encoding and transmitting a multi-channel digital audio signal comprising the steps of:

Dividing the input PCM samples into quasi-stationary frames;

Converting the PCM samples into subband samples with a resolution filter bank that is selectively switchable between high, low, and medium frequency resolution modes, so that multiple resolutions can be used in a single frame when a transient is detected;

Detecting transients, using the high frequency resolution mode when no transient is detected, and switching to the low or medium frequency resolution mode when a transient is detected, wherein the subband samples are divided into quasi-stationary segments and the frequency resolution of each quasi-stationary segment in the frame is adjusted using the low or medium frequency resolution mode within the same frame;

Creating a plurality of quantization indices by creating block quantization boundaries in the subband samples;

Providing a library of pre-designed codebooks;

Assigning codebooks to groups of quantization indices based on local characteristics of the quantization indices, such that the application ranges of the codebooks are independent of the block quantization boundaries;

Encoding the codebook indices and their respective application ranges; and

Using a multiplexer to create a complete data stream for storage or transmission.

57. The method of claim 56, wherein the codebook assigning step includes converting quantized indices to codebook indices by assigning to each quantized index the smallest available codebook that accommodates said index.

58. The method of claim 56, wherein the step of creating a plurality of quantization indices comprises using a step size provided by a bit allocator that allocates bit resources to groups of subband samples such that the quantization noise power of each subband is below the calculated masking threshold.

59. The method of claim 56, comprising: using a psychoacoustic model to calculate the masking threshold.

60. The method of claim 56, comprising converting subband samples in the left and right channels to the sum and difference channels with a sum/difference encoder.

61. The method of claim 56, comprising: using a joint intensity encoder to extract an intensity scale factor of the joint channel relative to the source channel, merging the joint channel into the source channel, and discarding all associated subband samples of the joint channel.

62. The method of claim 56, comprising: providing a run length encoder for encoding application boundaries of the codebook.

63. The method of claim 56, wherein the resolution filter bank is configured to include a window for bridging an immediate transition from one shorter window to another shorter window, so that transients separated by only one such window can be processed.

64. A method for decoding an encoded audio data stream comprising the steps of:

Receiving an encoded audio data stream and unpacking the data stream;

Decoding the quantization indices from the data stream;

Reconstructing subband samples from the decoded quantization indices; and

Reconstructing audio pulse code modulation (PCM) samples from the reconstructed subband samples using a variable-resolution synthesis filter bank switchable between low and high frequency resolution modes;

Wherein, when the data stream indicates that the current frame was coded by a switchable-resolution analysis filter bank in the low frequency resolution mode, the variable-resolution synthesis filter bank is used as a two-stage hybrid filter bank, wherein the first stage consists of an arbitrary-resolution synthesis filter bank or inverse adaptive differential pulse code modulation (ADPCM), and the second stage is the variable-resolution synthesis filter bank in the low frequency resolution mode; and

Wherein, when the data stream indicates that the current frame was coded by the switchable-resolution analysis filter bank in the high frequency resolution mode, the variable-resolution synthesis filter bank operates in the high frequency resolution mode.

65. The method of claim 64, wherein the step of unpacking the data stream is performed with a demultiplexer.

66. The method of claim 64, wherein the step of decoding is performed with an entropy decoder for decoding the entropy codebook and a run-length decoder for decoding its respective application range from the data stream.

67. The method of claim 66, wherein the decoding step further comprises decoding the quantization index from the data stream with an entropy decoder.

68. The method of claim 67, comprising reconstructing the number of quantization units from the decoded quantization indices.

69. The method of claim 67, comprising rearranging the quantization indices when a transient is detected in the current frame.

70. The method of claim 69, wherein the step of rearranging is performed using a deinterleaver.

71. The method of claim 64, comprising: reconstructing the subband samples of the joint channel from the subband samples of the source channels using the joint intensity scale factor.

72. The method of claim 71, wherein the step of reconstructing is performed using a joint intensity decoder.

73. The method of claim 64, comprising reconstructing subband samples of the left and right channels from the sum and difference subband channels.

74. The method of claim 73, wherein the step of reconstructing is performed using a sum/difference decoder.
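A sum/difference decoder inverts the encoder's mid/side transform. The scaling is not given in the claims, so this sketch assumes the encoder formed S = (L + R) / 2 and D = (L - R) / 2, which makes the inverse L = S + D and R = S - D:

```python
from typing import List, Tuple


def sum_difference_decode(s: List[float], d: List[float]) -> Tuple[List[float], List[float]]:
    """Recover left/right subband samples from sum/difference channels.

    Assumes S = (L + R) / 2 and D = (L - R) / 2 at the encoder; a different
    encoder scaling (for example, no division by 2) needs a different inverse.
    """
    if len(s) != len(d):
        raise ValueError("sum and difference channels must have equal length")
    left = [a + b for a, b in zip(s, d)]
    right = [a - b for a, b in zip(s, d)]
    return left, right
```

For example, `sum_difference_decode([1.5, 0.0], [0.5, -1.0])` returns `([2.0, -1.0], [1.0, 1.0])`.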

75. The method of claim 64, wherein the variable resolution synthesis filter bank is configured to include a window that bridges an immediate transition from one short window to another short window, so as to handle transients separated by only one long window.
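The bridging window matters when two transient frames are separated by a single stationary frame: a conventional long window could not transition from short windows and back to short windows within that one frame. The sketch below shows one way an encoder might assign window types from per-frame transient flags; the decoder would simply read the resulting type from the data stream. The window labels and the one-flag-per-frame model are hypothetical.

```python
from typing import List


def choose_windows(transient: List[bool]) -> List[str]:
    """Assign a window type to each frame from per-frame transient flags.

    Hypothetical labels:
      SHORT      - transient frame, coded with short windows
      LONG       - stationary frame with stationary neighbours
      LONG2SHORT - stationary frame followed by a transient frame
      SHORT2LONG - stationary frame preceded by a transient frame
      BRIDGE     - stationary frame between two transient frames; bridges
                   short windows directly to short windows
    """
    n = len(transient)
    windows: List[str] = []
    for i, is_transient in enumerate(transient):
        if is_transient:
            windows.append("SHORT")
            continue
        prev_t = i > 0 and transient[i - 1]
        next_t = i + 1 < n and transient[i + 1]
        if prev_t and next_t:
            windows.append("BRIDGE")  # transients only one long window apart
        elif next_t:
            windows.append("LONG2SHORT")
        elif prev_t:
            windows.append("SHORT2LONG")
        else:
            windows.append("LONG")
    return windows
```

For the flags `[False, True, False, True, False, False]`, the third frame receives the `BRIDGE` window because it sits between two transient frames.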

76. A method for decoding an encoded audio bitstream, comprising the steps of:

receiving an encoded audio data stream and unpacking the data stream;

decoding quantization indices from the data stream;

reconstructing subband samples from the decoded quantization indices; and

reconstructing audio pulse code modulation (PCM) samples from the reconstructed subband samples by using a variable resolution synthesis filter bank switchable between low, medium, and high frequency resolution modes;

wherein the variable resolution synthesis filter bank operates in the high frequency resolution mode when the data stream indicates that the current frame is encoded with the switchable resolution analysis filter bank in the high frequency resolution mode; and

wherein, when the data stream indicates that the current frame is divided into segments (clusters) and that these segments are coded with the switchable resolution analysis filter bank in the low or medium frequency resolution mode, the variable resolution synthesis filter bank operates, for each segment of the frame, in the corresponding low or medium frequency resolution mode.
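The mode selection in claim 76 can be summarized as a small dispatch rule: a frame flagged as high resolution is synthesized as a whole in the high frequency resolution mode, while a segmented frame is synthesized segment by segment in whichever of the low or medium modes each segment was analyzed with. A minimal sketch, assuming hypothetical decoded flags (a frame-level boolean and a list of per-segment mode strings):

```python
from typing import List, Optional

VALID_SEGMENT_MODES = {"low", "medium"}


def synthesis_modes(high_resolution: bool, segment_modes: Optional[List[str]] = None) -> List[str]:
    """Return the synthesis mode to apply to each part of the current frame.

    high_resolution: frame-level flag read from the data stream (hypothetical).
    segment_modes: per-segment flags for a frame divided into segments; each
        must be "low" or "medium" (hypothetical encoding).
    """
    if high_resolution:
        # The whole frame is synthesized in the high frequency resolution mode.
        return ["high"]
    if not segment_modes:
        raise ValueError("a segmented frame needs at least one segment mode")
    for mode in segment_modes:
        if mode not in VALID_SEGMENT_MODES:
            raise ValueError(f"invalid segment mode: {mode}")
    # Each segment is synthesized in the same mode it was analyzed with.
    return list(segment_modes)
```

The actual synthesis routines for each mode, including overlap-add between segments, are outside the scope of this sketch.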

77. The method of claim 76, wherein the step of unpacking the data stream is performed with a demultiplexer.

78. The method of claim 76, wherein the step of decoding is performed using an entropy decoder for decoding the entropy codebooks and a run-length decoder for decoding their respective application ranges from the data stream.

79. The method of claim 78, wherein the decoding step further comprises decoding the quantization indices from the data stream with the entropy decoder.

80. The method of claim 79, further comprising reconstructing the number of quantization units from the decoded quantization indices.

81. The method of claim 79, further comprising rearranging the quantization indices when a transient is detected in the current frame.

82. The method of claim 81, wherein the step of rearranging is performed using a deinterleaver.

83. The method of claim 76, further comprising reconstructing the subband samples of the source channels from the subband samples of the joint channel using the joint intensity scale factors.

84. The method of claim 83, wherein the step of reconstructing is performed with a joint intensity decoder.

85. The method of claim 76, further comprising reconstructing the subband samples of the left and right channels from the subband samples of the sum and difference channels.

86. The method of claim 85, wherein the step of reconstructing is performed using a sum/difference decoder.

87. The method of claim 76, wherein the variable resolution synthesis filter bank is configured to include a window capable of bridging an immediate transition from one shorter window to another shorter window, so as to facilitate the processing of transients separated by only one such window.