
WO2005096273A1 - Enhanced audio encoding/decoding device and method - Google Patents

Enhanced audio encoding/decoding device and method Download PDF

Info

Publication number
WO2005096273A1
WO2005096273A1 (application PCT/CN2005/000440)
Authority
WO
WIPO (PCT)
Prior art keywords
module
signal
frequency
frequency domain
inverse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2005/000440
Other languages
French (fr)
Chinese (zh)
Inventor
Xingde Pan
Dietz Martin
Andreas Ehret
Holger HÖRICH
Xiaoming Zhu
Michael Schug
Weimin Ren
Lei Wang
Hao Deng
Fredrik Henn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING E-WORLD TECHNOLOGY Co Ltd
BEIJING MEDIA WORKS Co Ltd
Coding Technologies Sweden AB
Original Assignee
BEIJING E-WORLD TECHNOLOGY Co Ltd
BEIJING MEDIA WORKS Co Ltd
Coding Technologies Sweden AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING E-WORLD TECHNOLOGY Co Ltd, BEIJING MEDIA WORKS Co Ltd, Coding Technologies Sweden AB filed Critical BEIJING E-WORLD TECHNOLOGY Co Ltd
Priority to EP05742018A priority Critical patent/EP1873753A1/en
Publication of WO2005096273A1 publication Critical patent/WO2005096273A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components

Definitions

  • the present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec device and method based on a perceptual model. Background technique
  • the digital audio signal is audio encoded or audio compressed for storage and transmission.
  • the purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, for example, there is little difference between the originally input audio signal and the encoded output audio signal.
  • the advent of CDs represented the many advantages of digitally representing audio signals, such as high fidelity, large dynamic range, and robustness.
  • these advantages are at the expense of high data rates.
  • a CD-quality stereo signal requires a sampling rate of 44.1 kHz, and each sample value is uniformly quantized with 16 bits, so the uncompressed data rate reaches 1.41 Mb/s.
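The quoted figure follows directly from the CD parameters:

```python
# Uncompressed CD data rate: 44.1 kHz sampling, 16-bit samples, 2 channels.
sample_rate_hz = 44_100
bits_per_sample = 16
channels = 2

data_rate_bps = sample_rate_hz * bits_per_sample * channels
print(data_rate_bps / 1e6)  # ≈ 1.41 Mb/s
```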
  • Such a high data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost.
  • new network and wireless multimedia digital audio systems are required to reduce the rate of data without compromising the quality of the audio.
  • MPEG-1 and MPEG-2 BC are high-quality audio coding techniques aimed primarily at mono and stereo audio signals; multi-channel audio coding, by contrast, demands higher coding quality at lower bit rates.
  • because MPEG-2 BC encoding emphasizes backward compatibility with MPEG-1, it cannot achieve high-quality encoding of five channels at a code rate lower than 540 kbps.
  • MPEG-2 AAC technology was therefore proposed, which can achieve higher-quality encoding of five-channel signals at a rate of 320 kbps.
  • Figure 1 shows a block diagram of an MPEG-2 AAC encoder comprising a gain controller 101, a filter bank 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo (M/S) module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108.
  • the filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal adaptive: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals. For a 48 kHz signal this gives a maximum frequency resolution of 23 Hz and a maximum time resolution of 2.6 ms.
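The quoted resolutions can be reproduced with a quick calculation; the 50 % overlap (hop of half the window length) assumed below is standard MDCT practice but not stated explicitly here:

```python
fs = 48_000  # sampling rate in Hz
# A 2048-point MDCT yields 1024 spectral lines covering 0..fs/2.
freq_resolution_hz = fs / 2048           # ~23.4 Hz per line
# A 256-point MDCT advances 128 samples per block (50 % overlap assumed).
time_resolution_ms = 1000 * 128 / fs     # ~2.67 ms
print(freq_resolution_hz, time_resolution_ms)
```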
  • both a sine window and a Kaiser-Bessel window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window when the spacing of strong spectral components exceeds 220 Hz.
  • after passing through the gain controller 101, the audio signal enters the filter bank 102 and is filtered according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time domain noise shaping module 103.
  • the time domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain and then controls the quantization noise in the time domain according to that analysis, thereby controlling pre-echo.
  • the intensity/coupling module 104 performs stereo encoding of signal strength: for the high frequency band (above about 2 kHz), the sense of direction in hearing depends on changes in signal strength (the signal envelope) rather than on the signal waveform, so a constant-envelope signal does not affect the perceived direction. This property, together with the correlation between channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.
  • the second-order backward adaptive predictor 105 is used to eliminate redundancy of the steady state signal and improve coding efficiency.
  • the sum/difference stereo (M/S) module 106 operates on a channel pair, i.e. two channels carrying the left/right channels or the left/right surround channels of a two-channel or multi-channel signal.
  • the M/S module 106 utilizes the correlation between the two channels of the channel pair to achieve the effect of reducing the code rate and improving the coding efficiency.
  • the bit allocation and quantization coding module 107 is implemented as a nested loop process in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation.
  • the nested loop includes an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the supplied bits are used up, and the outer loop estimates the coding quality using the ratio of the quantization noise to the masking threshold.
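The inner loop can be sketched as below. This is an illustrative simplification, not the normative AAC rate loop: the 3/4-power companding matches AAC practice, but `bits_needed` is a hypothetical stand-in for the entropy coder's actual bit count.

```python
def quantize(spectrum, step):
    # Non-uniform quantizer sketch: AAC-style |x|^(3/4) companding,
    # with the step size acting like a global scale factor.
    return [round((abs(x) / 2 ** (step / 4)) ** 0.75) for x in spectrum]

def bits_needed(qvals):
    # Hypothetical stand-in for the lossless (entropy) coder's bit count.
    return sum(q.bit_length() + 1 for q in qvals)

def inner_loop(spectrum, bit_budget):
    # Inner loop: enlarge the quantizer step size until the budget holds.
    step = 0
    while bits_needed(quantize(spectrum, step)) > bit_budget:
        step += 1
    return step

spectrum = [120.0, 64.0, 30.5, 8.2, 1.1]
step = inner_loop(spectrum, 20)
print(step, quantize(spectrum, step))
```

An outer loop would then compare the per-band quantization noise against the masking threshold and, where a band is too noisy, amplify it and rerun the inner loop.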
  • finally, the encoded signal is multiplexed into an encoded audio stream by the bitstream multiplexing module 108.
  • in the gain control stage, the input signal is split into four equal-bandwidth bands by a quad-band polyphase filter bank (PQF), and each band uses an MDCT to generate 256 spectral coefficients, for a total of 1024.
  • a gain controller 101 is used in each band.
  • the high frequency PQF band can be ignored to obtain a low sampling rate signal.
  • FIG. 2 shows a block diagram of the corresponding MPEG-2 AAC decoder.
  • the decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time domain noise shaping module 208, a filter bank 209, and a gain control module 210.
  • the encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain a corresponding data stream and control stream.
  • after the above signals are decoded by the lossless decoding module 202, an integer representation of the scale factors and the signal spectrum is obtained.
  • the inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function that converts integer quantized values into reconstructed spectral values. Since the scale factor module in the encoder differences each scale factor against the previous one and Huffman-codes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the difference values and then recovers the true scale factors.
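The differential scale-factor scheme round-trips as follows (a minimal sketch; the Huffman coding of the differences is omitted):

```python
def diff_encode(scale_factors):
    # First scale factor is sent absolutely (the "common" scale factor);
    # every later one is sent as a difference from its predecessor.
    return [scale_factors[0]] + [b - a for a, b in zip(scale_factors, scale_factors[1:])]

def diff_decode(encoded):
    # Cumulative sum restores the original scale factors exactly.
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

sfs = [60, 58, 61, 61, 57]
assert diff_decode(diff_encode(sfs)) == sfs  # lossless round trip
```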
  • the M/S module 205 converts the sum and difference channels back into left and right channels under the control of the side information. Since the second-order backward adaptive predictor 105 was used in the encoder to eliminate redundancy of the steady-state signal and improve coding efficiency, predictive decoding is performed by the prediction module 206 in the decoder.
  • the intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the time domain noise shaping module 208 for time domain noise shaping decoding; finally, synthesis filtering is performed by the filter bank 209, which adopts the inverse modified discrete cosine transform (IMDCT).
  • the high frequency PQF band can be ignored by the gain control module 210 to obtain a low sampling rate signal.
  • MPEG-2 AAC codec technology is suitable for medium and high bit rate audio signals, but its coding quality at low bit rates is poor. At the same time, the codec involves many modules, and its high complexity is not conducive to real-time implementation.
  • Figure 3 shows the structure of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter bank 302, a spectral envelope/exponential encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bit stream multiplexing module 307.
  • the audio signal is judged to be steady-state or transient by the transient signal detection module 301, and the time domain data is mapped to frequency domain data by the signal-adaptive MDCT filter bank 302, wherein a 512-point long window is applied to steady-state signals and a pair of short windows to transient signals.
  • the spectral envelope/exponential encoding module 303 encodes the exponential portions of the signals in three modes according to the requirements of the code rate and frequency resolution, namely the D15, D25, and D45 encoding modes.
  • AC-3 technology differentially encodes the spectral envelope along frequency because increments of at most ±2 are needed, each increment representing a 6 dB level change; the first (DC) term is absolutely coded and the remaining exponents are differentially coded.
  • each index requires approximately 2.33 bits, since three differentials are packed into a 7-bit word.
  • the D15 coding mode provides fine frequency resolution by sacrificing temporal resolution.
  • D15 is transmitted only occasionally for steady-state signals, typically once per data frame (every 6 audio blocks). When the signal spectrum is unstable, the spectral estimate must be updated frequently and is encoded with coarser frequency resolution, usually using the D25 and D45 encoding modes.
  • the D25 encoding mode provides a compromise between frequency resolution and time resolution, with differential encoding performed every other frequency coefficient; each index requires approximately 1.15 bits. When the spectrum is stable over 2 to 3 blocks and then changes abruptly, the D25 encoding mode can be used.
  • the D45 coding mode performs one differential encoding every four frequency coefficients, so each index needs about 0.58 bits.
  • the D45 encoding mode provides high temporal resolution and low frequency resolution, so it is generally used in the encoding of transient signals.
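Ignoring header overhead, the per-exponent costs of the three modes follow from packing three differentials into one 7-bit group; such overhead is why quoted figures vary slightly:

```python
bits_per_group = 7   # three exponent differentials packed into 7 bits
diffs_per_group = 3
# Coefficients sharing one differential: D15 -> 1, D25 -> 2, D45 -> 4.
for mode, coeffs_per_diff in (("D15", 1), ("D25", 2), ("D45", 4)):
    bits_per_coeff = bits_per_group / (diffs_per_group * coeffs_per_diff)
    print(mode, round(bits_per_coeff, 2))
# prints D15 2.33, D25 1.17, D45 0.58
```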
  • the forward-backward adaptive sensing model 305 is used to estimate the masking threshold of each frame of the signal.
  • the forward adaptive part is applied only at the encoder. Under the code rate constraint, an optimal set of perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward adaptive part to estimate the masking threshold of each frame.
  • the backward adaptive part is applied at both the encoder and the decoder.
  • the parametric bit allocation module 306 analyzes the spectrum of the audio signal according to the masking criteria to determine the number of bits to assign to each mantissa.
  • the module 306 uses a bit pool for global bit allocation across all channels.
  • bits are cyclically extracted from the bit pool and allocated to all channels, and the quantization of the mantissa is adjusted in accordance with the number of available bits.
  • the AC-3 encoder also uses high-frequency coupling: the high-frequency part of the coupled signals is divided into 18 sub-bands according to the critical bands of the human ear, and selected channels are coupled starting from a chosen sub-band. Finally, the AC-3 audio stream is formed by the bit stream multiplexing module 307.
  • Figure 4 shows a schematic diagram of the process using Dolby AC-3 decoding.
  • the bit stream produced by the AC-3 encoder is input, and data frame synchronization and error detection are performed on it. If a data error is detected, error concealment or muting is applied. The bit stream is then unpacked to obtain the main information and the side information, after which the exponents are decoded.
  • two pieces of side information are needed: one is the number of packed exponents; the other is the exponent strategy used, i.e. the D15, D25 or D45 mode.
  • the decoded exponents and the bit allocation side information then drive the bit allocation computation, which indicates the number of bits used for each packed mantissa and yields a set of bit allocation pointers, one per coded mantissa.
  • each bit allocation pointer indicates the quantizer used for the mantissa and the number of bits it occupies in the code stream.
  • each coded mantissa value is dequantized into a dequantized value; a mantissa occupying zero bits is either reset to zero or replaced by a random dither value under the control of the dither flag.
  • a decoupling operation is then performed: decoupling recovers the high-frequency portion of each coupled channel, including its exponents and mantissas, from the common coupling channel and the coupling factors.
  • if matrix processing was applied to a sub-band at the encoder, the sum and difference channel values of that sub-band are converted back into left and right channel values by matrix recovery at the decoder.
  • dynamic range control information for each audio block is included in the code stream and is used to adjust the magnitude of the coefficients, including the exponent and the mantissa.
  • the frequency domain coefficients are inverse transformed and converted into time domain samples.
  • the time domain samples are windowed, and adjacent blocks are overlapped and added to reconstruct the PCM audio signal.
  • if required, the audio signal is downmixed, and finally the PCM samples are output.
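The final windowing and overlap-add step can be sketched as below. The sine window and 50 % overlap are illustrative choices; the window satisfies the Princen-Bradley condition w[i]² + w[i + N/2]² = 1 needed for perfect reconstruction with the MDCT.

```python
import math

def overlap_add(blocks, window):
    # Window each inverse-transformed block, then sum the second half of
    # block k with the first half of block k+1 (50 % overlap).
    n = len(window)
    hop = n // 2
    out = [0.0] * (hop * (len(blocks) + 1))
    for k, block in enumerate(blocks):
        for i in range(n):
            out[k * hop + i] += block[i] * window[i]
    return out

# Sine window: satisfies w[i]^2 + w[i + N/2]^2 = 1 (Princen-Bradley).
N = 8
window = [math.sin(math.pi / N * (i + 0.5)) for i in range(N)]
pcm = overlap_add([[1.0] * N, [1.0] * N], window)
```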
  • Dolby AC-3 encoding technology mainly targets high bit rate multi-channel surround signals, but when the 5.1-channel encoding bit rate drops below 384 kbps its coding quality degrades, and its coding efficiency for mono and two-channel stereo is also low.
  • the technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio encoding/decoding to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.
  • the enhanced audio coding device of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bit stream multiplexing module, a signal property analysis module and a multi-resolution analysis module. The signal property analysis module performs type analysis on the input audio signal, outputs the signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputs the signal type analysis result to the bit stream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and signal-to-mask ratio of the audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module converts the time domain audio signal into frequency domain coefficients and outputs them to the multi-resolution analysis module.
  • the multi-resolution analysis module performs multi-resolution analysis on the frequency domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module, and outputs the result to the quantization and entropy coding module.
  • under the control of the signal-to-mask ratio output by the psychoacoustic analysis module, the quantization and entropy coding module quantizes and entropy-encodes the frequency domain coefficients and outputs them to the bit stream multiplexing module, which multiplexes the received data to form the encoded audio code stream.
  • the enhanced audio decoding apparatus of the present invention comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group, a multi-resolution synthesis module, and a frequency-time mapping module. The bit stream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data and control signals to the entropy decoding module and the multi-resolution synthesis module. The entropy decoding module decodes these signals, restores the quantized spectrum, and outputs it to the inverse quantizer group. The inverse quantizer group reconstructs the inverse-quantized spectrum and outputs it to the multi-resolution synthesis module, which performs multi-resolution synthesis on the inverse-quantized spectrum and outputs the result to the frequency-time mapping module. The frequency-time mapping module performs frequency-time mapping on the spectral coefficients and outputs the time domain audio signal.
  • the invention is applicable to high-fidelity compression coding of audio signals at various sampling rates and channel configurations: it can support audio signals with sampling rates between 8 kHz and 192 kHz, all common channel configurations, and a wide range of target bit rates.
  • FIG. 1 is a block diagram of an MPEG-2 AAC encoder
  • FIG. 2 is a block diagram of an MPEG-2 AAC decoder
  • Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology
  • Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology
  • Figure 5 is a schematic structural view of an encoding device of the present invention.
  • FIG. 6 is a schematic diagram of a filtering structure using a Haar wavelet-based wavelet transform
  • Figure 7 is a schematic diagram of the time-frequency division obtained using the Haar wavelet-based wavelet transform
  • FIG. 8 is a schematic structural diagram of a decoding apparatus of the present invention.
  • Figure 9 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention.
  • FIG. 10 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention.
  • Figure 11 is a schematic structural view of Embodiment 2 of the encoding apparatus of the present invention.
  • FIG. 12 is a schematic structural diagram of Embodiment 2 of a decoding apparatus according to the present invention.
  • Figure 13 is a schematic structural view of a third embodiment of the encoding apparatus of the present invention.
  • FIG. 14 is a schematic structural diagram of Embodiment 3 of a decoding apparatus according to the present invention.
  • Figure 15 is a schematic structural view of Embodiment 4 of the encoding apparatus of the present invention.
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of a decoding apparatus according to the present invention.
  • Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention.
  • FIG. 18 is a schematic structural diagram of Embodiment 5 of a decoding apparatus according to the present invention.
  • Figure 19 is a schematic structural view of Embodiment 6 of the encoding apparatus of the present invention.
  • FIG. 20 is a schematic structural diagram of Embodiment 6 of a decoding apparatus according to the present invention.
  • Figure 21 is a schematic structural view of Embodiment 7 of the coding apparatus of the present invention.
  • Figure 22 is a block diagram showing the structure of a seventh embodiment of the decoding apparatus of the present invention. Detailed Description
  • Fig. 1 to Fig. 4 are schematic structural diagrams of several prior art encoders, which have been introduced in the background and are not repeated here.
  • the audio encoding apparatus includes a signal property analysis module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a multi-resolution analysis module 53, a quantization and entropy coding module 54, and a bit stream multiplexing module 55. The signal property analysis module 50 is configured to perform type analysis on the input audio signal, output the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52, and output the signal type analysis result to the bit stream multiplexing module 55.
  • the psychoacoustic analysis module 51 is configured to calculate a masking threshold and a signal mask ratio of the input audio signal, and output to the quantization and entropy encoding module 54.
  • the time-frequency mapping module 52 is configured to convert the time domain audio signal into frequency domain coefficients and output them to the multi-resolution analysis module 53.
  • the multi-resolution analysis module 53 is configured to perform multi-resolution analysis on the frequency domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module 50, and to output the result to the quantization and entropy coding module 54.
  • under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, the quantization and entropy coding module 54 quantizes and entropy-encodes the frequency domain coefficients and outputs them to the bit stream multiplexing module 55, which multiplexes the received data to form the encoded audio code stream.
  • during encoding, the digital audio signal undergoes signal type analysis in the signal property analysis module 50, the type information is output to the bit stream multiplexing module 55, and the audio signal itself is simultaneously output to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.
  • the masking threshold and signal-to-mask ratio of each frame of the audio signal are calculated in the psychoacoustic analysis module 51, and the signal-to-mask ratio is then passed as a control signal to the quantization and entropy coding module 54.
  • the signal is converted into frequency domain coefficients by the time-frequency mapping module 52; the frequency domain coefficients then undergo multi-resolution analysis in the multi-resolution analysis module 53 to improve the time resolution for fast-varying signals, and the result is output to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, quantization and entropy coding are performed in module 54, and the encoded data and control signals are multiplexed by the bit stream multiplexing module 55 to form the enhanced audio code stream.
  • the signal property analysis module 50 performs signal type analysis on the input audio signal and outputs the type information to the bit stream multiplexing module 55, while simultaneously outputting the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.
  • the signal property analysis module 50 performs masking-effect analysis based on adaptive thresholds and waveform prediction to determine whether the signal is slowly varying or fast-varying; for a fast-varying signal it further calculates parameters of the abrupt component, such as the position and strength of the transient.
  • the psychoacoustic analysis module 51 is mainly used to calculate the masking threshold, the mask ratio and the perceptual entropy of the input audio signal.
  • the perceptual entropy calculated by the psychoacoustic analysis module 51 dynamically estimates the number of bits required to transparently encode the current signal frame, allowing the bit allocation between frames to be adjusted.
  • the psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy coding module 54 to control it.
  • the time-frequency mapping module 52 implements the transformation of the audio signal from time domain samples to frequency domain coefficients and consists of a filter bank, which may specifically be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine modulated filter bank, a wavelet transform filter bank, etc.
  • the encoding apparatus of the present invention increases the time resolution of the encoded fast-changing signal by the multi-resolution analyzing module 53.
  • the frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 53. For a fast-varying signal, a frequency domain wavelet transform or a frequency domain modified discrete cosine transform (MDCT) is performed to obtain a multi-resolution representation of the frequency domain coefficients, which is output to the quantization and entropy coding module 54; for a slowly varying signal, the frequency domain coefficients are not processed and are passed directly to the quantization and entropy coding module 54.
  • the multi-resolution analysis module 53 includes a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform the frequency domain coefficients into time-frequency plane coefficients; the recombination module is configured to reorganize the time-frequency plane coefficients according to certain rules.
  • the frequency domain coefficient transform module may adopt a frequency domain wavelet transform filter bank, a frequency domain MDCT transform filter bank, or the like.
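As an illustration of the frequency-domain wavelet option, a Haar analysis over the coefficients could look like the sketch below; this is illustrative only, and the filter structure of Figs. 6-7 may differ in detail:

```python
def haar_step(coeffs):
    # One Haar level: half-resolution approximations and details.
    s = 2 ** -0.5
    pairs = list(zip(coeffs[0::2], coeffs[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def haar_analyze(coeffs, levels):
    # Repeatedly split the approximation band, as in a wavelet filter tree,
    # trading frequency resolution for time resolution level by level.
    bands = []
    cur = list(coeffs)
    for _ in range(levels):
        cur, detail = haar_step(cur)
        bands.append(detail)
    bands.append(cur)
    return bands

bands = haar_analyze([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 2)
```

The orthonormal scaling keeps the total energy of the bands equal to that of the input coefficients.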
  • the quantization and entropy coding module 54 further includes a non-linear quantizer group and an encoder, where the quantizer can be a scalar quantizer or a vector quantizer.
  • the vector quantizer is further divided into two categories: memoryless vector quantizer and memory vector quantizer. For a memoryless vector quantizer, each input vector is independently quantized, independent of the previous vectors; a memory vector quantizer considers the previous vector when quantizing a vector, ie, exploits the correlation between vectors.
  • the main memoryless vector quantizers include the full search vector quantizer, the tree search vector quantizer, the multistage vector quantizer, the gain/shape vector quantizer, and the mean-removed vector quantizer; the main memory vector quantizers include the predictive vector quantizer and the finite state vector quantizer.
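A full-search memoryless vector quantizer reduces to a nearest-codeword search; the sketch below uses a plain Euclidean distance in place of the subjective perceptual distance measure described later:

```python
def nearest_codeword(vector, codebook):
    # Full-search memoryless VQ: return the index of the codeword closest
    # to the input vector (Euclidean distance as a stand-in measure).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
idx = nearest_codeword((0.9, 0.2), codebook)  # index transmitted to the coder
print(idx)  # → 1, the index of (1.0, 0.0)
```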
  • the non-linear quantizer group further includes M sub-band quantizers.
  • for scalar quantization, scale factors are used: all frequency domain coefficients in the M scale factor bands are nonlinearly compressed, and the coefficients of each sub-band are then quantized with that sub-band's scale factor. The quantized spectrum, represented as integers, is output to the encoder; the first scale factor of each frame is output to the bit stream multiplexing module 55 as the common scale factor, while each remaining scale factor is differenced against the previous one and output to the encoder.
  • the scale factor in the above steps is a constantly changing value, which is adjusted according to the bit allocation strategy.
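The quantize/dequantize pair implied here can be sketched as follows; the 3/4-power companding law and the 2^(sf/4) step are AAC-style conventions assumed for illustration:

```python
def nonuniform_quantize(x, scale_factor):
    # Power-law companding: compress |x| by the 3/4 power, with the
    # scale factor controlling the effective step size.
    return round((abs(x) / 2 ** (scale_factor / 4)) ** 0.75)

def nonuniform_dequantize(q, scale_factor):
    # Inverse companding, as in the decoder's inverse quantizer group.
    return q ** (4 / 3) * 2 ** (scale_factor / 4)

q = nonuniform_quantize(100.0, 0)
x_hat = nonuniform_dequantize(q, 0)   # close to, but not exactly, 100.0
```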
  • the present invention provides a bit allocation strategy with minimal global perceptual distortion, as follows:
  • first, each sub-band quantizer is initialized: a scale factor is selected such that the quantized values of the spectral coefficients in all sub-bands are zero. At this point the quantization noise of each sub-band equals its energy, the noise-to-mask ratio (NMR) of each sub-band equals its signal-to-mask ratio (SMR), the number of bits consumed by quantization is 0, and the number of remaining bits equals the target bit count.
  • then, in each iteration, the sub-band with the largest NMR is found, the scale factor of its quantizer is decreased by one unit, and the number of additional bits the sub-band requires is calculated. If the remaining bits are sufficient, the adjustment is kept and the remaining bit count is updated; the loop repeats until no further adjustment can be afforded.
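This strategy reads as a greedy worst-band-first loop. The sketch below assumes a fixed per-step bit cost per sub-band and that one scale-factor step roughly halves the sub-band noise, both simplifications:

```python
def allocate_bits(nmr, cost, budget):
    # Greedy allocation: repeatedly refine the sub-band with the worst
    # noise-to-mask ratio (NMR) until no refinement fits the budget.
    # Assumptions: cost[i] is a fixed per-step bit cost, and one
    # scale-factor step roughly halves the sub-band noise.
    nmr = list(nmr)
    steps = [0] * len(nmr)
    remaining = budget
    while True:
        worst = max(range(len(nmr)), key=lambda i: nmr[i])
        if nmr[worst] <= 1.0 or cost[worst] > remaining:
            break  # every band masked (NMR <= 1), or budget exhausted
        remaining -= cost[worst]
        steps[worst] += 1
        nmr[worst] /= 2.0
    return steps, remaining

steps, left = allocate_bits([8.0, 2.0, 0.5], cost=[4, 3, 5], budget=20)
```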
  • for vector quantization, the frequency domain coefficients are grouped into multi-dimensional vectors and input to the non-linear quantizer group.
  • the flattening factor is used to flatten the spectrum, i.e. to reduce its dynamic range, before the vector quantizer is applied.
  • the vector quantizer then finds, under a subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, and transmits the corresponding codeword index to the encoder.
  • the flattening factor is adjusted according to the bit allocation strategy of vector quantization, and the bit allocation of vector quantization is controlled according to the perceptual importance of the different sub-bands.
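The flattening and codeword search described above can be sketched as a weighted nearest-neighbour search. The weight vector standing in for the subjective perceptual distance measure, and the simple division-based `flatten` helper, are illustrative assumptions rather than the patent's definitions.

```python
def flatten(vector, flattening_factor):
    # Reduce the dynamic range of the spectrum before quantization.
    return [x / flattening_factor for x in vector]

def vq_quantize(vector, codebook, weights=None):
    # Weighted squared-error distance; the weights stand in (schematically)
    # for the subjective perceptual distance measure of the text.
    if weights is None:
        weights = [1.0] * len(vector)
    best_idx, best_dist = -1, float("inf")
    for idx, code in enumerate(codebook):
        d = sum(w * (x - c) ** 2 for w, x, c in zip(weights, vector, code))
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx
```

Only the returned codeword index is transmitted; the decoder recovers the vector by looking the index up in the same codebook.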
  • Entropy coding is a source coding technique. The basic idea is to assign shorter codewords to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length L̄ satisfies H(x) ≤ L̄ < H(x) + 1/N, where H(x) is the entropy of the source.
  • the entropy coding mainly includes methods such as Huffman coding, arithmetic coding or run length coding, and any entropy coding in the present invention may adopt any of the above coding methods.
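A minimal Huffman construction illustrates the principle: frequent symbols receive short codewords, and the average length stays within one bit of the source entropy, as Shannon's theorem promises. This is a sketch of the technique only; the actual codebooks of the coder are not reproduced here.

```python
import heapq
from math import log2

def huffman_lengths(freqs):
    # Compute Huffman codeword lengths for a {symbol: count} map by merging
    # the two least frequent trees and deepening their leaves by one bit.
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**c1, **c2}.items()}
        heapq.heappush(heap, (n1 + n2, tick, merged))
        tick += 1
    return heap[0][2]
```

For counts {a:5, b:2, c:1, d:1}, the most frequent symbol gets a 1-bit codeword and the average length lands between H(x) and H(x) + 1.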
  • the quantized spectrum and the differentially processed scale factors are entropy encoded in the coder, yielding the codebook serial numbers, the scale factor coded values and the losslessly coded quantized spectrum; the codebook serial numbers are then entropy coded to obtain
  • the codebook serial number coded values, after which the scale factor coded values, the codebook serial number coded values and the losslessly coded quantized spectrum are output to the bit stream multiplexing module 55.
  • the codeword index obtained by the vector quantizer quantization is subjected to one-dimensional or multi-dimensional entropy coding in the encoder to obtain an encoded value of the codeword index, and then the encoded value of the codeword index is output to the bitstream multiplexing module 55.
  • the encoding method based on the above encoder specifically includes: performing signal type analysis on the input audio signal; calculating a signal mask ratio of the audio signal; performing time-frequency mapping on the audio signal to obtain a frequency domain coefficient of the audio signal; and performing more on the frequency domain coefficient Resolution analysis and quantization and entropy coding; multiplexing the signal type analysis result and the encoded audio code stream to obtain a compressed audio code stream.
  • the signal type analysis is based on adaptive thresholds and waveform prediction, combined with analysis of the forward and backward masking effects.
  • the specific steps are: the input audio data is divided into frames; each input frame is divided into multiple sub-frames, and the local maximum points of the absolute value of the PCM data in each sub-frame are found; the sub-frame peak is selected from the local maximum points of each sub-frame; for a given sub-frame peak, the peaks of a plurality of (typically 3) preceding sub-frames are used to predict a typical sample value at a forward delay of a plurality of (typically 4) sub-frames; the difference and the ratio between the sub-frame peak and the predicted typical sample value are calculated; if both the difference and the ratio are greater than their set thresholds, it is determined that a sudden (transient) signal exists in the sub-frame, i.e. the sub-frame contains a local maximum peak point whose pre-echo must be backward-masked, the distance between the front end of the sub-frame and the masking peak being within about 2.5 ms, and
  • the frame signal belongs to the fast-changing type; if the predicted difference and the ratio are not both greater than the set thresholds, the above steps are repeated for the next sub-frame until either
  • the frame signal is determined to be a fast-changing type signal or the last sub-frame is reached. If the last sub-frame is reached without a fast-changing determination, the frame signal belongs to the slowly varying type.
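The peak-prediction test above can be sketched as follows. The sub-frame count, the mean-of-previous-peaks predictor, and the threshold values are illustrative assumptions; the patent only fixes the general scheme (peak extraction, prediction, difference and ratio tests).

```python
def is_fast_changing(frame, n_sub=8, diff_thresh=0.3, ratio_thresh=4.0, history=3):
    # Peak of |PCM| in each sub-frame.
    size = len(frame) // n_sub
    peaks = [max(abs(s) for s in frame[i * size:(i + 1) * size])
             for i in range(n_sub)]
    for i in range(history, n_sub):
        # Predict a typical sample value from the preceding sub-frame peaks.
        predicted = sum(peaks[i - history:i]) / history
        diff = peaks[i] - predicted
        # Both the difference and the ratio must exceed their thresholds.
        if diff > diff_thresh and predicted > 0 and peaks[i] / predicted > ratio_thresh:
            return True  # sudden (transient) signal: fast-changing frame
    return False  # slowly varying frame
```

A frame that jumps from near-silence to a loud attack trips both tests at once, while steady material never does.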
  • the time-frequency transform of the time-domain audio signal may be a discrete Fourier transform (DFT), a discrete cosine transform (DCT), a modified discrete cosine transform (MDCT), a cosine-modulated filter bank, a wavelet transform, and so on.
  • when the modified discrete cosine transform (MDCT) is used for the time-frequency transform, the time domain signal is first windowed, and the
  • MDCT transform is then performed on the windowed signal to obtain the frequency domain coefficients.
  • the window function of the MDCT transform must satisfy the following two conditions: it must be symmetric, i.e. w(2M−1−n) = w(n), and it must satisfy the perfect-reconstruction (Princen-Bradley) condition w²(n) + w²(n+M) = 1.
  • the Sine window can be selected as the window function.
  • by using specific analysis and synthesis filters, the above restrictions on the window function can be relaxed.
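A direct (non-optimized) MDCT and the sine window can be sketched as below. The transform definition used here is the common textbook form X[k] = Σ x[n]·cos(π/M·(n + 1/2 + M/2)(k + 1/2)); the patent's own formula may differ in indexing convention, so treat this as an illustrative assumption.

```python
import math

def sine_window(N):
    # w[n] = sin(pi/N * (n + 1/2)): symmetric, and satisfies the
    # Princen-Bradley condition w[n]^2 + w[n + N/2]^2 = 1.
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def mdct(block):
    # Direct MDCT: 2M windowed time samples -> M frequency coefficients.
    N = len(block)
    M = N // 2
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(N))
            for k in range(M)]
```

Note the critical sampling: 2M input samples yield only M coefficients, with the aliasing cancelled later by overlap-add at the decoder.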
  • when the cosine modulation filter bank is used for the time-frequency transform, the time domain samples of the previous frame and the M samples of the current frame are first selected, the time domain signals of the two frames are windowed, and the cosine modulation transform is then performed on the windowed signal to obtain the frequency domain coefficients.
  • the analysis and synthesis filters of the cosine modulated filter bank take the standard form
  h_k(n) = 2·p_a(n)·cos( (π/M)(k + 1/2)(n − D/2) + Φ_k ),  n = 0, 1, ···, N_h − 1
  f_k(n) = 2·p_s(n)·cos( (π/M)(k + 1/2)(n − D/2) − Φ_k ),  n = 0, 1, ···, N_s − 1
  where 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, K is an integer greater than zero, and Φ_k = (−1)^k·π/4.
  • the analysis window (analysis prototype filter) p_a(n) of the cosine modulated filter bank has an impulse response of length N_h,
  • and the synthesis window (synthesis prototype filter) p_s(n) has an impulse response of length N_s.
  • the window function also needs to meet certain conditions, see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.
  • Calculating the masking value and the mask ratio of the resampled signal includes the following steps:
  • the first step is to map the signal from time domain to frequency domain.
  • the fast Fourier transform with a Hanning window can be used to convert the time domain data into frequency domain coefficients X[k], expressed by the amplitude r[k] and the phase φ[k].
  • the energy of each sub-band is the sum of the energies of all spectral lines in the sub-band, e[b] = Σ r²[k], the sum running from the lower to the upper boundary of sub-band b.
  • the second step is to determine the tonal and non-tonal components in the signal.
  • the tonality of the signal is estimated by inter-frame prediction of each spectral line.
  • the Euclidean distance between the predicted and true values of each spectral line is mapped to an unpredictability measure; highly predictable spectral components are considered strongly tonal,
  • while poorly predictable spectral components are considered noise-like.
  • the magnitude of the current frame is predicted from the two previous frames as r_pred[k] = r_{t−1}[k] + ( r_{t−1}[k] − r_{t−2}[k] ), and the phase likewise as φ_pred[k] = φ_{t−1}[k] + ( φ_{t−1}[k] − φ_{t−2}[k] ),
  • where t represents the current frame,
  • t−1 indicates the previous frame,
  • and t−2 indicates the frame before the previous one.
  • the unpredictability of each sub-band is the sum of the unpredictability of all spectral lines in the sub-band, weighted by their energies.
  • the sub-band energy e[b] and the unpredictability c[b] are each convolved with the spreading function s[i,b] to model the spread of masking across sub-bands:
  e_s[b] = Σ_i e[i]·s[i,b],  c_s[b] = Σ_i c[i]·s[i,b],
  • where the sum runs over all sub-bands into which the frame signal is divided.
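The per-line unpredictability of the second step can be sketched as below. The exact normalization is an assumption modeled on common psychoacoustic-model descriptions, not taken verbatim from the patent.

```python
import cmath

def unpredictability(r_now, phi_now, r_prev1, phi_prev1, r_prev2, phi_prev2):
    # Linearly extrapolate magnitude and phase from the two previous frames.
    r_pred = r_prev1 + (r_prev1 - r_prev2)
    phi_pred = phi_prev1 + (phi_prev1 - phi_prev2)
    actual = cmath.rect(r_now, phi_now)
    predicted = cmath.rect(r_pred, phi_pred)
    # Normalized Euclidean distance in the complex plane, in [0, 1]:
    # ~0 for tone-like (predictable) lines, ~1 for noise-like lines.
    return abs(actual - predicted) / (r_now + abs(r_pred) + 1e-12)
```

A line whose magnitude and phase evolve linearly (a steady tone) scores near 0; a line whose phase jumps unpredictably scores near 1.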
  • the third step is to calculate the Signal-to-Noise Ratio (SNR) required for each sub-band.
  • the fourth step is to calculate the masking threshold of each subband and the perceptual entropy of the signal.
  • the noise energy threshold n[b] of the current frame is compared with the noise energy threshold n_prev[b] of the previous frame.
  • the perceptual entropy is accumulated over all sub-bands, each sub-band's term being weighted by the number of spectral lines included in the band.
  • the fifth step is to calculate the signal-to-mask ratio (Signal-to-Mask Ratio, SMR for short) of each sub-band signal.
  • the multi-resolution analysis module 53 performs time-frequency reorganization on the input frequency domain data, improving the time resolution of the frequency domain data at the cost of reduced frequency resolution, thereby automatically adapting to the time-frequency characteristics of fast-changing type signals. The pre-echo effect is thus suppressed, and the form of the filter bank in the time-frequency mapping module 52 need not be adjusted.
  • the multi-resolution analysis includes two steps of frequency domain coefficient transform and recombination, wherein frequency domain coefficients are transformed into time-frequency plane coefficients by frequency domain coefficient transform; time-frequency plane coefficients are grouped according to certain rules by recombination.
  • the process of multi-resolution analysis is illustrated by taking the frequency domain wavelet transform and the frequency domain MDCT transform as examples.
  • the wavelet basis of the frequency domain wavelet or wavelet packet transform can be fixed or adaptive.
  • the wavelet transform based on the Haar wavelet basis is taken as an example to illustrate the multi-resolution analysis of the frequency domain coefficients.
  • the scaling (low-pass) filter coefficients of the Haar wavelet basis are (1/√2, 1/√2), and the wavelet (high-pass) filter coefficients are (1/√2, −1/√2).
  • Figure 6 shows the wavelet transform structure using the Haar basis,
  • where H denotes high-pass filtering (filter coefficients (1/√2, −1/√2)) and L denotes low-pass filtering (filter coefficients (1/√2, 1/√2)),
  • and ↓2 denotes downsampling by a factor of 2,
  • applied level by level to the low and middle frequency parts of the frequency domain coefficients.
  • Different wavelet bases can be selected, and different wavelet transform structures can be selected for processing, and other similar time-frequency plane partitions are obtained. Therefore, it is possible to adjust the time-frequency plane division of the signal analysis arbitrarily according to the needs, and to meet the analysis requirements of different time and frequency resolutions.
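One Haar analysis step on a sequence of frequency domain coefficients can be sketched as follows; repeating it on the low-pass branch builds the multi-level time-frequency plane described above. The taps are the standard orthonormal Haar coefficients.

```python
import math

def haar_step(coeffs):
    # One Haar analysis step: low-pass (sum) and high-pass (difference)
    # branches with taps 1/sqrt(2), each followed by downsampling by 2.
    s = 1.0 / math.sqrt(2.0)
    low = [s * (coeffs[2 * i] + coeffs[2 * i + 1]) for i in range(len(coeffs) // 2)]
    high = [s * (coeffs[2 * i] - coeffs[2 * i + 1]) for i in range(len(coeffs) // 2)]
    return low, high
```

Because the Haar basis is orthonormal, each step preserves the total energy of the coefficients, so the analysis redistributes time-frequency resolution without amplifying or attenuating the signal.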
  • the time-frequency plane coefficients are reorganized according to certain rules in the recombination module. For example, the time-frequency plane coefficients can first be organized in the frequency direction, with the coefficients in each frequency band organized in the time direction; the organized coefficients are then arranged in the order of the sub-windows and the scale factor bands.
  • Different frequency-domain MDCT transforms can be used in different frequency domain ranges to obtain different time-frequency plane divisions, that is, different time and frequency precisions.
  • the recombination module reorganizes the time-frequency domain data outputted by the frequency domain MDCT transform filter bank.
  • a recombination method is to first organize the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arrange the organized coefficients in the order of the sub-windows and the scale factor bands.
  • Quantization and entropy coding further include two steps of nonlinear quantization and entropy coding, where the quantization can be scalar quantization or vector quantization.
  • the scalar quantization includes the following steps: nonlinearly compressing the frequency domain coefficients in all scale factor bands; and then quantizing the frequency domain coefficients of the subbands by using the scale factor of each subband to obtain a quantized spectrum represented by an integer;
  • the first scale factor in each frame of the signal is used as a common scale factor; other scale factors are differentially processed from their previous scale factor.
  • Vector quantization includes the following steps: composing a plurality of multi-dimensional vector signals from the frequency domain coefficients; flattening each dimension of the vector according to a flattening factor; finding the codeword with the smallest distance from the vector to be quantized in the codebook according to a subjective perceptual distance measure criterion, and obtaining its codeword index.
  • the entropy coding step comprises: entropy coding the quantized spectrum and the differentially processed scale factors to obtain the codebook serial numbers, the scale factor coded values and the losslessly coded quantized spectrum; and entropy coding the codebook serial numbers to obtain the codebook serial number coded values.
  • the above entropy coding method can adopt any of the existing methods such as Huffman coding, arithmetic coding or run length coding.
  • the encoded audio code stream is obtained, and the code stream is multiplexed together with the common scale factor and signal type analysis result to obtain a compressed audio code stream.
  • FIG. 8 is a block diagram showing the structure of an audio decoding device of the present invention.
  • the audio decoding apparatus includes a bit stream demultiplexing module 60, an entropy decoding module 61, an inverse quantizer group 62, a multi-resolution synthesis module 63 and a frequency-time mapping module 64.
  • the compressed audio code stream is demultiplexed by the bit stream demultiplexing module 60 to obtain the corresponding data signals and control signals, which are output to the entropy decoding module 61 and the multi-resolution synthesis module 63; the data signals and control signals are decoded in the entropy decoding module,
  • and the quantized values of the spectrum are recovered.
  • the above quantized values are reconstructed in the inverse quantizer group 62 to obtain an inverse quantized spectrum.
  • the inverse quantized spectrum is output to the multi-resolution synthesis module 63 and, after multi-resolution synthesis, is output to the frequency-time mapping module 64; the frequency-time mapping then yields the audio signal in the time domain.
  • the bit stream demultiplexing module 60 decomposes the compressed audio stream to obtain corresponding data signals and control signals, and provides corresponding decoding information for other modules.
  • the signal outputted to the entropy decoding module 61 includes a common scale factor, a scale factor coded value, a codebook sequence number coded value, and a lossless coded quantized spectrum, or an encoded value of the codeword index;
  • the signal type information is sent to the multi-resolution synthesis module 63.
  • the entropy decoding module 61 receives the common scale factor, the scale factor coded values, the
  • codebook serial number coded values and the losslessly coded quantized spectrum output by the bit stream demultiplexing module 60, then performs codebook serial number decoding, spectral coefficient decoding and scale factor decoding to reconstruct the quantization, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer group 62.
  • the decoding method employed by the entropy decoding module 61 corresponds to the entropy coding method in the encoding device, such as Huffman decoding, arithmetic decoding, or run-length decoding.
  • after receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer group 62 inversely quantizes the quantized values of the spectrum into an unscaled reconstructed spectrum (the inverse quantized spectrum), and outputs the inverse quantized spectrum to the multi-resolution synthesis module 63.
  • the inverse quantizer group 62 may be a uniform quantizer group or a non-uniform quantizer group implemented by a companding function.
  • if the quantizer group in the encoding device employs a scalar quantizer,
  • the inverse quantizer group 62 in the decoding device also employs a scalar inverse quantizer.
  • the spectral quantized values are first nonlinearly expanded, and each scale factor is then used to obtain all the spectral coefficients (the inverse quantized spectrum) in the corresponding scale factor band.
  • when vector quantization was used at encoding, the entropy decoding module 61 receives the coded values of the codeword indices output by the bit stream demultiplexing module 60, and the coded values of the codeword indices
  • are decoded with the entropy decoding method corresponding to the entropy coding method used at encoding, yielding the corresponding codeword indices.
  • the codeword index is output to the inverse quantizer group 62, and the quantized value (inverse quantized spectrum) is obtained by querying the codebook, and output to the multi-resolution synthesis module 63.
  • the inverse quantizer group 62 employs an inverse vector quantizer.
  • the frequency-time mapping module 64 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, an inverse cosine modulation filter bank, etc.
  • the decoding method based on the above decoder includes: demultiplexing the compressed audio code stream to obtain data information and control information; performing entropy decoding on the information to obtain the quantized values of the spectrum; performing inverse quantization processing on the quantized values of the spectrum to obtain the inverse quantized spectrum; and, after multi-resolution synthesis of the inverse quantized spectrum, performing frequency-time mapping to obtain the time domain audio signal.
  • if the demultiplexed information includes the codebook serial number coded values, the common scale factor, the scale factor coded values and the losslessly coded quantized spectrum, it indicates that the spectral coefficients were quantized by the scalar quantization technique in the encoding device, and the decoding
  • steps include: decoding the codebook serial number coded values to obtain the codebook numbers of all scale factor bands; decoding the quantized coefficients of all scale factor bands according to the codebooks corresponding to the codebook serial numbers; and decoding the scale factors of all scale factor bands to reconstruct the quantized spectrum.
  • the entropy decoding method adopted in the above process corresponds to an entropy coding method in the coding method, such as a run length decoding method, a Huffman decoding method, an arithmetic decoding method, and the like.
  • the process of entropy decoding is illustrated by using a run-length decoding method to decode a codebook sequence number, a Huffman decoding method to decode a quantized coefficient, and a Huffman decoding method to decode a scale factor.
  • the codebook numbers of all scale factor bands are obtained by the run-length decoding method; each decoded codebook serial number is an integer within a certain interval. If the interval is set to [0, 11], then only codebook numbers within the valid range, i.e. between 0 and 11, correspond to spectral coefficient Huffman codebooks. For all-zero sub-bands, a codebook serial number of 0 can typically be selected.
  • the spectral coefficient Huffman codebook corresponding to the codebook number is used to decode the quantized coefficients of all the scale factor bands. If the codebook number of a scale factor band is within the valid range, for example between 1 and 11, then the codebook number corresponds to a spectral coefficient codebook; the codeword index of the scale factor band is decoded from the quantized spectrum using that codebook, and the quantized coefficients are then unpacked from the codeword index. If the codebook number of the scale factor band is not between 1 and 11, the codebook number does not correspond to any spectral coefficient codebook; the quantized coefficients of the sub-band are not decoded and are all set to zero.
  • the scale factors are used to reconstruct the spectral values from the inversely quantized spectral coefficients. If the codebook number of a scale factor band is within the valid range, each codebook number corresponds to one scale factor.
  • to decode the scale factors, the code stream occupied by the first scale factor is read first; Huffman decoding is then performed on the other scale factors, sequentially obtaining the difference between each scale factor and its predecessor, and adding the difference to the previous scale factor value to obtain each scale factor. If the quantized coefficients of the current sub-band are all zero, the scale factor of that sub-band does not need to be decoded.
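The differential recovery described above can be sketched in a few lines; the first scale factor is the directly transmitted common scale factor, and each later one is its decoded difference added to the previously recovered value.

```python
def decode_scale_factors(common_sf, diffs):
    # The first scale factor arrives directly; each subsequent one is its
    # decoded difference added to the previously recovered scale factor.
    sfs = [common_sf]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```

Differential coding pays off because neighbouring scale factors are usually close, so the differences cluster around zero and entropy-code compactly.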
  • the inverse quantization process includes: nonlinearly expanding the quantized values of the spectra; and obtaining all spectral coefficients (inverse quantized spectra) in the corresponding scale factor bands according to each scale factor.
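A scalar inverse quantizer of the kind just described can be sketched as follows. The 4/3-power expansion and the 2^(sf/4) scaling are AAC-style conventions assumed here for illustration; the patent does not specify the companding function.

```python
def inverse_quantize(q, scale_factor, sf_step=0.25):
    # Nonlinear expansion (4/3 power law, an assumed companding function)
    # followed by scaling with 2^(sf_step * scale_factor).
    sign = -1.0 if q < 0 else 1.0
    return sign * (abs(q) ** (4.0 / 3.0)) * (2.0 ** (sf_step * scale_factor))
```

Applying this to every quantized value in a scale factor band, with that band's scale factor, yields the inverse quantized spectrum the text refers to.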
  • if the demultiplexed information includes the coded values of the codeword indices, the decoding step comprises: decoding the coded values of the codeword indices with the entropy
  • decoding method corresponding to the entropy coding method in the encoding device to obtain the codeword indices.
  • the codeword indices are then inversely quantized, by looking up the codebook, to obtain the inverse quantized spectrum.
  • correspondingly, at encoding, if the signal is of the fast-changing type, multi-resolution analysis is performed on the frequency domain coefficients and the multi-resolution representation of the frequency domain coefficients is then quantized and entropy coded; if it is not a fast-changing type signal, the frequency domain coefficients are quantized and entropy coded directly.
  • Multi-resolution synthesis can be performed by frequency domain wavelet transform or frequency domain MDCT transform.
  • the frequency domain wavelet synthesis method includes: first recombining the above-mentioned time-frequency plane coefficients according to a certain rule, and then performing the inverse wavelet transform on the time-frequency plane coefficients to obtain the frequency domain coefficients.
  • the frequency domain MDCT synthesis method includes: first recombining the above-mentioned time-frequency plane coefficients according to a certain rule, and then performing the inverse MDCT transform several times on the time-frequency plane coefficients to obtain the frequency domain coefficients.
  • the method of recombining may include: firstly, the time-frequency plane coefficients are organized in the frequency direction, the coefficients in each frequency band are organized in the time direction, and then the organized coefficients are arranged in the order of the sub-window and the scale factor sub-band.
  • the method of performing frequency-time mapping processing on frequency domain coefficients corresponds to the time-frequency mapping processing method in the encoding method, and may use inverse discrete cosine transform (IDCT), inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform ( IMDCT), inverse wavelet transform and other methods are completed.
  • the inverse modified discrete cosine transform (IMDCT) is taken as an example to illustrate the frequency-time mapping process.
  • the frequency-time mapping process consists of three steps: IMDCT transformation, time domain windowing, and time domain superposition.
  • the IMDCT transform is performed on the pre-prediction spectrum or the inverse quantized spectrum to obtain the transformed time domain signal x_{i,n}.
  • the expression of the IMDCT transform is: x_{i,n} = (2/N)·Σ_{k=0}^{N/2−1} spec[i][k]·cos( (2π/N)(n + n₀)(k + 1/2) ), n = 0, 1, ···, N − 1, where n represents the sample number, N the transform length, and n₀ = (N/2 + 1)/2.
  • the time domain signal obtained by the IMDCT transform is windowed in the time domain.
  • Typical window functions are Sine windows, Kaiser-Bessel windows, and the like.
  • the present invention employs a fixed window function whose window function is:
  • the biorthogonal transform can be used, with specific analysis and synthesis filters, to relax the above restrictions on the window function.
  • the windowed time domain signal is superimposed to obtain a time domain audio signal.
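The three steps above (IMDCT, time domain windowing, time domain superposition) can be sketched as follows. The transform definition mirrors the textbook MDCT convention rather than the patent's exact formula, so treat the indexing as an assumption.

```python
import math

def imdct(X):
    # Inverse MDCT matching the direct form
    # x[n] = (2/M) * sum_k X[k] cos(pi/M * (n + 1/2 + M/2)(k + 1/2)).
    M = len(X)
    return [(2.0 / M) * sum(X[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

def overlap_add(prev_second_half, cur_block, window):
    # Window the IMDCT output, emit its first half added to the stored
    # second half of the previous block, and keep its own second half.
    M = len(cur_block) // 2
    windowed = [cur_block[n] * window[n] for n in range(2 * M)]
    out = [prev_second_half[n] + windowed[n] for n in range(M)]
    return out, windowed[M:]
```

The IMDCT output carries time domain aliasing (its first half is antisymmetric, its second half symmetric); the overlap-add of consecutive windowed blocks is what cancels this aliasing and reconstructs the signal.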
  • Figure 9 is a schematic illustration of a first embodiment of an encoding apparatus of the present invention.
  • This embodiment adds a frequency domain linear prediction and vector quantization module 56 on the basis of FIG. 5, between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54; it outputs the residual sequence to the quantization and
  • entropy coding module 54 and simultaneously outputs the quantized codeword indices as side information to the bit stream multiplexing module 55.
  • the frequency domain linear prediction and vector quantization module 56 performs linear prediction on the frequency domain coefficients of each time period, and multi-level vector quantization of the resulting prediction coefficients.
  • the frequency domain signal output from the multi-resolution analysis module 53 is transmitted to the frequency domain linear prediction and vector quantization module 56.
  • the frequency domain coefficients of each time period are subjected to standard
  • linear prediction analysis; if the prediction gain satisfies a given condition, the frequency domain coefficients are linear-prediction error filtered, the obtained prediction coefficients are converted into line spectral frequency coefficients LSF (Line Spectrum Frequency), and the optimal distortion
  • metric is then used to search and calculate the codeword index of each codebook; the codeword indices are transmitted as side information to the bit stream multiplexing module 55, and the residual sequence obtained through the prediction analysis is output to the quantization and entropy coding module 54.
  • the frequency domain linear prediction and vector quantization module 56 is composed of a linear predictive analyzer, a linear predictive filter, a converter, and a vector quantizer.
  • the frequency domain coefficients are input into the linear prediction analyzer for prediction analysis, obtaining the prediction gain and the prediction coefficients.
  • the frequency domain coefficients satisfying certain conditions are filtered by the linear prediction filter to obtain the residual sequence;
  • the residual sequence is directly output to the quantization and entropy coding module 54, the prediction coefficients are converted into line spectral frequency coefficients LSF by the converter, and the LSF parameters are then sent to the vector quantizer for multi-level vector quantization; the quantized signal is transmitted to the bit stream multiplexing module 55.
  • the Hilbert envelope of a signal corresponds to the one-sided spectrum formed by its positive frequency components; the squared Hilbert envelope of the signal in the time domain is related to the autocorrelation function of the signal's spectrum.
  • The power spectral density function PSD(f) of the signal is in turn related to the autocorrelation function of its time domain waveform, so the squared Hilbert envelope of the signal in the time domain and the power
  • spectral density function of the signal in the frequency domain are dual to each other. From the above, for a partial band-pass signal in a certain frequency range, if its Hilbert envelope remains constant, then the autocorrelation between adjacent spectral values also remains constant; this means that the spectral coefficient sequence is stationary with respect to frequency, so that predictive coding techniques can be applied to the spectral values, and a common set of prediction coefficients can represent the signal effectively.
  • the encoding method based on the encoding apparatus shown in FIG. 9 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added:
  • after the frequency domain coefficients are subjected to multi-resolution analysis, for each time period
  • the frequency domain coefficients are subjected to standard linear prediction analysis to obtain the prediction gain and the prediction coefficients; whether the prediction gain exceeds the set threshold is judged, and if it does, frequency domain linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients to obtain the residual sequence;
  • the prediction coefficients are converted into line spectral frequency coefficients, which are subjected to multi-level vector quantization to obtain the side information; the residual sequence is quantized and entropy coded; if the prediction gain does not exceed the set threshold,
  • the frequency domain coefficients are quantized and entropy coded directly.
  • specifically, standard linear prediction analysis is first performed on the frequency domain coefficients of each time period, including calculating the autocorrelation matrix and running the Levinson-Durbin recursion to obtain the prediction gain and the prediction coefficients. It is then judged whether the calculated prediction gain exceeds the preset threshold; if it does, linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients; otherwise the frequency domain coefficients are not processed, and the next step, quantization and entropy coding of the frequency domain coefficients, is performed.
  • Linear prediction can be divided into forward prediction and backward prediction.
  • Forward prediction refers to predicting the current value by using the value before a certain moment
  • backward prediction refers to predicting the current value by using the value after a certain moment.
  • the frequency domain coefficients X(k) are filtered by the prediction error filter to obtain the prediction error (residual) sequence E(k).
  • the frequency domain coefficients X(k) output by the time-frequency transform can thus be represented by the residual sequence E(k) and a set of prediction coefficients a_i. The set of prediction coefficients a_i is then converted into line spectral frequency coefficients LSF and subjected to multi-level vector quantization; the vector quantization selects the optimal distortion metric (such as the nearest neighbor criterion) and searches and calculates the codeword index of each codebook, so as to determine the code corresponding to the prediction coefficients, and the codeword indices are output as side information. At the same time, the residual sequence is quantized and entropy coded.
  • the dynamic range of the residual sequence of the spectral coefficients is smaller than that of the original coefficients, so fewer bits can be allocated in quantization, or an improved coding gain can be obtained for the same number of bits.
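The analysis and filtering steps above can be sketched with a plain Levinson-Durbin recursion and a prediction-error filter; the order-1 demonstration below is illustrative, and the threshold test on the gain is omitted.

```python
def levinson_durbin(r, order):
    # Levinson-Durbin recursion: autocorrelation values r[0..order] ->
    # prediction coefficients a[1..order] and prediction gain r[0]/err.
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], (r[0] / err if err > 0 else float("inf"))

def prediction_residual(x, coeffs):
    # Prediction-error filtering: e[k] = x[k] - sum_i a[i] * x[k - 1 - i].
    p = len(coeffs)
    return [x[k] - sum(coeffs[i] * x[k - 1 - i] for i in range(min(p, k)))
            for k in range(len(x))]
```

Run on a geometrically decaying coefficient sequence, the recursion recovers the decay factor and the residual energy collapses, which is exactly the dynamic-range reduction the text credits with the coding gain.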
  • FIG. 10 is a schematic diagram of a first embodiment of the decoding apparatus of the present invention.
  • the decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 65 to the decoding apparatus shown in FIG. 8; the inverse frequency domain linear prediction and vector quantization module 65 is located between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63, and the bit stream demultiplexing module 60 outputs the inverse frequency domain linear prediction vector quantization control information to it; the module performs inverse quantization and inverse linear prediction filtering on the inverse quantized spectrum (the residual) to obtain the spectrum before prediction, which is output to the multi-resolution synthesis module 63.
  • frequency domain linear prediction vector quantization techniques are used at the encoder to suppress pre-echo and obtain a larger coding gain. Therefore, in the decoder, the inverse quantized spectrum and the inverse frequency domain linear prediction vector quantization control information output by the bit stream demultiplexing module 60 are input to the inverse frequency domain linear prediction and vector quantization module 65 to recover the spectrum before linear prediction.
  • the inverse frequency domain linear prediction and vector quantization module 65 includes an inverse vector quantizer, an inverse transformer, and an inverse linear predictor, wherein the inverse vector quantizer is used to inverse quantize the codeword index to the J line pair frequency coefficient (LSF).
  • the inverse converter is used to convert the line spectral frequency (LSF) coefficients back into prediction coefficients; the inverse linear prediction filter is used to inverse filter the inverse quantized spectrum with the prediction coefficients, obtaining the spectrum before prediction, which is output to the multi-resolution synthesis module 63.
  • the decoding method based on the decoding device shown in FIG. 10 is basically the same as the decoding method based on the decoding device shown in FIG. 8, except that the following steps are added: after the inverse quantized spectrum is obtained, it is determined whether the control information contains inverse frequency domain linear predictive vector quantization information; if it does, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is performed on the inverse quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction, which is then subjected to multi-resolution synthesis.
  • the spectrum before prediction is then subjected to frequency-time mapping processing.
  • if the control information indicates that the signal frame has not undergone frequency domain linear predictive vector quantization, the inverse frequency domain linear predictive vector quantization process is not performed, and the inverse quantized spectrum is directly subjected to frequency-time mapping processing.
  • FIG. 11 is a block diagram showing the structure of a second embodiment of the encoding apparatus of the present invention.
  • This embodiment adds a sum and difference stereo (M/S) encoding module 57, on the basis of FIG. 5, between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy encoding module 54.
  • in M/S sum and difference stereo mode, the psychoacoustic analysis module 51, in addition to calculating the masking threshold of each monaural audio signal, further calculates the masking thresholds of the sum and difference channels and outputs them to the quantization and entropy encoding module 54.
  • the sum and difference stereo encoding module 57 may also be located between the quantizer group and the entropy encoder in the quantization and entropy encoding module 54.
  • the sum and difference stereo encoding module 57 exploits the correlation between the two channels of a channel pair, replacing the frequency domain coefficients/residual sequences of the left and right channels with the equivalent frequency domain coefficients/residual sequences of the sum and difference channels, thereby reducing the code rate and improving coding efficiency. It is therefore only applicable to multi-channel signals whose channels share the same signal type; for a mono signal, or a multi-channel signal whose signal types are inconsistent, sum and difference stereo encoding is not performed.
  • the encoding method based on the encoding apparatus shown in FIG. 11 is the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added: before the frequency domain coefficients are quantized and encoded, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding condition; if they do, sum and difference stereo coding is performed to obtain the frequency domain coefficients of the sum and difference channels; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, the frequency domain coefficients are not processed.
  • the sum and difference stereo coding can also be applied after quantization and before entropy coding, that is: after quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding condition, and if they do, sum and difference stereo coding is performed; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, the quantized frequency domain coefficients are not processed.
  • the judgment method adopted by the present invention is based on the Karhunen-Loève (K-L) transform.
  • the specific judgment process is as follows: let the spectral coefficients of a left channel scale factor band be l(k) and the spectral coefficients of the corresponding right channel scale factor band be r(k); their correlation matrix is built from C_ll = Σ l(k)l(k), C_rr = Σ r(k)r(k) and C_lr = Σ l(k)r(k), with the sums taken over the band.
  • the rotation angle a of the K-L transform satisfies tan(2a) = 2C_lr/(C_ll - C_rr); when a is close to π/4, the scale factor band is suited to the sum and difference stereo coding mode.
  • Therefore, the frequency domain coefficients of the left and right channels in the scale factor band are replaced by the linearly transformed frequency domain coefficients of the sum and difference channels: M = (L + R)/2, S = (L - R)/2
  • M denotes the sum channel frequency domain coefficient
  • S denotes the difference channel frequency domain coefficient
  • L denotes the left channel frequency domain coefficient
  • R denotes the right channel frequency domain coefficient.
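The K-L-transform-based band decision described above can be sketched as follows. The 2×2 correlation matrix of the band's left/right coefficients determines the rotation angle, which is tested against π/4; the tolerance value is an illustrative choice, not specified by the patent:

```python
import numpy as np

def ms_band_decision(l, r, tol=0.35):
    """True if the scale factor band is suited to sum/difference (M/S) coding.

    l, r: spectral coefficients l(k), r(k) of the band in the two channels.
    The K-L rotation angle alpha satisfies tan(2*alpha) = 2*C_lr / (C_ll - C_rr);
    alpha close to pi/4 indicates strongly correlated channels.
    """
    c_ll = np.dot(l, l)
    c_rr = np.dot(r, r)
    c_lr = np.dot(l, r)
    alpha = 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)
    return abs(alpha - np.pi / 4) < tol
```

Identical (or nearly identical) left and right bands give alpha = π/4 and select M/S mode; uncorrelated bands give alpha near 0 and keep L/R coding.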
  • M̂ represents the quantized sum channel frequency domain coefficients; Ŝ represents the quantized difference channel frequency domain coefficients; L̂ represents the quantized left channel frequency domain coefficients; R̂ represents the quantized right channel frequency domain coefficients.
  • Figure 12 is a schematic diagram of a second embodiment of a decoding apparatus.
  • the decoding apparatus adds a sum and difference stereo decoding module 66, on the basis of the decoding apparatus shown in FIG. 8, between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63; it receives the signal type analysis result and the sum and difference stereo control signal output by the bit stream demultiplexing module 60, and converts the inverse quantized spectra of the sum and difference channels into the inverse quantized spectra of the left and right channels according to this control information.
  • the sum and difference stereo decoding module 66 determines whether the inverse quantized spectrum requires sum and difference stereo decoding in a given scale factor band based on the flag bit of that scale factor band. If sum and difference stereo coding was performed in the encoding device, the inverse quantized spectrum must be subjected to sum and difference stereo decoding in the decoding device.
  • the sum and difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer group 62, and receive the sum and difference stereo control signal and the signal type analysis result output by the bitstream demultiplexing module 60.
  • the decoding method based on the decoding apparatus shown in FIG. 12 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 8, except that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the inverse quantized spectrum; if necessary, it is determined from the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is left unprocessed and subsequent processing is performed directly.
  • the sum and difference stereo decoding can also be performed after the entropy decoding process and before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the quantized values; if necessary, the flag bit on each scale factor band is used to determine whether that scale factor band requires sum and difference stereo decoding, and if so, the quantized spectral values of the sum and difference channels in the scale factor band are converted into the quantized spectral values of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values are left unprocessed and subsequent processing is performed directly.
  • the frequency domain coefficients of the left and right channels in the scale factor band are obtained from the frequency domain coefficients of the sum and difference channels by the following operations: L̂ = M̂ + Ŝ, R̂ = M̂ - Ŝ, where: M̂ represents the quantized sum channel frequency domain coefficient
  • Ŝ represents the quantized difference channel frequency domain coefficient
  • L̂ represents the quantized left channel frequency domain coefficient
  • R̂ represents the quantized right channel frequency domain coefficient.
  • the inverse quantized frequency domain coefficients of the left and right channels in the subband are obtained from the inverse quantized frequency domain coefficients of the sum and difference channels according to the same matrix operation, L = M + S, R = M - S, where M and S denote the inverse quantized sum and difference channel frequency domain coefficients and L and R denote the inverse quantized left and right channel frequency domain coefficients.
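The sum and difference stereo transform and its inverse can be sketched as a direct transcription of these relations; array inputs stand for the coefficients of one scale factor band:

```python
import numpy as np

def ms_encode(L, R):
    """Sum/difference channels: M = (L + R) / 2, S = (L - R) / 2."""
    return 0.5 * (L + R), 0.5 * (L - R)

def ms_decode(M, S):
    """Inverse transform: L = M + S, R = M - S."""
    return M + S, M - S
```

The pair is lossless in exact arithmetic: encoding followed by decoding returns the original left/right coefficients.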
  • Fig. 13 shows the configuration of a third embodiment of the encoding apparatus of the present invention. This embodiment, on the basis of Fig. 9, adds a sum and difference stereo encoding module 57 between the output of the frequency domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54; the psychoacoustic analysis module 51 outputs the masking thresholds of the sum and difference channels to the quantization and entropy encoding module 54.
  • the sum and difference stereo coding module 57 may also be located between the quantizer group and the entropy encoder in the quantization and entropy coding module 54, and receive the signal type analysis result output by the psychoacoustic analysis module 51.
  • the function and working principle of the sum and difference stereo coding module 57 are the same as those in FIG. 11 and are not described again here.
  • the encoding method based on the encoding apparatus shown in FIG. 13 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 9, except that the following steps are added:
  • before quantization and encoding, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether each scale factor band satisfies the coding condition, and if so, sum and difference stereo coding is performed on that scale factor band; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, sum and difference stereo coding is not performed.
  • the sum and difference stereo coding can also be applied after quantization and before entropy coding, that is: after quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether each scale factor band satisfies the coding condition, and if so, sum and difference stereo coding is performed on that scale factor band; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, sum and difference stereo coding is not performed.
  • Figure 14 is a block diagram showing a third embodiment of the decoding apparatus.
  • the decoding apparatus adds a sum and difference stereo decoding module 66 on the basis of the decoding apparatus shown in Fig. 10, between the output of the inverse quantizer group 62 and the input of the inverse frequency domain linear prediction and vector quantization module 65; the bit stream demultiplexing module 60 outputs the sum and difference stereo control signals to it.
  • the sum and difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer group 62, and receive the sum and difference stereo control signals output by the bit stream demultiplexing module 60.
  • the decoding method based on the decoding device shown in FIG. 14 is basically the same as the decoding method based on the decoding device shown in FIG. 10, the difference being that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is judged according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the inverse quantized spectrum; if necessary, it is determined from the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is left unprocessed and subsequent processing is performed directly.
  • the sum and difference stereo decoding can also be performed before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the quantized values; if necessary, the flag bit of each scale factor band is used to determine whether that scale factor band requires sum and difference stereo decoding, and if so, the quantized spectral values of the sum and difference channels in the scale factor band are converted into the quantized spectral values of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values are left unprocessed and subsequent processing is performed directly.
  • Fig. 15 shows a schematic representation of a fourth embodiment of the encoding device of the present invention.
  • a resampling module 590 and a band extension module 591 are added, where the resampling module 590 resamples the input audio signal to change its sampling rate and then outputs the audio signal at the changed sampling rate to the signal property analysis module 50; the band extension module 591 is configured to analyze the input audio signal over the entire frequency band, extract the spectral envelope of the high frequency portion and its characteristics relating it to the low frequency portion, and output them to the bit stream multiplexing module 55.
  • the resampling module 590 is configured to resample the input audio signal; resampling includes both upsampling and downsampling.
  • downsampling is taken as an example below to illustrate resampling.
  • the resampling module 590 includes a low pass filter and a downsampler, where the low pass filter is used to limit the frequency band of the audio signal, eliminating the aliasing that may be caused by downsampling.
  • the input audio signal is low-pass filtered and then downsampled. Assume the input audio signal is s(n) and the impulse response of the low-pass filter is h(n); the filtered signal is v(n) = Σ_k h(k) s(n - k), and downsampling by a factor of M keeps every M-th sample, y(n) = v(nM).
  • the sampling rate of y(n) is thus reduced by a factor of M compared to the original input audio signal.
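The low-pass filter plus downsampler can be sketched as below. The windowed-sinc design and tap count are illustrative choices, not taken from the patent:

```python
import numpy as np

def lowpass_fir(cutoff, num_taps=63):
    """Windowed-sinc low-pass FIR; cutoff is a fraction of Nyquist (0..1)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / h.sum()            # unity gain at DC

def downsample(s, M):
    """Band-limit s(n) to 1/M of the band, then keep every M-th sample."""
    h = lowpass_fir(1.0 / M)
    v = np.convolve(s, h, mode="same")   # v(n) = sum_k h(k) s(n-k)
    return v[::M]
```

Filtering first ensures that the content above the new Nyquist frequency is attenuated before decimation, which is what prevents aliasing.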
  • after the original input audio signal is input to the band extension module 591, analysis is performed over the entire frequency band, the spectral envelope of the high frequency portion and its characteristics relating it to the low frequency portion are extracted, and these are output as band extension control information to the bit stream multiplexing module 55.
  • the basic principle of band extension is: for most audio signals, the characteristics of the high frequency part are strongly correlated with the characteristics of the low frequency part, so the high frequency part of the audio signal can be effectively reconstructed from the low frequency part; thus, the high frequency portion of the audio signal need not be transmitted. To ensure that the high frequency part can be reconstructed correctly, it is sufficient to transmit a small amount of band extension control information in the compressed audio stream.
  • the band extension module 591 includes a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters describing the spectral characteristics of the input signal in different time-frequency regions; the spectral envelope extraction module then estimates the spectral envelope of the high frequency portion of the signal at a certain time-frequency resolution. To ensure that the time-frequency resolution is best suited to the characteristics of the current input signal, the time-frequency resolution of the spectral envelope is freely selectable.
  • the parameters of the spectral characteristics of the input signal and the spectral envelope of the high frequency portion are output as the band extension control signal to the bit stream multiplexing module 55 for multiplexing.
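The spectral envelope extraction can be sketched as below, assuming magnitude spectra arranged as a (frames × bins) array; the band split and energy averaging are illustrative choices for the freely selectable time-frequency grid:

```python
import numpy as np

def highband_envelope(frames, split_bin, num_bands=4):
    """Estimate the high-band spectral envelope on a time-frequency grid.

    frames: (num_frames, num_bins) magnitude spectra; bins above
    `split_bin` form the high band that will not be transmitted.
    Returns envelope[t, b] = mean energy of sub-band b in frame t.
    """
    high = frames[:, split_bin:]
    bands = np.array_split(high, num_bands, axis=1)
    return np.stack([(b ** 2).mean(axis=1) for b in bands], axis=1)
```

Only this small envelope grid (plus the tonality-type parameters) needs to be transmitted in place of the high-band spectrum itself.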
  • the bitstream multiplexing module 55 receives the code stream output by the quantization and entropy coding module 54, including the common scale factor, the scale factor coded values, the codebook number coded values and the losslessly coded quantized spectrum or the coded values of the codeword indices, together with the band extension control signal output by the band extension module 591; after multiplexing, a compressed audio data stream is obtained.
  • the encoding method based on the encoding apparatus shown in FIG. 15 specifically includes: analyzing the input audio signal over the entire frequency band and extracting the high frequency spectral envelope and the signal spectral characteristic parameters as the band extension control signal; resampling the input audio signal and performing signal type analysis; calculating the signal-to-masking ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency domain coefficients of the audio signal; performing quantization and entropy coding on the frequency domain coefficients; and multiplexing the band extension control signal with the encoded audio stream to obtain a compressed audio stream.
  • the resampling process includes two steps: limiting the frequency band of the audio signal, and downsampling the band-limited audio signal.
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of the decoding apparatus. This embodiment is based on the decoding apparatus shown in FIG. 8, with a band extension module 68 added, which receives the band extension control information output by the bit stream demultiplexing module 60 and the low frequency band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency portion by spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.
  • the decoding method based on the decoding apparatus shown in FIG. 16 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 8, except that the following step is added: after the time domain audio signal is obtained, the high frequency portion of the audio signal is reconstructed according to the band extension control information and the time domain audio signal, to obtain a wideband audio signal.
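A minimal sketch of the decoder-side reconstruction, assuming the high band is rebuilt from a copy of the decoded low band ("spectrum shifting") whose sub-bands are then scaled to the transmitted envelope ("high frequency adjustment"); the band split mirrors the hypothetical encoder-side envelope grid:

```python
import numpy as np

def reconstruct_high_band(low_spec, envelope):
    """Patch the low-band spectrum into the high band and scale each
    sub-band so its mean energy matches the transmitted envelope value."""
    bands = np.array_split(np.asarray(low_spec, dtype=float), len(envelope))
    out = []
    for band, target in zip(bands, envelope):
        e = (band ** 2).mean()
        gain = np.sqrt(target / e) if e > 0 else 0.0
        out.append(gain * band)
    return np.concatenate(out)
```

The reconstructed high band inherits the fine structure of the low band while its coarse energy distribution follows the transmitted side information.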
  • FIGS. 17, 19 and 21 show the fifth to seventh embodiments of the encoding apparatus, based on the encoding apparatus shown in FIGS. 11, 9 and 13 respectively, with the resampling module 590 and the band extension module 591 added.
  • the connection relationship, functions, and principles of the two modules are the same as those in FIG. 15 and will not be described here.
  • a band extension module 68 is added, which receives the band extension control information output by the bit stream demultiplexing module 60 and the low frequency band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion by spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.
  • a gain control module may further be included, which receives the audio signal output by the signal property analysis module 50, controls the dynamic range of fast-changing type signals to eliminate pre-echo in the audio, outputs the result to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and simultaneously outputs the gain adjustment amount to the bit stream multiplexing module 55.
  • the gain control module only controls fast-changing type signals according to the signal type of the audio signal; slowly-changing type signals are not processed and are output directly.
  • the gain control module adjusts the time-domain energy envelope of the signal, increasing the gain value of the signal before the fast-change point so that the time-domain signal amplitudes before and after the fast-change point are relatively close; the time domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, and the gain adjustment amount is output to the bit stream multiplexing module 55.
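The envelope adjustment and its decoder-side inverse can be sketched as follows; the attack index is assumed to be known from the signal type analysis, and the helper names are illustrative:

```python
import numpy as np

def gain_control(x, attack_idx):
    """Boost the signal before the fast-change (attack) point so that the
    amplitudes before and after it are comparable; the gain is sent as
    side information in the bit stream."""
    pre_peak = np.abs(x[:attack_idx]).max()
    post_peak = np.abs(x[attack_idx:]).max()
    gain = post_peak / pre_peak if pre_peak > 0 else 1.0
    y = x.copy()
    y[:attack_idx] *= gain
    return y, gain

def inverse_gain_control(y, attack_idx, gain):
    """Decoder side: undo the boost; quantization noise added before the
    attack shrinks by the same factor, which suppresses pre-echo."""
    x = y.copy()
    x[:attack_idx] /= gain
    return x
```

Because the inverse gain is applied after decoding, both the signal and any quantization noise in the pre-attack region are attenuated together, restoring the original low-before/high-after envelope.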
  • the encoding method based on the encoding device is basically the same as the encoding method based on the above encoding device, with the difference that the following steps are added: Gain control is performed on the signal subjected to signal type analysis.
  • correspondingly, an inverse gain control module may further be included in the decoder, which receives the output of the frequency-time mapping module 64 together with the signal type and gain adjustment amount information output by the bit stream demultiplexing module 60, and is used to adjust the gain of the time domain signal to control pre-echo. The inverse gain control module operates on the reconstructed time domain signal output by the frequency-time mapping module 64, controlling fast-changing type signals and leaving slowly-changing type signals unprocessed.
  • the inverse gain control module uses the gain adjustment amount information to adjust the energy envelope of the reconstructed signal, reducing the amplitude of the signal before the fast-change point and restoring the energy envelope to its original low-before, high-after state, so that the quantization noise before the fast-change point is reduced together with the signal amplitude, thereby controlling pre-echo.
  • the decoding method based on the decoding device is basically the same as the decoding method based on the above decoding device, with the difference that the following steps are added: inverse gain control is performed on the reconstructed time domain signal.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An enhanced audio encoding device consists of a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, a bit-stream multiplexing module, a signal characteristic analyzing module and a multi-resolution analyzing module, in which the signal characteristic analyzing module is configured to analyze the signal type of the input audio signal, the psychoacoustical analyzing module calculates a masking threshold and a signal-to-masking ratio of the audio signal and outputs them to the quantization and entropy encoding module, the multi-resolution analyzing module is configured to perform multi-resolution analysis based on the signal type, the quantization and entropy encoding module performs quantization and entropy encoding of the frequency-domain coefficients under the control of the signal-to-masking ratio, and the bit-stream multiplexing module forms an audio encoding code stream. The device can support audio signals whose sampling rates are between 8 kHz and 192 kHz.

Description

Enhanced audio codec device and method

Technical field

The present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec device and method based on a perceptual model.

Background art

In order to obtain high-fidelity digital audio signals, the digital audio signal must be audio encoded or audio compressed for storage and transmission. The purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, so that there is almost no difference between the originally input audio signal and the encoded output audio signal.

In the early 1980s, the advent of the CD demonstrated the many advantages of representing audio signals digitally, such as high fidelity, large dynamic range and strong robustness. However, these advantages all come at the cost of a high data rate. For example, digitization of a CD-quality stereo signal requires a sampling rate of 44.1 kHz, with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s. Such a high data rate brings great inconvenience to the transmission and storage of data, especially in multimedia and wireless transmission applications, which are further constrained by bandwidth and cost. To maintain high-quality audio signals, new network and wireless multimedia digital audio systems must therefore reduce the data rate without compromising audio quality. To address these problems, a variety of audio compression technologies have been proposed that achieve both a high compression ratio and high-fidelity audio, typically the MPEG-1/-2/-4 technologies of the international standardization organization ISO/IEC, Dolby's AC-2/AC-3 technology, Sony's ATRAC/MiniDisc/SDDS technology and Lucent's PAC/EPAC/MPAC technology. MPEG-2 AAC technology and Dolby's AC-3 technology are selected below for detailed description.

MPEG-1 and MPEG-2 BC are high-quality audio coding technologies used mainly for mono and stereo audio signals. With the growing demand for multi-channel audio coding that achieves higher coding quality at lower bit rates, and because MPEG-2 BC encoding technology emphasizes backward compatibility with MPEG-1 technology, it cannot achieve high-quality five-channel coding at rates below 540 kbps. To remedy this shortcoming, MPEG-2 AAC technology was proposed, which can encode five-channel signals with high quality at a rate of 320 kbps.

Figure 1 shows a block diagram of the MPEG-2 AAC encoder, which includes a gain controller 101, a filter bank 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum and difference stereo (M/S) module 106, a bit allocation and quantization encoding module 107 and a bitstream multiplexing module 108, where the bit allocation and quantization encoding module 107 further includes a compression ratio/distortion processing controller, a scale factor module, a non-uniform quantizer and an entropy coding module.

The filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal adaptive: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals. Thus, for a 48 kHz sampled signal, the maximum frequency resolution is 23 Hz and the maximum time resolution is 2.6 ms. The filter bank 102 can use either a sine window or a Kaiser-Bessel window: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window is used when strong components of the input signal are spaced more than 220 Hz apart.

After passing through the gain controller 101, the audio signal enters the filter bank 102 and is filtered according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time domain noise shaping module 103. Time domain noise shaping performs linear prediction analysis on the spectral coefficients in the frequency domain and then, based on this analysis, controls the shape of the quantization noise in the time domain, thereby controlling pre-echo.

The intensity/coupling module 104 performs stereo encoding of signal intensity. For high-band signals (above 2 kHz), the perceived direction of hearing is related to changes in signal intensity (the signal envelope) rather than to the signal waveform; that is, a constant-envelope signal has no influence on the perceived direction. This property, together with the correlation between channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.

二阶后向自适应预测器 105 用于消除稳态信号的冗余, 提高编码效率。 和差立体声(M/S)模块 106 是针对声道对进行操作, 声道对是指诸如双声道信号或多声道信号中的左右声道或左右环绕声道的两个声道。 M/S模块 106利用声道对中两个声道之间的相关性以达到减少码率和提高编码效率的效果。 比特分配和量化编码模块 107是通过一个嵌套循环过程实现的, 其中非均匀量化器进行有损编码, 而熵编码模块进行无损编码, 这样可以去除冗余和减少相关。 嵌套循环包括内层循环和外层循环, 其中内层循环调整非均匀量化器的步长直到所提供的比特用完, 外层循环则利用量化噪声与掩蔽阈值的比来估计信号的编码质量。 最后经过编码的信号通过比特流复用模块 108形成编码的音频流输出。  The second-order backward adaptive predictor 105 is used to remove the redundancy of steady-state signals and improve coding efficiency. The sum/difference stereo (M/S) module 106 operates on channel pairs, a channel pair being two channels such as the left/right channels or the left/right surround channels of a two-channel or multi-channel signal. The M/S module 106 exploits the correlation between the two channels of a pair to reduce the bit rate and improve coding efficiency. The bit allocation and quantization coding module 107 is implemented as a nested loop, in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation. The nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, while the outer loop estimates the coding quality of the signal from the ratio of quantization noise to masking threshold. Finally, the encoded signal passes through the bitstream multiplexing module 108 to form the encoded audio stream output.
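The M/S matrixing itself is a simple sum/difference transform; a minimal sketch follows (the 1/2 normalization is one common convention, assumed here). For highly correlated channels the side signal is near zero and therefore cheap to code:

```python
def ms_encode(left, right):
    """Sum/difference (M/S) matrixing of a channel pair."""
    mid  = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse matrixing back to left/right channels."""
    left  = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```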

在采样率可伸缩的情况下, 输入信号同时通过四频段多相位滤波器组(PQF)产生四个等带宽的频带, 每个频带利用 MDCT产生 256个频谱系数, 总共有 1024个。 在每个频带内都使用增益控制器 101。 而在解码器中可以忽略高频的 PQF频带, 得到低采样率信号。  In the sampling-rate-scalable case, the input signal is passed through a four-band polyphase quadrature filter bank (PQF) to produce four bands of equal bandwidth; each band is transformed by an MDCT into 256 spectral coefficients, for a total of 1024. A gain controller 101 is used in each band. In the decoder, the high-frequency PQF bands can be discarded to obtain a signal at a lower sampling rate.

图 2给出了对应的 MPEG-2 AAC解码器的方框示意图。 该解码器包括比特流解复用模块 201、 无损解码模块 202、 逆量化器 203、 尺度因子模块 204、 和/差立体声(M/S)模块 205、 预测模块 206、 强度/耦合模块 207、 时域噪声整形模块 208、 滤波器组 209和增益控制模块 210。 编码的音频流经过比特流解复用模块 201进行解复用, 得到相应的数据流和控制流。 上述信号通过无损解码模块 202的解码后, 得到尺度因子的整数表示和信号谱的量化值。 逆量化器 203是一组通过压扩函数实现的非均匀量化器组, 用于将整数量化值转换为重建谱。 由于编码器中的尺度因子模块是将当前尺度因子与前一尺度因子进行差分, 然后将差分值采用 Huffman编码, 因此解码器中的尺度因子模块 204进行 Huffman解码可得到相应的差分值, 再恢复出真实的尺度因子。 M/S模块 205在边信息的控制下将和差声道转换成左右声道。 由于在编码器中采用二阶后向自适应预测器 105消除稳态信号的冗余并提高编码效率, 因此在解码器中通过预测模块 206进行预测解码。 强度/耦合模块 207在边信息的控制下进行强度/耦合解码, 然后输出到时域噪声整形模块 208中进行时域噪声整形解码, 最后通过滤波器组 209进行综合滤波, 滤波器组 209采用逆向改进离散余弦变换(IMDCT)技术。  Figure 2 shows a block diagram of the corresponding MPEG-2 AAC decoder. The decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a temporal noise shaping module 208, a filter bank 209 and a gain control module 210. The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain the corresponding data stream and control stream. After these are decoded by the lossless decoding module 202, the integer representations of the scale factors and the quantized values of the signal spectrum are obtained. The inverse quantizer 203 is a bank of non-uniform quantizers implemented by a companding function, used to convert the integer quantized values into the reconstructed spectrum. Since the scale factor module in the encoder takes the difference between the current scale factor and the previous one and Huffman-encodes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the corresponding differences and then recovers the true scale factors. The M/S module 205 converts the sum/difference channels into left/right channels under the control of the side information. Since the second-order backward adaptive predictor 105 is used in the encoder to remove the redundancy of steady-state signals and improve coding efficiency, predictive decoding is performed in the decoder by the prediction module 206. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the temporal noise shaping module 208 for TNS decoding; finally, synthesis filtering is performed by the filter bank 209, which uses the inverse modified discrete cosine transform (IMDCT).
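The companding inverse quantizer can be sketched with the |q|^(4/3) rule used in the AAC family; the 2^(1/4)-per-step gain and the offset constant below are illustrative assumptions, not values taken from this document:

```python
import math

def aac_dequantize(q, scale_factor, sf_offset=100):
    """Non-uniform inverse quantizer: expand the integer value with a
    |q|^(4/3) companding law, then apply the sub-band gain derived
    from the scale factor (one scale-factor step = 2^(1/4), ~1.5 dB).
    The offset constant is illustrative."""
    gain = 2.0 ** (0.25 * (scale_factor - sf_offset))
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * gain
```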

对于采样频率可伸缩的情况, 可通过增益控制模块 210忽略高频的 PQF频带, 以得到 低采样率信号。  For the case where the sampling frequency is scalable, the high frequency PQF band can be ignored by the gain control module 210 to obtain a low sampling rate signal.

MPEG-2 AAC编解码技术适用于中高码率的音频信号, 但对低码率或甚低码率的音频信号的编码质量较差; 同时该编解码技术涉及的编解码模块较多, 实现的复杂度较高, 不利于实时实现。  MPEG-2 AAC coding is well suited to audio signals at medium and high bit rates, but its coding quality for low or very low bit rate audio signals is poor; moreover, the technique involves many codec modules, so its implementation complexity is high, which is unfavorable for real-time implementation.

图 3给出了采用杜比 AC-3技术的编码器的结构示意图, 包括暂态信号检测模块 301、 改进的离散余弦变换滤波器 MDCT 302、 频谱包络/指数编码模块 303、 尾数编码模块 304、 前向-后向自适应感知模型 305、 参数比特分配模块 306和比特流复用模块 307。  Figure 3 shows the structure of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306 and a bitstream multiplexing module 307.

音频信号通过暂态信号检测模块 301判别是稳态信号还是瞬态信号, 同时通过信号自适应 MDCT滤波器组 302将时域数据映射到频域数据, 其中 512点的长窗应用于稳态信号, 一对短窗应用于瞬态信号。  The transient signal detection module 301 determines whether the audio signal is steady-state or transient, while the signal-adaptive MDCT filter bank 302 maps the time-domain data to frequency-domain data: a 512-point long window is applied to steady-state signals, and a pair of short windows is applied to transient signals.

频谱包络/指数编码模块 303根据码率和频率分辨率的要求采用三种模式对信号的指数部分进行编码, 分别是 D15、 D25和 D45编码模式。 AC-3技术在频率上对频谱包络采取差分编码, 差分值最多为 ±2个增量, 每个增量代表 6dB的电平变化; 第一个直流项采用绝对值编码, 其余指数采用差分编码。 在 D15频谱包络指数编码中, 每个指数大约需要 2.33比特, 3个差分组在一个 7比特的字长中编码, D15编码模式通过牺牲时间分辨率而提供精细的频率分辨率。 由于只是对相对平稳的信号才需要精细的频率分辨率, 而这样的信号在许多块上的频谱保持相对恒定, 因此, 对于稳态信号, D15偶尔被传送, 通常是每 6个声音块(一个数据帧)的频谱包络被传送一次。 当信号频谱不稳定时, 需要经常更新频谱估计值。 估计值采用较小的频率分辨率编码, 通常使用 D25和 D45编码模式。 D25编码模式提供了合适的频率分辨率和时间分辨率, 每隔一个频率系数就进行差分编码, 这样每个指数大约需要 1.15比特。 当频谱在 2至 3个块上都是稳定的, 然后突然变化时, 可以采用 D25编码模式。 D45编码模式是每隔三个频率系数进行差分编码, 这样每个指数大约需要 0.58比特。 D45编码模式提供了很高的时间分辨率和较低的频率分辨率, 所以一般应用在对瞬态信号的编码中。  The spectral envelope/exponent encoding module 303 encodes the exponent portion of the signal in one of three modes, D15, D25 and D45, according to the bit rate and frequency resolution requirements. AC-3 differentially encodes the spectral envelope along frequency; the differentials are limited to ±2 increments, each increment representing a 6 dB level change. The first (DC) term is coded absolutely, and the remaining exponents are coded differentially. In D15 spectral envelope exponent coding, each exponent requires approximately 2.33 bits, with three differentials grouped into one 7-bit word; the D15 mode provides fine frequency resolution at the cost of time resolution. Fine frequency resolution is only needed for relatively stationary signals, whose spectra remain nearly constant over many blocks, so for steady-state signals the D15 envelope is transmitted only occasionally, typically once per 6 audio blocks (one data frame). When the signal spectrum is unstable, the spectral estimate must be updated frequently; the estimate is then encoded with coarser frequency resolution, usually using the D25 or D45 mode. The D25 mode offers a compromise between frequency and time resolution, with one differential per pair of frequency coefficients, so each exponent requires approximately 1.15 bits; D25 is appropriate when the spectrum is stable over 2 to 3 blocks and then changes abruptly. The D45 mode uses one differential for every four frequency coefficients, so each exponent requires approximately 0.58 bits. D45 provides high time resolution and low frequency resolution, and is therefore generally used for encoding transient signals.
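The differential grouping arithmetic can be sketched as follows (an illustrative sketch, not production AC-3 code): the −2..+2 differentials are mapped to 0..4 and three of them are packed base-5 into one 7-bit word, which is why the finest mode costs about 7/3 ≈ 2.33 bits per exponent:

```python
def pack_exponents(abs_exp0, exponents):
    """Differentially encode an exponent list: the first (DC) exponent
    is sent absolutely, the rest as differentials clipped to +/-2,
    mapped to 0..4 and grouped three per 7-bit word as
    25*d0 + 5*d1 + d2 (a value of at most 124)."""
    diffs, prev = [], abs_exp0
    for e in exponents:
        d = max(-2, min(2, e - prev))
        prev = prev + d          # encoder tracks the clipped value
        diffs.append(d + 2)      # map -2..2 -> 0..4
    while len(diffs) % 3:
        diffs.append(2)          # pad with zero differentials
    return [25 * diffs[i] + 5 * diffs[i + 1] + diffs[i + 2]
            for i in range(0, len(diffs), 3)]
```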

前向-后向自适应感知模型 305用于估计每帧信号的掩蔽阈值。 其中前向自适应部分仅应用在编码器端, 在码率的限制下, 通过迭代循环估计一组最佳的感知模型参数, 然后这些参数被传递到后向自适应部分以估计出每帧的掩蔽阈值。 后向自适应部分同时应用在编码器端和解码器端。  The forward-backward adaptive perceptual model 305 is used to estimate the masking threshold of each signal frame. The forward adaptive part is applied only at the encoder: under the bit-rate constraint, an optimal set of perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward adaptive part to estimate the masking threshold of each frame. The backward adaptive part is applied at both the encoder and the decoder.

参数比特分配模块 306根据掩蔽准则分析音频信号的频谱包络, 以确定给每个尾数分配的比特数。 该模块 306利用一个比特池对所有声道进行全局比特分配。 在尾数编码模块 304 中进行编码时, 从比特池中循环取出比特分配给所有的声道, 根据可以获得的比特数来调整尾数的量化。 为达到压缩编码的目的, AC-3编码器还采用高频耦合的技术, 将被耦合信号的高频部分按照人耳临界带宽划分成 18 个子频段, 然后选择某些声道从某个子带开始进行耦合。 最后通过比特流复用模块 307形成 AC-3音频流输出。  The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal against the masking criteria to determine the number of bits allocated to each mantissa. The module 306 uses a single bit pool to perform global bit allocation across all channels. During encoding in the mantissa encoding module 304, bits are drawn from the pool in turn and distributed to all channels, and the quantization of the mantissas is adjusted according to the number of bits available. To further compress the signal, the AC-3 encoder also uses high-frequency coupling: the high-frequency portion of the coupled signals is divided into 18 sub-bands according to the critical bandwidths of the human ear, and selected channels are coupled from a chosen sub-band upward. Finally, the AC-3 audio stream output is formed by the bitstream multiplexing module 307.

图 4给出了采用杜比 AC-3解码的流程示意图。 首先输入经过 AC-3编码器编码的比特流, 对比特流进行数据帧同步和误码检测, 如果检测到误码, 则进行误码掩盖或弱音处理。 然后对比特流进行解包, 获得主信息和边信息, 再进行指数解码。 在进行指数解码时, 需要有两个边信息: 一个是打包的指数数目; 一个是所采用的指数策略, 如 D15、 D25或 D45模式。 已经解码的指数和比特分配边信息再进行比特分配, 指出每个打包的尾数所用的比特数, 得到一组比特分配指针, 每个比特分配指针对应一个编码的尾数。 比特分配指针指出用于尾数的量化器以及在码流中每个尾数占用的比特数。 对单个编码的尾数值进行解量化, 将其转变成一个解量化的值, 占用零比特的尾数被恢复成零, 或者在抖动标志的控制下用一个随机抖动值代替。 然后进行解耦合的操作, 解耦合是从公共耦合声道和耦合因子中恢复出被耦合声道的高频部分, 包括指数和尾数。 如果在编码端采用 2/0模式编码时对某子带采用了矩阵处理, 那么在解码端需通过矩阵恢复将该子带的和差声道值转换成左右声道值。 在码流中包含有每个音频块的动态范围控制值, 将该值进行动态范围压缩, 以改变系数的幅度, 包括指数和尾数。 将频域系数进行逆变换, 转变成时域样本, 然后对时域样本进行加窗处理, 相邻的块进行重叠相加, 重构出 PCM音频信号。 当解码输出的声道数小于编码比特流中的声道数时, 还需要对音频信号进行下混处理, 最后输出 PCM音频信号。

杜比 AC-3编码技术主要针对高比特率多声道环绕声的信号, 但是当 5.1声道的编码比特率低于 384kbps时, 其编码效果较差; 而且对于单声道和双声道立体声的编码效率也较低。  Figure 4 shows the flow of Dolby AC-3 decoding. First, the bitstream produced by the AC-3 encoder is input, and frame synchronization and error detection are performed on it; if an error is detected, error concealment or muting is applied. The bitstream is then unpacked to obtain the main information and side information, after which exponent decoding is performed. Exponent decoding requires two pieces of side information: the number of packed exponents, and the exponent strategy used, such as the D15, D25 or D45 mode. The decoded exponents and the bit allocation side information are then used for bit allocation, which indicates the number of bits used for each packed mantissa and yields a set of bit allocation pointers, one per coded mantissa. A bit allocation pointer indicates the quantizer used for a mantissa and the number of bits that mantissa occupies in the stream. Each coded mantissa value is dequantized into a dequantized value; mantissas occupying zero bits are restored to zero, or replaced by a random dither value under the control of the dither flag. Decoupling is then performed, which recovers the high-frequency portion of each coupled channel, exponents and mantissas, from the common coupling channel and the coupling factors. If matrix processing was applied to a sub-band when encoding in 2/0 mode, the decoder must convert the sum/difference channel values of that sub-band back into left/right channel values by matrix recovery. The stream contains a dynamic range control value for each audio block; dynamic range compression is applied with this value to change the amplitudes of the coefficients, both exponents and mantissas. The frequency-domain coefficients are inverse transformed into time-domain samples, the samples are windowed, and adjacent blocks are overlap-added to reconstruct the PCM audio signal. When the number of decoded output channels is smaller than the number of channels in the encoded bitstream, the audio signal must also be downmixed before the final PCM output.

Dolby AC-3 encoding is aimed mainly at high bit rate multi-channel surround signals; when the 5.1-channel bit rate falls below 384 kbps its coding quality degrades, and its coding efficiency for mono and two-channel stereo is also low.
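The final windowing and overlap-add step can be sketched generically; the sine window used here satisfies the Princen-Bradley condition and is an illustrative stand-in for the actual AC-3 window:

```python
import math

def wola_roundtrip(signal, n=8):
    """Window a signal into 50%-overlapped sine-windowed blocks and
    resynthesize by windowed overlap-add; with a Princen-Bradley
    window, w[i]^2 + w[i + n/2]^2 = 1, so the overlapped halves sum
    back to the original amplitude in the interior."""
    hop = n // 2
    win = [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]
    blocks = [[signal[s + i] * win[i] for i in range(n)]
              for s in range(0, len(signal) - n + 1, hop)]
    out = [0.0] * len(signal)
    for b, block in enumerate(blocks):
        for i, v in enumerate(block):
            out[b * hop + i] += v * win[i]
    return out  # matches the input except at the un-overlapped edges
```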

综上, 现有的编解码技术无法全面解决从甚低码率、 低码率到高码率音频信号以及单声道、 双声道信号的编解码质量问题, 且实现较为复杂。 发明内容  In summary, existing codec technologies cannot comprehensively deliver good coding quality from very low and low bit rates up to high bit rates, or for both mono and two-channel signals, and their implementations are complex. Summary of the invention

本发明所要解决的技术问题在于提供一种增强音频编 /解码的装置及方法, 以解决现 有技术对于较低码率音频信号的编码效率低、 质量差的问题。  The technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio encoding/decoding to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.

本发明的增强音频编码装置, 包括心理声学分析模块、 时频映射模块、 量化和熵编码模块以及比特流复用模块、 信号性质分析模块和多分辨率分析模块; 其中所述信号性质分析模块, 用于对输入音频信号进行类型分析, 并输出到所述心理声学分析模块和所述时频映射模块, 同时将信号类型分析结果信息输出到所述比特流复用模块; 所述心理声学分析模块, 用于计算音频信号的掩蔽阈值和信掩比, 并输出到所述量化和熵编码模块; 所述时频映射模块, 用于将时域音频信号转变成频域系数, 并输出到多分辨率分析模块; 所述多分辨率分析模块, 用于根据所述信号性质分析模块输出的信号类型分析结果, 对快变类型信号的频域系数进行多分辨率分析, 并输出到量化和熵编码模块; 所述量化和熵编码模块, 在所述心理声学分析模块输出的信掩比的控制下, 用于对频域系数进行量化和熵编码, 并输出到所述比特流复用模块; 所述比特流复用模块用于将接收到的数据进行复用, 形成音频编码码流。  The enhanced audio encoding apparatus of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bitstream multiplexing module, a signal property analysis module and a multi-resolution analysis module. The signal property analysis module performs type analysis on the input audio signal, outputs the signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputs the signal type analysis result to the bitstream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and signal-to-mask ratio of the audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module converts the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module. The multi-resolution analysis module performs multi-resolution analysis on the frequency-domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module, and outputs the result to the quantization and entropy coding module. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module, the quantization and entropy coding module quantizes and entropy-encodes the frequency-domain coefficients and outputs them to the bitstream multiplexing module. The bitstream multiplexing module multiplexes the received data to form the encoded audio stream.

本发明的增强音频解码装置, 包括: 比特流解复用模块、 熵解码模块、 逆量化器组、 频率-时间映射模块和多分辨率综合模块; 所述比特流解复用模块用于对压缩音频数据流进行解复用, 并向所述熵解码模块和所述多分辨率综合模块输出相应的数据信号和控制信号; 所述熵解码模块用于对上述信号进行解码处理, 恢复谱的量化值, 输出到所述逆量化器组; 所述逆量化器组用于重建逆量化谱, 并输出到所述多分辨率综合模块; 所述多分辨率综合模块用于对逆量化谱进行多分辨率综合, 并输出到所述频率-时间映射模块; 所述频率-时间映射模块用于对谱系数进行频率-时间映射, 输出时域音频信号。  The enhanced audio decoding apparatus of the present invention comprises a bitstream demultiplexing module, an entropy decoding module, a bank of inverse quantizers, a frequency-time mapping module and a multi-resolution synthesis module. The bitstream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data and control signals to the entropy decoding module and the multi-resolution synthesis module. The entropy decoding module decodes these signals to recover the quantized values of the spectrum and outputs them to the bank of inverse quantizers. The bank of inverse quantizers reconstructs the inverse-quantized spectrum and outputs it to the multi-resolution synthesis module. The multi-resolution synthesis module performs multi-resolution synthesis on the inverse-quantized spectrum and outputs the result to the frequency-time mapping module. The frequency-time mapping module performs frequency-time mapping on the spectral coefficients and outputs the time-domain audio signal.

本发明适用于多种采样率、 声道配置的音频信号的高保真压缩编码, 可以支持采样率 为 8kHz到 192kHz之间的音频信号; 可支持所有可能的声道配置; 并且支持范围很宽的目 标码率的音频编 /解码。 附图说明 The invention is applicable to high-fidelity compression coding of audio signals of various sampling rates and channel configurations, and can support audio signals with sampling rates between 8 kHz and 192 kHz; all possible channel configurations can be supported; and a wide range of support is supported. Audio encoding/decoding of the target bit rate. DRAWINGS

图 1是 MPEG- 2 AAC编码器的方框图;  Figure 1 is a block diagram of an MPEG-2 AAC encoder;

图 2是 MPEG- 2 AAC解码器的方框图;  Figure 2 is a block diagram of an MPEG-2 AAC decoder;

图 3是采用杜比 AC-3技术的编码器的结构示意图;  Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology;

图 4是采用杜比 AC- 3技术的解码流程示意图;  Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology;

图 5是本发明编码装置的结构示意图;  Figure 5 is a schematic structural view of an encoding device of the present invention;

图 6是采用 Haar小波基小波变换的滤波结构示意图;  Figure 6 is a schematic diagram of a filtering structure using the Haar wavelet basis;

图 7是采用 Haar小波基小波变换得到的时频划分示意图;  Figure 7 is a schematic diagram of the time-frequency partition obtained with the Haar wavelet basis;

图 8是本发明解码装置的结构示意图;  8 is a schematic structural diagram of a decoding apparatus of the present invention;

图 9是本发明编码装置的实施例一的结构示意图;  Figure 9 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention;

图 10是本发明解码装置的实施例一的结构示意图;  FIG. 10 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention; FIG.

图 11是本发明编码装置的实施例二的结构示意图;  Figure 11 is a schematic structural view of Embodiment 2 of the encoding apparatus of the present invention;

图 12是本发明解码装置的实施例二的结构示意图;  FIG. 12 is a schematic structural diagram of Embodiment 2 of a decoding apparatus according to the present invention; FIG.

图 13是本发明编码装置的实施例三的结构示意图;  Figure 13 is a schematic structural view of a third embodiment of the encoding apparatus of the present invention;

图 14是本发明解码装置的实施例三的结构示意图;  FIG. 14 is a schematic structural diagram of Embodiment 3 of a decoding apparatus according to the present invention; FIG.

图 15是本发明编码装置的实施例四的结构示意图;  Figure 15 is a schematic structural view of Embodiment 4 of the encoding apparatus of the present invention;

图 16是本发明解码装置的实施例四的结构示意图;  16 is a schematic structural diagram of Embodiment 4 of a decoding apparatus according to the present invention;

图 17是本发明编码装置的实施例五的结构示意图;  Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention;

图 18是本发明解码装置的实施例五的结构示意图;  18 is a schematic structural diagram of Embodiment 5 of a decoding apparatus according to the present invention;

图 19是本发明编码装置的实施例六的结构示意图;  Figure 19 is a schematic structural view of Embodiment 6 of the encoding apparatus of the present invention;

图 20是本发明解码装置的实施例六的结构示意图;  20 is a schematic structural diagram of Embodiment 6 of a decoding apparatus according to the present invention;

图 21是本发明编码装置的实施例七的结构示意图;  Figure 21 is a schematic structural view of Embodiment 7 of the coding apparatus of the present invention;

图 22是本发明解码装置的实施例七的结构示意图。 具体实施方式  Figure 22 is a block diagram showing the structure of a seventh embodiment of the decoding apparatus of the present invention. Detailed ways

图 1至图 4是现有技术的几种编码器的结构示意图, 已在背景技术中进行了介绍, 此处不再赘述。  Figures 1 to 4 are structural diagrams of several prior-art encoders; they have been introduced in the background section and are not described again here.

需要说明的是: 为方便、 清楚地说明本发明, 下述编解码装置的具体实施例是采用对应的方式说明的, 但并不限定编码装置与解码装置必须是一一对应的。 如图 5所示, 本发明提供的音频编码装置包括信号性质分析模块 50、 心理声学分析模块 51、 时频映射模块 52、 多分辨率分析模块 53、 量化和熵编码模块 54以及比特流复用模块 55; 其中信号性质分析模块 50用于对输入音频信号进行类型分析, 将音频信号输出到心理声学分析模块 51和时频映射模块 52, 同时将信号类型分析结果输出到比特流复用模块 55; 心理声学分析模块 51用于计算输入音频信号的掩蔽阈值和信掩比, 输出到量化和熵编码模块 54; 时频映射模块 52用于将时域音频信号转变成频域系数, 并输出到多分辨率分析模块 53; 多分辨率分析模块 53根据信号性质分析模块 50输出的信号类型分析结果, 用于对快变类型信号的频域系数进行多分辨率分析, 并输出到量化和熵编码模块 54; 量化和熵编码模块 54在心理声学分析模块 51输出的信掩比的控制下, 用于对频域系数进行量化和熵编码, 并输出到比特流复用模块 55; 比特流复用模块 55用于将接收到的数据进行复用, 形成音频编码码流。  It should be noted that, for convenience and clarity, the specific embodiments of the encoding and decoding apparatus below are described in corresponding pairs, but this does not require a one-to-one correspondence between the encoding apparatus and the decoding apparatus. As shown in Figure 5, the audio encoding apparatus provided by the present invention includes a signal property analysis module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a multi-resolution analysis module 53, a quantization and entropy coding module 54 and a bitstream multiplexing module 55. The signal property analysis module 50 performs type analysis on the input audio signal, outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52, and outputs the signal type analysis result to the bitstream multiplexing module 55. The psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the input audio signal and outputs them to the quantization and entropy coding module 54. The time-frequency mapping module 52 converts the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module 53. The multi-resolution analysis module 53 performs multi-resolution analysis on the frequency-domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module 50, and outputs the result to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, the quantization and entropy coding module 54 quantizes and entropy-encodes the frequency-domain coefficients and outputs them to the bitstream multiplexing module 55. The bitstream multiplexing module 55 multiplexes the received data to form the encoded audio stream.
The resolution analysis module 53 is configured to perform multi-resolution analysis on the frequency domain coefficients of the fast-changing type signal according to the signal type analysis result output by the psychoacoustic analysis module 51, and output to the quantization and entropy coding module. 54; the quantization and entropy coding module 54 is used to control the frequency domain system under the control of the mask ratio output by the psychoacoustic analysis module 51. The number is quantized and entropy encoded and output to the bit stream multiplexing module 55; the bit stream multiplexing module 55 is configured to multiplex the received data to form an audio encoded code stream.

数字音频信号在信号性质分析模块 50 中进行信号类型分析, 将音频信号的类型信息输出到比特流复用模块 55, 并同时将音频信号输出到所述心理声学分析模块 51和所述时频映射模块 52中。 一方面在心理声学分析模块 51中计算该帧音频信号的掩蔽阈值和信掩比, 然后将信掩比作为控制信号传送给量化和熵编码模块 54; 另一方面时域的音频信号通过时频映射模块 52转变成频域系数; 上述频域系数在多分辨率分析模块 53中, 对快变信号进行多分辨率分析, 提高快变信号的时间分辨率, 并将结果输出到量化和熵编码模块 54 中; 在心理声学分析模块 51输出的信掩比的控制下, 在量化和熵编码模块 54中进行量化和熵编码, 经过编码后的数据和控制信号在比特流复用模块 55 进行复用, 形成增强音频编码的码流。  The digital audio signal undergoes signal type analysis in the signal property analysis module 50, which outputs the type information to the bitstream multiplexing module 55 and simultaneously passes the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52. On the one hand, the psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the frame, and the signal-to-mask ratio is passed as a control signal to the quantization and entropy coding module 54; on the other hand, the time-domain audio signal is converted into frequency-domain coefficients by the time-frequency mapping module 52. In the multi-resolution analysis module 53, multi-resolution analysis is applied to fast-varying signals to improve their time resolution, and the result is output to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, quantization and entropy coding are performed in module 54, and the encoded data and control signals are multiplexed in the bitstream multiplexing module 55 to form the enhanced audio code stream.

下面对上述音频编码装置的各个组成模块进行具体详细地说明。  The respective constituent modules of the above audio encoding device will be described in detail below.

信号性质分析模块 50, 用于对输入的音频信号进行信号类型分析, 并将音频信号的类型信息输出到比特流复用模块 55; 同时将音频信号输出到心理声学分析模块 51和时频映射模块 52。  The signal property analysis module 50 performs signal type analysis on the input audio signal, outputs the type information of the audio signal to the bitstream multiplexing module 55, and simultaneously outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.

信号性质分析模块 50基于自适应阈值和波形预测进行前、 后向掩蔽效应分析来确定 信号的类型为緩变信号还是快变信号, 若是快变类型信号, 则继续计算突变成分的相关参 数信息, 如突变信号发生的位置以及突变信号的强度等。  The signal property analysis module 50 performs front and back masking effect analysis based on the adaptive threshold and the waveform prediction to determine whether the signal type is a slow-changing signal or a fast-changing signal, and if it is a fast-changing type signal, continues to calculate related parameter information of the abrupt component. Such as the location of the mutation signal and the strength of the mutation signal.
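A minimal sketch of such a fast/slow classification follows (purely illustrative: the sub-block count, the threshold ratio and the leaky-average update are assumed parameters, not the detector actually claimed):

```python
def detect_transient(frame, subblocks=8, ratio_thresh=4.0):
    """Toy fast/slow classifier: flag a frame as fast-varying when one
    sub-block's energy jumps well above an adaptive threshold derived
    from the running average of the preceding sub-blocks; also report
    where the abrupt component occurs (the sub-block index)."""
    n = len(frame) // subblocks
    energies = [sum(x * x for x in frame[i * n:(i + 1) * n])
                for i in range(subblocks)]
    avg = energies[0]
    for k, e in enumerate(energies[1:], 1):
        if avg > 0 and e > ratio_thresh * avg:
            return True, k          # transient detected at sub-block k
        avg = 0.7 * avg + 0.3 * e   # leaky running average (adaptive threshold)
    return False, -1
```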

心理声学分析模块 51 主要用于计算输入音频信号的掩蔽阈值、 信掩比和感知熵。 根据心理声学分析模块 51 计算出的感知熵可动态地分析当前信号帧进行透明编码所需的比特数, 从而调整帧间的比特分配。 心理声学分析模块 51 输出各个子带的信掩比到量化和熵编码模块 54, 对其进行控制。  The psychoacoustic analysis module 51 is mainly used to calculate the masking threshold, the signal-to-mask ratio and the perceptual entropy of the input audio signal. The perceptual entropy calculated by the psychoacoustic analysis module 51 gives a dynamic estimate of the number of bits the current signal frame needs for transparent coding, which is used to adjust the bit allocation between frames. The psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy coding module 54 to control it.

时频映射模块 52用于实现音频信号从时域信号到频域系数的变换, 由滤波器组构成, 具体可以是离散傅立叶变换(DFT)滤波器组、 离散余弦变换(DCT)滤波器组、 修正离散余弦变换(MDCT)滤波器组、 余弦调制滤波器组、 小波变换滤波器组等。 通过时频映射得到的频域系数被输出到量化和熵编码模块 54中, 进行量化和编码处理。  The time-frequency mapping module 52 implements the transformation of the audio signal from the time domain to frequency-domain coefficients and consists of a filter bank, which may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, a wavelet transform filter bank, etc. The frequency-domain coefficients obtained by the time-frequency mapping are output to the quantization and entropy coding module 54 for quantization and coding.

对于快变类型信号, 为有效克服编码过程中产生的预回声现象, 提高编码质量, 本发 明编码装置通过多分辨率分析模块 53 来提高编码快变信号的时间分辨率。 时频映射模块 52输出的频域系数输入到多分辨率分析模块 53中, 如果是快变类型信号, 则进行频域小 波变换或频域修正离散余弦变换(MDCT ), 获得对频域系数的多分辨率表示, 输出到量化 和熵编码模块 54 中。 如果是緩变类型信号, 则对频域系数不进行处理, 直接输出到量化 和熵编码模块 54。  For the fast-changing type signal, in order to effectively overcome the pre-echo phenomenon generated in the encoding process and improve the encoding quality, the encoding apparatus of the present invention increases the time resolution of the encoded fast-changing signal by the multi-resolution analyzing module 53. The frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 53. If it is a fast-varying type signal, the frequency domain wavelet transform or the frequency domain modified discrete cosine transform (MDCT) is performed to obtain the frequency domain coefficients. The multi-resolution representation is output to the quantization and entropy encoding module 54. If it is a slowly varying type signal, the frequency domain coefficients are not processed and are directly output to the quantization and entropy encoding module 54.

多分辨率分析模块 53 包括频域系数变换模块和重组模块, 其中频域系数变换模块用于将频域系数变换为时频平面系数; 重组模块用于将时频平面系数按照一定的规则进行重组。 频域系数变换模块可采用频域小波变换滤波器组、 频域 MDCT变换滤波器组等。  The multi-resolution analysis module 53 includes a frequency-domain coefficient transform module and a regrouping module: the transform module converts the frequency-domain coefficients into time-frequency plane coefficients, and the regrouping module rearranges the time-frequency plane coefficients according to given rules. The transform module may use a frequency-domain wavelet transform filter bank, a frequency-domain MDCT filter bank, or the like.
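Figures 6 and 7 refer to a Haar-basis decomposition; the iterated averaging/differencing that produces such a multi-resolution representation can be sketched as follows (an illustrative sketch, not the invention's exact filter bank):

```python
def haar_analyze(coeffs, levels=2):
    """Iterated Haar split: at each level the 'average' branch is
    split again, yielding a multi-resolution representation of the
    input frequency coefficients (deepest-level averages + a detail
    band per level)."""
    assert len(coeffs) % (2 ** levels) == 0
    details, cur = [], list(coeffs)
    for _ in range(levels):
        avg = [(cur[2 * i] + cur[2 * i + 1]) / 2.0 for i in range(len(cur) // 2)]
        det = [(cur[2 * i] - cur[2 * i + 1]) / 2.0 for i in range(len(cur) // 2)]
        details.append(det)
        cur = avg
    return cur, details
```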

量化和熵编码模块 54 进一步包括了非线性量化器组和编码器, 其中量化器可以是标量量化器或矢量量化器。 矢量量化器进一步分为无记忆矢量量化器和有记忆矢量量化器两大类。 对于无记忆矢量量化器, 每个输入矢量是独立进行量化的, 与以前的各矢量无关; 有记忆矢量量化器是在量化一个矢量时考虑以前的矢量, 即利用了矢量之间的相关性。 主要的无记忆矢量量化器包括全搜索矢量量化器、 树搜索矢量量化器、 多级矢量量化器、 增益/波形矢量量化器和分离均值矢量量化器; 主要的有记忆矢量量化器包括预测矢量量化器和有限状态矢量量化器。  The quantization and entropy coding module 54 further includes a bank of non-linear quantizers and an encoder, where the quantizer may be a scalar quantizer or a vector quantizer. Vector quantizers fall into two broad classes, memoryless and with memory. In a memoryless vector quantizer each input vector is quantized independently of all previous vectors; a vector quantizer with memory takes previous vectors into account when quantizing the current one, i.e. it exploits the correlation between vectors. The main memoryless vector quantizers are the full-search, tree-search, multistage, gain/shape and mean-removed vector quantizers; the main vector quantizers with memory are the predictive and finite-state vector quantizers.

如果采用标量量化器, 则非线性量化器组进一步包括 M个子带量化器。 在每个子带量 化器中主要利用尺度因子进行量化, 具体是: 对 M个尺度因子带中所有的频域系数进行非 线性压缩, 再利用尺度因子对该子带的频域系数进行量化 , 得到整数表示的量化谱输出到 编码器, 将每帧信号中的第一个尺度因子作为公共尺度因子输出到比特流复用模块 55 , 其 它尺度因子与其前一个尺度因子进行差分处理后输出到编码器。  If a scalar quantizer is employed, the non-linear quantizer group further includes M sub-band quantizers. In each sub-band quantizer, the scale factor is mainly used for quantization, specifically: nonlinearly compressing all the frequency domain coefficients in the M scale factor bands, and then using the scale factor to quantize the frequency domain coefficients of the sub-band, The quantized spectrum represented by the integer is output to the encoder, and the first scale factor in each frame signal is output to the bit stream multiplexing module 55 as a common scale factor, and other scale factors are differentially processed with the previous scale factor and output to the encoder. .
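The common-scale-factor-plus-differences scheme described above can be sketched directly (a minimal sketch of the idea; differencing concentrates the values near zero, which suits the subsequent entropy coding):

```python
def scalefactor_diffs(sfs):
    """Send the first scale factor as the common (global) value and
    the rest as differences to their predecessor."""
    common = sfs[0]
    diffs = [b - a for a, b in zip(sfs, sfs[1:])]
    return common, diffs

def scalefactor_restore(common, diffs):
    """Decoder side: cumulative sum recovers the true scale factors."""
    sfs = [common]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```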

The scale factor used in the above step is a continually changing value, adjusted according to a bit-allocation strategy. The present invention provides a bit-allocation strategy that minimizes the global perceptual distortion, as follows:

First, every sub-band quantizer is initialized and a suitable scale factor is chosen such that the quantized values of the spectral coefficients in all sub-bands are zero. At this point the quantization noise of each sub-band equals its energy, the noise-to-mask ratio NMR of each sub-band equals its signal-to-mask ratio SMR, the number of bits consumed by quantization is zero, and the number of remaining bits equals the target bit count.

Next, the sub-band with the largest noise-to-mask ratio NMR is located. If the maximum NMR is less than or equal to 1, the scale factors are left unchanged, the allocation result is output, and the bit-allocation process ends. Otherwise, the scale factor of the corresponding sub-band quantizer is decreased by one unit, and the number of additional bits ΔB_i(Q_i) required by that sub-band is computed. If the number of remaining bits satisfies B_r ≥ ΔB_i(Q_i), the scale-factor modification is confirmed, ΔB_i(Q_i) is subtracted from the remaining bit count B_r, the NMR of that sub-band is recomputed, and the search for the sub-band with the largest NMR is repeated, the subsequent steps being executed again. If instead the remaining bit count satisfies B_r < ΔB_i(Q_i), the modification is cancelled, the previous scale factor and the remaining bit count are retained, the allocation result is output, and the bit-allocation process ends.
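The greedy loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `bit_cost` is a hypothetical callback for ΔB_i(Q_i), and the "each step halves the noise" update is an assumption standing in for the actual NMR recomputation.

```python
import numpy as np

def allocate_bits(smr, bit_cost, target_bits):
    """Greedy perceptual bit allocation: repeatedly refine the sub-band
    with the worst noise-to-mask ratio until no band is audible (NMR <= 1)
    or the remaining bits cannot pay for the next refinement.  NMR starts
    at SMR because the all-zero quantization leaves noise equal to the
    signal energy."""
    n = len(smr)
    sf = np.zeros(n, dtype=int)         # scale-factor offsets per sub-band
    nmr = np.array(smr, dtype=float)    # NMR == SMR initially
    remaining = target_bits
    while True:
        i = int(np.argmax(nmr))
        if nmr[i] <= 1.0:
            break                       # noise already masked everywhere
        cost = bit_cost(i, sf[i])       # hypothetical cost of one step
        if remaining < cost:
            break                       # cannot afford the refinement
        sf[i] -= 1                      # finer quantization for band i
        remaining -= cost
        nmr[i] /= 2.0                   # assumed NMR update per step
    return sf, remaining
```

The loop terminates either on the NMR ≤ 1 condition or on bit exhaustion, matching the two exit cases in the text.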

If vector quantization is used, the frequency-domain coefficients are grouped into a number of multi-dimensional vectors that are input to the group of non-linear quantizers. Each vector is first flattened according to a flattening factor, i.e. its dynamic range is reduced; the vector quantizer then finds, under a subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, and passes the corresponding codeword index to the encoder. The flattening factor is adjusted according to the bit-allocation strategy of the vector quantizer, and the bit allocation of the vector quantizer is in turn controlled by the perceptual importance of the different sub-bands.
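The flatten-then-search step can be sketched as follows. The text does not specify the subjective perceptual distance measure, so a weighted Euclidean distance is used here as a stand-in, with `weights` modelling per-component perceptual importance; both are assumptions.

```python
import numpy as np

def vq_encode(vector, codebook, flatten, weights=None):
    """Flatten the input vector to reduce its dynamic range, then return
    the index of the nearest codeword.  A weighted squared Euclidean
    distance substitutes for the (unspecified) perceptual measure."""
    v = vector / flatten                       # dynamic-range reduction
    w = np.ones_like(v) if weights is None else weights
    dists = [np.sum(w * (v - c) ** 2) for c in codebook]
    return int(np.argmin(dists))               # codeword index to encode
```

Only the index (and the flattening factor, via the bit-allocation side information) needs to reach the decoder, which holds the same codebook.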

After the above quantization, entropy coding is used to further remove the statistical redundancy of the quantized coefficients and the side information. Entropy coding is a source-coding technique whose basic idea is to assign codewords of shorter length to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability of occurrence, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length n̄ satisfies

H(x)/log₂(D) ≤ n̄ < H(x)/log₂(D) + 1/N,

where H(x) denotes the entropy of the source, D the size of the code alphabet, and N the number of source symbols. Since the entropy H(x) is the lower limit of the average codeword length, the formula above shows that the average codeword length comes very close to its lower bound, the entropy; such variable-length coding is therefore called "entropy coding". The main entropy-coding methods are Huffman coding, arithmetic coding and run-length coding; the entropy coding in the present invention may use any of these methods.
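The short-codes-for-frequent-symbols idea can be demonstrated with a small Huffman coder, one of the three methods named above. This is a generic textbook construction, not the specific code tables of the coder described here.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code over the observed symbols: frequent symbols
    receive short codewords.  Returns a dict symbol -> bitstring."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate one-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, tree); trees are nested pairs,
    # so ties never fall through to comparing trees.
    heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)       # two least-probable nodes
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):           # internal node: branch 0/1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                 # leaf: original symbol
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

For a source such as `"aaabbc"`, the most frequent symbol receives a one-bit codeword and the rarer symbols two-bit codewords, so the average length approaches the source entropy as the theorem states.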

The quantized spectrum output by the scalar quantizers and the differentially processed scale factors are entropy-coded in the encoder, yielding codebook numbers, coded scale-factor values and the losslessly coded quantized spectrum; the codebook numbers are then themselves entropy-coded, yielding coded codebook-number values. The coded scale-factor values, the coded codebook-number values and the losslessly coded quantized spectrum are then output to the bitstream multiplexing module 55.

The codeword indices obtained from the vector quantizers are one-dimensionally or multi-dimensionally entropy-coded in the encoder to obtain the coded codeword-index values, which are then output to the bitstream multiplexing module 55. The encoding method based on the above encoder specifically comprises: performing signal-type analysis on the input audio signal; computing the signal-to-mask ratio of the audio signal; performing time-frequency mapping on the audio signal to obtain its frequency-domain coefficients; performing multi-resolution analysis as well as quantization and entropy coding on the frequency-domain coefficients; and multiplexing the signal-type analysis result with the coded audio stream to obtain the compressed audio stream.

The signal type is determined by forward/backward masking-effect analysis based on adaptive thresholds and waveform prediction. The specific steps are: decompose the input audio data into frames; split each input frame into multiple sub-frames and find the local maxima of the absolute values of the PCM data in each sub-frame; select the sub-frame peak from the local maxima of each sub-frame; for a given sub-frame peak, use the peaks of several preceding sub-frames (typically 3) to predict typical sample values for several sub-frames (typically 4) at a forward delay relative to that sub-frame; and compute the difference and the ratio between the sub-frame peak and the predicted typical sample values. If both the prediction difference and the ratio exceed their set thresholds, the sub-frame is judged to contain a transient and is confirmed as the local maximum peak capable of backward-masking the pre-echo; if, between the start of that sub-frame and 2.5 ms before the masking peak, there exists a sub-frame with a sufficiently small peak, the frame signal is judged to be of the fast-varying type. If the prediction difference and ratio do not exceed the set thresholds, the above steps are repeated until the frame signal is judged to be of the fast-varying type or the last sub-frame is reached; if the last sub-frame is reached without the frame having been judged fast-varying, the frame signal is of the slowly varying type.
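The transient test above can be sketched as follows. The linear-extrapolation predictor, the threshold values and the "sufficiently small peak" criterion are all assumptions here; the text only requires that a difference and a ratio both exceed adaptive thresholds.

```python
import numpy as np

def classify_frame(frame, n_sub=8, diff_thr=0.3, ratio_thr=2.0):
    """Fast/slow classification sketch: compare each sub-frame peak
    against a prediction from the 3 preceding sub-frame peaks; a large
    jump preceded by a quiet sub-frame marks the frame fast-varying."""
    subs = np.array_split(np.abs(frame), n_sub)
    peaks = np.array([s.max() for s in subs])
    for i in range(3, n_sub):
        predicted = peaks[i - 3:i].mean()          # assumed predictor
        diff = peaks[i] - predicted
        ratio = peaks[i] / (predicted + 1e-12)
        if diff > diff_thr and ratio > ratio_thr:  # transient detected
            # an earlier quiet sub-frame would expose the pre-echo
            if peaks[:i].min() < 0.5 * peaks[i]:
                return "fast"
    return "slow"
```

A frame containing a single late attack is classified fast-varying, while a stationary frame falls through to the slow-varying case.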

There are many methods for the time-frequency transformation of a time-domain audio signal, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks and wavelet transforms. The time-frequency mapping process is illustrated below using the MDCT and cosine-modulated filtering as examples.

When the modified discrete cosine transform MDCT is used for the time-frequency transform, the time-domain signals of the M samples of the previous frame and the M samples of the current frame are first selected; a window is applied to the 2M samples of these two frames, and the windowed signal is then MDCT-transformed to obtain M frequency-domain coefficients.

The impulse response of the MDCT analysis filter is:

h_k(n) = w(n)·√(2/M)·cos( ((2n + M + 1)(2k + 1)π) / (4M) ), 0 ≤ n ≤ 2M − 1,

and the MDCT transform is then:

X(k) = Σ_{n=0}^{2M−1} x(n)·h_k(n), 0 ≤ k ≤ M − 1,

where w(n) is the window function, x(n) is the input time-domain signal of the MDCT, and X(k) is the output frequency-domain signal of the MDCT.

To satisfy the condition for perfect signal reconstruction, the MDCT window function must satisfy the following two conditions:

w(2M − 1 − n) = w(n) and w²(n) + w²(n + M) = 1.

In practice, the sine window may be chosen as the window function. The above restrictions on the window function can, of course, also be modified by using a biorthogonal transform with a particular pair of analysis and synthesis filters.
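A direct (non-fast) implementation of the windowed MDCT above can be sketched as follows, using the sine window, which satisfies both stated conditions. The basis formula follows the reconstruction given earlier and should be read as illustrative.

```python
import numpy as np

def mdct(prev_frame, cur_frame):
    """MDCT of one frame: window 2M samples (previous + current frame)
    with a sine window and correlate with the cosine basis h_k(n),
    yielding M frequency-domain coefficients."""
    x = np.concatenate([prev_frame, cur_frame]).astype(float)
    M = len(cur_frame)
    n = np.arange(2 * M)
    w = np.sin(np.pi / (2 * M) * (n + 0.5))   # sine window
    k = np.arange(M)[:, None]
    basis = np.sqrt(2.0 / M) * np.cos(
        (2 * n[None, :] + M + 1) * (2 * k + 1) * np.pi / (4 * M))
    return basis @ (w * x)                    # M coefficients X(k)
```

The sine window can be checked against both window conditions numerically: it is symmetric, w(2M − 1 − n) = w(n), and satisfies the Princen-Bradley condition w²(n) + w²(n + M) = 1.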

When cosine-modulated filtering is used for the time-frequency transform, the time-domain signals of the samples of the previous frame and the M samples of the current frame are likewise selected first; the samples of the two frames are windowed, and the windowed signal is then cosine-modulation transformed to obtain M frequency-domain coefficients.

The impulse responses of the conventional cosine-modulated filters are:

h_k(n) = 2·p_a(n)·cos( (π/M)(k + 0.5)(n − D/2) + Θ_k ), n = 0, 1, …, N_h − 1,

f_k(n) = 2·p_s(n)·cos( (π/M)(k + 0.5)(n − D/2) − Θ_k ), n = 0, 1, …, N_f − 1,

where 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, K is an integer greater than zero, Θ_k = (−1)^k·(π/4), and D denotes the system delay. Suppose the impulse-response length of the analysis window (analysis prototype filter) p_a(n) of the M-sub-band cosine-modulated filter bank is N_a, and that of the synthesis window (synthesis prototype filter) p_s(n) is N_s. When the analysis window and the synthesis window are equal, i.e. p_a(n) = p_s(n) and N_a = N_s, the cosine-modulated filter bank given by the two equations above is an orthogonal filter bank, and the matrices H and F ([H]_{nk} = h_k(n), [F]_{nk} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, the symmetric window is further required to satisfy p_a(2KM − 1 − n) = p_a(n). To guarantee perfect reconstruction for the orthogonal and biorthogonal systems, the window function must satisfy further conditions; for details see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.

Computing the masking threshold and the signal-to-mask ratio of the resampled signal comprises the following steps:

Step 1: map the signal from the time domain to the frequency domain. The time-domain data can be converted into frequency-domain coefficients X[k] using the fast Fourier transform together with a Hanning window; in terms of magnitude r[k] and phase φ[k], X[k] = r[k]·e^{jφ[k]}. The energy e[b] of each sub-band is then the sum of the energies of all spectral lines within that sub-band, i.e.

e[b] = Σ_{k=k_l}^{k_h} r²[k],

where k_l and k_h denote the lower and upper boundaries of sub-band b, respectively.

Step 2: determine the tonal and non-tonal components of the signal. The tonality of the signal is estimated by inter-frame prediction of each spectral line: the Euclidean distance between the predicted value and the actual value of each line is mapped to an unpredictability measure; highly predictable spectral components are considered strongly tonal, while poorly predictable components are considered noise-like. The magnitude r_pred and phase φ_pred of the predicted value can be expressed by the following formulas:

r_pred[k] = r_{t−1}[k] + (r_{t−1}[k] − r_{t−2}[k]),
φ_pred[k] = φ_{t−1}[k] + (φ_{t−1}[k] − φ_{t−2}[k]),

where t denotes the coefficients of the current frame, t−1 those of the previous frame, and t−2 those of the frame before the previous one.

The unpredictability measure c[k] is then computed as

c[k] = dist(X[k], X_pred[k]) / (r[k] + |r_pred[k]|),

where the Euclidean distance dist(X[k], X_pred[k]) is computed with

dist(X[k], X_pred[k]) = √( (r[k]·cos φ[k] − r_pred[k]·cos φ_pred[k])² + (r[k]·sin φ[k] − r_pred[k]·sin φ_pred[k])² ).

Therefore, the unpredictability c[b] of each sub-band is the sum of the energies of all spectral lines in the sub-band weighted by their unpredictability:

c[b] = Σ_{k=k_l}^{k_h} c[k]·r²[k].

The sub-band energy e[b] and the unpredictability c[b] are each convolved with the spreading function, yielding the spread sub-band energy ecb[b] and the spread sub-band unpredictability ct[b]; the spreading function from masker band i to sub-band b is denoted s[i, b]. To remove the influence of the spreading function on the energy scaling, the spread sub-band unpredictability must be normalized; the normalized result is cb[b] = ct[b]/ecb[b]. Likewise, to remove the influence of the spreading function on the sub-band energy, the normalized energy spread is defined as en[b] = ecb[b]/n[b], where the normalization factor n[b] is

n[b] = Σ_{i=1}^{b_max} s[i, b],

and b_max is the number of sub-bands into which the frame signal is divided. From the normalized unpredictability spread, the tonality of each sub-band can be computed:

t[b] = −0.299 − 0.43·log_e(cb[b]), with 0 ≤ t[b] ≤ 1.

When t[b] = 1, the sub-band signal is a pure tone; when t[b] = 0, the sub-band signal is white noise.
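The prediction and tonality mapping above can be sketched per spectral line as follows (the spreading-function convolution is omitted, so the mapping is applied directly to the per-line unpredictability; that simplification is an assumption of this sketch):

```python
import numpy as np

def tonality(r, r1, r2, phi, phi1, phi2):
    """Per-line unpredictability from linear prediction of magnitude and
    phase, then the mapping t = -0.299 - 0.43*ln(c), clipped to [0, 1].
    r/phi are the current magnitudes and phases; r1/phi1 and r2/phi2
    come from the two previous frames."""
    r_pred = r1 + (r1 - r2)                      # extrapolated magnitude
    phi_pred = phi1 + (phi1 - phi2)              # extrapolated phase
    dist = np.hypot(r * np.cos(phi) - r_pred * np.cos(phi_pred),
                    r * np.sin(phi) - r_pred * np.sin(phi_pred))
    c = dist / (r + np.abs(r_pred) + 1e-12)      # unpredictability measure
    t = -0.299 - 0.43 * np.log(np.maximum(c, 1e-12))
    return np.clip(t, 0.0, 1.0)
```

A perfectly predicted line (zero distance) maps to tonality 1 (pure tone), while a line whose phase flips unpredictably maps to tonality 0 (noise-like).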

Step 3: compute the signal-to-noise ratio (SNR) required by each sub-band. The noise-masking-tone (NMT) value of all sub-bands is set to 5 dB and the tone-masking-noise (TMN) value to 18 dB; for the noise to remain imperceptible, the SNR required by each sub-band is SNR[b] = 18·t[b] + 6·(1 − t[b]). Step 4: compute the masking threshold of each sub-band and the perceptual entropy of the signal. From the normalized signal energy of each sub-band obtained in the preceding steps and the required signal-to-noise ratio SNR, the noise-energy threshold of each sub-band is computed as nb[b] = en[b]·10^{−SNR[b]/10}. To avoid the influence of pre-echo, the noise-energy threshold nb[b] of the current frame is compared with that of the previous frame nb_prev[b], giving the masking threshold of the signal as nb[b] = min(nb[b], 2·nb_prev[b]); this ensures that the masking threshold does not deviate because of a high-energy attack near the end of the analysis window.

Further, taking the influence of the threshold in quiet qsthr[b] into account, the final masking threshold of the signal is chosen as the larger of the threshold in quiet and the masking threshold computed above, i.e. nb[b] = max(nb[b], qsthr[b]). The perceptual entropy is then computed as

PE = Σ_b cbwidth[b]·log₁₀( (en[b] + 1) / (nb[b] + 1) ),

where cbwidth[b] denotes the number of spectral lines contained in each sub-band.

Step 5: compute the signal-to-mask ratio (SMR) of each sub-band signal. The SMR of each sub-band is SMR[b] = en[b]/nb[b].
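Steps 1 and 3-5 can be assembled into a miniature pipeline as follows. The spreading-function convolution of step 2 is omitted for brevity, so raw sub-band energies stand in for their normalized versions; this simplification is an assumption of the sketch.

```python
import numpy as np

def subband_smr(spectrum, band_edges, tonality, nb_prev, qsthr):
    """Sub-band energy from the magnitude spectrum, required SNR from
    tonality (18 dB tonal, 6 dB noise-like), noise threshold, pre-echo
    limiting against the previous frame, threshold in quiet, and SMR."""
    e = np.array([np.sum(spectrum[lo:hi] ** 2)
                  for lo, hi in band_edges])           # e[b]
    snr_db = 18.0 * tonality + 6.0 * (1.0 - tonality)  # SNR[b]
    nb = e * 10.0 ** (-snr_db / 10.0)                  # noise threshold
    nb = np.minimum(nb, 2.0 * nb_prev)                 # pre-echo control
    nb = np.maximum(nb, qsthr)                         # threshold in quiet
    return e / nb, nb                                  # SMR[b], nb[b]
```

The returned SMR values are exactly what the bit-allocation strategy described earlier consumes as its starting NMR values.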

Multi-resolution analysis is then performed on the frequency-domain coefficients. The multi-resolution analysis module 53 reorganizes the input frequency-domain data in time-frequency, raising the time resolution of the frequency-domain data at the cost of lower frequency resolution; it thereby adapts automatically to the time-frequency characteristics of fast-varying signals and suppresses pre-echo, without any need to adjust the form of the filter bank in the time-frequency mapping module 52.

Multi-resolution analysis comprises two steps, frequency-domain coefficient transformation and regrouping: the frequency-domain coefficient transformation converts the frequency-domain coefficients into time-frequency plane coefficients, and the regrouping arranges the time-frequency plane coefficients into groups according to certain rules.

The multi-resolution analysis process is illustrated below using the frequency-domain wavelet transform and the frequency-domain MDCT transform as examples.

1) Frequency-domain wavelet transform

Suppose the time sequence x(i), i = 0, 1, …, 2M − 1, yields the frequency-domain coefficients X(k), k = 0, 1, …, M − 1, after time-frequency mapping. The wavelet basis of the frequency-domain wavelet or wavelet-packet transform may be fixed or adaptive.

The multi-resolution analysis of the frequency-domain coefficients is illustrated below with the wavelet transform on the simplest wavelet basis, the Haar basis. The scaling coefficients of the Haar wavelet basis are (1/√2, 1/√2). Figure 6 shows the filtering structure of the wavelet transform with the Haar basis, where H₀ denotes low-pass filtering (with filter coefficients (1/√2, 1/√2)), H₁ denotes high-pass filtering (with filter coefficients (1/√2, −1/√2)), and "↓2" denotes downsampling by a factor of 2. The low- and mid-frequency part of the frequency-domain coefficients is not wavelet-transformed; the high-frequency part of the frequency-domain coefficients is Haar-wavelet-transformed, yielding the coefficients X₂(k), X₃(k), X₄(k), X₅(k), X₆(k) and X₇(k) of different time-frequency regions, the corresponding time-frequency plane partition being shown in Figure 7. By choosing different wavelet bases, different wavelet-transform structures can be used for the processing, giving other similar time-frequency plane partitions. The time-frequency plane partition used in signal analysis can therefore be adjusted arbitrarily as needed, meeting the analysis requirements of different time and frequency resolutions.

The above time-frequency plane coefficients are regrouped in the regrouping module according to certain rules; for example, the time-frequency plane coefficients may first be organized along the frequency direction, with the coefficients within each frequency band organized along the time direction, and the organized coefficients then arranged in the order of sub-windows and scale-factor bands.
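The frequency-domain Haar step described above can be sketched as follows. The split point between the untransformed low/mid part and the transformed high part is left as a free parameter, since the text does not fix it.

```python
import numpy as np

def haar_step(coeffs):
    """One Haar analysis step: low-pass with (1/sqrt(2), 1/sqrt(2)) and
    high-pass with (1/sqrt(2), -1/sqrt(2)), each downsampled by 2."""
    even, odd = coeffs[0::2], coeffs[1::2]
    s = np.sqrt(0.5)
    return s * (even + odd), s * (even - odd)   # (approximation, detail)

def freq_domain_haar(X, keep_low):
    """Multi-resolution analysis sketch: leave the lowest `keep_low`
    frequency-domain coefficients untouched and apply one Haar step to
    the high-frequency part, trading frequency resolution for time
    resolution there."""
    low, high = X[:keep_low], X[keep_low:]
    approx, detail = haar_step(high)
    return low, approx, detail
```

Cascading further `haar_step` calls on the high-frequency outputs reproduces the deeper time-frequency partitions of Figure 7.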

2) Frequency-domain MDCT transform

Let the frequency-domain data input to the frequency-domain MDCT filter bank be X(l), l = 0, 1, …, N − 1. An M-point MDCT is applied successively to these N points of frequency-domain data, so that the frequency resolution of the time-frequency data decreases while the time resolution increases correspondingly. By using frequency-domain MDCTs of different lengths in different frequency ranges, different time-frequency plane partitions, i.e. different time and frequency resolutions, can be obtained. The regrouping module regroups the time-frequency data output by the frequency-domain MDCT filter bank; one regrouping method is to organize the time-frequency plane coefficients first along the frequency direction, with the coefficients within each frequency band organized along the time direction, and then to arrange the organized coefficients in the order of sub-windows and scale-factor bands.

Quantization and entropy coding further comprise the two steps of non-linear quantization and entropy coding, where the quantization may be scalar quantization or vector quantization.

Scalar quantization comprises the following steps: non-linearly compress the frequency-domain coefficients in all scale-factor bands; quantize the frequency-domain coefficients of each sub-band with that sub-band's scale factor to obtain an integer-valued quantized spectrum; select the first scale factor of each frame as the common scale factor; and differentially code every other scale factor against its predecessor.

Vector quantization comprises the following steps: group the frequency-domain coefficients into a number of multi-dimensional vector signals; flatten each vector according to the flattening factor; and find, under the subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, obtaining its codeword index.

The entropy-coding step comprises: entropy-code the quantized spectrum and the differentially processed scale factors to obtain codebook numbers, coded scale-factor values and the losslessly coded quantized spectrum; and entropy-code the codebook numbers to obtain the coded codebook-number values.

Or alternatively: apply one-dimensional or multi-dimensional entropy coding to the codeword indices to obtain the coded codeword-index values. The above entropy-coding methods may use any of the existing methods such as Huffman coding, arithmetic coding or run-length coding.

After quantization and entropy coding, the coded audio stream is obtained; this stream is multiplexed together with the common scale factor and the signal-type analysis result to obtain the compressed audio stream.

Figure 8 is a structural diagram of the audio decoding apparatus of the present invention. The audio decoding apparatus comprises a bitstream demultiplexing module 60, an entropy decoding module 61, a group of inverse quantizers 62, a multi-resolution synthesis module 63 and a frequency-time mapping module 64. After the compressed audio stream is demultiplexed by the bitstream demultiplexing module 60, the corresponding data signals and control signals are obtained and output to the entropy decoding module 61 and the multi-resolution synthesis module 63. The data signals and control signals are decoded in the entropy decoding module 61, recovering the quantized values of the spectrum. These quantized values are reconstructed in the group of inverse quantizers 62 to obtain the inversely quantized spectrum, which is output to the multi-resolution synthesis module 63; after multi-resolution synthesis it is output to the frequency-time mapping module 64, and the frequency-time mapping finally yields the time-domain audio signal.

The bitstream demultiplexing module 60 decomposes the compressed audio stream to obtain the corresponding data signals and control signals, providing the other modules with the decoding information they need. After the compressed audio data stream is demultiplexed, the signals output to the entropy decoding module 61 comprise the common scale factor, the coded scale-factor values, the coded codebook-number values and the losslessly coded quantized spectrum, or alternatively the coded codeword-index values; the signal-type information is output to the multi-resolution synthesis module 63.

If a scalar quantizer is used in the quantization and entropy coding module 54 of the encoding apparatus, then in the decoding apparatus the entropy decoding module 61 receives the common scale factor, coded scale-factor values, coded codebook-number values and losslessly coded quantized spectrum output by the bitstream demultiplexing module 60; it then performs codebook-number decoding, spectral-coefficient decoding and scale-factor decoding, reconstructs the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the group of inverse quantizers 62. The decoding method used by the entropy decoding module 61 corresponds to the entropy-coding method of the encoding apparatus, e.g. Huffman decoding, arithmetic decoding or run-length decoding.

After receiving the quantized values of the spectrum and the integer representation of the scale factors, the group of inverse quantizers 62 inversely quantizes the quantized spectral values into the unscaled reconstructed spectrum (the inversely quantized spectrum) and outputs it to the multi-resolution synthesis module 63. The group of inverse quantizers 62 may be uniform quantizers or non-uniform quantizers realized through a companding function. Since the quantizer group of the encoding apparatus uses scalar quantizers, the group of inverse quantizers 62 of the decoding apparatus likewise uses scalar inverse quantizers. In a scalar inverse quantizer, the quantized values of the spectrum are first non-linearly expanded, and each scale factor is then used to recover all the spectral coefficients (the inversely quantized spectrum) of the corresponding scale-factor band.
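The expand-then-scale operation of the scalar inverse quantizer can be sketched as follows, mirroring the assumed 3/4-power compression on the encoder side; the 4/3 exponent and step mapping are assumptions, not values given by the text.

```python
import numpy as np

def dequantize_subband(qspec, scalefactor):
    """Scalar inverse quantization sketch: non-linear expansion with the
    4/3 power (inverse of an assumed 3/4-power compression), followed by
    scaling with the sub-band scale factor."""
    step = 2.0 ** (scalefactor / 4.0)            # assumed step mapping
    expanded = np.sign(qspec) * np.abs(qspec) ** (4.0 / 3.0)
    return expanded * step                        # inversely quantized spectrum
```

Under these assumptions the operation inverts the encoder-side quantization up to rounding error: a quantized value of 8 at scale factor 0 expands back to 16.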

If a vector quantizer is used in the quantization and entropy coding module 54, then in the decoding apparatus the entropy decoding module 61 receives the coded codeword-index values output by the bitstream demultiplexing module 60 and decodes them with the entropy-decoding method corresponding to the entropy-coding method used at the encoder, obtaining the corresponding codeword indices. The codeword indices are output to the group of inverse quantizers 62, where the quantized values (the inversely quantized spectrum) are obtained by codebook lookup and output to the multi-resolution synthesis module 63; the group of inverse quantizers 62 here uses inverse vector quantizers. After multi-resolution synthesis, the inversely quantized spectrum is mapped by the frequency-time mapping module 64 into the time-domain audio signal. The frequency-time mapping module 64 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet-transform filter bank, a cosine-modulated filter bank, or the like.

The decoding method based on the above decoder includes: demultiplexing the compressed audio bitstream to obtain data information and control information; entropy-decoding this information to obtain the quantized spectral values; inversely quantizing the quantized spectral values to obtain the inverse-quantized spectrum; performing multi-resolution synthesis on the inverse-quantized spectrum; and then performing frequency-time mapping to obtain the time domain audio signal.

If the demultiplexed information includes encoded codebook numbers, a common scale factor, encoded scale factor values and a losslessly coded quantized spectrum, the spectral coefficients were quantized in the encoding apparatus with scalar quantization. The entropy decoding steps then include: decoding the encoded codebook numbers to obtain the codebook number of every scale factor band; decoding the quantized coefficients of every scale factor band according to the codebook indicated by its codebook number; and decoding the scale factors of every scale factor band, thereby reconstructing the quantized spectrum. The entropy decoding methods used in this process correspond to the entropy coding methods used at the encoder, such as run-length decoding, Huffman decoding or arithmetic decoding.

The entropy decoding process is illustrated below with run-length decoding of the codebook numbers, Huffman decoding of the quantized coefficients, and Huffman decoding of the scale factors as an example.

First, the codebook numbers of all scale factor bands are obtained by run-length decoding. Each decoded codebook number is an integer within some interval; if the interval is [0, 11], then only codebook numbers within this valid range, i.e. between 0 and 11, correspond to spectral-coefficient Huffman codebooks. All-zero subbands may be assigned a particular codebook number, typically 0.
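As a minimal sketch of the run-length expansion described above: the exact bitstream syntax is not given here, so the code assumes the decoded stream has already been parsed into hypothetical (codebook_number, run_length) pairs.

```python
def run_length_decode(pairs):
    """Expand (codebook_number, run_length) pairs into one codebook
    number per scale factor band.  The pair layout is an assumption for
    illustration; the patent's exact bitstream syntax is not shown here."""
    numbers = []
    for codebook, run in pairs:
        numbers.extend([codebook] * run)
    return numbers

# Bands 0-2 share codebook 5, bands 3-6 are all-zero (codebook 0):
print(run_length_decode([(5, 3), (0, 4)]))  # [5, 5, 5, 0, 0, 0, 0]
```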

After the codebook number of each scale factor band has been decoded, the spectral-coefficient Huffman codebook corresponding to that number is used to decode the quantized coefficients of all scale factor bands. If the codebook number of a scale factor band is within the valid range, in this embodiment between 1 and 11, the number corresponds to a spectral-coefficient codebook; that codebook is used to decode the codeword indices of the band's quantized coefficients from the quantized spectrum, and the quantized coefficients are then unpacked from the codeword indices. If the codebook number of a scale factor band is not between 1 and 11, it corresponds to no spectral-coefficient codebook; the band's quantized coefficients need not be decoded, and all quantized coefficients of that subband are simply set to zero.

The scale factors are used to reconstruct the spectral values from the inverse-quantized spectral coefficients; every codebook number within the valid range has a corresponding scale factor. When decoding the scale factors, the bits occupied by the first scale factor are read first; the remaining scale factors are then Huffman-decoded, yielding in turn the difference between each scale factor and its predecessor, and each difference is added to the preceding scale factor value to obtain that scale factor. If the quantized coefficients of the current subband are all zero, the scale factor of that subband need not be decoded.
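The differential reconstruction described above (first scale factor read explicitly, the rest accumulated from Huffman-decoded differences) can be sketched as:

```python
def decode_scale_factors(first_sf, deltas):
    """Rebuild scale factors from the first (explicitly coded) value and
    the decoded differences to the previous scale factor."""
    sfs = [first_sf]
    for d in deltas:
        sfs.append(sfs[-1] + d)
    return sfs

print(decode_scale_factors(60, [2, -1, 0, 3]))  # [60, 62, 61, 61, 64]
```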

After the entropy decoding process above, the quantized spectral values and the integer representations of the scale factors are obtained, and the quantized spectral values are then inversely quantized to obtain the inverse-quantized spectrum. The inverse quantization process includes: nonlinearly expanding the quantized spectral values, and applying each scale factor to obtain all spectral coefficients (the inverse-quantized spectrum) in the corresponding scale factor band.
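A sketch of such a scalar inverse quantizer follows. The patent does not spell out its companding law here, so both the 4/3 expansion exponent and the 2^((sf - offset)/4) per-band gain are assumptions borrowed from common perceptual audio coders, used only to illustrate the two steps (nonlinear expansion, then scale factor application).

```python
import math

def dequantize_band(qvals, scale_factor, sf_offset=100):
    """Nonlinear expansion |q|**(4/3) followed by the per-band gain
    2**((scale_factor - sf_offset)/4).  Exponent, gain law and offset
    are assumed values for illustration, not taken from the patent."""
    gain = 2.0 ** ((scale_factor - sf_offset) / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * gain for q in qvals]

print(dequantize_band([2, -3, 0], 100))  # sign is preserved, 0 stays 0
```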

If the demultiplexed information includes encoded codeword indices, the spectral coefficients were quantized in the encoding apparatus with vector quantization. The entropy decoding step then consists of decoding the encoded codeword indices with the entropy decoding method corresponding to the entropy coding method of the encoder to obtain the codeword indices, after which the codeword indices are inversely quantized to obtain the inverse-quantized spectrum.

Regarding the inverse-quantized spectrum: for a fast-varying signal, multi-resolution analysis was applied to the frequency domain coefficients and the multi-resolution representation was then quantized and entropy-coded; for a signal that is not of the fast-varying type, the frequency domain coefficients were quantized and entropy-coded directly.

Multi-resolution synthesis may use a frequency domain wavelet transform or a frequency domain MDCT transform. The frequency domain wavelet synthesis method includes: first reorganizing the time-frequency plane coefficients according to certain rules, and then performing a wavelet transform on the frequency domain coefficients to obtain the time-frequency plane coefficients. The MDCT method includes: first reorganizing the time-frequency plane coefficients according to certain rules, and then performing several MDCT transforms on the frequency domain coefficients to obtain the time-frequency plane coefficients. The reorganization may proceed as follows: the time-frequency plane coefficients are first organized along the frequency direction, the coefficients within each frequency band are organized along the time direction, and the organized coefficients are then arranged in the order of subwindows and scale factor bands.

The frequency-time mapping applied to the frequency domain coefficients corresponds to the time-frequency mapping of the encoding method, and may be carried out with an inverse discrete cosine transform (IDCT), an inverse discrete Fourier transform (IDFT), an inverse modified discrete cosine transform (IMDCT), an inverse wavelet transform, or similar methods.

The frequency-time mapping process is illustrated below with the inverse modified discrete cosine transform (IMDCT) as an example. It consists of three steps: the IMDCT transform, time domain windowing, and time domain overlap-add.

First, the IMDCT transform is applied to the pre-prediction spectrum or the inverse-quantized spectrum to obtain the transformed time domain signal x_{i,n}. The IMDCT transform is expressed as:

    x_{i,n} = (2/N) * sum_{k=0}^{N/2-1} spec[i][k] * cos( (2*pi/N) * (n + n0) * (k + 1/2) )

where n is the sample index with 0 <= n < N; N is the number of time domain samples, here 2048; n0 = (N/2 + 1)/2; i is the frame index; and k is the spectral index.
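The formula above can be sketched directly as reference code. This is a straightforward O(N^2) evaluation on a toy N = 8 frame; a real decoder would use an FFT-based fast IMDCT, and the function name is ours, not the patent's.

```python
import math

def imdct(spec):
    """Direct IMDCT per the formula above: N output samples from N/2
    spectral coefficients, with n0 = (N/2 + 1)/2."""
    half = len(spec)            # N/2 coefficients in
    n_samples = 2 * half        # N time samples out
    n0 = (half + 1) / 2.0
    return [
        (2.0 / n_samples) * sum(
            spec[k] * math.cos(2.0 * math.pi / n_samples
                               * (n + n0) * (k + 0.5))
            for k in range(half))
        for n in range(n_samples)
    ]

x = imdct([1.0, 0.0, 0.0, 0.0])  # toy frame with N = 8
print(len(x))  # 8
```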

Secondly, the time domain signal obtained from the IMDCT transform is windowed in the time domain. To satisfy the perfect reconstruction condition, the window function w(n) must satisfy the following two conditions:

    w(2N - 1 - n) = w(n)  and  w^2(n) + w^2(n + N) = 1.

Typical window functions include the sine window and the Kaiser-Bessel window. The present invention uses a fixed window function (its closed-form expression appears as an equation image in the original publication), where k = 0 ... N-1 indexes the k-th window coefficient, with w(k) = w(2N - 1 - k); N denotes the number of samples in a coded frame, here N = 1024. Alternatively, a biorthogonal transform with specific analysis and synthesis filters may be used to relax the above constraints on the window function.
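Since the patent's own fixed window formula is only available as an image, the two reconstruction conditions above can instead be demonstrated with the sine window, one of the typical windows named in the text:

```python
import math

N = 1024                      # samples per coded frame; window length 2N
w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

# Symmetry: w(2N-1-n) = w(n)
assert all(abs(w[2 * N - 1 - n] - w[n]) < 1e-12 for n in range(N))
# Power complementarity: w^2(n) + w^2(n+N) = 1
assert all(abs(w[n] ** 2 + w[n + N] ** 2 - 1.0) < 1e-12 for n in range(N))
print("sine window satisfies both reconstruction conditions")
```

Both checks follow from sin(pi - t) = sin(t) and sin^2(t) + cos^2(t) = 1.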

Finally, the windowed time domain signal is overlap-added to obtain the time domain audio signal. Specifically, the first N/2 samples of the signal obtained from the windowing operation are overlapped and added to the last N/2 samples of the previous frame, producing N/2 output time domain audio samples:

    timeSam_{i,n} = preSam_{i,n} + windowedSam_{i,n}

where i denotes the frame index and n the sample index, with 0 <= n < N/2 and N = 2048.
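The overlap-add step can be sketched as follows; `windowed` is assumed to hold the current frame after windowing, and `prev_half` the saved second half of the previous frame.

```python
def overlap_add(windowed, prev_half):
    """First N/2 samples of the current windowed frame plus the saved
    last N/2 samples of the previous frame give N/2 output samples; the
    current frame's second half is saved as state for the next call."""
    half = len(windowed) // 2
    out = [prev_half[n] + windowed[n] for n in range(half)]
    return out, windowed[half:]

out, state = overlap_add([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
print(out, state)  # [1.5, 2.5] [3.0, 4.0]
```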


Fig. 9 is a schematic diagram of a first embodiment of the encoding apparatus of the present invention. Building on Fig. 5, this embodiment adds a frequency domain linear prediction and vector quantization module 56 between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54; it outputs the residual sequence to the quantization and entropy coding module 54, and outputs the quantized codeword indices as side information to the bitstream multiplexing module 55.

Since multi-resolution analysis turns the frequency domain coefficients into time-frequency coefficients with a specific time-frequency plane partition, the frequency domain linear prediction and vector quantization module 56 performs linear prediction and multi-stage vector quantization on the frequency domain coefficients of each time segment.

The frequency domain coefficients output by the multi-resolution analysis module 53 are passed to the frequency domain linear prediction and vector quantization module 56. After the multi-resolution analysis of the frequency domain coefficients, a standard linear prediction analysis is performed on the frequency domain coefficients of each time segment. If the prediction gain satisfies a given condition, the frequency domain coefficients are passed through the linear prediction error filter; the resulting prediction coefficients are converted into line spectral frequency (LSF) coefficients, and the codeword indices of the codebooks at each stage are found by searching with an optimal distortion measure. The codeword indices are transmitted as side information to the bitstream multiplexing module 55, while the residual sequence obtained from the prediction analysis is output to the quantization and entropy coding module 54.

The frequency domain linear prediction and vector quantization module 56 consists of a linear prediction analyzer, a linear prediction filter, a converter and a vector quantizer. The frequency domain coefficients are fed into the linear prediction analyzer for prediction analysis, yielding the prediction gain and the prediction coefficients. Frequency domain coefficients that satisfy the required condition are filtered by the linear prediction filter to obtain the residual sequence, which is output directly to the quantization and entropy coding module 54; the prediction coefficients are converted by the converter into line spectral frequency (LSF) coefficients, and the LSF parameters are fed into the vector quantizer for multi-stage vector quantization, after which the quantized result is transmitted to the bitstream multiplexing module 55.

Frequency domain linear prediction of an audio signal effectively suppresses pre-echo and yields a considerable coding gain. For a real signal, its squared Hilbert envelope e(t) can be written as

    e(t) = F^{-1}{ C(f) (*) C*(-f) }

where C(f) is the single-sided spectrum corresponding to the positive-frequency components of the signal and (*) denotes correlation; that is, the Hilbert envelope of the signal is determined by the autocorrelation of its spectrum. Meanwhile, the power spectral density of a signal is the Fourier transform of the autocorrelation of its time domain waveform, PSD(f) = F{ r(tau) }, so the squared Hilbert envelope of the signal in the time domain and its power spectral density in the frequency domain are dual to each other. It follows that for any band-pass portion of the signal within a given frequency range, if its Hilbert envelope remains constant, then the autocorrelation of adjacent spectral values also remains constant. This means the sequence of spectral coefficients is stationary with respect to frequency, so predictive coding techniques can be applied to the spectral values, and the signal can be represented efficiently by a common set of prediction coefficients.

The encoding method based on the apparatus of Fig. 9 is essentially the same as that based on the apparatus of Fig. 5, except that the following steps are added. After the multi-resolution analysis of the frequency domain coefficients, a standard linear prediction analysis is performed on the frequency domain coefficients of each time segment to obtain the prediction gain and the prediction coefficients. If the prediction gain exceeds a preset threshold, frequency domain linear prediction error filtering is applied to the frequency domain coefficients according to the prediction coefficients, yielding the residual sequence; the prediction coefficients are converted into line spectral frequency coefficients, which are multi-stage vector quantized to produce the side information; and the residual sequence is quantized and entropy-coded. If the prediction gain does not exceed the threshold, the frequency domain coefficients themselves are quantized and entropy-coded.

After the multi-resolution analysis of the frequency domain coefficients, a standard linear prediction analysis is first performed on the frequency domain coefficients of each time segment, which includes computing the autocorrelation matrix and running the Levinson-Durbin recursion to obtain the prediction gain and the prediction coefficients. It is then checked whether the computed prediction gain exceeds the preset threshold; if it does, linear prediction error filtering is applied to the frequency domain coefficients according to the prediction coefficients; otherwise the frequency domain coefficients are left untouched and the next step, quantization and entropy coding of the frequency domain coefficients, is executed.
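A minimal sketch of the Levinson-Durbin recursion named above, returning the prediction coefficients and the final prediction error (the prediction gain is then r[0]/error). It assumes r[0] > 0 and a nondegenerate autocorrelation sequence; the sign convention is chosen so that the predictor is X(k) = sum_i a_i * X(k-i).

```python
def levinson_durbin(r, order):
    """Solve the normal equations from the autocorrelation values
    r[0..order]; return (a[1..order], prediction error)."""
    a = [0.0] * (order + 1)   # a[0] unused; a[i] multiplies X(k-i)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err          # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# AR(1) process with coefficient 0.9: r[m] = 0.9**m
coeffs, err = levinson_durbin([1.0, 0.9, 0.81, 0.729], 2)
print(coeffs, err)  # a1 ~ 0.9, a2 ~ 0, error ~ 0.19
```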

Linear prediction comes in two forms, forward prediction and backward prediction: forward prediction estimates the current value from values before a given instant, while backward prediction estimates it from values after it. Forward prediction is used below to illustrate linear prediction error filtering. The transfer function of the linear prediction error filter is

    A(z) = 1 - sum_{i=1}^{p} a_i * z^{-i}

where a_i are the prediction coefficients and p is the prediction order. Filtering the frequency domain coefficients X(k) produced by the time-frequency transform yields the prediction error E(k), also called the residual sequence, which satisfies

    E(k) = X(k) - sum_{i=1}^{p} a_i * X(k - i).

Thus, after linear prediction error filtering, the frequency domain coefficients X(k) output by the time-frequency transform can be represented by the residual sequence E(k) and a set of prediction coefficients a_i. This set of prediction coefficients a_i is converted into line spectral frequency (LSF) coefficients and multi-stage vector quantized; the vector quantizer selects an optimal distortion measure (such as the nearest-neighbor criterion) and searches for the codeword indices of the codebooks at each stage, thereby determining the codewords corresponding to the prediction coefficients, and the codeword indices are output as side information. At the same time, the residual sequence is quantized and entropy-coded. From the principles of linear predictive coding, the dynamic range of the residual sequence of the spectral coefficients is smaller than that of the original coefficients, so fewer bits can be allocated during quantization, or, for the same number of bits, an improved coding gain can be obtained.
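The error-filter relation E(k) = X(k) - sum_i a_i X(k-i) can be sketched directly; values outside the current block are taken as zero, an assumption made here only to keep the example self-contained.

```python
def prediction_error_filter(x, a):
    """E(k) = X(k) - sum_{i=1..p} a[i-1] * X(k - i), with X = 0 outside
    the block."""
    p = len(a)
    return [x[k] - sum(a[i] * x[k - 1 - i]
                       for i in range(p) if k - 1 - i >= 0)
            for k in range(len(x))]

# With a perfect one-tap predictor a = [0.5], a geometric sequence
# collapses to a single nonzero residual sample:
print(prediction_error_filter([1.0, 0.5, 0.25, 0.125], [0.5]))
# [1.0, 0.0, 0.0, 0.0]
```

This illustrates the dynamic-range argument above: the residual carries far less energy than the original sequence.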

Fig. 10 is a schematic diagram of a first embodiment of the decoding apparatus. Building on the decoding apparatus of Fig. 8, it adds an inverse frequency domain linear prediction and vector quantization module 65, located between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63. The bitstream demultiplexing module 60 feeds it the inverse frequency domain linear prediction vector quantization control information, which it uses to apply inverse quantization and inverse linear prediction filtering to the inverse-quantized spectrum (the residual spectrum), obtaining the pre-prediction spectrum, which is output to the multi-resolution synthesis module 63.

In the encoder, frequency domain linear prediction vector quantization is used to suppress pre-echo and obtain a larger coding gain. Accordingly, in the decoder, the inverse-quantized spectrum and the inverse frequency domain linear prediction vector quantization control information output by the bitstream demultiplexing module 60 are fed into the inverse frequency domain linear prediction and vector quantization module 65 to recover the spectrum as it was before linear prediction.

The inverse frequency domain linear prediction and vector quantization module 65 comprises an inverse vector quantizer, an inverse converter and an inverse linear prediction filter. The inverse vector quantizer inversely quantizes the codeword indices to obtain the line spectral frequency (LSF) coefficients; the inverse converter converts the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter uses the prediction coefficients to inverse-filter the inverse-quantized spectrum, obtaining the pre-prediction spectrum, which is output to the multi-resolution synthesis module 63.

The decoding method based on the apparatus of Fig. 10 is essentially the same as that based on the apparatus of Fig. 8, except that the following steps are added. After the inverse-quantized spectrum is obtained, it is checked whether the control information indicates that the inverse-quantized spectrum must undergo inverse frequency domain linear prediction vector quantization. If so, inverse vector quantization is performed to obtain the prediction coefficients; linear prediction synthesis is applied to the inverse-quantized spectrum according to the prediction coefficients to obtain the pre-prediction spectrum; and the pre-prediction spectrum is then passed to multi-resolution synthesis.

After the inverse-quantized spectrum is obtained, the control information is used to determine whether the frame underwent frequency domain linear prediction vector quantization. If it did, the codeword indices of the vector-quantized prediction coefficients are extracted from the control information; the quantized line spectral frequency (LSF) coefficients are recovered from the codeword indices and used to compute the prediction coefficients; and linear prediction synthesis is then applied to the inverse-quantized spectrum to obtain the pre-prediction spectrum. The transfer function used in the synthesis is 1/A(z), with

    A(z) = 1 - sum_{i=1}^{p} a_i * z^{-i}

where a_i are the prediction coefficients and p is the prediction order. The residual sequence E(k) and the pre-prediction spectrum X(k) therefore satisfy:

    X(k) = E(k) + sum_{i=1}^{p} a_i * X(k - i).

In this way, the residual sequence and the computed prediction coefficients a_i pass through frequency domain linear prediction synthesis to yield the pre-prediction spectrum X(k), which is then subjected to the frequency-time mapping process.
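The synthesis relation X(k) = E(k) + sum_i a_i X(k-i) is the recursive inverse of the error filter; a minimal sketch (again taking values outside the block as zero) shows it restoring the geometric sequence from its one-sample residual:

```python
def lp_synthesis(e, a):
    """X(k) = E(k) + sum_{i=1..p} a[i-1] * X(k - i), run recursively on
    the residual; X = 0 outside the block."""
    x = []
    p = len(a)
    for k in range(len(e)):
        acc = e[k]
        for i in range(p):
            if k - 1 - i >= 0:
                acc += a[i] * x[k - 1 - i]
        x.append(acc)
    return x

print(lp_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```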

If the control information indicates that the signal frame did not undergo frequency domain linear prediction vector quantization, the inverse frequency domain linear prediction vector quantization processing is skipped, and the inverse-quantized spectrum is passed directly to the frequency-time mapping process.

Fig. 11 is a schematic diagram of a second embodiment of the encoding apparatus of the present invention. Building on Fig. 5, this embodiment adds a sum/difference (M/S) stereo coding module 57 between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54. For multichannel signals, the psychoacoustic analysis module 51 computes not only the masking thresholds of the individual audio channels but also those of the sum and difference channels, and outputs them to the quantization and entropy coding module 54. The M/S stereo coding module 57 may alternatively be located between the quantizer group and the coder inside the quantization and entropy coding module 54.

The M/S stereo coding module 57 exploits the correlation between the two channels of a channel pair, re-expressing the frequency domain coefficients or residual sequences of the left and right channels as those of the sum and difference channels, thereby reducing the bit rate and improving coding efficiency. It is therefore applicable only to multichannel signals whose channels have the same signal type. For a mono signal, or a multichannel signal whose channels have different signal types, no M/S stereo coding is performed.

The encoding method based on the apparatus of Fig. 11 is essentially the same as that based on the apparatus of Fig. 5, except that the following steps are added. Before the frequency domain coefficients are quantized and entropy-coded, it is checked whether the audio signal is multichannel. If it is, it is checked whether the left and right channel signals have the same signal type; if they do, it is checked for each pair of corresponding scale factor bands whether the M/S stereo coding condition is satisfied. If it is, M/S stereo coding is applied to obtain the frequency domain coefficients of the sum and difference channels; if not, no M/S stereo coding is performed. For a mono signal or a multichannel signal with differing signal types, the frequency domain coefficients are left unprocessed.

Besides being applied before quantization, M/S stereo coding may also be applied after quantization and before entropy coding. In that case, after the frequency domain coefficients have been quantized, it is checked whether the audio signal is multichannel; if so, whether the left and right channel signals have the same signal type; and if so, whether each pair of corresponding scale factor bands satisfies the M/S stereo coding condition, applying M/S stereo coding where it does. If the condition is not satisfied, no M/S coding is performed; for a mono signal or a multichannel signal with differing signal types, the quantized frequency domain coefficients are not M/S coded.

There are many ways to decide whether a scale factor band may be M/S coded; the method adopted in the present invention uses the K-L (Karhunen-Loeve) transform. The decision proceeds as follows. Let the spectral coefficients of a left channel scale factor band be l(k) and those of the corresponding right channel band be r(k). Their correlation matrix is

    C = | C_ll  C_lr |
        | C_rl  C_rr |

with C_ll = (1/K) * sum_k l(k)*l(k), C_lr = C_rl = (1/K) * sum_k l(k)*r(k), and C_rr = (1/K) * sum_k r(k)*r(k), where K is the number of spectral lines in the scale factor band. Applying the K-L transform to the correlation matrix C gives

    R C R^T = | lambda_0    0     |      with  R = | cos(a)  -sin(a) |
              |    0     lambda_1 |                | sin(a)   cos(a) |

where the rotation angle a satisfies tan(2a) = 2*C_lr / (C_ll - C_rr). A rotation angle of pi/4 corresponds exactly to the M/S stereo coding mode. Therefore, when the absolute value of the rotation angle a deviates only slightly from pi/4, for example 3*pi/16 < |a| < 5*pi/16, the corresponding scale factor band may be M/S coded.
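The band decision above can be sketched in a few lines. The function name is ours, and `atan2` is used (an implementation choice not spelled out in the text) so the angle stays well-defined when C_ll is close to C_rr:

```python
import math

def ms_band_decision(l, r, lo=3 * math.pi / 16, hi=5 * math.pi / 16):
    """Correlation matrix of one scale factor band, rotation angle from
    tan(2a) = 2*C_lr / (C_ll - C_rr), and the 3pi/16 < |a| < 5pi/16 test."""
    K = len(l)
    c_ll = sum(v * v for v in l) / K
    c_rr = sum(v * v for v in r) / K
    c_lr = sum(x * y for x, y in zip(l, r)) / K
    alpha = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    return lo < abs(alpha) < hi

# Nearly identical channels rotate by about pi/4 -> M/S pays off:
print(ms_band_decision([1.0, 2.0, 3.0], [1.0, 2.1, 2.9]))  # True
# Uncorrelated channels give alpha ~ 0 -> keep L/R:
print(ms_band_decision([1.0, 0.0, -1.0], [0.0, 1.0, 0.0]))  # False
```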

If M/S stereo coding is applied before quantization, the frequency domain coefficients of the left and right channels in the scale factor band are replaced, through a linear transform, by the frequency domain coefficients of the sum and difference channels:

    M = (L + R) / 2,    S = (L - R) / 2

where M denotes the sum channel frequency domain coefficients; S the difference channel frequency domain coefficients; L the left channel frequency domain coefficients; and R the right channel frequency domain coefficients.
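Applied per spectral line of a band, the transform above can be sketched as:

```python
def ms_encode(l_coeffs, r_coeffs):
    """M = (L + R)/2 and S = (L - R)/2 for each spectral line."""
    m = [(l + r) / 2.0 for l, r in zip(l_coeffs, r_coeffs)]
    s = [(l - r) / 2.0 for l, r in zip(l_coeffs, r_coeffs)]
    return m, s

m, s = ms_encode([4.0, 2.0], [2.0, -2.0])
print(m, s)  # [3.0, 0.0] [1.0, 2.0]
```

For strongly correlated channels the S coefficients are small, which is where the bit-rate saving comes from.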

If sum/difference stereo coding is applied after quantization, the quantized frequency domain coefficients of the left and right channels in the scale factor band are replaced, through the same linear transform, by the quantized frequency domain coefficients of the sum and difference channels:

    m = (l + r) / 2,    s = (l - r) / 2

where m denotes the quantized sum channel frequency domain coefficient, s denotes the quantized difference channel frequency domain coefficient, l denotes the quantized left channel frequency domain coefficient, and r denotes the quantized right channel frequency domain coefficient.

Placing sum/difference stereo coding after quantization not only effectively removes the correlation between the left and right channels, but also, because it operates on the already quantized values, makes lossless coding achievable.
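The forward transform M = (L + R)/2, S = (L - R)/2 and its inverse L = M + S, R = M - S form the conventional normalization pair; the sketch below (hypothetical helper names, a minimal float implementation) verifies that the pair round-trips exactly:

```python
def ms_encode(left, right):
    """Replace left/right coefficients by sum/difference coefficients:
    M = (L + R) / 2, S = (L - R) / 2."""
    m = [(l + r) / 2.0 for l, r in zip(left, right)]
    s = [(l - r) / 2.0 for l, r in zip(left, right)]
    return m, s

def ms_decode(m, s):
    """Inverse transform applied on the decoding side:
    L = M + S, R = M - S."""
    left = [mi + si for mi, si in zip(m, s)]
    right = [mi - si for mi, si in zip(m, s)]
    return left, right

# Round trip: the two transforms are exact inverses of each other.
m, s = ms_encode([0.4, -1.2, 3.0], [0.1, 0.7, -2.5])
left, right = ms_decode(m, s)
```

Note that on quantized integer values the division by 2 needs care (e.g. a rounding convention shared by encoder and decoder) for the scheme to remain lossless; the float version above is only meant to show the algebra.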

Figure 12 is a schematic diagram of a second embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 8, this decoding apparatus adds a sum/difference stereo decoding module 66, located between the output of the inverse quantizer bank 62 and the input of the multi-resolution synthesis module 63. It receives the signal type analysis result and the sum/difference stereo control signal output by the bitstream demultiplexing module 60, and is used to convert, according to this control information, the inverse quantized spectra of the sum and difference channels into the inverse quantized spectra of the left and right channels.

In the sum/difference stereo control signal, one flag bit indicates whether the current channel pair requires sum/difference stereo decoding; if it does, each scale factor band additionally carries a flag bit indicating whether the corresponding band requires sum/difference stereo decoding. Based on these per-band flag bits, the sum/difference stereo decoding module 66 determines whether the inverse quantized spectra in certain scale factor bands need sum/difference stereo decoding. If sum/difference stereo encoding was performed in the encoding apparatus, the decoding apparatus must apply sum/difference stereo decoding to the inverse quantized spectra.

The sum/difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum/difference stereo control signal and the signal type analysis result output by the bitstream demultiplexing module 60.

The decoding method based on the decoding apparatus shown in Figure 12 is basically the same as the decoding method based on the decoding apparatus shown in Figure 8, the difference being the following added steps: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the inverse quantized spectrum is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the inverse quantized spectra of the sum and difference channels in that band are converted into the inverse quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the inverse quantized spectrum is passed to subsequent processing without modification.

Sum/difference stereo decoding may also be performed after entropy decoding and before inverse quantization, namely: after the quantized spectral values are obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the quantized spectral values is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the quantized spectral values are passed to subsequent processing without modification.

If sum/difference stereo decoding is performed after entropy decoding and before inverse quantization, the quantized frequency domain coefficients of the left and right channels in the scale factor band are obtained from the quantized frequency domain coefficients of the sum and difference channels by the following operations:

    l = m + s,    r = m - s

where m denotes the quantized sum channel frequency domain coefficient, s denotes the quantized difference channel frequency domain coefficient, l denotes the quantized left channel frequency domain coefficient, and r denotes the quantized right channel frequency domain coefficient.

If sum/difference stereo decoding is performed after inverse quantization, the inverse quantized frequency domain coefficients of the left and right channels in the subband are obtained from the frequency domain coefficients of the sum and difference channels according to the following matrix operation:

    L = M + S,    R = M - S

where M denotes the sum channel frequency domain coefficient, S denotes the difference channel frequency domain coefficient, L denotes the left channel frequency domain coefficient, and R denotes the right channel frequency domain coefficient.

Figure 13 is a schematic structural diagram of a third embodiment of the encoding apparatus of the present invention. On the basis of Figure 9, this embodiment adds a sum/difference stereo encoding module 57, located between the output of the frequency domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54; the psychoacoustic analysis module 51 outputs the masking thresholds of the sum and difference channels to the quantization and entropy encoding module 54.

The sum/difference stereo encoding module 57 may also be located between the quantizer bank and the entropy coder within the quantization and entropy encoding module 54, receiving the signal type analysis result output by the psychoacoustic analysis module 51.

In this embodiment, the function and working principle of the sum/difference stereo encoding module 57 are the same as in Figure 11 and are not repeated here.

The encoding method based on the encoding apparatus shown in Figure 13 is basically the same as the encoding method based on the encoding apparatus shown in Figure 9, the difference being the following added steps: before the frequency domain coefficients are quantized and entropy coded, it is judged whether the audio signal is a multichannel signal. If it is, it is judged whether the signal types of the left and right channel signals are consistent; if they are, it is judged whether a scale factor band satisfies the coding condition, and if it does, sum/difference stereo coding is applied to that band; if not, no sum/difference stereo coding is performed. For a mono signal, or a multichannel signal whose channel signal types are inconsistent, no sum/difference stereo coding is performed.

Besides being applied before quantization, sum/difference stereo coding may also be applied after quantization and before entropy coding, namely: after the frequency domain coefficients are quantized, it is judged whether the audio signal is a multichannel signal. If it is, it is judged whether the signal types of the left and right channel signals are consistent; if they are, it is judged whether a scale factor band satisfies the coding condition, and if it does, sum/difference stereo coding is applied to that band; if not, no sum/difference stereo coding is performed. For a mono signal, or a multichannel signal whose channel signal types are inconsistent, no sum/difference stereo coding is performed.

Figure 14 is a structural diagram of a third embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 10, this decoding apparatus adds a sum/difference stereo decoding module 66, located between the output of the inverse quantizer bank 62 and the input of the inverse frequency domain linear prediction and vector quantization module 65; the bitstream demultiplexing module 60 outputs the sum/difference stereo control signal to it.

The sum/difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum/difference stereo control signal output by the bitstream demultiplexing module 60.

In this embodiment, the function and working principle of the sum/difference stereo decoding module 66 are the same as in Figure 10 and are not repeated here.

The decoding method based on the decoding apparatus shown in Figure 14 is basically the same as the decoding method based on the decoding apparatus shown in Figure 10, the difference being the following added steps: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the inverse quantized spectrum is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the inverse quantized spectra of the sum and difference channels in that band are converted into the inverse quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the inverse quantized spectrum is passed to subsequent processing without modification.

Sum/difference stereo decoding may also be performed before inverse quantization, namely: after the quantized spectral values are obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the quantized spectral values is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the quantized spectral values are passed to subsequent processing without modification.

Figure 15 is a schematic diagram of a fourth embodiment of the encoding apparatus of the present invention. On the basis of the encoding apparatus shown in Figure 5, this embodiment adds a resampling module 590 and a band extension module 591. The resampling module 590 resamples the input audio signal to change its sampling rate, and outputs the resampled audio signal to the signal property analysis module 50. The band extension module 591 analyzes the input audio signal over the entire frequency band, extracts the spectral envelope of the high frequency portion and the characteristics relating it to the low frequency portion, and outputs them to the bitstream multiplexing module 55.

The resampling module 590 is used to resample the input audio signal; resampling includes both upsampling and downsampling, and downsampling is described below as an example. In this embodiment, the resampling module 590 includes a low-pass filter and a downsampler, where the low-pass filter limits the frequency band of the audio signal to eliminate the aliasing that downsampling could otherwise cause. The input audio signal is low-pass filtered and then downsampled. Suppose the input audio signal is s(n) and the output after filtering by a low-pass filter with impulse response h(n) is v(n); then

    v(n) = Σ_m h(m) · s(n - m)

Downsampling v(n) by a factor of M yields the sequence x(n) = v(M·n). In this way, the sampling rate of the resampled audio signal x(n) is M times lower than the sampling rate of the originally input audio signal.
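The filter-then-decimate step can be sketched as follows (a direct-form FIR convolution; the short example filter and helper names are assumptions, and a real codec would use a properly designed anti-aliasing filter):

```python
def lowpass_fir(x, h):
    """FIR filtering v(n) = sum_m h(m) * x(n - m): band-limits the
    signal before decimation to avoid aliasing."""
    v = []
    for n in range(len(x)):
        acc = 0.0
        for m, hm in enumerate(h):
            if 0 <= n - m < len(x):
                acc += hm * x[n - m]
        v.append(acc)
    return v

def downsample(v, factor):
    """Keep every factor-th sample: x(n) = v(M * n)."""
    return v[::factor]
```

For a downsampling factor M, the output sequence has 1/M of the original sampling rate, matching the description above.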

After the original input audio signal is input to the band extension module 591, it is analyzed over the entire frequency band, and the spectral envelope of the high frequency portion and the characteristics relating it to the low frequency portion are extracted and output to the bitstream multiplexing module 55 as band extension control information.

The basic principle of band extension is that, for most audio signals, the characteristics of the high frequency portion are strongly correlated with those of the low frequency portion, so the high frequency portion of the audio signal can be effectively reconstructed from the low frequency portion; the high frequency portion therefore need not be transmitted. To ensure that the high frequency portion can be reconstructed correctly, it suffices to transmit only a small amount of band extension control signals in the compressed audio bitstream.

The band extension module 591 includes a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; then, in the spectral envelope extraction module, the spectral envelope of the high frequency portion of the signal is estimated at a certain time-frequency resolution. To ensure that the time-frequency resolution best matches the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be chosen freely. The parameters of the input signal's spectral characteristics and the spectral envelope of the high frequency portion are output, as the band extension control signals, to the bitstream multiplexing module 55 for multiplexing.
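A minimal sketch of envelope estimation on a coarse time-frequency grid, assuming the signal has already been mapped to subband samples tf_grid[t][k] (all names and the averaging rule are illustrative assumptions):

```python
def spectral_envelope(tf_grid, band_edges, slots_per_env):
    """Estimate a spectral envelope on a coarse time-frequency grid:
    the average energy of the coefficients inside each
    (envelope-band, time-region) cell.

    tf_grid: tf_grid[t][k] = coefficient at time slot t, frequency line k
    band_edges: list of (k_start, k_stop) pairs defining envelope bands
    slots_per_env: number of time slots averaged into one envelope value
    """
    env = []
    for t0 in range(0, len(tf_grid), slots_per_env):
        region = tf_grid[t0:t0 + slots_per_env]
        row = []
        for k0, k1 in band_edges:
            cells = [frame[k] ** 2 for frame in region for k in range(k0, k1)]
            row.append(sum(cells) / len(cells))
        env.append(row)
    return env
```

Choosing wider or narrower cells corresponds to the freely selectable time-frequency resolution mentioned above.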

The bitstream multiplexing module 55 receives the code stream output by the quantization and entropy coding module 54, comprising the common scale factor, the scale factor coded values, the codebook number coded values and the losslessly coded quantized spectrum, or the coded values of codeword indices, together with the band extension control signals output by the band extension module 591, and multiplexes them to obtain the compressed audio data stream.

The encoding method based on the encoding apparatus shown in Figure 15 specifically includes: analyzing the input audio signal over the entire frequency band and extracting its high-frequency spectral envelope and signal spectral characteristic parameters as the band extension control signals; resampling the input audio signal and performing signal type analysis; calculating the signal-to-mask ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency domain coefficients of the audio signal; quantizing and entropy coding the frequency domain coefficients; and multiplexing the band extension control signals with the coded audio stream to obtain the compressed audio bitstream. The resampling process includes two steps: limiting the frequency band of the audio signal, and downsampling the band-limited audio signal by an integer factor.
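The quantization step just mentioned can be sketched for the scalar quantization variant as follows (the 3/4 power law and the scale-factor-to-step mapping are assumptions modeled on common transform codecs, not necessarily the exact mapping of the invention):

```python
def quantize_spectrum(coefs, scale_factor):
    """Scalar quantization sketch: nonlinearly compress each coefficient
    (|x|^0.75 power law, an assumption), scale by the band's scale
    factor, and round to an integer quantized spectrum."""
    step = 2.0 ** (-scale_factor / 4.0)  # assumed scale-factor-to-step mapping
    q = []
    for x in coefs:
        compressed = abs(x) ** 0.75
        sign = 1 if x >= 0 else -1
        q.append(sign * int(compressed * step + 0.5))
    return q

def differential_scale_factors(scale_factors):
    """The first scale factor of the frame serves as the common scale
    factor; each remaining one is coded as a difference to its
    predecessor before entropy coding."""
    common = scale_factors[0]
    diffs = [b - a for a, b in zip(scale_factors, scale_factors[1:])]
    return common, diffs
```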

Figure 16 is a schematic structural diagram of a fourth embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 8, this embodiment adds a band extension module 68, which receives the band extension control information output by the bitstream demultiplexing module 60 and the low-band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion through spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.

The decoding method based on the decoding apparatus shown in Figure 16 is basically the same as the decoding method based on the decoding apparatus shown in Figure 8, the difference being the following added step: after the time domain audio signal is obtained, the high frequency portion of the audio signal is reconstructed from the band extension control information and the time domain audio signal, yielding a wideband audio signal.
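A highly simplified sketch of the high-frequency reconstruction idea: copy the decoded low-band spectrum upward and rescale each copied line so its energy matches the transmitted envelope. One envelope value per copied line is assumed here; a real band extension scheme groups lines into envelope bands and applies further adjustments:

```python
def reconstruct_high_band(low_band, envelope):
    """Rebuild the missing high band by transposing the decoded low-band
    spectrum upward and rescaling each copied line so that its energy
    matches the transmitted envelope value."""
    high = []
    for k, coef in enumerate(low_band):
        target = envelope[k]                # desired energy for this line
        source_energy = coef * coef or 1.0  # avoid division by zero
        high.append(coef * (target / source_energy) ** 0.5)
    return low_band + high
```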

Figures 17, 19 and 21 show the fifth to seventh embodiments of the encoding apparatus, which add a resampling module 590 and a band extension module 591 to the encoding apparatuses shown in Figures 11, 9 and 13, respectively. The connections, functions and principles of these two modules are the same as in Figure 15 and are not repeated here.

Figures 18, 20 and 22 show the fifth to seventh embodiments of the decoding apparatus, which add a band extension module 68 to the decoding apparatuses shown in Figures 12, 10 and 14, respectively. The band extension module 68 receives the band extension control information output by the bitstream demultiplexing module 60 and the low-band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion through spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.

The seven embodiments of the encoding apparatus described above may further include a gain control module, which receives the audio signal output by the signal property analysis module 50, controls the dynamic range of fast-changing type signals, and eliminates pre-echo in the audio. Its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and it also outputs the gain adjustment amount to the bitstream multiplexing module 55.

According to the signal type of the audio signal, the gain control module acts only on fast-changing type signals; slowly changing type signals are output directly without processing. For a fast-changing type signal, the gain control module adjusts the time-domain energy envelope of the signal, raising the gain of the signal before the fast-change point so that the time-domain signal amplitudes before and after the point become close; the time domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, while the gain adjustment amount is output to the bitstream multiplexing module 55.
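A simplified sketch of this encoder-side adjustment (the attack-point detection is assumed to have happened already, and the exact gain rule is an illustrative assumption; the decoder divides the same region by the transmitted gain to restore the envelope):

```python
def gain_control(frame, attack_index, target_ratio=0.5):
    """Raise the gain of the samples before a detected fast-change
    (attack) point so the time-domain envelope before and after the
    point becomes similar; return the adjusted frame and the applied
    gain, which is transmitted as side information."""
    before = frame[:attack_index]
    after = frame[attack_index:]
    peak_before = max((abs(x) for x in before), default=0.0)
    peak_after = max((abs(x) for x in after), default=0.0)
    if peak_before == 0.0 or peak_after <= peak_before:
        return frame, 1.0  # no attack: leave the frame untouched
    gain = target_ratio * peak_after / peak_before
    adjusted = [x * gain for x in before] + list(after)
    return adjusted, gain
```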

The encoding method based on this encoding apparatus is basically the same as the encoding methods based on the encoding apparatuses described above, the difference being the following added step: gain control is applied to the signal after signal type analysis.

The seven embodiments of the decoding apparatus described above may further include an inverse gain control module, located after the output of the frequency-time mapping module 64, which receives the signal type analysis result and the gain adjustment amount information output by the bitstream demultiplexing module 60 and is used to adjust the gain of the time domain signal and control pre-echo. After receiving the reconstructed time domain signal output by the frequency-time mapping module 64, the inverse gain control module acts on fast-changing type signals and leaves slowly changing type signals unprocessed. For a fast-changing type signal, the inverse gain control module adjusts the energy envelope of the reconstructed time domain signal according to the gain adjustment amount information, reducing the amplitude of the signal before the fast-change point and restoring the energy envelope to its original low-before, high-after shape. In this way the amplitude of the quantization noise before the fast-change point is reduced together with the signal amplitude, thereby controlling the pre-echo.

The decoding method based on this decoding apparatus is basically the same as the decoding methods based on the decoding apparatuses described above, the difference being the following added step: inverse gain control is applied to the reconstructed time domain signal.

Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention, and all such modifications shall fall within the scope of the claims of the present invention.

Claims

权利要求 Rights request 1、 一种增强音频编码装置, 包括心理声学分析模块、 时频映射模块、 量化和熵编 码模块以及比特流复用模块,其特征在于,还包括信号性质分析模块和多分辨專分析模块; 其中所述信号性质分析模块,用于对输入音频信号进行类型分析,并输出到声斤述心理声学 分析模块和所述时频映射模块,同时将音频信号的类型分析结果输出到所迷 匕特流复用模 块; An enhanced audio coding apparatus, comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, and a bitstream multiplexing module, further comprising a signal property analysis module and a multi-resolution analysis module; The signal property analysis module is configured to perform type analysis on the input audio signal, and output the signal to the psychoacoustic analysis module and the time-frequency mapping module, and output the type analysis result of the audio signal to the lost stream. Multiplexing module 所述心理声学分析模块, 用于计算音频信号的掩蔽阔值和信掩比, 并输出到所述量化 和熵编码模块;  The psychoacoustic analysis module is configured to calculate a masking threshold and a signal mask ratio of the audio signal, and output to the quantization and entropy encoding module; 所述时频映射模块,用于将时域音频信号转变成频域系数,并输出到多: ^辨率分析模 块;  The time-frequency mapping module is configured to convert a time domain audio signal into a frequency domain coefficient, and output the data to: a resolution analysis module; 所述多分辨率分析模块, 用于根据所述信号性质分析模块输出的信号类 分析结果, 对快变类型信号的频域系数进行多分辨率分析, 并输出到量化和熵编码模块;  The multi-resolution analysis module is configured to perform multi-resolution analysis on the frequency domain coefficients of the fast-changing type signal according to the signal class analysis result output by the signal property analysis module, and output the same to the quantization and entropy coding module; 所述量化和熵编码模块, 在所述心理声学分析模块输出的信掩比的控制下, 用于对频 域系数进行量化和熵编码, 并输出到所述比特流复用模块;  The quantization and entropy coding module is configured to quantize and entropy encode frequency domain coefficients under control of a mask ratio output by the psychoacoustic analysis module, and output the same to the bit stream multiplexing module; 所述比特流复用模块用于将接收到的数据进行复用, 形成音频编码码 u。  The bit stream 
multiplexing module is configured to multiplex the received data to form an audio code u. 2、 根据权利要求 1 所述的增强音频编码装置, 其特征在于, 所述多 辨率分析模 块包括频域系数变换模块和重组模块, 其中所述频域系数变换模块用于将频域系数变换为 时频平面系数; 所述重组模块用于将时频平面系数按照一定的规则进行重组; 其中所述频 域系数变换模块是频域小波变换滤波器组或频域 MDCT变换滤波器组。  2. The enhanced audio encoding apparatus according to claim 1, wherein the multi-resolution analysis module comprises a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform a frequency domain coefficient The time-frequency plane coefficient is used; the recombination module is configured to recombine the time-frequency plane coefficients according to a certain rule; wherein the frequency domain coefficient transform module is a frequency domain wavelet transform filter bank or a frequency domain MDCT transform filter bank. 3、 根据权利要求 1 所述的增强音频编码装置, 其特征在于, 还包括频域线性预测 及矢量量化模块, 位于所述多分辨率分析模块的输出与所述量化和熵编码 莫块的输入之 间; 所述频域线性预测及矢量量化模块具体由线性预测分析器、 线性预测滤波器、 转换器 和矢量量化器构成;  3. 
The enhanced audio encoding apparatus according to claim 1, further comprising a frequency domain linear prediction and vector quantization module, located at an output of said multiresolution analysis module and said input of said quantized and entropy encoded blocks The frequency domain linear prediction and vector quantization module is specifically composed of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer; 所述线性预测分析器, 用于对频域系数进行预测分析, 得到预测增益和 页测系数, 并 将满足一定奈件的频域系数输出到所述线性预测滤波器; 对于不满足条件的频域系数直接 输出到所述量化和熵编码模块;  The linear predictive analyzer is configured to perform predictive analysis on frequency domain coefficients, obtain prediction gain and page measurement coefficients, and output frequency domain coefficients satisfying a certain condition to the linear prediction filter; The domain coefficients are directly output to the quantization and entropy coding module; 所述线性预测滤波器, 用于对频域系数进行滤波, 得到残差序列, 并脊残差序列输出 到所述量化和熵编码模块, 将预测系数输出到转换器;  The linear prediction filter is configured to filter the frequency domain coefficients to obtain a residual sequence, and the ridge residual sequence is output to the quantization and entropy coding module, and the prediction coefficients are output to the converter; 所述转换器, 用于将预测系数转换成线谱对频率系数; 所述矢量量化器, 用于对线錯对频率系数进行多级矢量量化, 量化得到的有关边信息 被传送到所述比特流复用模块。 The converter is configured to convert a prediction coefficient into a line spectrum pair frequency coefficient; The vector quantizer is configured to perform multi-level vector quantization on the line error pair frequency coefficient, and the quantized related side information is transmitted to the bit stream multiplexing module. 
4. The enhanced audio encoding apparatus according to any one of claims 1 to 3, characterized by further comprising a sum/difference stereo encoding module, located between the output of the frequency domain linear prediction and vector quantization module and the input of the quantization and entropy coding module, or between the quantizer bank and the coder within the quantization and entropy coding module; the signal property analysis module outputs the signal type analysis result to it; and the sum/difference stereo encoding module is configured to convert the residual sequences/frequency domain coefficients of the left and right channels into the residual sequences/frequency domain coefficients of the sum and difference channels.

5. The enhanced audio encoding apparatus according to any one of claims 1 to 4, characterized by further comprising a resampling module and a band extension module, wherein:

the resampling module is configured to resample the input audio signal, change the sampling rate of the audio signal, and output the audio signal with the changed sampling rate to the psychoacoustic analysis module and the signal property analysis module; it specifically comprises a low-pass filter and a downsampler, wherein the low-pass filter is configured to limit the frequency band of the audio signal, and the downsampler is configured to downsample the band-limited audio signal to reduce its sampling rate;

the band extension module is configured to analyze the input audio signal over the entire frequency band, extract the spectral envelope of the high frequency portion and the parameters characterizing the correlation between the low and high frequency spectra, and output them to the bitstream multiplexing module; it specifically comprises a parameter extraction module and a spectral envelope extraction module, wherein the parameter extraction module is configured to extract parameters representing the spectral characteristics of the input signal in different time-frequency regions, and the spectral envelope extraction module is configured to estimate the spectral envelope of the high frequency portion of the signal at a certain time-frequency resolution and then output the parameters of the input signal's spectral characteristics and the spectral envelope of the high frequency portion to the bitstream multiplexing module.

6. An enhanced audio encoding method, characterized by comprising the following steps:

step 1: performing type analysis on the input audio signal, the signal type analysis result forming part of the multiplexed information;

step 2: performing time-frequency mapping on the type-analyzed signal to obtain the frequency domain coefficients of the audio signal, and meanwhile calculating the signal-to-mask ratio of the audio signal;

step 3: if the signal is of the fast-changing type, performing multi-resolution analysis on the frequency domain coefficients; otherwise, proceeding to step 4;

step 4: quantizing and entropy coding the frequency domain coefficients under the control of the signal-to-mask ratio;

step 5: multiplexing the coded audio signal to obtain the compressed audio bitstream.
7. The enhanced audio encoding method according to claim 6, characterized in that the quantization in Step 4 is scalar quantization, which specifically comprises: nonlinearly compressing the frequency domain coefficients in all scale-factor bands; quantizing the frequency domain coefficients of each sub-band with the scale factor of that sub-band to obtain a quantized spectrum represented by integers; selecting the first scale factor of each frame of the signal as the common scale factor; and differentially coding every other scale factor against its preceding scale factor; the entropy coding comprises: entropy-coding the quantized spectrum and the differentially processed scale factors to obtain the codebook indices, the coded scale-factor values and the lossless coded values of the quantized spectrum; and entropy-coding the codebook indices to obtain the coded codebook index values.

8. The enhanced audio encoding method according to claim 6, characterized in that the multi-resolution analysis of Step 3 comprises: performing an MDCT transform on the frequency domain coefficients to obtain time-frequency plane coefficients; and regrouping the time-frequency plane coefficients according to a given rule; wherein the regrouping method comprises: first organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients within each frequency band in the time direction, and then arranging the organized coefficients in the order of sub-windows and scale-factor bands.
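Claim 7's scalar quantization can be sketched in a few lines. The 3/4-power companding and the 2^(-sf/4) step-size rule below are assumptions borrowed from AAC-style coders, not constants given by the claim; only the overall shape (nonlinear compression, per-band scale factors, differential scale-factor coding against a common first value) comes from the text:

```python
import math

def quantize_band(coeffs, scale_factor):
    """Nonlinearly compress then quantize one scale-factor band (AAC-style sketch).
    The 0.75 exponent and 2**(-sf/4) step size are assumed, not from the claim."""
    step = 2.0 ** (-scale_factor / 4.0)
    return [int(math.copysign(round(abs(x) ** 0.75 * step), x)) for x in coeffs]

def differential_scale_factors(sfs):
    """First scale factor of the frame is the common one; the rest are coded
    as differences against their predecessor, as claim 7 describes."""
    common = sfs[0]
    diffs = [sfs[i] - sfs[i - 1] for i in range(1, len(sfs))]
    return common, diffs
```

The differential values then feed the entropy coder, which in this scheme sees mostly small integers.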
9. The enhanced audio encoding method according to any one of claims 6 to 8, characterized in that, between Step 3 and Step 4, the method further comprises: performing standard linear prediction analysis on the frequency domain coefficients to obtain a prediction gain and prediction coefficients; judging whether the prediction gain exceeds a set threshold; if it does, performing frequency domain linear prediction error filtering on the frequency domain coefficients according to the prediction coefficients to obtain the linear prediction residual sequence of the frequency domain coefficients; converting the prediction coefficients into line spectrum pair frequency coefficients, and performing multi-stage vector quantization on the line spectrum pair frequency coefficients to obtain side information; quantizing and entropy-coding the residual sequence; if the prediction gain does not exceed the set threshold, quantizing and entropy-coding the frequency domain coefficients directly.

10. The enhanced audio encoding method according to any one of claims 6 to 9, characterized in that Step 4 further comprises: quantizing the frequency domain coefficients; judging whether the audio signal is a multi-channel signal; if it is a multi-channel signal, judging whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, judging whether the corresponding scale-factor bands of the two channels satisfy the sum/difference stereo coding condition; if they do, performing sum/difference stereo coding on the spectral coefficients in that scale-factor band to obtain the frequency domain coefficients of the sum and difference channels; if they do not, not performing sum/difference stereo coding on the spectral coefficients in that scale-factor band; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, not processing the frequency domain coefficients; and entropy-coding the frequency domain coefficients; wherein
the method for judging whether a scale-factor band satisfies the coding condition is a K-L transform, specifically: calculating the correlation matrix of the spectral coefficients of the left and right channel scale-factor bands; performing a K-L transform on the correlation matrix; if the absolute value of the rotation angle α deviates only slightly from π/4, e.g. 3π/16 < |α| < 5π/16, the corresponding scale-factor band can be sum/difference stereo coded;
the sum/difference stereo coding is given by the equations shown in image imgf000032_0001, in which the four symbols denote, respectively, the quantized sum channel frequency domain coefficients, the quantized difference channel frequency domain coefficients, the quantized left channel frequency domain coefficients and the quantized right channel frequency domain coefficients.
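The K-L-transform test of claim 10 can be sketched as follows. The closed-form rotation angle α = ½·atan2(2·r_LR, r_LL − r_RR) of the 2×2 correlation matrix and the M = (L+R)/2, S = (L−R)/2 transform are assumptions (the claim's own equations survive only as images); the 3π/16 < |α| < 5π/16 gate is taken from the claim:

```python
import math

def kl_rotation_angle(left, right):
    """Rotation angle of the 2x2 correlation matrix of one scale-factor band
    (closed form assumed; the claim only says 'K-L transform')."""
    r_ll = sum(x * x for x in left)
    r_rr = sum(x * x for x in right)
    r_lr = sum(x * y for x, y in zip(left, right))
    return 0.5 * math.atan2(2.0 * r_lr, r_ll - r_rr)

def ms_allowed(left, right):
    """Sum/difference coding is enabled when |alpha| is close to pi/4."""
    alpha = abs(kl_rotation_angle(left, right))
    return 3 * math.pi / 16 < alpha < 5 * math.pi / 16

def ms_encode(left, right):
    """Conventional sum/difference transform, assumed for the image-only equations."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side
```

Highly correlated bands give α near π/4 and pass the gate; decorrelated bands give α near 0 or π/2 and are left in L/R form.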
11. The enhanced audio encoding method according to any one of claims 6 to 10, characterized in that, before Step 1, the method further comprises a resampling step and a band extension step; the resampling step resamples the input audio signal and changes its sampling rate; the band extension step analyzes the input audio signal over the whole frequency band and extracts its high-frequency spectral envelope and signal spectral characteristic parameters as part of the multiplexed signal.

12. An enhanced audio decoding apparatus, comprising a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank and a frequency-time mapping module, characterized in that it further comprises a multi-resolution synthesis module; the bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signals and control signals to the entropy decoding module and the multi-resolution synthesis module; the entropy decoding module is configured to decode said signals, recover the quantized values of the spectrum, and output them to the inverse quantizer bank; the inverse quantizer bank is configured to reconstruct the inverse-quantized spectrum and output it to the multi-resolution synthesis module; the multi-resolution synthesis module is configured to perform multi-resolution synthesis on the inverse-quantized spectrum and output the result to the frequency-time mapping module; the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients and output a time domain audio signal.

13. The enhanced audio decoding apparatus according to claim 12, characterized in that the multi-resolution synthesis module comprises a coefficient regrouping module and a coefficient transform module; the coefficient transform module is a frequency domain inverse wavelet transform filter bank or a frequency domain inverse modified discrete cosine transform filter bank.

14. The enhanced audio decoding apparatus according to claim 12 or 13, characterized in that it further comprises an inverse frequency domain linear prediction and vector quantization module located between the output of said inverse quantizer bank and the input of said multi-resolution synthesis module; the inverse frequency domain linear prediction and vector quantization module specifically comprises an inverse vector quantizer, an inverse converter and an inverse linear prediction filter; the inverse vector quantizer is configured to inversely quantize the codeword indices to obtain line spectrum pair frequency coefficients; the inverse converter is configured to convert the line spectrum pair frequency coefficients back into prediction coefficients; the inverse linear prediction filter is configured to inversely filter the inverse-quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction.
15. The enhanced audio decoding apparatus according to any one of claims 12 to 14, characterized in that it further comprises a sum/difference stereo decoding module located after the inverse quantizer bank or between the output of the entropy decoding module and the input of the inverse quantizer bank; it receives the sum/difference stereo control signal output by the bitstream demultiplexing module, and is configured to convert the inverse-quantized spectra / quantized spectral values of the sum and difference channels into the inverse-quantized spectra / quantized spectral values of the left and right channels according to the sum/difference stereo control information.

16. An enhanced audio decoding method, characterized by comprising the following steps:
Step 1: demultiplex the compressed audio data stream to obtain data information and control information;
Step 2: entropy-decode said information to obtain the quantized values of the spectrum;
Step 3: inversely quantize the quantized values of the spectrum to obtain the inverse-quantized spectrum;
Step 4: perform multi-resolution synthesis on the inverse-quantized spectrum;
Step 5: perform frequency-time mapping to obtain a time domain audio signal.

17. The enhanced audio decoding method according to claim 16, characterized in that the multi-resolution synthesis of Step 4 specifically comprises: arranging the inverse-quantized spectral coefficients in the order of sub-windows and scale-factor bands, regrouping them in frequency order, and then performing multiple inverse modified discrete cosine transforms on the regrouped coefficients to obtain the inverse-quantized spectrum as it was before multi-resolution analysis.

18. The enhanced audio decoding method according to claim 16, characterized in that Step 5 may further comprise: performing an inverse modified discrete cosine transform to obtain the transformed time domain signal; windowing the transformed time domain signal in the time domain; and overlap-adding the windowed time domain signals to obtain the time domain audio signal; wherein the window function in the windowing process is:
w(k) = cos(π/2 × ((k+0.5)/N − 0.94 × sin(2π/N × (k+0.5)) / (2π))), where k = 0, ..., N−1; w(k) denotes the k-th coefficient of the window function, with w(k) = w(2N−1−k); N denotes the number of samples of an encoded frame.
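The window of claim 18 can be sketched numerically. The formula below is a reading of the heavily garbled original (the 0.94 constant is from the text, but the exact grouping of terms is an assumption); the second half of the 2N-point window follows from the stated mirror property w(k) = w(2N−1−k):

```python
import math

def synthesis_window(n_samples):
    """First half of the 2*N-point window as read from claim 18,
    then mirrored via w(k) = w(2*N - 1 - k)."""
    n = n_samples
    half = [
        math.cos(math.pi / 2 * ((k + 0.5) / n
                 - 0.94 * math.sin(2 * math.pi / n * (k + 0.5)) / (2 * math.pi)))
        for k in range(n)
    ]
    return half + half[::-1]  # enforces the symmetry w(k) == w(2*N - 1 - k)
```

Symmetry of the full window is what makes the overlap-add of consecutive inverse-MDCT frames well defined.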
19. The enhanced audio decoding method according to claim 17 or 18, characterized in that, between Step 3 and Step 4, the method further comprises: judging whether the control information indicates that the inverse-quantized spectrum needs to undergo inverse frequency domain linear prediction and vector quantization; if it does, performing inverse vector quantization to obtain the prediction coefficients, performing linear prediction synthesis on the inverse-quantized spectrum using the prediction coefficients to obtain the spectrum before prediction, and performing frequency-time mapping on the spectrum before prediction; wherein the inverse vector quantization further comprises: obtaining from the control information the codeword indices of the vector-quantized prediction coefficients; then obtaining the quantized line spectrum pair frequency coefficients from the codeword indices, and computing the prediction coefficients from them.
20. The enhanced audio decoding method according to any one of claims 16 to 19, characterized in that, between Step 2 and Step 3, the method further comprises: if the signal type analysis result indicates that the signal types are consistent, judging from the sum/difference stereo control signal whether sum/difference stereo decoding needs to be performed on the inverse-quantized spectrum; if it does, judging from the flag bit of each scale-factor band whether that scale-factor band needs sum/difference stereo decoding, and if it does, converting the inverse-quantized spectra of the sum and difference channels in that scale-factor band into the inverse-quantized spectra of the left and right channels, then going to Step 3; if the signal types are inconsistent or sum/difference stereo decoding is not needed, not processing the inverse-quantized spectrum and going to Step 3;
wherein the sum/difference stereo decoding is given by the equations shown in image imgf000034_0001, in which the four symbols denote, respectively, the quantized sum channel frequency domain coefficients, the quantized difference channel frequency domain coefficients, the quantized left channel frequency domain coefficients and the quantized right channel frequency domain coefficients.
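A sketch of the per-band decoding decision in claim 20. Since the claim's equations are preserved only as images, the inverse transform below assumes the conventional encoder-side M = (L+R)/2, S = (L−R)/2, which inverts to L = M + S, R = M − S:

```python
def ms_decode(mid, side):
    """Recover left/right from sum/difference channels (inverse of the assumed
    encoder-side M = (L+R)/2, S = (L-R)/2)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def decode_band(mid, side, ms_flag):
    """Apply sum/difference decoding only where the band's flag bit says so,
    as claim 20 requires; otherwise pass the spectra through unchanged."""
    return ms_decode(mid, side) if ms_flag else (mid, side)
```

Because the transform is its own scaled inverse, a band encoded with the assumed forward transform round-trips exactly.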
PCT/CN2005/000440 2004-04-01 2005-04-01 Enhanced audio encoding/decoding device and method Ceased WO2005096273A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP05742018A EP1873753A1 (en) 2004-04-01 2005-04-01 Enhanced audio encoding/decoding device and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200410030946.9 2004-04-01
CN200410030946 2004-04-01

Publications (1)

Publication Number Publication Date
WO2005096273A1 true WO2005096273A1 (en) 2005-10-13

Family

ID=35064017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2005/000440 Ceased WO2005096273A1 (en) 2004-04-01 2005-04-01 Enhanced audio encoding/decoding device and method

Country Status (2)

Country Link
EP (1) EP1873753A1 (en)
WO (1) WO2005096273A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006081A1 (en) * 2007-06-27 2009-01-01 Samsung Electronics Co., Ltd. Method, medium and apparatus for encoding and/or decoding signal
CN108962266A (en) * 2014-03-24 2018-12-07 杜比国际公司 To the method and apparatus of high-order clear stereo signal application dynamic range compression
CN112530444A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Audio encoding method and apparatus
US20240283945A1 (en) * 2016-09-30 2024-08-22 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
US20240339119A1 (en) * 2014-05-01 2024-10-10 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US12431148B2 (en) 2014-03-31 2025-09-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding device, decoding device, encoding method, decoding method, and non-transitory computer-readable recording medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101756834B1 (en) 2008-07-14 2017-07-12 삼성전자주식회사 Method and apparatus for encoding and decoding of speech and audio signal
TWI430263B (en) * 2009-10-20 2014-03-11 弗勞恩霍夫爾協會 Audio signal encoder, audio signal decoder, method of encoding or decoding an audio signal using aliasing cancellation
EP2372704A1 (en) * 2010-03-11 2011-10-05 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Signal processor and method for processing a signal
JP5704018B2 (en) * 2011-08-05 2015-04-22 富士通セミコンダクター株式会社 Audio signal encoding method and apparatus
CN110706715B (en) 2012-03-29 2022-05-24 华为技术有限公司 Method and apparatus for encoding and decoding signal
WO2014046916A1 (en) * 2012-09-21 2014-03-27 Dolby Laboratories Licensing Corporation Layered approach to spatial audio coding
WO2020039000A1 (en) 2018-08-21 2020-02-27 Dolby International Ab Coding dense transient events with companding
WO2025227292A1 (en) * 2024-04-28 2025-11-06 北京小米移动软件有限公司 Audio processing method and apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5537510A (en) * 1994-12-30 1996-07-16 Daewoo Electronics Co., Ltd. Adaptive digital audio encoding apparatus and a bit allocation method thereof
US5613035A (en) * 1994-01-18 1997-03-18 Daewoo Electronics Co., Ltd. Apparatus for adaptively encoding input digital audio signals from a plurality of channels
CN1388517A (en) * 2002-06-05 2003-01-01 北京阜国数字技术有限公司 Audio coding/decoding technology based on pseudo wavelet filtering
CN1461112A (en) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 Quantized voice-frequency coding method based on minimized global noise masking ratio criterion and entropy coding

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5613035A (en) * 1994-01-18 1997-03-18 Daewoo Electronics Co., Ltd. Apparatus for adaptively encoding input digital audio signals from a plurality of channels
US5537510A (en) * 1994-12-30 1996-07-16 Daewoo Electronics Co., Ltd. Adaptive digital audio encoding apparatus and a bit allocation method thereof
CN1388517A (en) * 2002-06-05 2003-01-01 北京阜国数字技术有限公司 Audio coding/decoding technology based on pseudo wavelet filtering
CN1461112A (en) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 Quantized voice-frequency coding method based on minimized global noise masking ratio criterion and entropy coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
P. P. VAIDYANATHAN: "Multirate Systems and Filter Banks", 1993, PRENTICE HALL

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006081A1 (en) * 2007-06-27 2009-01-01 Samsung Electronics Co., Ltd. Method, medium and apparatus for encoding and/or decoding signal
US12273696B2 (en) 2014-03-24 2025-04-08 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
CN108962266B (en) * 2014-03-24 2023-08-11 杜比国际公司 Method and apparatus for applying dynamic range compression to high order hi-fi stereo signals
US11838738B2 (en) 2014-03-24 2023-12-05 Dolby Laboratories Licensing Corporation Method and device for applying Dynamic Range Compression to a Higher Order Ambisonics signal
CN108962266A (en) * 2014-03-24 2018-12-07 杜比国际公司 To the method and apparatus of high-order clear stereo signal application dynamic range compression
US12431148B2 (en) 2014-03-31 2025-09-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding device, decoding device, encoding method, decoding method, and non-transitory computer-readable recording medium
US20240339119A1 (en) * 2014-05-01 2024-10-10 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US12431151B2 (en) * 2014-05-01 2025-09-30 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US20240283945A1 (en) * 2016-09-30 2024-08-22 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
US12309395B2 (en) * 2016-09-30 2025-05-20 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
CN112530444A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Audio encoding method and apparatus
CN112530444B (en) * 2019-09-18 2023-10-03 华为技术有限公司 Audio coding method and device
US12057129B2 (en) 2019-09-18 2024-08-06 Huawei Technologies Co., Ltd. Audio coding method and apparatus

Also Published As

Publication number Publication date
EP1873753A1 (en) 2008-01-02

Similar Documents

Publication Publication Date Title
WO2005096274A1 (en) An enhanced audio encoding/decoding device and method
EP1914724B1 (en) Dual-transform coding of audio signals
CN110310659B (en) Apparatus and method for decoding or encoding audio signal using reconstructed band energy information value
JP4081447B2 (en) Apparatus and method for encoding time-discrete audio signal and apparatus and method for decoding encoded audio data
CN101276587B (en) Audio encoding apparatus and method thereof, audio decoding device and method thereof
AU2006332046B2 (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
KR101161866B1 (en) Audio coding apparatus and method thereof
JP5395917B2 (en) Multi-channel digital speech coding apparatus and method
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
US9037454B2 (en) Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
US20100023336A1 (en) Compression of audio scale-factors by two-dimensional transformation
CN102436819B (en) Wireless audio compression and decompression methods, audio coder and audio decoder
EP1612772A1 (en) Low-bitrate encoding/decoding method and system
CN103329197A (en) Improved stereo parametric encoding/decoding for channels in phase opposition
WO2006003891A1 (en) Audio signal decoding device and audio signal encoding device
KR19990041073A (en) Audio encoding / decoding method and device with adjustable bit rate
KR20080035454A (en) Fast Lattice Vector Quantization
CN103366750B (en) A kind of sound codec devices and methods therefor
WO2005096273A1 (en) Enhanced audio encoding/decoding device and method
CN101162584A (en) Method and device for encoding and decoding audio signals using bandwidth extension technology
CN101241701A (en) audio decoding
CN1677492A (en) Intensified audio-frequency coding-decoding device and method
WO2005096508A1 (en) Enhanced audio encoding and decoding equipment, method thereof
CN100555413C (en) Method and device for scalable encoding and decoding of audio data
WO2006056100A1 (en) Coding/decoding method and device utilizing intra-channel signal redundancy

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC - FORM EPO 1205A DATED 21-03-2007

WWE Wipo information: entry into national phase

Ref document number: 2005742018

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2005742018

Country of ref document: EP