
WO2005096273A1 - Enhanced audio encoding/decoding device and method - Google Patents

Enhanced audio encoding/decoding device and method Download PDF

Info

Publication number
WO2005096273A1
WO2005096273A1 (application PCT/CN2005/000440)
Authority
WO
WIPO (PCT)
Prior art keywords
module
signal
frequency
frequency domain
inverse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2005/000440
Other languages
French (fr)
Chinese (zh)
Inventor
Xingde Pan
Dietz Martin
Andreas Ehret
Holger HÖRICH
Xiaoming Zhu
Michael Schug
Weimin Ren
Lei Wang
Hao Deng
Fredrik Henn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING E-WORLD TECHNOLOGY Co Ltd
BEIJING MEDIA WORKS Co Ltd
Coding Technologies Sweden AB
Original Assignee
BEIJING E-WORLD TECHNOLOGY Co Ltd
BEIJING MEDIA WORKS Co Ltd
Coding Technologies Sweden AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING E-WORLD TECHNOLOGY Co Ltd, BEIJING MEDIA WORKS Co Ltd, Coding Technologies Sweden AB filed Critical BEIJING E-WORLD TECHNOLOGY Co Ltd
Priority to EP05742018A priority Critical patent/EP1873753A1/en
Publication of WO2005096273A1 publication Critical patent/WO2005096273A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components

Definitions

  • the present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec device and method based on a perceptual model. Background technique
  • the digital audio signal is audio encoded or audio compressed for storage and transmission.
  • the purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, for example, there is little difference between the originally input audio signal and the encoded output audio signal.
  • the advent of CDs represented the many advantages of digitally representing audio signals, such as high fidelity, large dynamic range, and robustness.
  • these advantages are at the expense of high data rates.
  • a CD-quality stereo signal requires a sampling rate of 44.1 kHz, and each sample value is uniformly quantized with 16 bits, so the uncompressed data rate reaches 1.41 Mb/s.
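The quoted figure follows directly from the CD parameters:

```python
# Uncompressed CD data rate: 44.1 kHz sampling, 16-bit samples, 2 channels.
sample_rate_hz = 44_100
bits_per_sample = 16
channels = 2

data_rate_bps = sample_rate_hz * bits_per_sample * channels
print(data_rate_bps / 1e6)  # ≈ 1.41 Mb/s
```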
  • Such a high data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost.
  • new network and wireless multimedia digital audio systems are required to reduce the rate of data without compromising the quality of the audio.
  • MPEG-1 and MPEG-2 BC are high-quality audio coding techniques aimed primarily at mono and stereo audio signals; multi-channel audio coding, by contrast, demands higher coding quality at lower bit rates.
  • because MPEG-2 BC encoding emphasizes backward compatibility with MPEG-1, it cannot achieve high-quality encoding of five channels at a code rate lower than 540 kbps.
  • MPEG-2 AAC technology was therefore proposed, which can achieve higher-quality encoding of five-channel signals at a rate of 320 kbps.
  • Figure 1 shows a block diagram of an MPEG-2 AAC encoder comprising a gain controller 101, a filter bank 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo (M/S) module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108.
  • the filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal adaptive: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals. For a 48 kHz signal this gives a maximum frequency resolution of 23 Hz and a maximum time resolution of 2.6 ms.
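The quoted resolutions can be reproduced with a quick calculation; the 50 % overlap (hop of half the window length) assumed below is standard MDCT practice but not stated explicitly here:

```python
fs = 48_000  # sampling rate in Hz
# A 2048-point MDCT yields 1024 spectral lines covering 0..fs/2.
freq_resolution_hz = fs / 2048           # ~23.4 Hz per line
# A 256-point MDCT advances 128 samples per block (50 % overlap assumed).
time_resolution_ms = 1000 * 128 / fs     # ~2.67 ms
print(freq_resolution_hz, time_resolution_ms)
```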
  • both a sine window and a Kaiser-Bessel window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window when the spacing of strong spectral components exceeds 220 Hz.
  • after passing through the gain controller 101, the audio signal enters the filter bank 102 and is filtered according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time domain noise shaping module 103.
  • the time domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain and then controls the quantization noise in the time domain according to that analysis, thereby controlling pre-echo.
  • the intensity/coupling module 104 performs stereo encoding of signal strength: for the high frequency band (above about 2 kHz), the sense of direction in hearing depends on changes in signal strength (the signal envelope) rather than on the signal waveform, so a constant-envelope signal does not affect the perceived direction. This property, together with the correlation between channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.
  • the second-order backward adaptive predictor 105 is used to eliminate redundancy of the steady state signal and improve coding efficiency.
  • the sum/difference stereo (M/S) module 106 operates on a channel pair, i.e. two channels carrying the left/right channels or the left/right surround channels of a two-channel or multi-channel signal.
  • the M/S module 106 utilizes the correlation between the two channels of the channel pair to achieve the effect of reducing the code rate and improving the coding efficiency.
  • the bit allocation and quantization coding module 107 is implemented as a nested loop process in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation.
  • the nested loop includes an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the supplied bits are used up, and the outer loop estimates the coding quality using the ratio of the quantization noise to the masking threshold.
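The inner loop can be sketched as below. This is an illustrative simplification, not the normative AAC rate loop: the 3/4-power companding matches AAC practice, but `bits_needed` is a hypothetical stand-in for the entropy coder's actual bit count.

```python
def quantize(spectrum, step):
    # Non-uniform quantizer sketch: AAC-style |x|^(3/4) companding,
    # with the step size acting like a global scale factor.
    return [round((abs(x) / 2 ** (step / 4)) ** 0.75) for x in spectrum]

def bits_needed(qvals):
    # Hypothetical stand-in for the lossless (entropy) coder's bit count.
    return sum(q.bit_length() + 1 for q in qvals)

def inner_loop(spectrum, bit_budget):
    # Inner loop: enlarge the quantizer step size until the budget holds.
    step = 0
    while bits_needed(quantize(spectrum, step)) > bit_budget:
        step += 1
    return step

spectrum = [120.0, 64.0, 30.5, 8.2, 1.1]
step = inner_loop(spectrum, 20)
print(step, quantize(spectrum, step))
```

An outer loop would then compare the per-band quantization noise against the masking threshold and, where a band is too noisy, amplify it and rerun the inner loop.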
  • finally, the encoded signal is multiplexed into an encoded audio stream by the bitstream multiplexing module 108.
  • in the gain control stage, the input signal is split into four equal-bandwidth bands by a quad-band polyphase filter bank (PQF), and each band uses an MDCT to generate 256 spectral coefficients, for a total of 1024.
  • a gain controller 101 is used in each band.
  • the high frequency PQF band can be ignored to obtain a low sampling rate signal.
  • FIG. 2 shows a block diagram of the corresponding MPEG-2 AAC decoder.
  • the decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time domain noise shaping module 208, a filter bank 209, and a gain control module 210.
  • the encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain a corresponding data stream and control stream.
  • after the above signals are decoded by the lossless decoding module 202, an integer representation of the scale factors and the signal spectrum is obtained.
  • the inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function that converts integer quantized values into reconstructed spectral values. Since the scale factor module in the encoder differences each scale factor against the previous one and Huffman-codes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the difference values and then recovers the true scale factors.
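The differential scale-factor scheme round-trips as follows (a minimal sketch; the Huffman coding of the differences is omitted):

```python
def diff_encode(scale_factors):
    # First scale factor is sent absolutely (the "common" scale factor);
    # every later one is sent as a difference from its predecessor.
    return [scale_factors[0]] + [b - a for a, b in zip(scale_factors, scale_factors[1:])]

def diff_decode(encoded):
    # Cumulative sum restores the original scale factors exactly.
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

sfs = [60, 58, 61, 61, 57]
assert diff_decode(diff_encode(sfs)) == sfs  # lossless round trip
```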
  • the M/S module 205 converts the sum and difference channels back into left and right channels under the control of the side information. Since the second-order backward adaptive predictor 105 was used in the encoder to eliminate redundancy of the steady-state signal and improve coding efficiency, predictive decoding is performed by the prediction module 206 in the decoder.
  • the intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the time domain noise shaping module 208 for time domain noise shaping decoding; finally, synthesis filtering is performed by the filter bank 209, which adopts the inverse modified discrete cosine transform (IMDCT).
  • the high frequency PQF band can be ignored by the gain control module 210 to obtain a low sampling rate signal.
  • MPEG-2 AAC codec technology is suitable for medium and high bit rate audio signals, but its coding quality at low bit rates is poor. At the same time, the codec involves many modules, and its high complexity is not conducive to real-time implementation.
  • Figure 3 shows the structure of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter bank 302, a spectral envelope/exponential encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bit stream multiplexing module 307.
  • the audio signal is judged to be steady-state or transient by the transient signal detection module 301, and the time domain data is mapped to frequency domain data by the signal-adaptive MDCT filter bank 302, wherein a 512-point long window is applied to steady-state signals and a pair of short windows to transient signals.
  • the spectral envelope/exponential encoding module 303 encodes the exponential portions of the signals in three modes according to the requirements of the code rate and frequency resolution, namely the D15, D25, and D45 encoding modes.
  • AC-3 technology differentially encodes the spectral envelope along frequency because increments of at most ±2 are needed, each increment representing a 6 dB level change; the first (DC) term is absolutely coded and the remaining exponents are differentially coded.
  • each index requires approximately 2.33 bits, since three differentials are packed into a 7-bit word.
  • the D15 coding mode provides fine frequency resolution by sacrificing temporal resolution.
  • D15 is transmitted only occasionally for steady-state signals, typically once per data frame (every 6 audio blocks). When the signal spectrum is unstable, the spectral estimate must be updated frequently and is encoded with coarser frequency resolution, usually using the D25 and D45 encoding modes.
  • the D25 encoding mode provides a compromise between frequency resolution and time resolution, with differential encoding performed every other frequency coefficient; each index requires approximately 1.15 bits. When the spectrum is stable over 2 to 3 blocks and then changes abruptly, the D25 encoding mode can be used.
  • the D45 coding mode performs one differential encoding every four frequency coefficients, so each index needs about 0.58 bits.
  • the D45 encoding mode provides high temporal resolution and low frequency resolution, so it is generally used in the encoding of transient signals.
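Ignoring header overhead, the per-exponent costs of the three modes follow from packing three differentials into one 7-bit group; such overhead is why quoted figures vary slightly:

```python
bits_per_group = 7   # three exponent differentials packed into 7 bits
diffs_per_group = 3
# Coefficients sharing one differential: D15 -> 1, D25 -> 2, D45 -> 4.
for mode, coeffs_per_diff in (("D15", 1), ("D25", 2), ("D45", 4)):
    bits_per_coeff = bits_per_group / (diffs_per_group * coeffs_per_diff)
    print(mode, round(bits_per_coeff, 2))
# prints D15 2.33, D25 1.17, D45 0.58
```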
  • the forward-backward adaptive sensing model 305 is used to estimate the masking threshold of each frame of the signal.
  • the forward adaptive part is applied only at the encoder. Under the code rate constraint, an optimal set of perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward adaptive part to estimate the masking threshold of each frame.
  • the backward adaptive part is applied at both the encoder and the decoder.
  • the parametric bit allocation module 306 analyzes the spectrum of the audio signal according to the masking criteria to determine the number of bits to assign to each mantissa.
  • the module 306 uses a bit pool for global bit allocation across all channels.
  • bits are cyclically extracted from the bit pool and allocated to all channels, and the quantization of the mantissa is adjusted in accordance with the number of available bits.
  • the AC-3 encoder also uses high-frequency coupling: the high-frequency part of the coupled signals is divided into 18 sub-bands according to the critical bands of the human ear, and selected channels are coupled starting from a chosen sub-band. Finally, the AC-3 audio stream is formed by the bit stream multiplexing module 307.
  • Figure 4 shows a schematic diagram of the process using Dolby AC-3 decoding.
  • the bit stream produced by the AC-3 encoder is input, and data frame synchronization and error detection are performed on it. If a data error is detected, error concealment or muting is applied. The bit stream is then unpacked to obtain the main information and the side information, after which the exponents are decoded.
  • two pieces of side information are needed: one is the number of packed exponents; the other is the exponent strategy used, i.e. the D15, D25 or D45 mode.
  • the decoded exponents and the bit allocation side information then drive the bit allocation computation, which indicates the number of bits used for each packed mantissa and yields a set of bit allocation pointers, one per coded mantissa.
  • each bit allocation pointer indicates the quantizer used for the mantissa and the number of bits it occupies in the code stream.
  • each coded mantissa value is dequantized into a dequantized value; a mantissa occupying zero bits is either reset to zero or replaced by a random dither value under the control of the dither flag.
  • a decoupling operation is then performed: decoupling recovers the high-frequency portion of each coupled channel, including its exponents and mantissas, from the common coupling channel and the coupling factors.
  • if matrix processing was applied to a sub-band at the encoder, the sum and difference channel values of that sub-band are converted back into left and right channel values by matrix recovery at the decoder.
  • dynamic range control information for each audio block is included in the code stream and is used to adjust the magnitude of the coefficients, including the exponent and the mantissa.
  • the frequency domain coefficients are inverse transformed and converted into time domain samples.
  • the time domain samples are windowed, and adjacent blocks are overlapped and added to reconstruct the PCM audio signal.
  • if required, the audio signal is downmixed, and finally the PCM samples are output.
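The final windowing and overlap-add step can be sketched as below. The sine window and 50 % overlap are illustrative choices; the window satisfies the Princen-Bradley condition w[i]² + w[i + N/2]² = 1 needed for perfect reconstruction with the MDCT.

```python
import math

def overlap_add(blocks, window):
    # Window each inverse-transformed block, then sum the second half of
    # block k with the first half of block k+1 (50 % overlap).
    n = len(window)
    hop = n // 2
    out = [0.0] * (hop * (len(blocks) + 1))
    for k, block in enumerate(blocks):
        for i in range(n):
            out[k * hop + i] += block[i] * window[i]
    return out

# Sine window: satisfies w[i]^2 + w[i + N/2]^2 = 1 (Princen-Bradley).
N = 8
window = [math.sin(math.pi / N * (i + 0.5)) for i in range(N)]
pcm = overlap_add([[1.0] * N, [1.0] * N], window)
```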
  • Dolby AC-3 encoding technology mainly targets high bit rate multi-channel surround signals, but when the 5.1-channel encoding bit rate drops below 384 kbps its coding quality degrades, and its coding efficiency for mono and two-channel stereo is also low.
  • the technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio encoding/decoding to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.
  • the enhanced audio coding device of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bit stream multiplexing module, a signal property analysis module and a multi-resolution analysis module. The signal property analysis module performs type analysis on the input audio signal, outputs the signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputs the signal type analysis result to the bit stream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and signal-to-mask ratio of the audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module converts the time domain audio signal into frequency domain coefficients and outputs them to the multi-resolution analysis module.
  • the multi-resolution analysis module performs multi-resolution analysis on the frequency domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module, and outputs the result to the quantization and entropy coding module.
  • under the control of the signal-to-mask ratio output by the psychoacoustic analysis module, the quantization and entropy coding module quantizes and entropy-encodes the frequency domain coefficients and outputs them to the bit stream multiplexing module, which multiplexes the received data to form the encoded audio code stream.
  • the enhanced audio decoding apparatus of the present invention comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group, a multi-resolution synthesis module, and a frequency-time mapping module. The bit stream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data and control signals to the entropy decoding module and the multi-resolution synthesis module. The entropy decoding module decodes these signals, restores the quantized spectrum, and outputs it to the inverse quantizer group. The inverse quantizer group reconstructs the inverse-quantized spectrum and outputs it to the multi-resolution synthesis module, which performs multi-resolution synthesis on the inverse-quantized spectrum and outputs the result to the frequency-time mapping module. The frequency-time mapping module performs frequency-time mapping on the spectral coefficients and outputs the time domain audio signal.
  • the invention is applicable to high-fidelity compression coding of audio signals at various sampling rates and channel configurations: it can support audio signals with sampling rates between 8 kHz and 192 kHz, all common channel configurations, and a wide range of target bit rates.
  • FIG. 1 is a block diagram of an MPEG-2 AAC encoder
  • FIG. 2 is a block diagram of an MPEG-2 AAC decoder
  • Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology
  • Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology
  • Figure 5 is a schematic structural view of an encoding device of the present invention.
  • FIG. 6 is a schematic diagram of a filtering structure using a Haar wavelet-based wavelet transform
  • Figure 7 is a schematic diagram of the time-frequency division obtained using the Haar wavelet-based wavelet transform
  • FIG. 8 is a schematic structural diagram of a decoding apparatus of the present invention.
  • Figure 9 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention.
  • FIG. 10 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention.
  • Figure 11 is a schematic structural view of Embodiment 2 of the encoding apparatus of the present invention.
  • FIG. 12 is a schematic structural diagram of Embodiment 2 of a decoding apparatus according to the present invention.
  • Figure 13 is a schematic structural view of a third embodiment of the encoding apparatus of the present invention.
  • FIG. 14 is a schematic structural diagram of Embodiment 3 of a decoding apparatus according to the present invention.
  • Figure 15 is a schematic structural view of Embodiment 4 of the encoding apparatus of the present invention.
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of a decoding apparatus according to the present invention.
  • Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention.
  • FIG. 18 is a schematic structural diagram of Embodiment 5 of a decoding apparatus according to the present invention.
  • Figure 19 is a schematic structural view of Embodiment 6 of the encoding apparatus of the present invention.
  • FIG. 20 is a schematic structural diagram of Embodiment 6 of a decoding apparatus according to the present invention.
  • Figure 21 is a schematic structural view of Embodiment 7 of the coding apparatus of the present invention.
  • Figure 22 is a block diagram showing the structure of a seventh embodiment of the decoding apparatus of the present invention. Detailed Description
  • Fig. 1 to Fig. 4 are schematic structural diagrams of several prior art encoders, which have been introduced in the background and are not repeated here.
  • the audio encoding apparatus includes a signal property analysis module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a multi-resolution analysis module 53, a quantization and entropy coding module 54, and a bit stream multiplexing module 55. The signal property analysis module 50 is configured to perform type analysis on the input audio signal, output the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52, and output the signal type analysis result to the bit stream multiplexing module 55.
  • the psychoacoustic analysis module 51 is configured to calculate a masking threshold and a signal mask ratio of the input audio signal, and output to the quantization and entropy encoding module 54.
  • the time-frequency mapping module 52 is configured to convert the time domain audio signal into frequency domain coefficients and output them to the multi-resolution analysis module 53.
  • the multi-resolution analysis module 53 is configured to perform multi-resolution analysis on the frequency domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module 50, and to output the result to the quantization and entropy coding module 54.
  • under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, the quantization and entropy coding module 54 quantizes and entropy-encodes the frequency domain coefficients and outputs them to the bit stream multiplexing module 55, which multiplexes the received data to form the encoded audio code stream.
  • during encoding, the digital audio signal undergoes signal type analysis in the signal property analysis module 50, the type information is output to the bit stream multiplexing module 55, and the audio signal itself is simultaneously output to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.
  • the masking threshold and signal-to-mask ratio of each frame of the audio signal are calculated in the psychoacoustic analysis module 51, and the signal-to-mask ratio is then passed as a control signal to the quantization and entropy coding module 54.
  • the signal is converted into frequency domain coefficients by the time-frequency mapping module 52; the frequency domain coefficients then undergo multi-resolution analysis in the multi-resolution analysis module 53 to improve the time resolution for fast-varying signals, and the result is output to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, quantization and entropy coding are performed in module 54, and the encoded data and control signals are multiplexed by the bit stream multiplexing module 55 to form the enhanced audio code stream.
  • the signal property analysis module 50 performs signal type analysis on the input audio signal and outputs the type information to the bit stream multiplexing module 55, while simultaneously outputting the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.
  • the signal property analysis module 50 performs masking-effect analysis based on adaptive thresholds and waveform prediction to determine whether the signal is slowly varying or fast-varying; for a fast-varying signal it further calculates parameters of the abrupt component, such as the position and strength of the transient.
  • the psychoacoustic analysis module 51 is mainly used to calculate the masking threshold, the mask ratio and the perceptual entropy of the input audio signal.
  • the perceptual entropy calculated by the psychoacoustic analysis module 51 dynamically estimates the number of bits required to transparently encode the current signal frame, allowing the bit allocation between frames to be adjusted.
  • the psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy coding module 54 to control it.
  • the time-frequency mapping module 52 implements the transformation of the audio signal from time domain samples to frequency domain coefficients and consists of a filter bank, which may specifically be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine modulated filter bank, a wavelet transform filter bank, etc.
  • the encoding apparatus of the present invention increases the time resolution of the encoded fast-changing signal by the multi-resolution analyzing module 53.
  • the frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 53. For a fast-varying signal, a frequency domain wavelet transform or a frequency domain modified discrete cosine transform (MDCT) is performed to obtain a multi-resolution representation of the frequency domain coefficients, which is output to the quantization and entropy coding module 54; for a slowly varying signal, the frequency domain coefficients are not processed and are passed directly to the quantization and entropy coding module 54.
  • the multi-resolution analysis module 53 includes a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform the frequency domain coefficients into time-frequency plane coefficients; the recombination module is configured to reorganize the time-frequency plane coefficients according to certain rules.
  • the frequency domain coefficient transform module may adopt a frequency domain wavelet transform filter bank, a frequency domain MDCT transform filter bank, or the like.
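As an illustration of the frequency-domain wavelet option, a Haar analysis over the coefficients could look like the sketch below; this is illustrative only, and the filter structure of Figs. 6-7 may differ in detail:

```python
def haar_step(coeffs):
    # One Haar level: half-resolution approximations and details.
    s = 2 ** -0.5
    pairs = list(zip(coeffs[0::2], coeffs[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def haar_analyze(coeffs, levels):
    # Repeatedly split the approximation band, as in a wavelet filter tree,
    # trading frequency resolution for time resolution level by level.
    bands = []
    cur = list(coeffs)
    for _ in range(levels):
        cur, detail = haar_step(cur)
        bands.append(detail)
    bands.append(cur)
    return bands

bands = haar_analyze([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 2)
```

The orthonormal scaling keeps the total energy of the bands equal to that of the input coefficients.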
  • the quantization and entropy coding module 54 further includes a non-linear quantizer group and an encoder, where the quantizer can be a scalar quantizer or a vector quantizer.
  • the vector quantizer is further divided into two categories: memoryless vector quantizer and memory vector quantizer. For a memoryless vector quantizer, each input vector is independently quantized, independent of the previous vectors; a memory vector quantizer considers the previous vector when quantizing a vector, ie, exploits the correlation between vectors.
  • the main memoryless vector quantizers include the full search vector quantizer, the tree search vector quantizer, the multistage vector quantizer, the gain/shape vector quantizer, and the mean-removed vector quantizer; the main memory vector quantizers include the predictive vector quantizer and the finite state vector quantizer.
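A full-search memoryless vector quantizer reduces to a nearest-codeword search; the sketch below uses a plain Euclidean distance in place of the subjective perceptual distance measure described later:

```python
def nearest_codeword(vector, codebook):
    # Full-search memoryless VQ: return the index of the codeword closest
    # to the input vector (Euclidean distance as a stand-in measure).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
idx = nearest_codeword((0.9, 0.2), codebook)  # index transmitted to the coder
print(idx)  # → 1, the index of (1.0, 0.0)
```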
  • the non-linear quantizer group further includes M sub-band quantizers.
  • for scalar quantization, scale factors are used: all frequency domain coefficients in the M scale factor bands are nonlinearly compressed, and the coefficients of each sub-band are then quantized with that sub-band's scale factor. The quantized spectrum, represented as integers, is output to the encoder; the first scale factor of each frame is output to the bit stream multiplexing module 55 as the common scale factor, while each remaining scale factor is differenced against the previous one and output to the encoder.
  • the scale factor in the above steps is a constantly changing value, which is adjusted according to the bit allocation strategy.
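The quantize/dequantize pair implied here can be sketched as follows; the 3/4-power companding law and the 2^(sf/4) step are AAC-style conventions assumed for illustration:

```python
def nonuniform_quantize(x, scale_factor):
    # Power-law companding: compress |x| by the 3/4 power, with the
    # scale factor controlling the effective step size.
    return round((abs(x) / 2 ** (scale_factor / 4)) ** 0.75)

def nonuniform_dequantize(q, scale_factor):
    # Inverse companding, as in the decoder's inverse quantizer group.
    return q ** (4 / 3) * 2 ** (scale_factor / 4)

q = nonuniform_quantize(100.0, 0)
x_hat = nonuniform_dequantize(q, 0)   # close to, but not exactly, 100.0
```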
  • the present invention provides a bit allocation strategy with minimal global perceptual distortion, as follows:
  • first, each sub-band quantizer is initialized: a scale factor is selected such that the quantized values of the spectral coefficients in all sub-bands are zero. At this point the quantization noise of each sub-band equals its energy, the noise-to-mask ratio (NMR) of each sub-band equals its signal-to-mask ratio (SMR), the number of bits consumed by quantization is 0, and the number of remaining bits equals the target bit count.
  • then, in each iteration, the sub-band with the largest NMR is found, the scale factor of its quantizer is decreased by one unit, and the number of additional bits the sub-band requires is calculated. If the remaining bits are sufficient, the adjustment is kept and the remaining bit count is updated; the loop repeats until no further adjustment can be afforded.
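This strategy reads as a greedy worst-band-first loop. The sketch below assumes a fixed per-step bit cost per sub-band and that one scale-factor step roughly halves the sub-band noise, both simplifications:

```python
def allocate_bits(nmr, cost, budget):
    # Greedy allocation: repeatedly refine the sub-band with the worst
    # noise-to-mask ratio (NMR) until no refinement fits the budget.
    # Assumptions: cost[i] is a fixed per-step bit cost, and one
    # scale-factor step roughly halves the sub-band noise.
    nmr = list(nmr)
    steps = [0] * len(nmr)
    remaining = budget
    while True:
        worst = max(range(len(nmr)), key=lambda i: nmr[i])
        if nmr[worst] <= 1.0 or cost[worst] > remaining:
            break  # every band masked (NMR <= 1), or budget exhausted
        remaining -= cost[worst]
        steps[worst] += 1
        nmr[worst] /= 2.0
    return steps, remaining

steps, left = allocate_bits([8.0, 2.0, 0.5], cost=[4, 3, 5], budget=20)
```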
  • for vector quantization, the frequency domain coefficients are grouped into multi-dimensional vectors and input to the non-linear quantizer group.
  • the flattening factor is used to flatten the spectrum, i.e. to reduce its dynamic range, before the vector quantizer is applied.
  • the vector quantizer then finds, under a subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, and transmits the corresponding codeword index to the encoder.
  • the flattening factor is adjusted according to the bit allocation strategy of vector quantization, and the bit allocation of vector quantization is controlled according to the perceptual importance of the different sub-bands.
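The flattening and codeword search described above can be sketched as a weighted nearest-neighbour search. The weight vector standing in for the subjective perceptual distance measure, and the simple division-based `flatten` helper, are illustrative assumptions rather than the patent's definitions.

```python
def flatten(vector, flattening_factor):
    # Reduce the dynamic range of the spectrum before quantization.
    return [x / flattening_factor for x in vector]

def vq_quantize(vector, codebook, weights=None):
    # Weighted squared-error distance; the weights stand in (schematically)
    # for the subjective perceptual distance measure of the text.
    if weights is None:
        weights = [1.0] * len(vector)
    best_idx, best_dist = -1, float("inf")
    for idx, code in enumerate(codebook):
        d = sum(w * (x - c) ** 2 for w, x, c in zip(weights, vector, code))
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx
```

Only the returned codeword index is transmitted; the decoder recovers the vector by looking the index up in the same codebook.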
  • Entropy coding is a source coding technique. The basic idea is to assign shorter codewords to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length L̄ satisfies H(x) ≤ L̄ < H(x) + 1/N, where H(x) is the entropy of the source.
  • the entropy coding mainly includes methods such as Huffman coding, arithmetic coding or run length coding, and any entropy coding in the present invention may adopt any of the above coding methods.
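A minimal Huffman construction illustrates the principle: frequent symbols receive short codewords, and the average length stays within one bit of the source entropy, as Shannon's theorem promises. This is a sketch of the technique only; the actual codebooks of the coder are not reproduced here.

```python
import heapq
from math import log2

def huffman_lengths(freqs):
    # Compute Huffman codeword lengths for a {symbol: count} map by merging
    # the two least frequent trees and deepening their leaves by one bit.
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**c1, **c2}.items()}
        heapq.heappush(heap, (n1 + n2, tick, merged))
        tick += 1
    return heap[0][2]
```

For counts {a:5, b:2, c:1, d:1}, the most frequent symbol gets a 1-bit codeword and the average length lands between H(x) and H(x) + 1.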
  • the quantized spectrum and the differentially processed scale factors are entropy encoded in the coder, yielding the codebook serial numbers, the scale factor coded values and the losslessly coded quantized spectrum; the codebook serial numbers are then entropy coded to obtain
  • the codebook serial number coded values, after which the scale factor coded values, the codebook serial number coded values and the losslessly coded quantized spectrum are output to the bit stream multiplexing module 55.
  • the codeword index obtained by the vector quantizer quantization is subjected to one-dimensional or multi-dimensional entropy coding in the encoder to obtain an encoded value of the codeword index, and then the encoded value of the codeword index is output to the bitstream multiplexing module 55.
  • the encoding method based on the above encoder specifically includes: performing signal type analysis on the input audio signal; calculating a signal mask ratio of the audio signal; performing time-frequency mapping on the audio signal to obtain a frequency domain coefficient of the audio signal; and performing more on the frequency domain coefficient Resolution analysis and quantization and entropy coding; multiplexing the signal type analysis result and the encoded audio code stream to obtain a compressed audio code stream.
  • the signal type analysis is based on adaptive thresholds and waveform prediction, combined with analysis of the forward and backward masking effects.
  • the specific steps are: the input audio data is divided into frames; each input frame is divided into multiple sub-frames, and the local maximum points of the absolute value of the PCM data in each sub-frame are found; the sub-frame peak is selected from the local maximum points of each sub-frame; for a given sub-frame peak, the peaks of a plurality of (typically 3) preceding sub-frames are used to predict a typical sample value at a forward delay of a plurality of (typically 4) sub-frames; the difference and the ratio between the sub-frame peak and the predicted typical sample value are calculated; if both the difference and the ratio are greater than their set thresholds, it is determined that a sudden (transient) signal exists in the sub-frame, i.e. the sub-frame contains a local maximum peak point whose pre-echo must be backward-masked, the distance between the front end of the sub-frame and the masking peak being within about 2.5 ms, and
  • the frame signal belongs to the fast-changing type; if the predicted difference and the ratio are not both greater than the set thresholds, the above steps are repeated for the next sub-frame until either
  • the frame signal is determined to be a fast-changing type signal or the last sub-frame is reached. If the last sub-frame is reached without a fast-changing determination, the frame signal belongs to the slowly varying type.
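The peak-prediction test above can be sketched as follows. The sub-frame count, the mean-of-previous-peaks predictor, and the threshold values are illustrative assumptions; the patent only fixes the general scheme (peak extraction, prediction, difference and ratio tests).

```python
def is_fast_changing(frame, n_sub=8, diff_thresh=0.3, ratio_thresh=4.0, history=3):
    # Peak of |PCM| in each sub-frame.
    size = len(frame) // n_sub
    peaks = [max(abs(s) for s in frame[i * size:(i + 1) * size])
             for i in range(n_sub)]
    for i in range(history, n_sub):
        # Predict a typical sample value from the preceding sub-frame peaks.
        predicted = sum(peaks[i - history:i]) / history
        diff = peaks[i] - predicted
        # Both the difference and the ratio must exceed their thresholds.
        if diff > diff_thresh and predicted > 0 and peaks[i] / predicted > ratio_thresh:
            return True  # sudden (transient) signal: fast-changing frame
    return False  # slowly varying frame
```

A frame that jumps from near-silence to a loud attack trips both tests at once, while steady material never does.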
  • the time-frequency transform of the time-domain audio signal may be a discrete Fourier transform (DFT), a discrete cosine transform (DCT), a modified discrete cosine transform (MDCT), a cosine-modulated filter bank, a wavelet transform, and so on.
  • when the modified discrete cosine transform (MDCT) is used for the time-frequency transform, the time domain signal is first windowed, and the
  • MDCT transform is then performed on the windowed signal to obtain the frequency domain coefficients.
  • the window function of the MDCT transform must satisfy the following two conditions: it must be symmetric, i.e. w(2M−1−n) = w(n), and it must satisfy the perfect-reconstruction (Princen-Bradley) condition w²(n) + w²(n+M) = 1.
  • the Sine window can be selected as the window function.
  • by using specific analysis and synthesis filters, the above restrictions on the window function can be relaxed.
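A direct (non-optimized) MDCT and the sine window can be sketched as below. The transform definition used here is the common textbook form X[k] = Σ x[n]·cos(π/M·(n + 1/2 + M/2)(k + 1/2)); the patent's own formula may differ in indexing convention, so treat this as an illustrative assumption.

```python
import math

def sine_window(N):
    # w[n] = sin(pi/N * (n + 1/2)): symmetric, and satisfies the
    # Princen-Bradley condition w[n]^2 + w[n + N/2]^2 = 1.
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def mdct(block):
    # Direct MDCT: 2M windowed time samples -> M frequency coefficients.
    N = len(block)
    M = N // 2
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(N))
            for k in range(M)]
```

Note the critical sampling: 2M input samples yield only M coefficients, with the aliasing cancelled later by overlap-add at the decoder.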
  • when the cosine modulation filter bank is used for the time-frequency transform, the time domain samples of the previous frame and the M samples of the current frame are first selected, the time domain signals of the two frames are windowed, and the cosine modulation transform is then performed on the windowed signal to obtain the frequency domain coefficients.
  • the analysis and synthesis filters of the cosine modulated filter bank take the standard form
  h_k(n) = 2·p_a(n)·cos( (π/M)(k + 1/2)(n − D/2) + Φ_k ),  n = 0, 1, ···, N_h − 1
  f_k(n) = 2·p_s(n)·cos( (π/M)(k + 1/2)(n − D/2) − Φ_k ),  n = 0, 1, ···, N_s − 1
  where 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, K is an integer greater than zero, and Φ_k = (−1)^k·π/4.
  • the analysis window (analysis prototype filter) p_a(n) of the cosine modulated filter bank has an impulse response of length N_h,
  • and the synthesis window (synthesis prototype filter) p_s(n) has an impulse response of length N_s.
  • the window function also needs to meet certain conditions, see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.
  • Calculating the masking value and the mask ratio of the resampled signal includes the following steps:
  • the first step is to map the signal from time domain to frequency domain.
  • the fast Fourier transform with a Hanning window can be used to convert the time domain data into frequency domain coefficients X[k], expressed by the amplitude r[k] and the phase φ[k].
  • the energy of each sub-band is the sum of the energies of all spectral lines in the sub-band, e[b] = Σ r²[k], the sum running from the lower to the upper boundary of sub-band b.
  • the second step is to determine the tonal and non-tonal components in the signal.
  • the tonality of the signal is estimated by inter-frame prediction of each spectral line.
  • the Euclidean distance between the predicted and true values of each spectral line is mapped to an unpredictability measure; highly predictable spectral components are considered strongly tonal,
  • while poorly predictable spectral components are considered noise-like.
  • the magnitude of the current frame is predicted from the two previous frames as r_pred[k] = r_{t−1}[k] + ( r_{t−1}[k] − r_{t−2}[k] ), and the phase likewise as φ_pred[k] = φ_{t−1}[k] + ( φ_{t−1}[k] − φ_{t−2}[k] ),
  • where t represents the current frame,
  • t−1 indicates the previous frame,
  • and t−2 indicates the frame before the previous one.
  • the unpredictability of each sub-band is the sum of the unpredictability of all spectral lines in the sub-band, weighted by their energies.
  • the sub-band energy e[b] and the unpredictability c[b] are each convolved with the spreading function s[i,b] to model the spread of masking across sub-bands:
  e_s[b] = Σ_i e[i]·s[i,b],  c_s[b] = Σ_i c[i]·s[i,b],
  • where the sum runs over all sub-bands into which the frame signal is divided.
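The per-line unpredictability of the second step can be sketched as below. The exact normalization is an assumption modeled on common psychoacoustic-model descriptions, not taken verbatim from the patent.

```python
import cmath

def unpredictability(r_now, phi_now, r_prev1, phi_prev1, r_prev2, phi_prev2):
    # Linearly extrapolate magnitude and phase from the two previous frames.
    r_pred = r_prev1 + (r_prev1 - r_prev2)
    phi_pred = phi_prev1 + (phi_prev1 - phi_prev2)
    actual = cmath.rect(r_now, phi_now)
    predicted = cmath.rect(r_pred, phi_pred)
    # Normalized Euclidean distance in the complex plane, in [0, 1]:
    # ~0 for tone-like (predictable) lines, ~1 for noise-like lines.
    return abs(actual - predicted) / (r_now + abs(r_pred) + 1e-12)
```

A line whose magnitude and phase evolve linearly (a steady tone) scores near 0; a line whose phase jumps unpredictably scores near 1.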
  • the third step is to calculate the Signal-to-Noise Ratio (SNR) required for each sub-band.
  • the fourth step is to calculate the masking threshold of each subband and the perceptual entropy of the signal.
  • the noise energy threshold n[b] of the current frame is compared with the noise energy threshold n_prev[b] of the previous frame.
  • the perceptual entropy is accumulated over all sub-bands, each sub-band's term being weighted by the number of spectral lines included in the band.
  • the fifth step is to calculate the signal-to-mask ratio (Signal-to-Mask Ratio, SMR for short) of each sub-band signal.
  • the multi-resolution analysis module 53 performs time-frequency reorganization on the input frequency domain data, improving the time resolution of the frequency domain data at the cost of reduced frequency resolution, thereby automatically adapting to the time-frequency characteristics of fast-changing type signals. The pre-echo effect is thus suppressed, and the form of the filter bank in the time-frequency mapping module 52 need not be adjusted.
  • the multi-resolution analysis includes two steps of frequency domain coefficient transform and recombination, wherein frequency domain coefficients are transformed into time-frequency plane coefficients by frequency domain coefficient transform; time-frequency plane coefficients are grouped according to certain rules by recombination.
  • the process of multi-resolution analysis is illustrated by taking the frequency domain wavelet transform and the frequency domain MDCT transform as examples.
  • the wavelet basis of the frequency domain wavelet or wavelet packet transform can be fixed or adaptive.
  • the wavelet transform based on the Haar wavelet basis is taken as an example to illustrate the multi-resolution analysis of the frequency domain coefficients.
  • the scaling (low-pass) filter coefficients of the Haar wavelet basis are (1/√2, 1/√2), and the wavelet (high-pass) filter coefficients are (1/√2, −1/√2).
  • Figure 6 shows the wavelet transform structure using the Haar basis,
  • where H denotes high-pass filtering (filter coefficients (1/√2, −1/√2)) and L denotes low-pass filtering (filter coefficients (1/√2, 1/√2)),
  • and ↓2 denotes downsampling by a factor of 2,
  • applied level by level to the low and middle frequency parts of the frequency domain coefficients.
  • Different wavelet bases can be selected, and different wavelet transform structures can be selected for processing, and other similar time-frequency plane partitions are obtained. Therefore, it is possible to adjust the time-frequency plane division of the signal analysis arbitrarily according to the needs, and to meet the analysis requirements of different time and frequency resolutions.
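One Haar analysis step on a sequence of frequency domain coefficients can be sketched as follows; repeating it on the low-pass branch builds the multi-level time-frequency plane described above. The taps are the standard orthonormal Haar coefficients.

```python
import math

def haar_step(coeffs):
    # One Haar analysis step: low-pass (sum) and high-pass (difference)
    # branches with taps 1/sqrt(2), each followed by downsampling by 2.
    s = 1.0 / math.sqrt(2.0)
    low = [s * (coeffs[2 * i] + coeffs[2 * i + 1]) for i in range(len(coeffs) // 2)]
    high = [s * (coeffs[2 * i] - coeffs[2 * i + 1]) for i in range(len(coeffs) // 2)]
    return low, high
```

Because the Haar basis is orthonormal, each step preserves the total energy of the coefficients, so the analysis redistributes time-frequency resolution without amplifying or attenuating the signal.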
  • the time-frequency plane coefficients are reorganized according to certain rules in the recombination module. For example, the time-frequency plane coefficients can first be organized in the frequency direction, with the coefficients in each frequency band organized in the time direction; the organized coefficients are then arranged in the order of the sub-windows and the scale factor bands.
  • Different frequency-domain MDCT transforms can be used in different frequency domain ranges to obtain different time-frequency plane divisions, that is, different time and frequency precisions.
  • the recombination module reorganizes the time-frequency domain data outputted by the frequency domain MDCT transform filter bank.
  • a recombination method is to first organize the time-frequency plane coefficients in the frequency direction, with the coefficients in each frequency band organized in the time direction, and then arrange the organized coefficients in the order of the sub-windows and the scale factor bands.
  • Quantization and entropy coding further include two steps of nonlinear quantization and entropy coding, where the quantization can be scalar quantization or vector quantization.
  • the scalar quantization includes the following steps: nonlinearly compressing the frequency domain coefficients in all scale factor bands; and then quantizing the frequency domain coefficients of the subbands by using the scale factor of each subband to obtain a quantized spectrum represented by an integer;
  • the first scale factor in each frame of the signal is used as a common scale factor; other scale factors are differentially processed from their previous scale factor.
  • Vector quantization includes the following steps: composing a plurality of multi-dimensional vector signals from the frequency domain coefficients; flattening each dimension of the vector according to a flattening factor; finding the codeword with the smallest distance from the vector to be quantized in the codebook according to a subjective perceptual distance measure criterion, and obtaining its codeword index.
  • the entropy coding step comprises: entropy coding the quantized spectrum and the differentially processed scale factors to obtain the codebook serial numbers, the scale factor coded values and the losslessly coded quantized spectrum; and entropy coding the codebook serial numbers to obtain the codebook serial number coded values.
  • the above entropy coding method can adopt any of the existing methods such as Huffman coding, arithmetic coding or run length coding.
  • the encoded audio code stream is obtained, and the code stream is multiplexed together with the common scale factor and signal type analysis result to obtain a compressed audio code stream.
  • FIG. 8 is a block diagram showing the structure of an audio decoding device of the present invention.
  • the audio decoding apparatus includes a bit stream demultiplexing module 60, an entropy decoding module 61, an inverse quantizer group 62, a multi-resolution synthesis module 63 and a frequency-time mapping module 64.
  • the compressed audio code stream is demultiplexed by the bit stream demultiplexing module 60 to obtain the corresponding data signals and control signals, which are output to the entropy decoding module 61 and the multi-resolution synthesis module 63; the data signals and control signals are decoded in the entropy decoding module,
  • and the quantized values of the spectrum are recovered.
  • the above quantized values are reconstructed in the inverse quantizer group 62 to obtain an inverse quantized spectrum.
  • the inverse quantized spectrum is output to the multi-resolution synthesis module 63 and, after multi-resolution synthesis, is output to the frequency-time mapping module 64; the frequency-time mapping then yields the audio signal in the time domain.
  • the bit stream demultiplexing module 60 decomposes the compressed audio stream to obtain corresponding data signals and control signals, and provides corresponding decoding information for other modules.
  • the signal outputted to the entropy decoding module 61 includes a common scale factor, a scale factor coded value, a codebook sequence number coded value, and a lossless coded quantized spectrum, or an encoded value of the codeword index;
  • the signal type information is sent to the multi-resolution synthesis module 63.
  • the entropy decoding module 61 receives the common scale factor, the scale factor coded values, the
  • codebook serial number coded values and the losslessly coded quantized spectrum output by the bit stream demultiplexing module 60, then performs codebook serial number decoding, spectral coefficient decoding and scale factor decoding to reconstruct the quantization, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer group 62.
  • the decoding method employed by the entropy decoding module 61 corresponds to the entropy coding method in the encoding device, such as Huffman decoding, arithmetic decoding, or run-length decoding.
  • after receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer group 62 inversely quantizes the quantized values of the spectrum into an unscaled reconstructed spectrum (the inverse quantized spectrum), and outputs the inverse quantized spectrum to the multi-resolution synthesis module 63.
  • the inverse quantizer group 62 may be a uniform quantizer group or a non-uniform quantizer group implemented by a companding function.
  • if the quantizer group in the encoding device employs a scalar quantizer,
  • the inverse quantizer group 62 in the decoding device also employs a scalar inverse quantizer.
  • the spectral quantized values are first nonlinearly expanded, and each scale factor is then used to obtain all the spectral coefficients (the inverse quantized spectrum) in the corresponding scale factor band.
  • when vector quantization was used at encoding, the entropy decoding module 61 receives the coded values of the codeword indices output by the bit stream demultiplexing module 60, and the coded values of the codeword indices
  • are decoded with the entropy decoding method corresponding to the entropy coding method used at encoding, yielding the corresponding codeword indices.
  • the codeword index is output to the inverse quantizer group 62, and the quantized value (inverse quantized spectrum) is obtained by querying the codebook, and output to the multi-resolution synthesis module 63.
  • the inverse quantizer group 62 employs an inverse vector quantizer.
  • the frequency-time mapping module 64 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, an inverse cosine modulation filter bank, etc.
  • the decoding method based on the above decoder includes: demultiplexing the compressed audio code stream to obtain data information and control information; performing entropy decoding on the information to obtain the quantized values of the spectrum; performing inverse quantization processing on the quantized values of the spectrum to obtain the inverse quantized spectrum; and, after multi-resolution synthesis of the inverse quantized spectrum, performing frequency-time mapping to obtain the time domain audio signal.
  • if the demultiplexed information includes the codebook serial number coded values, the common scale factor, the scale factor coded values and the losslessly coded quantized spectrum, it indicates that the spectral coefficients were quantized by the scalar quantization technique in the encoding device, and the decoding
  • steps include: decoding the codebook serial number coded values to obtain the codebook numbers of all scale factor bands; decoding the quantized coefficients of all scale factor bands according to the codebooks corresponding to the codebook serial numbers; and decoding the scale factors of all scale factor bands to reconstruct the quantized spectrum.
  • the entropy decoding method adopted in the above process corresponds to an entropy coding method in the coding method, such as a run length decoding method, a Huffman decoding method, an arithmetic decoding method, and the like.
  • the process of entropy decoding is illustrated by using a run-length decoding method to decode a codebook sequence number, a Huffman decoding method to decode a quantized coefficient, and a Huffman decoding method to decode a scale factor.
  • the codebook numbers of all scale factor bands are obtained by the run-length decoding method; each decoded codebook serial number is an integer within a certain interval. If the interval is set to [0, 11], then only codebook numbers within the valid range, i.e. between 0 and 11, correspond to spectral coefficient Huffman codebooks. For all-zero sub-bands, a codebook serial number of 0 can typically be selected.
  • the spectral coefficient Huffman codebook corresponding to the codebook number is used to decode the quantized coefficients of all the scale factor bands. If the codebook number of a scale factor band is within the valid range, for example between 1 and 11, then the codebook number corresponds to a spectral coefficient codebook; the codeword index of the scale factor band is decoded from the quantized spectrum using that codebook, and the quantized coefficients are then unpacked from the codeword index. If the codebook number of the scale factor band is not between 1 and 11, the codebook number does not correspond to any spectral coefficient codebook; the quantized coefficients of the sub-band are not decoded and are all set to zero.
  • the scale factors are used to reconstruct the spectral values from the inversely quantized spectral coefficients. If the codebook number of a scale factor band is within the valid range, each codebook number corresponds to one scale factor.
  • to decode the scale factors, the code stream occupied by the first scale factor is read first; Huffman decoding is then performed on the other scale factors, sequentially obtaining the difference between each scale factor and its predecessor, and adding the difference to the previous scale factor value to obtain each scale factor. If the quantized coefficients of the current sub-band are all zero, the scale factor of that sub-band does not need to be decoded.
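The differential recovery described above can be sketched in a few lines; the first scale factor is the directly transmitted common scale factor, and each later one is its decoded difference added to the previously recovered value.

```python
def decode_scale_factors(common_sf, diffs):
    # The first scale factor arrives directly; each subsequent one is its
    # decoded difference added to the previously recovered scale factor.
    sfs = [common_sf]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```

Differential coding pays off because neighbouring scale factors are usually close, so the differences cluster around zero and entropy-code compactly.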
  • the inverse quantization process includes: nonlinearly expanding the quantized values of the spectra; and obtaining all spectral coefficients (inverse quantized spectra) in the corresponding scale factor bands according to each scale factor.
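A scalar inverse quantizer of the kind just described can be sketched as follows. The 4/3-power expansion and the 2^(sf/4) scaling are AAC-style conventions assumed here for illustration; the patent does not specify the companding function.

```python
def inverse_quantize(q, scale_factor, sf_step=0.25):
    # Nonlinear expansion (4/3 power law, an assumed companding function)
    # followed by scaling with 2^(sf_step * scale_factor).
    sign = -1.0 if q < 0 else 1.0
    return sign * (abs(q) ** (4.0 / 3.0)) * (2.0 ** (sf_step * scale_factor))
```

Applying this to every quantized value in a scale factor band, with that band's scale factor, yields the inverse quantized spectrum the text refers to.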
  • if the demultiplexed information includes the coded values of the codeword indices, the decoding step comprises: decoding the coded values of the codeword indices with the entropy
  • decoding method corresponding to the entropy coding method in the encoding device to obtain the codeword indices.
  • the codeword indices are then inversely quantized, by looking up the codebook, to obtain the inverse quantized spectrum.
  • correspondingly, at encoding, if the signal is of the fast-changing type, multi-resolution analysis is performed on the frequency domain coefficients and the multi-resolution representation of the frequency domain coefficients is then quantized and entropy coded; if it is not a fast-changing type signal, the frequency domain coefficients are quantized and entropy coded directly.
  • Multi-resolution synthesis can be performed by frequency domain wavelet transform or frequency domain MDCT transform.
  • the frequency domain wavelet synthesis method includes: first recombining the above-mentioned time-frequency plane coefficients according to a certain rule, and then performing the inverse wavelet transform on the time-frequency plane coefficients to obtain the frequency domain coefficients.
  • the frequency domain MDCT synthesis method includes: first recombining the above-mentioned time-frequency plane coefficients according to a certain rule, and then performing the inverse MDCT transform several times on the time-frequency plane coefficients to obtain the frequency domain coefficients.
  • the method of recombining may include: firstly, the time-frequency plane coefficients are organized in the frequency direction, the coefficients in each frequency band are organized in the time direction, and then the organized coefficients are arranged in the order of the sub-window and the scale factor sub-band.
  • the method of performing frequency-time mapping processing on frequency domain coefficients corresponds to the time-frequency mapping processing method in the encoding method, and may use inverse discrete cosine transform (IDCT), inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform ( IMDCT), inverse wavelet transform and other methods are completed.
  • the inverse modified discrete cosine transform (IMDCT) is taken as an example to illustrate the frequency-time mapping process.
  • the frequency-time mapping process consists of three steps: IMDCT transformation, time domain windowing, and time domain superposition.
  • the IMDCT transform is performed on the pre-prediction spectrum or the inverse quantized spectrum to obtain the transformed time domain signal x_{i,n}.
  • the expression of the IMDCT transform is: x_{i,n} = (2/N)·Σ_{k=0}^{N/2−1} spec[i][k]·cos( (2π/N)(n + n₀)(k + 1/2) ), n = 0, 1, ···, N − 1, where n represents the sample number, N the transform length, and n₀ = (N/2 + 1)/2.
  • the time domain signal obtained by the IMDCT transform is windowed in the time domain.
  • Typical window functions are Sine windows, Kaiser-Bessel windows, and the like.
  • the present invention employs a fixed window function whose window function is:
  • the biorthogonal transform can be used, with specific analysis and synthesis filters, to relax the above restrictions on the window function.
  • the windowed time domain signal is superimposed to obtain a time domain audio signal.
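The three steps above (IMDCT, time domain windowing, time domain superposition) can be sketched as follows. The transform definition mirrors the textbook MDCT convention rather than the patent's exact formula, so treat the indexing as an assumption.

```python
import math

def imdct(X):
    # Inverse MDCT matching the direct form
    # x[n] = (2/M) * sum_k X[k] cos(pi/M * (n + 1/2 + M/2)(k + 1/2)).
    M = len(X)
    return [(2.0 / M) * sum(X[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

def overlap_add(prev_second_half, cur_block, window):
    # Window the IMDCT output, emit its first half added to the stored
    # second half of the previous block, and keep its own second half.
    M = len(cur_block) // 2
    windowed = [cur_block[n] * window[n] for n in range(2 * M)]
    out = [prev_second_half[n] + windowed[n] for n in range(M)]
    return out, windowed[M:]
```

The IMDCT output carries time domain aliasing (its first half is antisymmetric, its second half symmetric); the overlap-add of consecutive windowed blocks is what cancels this aliasing and reconstructs the signal.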
  • Figure 9 is a schematic illustration of a first embodiment of an encoding apparatus of the present invention.
  • This embodiment adds a frequency domain linear prediction and vector quantization module 56 on the basis of FIG. 5, between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54; it outputs the residual sequence to the quantization and
  • entropy coding module 54 and simultaneously outputs the quantized codeword indices as side information to the bit stream multiplexing module 55.
  • the frequency domain linear prediction and vector quantization module 56 performs linear prediction on the frequency domain coefficients of each time period, and multi-level vector quantization of the resulting prediction coefficients.
  • the frequency domain signal output from the multi-resolution analysis module 53 is transmitted to the frequency domain linear prediction and vector quantization module 56.
  • the frequency domain coefficients of each time period are subjected to standard
  • linear prediction analysis; if the prediction gain satisfies a given condition, the frequency domain coefficients are linear-prediction error filtered, the obtained prediction coefficients are converted into line spectral frequency coefficients LSF (Line Spectrum Frequency), and the optimal distortion
  • metric is then used to search and calculate the codeword index of each codebook; the codeword indices are transmitted as side information to the bit stream multiplexing module 55, and the residual sequence obtained through the prediction analysis is output to the quantization and entropy coding module 54.
  • the frequency domain linear prediction and vector quantization module 56 is composed of a linear predictive analyzer, a linear predictive filter, a converter, and a vector quantizer.
  • the frequency domain coefficients are input into the linear prediction analyzer for prediction analysis, obtaining the prediction gain and the prediction coefficients.
  • the frequency domain coefficients satisfying certain conditions are filtered by the linear prediction filter to obtain the residual sequence;
  • the residual sequence is directly output to the quantization and entropy coding module 54, the prediction coefficients are converted into line spectral frequency coefficients LSF by the converter, and the LSF parameters are then sent to the vector quantizer for multi-level vector quantization; the quantized signal is transmitted to the bit stream multiplexing module 55.
  • the Hilbert envelope of a signal corresponds to the one-sided spectrum formed by its positive frequency components; the squared Hilbert envelope of the signal in the time domain is related to the autocorrelation function of the signal's spectrum.
  • The power spectral density function PSD(f) of the signal is in turn related to the autocorrelation function of its time domain waveform, so the squared Hilbert envelope of the signal in the time domain and the power
  • spectral density function of the signal in the frequency domain are dual to each other. From the above, for a partial band-pass signal in a certain frequency range, if its Hilbert envelope remains constant, then the autocorrelation between adjacent spectral values also remains constant; this means that the spectral coefficient sequence is stationary with respect to frequency, so that predictive coding techniques can be applied to the spectral values, and a common set of prediction coefficients can represent the signal effectively.
  • the encoding method based on the encoding apparatus shown in FIG. 9 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added:
  • after the frequency domain coefficients are subjected to multi-resolution analysis, for each time period
  • the frequency domain coefficients are subjected to standard linear prediction analysis to obtain the prediction gain and the prediction coefficients; whether the prediction gain exceeds the set threshold is judged, and if it does, frequency domain linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients to obtain the residual sequence;
  • the prediction coefficients are converted into line spectral frequency coefficients, which are subjected to multi-level vector quantization to obtain the side information; the residual sequence is quantized and entropy coded; if the prediction gain does not exceed the set threshold,
  • the frequency domain coefficients are quantized and entropy coded directly.
  • specifically, standard linear prediction analysis is first performed on the frequency domain coefficients of each time period, including calculating the autocorrelation matrix and running the Levinson-Durbin recursion to obtain the prediction gain and the prediction coefficients. It is then judged whether the calculated prediction gain exceeds the preset threshold; if it does, linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients; otherwise the frequency domain coefficients are not processed, and the next step, quantization and entropy coding of the frequency domain coefficients, is performed.
  • Linear prediction can be divided into forward prediction and backward prediction.
  • Forward prediction refers to predicting the current value by using the value before a certain moment
  • backward prediction refers to predicting the current value by using the value after a certain moment.
  • the frequency domain coefficients X(k) are filtered by the prediction error filter to obtain the prediction error (residual) sequence E(k).
  • the frequency domain coefficients X(k) output by the time-frequency transform can thus be represented by the residual sequence E(k) and a set of prediction coefficients a_i. The set of prediction coefficients a_i is then converted into line spectral frequency coefficients LSF and subjected to multi-level vector quantization; the vector quantization selects the optimal distortion metric (such as the nearest neighbor criterion) and searches and calculates the codeword index of each codebook, so as to determine the code corresponding to the prediction coefficients, and the codeword indices are output as side information. At the same time, the residual sequence is quantized and entropy coded.
  • the dynamic range of the residual sequence of the spectral coefficients is smaller than that of the original coefficients, so fewer bits can be allocated in quantization, or an improved coding gain can be obtained for the same number of bits.
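The analysis and filtering steps above can be sketched with a plain Levinson-Durbin recursion and a prediction-error filter; the order-1 demonstration below is illustrative, and the threshold test on the gain is omitted.

```python
def levinson_durbin(r, order):
    # Levinson-Durbin recursion: autocorrelation values r[0..order] ->
    # prediction coefficients a[1..order] and prediction gain r[0]/err.
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], (r[0] / err if err > 0 else float("inf"))

def prediction_residual(x, coeffs):
    # Prediction-error filtering: e[k] = x[k] - sum_i a[i] * x[k - 1 - i].
    p = len(coeffs)
    return [x[k] - sum(coeffs[i] * x[k - 1 - i] for i in range(min(p, k)))
            for k in range(len(x))]
```

Run on a geometrically decaying coefficient sequence, the recursion recovers the decay factor and the residual energy collapses, which is exactly the dynamic-range reduction the text credits with the coding gain.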
  • FIG. 10 is a schematic diagram of a first embodiment of the decoding apparatus of the present invention.
  • the decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 65 to the decoding apparatus shown in FIG. 8; the inverse frequency domain linear prediction and vector quantization module 65 is located between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63, and the bit stream demultiplexing module 60 outputs the inverse frequency domain linear prediction vector quantization control information to it; the module performs inverse quantization and inverse linear prediction filtering on the inverse quantized spectrum (the residual) to obtain the spectrum before prediction, which is output to the multi-resolution synthesis module 63.
  • frequency domain linear prediction vector quantization techniques are used at the encoder to suppress pre-echo and obtain a larger coding gain. Therefore, in the decoder, the inverse quantized spectrum and the inverse frequency domain linear prediction vector quantization control information output by the bit stream demultiplexing module 60 are input to the inverse frequency domain linear prediction and vector quantization module 65 to recover the spectrum before linear prediction.
  • the inverse frequency domain linear prediction and vector quantization module 65 includes an inverse vector quantizer, an inverse transformer, and an inverse linear predictor, wherein the inverse vector quantizer is used to inverse quantize the codeword index to the J line pair frequency coefficient (LSF).
  • the inverse converter is used to convert the line spectral frequency (LSF) coefficients back into prediction coefficients; the inverse linear prediction filter is used to inverse filter the inverse quantized spectrum with the prediction coefficients, obtaining the spectrum before prediction, which is output to the multi-resolution synthesis module 63.
  • the decoding method based on the decoding device shown in FIG. 10 is basically the same as the decoding method based on the decoding device shown in FIG. 8, except that the following steps are added: after the inverse quantized spectrum is obtained, it is determined whether the control information contains inverse frequency domain linear predictive vector quantization information; if it does, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is performed on the inverse quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction, which is then subjected to multi-resolution synthesis.
  • the spectrum before prediction is then subjected to frequency-time mapping processing.
  • if the control information indicates that the signal frame has not undergone frequency domain linear predictive vector quantization, the inverse frequency domain linear predictive vector quantization process is not performed, and the inverse quantized spectrum is directly subjected to frequency-time mapping processing.
  • FIG. 11 is a block diagram showing the structure of a second embodiment of the encoding apparatus of the present invention.
  • This embodiment adds a sum and difference stereo (M/S) encoding module 57, on the basis of FIG. 5, between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy encoding module 54.
  • in M/S sum and difference stereo mode, the psychoacoustic analysis module 51, in addition to calculating the masking threshold of each monaural audio signal, further calculates the masking thresholds of the sum and difference channels and outputs them to the quantization and entropy encoding module 54.
  • the sum and difference stereo encoding module 57 may also be located between the quantizer group and the entropy encoder in the quantization and entropy encoding module 54.
  • the sum and difference stereo encoding module 57 exploits the correlation between the two channels of a channel pair, replacing the frequency domain coefficients/residual sequences of the left and right channels with the equivalent frequency domain coefficients/residual sequences of the sum and difference channels, thereby reducing the code rate and improving coding efficiency. It is therefore only applicable to multi-channel signals whose channels share the same signal type; for a mono signal, or a multi-channel signal whose signal types are inconsistent, sum and difference stereo encoding is not performed.
  • the encoding method based on the encoding apparatus shown in FIG. 11 is the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added: before the frequency domain coefficients are quantized and encoded, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding condition; if they do, sum and difference stereo coding is performed to obtain the frequency domain coefficients of the sum and difference channels; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, the frequency domain coefficients are not processed.
  • the sum and difference stereo coding can also be applied after quantization and before entropy coding, that is: after quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo coding condition, and if they do, sum and difference stereo coding is performed; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, the quantized frequency domain coefficients are not processed.
  • the judgment method adopted by the present invention is based on the Karhunen-Loève (K-L) transform.
  • the specific judgment process is as follows: let the spectral coefficients of a left channel scale factor band be l(k) and the spectral coefficients of the corresponding right channel scale factor band be r(k); their correlation matrix is built from C_ll = Σ l(k)l(k), C_rr = Σ r(k)r(k) and C_lr = Σ l(k)r(k), with the sums taken over the band.
  • the rotation angle a of the K-L transform satisfies tan(2a) = 2C_lr/(C_ll - C_rr); when a is close to π/4, the scale factor band is suited to the sum and difference stereo coding mode.
  • Therefore, the frequency domain coefficients of the left and right channels in the scale factor band are replaced by the linearly transformed frequency domain coefficients of the sum and difference channels: M = (L + R)/2, S = (L - R)/2
  • M denotes the sum channel frequency domain coefficient
  • S denotes the difference channel frequency domain coefficient
  • L denotes the left channel frequency domain coefficient
  • R denotes the right channel frequency domain coefficient.
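The K-L-transform-based band decision described above can be sketched as follows. The 2×2 correlation matrix of the band's left/right coefficients determines the rotation angle, which is tested against π/4; the tolerance value is an illustrative choice, not specified by the patent:

```python
import numpy as np

def ms_band_decision(l, r, tol=0.35):
    """True if the scale factor band is suited to sum/difference (M/S) coding.

    l, r: spectral coefficients l(k), r(k) of the band in the two channels.
    The K-L rotation angle alpha satisfies tan(2*alpha) = 2*C_lr / (C_ll - C_rr);
    alpha close to pi/4 indicates strongly correlated channels.
    """
    c_ll = np.dot(l, l)
    c_rr = np.dot(r, r)
    c_lr = np.dot(l, r)
    alpha = 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)
    return abs(alpha - np.pi / 4) < tol
```

Identical (or nearly identical) left and right bands give alpha = π/4 and select M/S mode; uncorrelated bands give alpha near 0 and keep L/R coding.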
  • M̂ represents the quantized sum channel frequency domain coefficients; Ŝ represents the quantized difference channel frequency domain coefficients; L̂ represents the quantized left channel frequency domain coefficients; R̂ represents the quantized right channel frequency domain coefficients.
  • Figure 12 is a schematic diagram of a second embodiment of a decoding apparatus.
  • the decoding apparatus adds a sum and difference stereo decoding module 66, on the basis of the decoding apparatus shown in FIG. 8, between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63; it receives the signal type analysis result and the sum and difference stereo control signal output by the bit stream demultiplexing module 60, and converts the inverse quantized spectra of the sum and difference channels into the inverse quantized spectra of the left and right channels according to this control information.
  • the sum and difference stereo decoding module 66 determines whether the inverse quantized spectrum requires sum and difference stereo decoding in a given scale factor band based on the flag bit of that scale factor band. If sum and difference stereo coding was performed in the encoding device, the inverse quantized spectrum must be subjected to sum and difference stereo decoding in the decoding device.
  • the sum and difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer group 62, and receive the sum and difference stereo control signal and the signal type analysis result output by the bitstream demultiplexing module 60.
  • the decoding method based on the decoding apparatus shown in FIG. 12 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 8, except that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the inverse quantized spectrum; if necessary, it is determined from the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is left unprocessed and subsequent processing is performed directly.
  • the sum and difference stereo decoding can also be performed after the entropy decoding process and before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the quantized values; if necessary, the flag bit on each scale factor band is used to determine whether that scale factor band requires sum and difference stereo decoding, and if so, the quantized spectral values of the sum and difference channels in the scale factor band are converted into the quantized spectral values of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values are left unprocessed and subsequent processing is performed directly.
  • the frequency domain coefficients of the left and right channels in the scale factor band are obtained from the frequency domain coefficients of the sum and difference channels by the following operations: L̂ = M̂ + Ŝ, R̂ = M̂ - Ŝ, where: M̂ represents the quantized sum channel frequency domain coefficient
  • Ŝ represents the quantized difference channel frequency domain coefficient
  • L̂ represents the quantized left channel frequency domain coefficient
  • R̂ represents the quantized right channel frequency domain coefficient.
  • the inverse quantized frequency domain coefficients of the left and right channels in the subband are obtained from the inverse quantized frequency domain coefficients of the sum and difference channels according to the same matrix operation, L = M + S, R = M - S, where M and S denote the inverse quantized sum and difference channel frequency domain coefficients and L and R denote the inverse quantized left and right channel frequency domain coefficients.
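The sum and difference stereo transform and its inverse can be sketched as a direct transcription of these relations; array inputs stand for the coefficients of one scale factor band:

```python
import numpy as np

def ms_encode(L, R):
    """Sum/difference channels: M = (L + R) / 2, S = (L - R) / 2."""
    return 0.5 * (L + R), 0.5 * (L - R)

def ms_decode(M, S):
    """Inverse transform: L = M + S, R = M - S."""
    return M + S, M - S
```

The pair is lossless in exact arithmetic: encoding followed by decoding returns the original left/right coefficients.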
  • Fig. 13 shows the configuration of a third embodiment of the encoding apparatus of the present invention. This embodiment, on the basis of Fig. 9, adds a sum and difference stereo encoding module 57 between the output of the frequency domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54; the psychoacoustic analysis module 51 outputs the masking thresholds of the sum and difference channels to the quantization and entropy encoding module 54.
  • the sum and difference stereo coding module 57 may also be located between the quantizer group and the entropy encoder in the quantization and entropy coding module 54, and receive the signal type analysis result output by the psychoacoustic analysis module 51.
  • the function and working principle of the sum and difference stereo coding module 57 are the same as those in FIG. 11 and are not described again here.
  • the encoding method based on the encoding apparatus shown in FIG. 13 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 9, except that the following steps are added:
  • before quantization and encoding, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether each scale factor band satisfies the coding condition, and if so, sum and difference stereo coding is performed on that scale factor band; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, sum and difference stereo coding is not performed.
  • the sum and difference stereo coding can also be applied after quantization and before entropy coding, that is: after quantization of the frequency domain coefficients, it is judged whether the audio signal is a multi-channel signal; if it is, it is judged whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is judged whether each scale factor band satisfies the coding condition, and if so, sum and difference stereo coding is performed on that scale factor band; if not, sum and difference stereo coding is not performed; if it is a mono signal or a multi-channel signal whose signal types are inconsistent, sum and difference stereo coding is not performed.
  • Figure 14 is a block diagram showing a third embodiment of the decoding apparatus.
  • the decoding apparatus adds a sum and difference stereo decoding module 66 on the basis of the decoding apparatus shown in Fig. 10, between the output of the inverse quantizer group 62 and the input of the inverse frequency domain linear prediction and vector quantization module 65; the bit stream demultiplexing module 60 outputs the sum and difference stereo control signals to it.
  • the sum and difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer group 62, and receive the sum and difference stereo control signals output by the bit stream demultiplexing module 60.
  • the decoding method based on the decoding device shown in FIG. 14 is basically the same as the decoding method based on the decoding device shown in FIG. 10, the difference being that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is judged according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the inverse quantized spectrum; if necessary, it is determined from the flag bit on each scale factor band whether that scale factor band requires sum and difference stereo decoding, and if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is left unprocessed and subsequent processing is performed directly.
  • the sum and difference stereo decoding can also be performed before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the quantized values; if necessary, the flag bit of each scale factor band is used to determine whether that scale factor band requires sum and difference stereo decoding, and if so, the quantized spectral values of the sum and difference channels in the scale factor band are converted into the quantized spectral values of the left and right channels and then subjected to subsequent processing; if the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values are left unprocessed and subsequent processing is performed directly.
  • Fig. 15 shows a schematic representation of a fourth embodiment of the encoding device of the present invention.
  • a resampling module 590 and a band extension module 591 are added, where the resampling module 590 resamples the input audio signal to change its sampling rate and then outputs the audio signal at the changed sampling rate to the signal property analysis module 50; the band extension module 591 is configured to analyze the input audio signal over the entire frequency band, extract the spectral envelope of the high frequency portion and its characteristics relating it to the low frequency portion, and output them to the bit stream multiplexing module 55.
  • the resampling module 590 is configured to resample the input audio signal; resampling includes both upsampling and downsampling.
  • downsampling is taken as an example below to illustrate resampling.
  • the resampling module 590 includes a low pass filter and a downsampler, where the low pass filter is used to limit the frequency band of the audio signal, eliminating the aliasing that may be caused by downsampling.
  • the input audio signal is low-pass filtered and then downsampled. Assume the input audio signal is s(n) and the impulse response of the low-pass filter is h(n); the filtered signal is v(n) = Σ_k h(k) s(n - k), and downsampling by a factor of M keeps every M-th sample, y(n) = v(nM).
  • the sampling rate of y(n) is thus reduced by a factor of M compared to the original input audio signal.
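The low-pass filter plus downsampler can be sketched as below. The windowed-sinc design and tap count are illustrative choices, not taken from the patent:

```python
import numpy as np

def lowpass_fir(cutoff, num_taps=63):
    """Windowed-sinc low-pass FIR; cutoff is a fraction of Nyquist (0..1)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / h.sum()            # unity gain at DC

def downsample(s, M):
    """Band-limit s(n) to 1/M of the band, then keep every M-th sample."""
    h = lowpass_fir(1.0 / M)
    v = np.convolve(s, h, mode="same")   # v(n) = sum_k h(k) s(n-k)
    return v[::M]
```

Filtering first ensures that the content above the new Nyquist frequency is attenuated before decimation, which is what prevents aliasing.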
  • after the original input audio signal is input to the band extension module 591, analysis is performed over the entire frequency band, the spectral envelope of the high frequency portion and its characteristics relating it to the low frequency portion are extracted, and these are output as band extension control information to the bit stream multiplexing module 55.
  • the basic principle of band extension is: for most audio signals, the characteristics of the high frequency part are strongly correlated with the characteristics of the low frequency part, so the high frequency part of the audio signal can be effectively reconstructed from the low frequency part; thus, the high frequency portion of the audio signal need not be transmitted. To ensure that the high frequency part can be reconstructed correctly, it is sufficient to transmit a small amount of band extension control information in the compressed audio stream.
  • the band extension module 591 includes a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters describing the spectral characteristics of the input signal in different time-frequency regions; the spectral envelope extraction module then estimates the spectral envelope of the high frequency portion of the signal at a certain time-frequency resolution. To ensure that the time-frequency resolution is best suited to the characteristics of the current input signal, the time-frequency resolution of the spectral envelope is freely selectable.
  • the parameters of the spectral characteristics of the input signal and the spectral envelope of the high frequency portion are output as the band extension control signal to the bit stream multiplexing module 55 for multiplexing.
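The spectral envelope extraction can be sketched as below, assuming magnitude spectra arranged as a (frames × bins) array; the band split and energy averaging are illustrative choices for the freely selectable time-frequency grid:

```python
import numpy as np

def highband_envelope(frames, split_bin, num_bands=4):
    """Estimate the high-band spectral envelope on a time-frequency grid.

    frames: (num_frames, num_bins) magnitude spectra; bins above
    `split_bin` form the high band that will not be transmitted.
    Returns envelope[t, b] = mean energy of sub-band b in frame t.
    """
    high = frames[:, split_bin:]
    bands = np.array_split(high, num_bands, axis=1)
    return np.stack([(b ** 2).mean(axis=1) for b in bands], axis=1)
```

Only this small envelope grid (plus the tonality-type parameters) needs to be transmitted in place of the high-band spectrum itself.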
  • the bitstream multiplexing module 55 receives the code stream output by the quantization and entropy coding module 54, including the common scale factor, the scale factor coded values, the codebook number coded values and the losslessly coded quantized spectrum or the coded values of the codeword indices, together with the band extension control signal output by the band extension module 591; after multiplexing, a compressed audio data stream is obtained.
  • the encoding method based on the encoding apparatus shown in FIG. 15 specifically includes: analyzing the input audio signal over the entire frequency band and extracting the high frequency spectral envelope and the signal spectral characteristic parameters as the band extension control signal; resampling the input audio signal and performing signal type analysis; calculating the signal-to-masking ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency domain coefficients of the audio signal; performing quantization and entropy coding on the frequency domain coefficients; and multiplexing the band extension control signal with the encoded audio stream to obtain a compressed audio stream.
  • the resampling process includes two steps: limiting the frequency band of the audio signal, and downsampling the band-limited audio signal.
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of the decoding apparatus. This embodiment is based on the decoding apparatus shown in FIG. 8, with a band extension module 68 added, which receives the band extension control information output by the bit stream demultiplexing module 60 and the low frequency band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency portion by spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.
  • the decoding method based on the decoding apparatus shown in FIG. 16 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 8, except that the following step is added: after the time domain audio signal is obtained, the high frequency portion of the audio signal is reconstructed according to the band extension control information and the time domain audio signal, to obtain a wideband audio signal.
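A minimal sketch of the decoder-side reconstruction, assuming the high band is rebuilt from a copy of the decoded low band ("spectrum shifting") whose sub-bands are then scaled to the transmitted envelope ("high frequency adjustment"); the band split mirrors the hypothetical encoder-side envelope grid:

```python
import numpy as np

def reconstruct_high_band(low_spec, envelope):
    """Patch the low-band spectrum into the high band and scale each
    sub-band so its mean energy matches the transmitted envelope value."""
    bands = np.array_split(np.asarray(low_spec, dtype=float), len(envelope))
    out = []
    for band, target in zip(bands, envelope):
        e = (band ** 2).mean()
        gain = np.sqrt(target / e) if e > 0 else 0.0
        out.append(gain * band)
    return np.concatenate(out)
```

The reconstructed high band inherits the fine structure of the low band while its coarse energy distribution follows the transmitted side information.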
  • FIGS. 17, 19 and 21 show the fifth to seventh embodiments of the encoding apparatus, based on the encoding apparatus shown in FIGS. 11, 9 and 13 respectively, with the resampling module 590 and the band extension module 591 added.
  • the connection relationship, functions, and principles of the two modules are the same as those in FIG. 15 and will not be described here.
  • a band extension module 68 is added, which receives the band extension control information output by the bit stream demultiplexing module 60 and the low frequency band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion by spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.
  • a gain control module may further be included, which receives the audio signal output by the signal property analysis module 50, controls the dynamic range of fast-changing type signals to eliminate pre-echo in the audio, outputs the result to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and simultaneously outputs the gain adjustment amount to the bit stream multiplexing module 55.
  • the gain control module only controls fast-changing type signals according to the signal type of the audio signal; slowly-changing type signals are not processed and are output directly.
  • the gain control module adjusts the time-domain energy envelope of the signal, increasing the gain value of the signal before the fast-change point so that the time-domain signal amplitudes before and after the fast-change point are relatively close; the time domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, and the gain adjustment amount is output to the bit stream multiplexing module 55.
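The envelope adjustment and its decoder-side inverse can be sketched as follows; the attack index is assumed to be known from the signal type analysis, and the helper names are illustrative:

```python
import numpy as np

def gain_control(x, attack_idx):
    """Boost the signal before the fast-change (attack) point so that the
    amplitudes before and after it are comparable; the gain is sent as
    side information in the bit stream."""
    pre_peak = np.abs(x[:attack_idx]).max()
    post_peak = np.abs(x[attack_idx:]).max()
    gain = post_peak / pre_peak if pre_peak > 0 else 1.0
    y = x.copy()
    y[:attack_idx] *= gain
    return y, gain

def inverse_gain_control(y, attack_idx, gain):
    """Decoder side: undo the boost; quantization noise added before the
    attack shrinks by the same factor, which suppresses pre-echo."""
    x = y.copy()
    x[:attack_idx] /= gain
    return x
```

Because the inverse gain is applied after decoding, both the signal and any quantization noise in the pre-attack region are attenuated together, restoring the original low-before/high-after envelope.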
  • the encoding method based on the encoding device is basically the same as the encoding method based on the above encoding device, with the difference that the following steps are added: Gain control is performed on the signal subjected to signal type analysis.
  • correspondingly, an inverse gain control module may further be included in the decoder, which receives the output of the frequency-time mapping module 64 together with the signal type and gain adjustment amount information output by the bit stream demultiplexing module 60, and is used to adjust the gain of the time domain signal to control pre-echo. The inverse gain control module operates on the reconstructed time domain signal output by the frequency-time mapping module 64, controlling fast-changing type signals and leaving slowly-changing type signals unprocessed.
  • the inverse gain control module uses the gain adjustment amount information to adjust the energy envelope of the reconstructed signal, reducing the amplitude of the signal before the fast-change point and restoring the energy envelope to its original low-before, high-after state, so that the quantization noise before the fast-change point is reduced together with the signal amplitude, thereby controlling pre-echo.
  • the decoding method based on the decoding device is basically the same as the decoding method based on the above decoding device, with the difference that the following steps are added: inverse gain control is performed on the reconstructed time domain signal.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An enhanced audio encoding device consists of a psychoacoustical analyzing module, a time-frequency mapping module, a quantization and entropy encoding module, a bit-stream multiplexing module, a signal characteristic analyzing module and a multi-resolution analyzing module, in which the signal characteristic analyzing module is configured to analyze the signal type of the input audio signal, the psychoacoustical analyzing module calculates a masking threshold and a signal-to-masking ratio of the audio signal and outputs them to the quantization and entropy encoding module, the multi-resolution analyzing module is configured to perform multi-resolution analysis based on the signal type, the quantization and entropy encoding module performs quantization and entropy encoding of the frequency-domain coefficients under the control of the signal-to-masking ratio, and the bit-stream multiplexing module forms an audio encoding code stream. The device can support audio signals whose sampling rates are between 8 kHz and 192 kHz.

Description

Enhanced audio codec device and method

Technical field

The present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec device and method based on a perceptual model.

Background art

In order to obtain high-fidelity digital audio signals, the digital audio signal must be audio encoded or audio compressed for storage and transmission. The purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, so that there is almost no difference between the originally input audio signal and the encoded output audio signal.

In the early 1980s, the advent of the CD demonstrated the many advantages of representing audio signals digitally, such as high fidelity, large dynamic range and strong robustness. However, these advantages all come at the cost of a high data rate. For example, digitization of a CD-quality stereo signal requires a sampling rate of 44.1 kHz, with each sample uniformly quantized to 16 bits, so the uncompressed data rate reaches 1.41 Mb/s. Such a high data rate brings great inconvenience to the transmission and storage of data, especially in multimedia and wireless transmission applications, which are further constrained by bandwidth and cost. To maintain high-quality audio signals, new network and wireless multimedia digital audio systems must therefore reduce the data rate without compromising audio quality. To address these problems, a variety of audio compression technologies have been proposed that achieve both a high compression ratio and high-fidelity audio, typically the MPEG-1/-2/-4 technologies of the international standardization organization ISO/IEC, Dolby's AC-2/AC-3 technology, Sony's ATRAC/MiniDisc/SDDS technology and Lucent's PAC/EPAC/MPAC technology. MPEG-2 AAC technology and Dolby's AC-3 technology are selected below for detailed description.

MPEG-1 and MPEG-2 BC are high-quality audio coding technologies used mainly for mono and stereo audio signals. With the growing demand for multi-channel audio coding that achieves higher coding quality at lower bit rates, and because MPEG-2 BC encoding technology emphasizes backward compatibility with MPEG-1 technology, it cannot achieve high-quality five-channel coding at rates below 540 kbps. To remedy this shortcoming, MPEG-2 AAC technology was proposed, which can encode five-channel signals with high quality at a rate of 320 kbps.

Figure 1 shows a block diagram of the MPEG-2 AAC encoder, which includes a gain controller 101, a filter bank 102, a time domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum and difference stereo (M/S) module 106, a bit allocation and quantization encoding module 107 and a bitstream multiplexing module 108, where the bit allocation and quantization encoding module 107 further includes a compression ratio/distortion processing controller, a scale factor module, a non-uniform quantizer and an entropy coding module.

The filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal adaptive: a 2048-point MDCT is used for steady-state signals and a 256-point MDCT for transient signals. Thus, for a 48 kHz sampled signal, the maximum frequency resolution is 23 Hz and the maximum time resolution is 2.6 ms. The filter bank 102 can use either a sine window or a Kaiser-Bessel window: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window is used when strong components of the input signal are spaced more than 220 Hz apart.

After passing through the gain controller 101, the audio signal enters the filter bank 102 and is filtered according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time domain noise shaping module 103. Time domain noise shaping performs linear prediction analysis on the spectral coefficients in the frequency domain and then, based on this analysis, controls the shape of the quantization noise in the time domain, thereby controlling pre-echo.

The intensity/coupling module 104 performs stereo encoding of signal intensity. For high-band signals (above 2 kHz), the perceived direction of hearing is related to changes in signal intensity (the signal envelope) rather than to the signal waveform; that is, a constant-envelope signal has no influence on the perceived direction. This property, together with the correlation between channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.

二阶后向自适应预测器 105 用于消除稳态信号的冗余, 提高编码效率。 和差立体声(M/S)模块 106 是针对声道对进行操作, 声道对是指诸如双声道信号或多声道信号中的左右声道或左右环绕声道的两个声道。 M/S模块 106利用声道对中两个声道之间的相关性以达到减少码率和提高编码效率的效果。 比特分配和量化编码模块 107是通过一个嵌套循环过程实现的, 其中非均匀量化器进行有损编码, 而熵编码模块进行无损编码, 这样可以去除冗余和减少相关。 嵌套循环包括内层循环和外层循环, 其中内层循环调整非均匀量化器的步长直到所提供的比特用完, 外层循环则利用量化噪声与掩蔽阈值的比来估计信号的编码质量。 最后经过编码的信号通过比特流复用模块 108形成编码的音频流输出。  The second-order backward adaptive predictor 105 is used to remove the redundancy of steady-state signals and improve coding efficiency. The sum/difference stereo (M/S) module 106 operates on channel pairs, a channel pair being two channels such as the left/right channels or the left/right surround channels of a two-channel or multi-channel signal. The M/S module 106 exploits the correlation between the two channels of a pair to reduce the bit rate and improve coding efficiency. The bit allocation and quantization coding module 107 is implemented as a nested loop, in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation. The nested loop consists of an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, while the outer loop estimates the coding quality of the signal from the ratio of quantization noise to masking threshold. Finally, the encoded signal passes through the bitstream multiplexing module 108 to form the encoded audio stream output.
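The M/S matrixing itself is a simple sum/difference transform; a minimal sketch follows (the 1/2 normalization is one common convention, assumed here). For highly correlated channels the side signal is near zero and therefore cheap to code:

```python
def ms_encode(left, right):
    """Sum/difference (M/S) matrixing of a channel pair."""
    mid  = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse matrixing back to left/right channels."""
    left  = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```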

在采样率可伸缩的情况下, 输入信号同时通过四频段多相位滤波器组(PQF)产生四个等带宽的频带, 每个频带利用 MDCT产生 256个频谱系数, 总共有 1024个。 在每个频带内都使用增益控制器 101。 而在解码器中可以忽略高频的 PQF频带, 得到低采样率信号。  In the sampling-rate-scalable case, the input signal is passed through a four-band polyphase quadrature filter bank (PQF) to produce four bands of equal bandwidth; each band is transformed by an MDCT into 256 spectral coefficients, for a total of 1024. A gain controller 101 is used in each band. In the decoder, the high-frequency PQF bands can be discarded to obtain a signal at a lower sampling rate.

图 2给出了对应的 MPEG-2 AAC解码器的方框示意图。 该解码器包括比特流解复用模块 201、 无损解码模块 202、 逆量化器 203、 尺度因子模块 204、 和/差立体声(M/S)模块 205、 预测模块 206、 强度/耦合模块 207、 时域噪声整形模块 208、 滤波器组 209和增益控制模块 210。 编码的音频流经过比特流解复用模块 201进行解复用, 得到相应的数据流和控制流。 上述信号通过无损解码模块 202的解码后, 得到尺度因子的整数表示和信号谱的量化值。 逆量化器 203是一组通过压扩函数实现的非均匀量化器组, 用于将整数量化值转换为重建谱。 由于编码器中的尺度因子模块是将当前尺度因子与前一尺度因子进行差分, 然后将差分值采用 Huffman编码, 因此解码器中的尺度因子模块 204进行 Huffman解码可得到相应的差分值, 再恢复出真实的尺度因子。 M/S模块 205在边信息的控制下将和差声道转换成左右声道。 由于在编码器中采用二阶后向自适应预测器 105消除稳态信号的冗余并提高编码效率, 因此在解码器中通过预测模块 206进行预测解码。 强度/耦合模块 207在边信息的控制下进行强度/耦合解码, 然后输出到时域噪声整形模块 208中进行时域噪声整形解码, 最后通过滤波器组 209进行综合滤波, 滤波器组 209采用逆向改进离散余弦变换(IMDCT)技术。  Figure 2 shows a block diagram of the corresponding MPEG-2 AAC decoder. The decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a temporal noise shaping module 208, a filter bank 209 and a gain control module 210. The encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain the corresponding data stream and control stream. After these are decoded by the lossless decoding module 202, the integer representations of the scale factors and the quantized values of the signal spectrum are obtained. The inverse quantizer 203 is a bank of non-uniform quantizers implemented by a companding function, used to convert the integer quantized values into the reconstructed spectrum. Since the scale factor module in the encoder takes the difference between the current scale factor and the previous one and Huffman-encodes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the corresponding differences and then recovers the true scale factors. The M/S module 205 converts the sum/difference channels into left/right channels under the control of the side information. Since the second-order backward adaptive predictor 105 is used in the encoder to remove the redundancy of steady-state signals and improve coding efficiency, predictive decoding is performed in the decoder by the prediction module 206. The intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information and outputs the result to the temporal noise shaping module 208 for TNS decoding; finally, synthesis filtering is performed by the filter bank 209, which uses the inverse modified discrete cosine transform (IMDCT).
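The companding inverse quantizer can be sketched with the |q|^(4/3) rule used in the AAC family; the 2^(1/4)-per-step gain and the offset constant below are illustrative assumptions, not values taken from this document:

```python
import math

def aac_dequantize(q, scale_factor, sf_offset=100):
    """Non-uniform inverse quantizer: expand the integer value with a
    |q|^(4/3) companding law, then apply the sub-band gain derived
    from the scale factor (one scale-factor step = 2^(1/4), ~1.5 dB).
    The offset constant is illustrative."""
    gain = 2.0 ** (0.25 * (scale_factor - sf_offset))
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * gain
```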

对于采样频率可伸缩的情况, 可通过增益控制模块 210忽略高频的 PQF频带, 以得到 低采样率信号。  For the case where the sampling frequency is scalable, the high frequency PQF band can be ignored by the gain control module 210 to obtain a low sampling rate signal.

MPEG-2 AAC编解码技术适用于中高码率的音频信号, 但对低码率或甚低码率的音频信号的编码质量较差; 同时该编解码技术涉及的编解码模块较多, 实现的复杂度较高, 不利于实时实现。  MPEG-2 AAC coding is well suited to audio signals at medium and high bit rates, but its coding quality for low or very low bit rate audio signals is poor; moreover, the technique involves many codec modules, so its implementation complexity is high, which is unfavorable for real-time implementation.

图 3给出了采用杜比 AC-3技术的编码器的结构示意图, 包括暂态信号检测模块 301、 改进的离散余弦变换滤波器 MDCT 302、 频谱包络/指数编码模块 303、 尾数编码模块 304、 前向-后向自适应感知模型 305、 参数比特分配模块 306和比特流复用模块 307。  Figure 3 shows the structure of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a modified discrete cosine transform (MDCT) filter 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306 and a bitstream multiplexing module 307.

音频信号通过暂态信号检测模块 301判别是稳态信号还是瞬态信号, 同时通过信号自适应 MDCT滤波器组 302将时域数据映射到频域数据, 其中 512点的长窗应用于稳态信号, 一对短窗应用于瞬态信号。  The transient signal detection module 301 determines whether the audio signal is steady-state or transient, while the signal-adaptive MDCT filter bank 302 maps the time-domain data to frequency-domain data: a 512-point long window is applied to steady-state signals, and a pair of short windows is applied to transient signals.

频谱包络/指数编码模块 303根据码率和频率分辨率的要求采用三种模式对信号的指数部分进行编码, 分别是 D15、 D25和 D45编码模式。 AC-3技术在频率上对频谱包络采取差分编码, 差分值最多为 ±2个增量, 每个增量代表 6dB的电平变化; 第一个直流项采用绝对值编码, 其余指数采用差分编码。 在 D15频谱包络指数编码中, 每个指数大约需要 2.33比特, 3个差分组在一个 7比特的字长中编码, D15编码模式通过牺牲时间分辨率而提供精细的频率分辨率。 由于只是对相对平稳的信号才需要精细的频率分辨率, 而这样的信号在许多块上的频谱保持相对恒定, 因此, 对于稳态信号, D15偶尔被传送, 通常是每 6个声音块(一个数据帧)的频谱包络被传送一次。 当信号频谱不稳定时, 需要经常更新频谱估计值。 估计值采用较小的频率分辨率编码, 通常使用 D25和 D45编码模式。 D25编码模式提供了合适的频率分辨率和时间分辨率, 每隔一个频率系数就进行差分编码, 这样每个指数大约需要 1.15比特。 当频谱在 2至 3个块上都是稳定的, 然后突然变化时, 可以采用 D25编码模式。 D45编码模式是每隔三个频率系数进行差分编码, 这样每个指数大约需要 0.58比特。 D45编码模式提供了很高的时间分辨率和较低的频率分辨率, 所以一般应用在对瞬态信号的编码中。  The spectral envelope/exponent encoding module 303 encodes the exponent portion of the signal in one of three modes, D15, D25 and D45, according to the bit rate and frequency resolution requirements. AC-3 differentially encodes the spectral envelope along frequency; the differentials are limited to ±2 increments, each increment representing a 6 dB level change. The first (DC) term is coded absolutely, and the remaining exponents are coded differentially. In D15 spectral envelope exponent coding, each exponent requires approximately 2.33 bits, with three differentials grouped into one 7-bit word; the D15 mode provides fine frequency resolution at the cost of time resolution. Fine frequency resolution is only needed for relatively stationary signals, whose spectra remain nearly constant over many blocks, so for steady-state signals the D15 envelope is transmitted only occasionally, typically once per 6 audio blocks (one data frame). When the signal spectrum is unstable, the spectral estimate must be updated frequently; the estimate is then encoded with coarser frequency resolution, usually using the D25 or D45 mode. The D25 mode offers a compromise between frequency and time resolution, with one differential per pair of frequency coefficients, so each exponent requires approximately 1.15 bits; D25 is appropriate when the spectrum is stable over 2 to 3 blocks and then changes abruptly. The D45 mode uses one differential for every four frequency coefficients, so each exponent requires approximately 0.58 bits. D45 provides high time resolution and low frequency resolution, and is therefore generally used for encoding transient signals.
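The differential grouping arithmetic can be sketched as follows (an illustrative sketch, not production AC-3 code): the −2..+2 differentials are mapped to 0..4 and three of them are packed base-5 into one 7-bit word, which is why the finest mode costs about 7/3 ≈ 2.33 bits per exponent:

```python
def pack_exponents(abs_exp0, exponents):
    """Differentially encode an exponent list: the first (DC) exponent
    is sent absolutely, the rest as differentials clipped to +/-2,
    mapped to 0..4 and grouped three per 7-bit word as
    25*d0 + 5*d1 + d2 (a value of at most 124)."""
    diffs, prev = [], abs_exp0
    for e in exponents:
        d = max(-2, min(2, e - prev))
        prev = prev + d          # encoder tracks the clipped value
        diffs.append(d + 2)      # map -2..2 -> 0..4
    while len(diffs) % 3:
        diffs.append(2)          # pad with zero differentials
    return [25 * diffs[i] + 5 * diffs[i + 1] + diffs[i + 2]
            for i in range(0, len(diffs), 3)]
```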

前向-后向自适应感知模型 305用于估计每帧信号的掩蔽阈值。 其中前向自适应部分仅应用在编码器端, 在码率的限制下, 通过迭代循环估计一组最佳的感知模型参数, 然后这些参数被传递到后向自适应部分以估计出每帧的掩蔽阈值。 后向自适应部分同时应用在编码器端和解码器端。  The forward-backward adaptive perceptual model 305 is used to estimate the masking threshold of each signal frame. The forward adaptive part is applied only at the encoder: under the bit-rate constraint, an optimal set of perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward adaptive part to estimate the masking threshold of each frame. The backward adaptive part is applied at both the encoder and the decoder.

参数比特分配模块 306根据掩蔽准则分析音频信号的频谱包络, 以确定给每个尾数分配的比特数。 该模块 306利用一个比特池对所有声道进行全局比特分配。 在尾数编码模块 304 中进行编码时, 从比特池中循环取出比特分配给所有的声道, 根据可以获得的比特数来调整尾数的量化。 为达到压缩编码的目的, AC-3编码器还采用高频耦合的技术, 将被耦合信号的高频部分按照人耳临界带宽划分成 18 个子频段, 然后选择某些声道从某个子带开始进行耦合。 最后通过比特流复用模块 307形成 AC-3音频流输出。  The parametric bit allocation module 306 analyzes the spectral envelope of the audio signal against the masking criteria to determine the number of bits allocated to each mantissa. The module 306 uses a single bit pool to perform global bit allocation across all channels. During encoding in the mantissa encoding module 304, bits are drawn from the pool in turn and distributed to all channels, and the quantization of the mantissas is adjusted according to the number of bits available. To further compress the signal, the AC-3 encoder also uses high-frequency coupling: the high-frequency portion of the coupled signals is divided into 18 sub-bands according to the critical bandwidths of the human ear, and selected channels are coupled from a chosen sub-band upward. Finally, the AC-3 audio stream output is formed by the bitstream multiplexing module 307.

图 4给出了采用杜比 AC-3解码的流程示意图。 首先输入经过 AC-3编码器编码的比特流, 对比特流进行数据帧同步和误码检测, 如果检测到误码, 则进行误码掩盖或弱音处理。 然后对比特流进行解包, 获得主信息和边信息, 再进行指数解码。 在进行指数解码时, 需要有两个边信息: 一个是打包的指数数目; 一个是所采用的指数策略, 如 D15、 D25或 D45模式。 已经解码的指数和比特分配边信息再进行比特分配, 指出每个打包的尾数所用的比特数, 得到一组比特分配指针, 每个比特分配指针对应一个编码的尾数。 比特分配指针指出用于尾数的量化器以及在码流中每个尾数占用的比特数。 对单个编码的尾数值进行解量化, 将其转变成一个解量化的值, 占用零比特的尾数被恢复成零, 或者在抖动标志的控制下用一个随机抖动值代替。 然后进行解耦合的操作, 解耦合是从公共耦合声道和耦合因子中恢复出被耦合声道的高频部分, 包括指数和尾数。 如果在编码端采用 2/0模式编码时对某子带采用了矩阵处理, 那么在解码端需通过矩阵恢复将该子带的和差声道值转换成左右声道值。 在码流中包含有每个音频块的动态范围控制值, 将该值进行动态范围压缩, 以改变系数的幅度, 包括指数和尾数。 将频域系数进行逆变换, 转变成时域样本, 然后对时域样本进行加窗处理, 相邻的块进行重叠相加, 重构出 PCM音频信号。 当解码输出的声道数小于编码比特流中的声道数时, 还需要对音频信号进行下混处理, 最后输出 PCM音频信号。

杜比 AC-3编码技术主要针对高比特率多声道环绕声的信号, 但是当 5.1声道的编码比特率低于 384kbps时, 其编码效果较差; 而且对于单声道和双声道立体声的编码效率也较低。  Figure 4 shows the flow of Dolby AC-3 decoding. First, the bitstream produced by the AC-3 encoder is input, and frame synchronization and error detection are performed on it; if an error is detected, error concealment or muting is applied. The bitstream is then unpacked to obtain the main information and side information, after which exponent decoding is performed. Exponent decoding requires two pieces of side information: the number of packed exponents, and the exponent strategy used, such as the D15, D25 or D45 mode. The decoded exponents and the bit allocation side information are then used for bit allocation, which indicates the number of bits used for each packed mantissa and yields a set of bit allocation pointers, one per coded mantissa. A bit allocation pointer indicates the quantizer used for a mantissa and the number of bits that mantissa occupies in the stream. Each coded mantissa value is dequantized into a dequantized value; mantissas occupying zero bits are restored to zero, or replaced by a random dither value under the control of the dither flag. Decoupling is then performed, which recovers the high-frequency portion of each coupled channel, exponents and mantissas, from the common coupling channel and the coupling factors. If matrix processing was applied to a sub-band when encoding in 2/0 mode, the decoder must convert the sum/difference channel values of that sub-band back into left/right channel values by matrix recovery. The stream contains a dynamic range control value for each audio block; dynamic range compression is applied with this value to change the amplitudes of the coefficients, both exponents and mantissas. The frequency-domain coefficients are inverse transformed into time-domain samples, the samples are windowed, and adjacent blocks are overlap-added to reconstruct the PCM audio signal. When the number of decoded output channels is smaller than the number of channels in the encoded bitstream, the audio signal must also be downmixed before the final PCM output.

Dolby AC-3 encoding is aimed mainly at high bit rate multi-channel surround signals; when the 5.1-channel bit rate falls below 384 kbps its coding quality degrades, and its coding efficiency for mono and two-channel stereo is also low.
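The final windowing and overlap-add step can be sketched generically; the sine window used here satisfies the Princen-Bradley condition and is an illustrative stand-in for the actual AC-3 window:

```python
import math

def wola_roundtrip(signal, n=8):
    """Window a signal into 50%-overlapped sine-windowed blocks and
    resynthesize by windowed overlap-add; with a Princen-Bradley
    window, w[i]^2 + w[i + n/2]^2 = 1, so the overlapped halves sum
    back to the original amplitude in the interior."""
    hop = n // 2
    win = [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]
    blocks = [[signal[s + i] * win[i] for i in range(n)]
              for s in range(0, len(signal) - n + 1, hop)]
    out = [0.0] * len(signal)
    for b, block in enumerate(blocks):
        for i, v in enumerate(block):
            out[b * hop + i] += v * win[i]
    return out  # matches the input except at the un-overlapped edges
```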

综上, 现有的编解码技术无法全面解决从甚低码率、 低码率到高码率音频信号以及单声道、 双声道信号的编解码质量问题, 且实现较为复杂。 发明内容  In summary, existing codec technologies cannot comprehensively deliver good coding quality from very low and low bit rates up to high bit rates, or for both mono and two-channel signals, and their implementations are complex. Summary of the invention

本发明所要解决的技术问题在于提供一种增强音频编 /解码的装置及方法, 以解决现 有技术对于较低码率音频信号的编码效率低、 质量差的问题。  The technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio encoding/decoding to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.

本发明的增强音频编码装置, 包括心理声学分析模块、 时频映射模块、 量化和熵编码模块以及比特流复用模块、 信号性质分析模块和多分辨率分析模块; 其中所述信号性质分析模块, 用于对输入音频信号进行类型分析, 并输出到所述心理声学分析模块和所述时频映射模块, 同时将信号类型分析结果信息输出到所述比特流复用模块; 所述心理声学分析模块, 用于计算音频信号的掩蔽阈值和信掩比, 并输出到所述量化和熵编码模块; 所述时频映射模块, 用于将时域音频信号转变成频域系数, 并输出到多分辨率分析模块; 所述多分辨率分析模块, 用于根据所述信号性质分析模块输出的信号类型分析结果, 对快变类型信号的频域系数进行多分辨率分析, 并输出到量化和熵编码模块; 所述量化和熵编码模块, 在所述心理声学分析模块输出的信掩比的控制下, 用于对频域系数进行量化和熵编码, 并输出到所述比特流复用模块; 所述比特流复用模块用于将接收到的数据进行复用, 形成音频编码码流。  The enhanced audio encoding apparatus of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bitstream multiplexing module, a signal property analysis module and a multi-resolution analysis module. The signal property analysis module performs type analysis on the input audio signal, outputs the signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputs the signal type analysis result to the bitstream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and signal-to-mask ratio of the audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module converts the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module. The multi-resolution analysis module performs multi-resolution analysis on the frequency-domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module, and outputs the result to the quantization and entropy coding module. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module, the quantization and entropy coding module quantizes and entropy-encodes the frequency-domain coefficients and outputs them to the bitstream multiplexing module. The bitstream multiplexing module multiplexes the received data to form the encoded audio stream.

本发明的增强音频解码装置, 包括: 比特流解复用模块、 熵解码模块、 逆量化器组、 频率-时间映射模块和多分辨率综合模块; 所述比特流解复用模块用于对压缩音频数据流进行解复用, 并向所述熵解码模块和所述多分辨率综合模块输出相应的数据信号和控制信号; 所述熵解码模块用于对上述信号进行解码处理, 恢复谱的量化值, 输出到所述逆量化器组; 所述逆量化器组用于重建逆量化谱, 并输出到所述多分辨率综合模块; 所述多分辨率综合模块用于对逆量化谱进行多分辨率综合, 并输出到所述频率-时间映射模块; 所述频率-时间映射模块用于对谱系数进行频率-时间映射, 输出时域音频信号。  The enhanced audio decoding apparatus of the present invention comprises a bitstream demultiplexing module, an entropy decoding module, a bank of inverse quantizers, a frequency-time mapping module and a multi-resolution synthesis module. The bitstream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data and control signals to the entropy decoding module and the multi-resolution synthesis module. The entropy decoding module decodes these signals to recover the quantized values of the spectrum and outputs them to the bank of inverse quantizers. The bank of inverse quantizers reconstructs the inverse-quantized spectrum and outputs it to the multi-resolution synthesis module. The multi-resolution synthesis module performs multi-resolution synthesis on the inverse-quantized spectrum and outputs the result to the frequency-time mapping module. The frequency-time mapping module performs frequency-time mapping on the spectral coefficients and outputs the time-domain audio signal.

本发明适用于多种采样率、 声道配置的音频信号的高保真压缩编码, 可以支持采样率 为 8kHz到 192kHz之间的音频信号; 可支持所有可能的声道配置; 并且支持范围很宽的目 标码率的音频编 /解码。 附图说明 The invention is applicable to high-fidelity compression coding of audio signals of various sampling rates and channel configurations, and can support audio signals with sampling rates between 8 kHz and 192 kHz; all possible channel configurations can be supported; and a wide range of support is supported. Audio encoding/decoding of the target bit rate. DRAWINGS

图 1是 MPEG- 2 AAC编码器的方框图;  Figure 1 is a block diagram of an MPEG-2 AAC encoder;

图 2是 MPEG- 2 AAC解码器的方框图;  Figure 2 is a block diagram of an MPEG-2 AAC decoder;

图 3是采用杜比 AC-3技术的编码器的结构示意图;  Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology;

图 4是采用杜比 AC- 3技术的解码流程示意图;  Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology;

图 5是本发明编码装置的结构示意图;  Figure 5 is a schematic structural view of an encoding device of the present invention;

图 6是采用 Haar小波基小波变换的滤波结构示意图;  Figure 6 is a schematic diagram of a filtering structure using the Haar wavelet basis;

图 7是采用 Haar小波基小波变换得到的时频划分示意图;  Figure 7 is a schematic diagram of the time-frequency partition obtained with the Haar wavelet basis;

图 8是本发明解码装置的结构示意图;  8 is a schematic structural diagram of a decoding apparatus of the present invention;

图 9是本发明编码装置的实施例一的结构示意图;  Figure 9 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention;

图 10是本发明解码装置的实施例一的结构示意图;  FIG. 10 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention; FIG.

图 11是本发明编码装置的实施例二的结构示意图;  Figure 11 is a schematic structural view of Embodiment 2 of the encoding apparatus of the present invention;

图 12是本发明解码装置的实施例二的结构示意图;  FIG. 12 is a schematic structural diagram of Embodiment 2 of a decoding apparatus according to the present invention; FIG.

图 13是本发明编码装置的实施例三的结构示意图;  Figure 13 is a schematic structural view of a third embodiment of the encoding apparatus of the present invention;

图 14是本发明解码装置的实施例三的结构示意图;  FIG. 14 is a schematic structural diagram of Embodiment 3 of a decoding apparatus according to the present invention; FIG.

图 15是本发明编码装置的实施例四的结构示意图;  Figure 15 is a schematic structural view of Embodiment 4 of the encoding apparatus of the present invention;

图 16是本发明解码装置的实施例四的结构示意图;  16 is a schematic structural diagram of Embodiment 4 of a decoding apparatus according to the present invention;

图 17是本发明编码装置的实施例五的结构示意图;  Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention;

图 18是本发明解码装置的实施例五的结构示意图;  18 is a schematic structural diagram of Embodiment 5 of a decoding apparatus according to the present invention;

图 19是本发明编码装置的实施例六的结构示意图;  Figure 19 is a schematic structural view of Embodiment 6 of the encoding apparatus of the present invention;

图 20是本发明解码装置的实施例六的结构示意图;  20 is a schematic structural diagram of Embodiment 6 of a decoding apparatus according to the present invention;

图 21是本发明编码装置的实施例七的结构示意图;  Figure 21 is a schematic structural view of Embodiment 7 of the coding apparatus of the present invention;

图 22是本发明解码装置的实施例七的结构示意图。 具体实施方式  Figure 22 is a block diagram showing the structure of a seventh embodiment of the decoding apparatus of the present invention. Detailed ways

图 1至图 4是现有技术的几种编码器的结构示意图, 已在背景技术中进行了介绍, 此处不再赘述。  Figures 1 to 4 are structural diagrams of several prior-art encoders; they have been introduced in the background section and are not described again here.

需要说明的是: 为方便、 清楚地说明本发明, 下述编解码装置的具体实施例是采用对应的方式说明的, 但并不限定编码装置与解码装置必须是一一对应的。 如图 5所示, 本发明提供的音频编码装置包括信号性质分析模块 50、 心理声学分析模块 51、 时频映射模块 52、 多分辨率分析模块 53、 量化和熵编码模块 54以及比特流复用模块 55; 其中信号性质分析模块 50用于对输入音频信号进行类型分析, 将音频信号输出到心理声学分析模块 51和时频映射模块 52, 同时将信号类型分析结果输出到比特流复用模块 55; 心理声学分析模块 51用于计算输入音频信号的掩蔽阈值和信掩比, 输出到量化和熵编码模块 54; 时频映射模块 52用于将时域音频信号转变成频域系数, 并输出到多分辨率分析模块 53; 多分辨率分析模块 53根据信号性质分析模块 50输出的信号类型分析结果, 用于对快变类型信号的频域系数进行多分辨率分析, 并输出到量化和熵编码模块 54; 量化和熵编码模块 54在心理声学分析模块 51输出的信掩比的控制下, 用于对频域系数进行量化和熵编码, 并输出到比特流复用模块 55; 比特流复用模块 55用于将接收到的数据进行复用, 形成音频编码码流。  It should be noted that, for convenience and clarity, the specific embodiments of the encoding and decoding apparatus below are described in corresponding pairs, but this does not require a one-to-one correspondence between the encoding apparatus and the decoding apparatus. As shown in Figure 5, the audio encoding apparatus provided by the present invention includes a signal property analysis module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a multi-resolution analysis module 53, a quantization and entropy coding module 54 and a bitstream multiplexing module 55. The signal property analysis module 50 performs type analysis on the input audio signal, outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52, and outputs the signal type analysis result to the bitstream multiplexing module 55. The psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the input audio signal and outputs them to the quantization and entropy coding module 54. The time-frequency mapping module 52 converts the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module 53. The multi-resolution analysis module 53 performs multi-resolution analysis on the frequency-domain coefficients of fast-varying signals according to the signal type analysis result output by the signal property analysis module 50, and outputs the result to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, the quantization and entropy coding module 54 quantizes and entropy-encodes the frequency-domain coefficients and outputs them to the bitstream multiplexing module 55. The bitstream multiplexing module 55 multiplexes the received data to form the encoded audio stream.
The resolution analysis module 53 is configured to perform multi-resolution analysis on the frequency domain coefficients of the fast-changing type signal according to the signal type analysis result output by the psychoacoustic analysis module 51, and output to the quantization and entropy coding module. 54; the quantization and entropy coding module 54 is used to control the frequency domain system under the control of the mask ratio output by the psychoacoustic analysis module 51. The number is quantized and entropy encoded and output to the bit stream multiplexing module 55; the bit stream multiplexing module 55 is configured to multiplex the received data to form an audio encoded code stream.

数字音频信号在信号性质分析模块 50 中进行信号类型分析, 将音频信号的类型信息输出到比特流复用模块 55, 并同时将音频信号输出到所述心理声学分析模块 51和所述时频映射模块 52中。 一方面在心理声学分析模块 51中计算该帧音频信号的掩蔽阈值和信掩比, 然后将信掩比作为控制信号传送给量化和熵编码模块 54; 另一方面时域的音频信号通过时频映射模块 52转变成频域系数; 上述频域系数在多分辨率分析模块 53中, 对快变信号进行多分辨率分析, 提高快变信号的时间分辨率, 并将结果输出到量化和熵编码模块 54 中; 在心理声学分析模块 51输出的信掩比的控制下, 在量化和熵编码模块 54中进行量化和熵编码, 经过编码后的数据和控制信号在比特流复用模块 55 进行复用, 形成增强音频编码的码流。  The digital audio signal undergoes signal type analysis in the signal property analysis module 50, which outputs the type information to the bitstream multiplexing module 55 and simultaneously passes the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52. On the one hand, the psychoacoustic analysis module 51 calculates the masking threshold and signal-to-mask ratio of the frame, and the signal-to-mask ratio is passed as a control signal to the quantization and entropy coding module 54; on the other hand, the time-domain audio signal is converted into frequency-domain coefficients by the time-frequency mapping module 52. In the multi-resolution analysis module 53, multi-resolution analysis is applied to fast-varying signals to improve their time resolution, and the result is output to the quantization and entropy coding module 54. Under the control of the signal-to-mask ratio output by the psychoacoustic analysis module 51, quantization and entropy coding are performed in module 54, and the encoded data and control signals are multiplexed in the bitstream multiplexing module 55 to form the enhanced audio code stream.

下面对上述音频编码装置的各个组成模块进行具体详细地说明。  The respective constituent modules of the above audio encoding device will be described in detail below.

信号性质分析模块 50, 用于对输入的音频信号进行信号类型分析, 并将音频信号的类型信息输出到比特流复用模块 55; 同时将音频信号输出到心理声学分析模块 51和时频映射模块 52。  The signal property analysis module 50 performs signal type analysis on the input audio signal, outputs the type information of the audio signal to the bitstream multiplexing module 55, and simultaneously outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.

信号性质分析模块 50基于自适应阈值和波形预测进行前、 后向掩蔽效应分析来确定 信号的类型为緩变信号还是快变信号, 若是快变类型信号, 则继续计算突变成分的相关参 数信息, 如突变信号发生的位置以及突变信号的强度等。  The signal property analysis module 50 performs front and back masking effect analysis based on the adaptive threshold and the waveform prediction to determine whether the signal type is a slow-changing signal or a fast-changing signal, and if it is a fast-changing type signal, continues to calculate related parameter information of the abrupt component. Such as the location of the mutation signal and the strength of the mutation signal.
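A minimal sketch of such a fast/slow classification follows (purely illustrative: the sub-block count, the threshold ratio and the leaky-average update are assumed parameters, not the detector actually claimed):

```python
def detect_transient(frame, subblocks=8, ratio_thresh=4.0):
    """Toy fast/slow classifier: flag a frame as fast-varying when one
    sub-block's energy jumps well above an adaptive threshold derived
    from the running average of the preceding sub-blocks; also report
    where the abrupt component occurs (the sub-block index)."""
    n = len(frame) // subblocks
    energies = [sum(x * x for x in frame[i * n:(i + 1) * n])
                for i in range(subblocks)]
    avg = energies[0]
    for k, e in enumerate(energies[1:], 1):
        if avg > 0 and e > ratio_thresh * avg:
            return True, k          # transient detected at sub-block k
        avg = 0.7 * avg + 0.3 * e   # leaky running average (adaptive threshold)
    return False, -1
```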

心理声学分析模块 51 主要用于计算输入音频信号的掩蔽阈值、 信掩比和感知熵。 根据心理声学分析模块 51 计算出的感知熵可动态地分析当前信号帧进行透明编码所需的比特数, 从而调整帧间的比特分配。 心理声学分析模块 51 输出各个子带的信掩比到量化和熵编码模块 54, 对其进行控制。  The psychoacoustic analysis module 51 is mainly used to calculate the masking threshold, the signal-to-mask ratio and the perceptual entropy of the input audio signal. The perceptual entropy calculated by the psychoacoustic analysis module 51 gives a dynamic estimate of the number of bits the current signal frame needs for transparent coding, which is used to adjust the bit allocation between frames. The psychoacoustic analysis module 51 outputs the signal-to-mask ratio of each sub-band to the quantization and entropy coding module 54 to control it.

时频映射模块 52用于实现音频信号从时域信号到频域系数的变换, 由滤波器组构成, 具体可以是离散傅立叶变换(DFT)滤波器组、 离散余弦变换(DCT)滤波器组、 修正离散余弦变换(MDCT)滤波器组、 余弦调制滤波器组、 小波变换滤波器组等。 通过时频映射得到的频域系数被输出到量化和熵编码模块 54中, 进行量化和编码处理。  The time-frequency mapping module 52 implements the transformation of the audio signal from the time domain to frequency-domain coefficients and consists of a filter bank, which may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, a wavelet transform filter bank, etc. The frequency-domain coefficients obtained by the time-frequency mapping are output to the quantization and entropy coding module 54 for quantization and coding.

对于快变类型信号, 为有效克服编码过程中产生的预回声现象, 提高编码质量, 本发 明编码装置通过多分辨率分析模块 53 来提高编码快变信号的时间分辨率。 时频映射模块 52输出的频域系数输入到多分辨率分析模块 53中, 如果是快变类型信号, 则进行频域小 波变换或频域修正离散余弦变换(MDCT ), 获得对频域系数的多分辨率表示, 输出到量化 和熵编码模块 54 中。 如果是緩变类型信号, 则对频域系数不进行处理, 直接输出到量化 和熵编码模块 54。  For the fast-changing type signal, in order to effectively overcome the pre-echo phenomenon generated in the encoding process and improve the encoding quality, the encoding apparatus of the present invention increases the time resolution of the encoded fast-changing signal by the multi-resolution analyzing module 53. The frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 53. If it is a fast-varying type signal, the frequency domain wavelet transform or the frequency domain modified discrete cosine transform (MDCT) is performed to obtain the frequency domain coefficients. The multi-resolution representation is output to the quantization and entropy encoding module 54. If it is a slowly varying type signal, the frequency domain coefficients are not processed and are directly output to the quantization and entropy encoding module 54.

多分辨率分析模块 53 包括频域系数变换模块和重组模块, 其中频域系数变换模块用于将频域系数变换为时频平面系数; 重组模块用于将时频平面系数按照一定的规则进行重组。 频域系数变换模块可采用频域小波变换滤波器组、 频域 MDCT变换滤波器组等。  The multi-resolution analysis module 53 includes a frequency-domain coefficient transform module and a regrouping module: the transform module converts the frequency-domain coefficients into time-frequency plane coefficients, and the regrouping module rearranges the time-frequency plane coefficients according to given rules. The transform module may use a frequency-domain wavelet transform filter bank, a frequency-domain MDCT filter bank, or the like.
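Figures 6 and 7 refer to a Haar-basis decomposition; the iterated averaging/differencing that produces such a multi-resolution representation can be sketched as follows (an illustrative sketch, not the invention's exact filter bank):

```python
def haar_analyze(coeffs, levels=2):
    """Iterated Haar split: at each level the 'average' branch is
    split again, yielding a multi-resolution representation of the
    input frequency coefficients (deepest-level averages + a detail
    band per level)."""
    assert len(coeffs) % (2 ** levels) == 0
    details, cur = [], list(coeffs)
    for _ in range(levels):
        avg = [(cur[2 * i] + cur[2 * i + 1]) / 2.0 for i in range(len(cur) // 2)]
        det = [(cur[2 * i] - cur[2 * i + 1]) / 2.0 for i in range(len(cur) // 2)]
        details.append(det)
        cur = avg
    return cur, details
```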

量化和熵编码模块 54 进一步包括了非线性量化器组和编码器, 其中量化器可以是标量量化器或矢量量化器。 矢量量化器进一步分为无记忆矢量量化器和有记忆矢量量化器两大类。 对于无记忆矢量量化器, 每个输入矢量是独立进行量化的, 与以前的各矢量无关; 有记忆矢量量化器是在量化一个矢量时考虑以前的矢量, 即利用了矢量之间的相关性。 主要的无记忆矢量量化器包括全搜索矢量量化器、 树搜索矢量量化器、 多级矢量量化器、 增益/波形矢量量化器和分离均值矢量量化器; 主要的有记忆矢量量化器包括预测矢量量化器和有限状态矢量量化器。  The quantization and entropy coding module 54 further includes a bank of non-linear quantizers and an encoder, where the quantizer may be a scalar quantizer or a vector quantizer. Vector quantizers fall into two broad classes, memoryless and with memory. In a memoryless vector quantizer each input vector is quantized independently of all previous vectors; a vector quantizer with memory takes previous vectors into account when quantizing the current one, i.e. it exploits the correlation between vectors. The main memoryless vector quantizers are the full-search, tree-search, multistage, gain/shape and mean-removed vector quantizers; the main vector quantizers with memory are the predictive and finite-state vector quantizers.

如果采用标量量化器, 则非线性量化器组进一步包括 M个子带量化器。 在每个子带量 化器中主要利用尺度因子进行量化, 具体是: 对 M个尺度因子带中所有的频域系数进行非 线性压缩, 再利用尺度因子对该子带的频域系数进行量化 , 得到整数表示的量化谱输出到 编码器, 将每帧信号中的第一个尺度因子作为公共尺度因子输出到比特流复用模块 55 , 其 它尺度因子与其前一个尺度因子进行差分处理后输出到编码器。  If a scalar quantizer is employed, the non-linear quantizer group further includes M sub-band quantizers. In each sub-band quantizer, the scale factor is mainly used for quantization, specifically: nonlinearly compressing all the frequency domain coefficients in the M scale factor bands, and then using the scale factor to quantize the frequency domain coefficients of the sub-band, The quantized spectrum represented by the integer is output to the encoder, and the first scale factor in each frame signal is output to the bit stream multiplexing module 55 as a common scale factor, and other scale factors are differentially processed with the previous scale factor and output to the encoder. .
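The common-scale-factor-plus-differences scheme described above can be sketched directly (a minimal sketch of the idea; differencing concentrates the values near zero, which suits the subsequent entropy coding):

```python
def scalefactor_diffs(sfs):
    """Send the first scale factor as the common (global) value and
    the rest as differences to their predecessor."""
    common = sfs[0]
    diffs = [b - a for a, b in zip(sfs, sfs[1:])]
    return common, diffs

def scalefactor_restore(common, diffs):
    """Decoder side: cumulative sum recovers the true scale factors."""
    sfs = [common]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```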

The scale factor used in the above step is a continually changing value, adjusted according to a bit-allocation strategy. The present invention provides a bit-allocation strategy that minimizes the global perceptual distortion, as follows:

First, every sub-band quantizer is initialized and a suitable scale factor is chosen such that the quantized values of the spectral coefficients in all sub-bands are zero. At this point the quantization noise of each sub-band equals its energy, the noise-to-mask ratio NMR of each sub-band equals its signal-to-mask ratio SMR, the number of bits consumed by quantization is zero, and the number of remaining bits equals the target bit count.

Next, the sub-band with the largest noise-to-mask ratio NMR is located. If the maximum NMR is less than or equal to 1, the scale factors are left unchanged, the allocation result is output, and the bit-allocation process ends. Otherwise, the scale factor of the corresponding sub-band quantizer is decreased by one unit, and the number of additional bits ΔB_i(Q_i) required by that sub-band is computed. If the number of remaining bits satisfies B_r ≥ ΔB_i(Q_i), the scale-factor modification is confirmed, ΔB_i(Q_i) is subtracted from the remaining bit count B_r, the NMR of that sub-band is recomputed, and the search for the sub-band with the largest NMR is repeated, the subsequent steps being executed again. If instead the remaining bit count satisfies B_r < ΔB_i(Q_i), the modification is cancelled, the previous scale factor and the remaining bit count are retained, the allocation result is output, and the bit-allocation process ends.
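The greedy loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `bit_cost` is a hypothetical callback for ΔB_i(Q_i), and the "each step halves the noise" update is an assumption standing in for the actual NMR recomputation.

```python
import numpy as np

def allocate_bits(smr, bit_cost, target_bits):
    """Greedy perceptual bit allocation: repeatedly refine the sub-band
    with the worst noise-to-mask ratio until no band is audible (NMR <= 1)
    or the remaining bits cannot pay for the next refinement.  NMR starts
    at SMR because the all-zero quantization leaves noise equal to the
    signal energy."""
    n = len(smr)
    sf = np.zeros(n, dtype=int)         # scale-factor offsets per sub-band
    nmr = np.array(smr, dtype=float)    # NMR == SMR initially
    remaining = target_bits
    while True:
        i = int(np.argmax(nmr))
        if nmr[i] <= 1.0:
            break                       # noise already masked everywhere
        cost = bit_cost(i, sf[i])       # hypothetical cost of one step
        if remaining < cost:
            break                       # cannot afford the refinement
        sf[i] -= 1                      # finer quantization for band i
        remaining -= cost
        nmr[i] /= 2.0                   # assumed NMR update per step
    return sf, remaining
```

The loop terminates either on the NMR ≤ 1 condition or on bit exhaustion, matching the two exit cases in the text.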

If vector quantization is used, the frequency-domain coefficients are grouped into a number of multi-dimensional vectors that are input to the group of non-linear quantizers. Each vector is first flattened according to a flattening factor, i.e. its dynamic range is reduced; the vector quantizer then finds, under a subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, and passes the corresponding codeword index to the encoder. The flattening factor is adjusted according to the bit-allocation strategy of the vector quantizer, and the bit allocation of the vector quantizer is in turn controlled by the perceptual importance of the different sub-bands.
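The flatten-then-search step can be sketched as follows. The text does not specify the subjective perceptual distance measure, so a weighted Euclidean distance is used here as a stand-in, with `weights` modelling per-component perceptual importance; both are assumptions.

```python
import numpy as np

def vq_encode(vector, codebook, flatten, weights=None):
    """Flatten the input vector to reduce its dynamic range, then return
    the index of the nearest codeword.  A weighted squared Euclidean
    distance substitutes for the (unspecified) perceptual measure."""
    v = vector / flatten                       # dynamic-range reduction
    w = np.ones_like(v) if weights is None else weights
    dists = [np.sum(w * (v - c) ** 2) for c in codebook]
    return int(np.argmin(dists))               # codeword index to encode
```

Only the index (and the flattening factor, via the bit-allocation side information) needs to reach the decoder, which holds the same codebook.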

After the above quantization, entropy coding is used to further remove the statistical redundancy of the quantized coefficients and the side information. Entropy coding is a source-coding technique whose basic idea is to assign codewords of shorter length to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability of occurrence, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable-length code the average codeword length n̄ satisfies

H(x)/log₂(D) ≤ n̄ < H(x)/log₂(D) + 1/N,

where H(x) denotes the entropy of the source, D the size of the code alphabet, and N the number of source symbols. Since the entropy H(x) is the lower limit of the average codeword length, the formula above shows that the average codeword length comes very close to its lower bound, the entropy; such variable-length coding is therefore called "entropy coding". The main entropy-coding methods are Huffman coding, arithmetic coding and run-length coding; the entropy coding in the present invention may use any of these methods.
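The short-codes-for-frequent-symbols idea can be demonstrated with a small Huffman coder, one of the three methods named above. This is a generic textbook construction, not the specific code tables of the coder described here.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code over the observed symbols: frequent symbols
    receive short codewords.  Returns a dict symbol -> bitstring."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate one-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, tree); trees are nested pairs,
    # so ties never fall through to comparing trees.
    heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)       # two least-probable nodes
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):           # internal node: branch 0/1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                 # leaf: original symbol
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

For a source such as `"aaabbc"`, the most frequent symbol receives a one-bit codeword and the rarer symbols two-bit codewords, so the average length approaches the source entropy as the theorem states.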

The quantized spectrum output by the scalar quantizers and the differentially processed scale factors are entropy-coded in the encoder, yielding codebook numbers, coded scale-factor values and the losslessly coded quantized spectrum; the codebook numbers are then themselves entropy-coded, yielding coded codebook-number values. The coded scale-factor values, the coded codebook-number values and the losslessly coded quantized spectrum are then output to the bitstream multiplexing module 55.

The codeword indices obtained from the vector quantizers are one-dimensionally or multi-dimensionally entropy-coded in the encoder to obtain the coded codeword-index values, which are then output to the bitstream multiplexing module 55. The encoding method based on the above encoder specifically comprises: performing signal-type analysis on the input audio signal; computing the signal-to-mask ratio of the audio signal; performing time-frequency mapping on the audio signal to obtain its frequency-domain coefficients; performing multi-resolution analysis as well as quantization and entropy coding on the frequency-domain coefficients; and multiplexing the signal-type analysis result with the coded audio stream to obtain the compressed audio stream.

The signal type is determined by forward/backward masking-effect analysis based on adaptive thresholds and waveform prediction. The specific steps are: decompose the input audio data into frames; split each input frame into multiple sub-frames and find the local maxima of the absolute values of the PCM data in each sub-frame; select the sub-frame peak from the local maxima of each sub-frame; for a given sub-frame peak, use the peaks of several preceding sub-frames (typically 3) to predict typical sample values for several sub-frames (typically 4) at a forward delay relative to that sub-frame; and compute the difference and the ratio between the sub-frame peak and the predicted typical sample values. If both the prediction difference and the ratio exceed their set thresholds, the sub-frame is judged to contain a transient and is confirmed as the local maximum peak capable of backward-masking the pre-echo; if, between the start of that sub-frame and 2.5 ms before the masking peak, there exists a sub-frame with a sufficiently small peak, the frame signal is judged to be of the fast-varying type. If the prediction difference and ratio do not exceed the set thresholds, the above steps are repeated until the frame signal is judged to be of the fast-varying type or the last sub-frame is reached; if the last sub-frame is reached without the frame having been judged fast-varying, the frame signal is of the slowly varying type.
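The transient test above can be sketched as follows. The linear-extrapolation predictor, the threshold values and the "sufficiently small peak" criterion are all assumptions here; the text only requires that a difference and a ratio both exceed adaptive thresholds.

```python
import numpy as np

def classify_frame(frame, n_sub=8, diff_thr=0.3, ratio_thr=2.0):
    """Fast/slow classification sketch: compare each sub-frame peak
    against a prediction from the 3 preceding sub-frame peaks; a large
    jump preceded by a quiet sub-frame marks the frame fast-varying."""
    subs = np.array_split(np.abs(frame), n_sub)
    peaks = np.array([s.max() for s in subs])
    for i in range(3, n_sub):
        predicted = peaks[i - 3:i].mean()          # assumed predictor
        diff = peaks[i] - predicted
        ratio = peaks[i] / (predicted + 1e-12)
        if diff > diff_thr and ratio > ratio_thr:  # transient detected
            # an earlier quiet sub-frame would expose the pre-echo
            if peaks[:i].min() < 0.5 * peaks[i]:
                return "fast"
    return "slow"
```

A frame containing a single late attack is classified fast-varying, while a stationary frame falls through to the slow-varying case.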

There are many methods for the time-frequency transformation of a time-domain audio signal, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks and wavelet transforms. The time-frequency mapping process is illustrated below using the MDCT and cosine-modulated filtering as examples.

When the modified discrete cosine transform MDCT is used for the time-frequency transform, the time-domain signals of the M samples of the previous frame and the M samples of the current frame are first selected; a window is applied to the 2M samples of these two frames, and the windowed signal is then MDCT-transformed to obtain M frequency-domain coefficients.

The impulse response of the MDCT analysis filter is:

h_k(n) = w(n)·√(2/M)·cos( ((2n + M + 1)(2k + 1)π) / (4M) ), 0 ≤ n ≤ 2M − 1,

and the MDCT transform is then:

X(k) = Σ_{n=0}^{2M−1} x(n)·h_k(n), 0 ≤ k ≤ M − 1,

where w(n) is the window function, x(n) is the input time-domain signal of the MDCT, and X(k) is the output frequency-domain signal of the MDCT.

To satisfy the condition for perfect signal reconstruction, the MDCT window function must satisfy the following two conditions:

w(2M − 1 − n) = w(n) and w²(n) + w²(n + M) = 1.

In practice, the sine window may be chosen as the window function. The above restrictions on the window function can, of course, also be modified by using a biorthogonal transform with a particular pair of analysis and synthesis filters.
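A direct (non-fast) implementation of the windowed MDCT above can be sketched as follows, using the sine window, which satisfies both stated conditions. The basis formula follows the reconstruction given earlier and should be read as illustrative.

```python
import numpy as np

def mdct(prev_frame, cur_frame):
    """MDCT of one frame: window 2M samples (previous + current frame)
    with a sine window and correlate with the cosine basis h_k(n),
    yielding M frequency-domain coefficients."""
    x = np.concatenate([prev_frame, cur_frame]).astype(float)
    M = len(cur_frame)
    n = np.arange(2 * M)
    w = np.sin(np.pi / (2 * M) * (n + 0.5))   # sine window
    k = np.arange(M)[:, None]
    basis = np.sqrt(2.0 / M) * np.cos(
        (2 * n[None, :] + M + 1) * (2 * k + 1) * np.pi / (4 * M))
    return basis @ (w * x)                    # M coefficients X(k)
```

The sine window can be checked against both window conditions numerically: it is symmetric, w(2M − 1 − n) = w(n), and satisfies the Princen-Bradley condition w²(n) + w²(n + M) = 1.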

When cosine-modulated filtering is used for the time-frequency transform, the time-domain signals of the samples of the previous frame and the M samples of the current frame are likewise selected first; the samples of the two frames are windowed, and the windowed signal is then cosine-modulation transformed to obtain M frequency-domain coefficients.

The impulse responses of the conventional cosine-modulated filters are:

h_k(n) = 2·p_a(n)·cos( (π/M)(k + 0.5)(n − D/2) + Θ_k ), n = 0, 1, …, N_h − 1,

f_k(n) = 2·p_s(n)·cos( (π/M)(k + 0.5)(n − D/2) − Θ_k ), n = 0, 1, …, N_f − 1,

where 0 ≤ k ≤ M − 1, 0 ≤ n ≤ 2KM − 1, K is an integer greater than zero, Θ_k = (−1)^k·(π/4), and D denotes the system delay. Suppose the impulse-response length of the analysis window (analysis prototype filter) p_a(n) of the M-sub-band cosine-modulated filter bank is N_a, and that of the synthesis window (synthesis prototype filter) p_s(n) is N_s. When the analysis window and the synthesis window are equal, i.e. p_a(n) = p_s(n) and N_a = N_s, the cosine-modulated filter bank given by the two equations above is an orthogonal filter bank, and the matrices H and F ([H]_{nk} = h_k(n), [F]_{nk} = f_k(n)) are orthogonal transform matrices. To obtain a linear-phase filter bank, the symmetric window is further required to satisfy p_a(2KM − 1 − n) = p_a(n). To guarantee perfect reconstruction for the orthogonal and biorthogonal systems, the window function must satisfy further conditions; for details see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.

Computing the masking threshold and the signal-to-mask ratio of the resampled signal comprises the following steps:

Step 1: map the signal from the time domain to the frequency domain. The time-domain data can be converted into frequency-domain coefficients X[k] using the fast Fourier transform together with a Hanning window; in terms of magnitude r[k] and phase φ[k], X[k] = r[k]·e^{jφ[k]}. The energy e[b] of each sub-band is then the sum of the energies of all spectral lines within that sub-band, i.e.

e[b] = Σ_{k=k_l}^{k_h} r²[k],

where k_l and k_h denote the lower and upper boundaries of sub-band b, respectively.

Step 2: determine the tonal and non-tonal components of the signal. The tonality of the signal is estimated by inter-frame prediction of each spectral line: the Euclidean distance between the predicted value and the actual value of each line is mapped to an unpredictability measure; highly predictable spectral components are considered strongly tonal, while poorly predictable components are considered noise-like. The magnitude r_pred and phase φ_pred of the predicted value can be expressed by the following formulas:

r_pred[k] = r_{t−1}[k] + (r_{t−1}[k] − r_{t−2}[k]),
φ_pred[k] = φ_{t−1}[k] + (φ_{t−1}[k] − φ_{t−2}[k]),

where t denotes the coefficients of the current frame, t−1 those of the previous frame, and t−2 those of the frame before the previous one.

The unpredictability measure c[k] is then computed as

c[k] = dist(X[k], X_pred[k]) / (r[k] + |r_pred[k]|),

where the Euclidean distance dist(X[k], X_pred[k]) is computed with

dist(X[k], X_pred[k]) = √( (r[k]·cos φ[k] − r_pred[k]·cos φ_pred[k])² + (r[k]·sin φ[k] − r_pred[k]·sin φ_pred[k])² ).

Therefore, the unpredictability c[b] of each sub-band is the sum of the energies of all spectral lines in the sub-band weighted by their unpredictability:

c[b] = Σ_{k=k_l}^{k_h} c[k]·r²[k].

The sub-band energy e[b] and the unpredictability c[b] are each convolved with the spreading function, yielding the spread sub-band energy ecb[b] and the spread sub-band unpredictability ct[b]; the spreading function from masker band i to sub-band b is denoted s[i, b]. To remove the influence of the spreading function on the energy scaling, the spread sub-band unpredictability must be normalized; the normalized result is cb[b] = ct[b]/ecb[b]. Likewise, to remove the influence of the spreading function on the sub-band energy, the normalized energy spread is defined as en[b] = ecb[b]/n[b], where the normalization factor n[b] is

n[b] = Σ_{i=1}^{b_max} s[i, b],

and b_max is the number of sub-bands into which the frame signal is divided. From the normalized unpredictability spread, the tonality of each sub-band can be computed:

t[b] = −0.299 − 0.43·log_e(cb[b]), with 0 ≤ t[b] ≤ 1.

When t[b] = 1, the sub-band signal is a pure tone; when t[b] = 0, the sub-band signal is white noise.
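The prediction and tonality mapping above can be sketched per spectral line as follows (the spreading-function convolution is omitted, so the mapping is applied directly to the per-line unpredictability; that simplification is an assumption of this sketch):

```python
import numpy as np

def tonality(r, r1, r2, phi, phi1, phi2):
    """Per-line unpredictability from linear prediction of magnitude and
    phase, then the mapping t = -0.299 - 0.43*ln(c), clipped to [0, 1].
    r/phi are the current magnitudes and phases; r1/phi1 and r2/phi2
    come from the two previous frames."""
    r_pred = r1 + (r1 - r2)                      # extrapolated magnitude
    phi_pred = phi1 + (phi1 - phi2)              # extrapolated phase
    dist = np.hypot(r * np.cos(phi) - r_pred * np.cos(phi_pred),
                    r * np.sin(phi) - r_pred * np.sin(phi_pred))
    c = dist / (r + np.abs(r_pred) + 1e-12)      # unpredictability measure
    t = -0.299 - 0.43 * np.log(np.maximum(c, 1e-12))
    return np.clip(t, 0.0, 1.0)
```

A perfectly predicted line (zero distance) maps to tonality 1 (pure tone), while a line whose phase flips unpredictably maps to tonality 0 (noise-like).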

Step 3: compute the signal-to-noise ratio (SNR) required by each sub-band. The noise-masking-tone (NMT) value of all sub-bands is set to 5 dB and the tone-masking-noise (TMN) value to 18 dB; for the noise to remain imperceptible, the SNR required by each sub-band is SNR[b] = 18·t[b] + 6·(1 − t[b]). Step 4: compute the masking threshold of each sub-band and the perceptual entropy of the signal. From the normalized signal energy of each sub-band obtained in the preceding steps and the required signal-to-noise ratio SNR, the noise-energy threshold of each sub-band is computed as nb[b] = en[b]·10^{−SNR[b]/10}. To avoid the influence of pre-echo, the noise-energy threshold nb[b] of the current frame is compared with that of the previous frame nb_prev[b], giving the masking threshold of the signal as nb[b] = min(nb[b], 2·nb_prev[b]); this ensures that the masking threshold does not deviate because of a high-energy attack near the end of the analysis window.

Further, taking the influence of the threshold in quiet qsthr[b] into account, the final masking threshold of the signal is chosen as the larger of the threshold in quiet and the masking threshold computed above, i.e. nb[b] = max(nb[b], qsthr[b]). The perceptual entropy is then computed as

PE = Σ_b cbwidth[b]·log₁₀( (en[b] + 1) / (nb[b] + 1) ),

where cbwidth[b] denotes the number of spectral lines contained in each sub-band.

Step 5: compute the signal-to-mask ratio (SMR) of each sub-band signal. The SMR of each sub-band is SMR[b] = en[b]/nb[b].
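Steps 1 and 3-5 can be assembled into a miniature pipeline as follows. The spreading-function convolution of step 2 is omitted for brevity, so raw sub-band energies stand in for their normalized versions; this simplification is an assumption of the sketch.

```python
import numpy as np

def subband_smr(spectrum, band_edges, tonality, nb_prev, qsthr):
    """Sub-band energy from the magnitude spectrum, required SNR from
    tonality (18 dB tonal, 6 dB noise-like), noise threshold, pre-echo
    limiting against the previous frame, threshold in quiet, and SMR."""
    e = np.array([np.sum(spectrum[lo:hi] ** 2)
                  for lo, hi in band_edges])           # e[b]
    snr_db = 18.0 * tonality + 6.0 * (1.0 - tonality)  # SNR[b]
    nb = e * 10.0 ** (-snr_db / 10.0)                  # noise threshold
    nb = np.minimum(nb, 2.0 * nb_prev)                 # pre-echo control
    nb = np.maximum(nb, qsthr)                         # threshold in quiet
    return e / nb, nb                                  # SMR[b], nb[b]
```

The returned SMR values are exactly what the bit-allocation strategy described earlier consumes as its starting NMR values.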

Multi-resolution analysis is then performed on the frequency-domain coefficients. The multi-resolution analysis module 53 reorganizes the input frequency-domain data in time-frequency, raising the time resolution of the frequency-domain data at the cost of lower frequency resolution; it thereby adapts automatically to the time-frequency characteristics of fast-varying signals and suppresses pre-echo, without any need to adjust the form of the filter bank in the time-frequency mapping module 52.

Multi-resolution analysis comprises two steps, frequency-domain coefficient transformation and regrouping: the frequency-domain coefficient transformation converts the frequency-domain coefficients into time-frequency plane coefficients, and the regrouping arranges the time-frequency plane coefficients into groups according to certain rules.

The multi-resolution analysis process is illustrated below using the frequency-domain wavelet transform and the frequency-domain MDCT transform as examples.

1) Frequency-domain wavelet transform

Suppose the time sequence x(i), i = 0, 1, …, 2M − 1, yields the frequency-domain coefficients X(k), k = 0, 1, …, M − 1, after time-frequency mapping. The wavelet basis of the frequency-domain wavelet or wavelet-packet transform may be fixed or adaptive.

The multi-resolution analysis of the frequency-domain coefficients is illustrated below with the wavelet transform on the simplest wavelet basis, the Haar basis. The scaling coefficients of the Haar wavelet basis are (1/√2, 1/√2). Figure 6 shows the filtering structure of the wavelet transform with the Haar basis, where H₀ denotes low-pass filtering (with filter coefficients (1/√2, 1/√2)), H₁ denotes high-pass filtering (with filter coefficients (1/√2, −1/√2)), and "↓2" denotes downsampling by a factor of 2. The low- and mid-frequency part of the frequency-domain coefficients is not wavelet-transformed; the high-frequency part of the frequency-domain coefficients is Haar-wavelet-transformed, yielding the coefficients X₂(k), X₃(k), X₄(k), X₅(k), X₆(k) and X₇(k) of different time-frequency regions, the corresponding time-frequency plane partition being shown in Figure 7. By choosing different wavelet bases, different wavelet-transform structures can be used for the processing, giving other similar time-frequency plane partitions. The time-frequency plane partition used in signal analysis can therefore be adjusted arbitrarily as needed, meeting the analysis requirements of different time and frequency resolutions.

The above time-frequency plane coefficients are regrouped in the regrouping module according to certain rules; for example, the time-frequency plane coefficients may first be organized along the frequency direction, with the coefficients within each frequency band organized along the time direction, and the organized coefficients then arranged in the order of sub-windows and scale-factor bands.
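The frequency-domain Haar step described above can be sketched as follows. The split point between the untransformed low/mid part and the transformed high part is left as a free parameter, since the text does not fix it.

```python
import numpy as np

def haar_step(coeffs):
    """One Haar analysis step: low-pass with (1/sqrt(2), 1/sqrt(2)) and
    high-pass with (1/sqrt(2), -1/sqrt(2)), each downsampled by 2."""
    even, odd = coeffs[0::2], coeffs[1::2]
    s = np.sqrt(0.5)
    return s * (even + odd), s * (even - odd)   # (approximation, detail)

def freq_domain_haar(X, keep_low):
    """Multi-resolution analysis sketch: leave the lowest `keep_low`
    frequency-domain coefficients untouched and apply one Haar step to
    the high-frequency part, trading frequency resolution for time
    resolution there."""
    low, high = X[:keep_low], X[keep_low:]
    approx, detail = haar_step(high)
    return low, approx, detail
```

Cascading further `haar_step` calls on the high-frequency outputs reproduces the deeper time-frequency partitions of Figure 7.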

2) Frequency-domain MDCT transform

Let the frequency-domain data input to the frequency-domain MDCT filter bank be X(l), l = 0, 1, …, N − 1. An M-point MDCT is applied successively to these N points of frequency-domain data, so that the frequency resolution of the time-frequency data decreases while the time resolution increases correspondingly. By using frequency-domain MDCTs of different lengths in different frequency ranges, different time-frequency plane partitions, i.e. different time and frequency resolutions, can be obtained. The regrouping module regroups the time-frequency data output by the frequency-domain MDCT filter bank; one regrouping method is to organize the time-frequency plane coefficients first along the frequency direction, with the coefficients within each frequency band organized along the time direction, and then to arrange the organized coefficients in the order of sub-windows and scale-factor bands.

Quantization and entropy coding further comprise the two steps of non-linear quantization and entropy coding, where the quantization may be scalar quantization or vector quantization.

Scalar quantization comprises the following steps: non-linearly compress the frequency-domain coefficients in all scale-factor bands; quantize the frequency-domain coefficients of each sub-band with that sub-band's scale factor to obtain an integer-valued quantized spectrum; select the first scale factor of each frame as the common scale factor; and differentially code every other scale factor against its predecessor.

Vector quantization comprises the following steps: group the frequency-domain coefficients into a number of multi-dimensional vector signals; flatten each vector according to the flattening factor; and find, under the subjective perceptual distance measure, the codeword in the codebook with the smallest distance to the vector to be quantized, obtaining its codeword index.

The entropy-coding step comprises: entropy-code the quantized spectrum and the differentially processed scale factors to obtain codebook numbers, coded scale-factor values and the losslessly coded quantized spectrum; and entropy-code the codebook numbers to obtain the coded codebook-number values.

Or alternatively: apply one-dimensional or multi-dimensional entropy coding to the codeword indices to obtain the coded codeword-index values. The above entropy-coding methods may use any of the existing methods such as Huffman coding, arithmetic coding or run-length coding.

After quantization and entropy coding, the coded audio stream is obtained; this stream is multiplexed together with the common scale factor and the signal-type analysis result to obtain the compressed audio stream.

Figure 8 is a structural diagram of the audio decoding apparatus of the present invention. The audio decoding apparatus comprises a bitstream demultiplexing module 60, an entropy decoding module 61, a group of inverse quantizers 62, a multi-resolution synthesis module 63 and a frequency-time mapping module 64. After the compressed audio stream is demultiplexed by the bitstream demultiplexing module 60, the corresponding data signals and control signals are obtained and output to the entropy decoding module 61 and the multi-resolution synthesis module 63. The data signals and control signals are decoded in the entropy decoding module 61, recovering the quantized values of the spectrum. These quantized values are reconstructed in the group of inverse quantizers 62 to obtain the inversely quantized spectrum, which is output to the multi-resolution synthesis module 63; after multi-resolution synthesis it is output to the frequency-time mapping module 64, and the frequency-time mapping finally yields the time-domain audio signal.

The bitstream demultiplexing module 60 decomposes the compressed audio stream to obtain the corresponding data signals and control signals, providing the other modules with the decoding information they need. After the compressed audio data stream is demultiplexed, the signals output to the entropy decoding module 61 comprise the common scale factor, the coded scale-factor values, the coded codebook-number values and the losslessly coded quantized spectrum, or alternatively the coded codeword-index values; the signal-type information is output to the multi-resolution synthesis module 63.

If a scalar quantizer is used in the quantization and entropy coding module 54 of the encoding apparatus, then in the decoding apparatus the entropy decoding module 61 receives the common scale factor, coded scale-factor values, coded codebook-number values and losslessly coded quantized spectrum output by the bitstream demultiplexing module 60; it then performs codebook-number decoding, spectral-coefficient decoding and scale-factor decoding, reconstructs the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the group of inverse quantizers 62. The decoding method used by the entropy decoding module 61 corresponds to the entropy-coding method of the encoding apparatus, e.g. Huffman decoding, arithmetic decoding or run-length decoding.

After receiving the quantized values of the spectrum and the integer representation of the scale factors, the group of inverse quantizers 62 inversely quantizes the quantized spectral values into the unscaled reconstructed spectrum (the inversely quantized spectrum) and outputs it to the multi-resolution synthesis module 63. The group of inverse quantizers 62 may be uniform quantizers or non-uniform quantizers realized through a companding function. Since the quantizer group of the encoding apparatus uses scalar quantizers, the group of inverse quantizers 62 of the decoding apparatus likewise uses scalar inverse quantizers. In a scalar inverse quantizer, the quantized values of the spectrum are first non-linearly expanded, and each scale factor is then used to recover all the spectral coefficients (the inversely quantized spectrum) of the corresponding scale-factor band.
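The expand-then-scale operation of the scalar inverse quantizer can be sketched as follows, mirroring the assumed 3/4-power compression on the encoder side; the 4/3 exponent and step mapping are assumptions, not values given by the text.

```python
import numpy as np

def dequantize_subband(qspec, scalefactor):
    """Scalar inverse quantization sketch: non-linear expansion with the
    4/3 power (inverse of an assumed 3/4-power compression), followed by
    scaling with the sub-band scale factor."""
    step = 2.0 ** (scalefactor / 4.0)            # assumed step mapping
    expanded = np.sign(qspec) * np.abs(qspec) ** (4.0 / 3.0)
    return expanded * step                        # inversely quantized spectrum
```

Under these assumptions the operation inverts the encoder-side quantization up to rounding error: a quantized value of 8 at scale factor 0 expands back to 16.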

If a vector quantizer is used in the quantization and entropy coding module 54, then in the decoding apparatus the entropy decoding module 61 receives the coded codeword-index values output by the bitstream demultiplexing module 60 and decodes them with the entropy-decoding method corresponding to the entropy-coding method used at the encoder, obtaining the corresponding codeword indices. The codeword indices are output to the group of inverse quantizers 62, where the quantized values (the inversely quantized spectrum) are obtained by codebook lookup and output to the multi-resolution synthesis module 63; the group of inverse quantizers 62 here uses inverse vector quantizers. After multi-resolution synthesis, the inversely quantized spectrum is mapped by the frequency-time mapping module 64 into the time-domain audio signal. The frequency-time mapping module 64 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet-transform filter bank, a cosine-modulated filter bank, or the like.

The decoding method based on the above decoder includes: demultiplexing the compressed audio bitstream to obtain data information and control information; entropy-decoding this information to obtain the quantized spectral values; inversely quantizing the quantized spectral values to obtain the inverse-quantized spectrum; performing multi-resolution synthesis on the inverse-quantized spectrum; and then performing frequency-time mapping to obtain the time domain audio signal.

If the demultiplexed information includes encoded codebook numbers, a common scale factor, encoded scale factor values and a losslessly coded quantized spectrum, the spectral coefficients were quantized in the encoding apparatus with scalar quantization. The entropy decoding steps then include: decoding the encoded codebook numbers to obtain the codebook number of every scale factor band; decoding the quantized coefficients of every scale factor band according to the codebook indicated by its codebook number; and decoding the scale factors of every scale factor band, thereby reconstructing the quantized spectrum. The entropy decoding methods used in this process correspond to the entropy coding methods used at the encoder, such as run-length decoding, Huffman decoding or arithmetic decoding.

The entropy decoding process is illustrated below with run-length decoding of the codebook numbers, Huffman decoding of the quantized coefficients, and Huffman decoding of the scale factors as an example.

First, the codebook numbers of all scale factor bands are obtained by run-length decoding. Each decoded codebook number is an integer within some interval; if the interval is [0, 11], then only codebook numbers within this valid range, i.e. between 0 and 11, correspond to spectral-coefficient Huffman codebooks. All-zero subbands may be assigned a particular codebook number, typically 0.
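As a minimal sketch of the run-length expansion described above: the exact bitstream syntax is not given here, so the code assumes the decoded stream has already been parsed into hypothetical (codebook_number, run_length) pairs.

```python
def run_length_decode(pairs):
    """Expand (codebook_number, run_length) pairs into one codebook
    number per scale factor band.  The pair layout is an assumption for
    illustration; the patent's exact bitstream syntax is not shown here."""
    numbers = []
    for codebook, run in pairs:
        numbers.extend([codebook] * run)
    return numbers

# Bands 0-2 share codebook 5, bands 3-6 are all-zero (codebook 0):
print(run_length_decode([(5, 3), (0, 4)]))  # [5, 5, 5, 0, 0, 0, 0]
```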

After the codebook number of each scale factor band has been decoded, the spectral-coefficient Huffman codebook corresponding to that number is used to decode the quantized coefficients of all scale factor bands. If the codebook number of a scale factor band is within the valid range, in this embodiment between 1 and 11, the number corresponds to a spectral-coefficient codebook; that codebook is used to decode the codeword indices of the band's quantized coefficients from the quantized spectrum, and the quantized coefficients are then unpacked from the codeword indices. If the codebook number of a scale factor band is not between 1 and 11, it corresponds to no spectral-coefficient codebook; the band's quantized coefficients need not be decoded, and all quantized coefficients of that subband are simply set to zero.

The scale factors are used to reconstruct the spectral values from the inverse-quantized spectral coefficients; every codebook number within the valid range has a corresponding scale factor. When decoding the scale factors, the bits occupied by the first scale factor are read first; the remaining scale factors are then Huffman-decoded, yielding in turn the difference between each scale factor and its predecessor, and each difference is added to the preceding scale factor value to obtain that scale factor. If the quantized coefficients of the current subband are all zero, the scale factor of that subband need not be decoded.
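The differential reconstruction described above (first scale factor read explicitly, the rest accumulated from Huffman-decoded differences) can be sketched as:

```python
def decode_scale_factors(first_sf, deltas):
    """Rebuild scale factors from the first (explicitly coded) value and
    the decoded differences to the previous scale factor."""
    sfs = [first_sf]
    for d in deltas:
        sfs.append(sfs[-1] + d)
    return sfs

print(decode_scale_factors(60, [2, -1, 0, 3]))  # [60, 62, 61, 61, 64]
```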

After the entropy decoding process above, the quantized spectral values and the integer representations of the scale factors are obtained, and the quantized spectral values are then inversely quantized to obtain the inverse-quantized spectrum. The inverse quantization process includes: nonlinearly expanding the quantized spectral values, and applying each scale factor to obtain all spectral coefficients (the inverse-quantized spectrum) in the corresponding scale factor band.
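A sketch of such a scalar inverse quantizer follows. The patent does not spell out its companding law here, so both the 4/3 expansion exponent and the 2^((sf - offset)/4) per-band gain are assumptions borrowed from common perceptual audio coders, used only to illustrate the two steps (nonlinear expansion, then scale factor application).

```python
import math

def dequantize_band(qvals, scale_factor, sf_offset=100):
    """Nonlinear expansion |q|**(4/3) followed by the per-band gain
    2**((scale_factor - sf_offset)/4).  Exponent, gain law and offset
    are assumed values for illustration, not taken from the patent."""
    gain = 2.0 ** ((scale_factor - sf_offset) / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * gain for q in qvals]

print(dequantize_band([2, -3, 0], 100))  # sign is preserved, 0 stays 0
```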

If the demultiplexed information includes encoded codeword indices, the spectral coefficients were quantized in the encoding apparatus with vector quantization. The entropy decoding step then consists of decoding the encoded codeword indices with the entropy decoding method corresponding to the entropy coding method of the encoder to obtain the codeword indices, after which the codeword indices are inversely quantized to obtain the inverse-quantized spectrum.

Regarding the inverse-quantized spectrum: for a fast-varying signal, multi-resolution analysis was applied to the frequency domain coefficients and the multi-resolution representation was then quantized and entropy-coded; for a signal that is not of the fast-varying type, the frequency domain coefficients were quantized and entropy-coded directly.

Multi-resolution synthesis may use a frequency domain wavelet transform or a frequency domain MDCT transform. The frequency domain wavelet synthesis method includes: first reorganizing the time-frequency plane coefficients according to certain rules, and then performing a wavelet transform on the frequency domain coefficients to obtain the time-frequency plane coefficients. The MDCT method includes: first reorganizing the time-frequency plane coefficients according to certain rules, and then performing several MDCT transforms on the frequency domain coefficients to obtain the time-frequency plane coefficients. The reorganization may proceed as follows: the time-frequency plane coefficients are first organized along the frequency direction, the coefficients within each frequency band are organized along the time direction, and the organized coefficients are then arranged in the order of subwindows and scale factor bands.

The frequency-time mapping applied to the frequency domain coefficients corresponds to the time-frequency mapping of the encoding method, and may be carried out with an inverse discrete cosine transform (IDCT), an inverse discrete Fourier transform (IDFT), an inverse modified discrete cosine transform (IMDCT), an inverse wavelet transform, or similar methods.

The frequency-time mapping process is illustrated below with the inverse modified discrete cosine transform (IMDCT) as an example. It consists of three steps: the IMDCT transform, time domain windowing, and time domain overlap-add.

First, the IMDCT transform is applied to the pre-prediction spectrum or the inverse-quantized spectrum to obtain the transformed time domain signal x_{i,n}. The IMDCT transform is expressed as:

    x_{i,n} = (2/N) * sum_{k=0}^{N/2-1} spec[i][k] * cos( (2*pi/N) * (n + n0) * (k + 1/2) )

where n is the sample index with 0 <= n < N; N is the number of time domain samples, here 2048; n0 = (N/2 + 1)/2; i is the frame index; and k is the spectral index.
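The formula above can be sketched directly as reference code. This is a straightforward O(N^2) evaluation on a toy N = 8 frame; a real decoder would use an FFT-based fast IMDCT, and the function name is ours, not the patent's.

```python
import math

def imdct(spec):
    """Direct IMDCT per the formula above: N output samples from N/2
    spectral coefficients, with n0 = (N/2 + 1)/2."""
    half = len(spec)            # N/2 coefficients in
    n_samples = 2 * half        # N time samples out
    n0 = (half + 1) / 2.0
    return [
        (2.0 / n_samples) * sum(
            spec[k] * math.cos(2.0 * math.pi / n_samples
                               * (n + n0) * (k + 0.5))
            for k in range(half))
        for n in range(n_samples)
    ]

x = imdct([1.0, 0.0, 0.0, 0.0])  # toy frame with N = 8
print(len(x))  # 8
```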

Secondly, the time domain signal obtained from the IMDCT transform is windowed in the time domain. To satisfy the perfect reconstruction condition, the window function w(n) must satisfy the following two conditions:

    w(2N - 1 - n) = w(n)  and  w^2(n) + w^2(n + N) = 1.

Typical window functions include the sine window and the Kaiser-Bessel window. The present invention uses a fixed window function (its closed-form expression appears as an equation image in the original publication), where k = 0 ... N-1 indexes the k-th window coefficient, with w(k) = w(2N - 1 - k); N denotes the number of samples in a coded frame, here N = 1024. Alternatively, a biorthogonal transform with specific analysis and synthesis filters may be used to relax the above constraints on the window function.
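Since the patent's own fixed window formula is only available as an image, the two reconstruction conditions above can instead be demonstrated with the sine window, one of the typical windows named in the text:

```python
import math

N = 1024                      # samples per coded frame; window length 2N
w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

# Symmetry: w(2N-1-n) = w(n)
assert all(abs(w[2 * N - 1 - n] - w[n]) < 1e-12 for n in range(N))
# Power complementarity: w^2(n) + w^2(n+N) = 1
assert all(abs(w[n] ** 2 + w[n + N] ** 2 - 1.0) < 1e-12 for n in range(N))
print("sine window satisfies both reconstruction conditions")
```

Both checks follow from sin(pi - t) = sin(t) and sin^2(t) + cos^2(t) = 1.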

Finally, the windowed time domain signal is overlap-added to obtain the time domain audio signal. Specifically, the first N/2 samples of the signal obtained from the windowing operation are overlapped and added to the last N/2 samples of the previous frame, producing N/2 output time domain audio samples:

    timeSam_{i,n} = preSam_{i,n} + windowedSam_{i,n}

where i denotes the frame index and n the sample index, with 0 <= n < N/2 and N = 2048.
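The overlap-add step can be sketched as follows; `windowed` is assumed to hold the current frame after windowing, and `prev_half` the saved second half of the previous frame.

```python
def overlap_add(windowed, prev_half):
    """First N/2 samples of the current windowed frame plus the saved
    last N/2 samples of the previous frame give N/2 output samples; the
    current frame's second half is saved as state for the next call."""
    half = len(windowed) // 2
    out = [prev_half[n] + windowed[n] for n in range(half)]
    return out, windowed[half:]

out, state = overlap_add([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
print(out, state)  # [1.5, 2.5] [3.0, 4.0]
```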


Fig. 9 is a schematic diagram of a first embodiment of the encoding apparatus of the present invention. Building on Fig. 5, this embodiment adds a frequency domain linear prediction and vector quantization module 56 between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54; it outputs the residual sequence to the quantization and entropy coding module 54, and outputs the quantized codeword indices as side information to the bitstream multiplexing module 55.

Since multi-resolution analysis turns the frequency domain coefficients into time-frequency coefficients with a specific time-frequency plane partition, the frequency domain linear prediction and vector quantization module 56 performs linear prediction and multi-stage vector quantization on the frequency domain coefficients of each time segment.

The frequency domain coefficients output by the multi-resolution analysis module 53 are passed to the frequency domain linear prediction and vector quantization module 56. After the multi-resolution analysis of the frequency domain coefficients, a standard linear prediction analysis is performed on the frequency domain coefficients of each time segment. If the prediction gain satisfies a given condition, the frequency domain coefficients are passed through the linear prediction error filter; the resulting prediction coefficients are converted into line spectral frequency (LSF) coefficients, and the codeword indices of the codebooks at each stage are found by searching with an optimal distortion measure. The codeword indices are transmitted as side information to the bitstream multiplexing module 55, while the residual sequence obtained from the prediction analysis is output to the quantization and entropy coding module 54.

The frequency domain linear prediction and vector quantization module 56 consists of a linear prediction analyzer, a linear prediction filter, a converter and a vector quantizer. The frequency domain coefficients are fed into the linear prediction analyzer for prediction analysis, yielding the prediction gain and the prediction coefficients. Frequency domain coefficients that satisfy the required condition are filtered by the linear prediction filter to obtain the residual sequence, which is output directly to the quantization and entropy coding module 54; the prediction coefficients are converted by the converter into line spectral frequency (LSF) coefficients, and the LSF parameters are fed into the vector quantizer for multi-stage vector quantization, after which the quantized result is transmitted to the bitstream multiplexing module 55.

Frequency domain linear prediction of an audio signal effectively suppresses pre-echo and yields a considerable coding gain. For a real signal, its squared Hilbert envelope e(t) can be written as

    e(t) = F^{-1}{ C(f) (*) C*(-f) }

where C(f) is the single-sided spectrum corresponding to the positive-frequency components of the signal and (*) denotes correlation; that is, the Hilbert envelope of the signal is determined by the autocorrelation of its spectrum. Meanwhile, the power spectral density of a signal is the Fourier transform of the autocorrelation of its time domain waveform, PSD(f) = F{ r(tau) }, so the squared Hilbert envelope of the signal in the time domain and its power spectral density in the frequency domain are dual to each other. It follows that for any band-pass portion of the signal within a given frequency range, if its Hilbert envelope remains constant, then the autocorrelation of adjacent spectral values also remains constant. This means the sequence of spectral coefficients is stationary with respect to frequency, so predictive coding techniques can be applied to the spectral values, and the signal can be represented efficiently by a common set of prediction coefficients.

The encoding method based on the apparatus of Fig. 9 is essentially the same as that based on the apparatus of Fig. 5, except that the following steps are added. After the multi-resolution analysis of the frequency domain coefficients, a standard linear prediction analysis is performed on the frequency domain coefficients of each time segment to obtain the prediction gain and the prediction coefficients. If the prediction gain exceeds a preset threshold, frequency domain linear prediction error filtering is applied to the frequency domain coefficients according to the prediction coefficients, yielding the residual sequence; the prediction coefficients are converted into line spectral frequency coefficients, which are multi-stage vector quantized to produce the side information; and the residual sequence is quantized and entropy-coded. If the prediction gain does not exceed the threshold, the frequency domain coefficients themselves are quantized and entropy-coded.

After the multi-resolution analysis of the frequency domain coefficients, a standard linear prediction analysis is first performed on the frequency domain coefficients of each time segment, which includes computing the autocorrelation matrix and running the Levinson-Durbin recursion to obtain the prediction gain and the prediction coefficients. It is then checked whether the computed prediction gain exceeds the preset threshold; if it does, linear prediction error filtering is applied to the frequency domain coefficients according to the prediction coefficients; otherwise the frequency domain coefficients are left untouched and the next step, quantization and entropy coding of the frequency domain coefficients, is executed.
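A minimal sketch of the Levinson-Durbin recursion named above, returning the prediction coefficients and the final prediction error (the prediction gain is then r[0]/error). It assumes r[0] > 0 and a nondegenerate autocorrelation sequence; the sign convention is chosen so that the predictor is X(k) = sum_i a_i * X(k-i).

```python
def levinson_durbin(r, order):
    """Solve the normal equations from the autocorrelation values
    r[0..order]; return (a[1..order], prediction error)."""
    a = [0.0] * (order + 1)   # a[0] unused; a[i] multiplies X(k-i)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err          # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# AR(1) process with coefficient 0.9: r[m] = 0.9**m
coeffs, err = levinson_durbin([1.0, 0.9, 0.81, 0.729], 2)
print(coeffs, err)  # a1 ~ 0.9, a2 ~ 0, error ~ 0.19
```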

Linear prediction comes in two forms, forward prediction and backward prediction: forward prediction estimates the current value from values before a given instant, while backward prediction estimates it from values after it. Forward prediction is used below to illustrate linear prediction error filtering. The transfer function of the linear prediction error filter is

    A(z) = 1 - sum_{i=1}^{p} a_i * z^{-i}

where a_i are the prediction coefficients and p is the prediction order. Filtering the frequency domain coefficients X(k) produced by the time-frequency transform yields the prediction error E(k), also called the residual sequence, which satisfies

    E(k) = X(k) - sum_{i=1}^{p} a_i * X(k - i).

Thus, after linear prediction error filtering, the frequency domain coefficients X(k) output by the time-frequency transform can be represented by the residual sequence E(k) and a set of prediction coefficients a_i. This set of prediction coefficients a_i is converted into line spectral frequency (LSF) coefficients and multi-stage vector quantized; the vector quantizer selects an optimal distortion measure (such as the nearest-neighbor criterion) and searches for the codeword indices of the codebooks at each stage, thereby determining the codewords corresponding to the prediction coefficients, and the codeword indices are output as side information. At the same time, the residual sequence is quantized and entropy-coded. From the principles of linear predictive coding, the dynamic range of the residual sequence of the spectral coefficients is smaller than that of the original coefficients, so fewer bits can be allocated during quantization, or, for the same number of bits, an improved coding gain can be obtained.
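The error-filter relation E(k) = X(k) - sum_i a_i X(k-i) can be sketched directly; values outside the current block are taken as zero, an assumption made here only to keep the example self-contained.

```python
def prediction_error_filter(x, a):
    """E(k) = X(k) - sum_{i=1..p} a[i-1] * X(k - i), with X = 0 outside
    the block."""
    p = len(a)
    return [x[k] - sum(a[i] * x[k - 1 - i]
                       for i in range(p) if k - 1 - i >= 0)
            for k in range(len(x))]

# With a perfect one-tap predictor a = [0.5], a geometric sequence
# collapses to a single nonzero residual sample:
print(prediction_error_filter([1.0, 0.5, 0.25, 0.125], [0.5]))
# [1.0, 0.0, 0.0, 0.0]
```

This illustrates the dynamic-range argument above: the residual carries far less energy than the original sequence.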

Fig. 10 is a schematic diagram of a first embodiment of the decoding apparatus. Building on the decoding apparatus of Fig. 8, it adds an inverse frequency domain linear prediction and vector quantization module 65, located between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63. The bitstream demultiplexing module 60 feeds it the inverse frequency domain linear prediction vector quantization control information, which it uses to apply inverse quantization and inverse linear prediction filtering to the inverse-quantized spectrum (the residual spectrum), obtaining the pre-prediction spectrum, which is output to the multi-resolution synthesis module 63.

In the encoder, frequency domain linear prediction vector quantization is used to suppress pre-echo and obtain a larger coding gain. Accordingly, in the decoder, the inverse-quantized spectrum and the inverse frequency domain linear prediction vector quantization control information output by the bitstream demultiplexing module 60 are fed into the inverse frequency domain linear prediction and vector quantization module 65 to recover the spectrum as it was before linear prediction.

The inverse frequency domain linear prediction and vector quantization module 65 comprises an inverse vector quantizer, an inverse converter and an inverse linear prediction filter. The inverse vector quantizer inversely quantizes the codeword indices to obtain the line spectral frequency (LSF) coefficients; the inverse converter converts the LSF coefficients back into prediction coefficients; and the inverse linear prediction filter uses the prediction coefficients to inverse-filter the inverse-quantized spectrum, obtaining the pre-prediction spectrum, which is output to the multi-resolution synthesis module 63.

The decoding method based on the apparatus of Fig. 10 is essentially the same as that based on the apparatus of Fig. 8, except that the following steps are added. After the inverse-quantized spectrum is obtained, it is checked whether the control information indicates that the inverse-quantized spectrum must undergo inverse frequency domain linear prediction vector quantization. If so, inverse vector quantization is performed to obtain the prediction coefficients; linear prediction synthesis is applied to the inverse-quantized spectrum according to the prediction coefficients to obtain the pre-prediction spectrum; and the pre-prediction spectrum is then passed to multi-resolution synthesis.

After the inverse-quantized spectrum is obtained, the control information is used to determine whether the frame underwent frequency domain linear prediction vector quantization. If it did, the codeword indices of the vector-quantized prediction coefficients are extracted from the control information; the quantized line spectral frequency (LSF) coefficients are recovered from the codeword indices and used to compute the prediction coefficients; and linear prediction synthesis is then applied to the inverse-quantized spectrum to obtain the pre-prediction spectrum. The transfer function used in the synthesis is 1/A(z), with

    A(z) = 1 - sum_{i=1}^{p} a_i * z^{-i}

where a_i are the prediction coefficients and p is the prediction order. The residual sequence E(k) and the pre-prediction spectrum X(k) therefore satisfy:

    X(k) = E(k) + sum_{i=1}^{p} a_i * X(k - i).

In this way, the residual sequence and the computed prediction coefficients a_i pass through frequency domain linear prediction synthesis to yield the pre-prediction spectrum X(k), which is then subjected to the frequency-time mapping process.
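The synthesis relation X(k) = E(k) + sum_i a_i X(k-i) is the recursive inverse of the error filter; a minimal sketch (again taking values outside the block as zero) shows it restoring the geometric sequence from its one-sample residual:

```python
def lp_synthesis(e, a):
    """X(k) = E(k) + sum_{i=1..p} a[i-1] * X(k - i), run recursively on
    the residual; X = 0 outside the block."""
    x = []
    p = len(a)
    for k in range(len(e)):
        acc = e[k]
        for i in range(p):
            if k - 1 - i >= 0:
                acc += a[i] * x[k - 1 - i]
        x.append(acc)
    return x

print(lp_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```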

If the control information indicates that the signal frame did not undergo frequency domain linear prediction vector quantization, the inverse frequency domain linear prediction vector quantization processing is skipped, and the inverse-quantized spectrum is passed directly to the frequency-time mapping process.

Fig. 11 is a schematic diagram of a second embodiment of the encoding apparatus of the present invention. Building on Fig. 5, this embodiment adds a sum/difference (M/S) stereo coding module 57 between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54. For multichannel signals, the psychoacoustic analysis module 51 computes not only the masking thresholds of the individual audio channels but also those of the sum and difference channels, and outputs them to the quantization and entropy coding module 54. The M/S stereo coding module 57 may alternatively be located between the quantizer group and the coder inside the quantization and entropy coding module 54.

The M/S stereo coding module 57 exploits the correlation between the two channels of a channel pair, re-expressing the frequency domain coefficients or residual sequences of the left and right channels as those of the sum and difference channels, thereby reducing the bit rate and improving coding efficiency. It is therefore applicable only to multichannel signals whose channels have the same signal type. For a mono signal, or a multichannel signal whose channels have different signal types, no M/S stereo coding is performed.

The encoding method based on the apparatus of Fig. 11 is essentially the same as that based on the apparatus of Fig. 5, except that the following steps are added. Before the frequency domain coefficients are quantized and entropy-coded, it is checked whether the audio signal is multichannel. If it is, it is checked whether the left and right channel signals have the same signal type; if they do, it is checked for each pair of corresponding scale factor bands whether the M/S stereo coding condition is satisfied. If it is, M/S stereo coding is applied to obtain the frequency domain coefficients of the sum and difference channels; if not, no M/S stereo coding is performed. For a mono signal or a multichannel signal with differing signal types, the frequency domain coefficients are left unprocessed.

Besides being applied before quantization, M/S stereo coding may also be applied after quantization and before entropy coding. In that case, after the frequency domain coefficients have been quantized, it is checked whether the audio signal is multichannel; if so, whether the left and right channel signals have the same signal type; and if so, whether each pair of corresponding scale factor bands satisfies the M/S stereo coding condition, applying M/S stereo coding where it does. If the condition is not satisfied, no M/S coding is performed; for a mono signal or a multichannel signal with differing signal types, the quantized frequency domain coefficients are not M/S coded.

There are many ways to decide whether a scale factor band may be M/S coded; the method adopted in the present invention uses the K-L (Karhunen-Loeve) transform. The decision proceeds as follows. Let the spectral coefficients of a left channel scale factor band be l(k) and those of the corresponding right channel band be r(k). Their correlation matrix is

    C = | C_ll  C_lr |
        | C_rl  C_rr |

with C_ll = (1/K) * sum_k l(k)*l(k), C_lr = C_rl = (1/K) * sum_k l(k)*r(k), and C_rr = (1/K) * sum_k r(k)*r(k), where K is the number of spectral lines in the scale factor band. Applying the K-L transform to the correlation matrix C gives

    R C R^T = | lambda_0    0     |      with  R = | cos(a)  -sin(a) |
              |    0     lambda_1 |                | sin(a)   cos(a) |

where the rotation angle a satisfies tan(2a) = 2*C_lr / (C_ll - C_rr). A rotation angle of pi/4 corresponds exactly to the M/S stereo coding mode. Therefore, when the absolute value of the rotation angle a deviates only slightly from pi/4, for example 3*pi/16 < |a| < 5*pi/16, the corresponding scale factor band may be M/S coded.
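The band decision above can be sketched in a few lines. The function name is ours, and `atan2` is used (an implementation choice not spelled out in the text) so the angle stays well-defined when C_ll is close to C_rr:

```python
import math

def ms_band_decision(l, r, lo=3 * math.pi / 16, hi=5 * math.pi / 16):
    """Correlation matrix of one scale factor band, rotation angle from
    tan(2a) = 2*C_lr / (C_ll - C_rr), and the 3pi/16 < |a| < 5pi/16 test."""
    K = len(l)
    c_ll = sum(v * v for v in l) / K
    c_rr = sum(v * v for v in r) / K
    c_lr = sum(x * y for x, y in zip(l, r)) / K
    alpha = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    return lo < abs(alpha) < hi

# Nearly identical channels rotate by about pi/4 -> M/S pays off:
print(ms_band_decision([1.0, 2.0, 3.0], [1.0, 2.1, 2.9]))  # True
# Uncorrelated channels give alpha ~ 0 -> keep L/R:
print(ms_band_decision([1.0, 0.0, -1.0], [0.0, 1.0, 0.0]))  # False
```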

If M/S stereo coding is applied before quantization, the frequency domain coefficients of the left and right channels in the scale factor band are replaced, through a linear transform, by the frequency domain coefficients of the sum and difference channels:

    M = (L + R) / 2,    S = (L - R) / 2

where M denotes the sum channel frequency domain coefficients; S the difference channel frequency domain coefficients; L the left channel frequency domain coefficients; and R the right channel frequency domain coefficients.
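Applied per spectral line of a band, the transform above can be sketched as:

```python
def ms_encode(l_coeffs, r_coeffs):
    """M = (L + R)/2 and S = (L - R)/2 for each spectral line."""
    m = [(l + r) / 2.0 for l, r in zip(l_coeffs, r_coeffs)]
    s = [(l - r) / 2.0 for l, r in zip(l_coeffs, r_coeffs)]
    return m, s

m, s = ms_encode([4.0, 2.0], [2.0, -2.0])
print(m, s)  # [3.0, 0.0] [1.0, 2.0]
```

For strongly correlated channels the S coefficients are small, which is where the bit-rate saving comes from.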

If sum/difference stereo coding is applied after quantization, the quantized frequency domain coefficients of the left and right channels in the scale factor band are replaced, through the same linear transform, by the quantized frequency domain coefficients of the sum and difference channels:

    m = (l + r) / 2,    s = (l - r) / 2

where m denotes the quantized sum channel frequency domain coefficient, s denotes the quantized difference channel frequency domain coefficient, l denotes the quantized left channel frequency domain coefficient, and r denotes the quantized right channel frequency domain coefficient.

Placing sum/difference stereo coding after quantization not only effectively removes the correlation between the left and right channels, but also, because it operates on the already quantized values, makes lossless coding achievable.
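The forward transform M = (L + R)/2, S = (L - R)/2 and its inverse L = M + S, R = M - S form the conventional normalization pair; the sketch below (hypothetical helper names, a minimal float implementation) verifies that the pair round-trips exactly:

```python
def ms_encode(left, right):
    """Replace left/right coefficients by sum/difference coefficients:
    M = (L + R) / 2, S = (L - R) / 2."""
    m = [(l + r) / 2.0 for l, r in zip(left, right)]
    s = [(l - r) / 2.0 for l, r in zip(left, right)]
    return m, s

def ms_decode(m, s):
    """Inverse transform applied on the decoding side:
    L = M + S, R = M - S."""
    left = [mi + si for mi, si in zip(m, s)]
    right = [mi - si for mi, si in zip(m, s)]
    return left, right

# Round trip: the two transforms are exact inverses of each other.
m, s = ms_encode([0.4, -1.2, 3.0], [0.1, 0.7, -2.5])
left, right = ms_decode(m, s)
```

Note that on quantized integer values the division by 2 needs care (e.g. a rounding convention shared by encoder and decoder) for the scheme to remain lossless; the float version above is only meant to show the algebra.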

Figure 12 is a schematic diagram of a second embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 8, this decoding apparatus adds a sum/difference stereo decoding module 66, located between the output of the inverse quantizer bank 62 and the input of the multi-resolution synthesis module 63. It receives the signal type analysis result and the sum/difference stereo control signal output by the bitstream demultiplexing module 60, and is used to convert, according to this control information, the inverse quantized spectra of the sum and difference channels into the inverse quantized spectra of the left and right channels.

In the sum/difference stereo control signal, one flag bit indicates whether the current channel pair requires sum/difference stereo decoding; if it does, each scale factor band additionally carries a flag bit indicating whether the corresponding band requires sum/difference stereo decoding. Based on these per-band flag bits, the sum/difference stereo decoding module 66 determines whether the inverse quantized spectra in certain scale factor bands need sum/difference stereo decoding. If sum/difference stereo encoding was performed in the encoding apparatus, the decoding apparatus must apply sum/difference stereo decoding to the inverse quantized spectra.

The sum/difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum/difference stereo control signal and the signal type analysis result output by the bitstream demultiplexing module 60.

The decoding method based on the decoding apparatus shown in Figure 12 is basically the same as the decoding method based on the decoding apparatus shown in Figure 8, the difference being the following added steps: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the inverse quantized spectrum is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the inverse quantized spectra of the sum and difference channels in that band are converted into the inverse quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the inverse quantized spectrum is passed to subsequent processing without modification.

Sum/difference stereo decoding may also be performed after entropy decoding and before inverse quantization, namely: after the quantized spectral values are obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the quantized spectral values is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the quantized spectral values are passed to subsequent processing without modification.

If sum/difference stereo decoding is performed after entropy decoding and before inverse quantization, the quantized frequency domain coefficients of the left and right channels in the scale factor band are obtained from the quantized frequency domain coefficients of the sum and difference channels by the following operations:

    l = m + s,    r = m - s

where m denotes the quantized sum channel frequency domain coefficient, s denotes the quantized difference channel frequency domain coefficient, l denotes the quantized left channel frequency domain coefficient, and r denotes the quantized right channel frequency domain coefficient.

If sum/difference stereo decoding is performed after inverse quantization, the inverse quantized frequency domain coefficients of the left and right channels in the subband are obtained from the frequency domain coefficients of the sum and difference channels according to the following matrix operation:

    L = M + S,    R = M - S

where M denotes the sum channel frequency domain coefficient, S denotes the difference channel frequency domain coefficient, L denotes the left channel frequency domain coefficient, and R denotes the right channel frequency domain coefficient.

Figure 13 is a schematic structural diagram of a third embodiment of the encoding apparatus of the present invention. On the basis of Figure 9, this embodiment adds a sum/difference stereo encoding module 57, located between the output of the frequency domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54; the psychoacoustic analysis module 51 outputs the masking thresholds of the sum and difference channels to the quantization and entropy encoding module 54.

The sum/difference stereo encoding module 57 may also be located between the quantizer bank and the entropy coder within the quantization and entropy encoding module 54, receiving the signal type analysis result output by the psychoacoustic analysis module 51.

In this embodiment, the function and working principle of the sum/difference stereo encoding module 57 are the same as in Figure 11 and are not repeated here.

The encoding method based on the encoding apparatus shown in Figure 13 is basically the same as the encoding method based on the encoding apparatus shown in Figure 9, the difference being the following added steps: before the frequency domain coefficients are quantized and entropy coded, it is judged whether the audio signal is a multichannel signal. If it is, it is judged whether the signal types of the left and right channel signals are consistent; if they are, it is judged whether a scale factor band satisfies the coding condition, and if it does, sum/difference stereo coding is applied to that band; if not, no sum/difference stereo coding is performed. For a mono signal, or a multichannel signal whose channel signal types are inconsistent, no sum/difference stereo coding is performed.

Besides being applied before quantization, sum/difference stereo coding may also be applied after quantization and before entropy coding, namely: after the frequency domain coefficients are quantized, it is judged whether the audio signal is a multichannel signal. If it is, it is judged whether the signal types of the left and right channel signals are consistent; if they are, it is judged whether a scale factor band satisfies the coding condition, and if it does, sum/difference stereo coding is applied to that band; if not, no sum/difference stereo coding is performed. For a mono signal, or a multichannel signal whose channel signal types are inconsistent, no sum/difference stereo coding is performed.

Figure 14 is a structural diagram of a third embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 10, this decoding apparatus adds a sum/difference stereo decoding module 66, located between the output of the inverse quantizer bank 62 and the input of the inverse frequency domain linear prediction and vector quantization module 65; the bitstream demultiplexing module 60 outputs the sum/difference stereo control signal to it.

The sum/difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer bank 62, receiving the sum/difference stereo control signal output by the bitstream demultiplexing module 60.

In this embodiment, the function and working principle of the sum/difference stereo decoding module 66 are the same as in Figure 10 and are not repeated here.

The decoding method based on the decoding apparatus shown in Figure 14 is basically the same as the decoding method based on the decoding apparatus shown in Figure 10, the difference being the following added steps: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the inverse quantized spectrum is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the inverse quantized spectra of the sum and difference channels in that band are converted into the inverse quantized spectra of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the inverse quantized spectrum is passed to subsequent processing without modification.

Sum/difference stereo decoding may also be performed before inverse quantization, namely: after the quantized spectral values are obtained, if the signal type analysis result indicates that the signal types are consistent, the sum/difference stereo control signal is examined to judge whether sum/difference stereo decoding of the quantized spectral values is required; if so, the flag bit of each scale factor band is examined to judge whether that band requires sum/difference stereo decoding, and if it does, the quantized spectral values of the sum and difference channels in that band are converted into the quantized spectral values of the left and right channels before subsequent processing. If the signal types are inconsistent, or no sum/difference stereo decoding is required, the quantized spectral values are passed to subsequent processing without modification.

Figure 15 is a schematic diagram of a fourth embodiment of the encoding apparatus of the present invention. On the basis of the encoding apparatus shown in Figure 5, this embodiment adds a resampling module 590 and a band extension module 591. The resampling module 590 resamples the input audio signal to change its sampling rate, and outputs the resampled audio signal to the signal property analysis module 50. The band extension module 591 analyzes the input audio signal over the entire frequency band, extracts the spectral envelope of the high frequency portion and the characteristics relating it to the low frequency portion, and outputs them to the bitstream multiplexing module 55.

The resampling module 590 is used to resample the input audio signal; resampling includes both upsampling and downsampling, and downsampling is described below as an example. In this embodiment, the resampling module 590 includes a low-pass filter and a downsampler, where the low-pass filter limits the frequency band of the audio signal to eliminate the aliasing that downsampling could otherwise cause. The input audio signal is low-pass filtered and then downsampled. Suppose the input audio signal is s(n) and the output after filtering by a low-pass filter with impulse response h(n) is v(n); then

    v(n) = Σ_m h(m) · s(n - m)

Downsampling v(n) by a factor of M yields the sequence x(n) = v(M·n). In this way, the sampling rate of the resampled audio signal x(n) is M times lower than the sampling rate of the originally input audio signal.
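The filter-then-decimate step can be sketched as follows (a direct-form FIR convolution; the short example filter and helper names are assumptions, and a real codec would use a properly designed anti-aliasing filter):

```python
def lowpass_fir(x, h):
    """FIR filtering v(n) = sum_m h(m) * x(n - m): band-limits the
    signal before decimation to avoid aliasing."""
    v = []
    for n in range(len(x)):
        acc = 0.0
        for m, hm in enumerate(h):
            if 0 <= n - m < len(x):
                acc += hm * x[n - m]
        v.append(acc)
    return v

def downsample(v, factor):
    """Keep every factor-th sample: x(n) = v(M * n)."""
    return v[::factor]
```

For a downsampling factor M, the output sequence has 1/M of the original sampling rate, matching the description above.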

After the original input audio signal is input to the band extension module 591, it is analyzed over the entire frequency band, and the spectral envelope of the high frequency portion and the characteristics relating it to the low frequency portion are extracted and output to the bitstream multiplexing module 55 as band extension control information.

The basic principle of band extension is that, for most audio signals, the characteristics of the high frequency portion are strongly correlated with those of the low frequency portion, so the high frequency portion of the audio signal can be effectively reconstructed from the low frequency portion; the high frequency portion therefore need not be transmitted. To ensure that the high frequency portion can be reconstructed correctly, it suffices to transmit only a small amount of band extension control signals in the compressed audio bitstream.

The band extension module 591 includes a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; then, in the spectral envelope extraction module, the spectral envelope of the high frequency portion of the signal is estimated at a certain time-frequency resolution. To ensure that the time-frequency resolution best matches the characteristics of the current input signal, the time-frequency resolution of the spectral envelope can be chosen freely. The parameters of the input signal's spectral characteristics and the spectral envelope of the high frequency portion are output, as the band extension control signals, to the bitstream multiplexing module 55 for multiplexing.
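A minimal sketch of envelope estimation on a coarse time-frequency grid, assuming the signal has already been mapped to subband samples tf_grid[t][k] (all names and the averaging rule are illustrative assumptions):

```python
def spectral_envelope(tf_grid, band_edges, slots_per_env):
    """Estimate a spectral envelope on a coarse time-frequency grid:
    the average energy of the coefficients inside each
    (envelope-band, time-region) cell.

    tf_grid: tf_grid[t][k] = coefficient at time slot t, frequency line k
    band_edges: list of (k_start, k_stop) pairs defining envelope bands
    slots_per_env: number of time slots averaged into one envelope value
    """
    env = []
    for t0 in range(0, len(tf_grid), slots_per_env):
        region = tf_grid[t0:t0 + slots_per_env]
        row = []
        for k0, k1 in band_edges:
            cells = [frame[k] ** 2 for frame in region for k in range(k0, k1)]
            row.append(sum(cells) / len(cells))
        env.append(row)
    return env
```

Choosing wider or narrower cells corresponds to the freely selectable time-frequency resolution mentioned above.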

The bitstream multiplexing module 55 receives the code stream output by the quantization and entropy coding module 54, comprising the common scale factor, the scale factor coded values, the codebook number coded values and the losslessly coded quantized spectrum, or the coded values of codeword indices, together with the band extension control signals output by the band extension module 591, and multiplexes them to obtain the compressed audio data stream.

The encoding method based on the encoding apparatus shown in Figure 15 specifically includes: analyzing the input audio signal over the entire frequency band and extracting its high-frequency spectral envelope and signal spectral characteristic parameters as the band extension control signals; resampling the input audio signal and performing signal type analysis; calculating the signal-to-mask ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency domain coefficients of the audio signal; quantizing and entropy coding the frequency domain coefficients; and multiplexing the band extension control signals with the coded audio stream to obtain the compressed audio bitstream. The resampling process includes two steps: limiting the frequency band of the audio signal, and downsampling the band-limited audio signal by an integer factor.
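The quantization step just mentioned can be sketched for the scalar quantization variant as follows (the 3/4 power law and the scale-factor-to-step mapping are assumptions modeled on common transform codecs, not necessarily the exact mapping of the invention):

```python
def quantize_spectrum(coefs, scale_factor):
    """Scalar quantization sketch: nonlinearly compress each coefficient
    (|x|^0.75 power law, an assumption), scale by the band's scale
    factor, and round to an integer quantized spectrum."""
    step = 2.0 ** (-scale_factor / 4.0)  # assumed scale-factor-to-step mapping
    q = []
    for x in coefs:
        compressed = abs(x) ** 0.75
        sign = 1 if x >= 0 else -1
        q.append(sign * int(compressed * step + 0.5))
    return q

def differential_scale_factors(scale_factors):
    """The first scale factor of the frame serves as the common scale
    factor; each remaining one is coded as a difference to its
    predecessor before entropy coding."""
    common = scale_factors[0]
    diffs = [b - a for a, b in zip(scale_factors, scale_factors[1:])]
    return common, diffs
```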

Figure 16 is a schematic structural diagram of a fourth embodiment of the decoding apparatus. On the basis of the decoding apparatus shown in Figure 8, this embodiment adds a band extension module 68, which receives the band extension control information output by the bitstream demultiplexing module 60 and the low-band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion through spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.

The decoding method based on the decoding apparatus shown in Figure 16 is basically the same as the decoding method based on the decoding apparatus shown in Figure 8, the difference being the following added step: after the time domain audio signal is obtained, the high frequency portion of the audio signal is reconstructed from the band extension control information and the time domain audio signal, yielding a wideband audio signal.
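A highly simplified sketch of the high-frequency reconstruction idea: copy the decoded low-band spectrum upward and rescale each copied line so its energy matches the transmitted envelope. One envelope value per copied line is assumed here; a real band extension scheme groups lines into envelope bands and applies further adjustments:

```python
def reconstruct_high_band(low_band, envelope):
    """Rebuild the missing high band by transposing the decoded low-band
    spectrum upward and rescaling each copied line so that its energy
    matches the transmitted envelope value."""
    high = []
    for k, coef in enumerate(low_band):
        target = envelope[k]                # desired energy for this line
        source_energy = coef * coef or 1.0  # avoid division by zero
        high.append(coef * (target / source_energy) ** 0.5)
    return low_band + high
```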

Figures 17, 19 and 21 show the fifth to seventh embodiments of the encoding apparatus, which add a resampling module 590 and a band extension module 591 to the encoding apparatuses shown in Figures 11, 9 and 13, respectively. The connections, functions and principles of these two modules are the same as in Figure 15 and are not repeated here.

Figures 18, 20 and 22 show the fifth to seventh embodiments of the decoding apparatus, which add a band extension module 68 to the decoding apparatuses shown in Figures 12, 10 and 14, respectively. The band extension module 68 receives the band extension control information output by the bitstream demultiplexing module 60 and the low-band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion through spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.

The seven embodiments of the encoding apparatus described above may further include a gain control module, which receives the audio signal output by the signal property analysis module 50, controls the dynamic range of fast-changing type signals, and eliminates pre-echo in the audio. Its output is connected to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, and it also outputs the gain adjustment amount to the bitstream multiplexing module 55.

According to the signal type of the audio signal, the gain control module acts only on fast-changing type signals; slowly changing type signals are output directly without processing. For a fast-changing type signal, the gain control module adjusts the time-domain energy envelope of the signal, raising the gain of the signal before the fast-change point so that the time-domain signal amplitudes before and after the point become close; the time domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, while the gain adjustment amount is output to the bitstream multiplexing module 55.
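A simplified sketch of this encoder-side adjustment (the attack-point detection is assumed to have happened already, and the exact gain rule is an illustrative assumption; the decoder divides the same region by the transmitted gain to restore the envelope):

```python
def gain_control(frame, attack_index, target_ratio=0.5):
    """Raise the gain of the samples before a detected fast-change
    (attack) point so the time-domain envelope before and after the
    point becomes similar; return the adjusted frame and the applied
    gain, which is transmitted as side information."""
    before = frame[:attack_index]
    after = frame[attack_index:]
    peak_before = max((abs(x) for x in before), default=0.0)
    peak_after = max((abs(x) for x in after), default=0.0)
    if peak_before == 0.0 or peak_after <= peak_before:
        return frame, 1.0  # no attack: leave the frame untouched
    gain = target_ratio * peak_after / peak_before
    adjusted = [x * gain for x in before] + list(after)
    return adjusted, gain
```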

The encoding method based on this encoding apparatus is basically the same as the encoding methods based on the encoding apparatuses described above, the difference being the following added step: gain control is applied to the signal after signal type analysis.

The seven embodiments of the decoding apparatus described above may further include an inverse gain control module, located after the output of the frequency-time mapping module 64, which receives the signal type analysis result and the gain adjustment amount information output by the bitstream demultiplexing module 60 and is used to adjust the gain of the time domain signal and control pre-echo. After receiving the reconstructed time domain signal output by the frequency-time mapping module 64, the inverse gain control module acts on fast-changing type signals and leaves slowly changing type signals unprocessed. For a fast-changing type signal, the inverse gain control module adjusts the energy envelope of the reconstructed time domain signal according to the gain adjustment amount information, reducing the amplitude of the signal before the fast-change point and restoring the energy envelope to its original low-before, high-after shape. In this way the amplitude of the quantization noise before the fast-change point is reduced together with the signal amplitude, thereby controlling the pre-echo.

The decoding method based on this decoding apparatus is basically the same as the decoding methods based on the decoding apparatuses described above, the difference being the following added step: inverse gain control is applied to the reconstructed time domain signal.

Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention, and all such modifications shall fall within the scope of the claims of the present invention.

Claims

权利要求 Rights request 1、 一种增强音频编码装置, 包括心理声学分析模块、 时频映射模块、 量化和熵编 码模块以及比特流复用模块,其特征在于,还包括信号性质分析模块和多分辨專分析模块; 其中所述信号性质分析模块,用于对输入音频信号进行类型分析,并输出到声斤述心理声学 分析模块和所述时频映射模块,同时将音频信号的类型分析结果输出到所迷 匕特流复用模 块; An enhanced audio coding apparatus, comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, and a bitstream multiplexing module, further comprising a signal property analysis module and a multi-resolution analysis module; The signal property analysis module is configured to perform type analysis on the input audio signal, and output the signal to the psychoacoustic analysis module and the time-frequency mapping module, and output the type analysis result of the audio signal to the lost stream. Multiplexing module 所述心理声学分析模块, 用于计算音频信号的掩蔽阔值和信掩比, 并输出到所述量化 和熵编码模块;  The psychoacoustic analysis module is configured to calculate a masking threshold and a signal mask ratio of the audio signal, and output to the quantization and entropy encoding module; 所述时频映射模块,用于将时域音频信号转变成频域系数,并输出到多: ^辨率分析模 块;  The time-frequency mapping module is configured to convert a time domain audio signal into a frequency domain coefficient, and output the data to: a resolution analysis module; 所述多分辨率分析模块, 用于根据所述信号性质分析模块输出的信号类 分析结果, 对快变类型信号的频域系数进行多分辨率分析, 并输出到量化和熵编码模块;  The multi-resolution analysis module is configured to perform multi-resolution analysis on the frequency domain coefficients of the fast-changing type signal according to the signal class analysis result output by the signal property analysis module, and output the same to the quantization and entropy coding module; 所述量化和熵编码模块, 在所述心理声学分析模块输出的信掩比的控制下, 用于对频 域系数进行量化和熵编码, 并输出到所述比特流复用模块;  The quantization and entropy coding module is configured to quantize and entropy encode frequency domain coefficients under control of a mask ratio output by the psychoacoustic analysis module, and output the same to the bit stream multiplexing module; 所述比特流复用模块用于将接收到的数据进行复用, 形成音频编码码 u。  The bit stream 
multiplexing module is configured to multiplex the received data to form an audio code u. 2、 根据权利要求 1 所述的增强音频编码装置, 其特征在于, 所述多 辨率分析模 块包括频域系数变换模块和重组模块, 其中所述频域系数变换模块用于将频域系数变换为 时频平面系数; 所述重组模块用于将时频平面系数按照一定的规则进行重组; 其中所述频 域系数变换模块是频域小波变换滤波器组或频域 MDCT变换滤波器组。  2. The enhanced audio encoding apparatus according to claim 1, wherein the multi-resolution analysis module comprises a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform a frequency domain coefficient The time-frequency plane coefficient is used; the recombination module is configured to recombine the time-frequency plane coefficients according to a certain rule; wherein the frequency domain coefficient transform module is a frequency domain wavelet transform filter bank or a frequency domain MDCT transform filter bank. 3、 根据权利要求 1 所述的增强音频编码装置, 其特征在于, 还包括频域线性预测 及矢量量化模块, 位于所述多分辨率分析模块的输出与所述量化和熵编码 莫块的输入之 间; 所述频域线性预测及矢量量化模块具体由线性预测分析器、 线性预测滤波器、 转换器 和矢量量化器构成;  3. 
The enhanced audio encoding apparatus according to claim 1, further comprising a frequency domain linear prediction and vector quantization module, located at an output of said multiresolution analysis module and said input of said quantized and entropy encoded blocks The frequency domain linear prediction and vector quantization module is specifically composed of a linear prediction analyzer, a linear prediction filter, a converter, and a vector quantizer; 所述线性预测分析器, 用于对频域系数进行预测分析, 得到预测增益和 页测系数, 并 将满足一定奈件的频域系数输出到所述线性预测滤波器; 对于不满足条件的频域系数直接 输出到所述量化和熵编码模块;  The linear predictive analyzer is configured to perform predictive analysis on frequency domain coefficients, obtain prediction gain and page measurement coefficients, and output frequency domain coefficients satisfying a certain condition to the linear prediction filter; The domain coefficients are directly output to the quantization and entropy coding module; 所述线性预测滤波器, 用于对频域系数进行滤波, 得到残差序列, 并脊残差序列输出 到所述量化和熵编码模块, 将预测系数输出到转换器;  The linear prediction filter is configured to filter the frequency domain coefficients to obtain a residual sequence, and the ridge residual sequence is output to the quantization and entropy coding module, and the prediction coefficients are output to the converter; 所述转换器, 用于将预测系数转换成线谱对频率系数; 所述矢量量化器, 用于对线錯对频率系数进行多级矢量量化, 量化得到的有关边信息 被传送到所述比特流复用模块。 The converter is configured to convert a prediction coefficient into a line spectrum pair frequency coefficient; The vector quantizer is configured to perform multi-level vector quantization on the line error pair frequency coefficient, and the quantized related side information is transmitted to the bit stream multiplexing module. 
4. The enhanced audio encoding apparatus according to any one of claims 1 to 3, characterized by further comprising a sum/difference stereo encoding module, located between the output of the frequency domain linear prediction and vector quantization module and the input of the quantization and entropy coding module, or between the quantizer bank and the coder within the quantization and entropy coding module; the signal property analysis module outputs the signal type analysis result to it; and the sum/difference stereo encoding module is configured to convert the residual sequences/frequency domain coefficients of the left and right channels into the residual sequences/frequency domain coefficients of the sum and difference channels.

5. The enhanced audio encoding apparatus according to any one of claims 1 to 4, characterized by further comprising a resampling module and a band extension module, wherein:

the resampling module is configured to resample the input audio signal, change the sampling rate of the audio signal, and output the audio signal with the changed sampling rate to the psychoacoustic analysis module and the signal property analysis module; it specifically comprises a low-pass filter and a downsampler, wherein the low-pass filter is configured to limit the frequency band of the audio signal, and the downsampler is configured to downsample the band-limited audio signal to reduce its sampling rate;

the band extension module is configured to analyze the input audio signal over the entire frequency band, extract the spectral envelope of the high frequency portion and the parameters characterizing the correlation between the low and high frequency spectra, and output them to the bitstream multiplexing module; it specifically comprises a parameter extraction module and a spectral envelope extraction module, wherein the parameter extraction module is configured to extract parameters representing the spectral characteristics of the input signal in different time-frequency regions, and the spectral envelope extraction module is configured to estimate the spectral envelope of the high frequency portion of the signal at a certain time-frequency resolution and then output the parameters of the input signal's spectral characteristics and the spectral envelope of the high frequency portion to the bitstream multiplexing module.

6. An enhanced audio encoding method, characterized by comprising the following steps:

step 1: performing type analysis on the input audio signal, the signal type analysis result forming part of the multiplexed information;

step 2: performing time-frequency mapping on the type-analyzed signal to obtain the frequency domain coefficients of the audio signal, and meanwhile calculating the signal-to-mask ratio of the audio signal;

step 3: if the signal is of the fast-changing type, performing multi-resolution analysis on the frequency domain coefficients; otherwise, proceeding to step 4;

step 4: quantizing and entropy coding the frequency domain coefficients under the control of the signal-to-mask ratio;

step 5: multiplexing the coded audio signal to obtain the compressed audio bitstream.
7. The enhanced audio encoding method according to claim 6, characterized in that the quantization in Step 4 is scalar quantization, which specifically comprises: nonlinearly compressing the frequency domain coefficients in all scale-factor bands; quantizing the frequency domain coefficients of each sub-band with the scale factor of that sub-band to obtain a quantized spectrum represented by integers; selecting the first scale factor of each frame of the signal as the common scale factor; and differentially coding every other scale factor against its preceding scale factor; the entropy coding comprises: entropy-coding the quantized spectrum and the differentially processed scale factors to obtain the codebook indices, the coded scale-factor values and the lossless coded values of the quantized spectrum; and entropy-coding the codebook indices to obtain the coded codebook index values.

8. The enhanced audio encoding method according to claim 6, characterized in that the multi-resolution analysis of Step 3 comprises: performing an MDCT transform on the frequency domain coefficients to obtain time-frequency plane coefficients; and regrouping the time-frequency plane coefficients according to a given rule; wherein the regrouping method comprises: first organizing the time-frequency plane coefficients in the frequency direction, organizing the coefficients within each frequency band in the time direction, and then arranging the organized coefficients in the order of sub-windows and scale-factor bands.
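Claim 7's scalar quantization can be sketched in a few lines. The 3/4-power companding and the 2^(-sf/4) step-size rule below are assumptions borrowed from AAC-style coders, not constants given by the claim; only the overall shape (nonlinear compression, per-band scale factors, differential scale-factor coding against a common first value) comes from the text:

```python
import math

def quantize_band(coeffs, scale_factor):
    """Nonlinearly compress then quantize one scale-factor band (AAC-style sketch).
    The 0.75 exponent and 2**(-sf/4) step size are assumed, not from the claim."""
    step = 2.0 ** (-scale_factor / 4.0)
    return [int(math.copysign(round(abs(x) ** 0.75 * step), x)) for x in coeffs]

def differential_scale_factors(sfs):
    """First scale factor of the frame is the common one; the rest are coded
    as differences against their predecessor, as claim 7 describes."""
    common = sfs[0]
    diffs = [sfs[i] - sfs[i - 1] for i in range(1, len(sfs))]
    return common, diffs
```

The differential values then feed the entropy coder, which in this scheme sees mostly small integers.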
9. The enhanced audio encoding method according to any one of claims 6 to 8, characterized in that, between Step 3 and Step 4, the method further comprises: performing standard linear prediction analysis on the frequency domain coefficients to obtain a prediction gain and prediction coefficients; judging whether the prediction gain exceeds a set threshold; if it does, performing frequency domain linear prediction error filtering on the frequency domain coefficients according to the prediction coefficients to obtain the linear prediction residual sequence of the frequency domain coefficients; converting the prediction coefficients into line spectrum pair frequency coefficients, and performing multi-stage vector quantization on the line spectrum pair frequency coefficients to obtain side information; quantizing and entropy-coding the residual sequence; if the prediction gain does not exceed the set threshold, quantizing and entropy-coding the frequency domain coefficients directly.

10. The enhanced audio encoding method according to any one of claims 6 to 9, characterized in that Step 4 further comprises: quantizing the frequency domain coefficients; judging whether the audio signal is a multi-channel signal; if it is a multi-channel signal, judging whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, judging whether the corresponding scale-factor bands of the two channels satisfy the sum/difference stereo coding condition; if they do, performing sum/difference stereo coding on the spectral coefficients in that scale-factor band to obtain the frequency domain coefficients of the sum and difference channels; if they do not, not performing sum/difference stereo coding on the spectral coefficients in that scale-factor band; if the signal is a mono signal or a multi-channel signal with inconsistent signal types, not processing the frequency domain coefficients; and entropy-coding the frequency domain coefficients; wherein
the method for judging whether a scale-factor band satisfies the coding condition is a K-L transform, specifically: calculating the correlation matrix of the spectral coefficients of the left and right channel scale-factor bands; performing a K-L transform on the correlation matrix; if the absolute value of the rotation angle α deviates only slightly from π/4, e.g. 3π/16 < |α| < 5π/16, the corresponding scale-factor band can be sum/difference stereo coded;
the sum/difference stereo coding is given by the equations shown in image imgf000032_0001, in which the four symbols denote, respectively, the quantized sum channel frequency domain coefficients, the quantized difference channel frequency domain coefficients, the quantized left channel frequency domain coefficients and the quantized right channel frequency domain coefficients.
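The K-L-transform test of claim 10 can be sketched as follows. The closed-form rotation angle α = ½·atan2(2·r_LR, r_LL − r_RR) of the 2×2 correlation matrix and the M = (L+R)/2, S = (L−R)/2 transform are assumptions (the claim's own equations survive only as images); the 3π/16 < |α| < 5π/16 gate is taken from the claim:

```python
import math

def kl_rotation_angle(left, right):
    """Rotation angle of the 2x2 correlation matrix of one scale-factor band
    (closed form assumed; the claim only says 'K-L transform')."""
    r_ll = sum(x * x for x in left)
    r_rr = sum(x * x for x in right)
    r_lr = sum(x * y for x, y in zip(left, right))
    return 0.5 * math.atan2(2.0 * r_lr, r_ll - r_rr)

def ms_allowed(left, right):
    """Sum/difference coding is enabled when |alpha| is close to pi/4."""
    alpha = abs(kl_rotation_angle(left, right))
    return 3 * math.pi / 16 < alpha < 5 * math.pi / 16

def ms_encode(left, right):
    """Conventional sum/difference transform, assumed for the image-only equations."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side
```

Highly correlated bands give α near π/4 and pass the gate; decorrelated bands give α near 0 or π/2 and are left in L/R form.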
11. The enhanced audio encoding method according to any one of claims 6 to 10, characterized in that, before Step 1, the method further comprises a resampling step and a band extension step; the resampling step resamples the input audio signal and changes its sampling rate; the band extension step analyzes the input audio signal over the whole frequency band and extracts its high-frequency spectral envelope and signal spectral characteristic parameters as part of the multiplexed signal.

12. An enhanced audio decoding apparatus, comprising a bitstream demultiplexing module, an entropy decoding module, an inverse quantizer bank and a frequency-time mapping module, characterized in that it further comprises a multi-resolution synthesis module; the bitstream demultiplexing module is configured to demultiplex the compressed audio data stream and output the corresponding data signals and control signals to the entropy decoding module and the multi-resolution synthesis module; the entropy decoding module is configured to decode said signals, recover the quantized values of the spectrum, and output them to the inverse quantizer bank; the inverse quantizer bank is configured to reconstruct the inverse-quantized spectrum and output it to the multi-resolution synthesis module; the multi-resolution synthesis module is configured to perform multi-resolution synthesis on the inverse-quantized spectrum and output the result to the frequency-time mapping module; the frequency-time mapping module is configured to perform frequency-time mapping on the spectral coefficients and output a time domain audio signal.

13. The enhanced audio decoding apparatus according to claim 12, characterized in that the multi-resolution synthesis module comprises a coefficient regrouping module and a coefficient transform module; the coefficient transform module is a frequency domain inverse wavelet transform filter bank or a frequency domain inverse modified discrete cosine transform filter bank.

14. The enhanced audio decoding apparatus according to claim 12 or 13, characterized in that it further comprises an inverse frequency domain linear prediction and vector quantization module located between the output of said inverse quantizer bank and the input of said multi-resolution synthesis module; the inverse frequency domain linear prediction and vector quantization module specifically comprises an inverse vector quantizer, an inverse converter and an inverse linear prediction filter; the inverse vector quantizer is configured to inversely quantize the codeword indices to obtain line spectrum pair frequency coefficients; the inverse converter is configured to convert the line spectrum pair frequency coefficients back into prediction coefficients; the inverse linear prediction filter is configured to inversely filter the inverse-quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction.
15. The enhanced audio decoding apparatus according to any one of claims 12 to 14, characterized in that it further comprises a sum/difference stereo decoding module located after the inverse quantizer bank or between the output of the entropy decoding module and the input of the inverse quantizer bank; it receives the sum/difference stereo control signal output by the bitstream demultiplexing module, and is configured to convert the inverse-quantized spectra / quantized spectral values of the sum and difference channels into the inverse-quantized spectra / quantized spectral values of the left and right channels according to the sum/difference stereo control information.

16. An enhanced audio decoding method, characterized by comprising the following steps:
Step 1: demultiplex the compressed audio data stream to obtain data information and control information;
Step 2: entropy-decode said information to obtain the quantized values of the spectrum;
Step 3: inversely quantize the quantized values of the spectrum to obtain the inverse-quantized spectrum;
Step 4: perform multi-resolution synthesis on the inverse-quantized spectrum;
Step 5: perform frequency-time mapping to obtain a time domain audio signal.

17. The enhanced audio decoding method according to claim 16, characterized in that the multi-resolution synthesis of Step 4 specifically comprises: arranging the inverse-quantized spectral coefficients in the order of sub-windows and scale-factor bands, regrouping them in frequency order, and then performing multiple inverse modified discrete cosine transforms on the regrouped coefficients to obtain the inverse-quantized spectrum as it was before multi-resolution analysis.

18. The enhanced audio decoding method according to claim 16, characterized in that Step 5 may further comprise: performing an inverse modified discrete cosine transform to obtain the transformed time domain signal; windowing the transformed time domain signal in the time domain; and overlap-adding the windowed time domain signals to obtain the time domain audio signal; wherein the window function in the windowing process is:
w(k) = cos(π/2 × ((k+0.5)/N − 0.94 × sin(2π/N × (k+0.5)) / (2π))), where k = 0, ..., N−1; w(k) denotes the k-th coefficient of the window function, with w(k) = w(2N−1−k); N denotes the number of samples of an encoded frame.
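The window of claim 18 can be sketched numerically. The formula below is a reading of the heavily garbled original (the 0.94 constant is from the text, but the exact grouping of terms is an assumption); the second half of the 2N-point window follows from the stated mirror property w(k) = w(2N−1−k):

```python
import math

def synthesis_window(n_samples):
    """First half of the 2*N-point window as read from claim 18,
    then mirrored via w(k) = w(2*N - 1 - k)."""
    n = n_samples
    half = [
        math.cos(math.pi / 2 * ((k + 0.5) / n
                 - 0.94 * math.sin(2 * math.pi / n * (k + 0.5)) / (2 * math.pi)))
        for k in range(n)
    ]
    return half + half[::-1]  # enforces the symmetry w(k) == w(2*N - 1 - k)
```

Symmetry of the full window is what makes the overlap-add of consecutive inverse-MDCT frames well defined.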
19. The enhanced audio decoding method according to claim 17 or 18, characterized in that, between Step 3 and Step 4, the method further comprises: judging whether the control information indicates that the inverse-quantized spectrum needs to undergo inverse frequency domain linear prediction and vector quantization; if it does, performing inverse vector quantization to obtain the prediction coefficients, performing linear prediction synthesis on the inverse-quantized spectrum using the prediction coefficients to obtain the spectrum before prediction, and performing frequency-time mapping on the spectrum before prediction; wherein the inverse vector quantization further comprises: obtaining from the control information the codeword indices of the vector-quantized prediction coefficients; then obtaining the quantized line spectrum pair frequency coefficients from the codeword indices, and computing the prediction coefficients from them.
20. The enhanced audio decoding method according to any one of claims 16 to 19, characterized in that, between Step 2 and Step 3, the method further comprises: if the signal type analysis result indicates that the signal types are consistent, judging from the sum/difference stereo control signal whether sum/difference stereo decoding needs to be performed on the inverse-quantized spectrum; if it does, judging from the flag bit of each scale-factor band whether that scale-factor band needs sum/difference stereo decoding, and if it does, converting the inverse-quantized spectra of the sum and difference channels in that scale-factor band into the inverse-quantized spectra of the left and right channels, then going to Step 3; if the signal types are inconsistent or sum/difference stereo decoding is not needed, not processing the inverse-quantized spectrum and going to Step 3;
wherein the sum/difference stereo decoding is given by the equations shown in image imgf000034_0001, in which the four symbols denote, respectively, the quantized sum channel frequency domain coefficients, the quantized difference channel frequency domain coefficients, the quantized left channel frequency domain coefficients and the quantized right channel frequency domain coefficients.
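A sketch of the per-band decoding decision in claim 20. Since the claim's equations are preserved only as images, the inverse transform below assumes the conventional encoder-side M = (L+R)/2, S = (L−R)/2, which inverts to L = M + S, R = M − S:

```python
def ms_decode(mid, side):
    """Recover left/right from sum/difference channels (inverse of the assumed
    encoder-side M = (L+R)/2, S = (L-R)/2)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def decode_band(mid, side, ms_flag):
    """Apply sum/difference decoding only where the band's flag bit says so,
    as claim 20 requires; otherwise pass the spectra through unchanged."""
    return ms_decode(mid, side) if ms_flag else (mid, side)
```

Because the transform is its own scaled inverse, a band encoded with the assumed forward transform round-trips exactly.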
PCT/CN2005/000440 2004-04-01 2005-04-01 Enhanced audio encoding/decoding device and method Ceased WO2005096273A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP05742018A EP1873753A1 (en) 2004-04-01 2005-04-01 Enhanced audio encoding/decoding device and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200410030946.9 2004-04-01
CN200410030946 2004-04-01

Publications (1)

Publication Number Publication Date
WO2005096273A1 true WO2005096273A1 (en) 2005-10-13

Family

ID=35064017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2005/000440 Ceased WO2005096273A1 (en) 2004-04-01 2005-04-01 Enhanced audio encoding/decoding device and method

Country Status (2)

Country Link
EP (1) EP1873753A1 (en)
WO (1) WO2005096273A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006081A1 (en) * 2007-06-27 2009-01-01 Samsung Electronics Co., Ltd. Method, medium and apparatus for encoding and/or decoding signal
CN108962266A (en) * 2014-03-24 2018-12-07 杜比国际公司 To the method and apparatus of high-order clear stereo signal application dynamic range compression
CN112530444A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Audio encoding method and apparatus
US20240283945A1 (en) * 2016-09-30 2024-08-22 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
US20240339119A1 (en) * 2014-05-01 2024-10-10 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US12431148B2 (en) 2014-03-31 2025-09-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding device, decoding device, encoding method, decoding method, and non-transitory computer-readable recording medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101756834B1 (en) 2008-07-14 2017-07-12 삼성전자주식회사 Method and apparatus for encoding and decoding of speech and audio signal
TWI430263B (en) * 2009-10-20 2014-03-11 弗勞恩霍夫爾協會 Audio signal encoder, audio signal decoder, method of encoding or decoding an audio signal using aliasing cancellation
EP2372704A1 (en) * 2010-03-11 2011-10-05 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Signal processor and method for processing a signal
JP5704018B2 (en) * 2011-08-05 2015-04-22 富士通セミコンダクター株式会社 Audio signal encoding method and apparatus
CN110706715B (en) 2012-03-29 2022-05-24 华为技术有限公司 Method and apparatus for encoding and decoding signal
WO2014046916A1 (en) * 2012-09-21 2014-03-27 Dolby Laboratories Licensing Corporation Layered approach to spatial audio coding
WO2020039000A1 (en) 2018-08-21 2020-02-27 Dolby International Ab Coding dense transient events with companding
WO2025227292A1 (en) * 2024-04-28 2025-11-06 北京小米移动软件有限公司 Audio processing method and apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5537510A (en) * 1994-12-30 1996-07-16 Daewoo Electronics Co., Ltd. Adaptive digital audio encoding apparatus and a bit allocation method thereof
US5613035A (en) * 1994-01-18 1997-03-18 Daewoo Electronics Co., Ltd. Apparatus for adaptively encoding input digital audio signals from a plurality of channels
CN1388517A (en) * 2002-06-05 2003-01-01 北京阜国数字技术有限公司 Audio coding/decoding technology based on pseudo wavelet filtering
CN1461112A (en) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 Quantized voice-frequency coding method based on minimized global noise masking ratio criterion and entropy coding

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5613035A (en) * 1994-01-18 1997-03-18 Daewoo Electronics Co., Ltd. Apparatus for adaptively encoding input digital audio signals from a plurality of channels
US5537510A (en) * 1994-12-30 1996-07-16 Daewoo Electronics Co., Ltd. Adaptive digital audio encoding apparatus and a bit allocation method thereof
CN1388517A (en) * 2002-06-05 2003-01-01 北京阜国数字技术有限公司 Audio coding/decoding technology based on pseudo wavelet filtering
CN1461112A (en) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 Quantized voice-frequency coding method based on minimized global noise masking ratio criterion and entropy coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
P. P. VAIDYANATHAN: "Multirate Systems and Filter Banks", 1993, PRENTICE HALL

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006081A1 (en) * 2007-06-27 2009-01-01 Samsung Electronics Co., Ltd. Method, medium and apparatus for encoding and/or decoding signal
US12273696B2 (en) 2014-03-24 2025-04-08 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
CN108962266B (en) * 2014-03-24 2023-08-11 杜比国际公司 Method and apparatus for applying dynamic range compression to high order hi-fi stereo signals
US11838738B2 (en) 2014-03-24 2023-12-05 Dolby Laboratories Licensing Corporation Method and device for applying Dynamic Range Compression to a Higher Order Ambisonics signal
CN108962266A (en) * 2014-03-24 2018-12-07 杜比国际公司 To the method and apparatus of high-order clear stereo signal application dynamic range compression
US12431148B2 (en) 2014-03-31 2025-09-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding device, decoding device, encoding method, decoding method, and non-transitory computer-readable recording medium
US20240339119A1 (en) * 2014-05-01 2024-10-10 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US12431151B2 (en) * 2014-05-01 2025-09-30 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US20240283945A1 (en) * 2016-09-30 2024-08-22 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
US12309395B2 (en) * 2016-09-30 2025-05-20 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
CN112530444A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Audio encoding method and apparatus
CN112530444B (en) * 2019-09-18 2023-10-03 华为技术有限公司 Audio coding method and device
US12057129B2 (en) 2019-09-18 2024-08-06 Huawei Technologies Co., Ltd. Audio coding method and apparatus

Also Published As

Publication number Publication date
EP1873753A1 (en) 2008-01-02

Similar Documents

Publication Publication Date Title
WO2005096274A1 (en) An enhanced audio encoding/decoding device and method
EP1914724B1 (en) Dual-transform coding of audio signals
CN110310659B (en) Apparatus and method for decoding or encoding audio signal using reconstructed band energy information value
JP4081447B2 (en) Apparatus and method for encoding time-discrete audio signal and apparatus and method for decoding encoded audio data
CN101276587B (en) Audio encoding apparatus and method thereof, audio decoding device and method thereof
AU2006332046B2 (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
KR101161866B1 (en) Audio coding apparatus and method thereof
JP5395917B2 (en) Multi-channel digital speech coding apparatus and method
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
US9037454B2 (en) Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
US20100023336A1 (en) Compression of audio scale-factors by two-dimensional transformation
CN102436819B (en) Wireless audio compression and decompression methods, audio coder and audio decoder
EP1612772A1 (en) Low-bitrate encoding/decoding method and system
CN103329197A (en) Improved stereo parametric encoding/decoding for channels in phase opposition
WO2006003891A1 (en) Audio signal decoding device and audio signal encoding device
KR19990041073A (en) Audio encoding / decoding method and device with adjustable bit rate
KR20080035454A (en) Fast Lattice Vector Quantization
CN103366750B (en) A kind of sound codec devices and methods therefor
WO2005096273A1 (en) Enhanced audio encoding/decoding device and method
CN101162584A (en) Method and device for encoding and decoding audio signals using bandwidth extension technology
CN101241701A (en) audio decoding
CN1677492A (en) Intensified audio-frequency coding-decoding device and method
WO2005096508A1 (en) Enhanced audio encoding and decoding equipment, method thereof
CN100555413C (en) Method and device for scalable encoding and decoding of audio data
WO2006056100A1 (en) Coding/decoding method and device utilizing intra-channel signal redundancy

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC - FORM EPO 1205A DATED 21-03-2007

WWE Wipo information: entry into national phase

Ref document number: 2005742018

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2005742018

Country of ref document: EP