
WO2005096273A1 - Improvements to an audio encoding/decoding method and device - Google Patents

Improvements to an audio encoding/decoding method and device

Info

Publication number
WO2005096273A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
signal
frequency
frequency domain
inverse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2005/000440
Other languages
English (en)
Chinese (zh)
Inventor
Xingde Pan
Dietz Martin
Andreas Ehret
Holger HÖRICH
Xiaoming Zhu
Michael Schug
Weimin Ren
Lei Wang
Hao Deng
Fredrik Henn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING E-WORLD TECHNOLOGY Co Ltd
BEIJING MEDIA WORKS Co Ltd
Coding Technologies Sweden AB
Original Assignee
BEIJING E-WORLD TECHNOLOGY Co Ltd
BEIJING MEDIA WORKS Co Ltd
Coding Technologies Sweden AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING E-WORLD TECHNOLOGY Co Ltd, BEIJING MEDIA WORKS Co Ltd, Coding Technologies Sweden AB filed Critical BEIJING E-WORLD TECHNOLOGY Co Ltd
Priority to EP05742018A priority Critical patent/EP1873753A1/fr
Publication of WO2005096273A1 publication Critical patent/WO2005096273A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components

Definitions

  • the present invention relates to the field of audio codec technology, and in particular to an enhanced audio codec device and method based on a perceptual model. Background Art
  • the digital audio signal is audio encoded or audio compressed for storage and transmission.
  • the purpose of encoding an audio signal is to achieve a transparent representation of the audio signal with as few bits as possible, i.e. so that there is little perceptible difference between the originally input audio signal and the decoded output audio signal.
  • the advent of CDs represented the many advantages of digitally representing audio signals, such as high fidelity, large dynamic range, and robustness.
  • these advantages are at the expense of high data rates.
  • a CD-quality stereo signal requires a sampling rate of 44.1 kHz, with each sample value uniformly quantized to 16 bits, so that the uncompressed data rate reaches 1.41 Mb/s.
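As a quick check on these figures, the uncompressed rate follows directly from the sampling parameters:

```python
# Uncompressed data rate of CD-quality stereo audio.
sample_rate = 44_100      # Hz
bits_per_sample = 16      # uniform quantization per sample
channels = 2              # stereo

rate_bps = sample_rate * bits_per_sample * channels
print(rate_bps / 1e6)     # 1.4112 Mb/s
```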
  • Such a high data rate brings great inconvenience to the transmission and storage of data, especially in the case of multimedia applications and wireless transmission applications, and is limited by bandwidth and cost.
  • new network and wireless multimedia digital audio systems therefore require reducing the data rate without compromising the quality of the audio.
  • MPEG-1 and MPEG-2 BC are high-quality audio coding technologies intended primarily for mono and stereo audio signals; multi-channel audio coding that achieves higher quality at lower bit rates remained needed.
  • because MPEG-2 BC encoding emphasizes backward compatibility with MPEG-1 technology, it cannot achieve high-quality encoding of five channels at bit rates below 540 kbps.
  • MPEG-2 AAC technology was therefore proposed, which can achieve high-quality encoding of five-channel signals at a rate of 320 kbps.
  • Figure 1 shows a block diagram of an MPEG-2 AAC encoder, comprising a gain controller 101, a filter bank 102, a time-domain noise shaping module 103, an intensity/coupling module 104, a psychoacoustic model, a second-order backward adaptive predictor 105, a sum/difference stereo (M/S) module 106, a bit allocation and quantization coding module 107, and a bitstream multiplexing module 108.
  • the filter bank 102 employs a modified discrete cosine transform (MDCT) whose resolution is signal-adaptive: a 2048-point MDCT is used for steady-state signals, and a 256-point MDCT for transient signals. Thus, for a 48 kHz signal, the maximum frequency resolution is 23 Hz and the maximum time resolution is 2.6 ms.
  • either a sine window or a Kaiser-Bessel window can be used in the filter bank 102: the sine window is used when the harmonic spacing of the input signal is less than 140 Hz, and the Kaiser-Bessel window when the spacing of strong spectral components is greater than 220 Hz.
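The resolution figures quoted above can be reproduced from the window lengths (a sketch, assuming the usual MDCT hop of half the window length):

```python
fs = 48_000  # sampling rate in Hz

# Long window: 2048-point MDCT -> fine frequency resolution.
freq_resolution_hz = fs / 2048             # ~23.4 Hz (quoted as 23 Hz)

# Short window: 256-point MDCT, hop of 128 samples -> fine time resolution.
time_resolution_ms = (256 / 2) / fs * 1e3  # ~2.67 ms (quoted as 2.6 ms)
```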
  • after passing through the gain controller 101, the audio signal enters the filter bank 102, which filters it according to the signal type; the spectral coefficients output by the filter bank 102 are then processed by the time-domain noise shaping module 103.
  • the time-domain noise shaping technique performs linear prediction analysis on the spectral coefficients in the frequency domain, and then controls the quantization noise in the time domain according to that analysis, thereby controlling pre-echo.
  • the intensity/coupling module 104 performs stereo encoding of signal intensity. For high frequency bands (above 2 kHz), the perceived direction of sound depends on changes in signal intensity (the signal envelope) rather than on the signal waveform; a constant-envelope signal has no influence on the perceived direction. This property, together with the correlation between channels, allows several channels to be combined into one common channel for encoding, which forms the intensity/coupling technique.
  • the second-order backward adaptive predictor 105 is used to eliminate redundancy of the steady state signal and improve coding efficiency.
  • the sum/difference stereo (M/S) module 106 operates on a channel pair, i.e. two channels corresponding to the left and right channels or the left and right surround channels of a two-channel or multi-channel signal.
  • the M/S module 106 utilizes the correlation between the two channels of the channel pair to achieve the effect of reducing the code rate and improving the coding efficiency.
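The M/S idea can be illustrated with the usual sum/difference matrixing (a minimal sketch; the exact scaling used by the codec is not specified here):

```python
def ms_encode(left, right):
    """Convert an L/R channel pair into mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover the original L/R pair from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When the two channels are highly correlated, the side channel is close to zero and can be coded with very few bits, which is exactly the rate saving the module exploits.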
  • the bit allocation and quantization coding module 107 is implemented as a nested loop in which a non-uniform quantizer performs lossy coding and an entropy coding module performs lossless coding, removing redundancy and reducing correlation.
  • the nested loop includes an inner loop and an outer loop: the inner loop adjusts the step size of the non-uniform quantizer until the available bits are used up, and the outer loop estimates the coding quality of the frame using the ratio of quantization noise to masking threshold.
  • the encoded signal finally forms an encoded audio stream output through the bitstream multiplexing module 108.
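The inner loop of this nested structure can be sketched as follows; the uniform quantizer and the crude bit-count estimate here are illustrative stand-ins for AAC's non-uniform quantizer and Huffman tables. An outer loop would then compare the resulting quantization noise per scale-factor band against the masking threshold and refine bands where it is exceeded.

```python
import math

def quantize(coeffs, step):
    """Uniform quantizer (stand-in for the codec's non-uniform one)."""
    return [round(c / step) for c in coeffs]

def count_bits(q):
    """Crude codeword-length estimate: magnitude bits plus a sign bit."""
    return sum(math.ceil(math.log2(abs(v) + 1)) + 1 for v in q)

def inner_loop(coeffs, bit_budget):
    """Widen the quantizer step until the spectrum fits the bit budget."""
    step = 1.0
    while count_bits(quantize(coeffs, step)) > bit_budget:
        step *= 1.25   # coarser quantization -> fewer bits, more noise
    return step, quantize(coeffs, step)
```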
  • the input signal is simultaneously split into four equal-bandwidth bands by a four-band polyphase quadrature filter bank (PQF), and each band uses an MDCT to generate 256 spectral coefficients, for a total of 1024.
  • a gain controller 101 is used in each band.
  • the high frequency PQF band can be ignored to obtain a low sampling rate signal.
  • FIG. 2 shows a block diagram of the corresponding MPEG-2 AAC decoder.
  • the decoder includes a bitstream demultiplexing module 201, a lossless decoding module 202, an inverse quantizer 203, a scale factor module 204, a sum/difference stereo (M/S) module 205, a prediction module 206, an intensity/coupling module 207, a time-domain noise shaping module 208, a filter bank 209, and a gain control module 210.
  • the encoded audio stream is demultiplexed by the bitstream demultiplexing module 201 to obtain a corresponding data stream and control stream.
  • after the above signal is decoded by the lossless decoding module 202, an integer representation of the scale factors and the signal spectrum is obtained.
  • the inverse quantizer 203 is a set of non-uniform quantizers implemented by a companding function, converting integer quantized values into reconstruction values. Since the scale factor module in the encoder differences each scale factor against the previous one and then Huffman-codes the difference, the scale factor module 204 in the decoder performs Huffman decoding to obtain the corresponding difference and then recovers the true scale factor.
  • the M/S module 205 converts the sum and difference channels into left and right channels under the control of the side information. Since the second-order backward adaptive predictor 105 is used in the encoder to eliminate redundancy of the steady-state signal and improve coding efficiency, predictive decoding is performed by the prediction module 206 in the decoder.
  • the intensity/coupling module 207 performs intensity/coupling decoding under the control of the side information, then outputs to the time-domain noise shaping module 208 for time-domain noise shaping decoding; finally, synthesis filtering is performed by the filter bank 209, which uses the inverse modified discrete cosine transform (IMDCT).
  • the high frequency PQF band can be ignored by the gain control module 210 to obtain a low sampling rate signal.
  • MPEG-2 AAC codec technology is suitable for medium and high bit-rate audio signals, but its encoding quality for low bit-rate signals is poor. At the same time, the codec involves many modules, and its high complexity is not conducive to real-time implementation.
  • Figure 3 shows the structure of an encoder using Dolby AC-3 technology, including a transient signal detection module 301, a signal-adaptive MDCT filter bank 302, a spectral envelope/exponent encoding module 303, a mantissa encoding module 304, a forward-backward adaptive perceptual model 305, a parametric bit allocation module 306, and a bit stream multiplexing module 307.
  • the transient signal detection module 301 judges whether the audio signal is a steady-state or transient signal, and the signal-adaptive MDCT filter bank 302 maps the time-domain data to frequency-domain data, applying a 512-point long window to steady-state signals and a pair of short windows to transient signals.
  • the spectral envelope/exponential encoding module 303 encodes the exponential portions of the signals in three modes according to the requirements of the code rate and frequency resolution, namely the D15, D25, and D45 encoding modes.
  • the AC-3 technology differentially encodes the spectral envelope in frequency, because at most ±2 increments are required, each increment representing a 6 dB level change; the first (DC) term is absolutely coded, and the remaining exponents are differentially coded.
  • each exponent then requires approximately 2.33 bits, since three differentials are packed into a 7-bit word.
  • the D15 coding mode provides fine frequency resolution by sacrificing temporal resolution.
  • for steady-state signals, D15 is transmitted only occasionally, typically once per data frame (every 6 audio blocks). When the signal spectrum is unstable, the spectral estimate must be updated frequently, and estimates are then encoded with lower frequency resolution, usually using the D25 and D45 encoding modes.
  • the D25 encoding mode provides a compromise between frequency and time resolution: differential encoding is performed for every other frequency coefficient, so each exponent requires approximately 1.15 bits. The D25 mode can be used when the spectrum is stable over 2 to 3 blocks and then changes suddenly.
  • the D45 coding mode shares one differential among four frequency coefficients, so each exponent needs only about 0.58 bits.
  • the D45 encoding mode provides high temporal resolution and low frequency resolution, so it is generally used in the encoding of transient signals.
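The per-exponent costs of the three modes follow from the 7-bit word that packs three differentials, with each differential shared across 1, 2, or 4 coefficients (the 1/2/4 grouping is the standard AC-3 arrangement; the figures come out close to those quoted above):

```python
# Bits per coefficient exponent for the AC-3 exponent strategies:
# a 7-bit word packs three differentials, and each differential is
# shared across 1 (D15), 2 (D25) or 4 (D45) frequency coefficients.
def bits_per_exponent(coeffs_per_differential):
    return 7 / (3 * coeffs_per_differential)

print(round(bits_per_exponent(1), 2))  # D15 -> 2.33
print(round(bits_per_exponent(2), 2))  # D25 -> 1.17
print(round(bits_per_exponent(4), 2))  # D45 -> 0.58
```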
  • the forward-backward adaptive perceptual model 305 is used to estimate the masking threshold of each frame of the signal.
  • the forward-adaptive part is applied only at the encoder: under the bit-rate constraint, a set of optimal perceptual model parameters is estimated by an iterative loop, and these parameters are then passed to the backward-adaptive part to estimate the masking threshold of each frame.
  • the backward-adaptive part is applied at both the encoder and the decoder.
  • the parametric bit allocation module 306 analyzes the spectrum of the audio signal according to the masking criteria to determine the number of bits to assign to each mantissa.
  • the module 306 uses a common bit pool for global bit allocation across all channels.
  • bits are cyclically extracted from the bit pool and allocated to all channels, and the quantization of the mantissa is adjusted in accordance with the number of available bits.
  • the AC-3 encoder also uses high-frequency coupling technology: the high-frequency parts of the coupled signals are divided into 18 sub-bands according to the critical bands of the human ear, and coupling of the selected channels starts from a chosen sub-band. Finally, the AC-3 audio stream is formed by the bit stream multiplexing module 307.
  • Figure 4 shows a schematic diagram of the process using Dolby AC-3 decoding.
  • the bit stream produced by the AC-3 encoder is input, and data frame synchronization and error detection are performed on it. If a data error is detected, error concealment or muting is applied. The bit stream is then unpacked to obtain the main information and the side information, and the exponents are decoded.
  • two pieces of side information are needed: the number of packed exponents, and the exponent strategy used, such as the D15, D25 or D45 mode.
  • the decoded exponents and the bit allocation side information then drive bit allocation, which indicates the number of bits used for each packed mantissa and yields a set of bit allocation pointers, one per coded mantissa.
  • the bit allocation pointer indicates the quantizer used for the mantissa and the number of bits occupied by each mantissa in the code stream.
  • each coded mantissa value is dequantized into a dequantized value; a mantissa occupying zero bits is either restored as zero or, under the control of the dither flag, replaced by a random dither value.
  • the decoupling operation is then performed; decoupling recovers the high-frequency portion of each coupled channel, including exponents and mantissas, from the common coupling channel and the coupling factors.
  • if matrix processing was applied to a sub-band at the encoder, the sum and difference channel values of that sub-band are converted back into left and right channel values by matrix recovery at the decoding end.
  • dynamic range control information for each audio block is included in the code stream; this value is used to change the magnitude of the coefficients, including exponent and mantissa.
  • the frequency domain coefficients are inverse transformed and converted into time domain samples.
  • the time domain samples are windowed, and adjacent blocks are overlapped and added to reconstruct the PCM audio signal.
  • if required, the audio signal is downmixed, and finally the PCM samples are output.
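The windowed overlap-add step can be sketched as follows (assuming the usual 50% overlap; the actual AC-3 window shape is defined by the standard and not reproduced here):

```python
def overlap_add(blocks, window):
    """Reconstruct a signal from 50%-overlapping windowed blocks.
    `blocks` are equal-length time-domain blocks (e.g. IMDCT outputs)."""
    n = len(window)
    hop = n // 2
    out = [0.0] * (hop * (len(blocks) + 1))
    for i, block in enumerate(blocks):
        for j in range(n):
            out[i * hop + j] += block[j] * window[j]
    return out
```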
  • Dolby AC-3 encoding technology mainly targets high bit-rate multi-channel surround signals, but when the 5.1-channel encoding bit rate is below 384 kbps its coding quality is poor, and its coding efficiency for mono and two-channel stereo signals is also low.
  • the technical problem to be solved by the present invention is to provide an apparatus and method for enhancing audio encoding/decoding to solve the problem of low coding efficiency and poor quality of the lower rate audio signal in the prior art.
  • the enhanced audio coding device of the present invention comprises a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bit stream multiplexing module, a signal property analysis module, and a multi-resolution analysis module. The signal property analysis module performs type analysis on the input audio signal, outputs the signal to the psychoacoustic analysis module and the time-frequency mapping module, and outputs the signal type analysis result to the bit stream multiplexing module. The psychoacoustic analysis module calculates the masking threshold and mask ratio of the audio signal and outputs them to the quantization and entropy coding module. The time-frequency mapping module converts the time-domain audio signal into frequency-domain coefficients and outputs them to the multi-resolution analysis module.
  • the multi-resolution analysis module performs multi-resolution analysis on the frequency-domain coefficients of fast-changing signals according to the signal type analysis result output by the signal property analysis module, and outputs the result to the quantization and entropy coding module.
  • under the control of the mask ratio output by the psychoacoustic analysis module, the quantization and entropy coding module quantizes and entropy-encodes the frequency-domain coefficients and outputs them to the bit stream multiplexing module; the bit stream multiplexing module multiplexes the received data to form an audio code stream.
  • the enhanced audio decoding apparatus of the present invention comprises a bit stream demultiplexing module, an entropy decoding module, an inverse quantizer group, a frequency-time mapping module, and a multi-resolution synthesis module. The bit stream demultiplexing module demultiplexes the compressed audio data stream and outputs the corresponding data signals and control signals to the entropy decoding module and the multi-resolution synthesis module. The entropy decoding module decodes these signals, restores the quantized spectrum, and outputs it to the inverse quantizer group. The inverse quantizer group reconstructs the inverse-quantized spectrum and outputs it to the multi-resolution synthesis module. The multi-resolution synthesis module performs multi-resolution synthesis on the inverse-quantized spectrum and outputs the result to the frequency-time mapping module; the frequency-time mapping module performs frequency-time mapping on the spectral coefficients and outputs a time-domain audio signal.
  • the invention is applicable to high-fidelity compression coding of audio signals of various sampling rates and channel configurations: it can support audio signals with sampling rates between 8 kHz and 192 kHz, all practical channel configurations, and a wide range of target bit rates for audio encoding/decoding.
  • FIG. 1 is a block diagram of an MPEG-2 AAC encoder
  • FIG. 2 is a block diagram of an MPEG-2 AAC decoder
  • Figure 3 is a schematic structural view of an encoder using Dolby AC-3 technology
  • Figure 4 is a schematic diagram of a decoding process using Dolby AC-3 technology
  • Figure 5 is a schematic structural view of an encoding device of the present invention.
  • FIG. 6 is a schematic diagram of a filtering structure using a Haar wavelet-based wavelet transform
  • Figure 7 is a schematic diagram of the time-frequency division obtained by using the Haar wavelet-based wavelet transform
  • FIG. 8 is a schematic structural diagram of a decoding apparatus of the present invention.
  • Figure 9 is a schematic structural view of Embodiment 1 of the coding apparatus of the present invention.
  • FIG. 10 is a schematic structural diagram of Embodiment 1 of a decoding apparatus according to the present invention.
  • Figure 11 is a schematic structural view of Embodiment 2 of the encoding apparatus of the present invention.
  • FIG. 12 is a schematic structural diagram of Embodiment 2 of a decoding apparatus according to the present invention.
  • Figure 13 is a schematic structural view of a third embodiment of the encoding apparatus of the present invention.
  • FIG. 14 is a schematic structural diagram of Embodiment 3 of a decoding apparatus according to the present invention.
  • Figure 15 is a schematic structural view of Embodiment 4 of the encoding apparatus of the present invention.
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of a decoding apparatus according to the present invention.
  • Figure 17 is a schematic structural view of Embodiment 5 of the coding apparatus of the present invention.
  • FIG. 18 is a schematic structural diagram of Embodiment 5 of a decoding apparatus according to the present invention.
  • Figure 19 is a schematic structural view of Embodiment 6 of the encoding apparatus of the present invention.
  • FIG. 20 is a schematic structural diagram of Embodiment 6 of a decoding apparatus according to the present invention.
  • Figure 21 is a schematic structural view of Embodiment 7 of the coding apparatus of the present invention.
  • Figure 22 is a block diagram showing the structure of a seventh embodiment of the decoding apparatus of the present invention. Detailed Description
  • Fig. 1 to Fig. 4 are schematic diagrams of several prior-art encoder structures; they have been introduced in the background and are not repeated here.
  • the audio encoding apparatus includes a signal property analysis module 50, a psychoacoustic analysis module 51, a time-frequency mapping module 52, a multi-resolution analysis module 53, a quantization and entropy encoding module 54, and a bit stream multiplexing module 55.
  • the signal property analysis module 50 is configured to perform type analysis on the input audio signal, output the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52, and output the signal type analysis result to the bit stream multiplexing module 55.
  • the psychoacoustic analysis module 51 is configured to calculate a masking threshold and a signal mask ratio of the input audio signal, and output to the quantization and entropy encoding module 54.
  • the time-frequency mapping module 52 is configured to convert the time-domain audio signal into frequency-domain coefficients and output them to the multi-resolution analysis module 53.
  • the multi-resolution analysis module 53 is configured to perform multi-resolution analysis on the frequency-domain coefficients of fast-changing signals according to the signal type analysis result, and output the result to the quantization and entropy coding module 54.
  • under the control of the mask ratio output by the psychoacoustic analysis module 51, the quantization and entropy coding module 54 quantizes and entropy-encodes the frequency-domain coefficients and outputs them to the bit stream multiplexing module 55; the bit stream multiplexing module 55 multiplexes the received data to form an audio encoded code stream.
  • the digital audio signal undergoes signal type analysis in the signal property analysis module 50, which outputs the type information of the audio signal to the bit stream multiplexing module 55 and simultaneously outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52.
  • the masking threshold and mask ratio of the frame of audio signal are calculated in the psychoacoustic analysis module 51, and the mask ratio is then transmitted as a control signal to the quantization and entropy encoding module 54.
  • the signal is converted into frequency-domain coefficients by the time-frequency mapping module 52; the frequency-domain coefficients undergo multi-resolution analysis in the multi-resolution analysis module 53 to improve the time resolution for fast-changing signals, and the result is output to the quantization and entropy coding module 54. Under the control of the mask ratio output by the psychoacoustic analysis module 51, quantization and entropy coding are performed in the quantization and entropy coding module 54, and the encoded data and control signals are multiplexed in the bit stream multiplexing module 55 to form the enhanced audio coded stream.
  • the signal property analysis module 50 performs signal type analysis on the input audio signal, and outputs the type information of the audio signal to the bit stream multiplexing module 55; and simultaneously outputs the audio signal to the psychoacoustic analysis module 51 and the time-frequency mapping module 52. .
  • the signal property analysis module 50 performs pre- and post-masking effect analysis based on an adaptive threshold and waveform prediction to determine whether the signal is a slowly-changing or a fast-changing signal; if it is a fast-changing signal, it further calculates parameter information of the abrupt component, such as the position and the strength of the abrupt signal.
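The fast/slow decision can be illustrated with a simple sub-block energy detector. This is an illustrative stand-in only: the patent's analysis uses adaptive thresholds and waveform prediction, and the block count and ratio threshold below are assumed values.

```python
def detect_fast_change(frame, num_sub_blocks=8, ratio_threshold=4.0):
    """Flag a frame as fast-changing when one sub-block's energy jumps
    well above the mean energy of the preceding sub-blocks. Returns
    (is_fast_changing, attack_position_in_samples)."""
    n = len(frame) // num_sub_blocks
    energies = [sum(x * x for x in frame[i * n:(i + 1) * n])
                for i in range(num_sub_blocks)]
    for i in range(1, num_sub_blocks):
        mean_prev = sum(energies[:i]) / i
        if mean_prev > 0 and energies[i] > ratio_threshold * mean_prev:
            return True, i * n
    return False, -1
```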
  • the psychoacoustic analysis module 51 is mainly used to calculate the masking threshold, the mask ratio and the perceptual entropy of the input audio signal.
  • the perceptual entropy calculated by the psychoacoustic analysis module 51 can dynamically analyze the number of bits required for the current signal frame to be transparently encoded, thereby adjusting the bit allocation between frames.
  • the psychoacoustic analysis module 51 outputs the mask ratio of each sub-band to the quantization and entropy encoding module 54 to control it.
  • the time-frequency mapping module 52 is configured to implement the transformation of the audio signal from the time domain signal to the frequency domain coefficient, and is composed of a filter bank, and specifically may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, Modified discrete cosine transform (MDCT) filter bank, cosine modulated filter bank, wavelet transform filter bank, etc.
  • the encoding apparatus of the present invention increases the time resolution of the encoded fast-changing signal by the multi-resolution analyzing module 53.
  • the frequency domain coefficients output by the time-frequency mapping module 52 are input to the multi-resolution analysis module 53. For a fast-changing signal, a frequency-domain wavelet transform or a frequency-domain modified discrete cosine transform (MDCT) is performed to obtain a multi-resolution representation of the frequency domain coefficients, which is output to the quantization and entropy encoding module 54; for a slowly-changing signal, the frequency domain coefficients are not processed and are output directly to the quantization and entropy encoding module 54.
  • the multi-resolution analysis module 53 includes a frequency domain coefficient transform module and a recombination module, wherein the frequency domain coefficient transform module is configured to transform the frequency domain coefficients into time-frequency plane coefficients; the recombination module is configured to reorganize the time-frequency plane coefficients according to certain rules.
  • the frequency domain coefficient transform module may adopt a frequency domain wavelet transform filter bank, a frequency domain MDCT transform filter bank, or the like.
  • the quantization and entropy encoding module 54 further includes a non-linear quantizer group and an encoder, where the quantizer can be a scalar quantizer or a vector quantizer.
  • the vector quantizer is further divided into two categories: memoryless vector quantizer and memory vector quantizer. For a memoryless vector quantizer, each input vector is independently quantized, independent of the previous vectors; a memory vector quantizer considers the previous vector when quantizing a vector, ie, exploits the correlation between vectors.
  • the main memoryless vector quantizers include the full-search vector quantizer, the tree-search vector quantizer, the multistage vector quantizer, the gain/shape vector quantizer, and the mean-removed vector quantizer; the main memory vector quantizers include the predictive vector quantizer and the finite-state vector quantizer.
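A full-search (memoryless) vector quantizer, the simplest of the variants listed, can be sketched as follows; squared Euclidean distance is used here for clarity, whereas the scheme described below uses a subjective perceptual distance measure:

```python
def full_search_vq(vector, codebook):
    """Return the index of the codeword closest to `vector`
    (squared Euclidean distance over every codebook entry)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))
```

Only the winning index is transmitted; the decoder looks the codeword up in an identical codebook, which is where the rate saving comes from.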
  • the non-linear quantizer group further includes M sub-band quantizers.
  • the scale factor is mainly used for quantization, specifically: all the frequency domain coefficients in the M scale factor bands are nonlinearly compressed, and the scale factor of each sub-band is then used to quantize its frequency domain coefficients.
  • the quantized spectrum, represented by integers, is output to the encoder; the first scale factor in each frame is output to the bit stream multiplexing module 55 as the common scale factor, and each other scale factor is differenced against the previous scale factor and output to the encoder.
  • the scale factor in the above steps is a constantly changing value, adjusted according to the bit allocation strategy.
  • the present invention provides a bit allocation strategy with minimal global perceptual distortion, as follows:
  • each sub-band quantizer is initialized, and an appropriate scale factor is selected such that the quantized values of the spectral coefficients in all sub-bands are zero.
  • at this point the quantization noise of each sub-band is equal to the energy of that sub-band, the noise-to-mask ratio (NMR) of each sub-band is equal to its signal-to-mask ratio (SMR), the number of bits consumed by quantization is zero, and the number of remaining bits is equal to the number of target bits.
  • then, in each iteration, the sub-band with the largest NMR is found, the scale factor of the corresponding sub-band quantizer is decreased by one unit, and the number of additional bits the sub-band requires is calculated; if enough bits remain, the change is accepted, and the loop continues until the remaining bits are exhausted.
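The allocation loop described above amounts to a greedy refinement driven by NMR. A simplified sketch follows, with two assumed illustrative constants: each scale-factor step is taken to reduce that band's NMR by 1.5 dB and to cost a fixed number of bits (neither figure is from the patent).

```python
def allocate_bits(smr_db, target_bits, bits_per_step=4):
    """Greedy allocation: repeatedly refine quantization (lower the
    scale factor) of the sub-band with the worst noise-to-mask ratio.
    Returns the number of refinement steps per band and leftover bits."""
    nmr_db = list(smr_db)        # initially NMR == SMR (all-zero spectrum)
    steps = [0] * len(smr_db)
    remaining = target_bits
    while remaining >= bits_per_step:
        worst = max(range(len(nmr_db)), key=lambda b: nmr_db[b])
        if nmr_db[worst] <= 0:   # all quantization noise already masked
            break
        steps[worst] += 1
        nmr_db[worst] -= 1.5     # assumed NMR gain per scale-factor step
        remaining -= bits_per_step
    return steps, remaining
```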
  • when vector quantization is used, the frequency domain coefficients are grouped into multi-dimensional vectors and input into the nonlinear quantizer group.
  • a flattening factor is used to flatten the spectrum, that is, to reduce its dynamic range, before the vector quantizer is applied.
  • a subjective perceptual distance measure is used to find the codeword in the codebook with the smallest distance to the vector to be quantized, and the corresponding codeword index is transmitted to the encoder.
  • the flattening factor is adjusted according to the bit allocation strategy of vector quantization, and the bit allocation of vector quantization is controlled according to the perceptual importance of the different sub-bands.
  • Entropy coding is a source coding technique. Its basic idea is to assign shorter codewords to symbols with a higher probability of occurrence and longer codewords to symbols with a lower probability, so that the average codeword length is minimized. According to Shannon's noiseless coding theorem, if the symbols of the N transmitted source messages are independent, then with a suitable variable length code the average codeword length L̄ satisfies H(x) ≤ L̄ < H(x) + 1/N.
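The variable-length principle can be illustrated with a tiny Huffman coder; the symbol set and probabilities below are hypothetical and are not the codebooks of the invention.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code: frequent symbols get shorter codewords.

    freqs maps symbol -> probability (or count).  Returns symbol -> bitstring.
    """
    # Each heap entry: (total probability, tie-breaker, {symbol: codeword-suffix})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)           # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# more probable symbols receive codewords no longer than rarer ones
```

For these dyadic probabilities the average codeword length equals the entropy exactly, matching the lower bound of the theorem above.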
  • the entropy coding mainly includes methods such as Huffman coding, arithmetic coding or run length coding, and any entropy coding in the present invention may adopt any of the above coding methods.
  • the quantized spectrum and the differentially processed scale factors are entropy encoded in the encoder, yielding the codebook serial numbers, the scale factor coded values and the losslessly coded quantized spectrum; the codebook serial numbers are then themselves entropy coded to obtain
  • the codebook serial number coded values. The scale factor coded values, the codebook serial number coded values and the losslessly coded quantized spectrum are output to the bit stream multiplexing module 55.
  • the codeword index obtained by the vector quantizer quantization is subjected to one-dimensional or multi-dimensional entropy coding in the encoder to obtain an encoded value of the codeword index, and then the encoded value of the codeword index is output to the bitstream multiplexing module 55.
  • the encoding method based on the above encoder specifically includes: performing signal type analysis on the input audio signal; calculating the signal-to-mask ratio of the audio signal; performing time-frequency mapping on the audio signal to obtain the frequency domain coefficients of the audio signal; performing multi-resolution analysis, quantization and entropy coding on the frequency domain coefficients; and multiplexing the signal type analysis result with the encoded audio code stream to obtain a compressed audio code stream.
  • the signal type analysis is based on adaptive thresholds and waveform prediction, combined with analysis of the pre- and post-masking effects.
  • the specific steps are: decomposing the input audio data into frames; decomposing each input frame into multiple sub-frames and finding, for each sub-frame, the local maximum points of the absolute value of the PCM data; selecting the sub-frame peak from among the local maximum points of each sub-frame; for a given sub-frame peak, using a plurality of (typically 3) preceding sub-frame peaks to predict a typical sample value over a forward window of a plurality of (typically 4) sub-frames; and calculating the difference and the ratio between the sub-frame peak and the predicted typical sample value. If both the difference and the ratio are greater than the set thresholds, it is determined that a sudden signal exists in the sub-frame, i.e. the sub-frame contains a local maximum peak point capable of back-masking the pre-echo, provided the interval between the front end of the sub-frame and the masking peak is within about 2.5 ms, and
  • the frame signal is classified as a fast-changing type signal; if the predicted difference and ratio are not both greater than the set thresholds, the above steps are repeated until the frame is determined
  • to be a fast-changing type signal or the last sub-frame is reached. If the last sub-frame is reached without such a determination, the frame signal is classified as a slowly varying type signal.
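A minimal sketch of this transient detector follows. The subframe count, the mean-based predictor, and the threshold values are assumptions for illustration, not the patent's exact parameters.

```python
import numpy as np

def is_fast_changing(frame, n_sub=8, n_pred=3, diff_thresh=0.2, ratio_thresh=3.0):
    """Detect a transient (fast-changing frame) by comparing each subframe
    peak with a typical value predicted from the preceding subframe peaks.

    The predictor (mean of the last n_pred peaks) and both thresholds are
    illustrative assumptions.
    """
    subframes = np.array_split(np.abs(np.asarray(frame, dtype=float)), n_sub)
    peaks = [sf.max() for sf in subframes]        # per-subframe peak values
    for i in range(n_pred, n_sub):
        predicted = float(np.mean(peaks[i - n_pred:i]))
        diff = peaks[i] - predicted
        ratio = peaks[i] / (predicted + 1e-12)
        if diff > diff_thresh and ratio > ratio_thresh:
            return True                           # sudden attack found
    return False
```

A frame with a sharp attack halfway through triggers the detector, while a uniformly quiet frame does not.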
  • time-frequency transforms of time-domain audio signals include the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), cosine-modulated filter banks, wavelet transforms, and so on.
  • when the time-frequency transform uses the modified discrete cosine transform (MDCT),
  • the MDCT transform is performed on the windowed signal to obtain the frequency domain coefficients.
  • the window function of the MDCT transform must satisfy the following two conditions (the standard perfect reconstruction requirements): symmetry, w(2M−1−n) = w(n), and the Princen-Bradley condition, w²(n) + w²(n+M) = 1.
  • the Sine window can be selected as the window function.
  • by using distinct analysis and synthesis filters, the above restrictions on the window function can be relaxed.
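The sine window and a direct-form MDCT can be sketched as follows. The window conditions checked are the standard perfect-reconstruction requirements mentioned above; the cos(2π/N·(n + n0)(k + 1/2)) basis convention is an assumption consistent with common MDCT usage.

```python
import numpy as np

def sine_window(N):
    """Sine window over N = 2M samples; satisfies both the symmetry and
    the Princen-Bradley conditions."""
    n = np.arange(N)
    return np.sin(np.pi / N * (n + 0.5))

def mdct(x):
    """Direct-form MDCT of one block of N = 2M (already windowed) samples,
    producing M frequency domain coefficients."""
    N = len(x)
    M = N // 2
    n0 = (M + 1) / 2.0                  # common MDCT phase offset (assumed)
    n = np.arange(N)
    k = np.arange(M)
    basis = np.cos(np.pi / M * (n[None, :] + n0) * (k[:, None] + 0.5))
    return basis @ x
```

The assertions below confirm the sine window meets both window conditions and that a 2M-sample block yields M coefficients.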
  • when the time-frequency transform uses cosine modulation filtering, the time domain samples of the previous frame and of the current frame are first selected, a windowing operation is performed on the time domain samples of the two frames, and a cosine modulation transform is then performed on the windowed signal to obtain the frequency domain coefficients.
  • the analysis filters h_k(n) of the cosine-modulated filter bank are obtained by cosine modulation of the analysis window (analysis prototype filter) p_a(n), with subband index 0 ≤ k ≤ M−1, where M is an integer greater than zero;
  • the analysis window p_a(n) has an impulse response length of N_h,
  • and the synthesis window (synthesis prototype filter) p_s(n) has an impulse response length of N_s.
  • the window function also needs to meet certain conditions; see "Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, NJ, 1993.
  • Calculating the masking value and the mask ratio of the resampled signal includes the following steps:
  • the first step is to map the signal from time domain to frequency domain.
  • the fast Fourier transform with a Hanning window can be used to convert the time domain data into frequency domain coefficients X[k], expressed by the amplitude r[k] and the phase φ[k].
  • the energy of each sub-band is the sum of the energies of all the spectral lines in the sub-band, where the sum runs from the lower to the upper boundary of sub-band b.
  • the second step is to determine the pitch and non-tonal components in the signal.
  • the tonality of the signal is estimated by inter-frame prediction of each spectral line.
  • the Euclidean distance between the predicted and true values of each spectral line is mapped to an unpredictability measure: highly predictable spectral components are considered strongly
  • tonal, while poorly predictable spectral components are considered noise-like.
  • the predicted amplitude of each spectral line is r_pred[k] = r_{t−1}[k] + (r_{t−1}[k] − r_{t−2}[k]), where
  • r_t denotes the coefficient of the current frame,
  • r_{t−1} denotes the coefficient of the previous frame, and
  • r_{t−2} denotes the coefficient of the frame before the previous one.
  • the unpredictability of each subband is the sum, over all the spectral lines in the subband, of each line's unpredictability weighted by its energy.
  • the subband energy e[b] and unpredictability c[b] are each convolved with the spreading function s[i, b]:
  • e_s[b] = Σ_i e[i]·s[i, b], c_s[b] = Σ_i c[i]·s[i, b],
  • where b = 0, 1, …, b_max−1 and b_max is the number of subbands into which the frame signal is divided.
  • the third step is to calculate the Signal-to-Noise Ratio (SNR) required for each sub-band.
  • the fourth step is to calculate the masking threshold of each subband and the perceptual entropy of the signal.
  • the noise energy threshold n[b] of the current frame is compared with the noise energy threshold n_prev[b] of the previous frame.
  • the threshold is then distributed over the spectral lines of the band according to the number of lines included in the band.
  • the fifth step is to calculate the signal-to-mask ratio (Signal-to-Mask Ratio, SMR for short) of each sub-band signal.
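Step 2's unpredictability measure can be sketched as follows; this is an MPEG-model-style formulation, and the exact normalization is an assumption.

```python
import numpy as np

def unpredictability(r, phi, r1, phi1, r2, phi2):
    """Per-line unpredictability c[k] (a sketch, normalization assumed):
    predict the magnitude and phase by linear extrapolation from the two
    previous frames, then normalize the Euclidean distance between the
    predicted and the true complex spectral line.

    r, phi: current frame; r1, phi1: previous frame; r2, phi2: two back.
    Returns values in [0, 1]: 0 = fully predictable (tonal), 1 = noise-like.
    """
    r_pred = r1 + (r1 - r2)                 # amplitude extrapolation
    phi_pred = phi1 + (phi1 - phi2)         # phase extrapolation
    dist = np.abs(r * np.exp(1j * phi) - r_pred * np.exp(1j * phi_pred))
    return dist / (r + np.abs(r_pred) + 1e-12)
```

A line whose amplitude and phase continue their trend from the previous two frames scores 0 (tonal); a line whose phase flips scores near 1 (noise-like).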
  • the multi-resolution analysis module 53 performs time-frequency reorganization on the input frequency domain data, improving the time resolution of the frequency domain data at the cost of reduced frequency precision, thereby automatically adapting to the time-frequency characteristics of fast-changing type signals and suppressing the pre-echo effect, without adjusting the form of the filter bank in the time-frequency mapping module 52.
  • the multi-resolution analysis includes two steps of frequency domain coefficient transform and recombination, wherein frequency domain coefficients are transformed into time-frequency plane coefficients by frequency domain coefficient transform; time-frequency plane coefficients are grouped according to certain rules by recombination.
  • the process of multi-resolution analysis is illustrated by taking the frequency domain wavelet transform and the frequency domain MDCT transform as examples.
  • the wavelet basis of a frequency domain wavelet or wavelet packet transform can be fixed or adaptive.
  • the Haar wavelet-based wavelet transform is taken as an example to illustrate the multi-resolution analysis of the frequency domain coefficients.
  • the Haar wavelet basis has low-pass filter coefficients (1/√2, 1/√2) and high-pass filter coefficients (1/√2, −1/√2).
  • Figure 6 shows the transform structure using the Haar basis, in which H denotes high-pass filtering, L denotes low-pass filtering, and ↓2 denotes downsampling by a factor of 2, applied to the mid and low frequency parts of the coefficients.
  • Different wavelet bases can be selected, and different wavelet transform structures can be selected for processing, and other similar time-frequency plane partitions are obtained. Therefore, it is possible to adjust the time-frequency plane division of the signal analysis arbitrarily according to the needs, and to meet the analysis requirements of different time and frequency resolutions.
  • the time-frequency plane coefficients are reorganized according to certain rules in the recombination module; for example, the time-frequency plane coefficients can first be organized in the frequency direction, the coefficients within each frequency band organized in the time direction, and the organized coefficients then arranged in the order of the sub-windows and scale factor bands.
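One Haar analysis step on a block of frequency domain coefficients can be sketched as follows; the cascade structure of Figure 6 would apply this step repeatedly to the low/mid frequency part.

```python
import numpy as np

def haar_step(coeffs):
    """One Haar wavelet analysis step: low-pass (1/sqrt2, 1/sqrt2) and
    high-pass (1/sqrt2, -1/sqrt2) filtering followed by downsampling by 2."""
    c = np.asarray(coeffs, dtype=float)
    even, odd = c[0::2], c[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high

def haar_inverse(low, high):
    """Inverse of haar_step (perfect reconstruction)."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    out = np.empty(low.size * 2)
    out[0::2], out[1::2] = even, odd
    return out
```

The step halves the frequency resolution of the processed band while doubling its time resolution, and is exactly invertible with no energy loss.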
  • Different frequency-domain MDCT transforms can be used in different frequency domain ranges to obtain different time-frequency plane divisions, that is, different time and frequency precisions.
  • the recombination module reorganizes the time-frequency domain data outputted by the frequency domain MDCT transform filter bank.
  • one recombination method is to first organize the time-frequency plane coefficients in the frequency direction, organize the coefficients in each frequency band in the time direction, and then arrange the organized coefficients in the order of the sub-windows and scale factor bands.
  • Quantization and entropy coding further include two steps of nonlinear quantization and entropy coding, where the quantization can be scalar quantization or vector quantization.
  • the scalar quantization includes the following steps: nonlinearly compressing the frequency domain coefficients in all scale factor bands; and then quantizing the frequency domain coefficients of the subbands by using the scale factor of each subband to obtain a quantized spectrum represented by an integer;
  • the first scale factor in each frame of the signal is used as a common scale factor; other scale factors are differentially processed from their previous scale factor.
  • Vector quantization includes the following steps: composing a number of multi-dimensional vectors from the frequency domain coefficients; flattening each vector according to a flattening factor; finding, according to a subjective perceptual distance measure criterion, the codeword in the codebook with the smallest distance to the vector to be quantized, and obtaining its codeword index.
  • the entropy coding step comprises: entropy coding the quantized spectrum and the differentially processed scale factors to obtain the codebook serial numbers, the scale factor coded values, and the losslessly coded quantized spectrum; and entropy coding the codebook serial numbers to obtain the codebook serial number coded values.
  • the above entropy coding method can adopt any of the existing methods such as Huffman coding, arithmetic coding or run length coding.
  • the encoded audio code stream is obtained, and the code stream is multiplexed together with the common scale factor and signal type analysis result to obtain a compressed audio code stream.
  • FIG. 8 is a block diagram showing the structure of an audio decoding device of the present invention.
  • the audio decoding apparatus includes a bit stream demultiplexing module 60, an entropy decoding module 61, an inverse quantizer group 62, a multi-resolution synthesis module 63 and a frequency-time mapping module 64.
  • the compressed audio code stream is demultiplexed by the bit stream demultiplexing module 60 to obtain the corresponding data signals and control signals, which are output to the entropy decoding module 61 and the multi-resolution synthesis module 63; the data signal and the control signal are decoded in the entropy decoding module,
  • recovering the quantized values of the spectrum.
  • the above quantized values are reconstructed in the inverse quantizer group 62 to obtain the inverse quantized spectrum.
  • the inverse quantized spectrum is output to the multi-resolution synthesis module 63 and, after multi-resolution synthesis, to the frequency-time mapping module 64, where frequency-time mapping yields the audio signal in the time domain.
  • the bit stream demultiplexing module 60 decomposes the compressed audio stream to obtain corresponding data signals and control signals, and provides corresponding decoding information for other modules.
  • the signal outputted to the entropy decoding module 61 includes a common scale factor, a scale factor coded value, a codebook sequence number coded value, and a lossless coded quantized spectrum, or an encoded value of the codeword index;
  • the signal type information is sent to the multi-resolution synthesis module 63.
  • the entropy decoding module 61 receives the common scale factor, the scale factor coded values, the codebook serial number coded values and the losslessly coded quantized spectrum output by the bitstream demultiplexing module 60, then performs codebook serial number decoding, spectral coefficient decoding and scale factor decoding, reconstructs the quantized spectrum, and outputs the integer representation of the scale factors and the quantized values of the spectrum to the inverse quantizer group 62.
  • the decoding method employed by the entropy decoding module 61 corresponds to the entropy encoding method in the encoding device, such as Huffman decoding, arithmetic decoding, or run-length decoding.
  • after receiving the quantized values of the spectrum and the integer representation of the scale factors, the inverse quantizer group 62 inversely quantizes the quantized values of the spectrum into an unscaled reconstruction (the inverse quantized spectrum), and outputs the inverse quantized spectrum to the multi-resolution synthesis module 63.
  • the inverse quantizer group 62 may be a uniform quantizer group or a non-uniform quantizer group implemented by a companding function.
  • if the quantizer group in the encoding apparatus employs a scalar quantizer,
  • the inverse quantizer group 62 in the decoding apparatus also employs a scalar inverse quantizer.
  • the spectral quantized values are first nonlinearly expanded, and then each scale factor is used to obtain all the spectral coefficients (inverse quantized spectrum) in the corresponding scale factor bands.
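A sketch of this scalar inverse quantization follows, assuming an AAC-style companding exponent of 4/3 and a quarter-power-of-two scale-factor step; the patent does not fix these constants here.

```python
import numpy as np

def inverse_quantize(q, scale_factor, sf_offset=100):
    """Scalar inverse quantization sketch (AAC-style companding assumed):
    nonlinearly expand the integer quantized spectrum, then scale the
    result by the subband's scale factor.  sf_offset is an assumed
    reference scale factor."""
    q = np.asarray(q, dtype=float)
    expanded = np.sign(q) * np.abs(q) ** (4.0 / 3.0)   # undo x^(3/4) compression
    gain = 2.0 ** ((scale_factor - sf_offset) / 4.0)   # quarter-step gain (assumed)
    return expanded * gain
```

With these assumed constants, a quantized value of 8 at the reference scale factor expands to 8^(4/3) = 16, and raising the scale factor by 4 doubles the gain.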
  • when vector quantization is used, the entropy decoding module 61 receives the encoded values of the codeword indices output by the bitstream demultiplexing module 60, and the encoded values of the codeword indices
  • are decoded with the entropy decoding method corresponding to the entropy coding method used at encoding, obtaining the corresponding codeword indices.
  • the codeword indices are output to the inverse quantizer group 62, the quantized values (the inverse quantized spectrum) are obtained by querying the codebook, and are output to the multi-resolution synthesis module 63.
  • the inverse quantizer group 62 employs an inverse vector quantizer.
  • the frequency-time mapping module 64 may be an inverse discrete cosine transform (IDCT) filter bank, an inverse discrete Fourier transform (IDFT) filter bank, an inverse modified discrete cosine transform (IMDCT) filter bank, an inverse wavelet transform filter bank, and a cosine Modulation filter bank, etc.
  • the decoding method based on the above decoder includes: demultiplexing the compressed audio code stream to obtain data information and control information; performing entropy decoding on the information to obtain the quantized values of the spectrum; performing inverse quantization processing on the quantized values to obtain the inverse quantized spectrum; and, after multi-resolution synthesis of the inverse quantized spectrum, performing frequency-time mapping to obtain the time domain audio signal.
  • if the demultiplexed information includes the codebook serial number coded values, the common scale factor, the scale factor coded values, and the losslessly coded quantized spectrum, the spectral coefficients were quantized by the scalar quantization technique in the encoding device, and the decoding
  • steps include: decoding the codebook serial number coded values to obtain the codebook numbers of all the scale factor bands; decoding the quantized coefficients of all the scale factor bands according to the codebooks corresponding to the codebook serial numbers; and decoding the scale factors of all the scale factor bands to reconstruct the quantized spectrum.
  • the entropy decoding method adopted in the above process corresponds to an entropy coding method in the coding method, such as a run length decoding method, a Huffman decoding method, an arithmetic decoding method, and the like.
  • the process of entropy decoding is illustrated using run-length decoding for the codebook serial numbers, Huffman decoding for the quantized coefficients, and Huffman decoding for the scale factors.
  • the codebook numbers of all scale factor bands are obtained by run-length decoding; each decoded codebook serial number is an integer in a given interval. If the interval is set to [0, 11], then only codebook numbers within the valid range, i.e. between 0 and 11, correspond to a spectral coefficient Huffman codebook. For all-zero sub-bands, a codebook serial number of 0 can be selected.
  • the spectral coefficient Huffman codebook corresponding to each codebook number is used to decode the quantized coefficients of all the scale factor bands. If the codebook number of a scale factor band is within the valid range, for example between 1 and 11, the codebook number corresponds to a spectral coefficient codebook; the codeword indices of that scale factor band are decoded from the quantized spectrum using this codebook, and the quantized coefficients are then unpacked from the codeword indices. If the codebook number of a scale factor band is not between 1 and 11, the codebook number does not correspond to any spectral coefficient codebook, the quantized coefficients of the subband are not decoded, and the quantized coefficients of the subband are all set to zero.
  • the scale factors are used to reconstruct the spectrum from the inverse quantized spectral coefficients. If the codebook number of a scale factor band is within the valid range, each codebook number corresponds to a scale factor.
  • to decode the scale factors, the code stream bits occupied by the first scale factor are read first; Huffman decoding is then performed on the other scale factors, sequentially obtaining the difference between each scale factor and the previous one, and each difference is added to the previous scale factor value to obtain the scale factor. If the quantized coefficients of the current subband are all zero, the scale factor of that subband does not need to be decoded.
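The differential scale factor reconstruction described above can be sketched as:

```python
def decode_scale_factors(common_sf, diffs):
    """Rebuild all scale factors from the common (first) scale factor and
    the entropy-decoded differences: each scale factor is the previous
    one plus its decoded difference."""
    sfs = [common_sf]
    for d in diffs:
        sfs.append(sfs[-1] + d)
    return sfs
```

For example, a common scale factor of 60 with decoded differences [2, -1, 0, 3] yields the sequence 60, 62, 61, 61, 64.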
  • the inverse quantization process includes: nonlinearly expanding the quantized values of the spectra; and obtaining all spectral coefficients (inverse quantized spectra) in the corresponding scale factor bands according to each scale factor.
  • if the demultiplexed information includes the encoded values of the codeword indices, the spectral coefficients were quantized by vector quantization in the encoding device, and the step of decoding comprises: decoding the encoded values of the codeword indices with the entropy
  • decoding method corresponding to the entropy coding method in the encoding device, obtaining the codeword indices.
  • the codeword indices are then inverse quantized to obtain the inverse quantized spectrum.
  • in the encoding device, if the signal is a fast-changing type signal, multi-resolution analysis is performed on the frequency domain coefficients and the multi-resolution representation of the frequency domain coefficients is then quantized and entropy encoded; if it is not a fast-changing type signal, the frequency domain coefficients are quantized and entropy encoded directly.
  • Multi-resolution synthesis can be performed by frequency domain wavelet transform or frequency domain MDCT transform.
  • the frequency domain wavelet synthesis method includes: first recombining the above-mentioned time-frequency plane coefficients according to a certain rule; then performing the inverse wavelet transform on the recombined coefficients to obtain the frequency domain coefficients.
  • the frequency domain MDCT synthesis method includes: first recombining the above-mentioned time-frequency plane coefficients according to a certain rule, then performing the inverse MDCT transform several times to obtain the frequency domain coefficients.
  • the method of recombining may include: firstly, the time-frequency plane coefficients are organized in the frequency direction, the coefficients in each frequency band are organized in the time direction, and then the organized coefficients are arranged in the order of the sub-window and the scale factor sub-band.
  • the method of performing frequency-time mapping processing on frequency domain coefficients corresponds to the time-frequency mapping processing method in the encoding method, and may use inverse discrete cosine transform (IDCT), inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform ( IMDCT), inverse wavelet transform and other methods are completed.
  • IDCT inverse discrete cosine transform
  • IDFT inverse discrete Fourier transform
  • IMDCT inverse modified discrete cosine transform
  • wavelet transform inverse wavelet transform
  • the inverse modified discrete cosine transform (IMDCT) is taken as an example to illustrate the frequency-time mapping process.
  • the frequency-time mapping process consists of three steps: IMDCT transformation, time domain windowing, and time domain superposition.
  • first, the IMDCT transform is performed on the spectrum before prediction or on the inverse quantized spectrum to obtain the transformed time domain signal x_{i,n}.
  • the expression of the IMDCT transform is x_{i,n} = (2/N)·Σ_{k=0}^{N/2−1} spec[i][k]·cos(2π/N·(n + n0)(k + 1/2)), for n = 0, 1, …, N−1, where n denotes the sample number, N is the transform length, and n0 = (N/2 + 1)/2.
  • the time domain signal obtained by the IMDCT transform is windowed in the time domain.
  • Typical window functions are Sine windows, Kaiser-Bessel windows, and the like.
  • the present invention employs a fixed window function whose window function is:
  • a biorthogonal transform using specific analysis and synthesis filters can be used to relax the above-mentioned restrictions on the window function.
  • the windowed time domain signal is superimposed to obtain a time domain audio signal.
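The three steps (IMDCT, time domain windowing, overlap-add) can be sketched as follows; the sine window and the 2/N normalization are assumptions consistent with common MDCT conventions, not necessarily the patent's fixed window.

```python
import numpy as np

def imdct(X):
    """IMDCT: M spectral coefficients -> N = 2M time domain samples,
    using the 2/N normalization convention (assumed)."""
    M = len(X)
    N = 2 * M
    n0 = (M + 1) / 2.0
    n = np.arange(N)
    k = np.arange(M)
    basis = np.cos(np.pi / M * (n[:, None] + n0) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ X)

def overlap_add(blocks, window):
    """Window each IMDCT output block, then overlap-add with 50% overlap
    to cancel the time domain aliasing."""
    M = len(window) // 2
    out = np.zeros(M * (len(blocks) + 1))
    for i, b in enumerate(blocks):
        out[i * M:i * M + 2 * M] += b * window
    return out
```

Because the sine window satisfies the Princen-Bradley condition, the windowed overlap-add of constant blocks sums to unity in the fully overlapped region, which is what makes perfect reconstruction possible.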
  • Figure 9 is a schematic illustration of a first embodiment of an encoding apparatus of the present invention.
  • This embodiment adds a frequency domain linear prediction and vector quantization module 56 on the basis of FIG. 5, between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy coding module 54; it outputs the residual sequence to the quantization and
  • entropy coding module 54 and simultaneously outputs the quantized codeword indices as side information to the bitstream multiplexing module 55.
  • the frequency-domain linear prediction and vector quantization module 56 performs linear prediction on the frequency domain coefficients in each time period, together with multi-level vector quantization of the prediction coefficients.
  • the frequency signal output from the multi-resolution analysis module 53 is transmitted to the frequency domain linear prediction and vector quantization module 56.
  • the frequency domain coefficients on each time period are subjected to standard linear predictive analysis; if the prediction gain satisfies a given condition, the frequency domain coefficients are linear prediction error filtered, and the obtained prediction coefficients are converted into line spectral frequency coefficients LSF (Line Spectrum Frequency).
  • an optimal distortion metric search then calculates the codeword index in each codebook, the codeword indices are transmitted as side information to the bit stream multiplexing module 55, and the residual sequence obtained through the prediction analysis is output to the quantization and entropy coding module 54.
  • the frequency domain linear prediction and vector quantization module 56 is composed of a linear predictive analyzer, a linear predictive filter, a converter, and a vector quantizer.
  • the frequency domain coefficients are input into the linear prediction analyzer for predictive analysis, obtaining the prediction gain and the prediction coefficients.
  • the frequency domain coefficients satisfying certain conditions are filtered by the linear prediction filter to obtain a residual sequence;
  • the residual sequence is output directly to the quantization and entropy coding module 54, the prediction coefficients are converted into line spectral frequency coefficients LSF by the converter, the LSF parameters are sent to the vector quantizer for multi-level vector quantization, and the quantized signal is transmitted to the bit stream multiplexing module 55.
  • the squared Hilbert envelope of a signal corresponds to its one-sided spectrum formed by the positive frequency components; that is, the Hilbert envelope of the signal is related to the autocorrelation function of the signal spectrum.
  • the power spectral density function PSD(f) of a signal and the autocorrelation function of its time domain waveform form a Fourier transform pair, so the squared Hilbert envelope of the signal in the time domain and the power
  • spectral density function of the signal in the frequency domain are mutually dual. It follows that for a partial bandpass signal in a certain frequency range, if its Hilbert envelope remains constant, the autocorrelation of adjacent spectral values remains constant, which means the spectral coefficient sequence is a steady-state (stationary) sequence with respect to frequency, so predictive coding techniques can be applied to the spectral values and a common set of prediction coefficients can efficiently represent the signal.
  • the encoding method based on the encoding apparatus shown in FIG. 9 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added:
  • after the frequency domain coefficients are subjected to multi-resolution analysis, the frequency domain coefficients on each time period undergo standard linear prediction analysis to obtain the prediction gain and the prediction coefficients; it is judged whether the prediction gain exceeds the set threshold, and if it does, frequency domain linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients to obtain a residual sequence.
  • the prediction coefficients are converted into line spectral frequency coefficients, and the LSF coefficients are subjected to multi-level vector quantization to obtain the side information; the residual sequence is quantized and entropy encoded. If the prediction gain does not exceed the set threshold,
  • the frequency domain coefficients are quantized and entropy encoded directly.
  • specifically, standard linear prediction analysis of the frequency domain coefficients on each time period is first performed, including calculating the autocorrelation matrix and running the Levinson-Durbin recursion to obtain the prediction gain and the prediction coefficients. It is then judged whether the calculated prediction gain exceeds a preset threshold; if it does, linear prediction error filtering is performed on the frequency domain coefficients according to the prediction coefficients; otherwise, the frequency domain coefficients are not processed, and the next step, quantization and entropy coding of the frequency domain coefficients, is performed.
  • Linear prediction can be divided into forward prediction and backward prediction.
  • Forward prediction refers to predicting the current value by using the value before a certain moment
  • backward prediction refers to predicting the current value by using the value after a certain moment.
  • the frequency domain coefficients X(k) are filtered to obtain the prediction error.
  • the frequency domain coefficients X(k) output by the time-frequency transform can thus be represented by the residual sequence E(k) and a set of prediction coefficients a_i. The set of prediction coefficients a_i is then converted into line spectral frequency coefficients LSF and subjected to multi-level vector quantization; the vector quantization selects the best distortion metric (such as the nearest neighbor criterion), searches and calculates the codeword index in each codebook so as to determine the code corresponding to the prediction coefficients, and outputs the codeword indices as side information. At the same time, the residual sequence is quantized and entropy encoded.
  • the dynamic range of the residual sequence of the spectral coefficients is smaller than the dynamic range of the original coefficients, so fewer bits can be allocated in quantization, or an improved coding gain can be obtained for the same number of bits.
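The Levinson-Durbin recursion and the prediction error filtering mentioned above can be sketched as follows; here the frequency domain coefficient sequence takes the place of time samples, and the order and test data are illustrative.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: from autocorrelation values r[0..order]
    to prediction coefficients a (with a[0] = 1) and the residual energy.
    The prediction gain is r[0] / err."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                    # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def residual(x, a):
    """Prediction error filtering: e[n] = sum_j a[j] * x[n-j], a[0] = 1.
    The output sequence has a smaller dynamic range than x when x is
    well predicted."""
    return np.convolve(x, a)[:len(x)]
```

For a sequence with autocorrelation r[k] = 0.9^k the recursion recovers the first-order predictor a = [1, -0.9], and filtering the sequence 0.9^n with it leaves a residual that is zero after the first sample.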
  • Figure 10 is a schematic diagram of the first embodiment of the decoding apparatus.
  • the decoding apparatus adds an inverse frequency domain linear prediction and vector quantization module 65 to the decoding apparatus shown in FIG. 8. The inverse frequency domain linear prediction and vector quantization module 65 is located between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63; the bitstream demultiplexing module 60 outputs the inverse frequency domain linear predictive vector quantization control information to it, and the module performs inverse quantization of the spectrum (residual sequence) and inverse linear prediction filtering to obtain the spectrum before prediction, which is output to the multi-resolution synthesis module 63.
  • in the encoder, frequency domain linear predictive vector quantization techniques are used to suppress pre-echo and obtain a larger coding gain. Therefore, in the decoder, the inverse quantized spectrum and the inverse frequency domain linear predictive vector quantization control information output by the bit stream demultiplexing module 60 are input to the inverse frequency domain linear prediction and vector quantization module 65 to recover the spectrum before linear prediction.
  • the inverse frequency domain linear prediction and vector quantization module 65 includes an inverse vector quantizer, an inverse transformer, and an inverse linear predictor, wherein the inverse vector quantizer is used to inverse quantize the codeword index to the J line pair frequency coefficient (LSF).
  • The inverse converter is used to convert the line spectral frequency (LSF) coefficients back into prediction coefficients; the inverse linear prediction filter is used to inversely filter the inverse quantized spectrum with the prediction coefficients, obtain the spectrum before prediction, and output it to the multi-resolution synthesis module 63.
  • The decoding method based on the decoding device shown in FIG. 10 is basically the same as the decoding method based on the decoding device shown in FIG. 8, except that the following steps are added: after the inverse quantized spectrum is obtained, it is determined whether the control information contains inverse frequency domain linear predictive vector quantization information. If so, inverse vector quantization is performed to obtain the prediction coefficients, and linear prediction synthesis is performed on the inverse quantized spectrum according to the prediction coefficients to obtain the spectrum before prediction, which then undergoes multi-resolution synthesis.
  • The spectrum before prediction is then subjected to frequency-time mapping processing. If the control information indicates that the signal frame has not undergone frequency domain linear predictive vector quantization, the inverse frequency domain linear predictive vector quantization process is not performed, and the inverse quantized spectrum is directly subjected to frequency-time mapping processing.
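The inverse filtering on the decoder side can be sketched symmetrically. This is a hypothetical synthesis filter matching a conventional frequency domain predictor; the codebook lookup and LSF-to-coefficient conversion are omitted:

```python
import numpy as np

def fd_lpc_synthesis(E, a):
    """Reconstruct spectral coefficients from the residual E and
    prediction coefficients a: X(k) = E(k) + sum_i a_i * X(k-i)."""
    order = len(a)
    X = E.copy()
    for k in range(order, len(E)):
        X[k] = E[k] + np.dot(a, X[k - order:k][::-1])
    return X

# Round trip: re-deriving the residual from X recovers E exactly,
# showing synthesis is the inverse of the analysis filter
rng = np.random.default_rng(1)
a = np.array([0.8, -0.2])
E = rng.normal(size=64)
X = fd_lpc_synthesis(E, a)
E2 = X.copy()
for k in range(2, len(X)):
    E2[k] = X[k] - np.dot(a, X[k - 2:k][::-1])
print(np.allclose(E, E2))  # True
```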
  • FIG. 11 is a block diagram showing the structure of a second embodiment of the encoding apparatus of the present invention.
  • This embodiment adds a sum and difference stereo (M/S) encoding module 57 on the basis of FIG. 5, between the output of the multi-resolution analysis module 53 and the input of the quantization and entropy encoding module 54.
  • In M/S sum and difference stereo coding, the psychoacoustic analysis module 51, in addition to calculating the masking thresholds of the individual channel signals, further calculates the masking thresholds of the sum and difference channels and outputs them to the quantization and entropy encoding module 54.
  • The sum and difference stereo encoding module 57 may also be located between the quantizer group and the entropy coder in the quantization and entropy encoding module 54.
  • The function of the sum and difference stereo encoding module 57 is to exploit the correlation between the two channels of a channel pair: the frequency domain coefficients/residual sequences of the left and right channels are replaced by the equivalent frequency domain coefficients/residual sequences of the sum and difference channels, thereby reducing the code rate and improving the coding efficiency. It is therefore only applicable to multi-channel signals whose channels have the same signal type. For a mono signal, or a multi-channel signal whose signal types are not uniform, the sum and difference stereo encoding process is not performed.
  • The encoding method based on the encoding apparatus shown in FIG. 11 is the same as the encoding method based on the encoding apparatus shown in FIG. 5, except that the following steps are added: before quantization and encoding of the frequency domain coefficients, it is determined whether the audio signal is a multi-channel signal. If it is a multi-channel signal, it is determined whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is determined whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo encoding condition. If the condition is satisfied, sum and difference stereo encoding is performed to obtain the frequency domain coefficients of the sum and difference channels; if not, sum and difference stereo encoding is not performed. If the signal is a mono signal or a multi-channel signal whose signal types are inconsistent, the frequency domain coefficients are not processed.
  • The sum and difference stereo encoding can also be applied after quantization and before entropy coding, that is: after the frequency domain coefficients are quantized, it is determined whether the audio signal is a multi-channel signal. If it is a multi-channel signal, it is determined whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is determined whether the scale factor bands corresponding to the two channels satisfy the sum and difference stereo encoding conditions, and if they are satisfied, sum and difference stereo encoding is performed on the quantized values; if not, the encoding process is not performed. If the signal is a mono signal or a multi-channel signal with inconsistent signal types, the quantized frequency domain coefficients are not subjected to sum and difference stereo encoding.
  • The judgment method adopted by the present invention is based on the Karhunen-Loève (K-L) transform. The specific judgment process is as follows: let the spectral coefficients of a left channel scale factor band be l(k) and the spectral coefficients of the corresponding right channel scale factor band be r(k); their correlation matrix is

    C = [ c_ll  c_lr ]
        [ c_lr  c_rr ],   where c_xy = Σ_k x(k)·y(k).

  • The K-L rotation angle α satisfies tan(2α) = 2·c_lr / (c_ll − c_rr); when α is close to π/4, the K-L transform reduces to the sum and difference stereo coding mode. Therefore:
  • The frequency domain coefficients of the left and right channels in the scale factor band are replaced by the linearly transformed frequency domain coefficients of the sum and difference channels:

    M = (L + R) / 2,   S = (L − R) / 2

  where M denotes the sum channel frequency domain coefficient, S denotes the difference channel frequency domain coefficient, L denotes the left channel frequency domain coefficient, and R denotes the right channel frequency domain coefficient.
  • In the quantized domain, M̃ represents the quantized sum channel frequency domain coefficients; S̃ represents the quantized difference channel frequency domain coefficients; L̃ represents the quantized left channel frequency domain coefficients; and R̃ represents the quantized right channel frequency domain coefficients.
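The band-wise M/S decision and transform described above can be sketched as follows. The π/4 proximity threshold and the (L ± R)/2 scaling are standard choices assumed here for illustration, not parameters taken from the patent:

```python
import numpy as np

def ms_encode_band(l, r, angle_tol=0.1):
    """K-L-based M/S decision for one scale factor band.

    Computes the correlation matrix of l(k), r(k); if the K-L rotation
    angle is within angle_tol of pi/4, the band is coded as
    sum/difference, otherwise left/right is kept."""
    c_ll = np.dot(l, l)
    c_rr = np.dot(r, r)
    c_lr = np.dot(l, r)
    # tan(2*alpha) = 2*c_lr / (c_ll - c_rr)
    alpha = 0.5 * np.arctan2(2.0 * c_lr, c_ll - c_rr)
    if abs(alpha - np.pi / 4) < angle_tol:
        m = 0.5 * (l + r)  # sum channel
        s = 0.5 * (l - r)  # difference channel
        return True, m, s
    return False, l, r

# Strongly correlated channels: M/S is selected and S is nearly zero
rng = np.random.default_rng(2)
l = rng.normal(size=32)
r = l + 0.01 * rng.normal(size=32)
use_ms, m, s = ms_encode_band(l, r)
print(use_ms, np.std(s) < np.std(l))
```

With nearly identical channels the difference channel is almost zero, which is the source of the bit-rate saving the text describes.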
  • Figure 12 is a schematic diagram of a second embodiment of a decoding apparatus.
  • The decoding apparatus adds a sum and difference stereo decoding module 66 to the decoding apparatus shown in FIG. 8, between the output of the inverse quantizer group 62 and the input of the multi-resolution synthesis module 63. It receives the signal type analysis result and the sum and difference stereo control signal output by the bit stream demultiplexing module 60, and is used to convert the inverse quantized spectra of the sum and difference channels into the inverse quantized spectra of the left and right channels according to this control information.
  • The sum and difference stereo decoding module 66 determines, based on the flag bit of each scale factor band, whether sum and difference stereo decoding is required in that scale factor band. If sum and difference stereo encoding was performed in the encoding device, the inverse quantized spectrum must be subjected to sum and difference stereo decoding in the decoding device.
  • The sum and difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer group 62, receiving the sum and difference stereo control signal and the signal type analysis result output by the bit stream demultiplexing module 60.
  • The decoding method based on the decoding apparatus shown in FIG. 12 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 8, except that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the inverse quantized spectrum. If necessary, it is determined according to the flag bit of each scale factor band whether that scale factor band requires sum and difference stereo decoding; if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels, and subsequent processing is then performed. If the signal types are inconsistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is not processed, and subsequent processing is performed directly.
  • The sum and difference stereo decoding can also be performed after the entropy decoding process and before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the quantized values. If necessary, the flag bit of each scale factor band is used to determine whether that scale factor band requires sum and difference stereo decoding; if so, the quantized values of the spectra of the sum and difference channels in that scale factor band are converted into the quantized values of the spectra of the left and right channels, and subsequent processing is then performed. If the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values of the spectrum are not processed, and subsequent processing is performed directly.
  • The frequency domain coefficients of the left and right channels in the scale factor band are obtained from the frequency domain coefficients of the sum and difference channels by the following operation:

    L̃ = M̃ + S̃,   R̃ = M̃ − S̃

  where M̃ represents the quantized sum channel frequency domain coefficient, S̃ represents the quantized difference channel frequency domain coefficient, L̃ represents the quantized left channel frequency domain coefficient, and R̃ represents the quantized right channel frequency domain coefficient.
  • Similarly, the inverse quantized frequency domain coefficients of the left and right channels in the subband are obtained from the inverse quantized frequency domain coefficients of the sum and difference channels according to the same matrix operation.
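The decoder-side inverse transform above can be sketched and checked against the standard (L ± R)/2 analysis transform (assumed here, matching the encoder sketch):

```python
import numpy as np

def ms_decode_band(m, s):
    """Inverse sum/difference transform for one scale factor band:
    recover left/right coefficients as L = M + S, R = M - S."""
    return m + s, m - s

# Round trip with the (L+R)/2, (L-R)/2 analysis transform
rng = np.random.default_rng(3)
l = rng.normal(size=16)
r = rng.normal(size=16)
m, s = 0.5 * (l + r), 0.5 * (l - r)
l2, r2 = ms_decode_band(m, s)
print(np.allclose(l, l2) and np.allclose(r, r2))  # True
```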
  • FIG. 13 is a view showing the configuration of a third embodiment of the encoding apparatus of the present invention. This embodiment, on the basis of FIG. 9, adds a sum and difference stereo encoding module 57 between the output of the frequency domain linear prediction and vector quantization module 56 and the input of the quantization and entropy encoding module 54; the psychoacoustic analysis module 51 outputs the masking thresholds of the sum and difference channels to the quantization and entropy encoding module 54.
  • The sum and difference stereo encoding module 57 may also be located between the quantizer group and the entropy coder in the quantization and entropy encoding module 54, receiving the signal type analysis result output by the psychoacoustic analysis module 51.
  • The function and working principle of the sum and difference stereo encoding module 57 are the same as those in FIG. 11, and are not described again here.
  • The encoding method based on the encoding apparatus shown in FIG. 13 is basically the same as the encoding method based on the encoding apparatus shown in FIG. 9, except that the following steps are added: it is determined whether the audio signal is a multi-channel signal. If it is a multi-channel signal, it is determined whether the signal types of the left and right channel signals are consistent; if the signal types are consistent, it is determined whether each scale factor band satisfies the sum and difference stereo encoding condition, and if so, sum and difference stereo encoding is performed on that scale factor band; if not, the sum and difference stereo encoding processing is not performed. If the signal is a mono signal or a multi-channel signal whose signal types are inconsistent, the sum and difference stereo encoding processing is not performed.
  • The sum and difference stereo encoding can also be applied after quantization and before entropy coding, that is: after the frequency domain coefficients are quantized, it is determined whether the audio signal is a multi-channel signal; if so, it is determined whether the signal types of the left and right channel signals are consistent. If the signal types are consistent, it is determined whether each scale factor band satisfies the encoding condition, and if so, sum and difference stereo encoding is performed on that scale factor band; if not, the sum and difference stereo encoding processing is not performed. If the signal is a mono signal or a multi-channel signal whose signal types are inconsistent, the sum and difference stereo encoding processing is not performed.
  • Figure 14 is a block diagram showing a third embodiment of the decoding apparatus.
  • The decoding apparatus adds a sum and difference stereo decoding module 66 on the basis of the decoding apparatus shown in FIG. 10, between the output of the inverse quantizer group 62 and the input of the inverse frequency domain linear prediction and vector quantization module 65; the bit stream demultiplexing module 60 outputs the sum and difference stereo control signals to it.
  • the sum and difference stereo decoding module 66 may also be located between the output of the entropy decoding module 61 and the input of the inverse quantizer group 62, and receive the sum and difference stereo control signals output by the bit stream demultiplexing module 60.
  • The decoding method based on the decoding device shown in FIG. 14 is basically the same as the decoding method based on the decoding device shown in FIG. 10, except that the following steps are added: after the inverse quantized spectrum is obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the inverse quantized spectrum. If necessary, it is determined according to the flag bit of each scale factor band whether that scale factor band requires sum and difference stereo decoding; if so, the inverse quantized spectra of the sum and difference channels in the scale factor band are converted into the inverse quantized spectra of the left and right channels, and subsequent processing is then performed. If the signal types are inconsistent or sum and difference stereo decoding is not required, the inverse quantized spectrum is not processed, and subsequent processing is performed directly.
  • The sum and difference stereo decoding can also be performed before the inverse quantization process, that is: after the quantized values of the spectrum are obtained, if the signal type analysis result indicates that the signal types are consistent, it is determined according to the sum and difference stereo control signal whether sum and difference stereo decoding needs to be performed on the quantized values. If necessary, it is determined according to the flag bit of each scale factor band whether that scale factor band requires sum and difference stereo decoding; if so, the quantized values of the spectra of the sum and difference channels in that scale factor band are converted into the quantized values of the spectra of the left and right channels, and subsequent processing is then performed. If the signal types are inconsistent or sum and difference stereo decoding is not required, the quantized values of the spectrum are not processed, and subsequent processing is performed directly.
  • Fig. 15 shows a schematic representation of a fourth embodiment of the encoding device of the present invention.
  • In this embodiment, a resampling module 590 and a band extension module 591 are added. The resampling module 590 resamples the input audio signal, changes its sampling rate, and outputs the audio signal with the changed sampling rate to the signal property analysis module 50. The band extension module 591 is configured to analyze the input audio signal over the entire frequency band, extract the spectral envelope of the high frequency portion and its characteristics relating it to the low frequency portion, and output them to the bit stream multiplexing module 55.
  • The resampling module 590 is configured to resample the input audio signal; resampling includes both upsampling and downsampling. The following takes downsampling as an example to illustrate resampling.
  • The resampling module 590 includes a low pass filter and a downsampler, wherein the low pass filter is used to limit the frequency band of the audio signal, eliminating the aliasing that downsampling may cause.
  • The input audio signal is low-pass filtered and then downsampled. Assume that the input audio signal is s(n) and the impulse response of the low pass filter is h(n); after filtering and decimation, the sampling rate is reduced by the decimation factor compared to the original input audio signal.
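This filter-then-decimate structure can be sketched as follows; the FIR filter shape and the decimation factor are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def downsample(s, h, M):
    """Low-pass filter s(n) with impulse response h(n),
    then keep every M-th sample (decimation by M)."""
    filtered = np.convolve(s, h, mode="same")
    return filtered[::M]

# Crude length-16 moving-average low pass, decimation by 2
M = 2
h = np.ones(16) / 16.0
n = np.arange(1024)
s = np.sin(2 * np.pi * 0.01 * n)  # low-frequency tone, safely below Nyquist/2
y = downsample(s, h, M)
print(len(s), len(y))  # output has half the samples
```

A production resampler would use a properly designed anti-aliasing filter (e.g. a windowed-sinc FIR) rather than a moving average; the structure is the same.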
  • After the original input audio signal is input to the band extension module 591, analysis is performed over the entire frequency band, and the spectral envelope of the high frequency portion and its characteristics relating it to the low frequency portion are extracted and output as band extension control information to the bit stream multiplexing module 55.
  • The basic principle of band extension is: for most audio signals, the characteristics of the high frequency part have a strong correlation with the characteristics of the low frequency part, so the high frequency part of the audio signal can be effectively reconstructed from the low frequency part. Thus, the high frequency portion of the audio signal need not be transmitted; to ensure that the high frequency part can be reconstructed correctly, it is sufficient to transmit a small amount of band extension control information in the compressed audio stream.
  • The band extension module 591 includes a parameter extraction module and a spectral envelope extraction module. The input signal enters the parameter extraction module, which extracts parameters representing the spectral characteristics of the input signal in different time-frequency regions; then, in the spectral envelope extraction module, the spectral envelope of the high frequency portion of the signal is estimated at a certain time-frequency resolution. To ensure that the time-frequency resolution is best suited to the characteristics of the current input signal, the time-frequency resolution of the spectral envelope is freely selectable.
  • The parameters of the spectral characteristics of the input signal and the spectral envelope of the high frequency portion are output as the band extension control signal to the bit stream multiplexing module 55 for multiplexing.
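One simple way to estimate a high-band spectral envelope at a chosen frequency resolution is mean energy per sub-band. The FFT size, the split point, and the number of envelope bands below are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def highband_envelope(frame, n_bands=4):
    """Estimate the spectral envelope of the upper half of the
    spectrum as the mean energy in n_bands equal sub-bands."""
    spec = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum of the frame
    high = spec[len(spec) // 2:]             # upper half = high band
    bands = np.array_split(high, n_bands)
    return np.array([b.mean() for b in bands])

frame = np.random.default_rng(4).normal(size=512)
env = highband_envelope(frame)
print(env.shape)  # (4,)
```

Transmitting only such per-band envelope values (plus tonality-like characteristic parameters) is what keeps the band extension side information small.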
  • The bit stream multiplexing module 55 multiplexes the code stream output by the quantization and entropy coding module 54 (including the common scale factor, the scale factor coded values, the codebook sequence number coded values, and the losslessly coded quantized spectrum or the coded values of the codeword indices) together with the band extension control signal output by the band extension module 591 to obtain a compressed audio data stream.
  • The encoding method based on the encoding apparatus shown in FIG. 15 specifically includes: analyzing the input audio signal over the entire frequency band and extracting the high-frequency spectral envelope and the signal spectral characteristic parameters as the band extension control signal; resampling the input audio signal and performing signal type analysis; calculating the signal-to-mask ratio of the resampled signal; performing time-frequency mapping on the resampled signal to obtain the frequency domain coefficients of the audio signal; performing quantization and entropy coding on the frequency domain coefficients; and multiplexing the band extension control signal with the coded audio stream to obtain a compressed audio stream.
  • The resampling process includes two steps: limiting the frequency band of the audio signal; and downsampling the band-limited audio signal.
  • FIG. 16 is a schematic structural diagram of Embodiment 4 of the decoding apparatus. This embodiment is based on the decoding apparatus shown in FIG. 8, with a band extension module 68 added, which receives the band extension control information output by the bit stream demultiplexing module 60 and the low-band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion by spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.
  • The decoding method based on the decoding apparatus shown in FIG. 16 is basically the same as the decoding method based on the decoding apparatus shown in FIG. 8, except that the following step is added: after the time domain audio signal is obtained, the high frequency portion of the audio signal is reconstructed according to the band extension control information and the time domain audio signal, to obtain a wideband audio signal.
  • FIGS. 17, 19 and 21 show the fifth to seventh embodiments of the encoding apparatus, based on the encoding apparatuses shown in FIGS. 11, 9 and 13, respectively, with the resampling module 590 and the band extension module 591 added.
  • The connection relationships, functions and principles of these two modules are the same as those in FIG. 15 and are not described again here.
  • In the corresponding decoding apparatuses, a band extension module 68 is added, which receives the band extension control information output by the bit stream demultiplexing module 60 and the low-band time domain audio signal output by the frequency-time mapping module 64, reconstructs the high frequency signal portion by spectrum shifting and high frequency adjustment, and outputs a wideband audio signal.
  • The encoding apparatus may further include a gain control module, which receives the audio signal output by the signal property analysis module 50, controls the dynamic range of fast-changing type signals to eliminate pre-echo in the audio, and outputs the result to the time-frequency mapping module 52 and the psychoacoustic analysis module 51, while outputting the gain adjustment amount to the bit stream multiplexing module 55.
  • According to the signal type of the audio signal, the gain control module only controls fast-changing type signals; slowly-changing type signals are output directly without processing.
  • The gain control module adjusts the time-domain energy envelope of the signal and increases the gain value of the signal before the fast-change point, so that the time-domain signal amplitudes before and after the fast-change point are relatively close; the time domain signal with the adjusted energy envelope is then output to the time-frequency mapping module 52, and the applied adjustment amount is output to the bit stream multiplexing module 55.
  • The encoding method based on this encoding device is basically the same as the encoding method based on the above encoding device, with the difference that the following step is added: gain control is performed on the signal after signal type analysis.
  • The decoding apparatus may further include an inverse gain control module, connected after the output of the frequency-time mapping module 64, which receives the signal type and gain adjustment amount information output by the bit stream demultiplexing module 60 and is used to adjust the gain of the time domain signal to control pre-echo. After the inverse gain control module receives the reconstructed time domain signal output by the frequency-time mapping module 64, fast-changing type signals are controlled while slowly-changing type signals are not processed.
  • The inverse gain control module adjusts the energy envelope of the reconstructed signal according to the gain adjustment amount information and reduces the amplitude value of the signal before the fast-change point, returning the energy envelope to its original low-before, high-after state, so that the magnitude of the quantization noise before the fast-change point is reduced correspondingly with the amplitude value of the signal, thereby controlling the pre-echo.
  • The decoding method based on this decoding device is basically the same as the decoding method based on the above decoding device, with the difference that the following step is added: inverse gain control is performed on the reconstructed time domain signal.
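The gain control and inverse gain control steps above can be sketched as a gain applied before the fast-change point at the encoder and removed at the decoder, which shrinks the quantization noise in the quiet region by the same factor. The segment boundary and gain value are illustrative assumptions:

```python
import numpy as np

def gain_control(x, change_point, gain):
    """Boost the signal before the fast-change point so the
    time-domain energy envelope is flatter; return the adjusted
    signal and the side info needed to undo the adjustment."""
    y = x.copy()
    y[:change_point] *= gain
    return y, (change_point, gain)

def inverse_gain_control(y, side_info):
    """Undo the encoder gain; any quantization noise added before
    the change point is attenuated by the same factor."""
    change_point, gain = side_info
    x = y.copy()
    x[:change_point] /= gain
    return x

rng = np.random.default_rng(5)
x = np.concatenate([0.01 * rng.normal(size=256),  # quiet before the transient
                    rng.normal(size=256)])         # loud after it
y, info = gain_control(x, 256, 8.0)
x2 = inverse_gain_control(y, info)
print(np.allclose(x, x2))  # True
```

In a real codec the quantizer sits between the two functions; because the decoder divides by the gain, pre-echo noise in front of the transient is pushed below audibility.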

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention concerns an improved audio encoding device comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantization and entropy coding module, a bit stream multiplexing module, a signal characteristic analysis module and a multi-resolution analysis module. The signal characteristic analysis module is configured to analyze the type of the input audio signal. The psychoacoustic analysis module calculates a masking threshold and a signal-to-mask ratio of the audio signal, which it sends to the quantization and entropy coding module. The multi-resolution analysis module is configured to perform multi-resolution analysis based on the signal type. The quantization and entropy coding module performs quantization and entropy coding of the frequency domain coefficients according to the signal-to-mask ratio. The bit stream multiplexing module forms a code stream for audio encoding. This device can support audio signals with sampling rates between 8 kHz and 192 kHz.
PCT/CN2005/000440 2004-04-01 2005-04-01 Ameliorations apportees a un procede et un dispositif de codage/decodage audio Ceased WO2005096273A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP05742018A EP1873753A1 (fr) 2004-04-01 2005-04-01 Ameliorations apportees a un procede et un dispositif de codage/decodage audio

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200410030946.9 2004-04-01
CN200410030946 2004-04-01

Publications (1)

Publication Number Publication Date
WO2005096273A1 true WO2005096273A1 (fr) 2005-10-13

Family

ID=35064017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2005/000440 Ceased WO2005096273A1 (fr) 2004-04-01 2005-04-01 Ameliorations apportees a un procede et un dispositif de codage/decodage audio

Country Status (2)

Country Link
EP (1) EP1873753A1 (fr)
WO (1) WO2005096273A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006081A1 (en) * 2007-06-27 2009-01-01 Samsung Electronics Co., Ltd. Method, medium and apparatus for encoding and/or decoding signal
CN108962266A (zh) * 2014-03-24 2018-12-07 杜比国际公司 对高阶高保真立体声信号应用动态范围压缩的方法和设备
CN112530444A (zh) * 2019-09-18 2021-03-19 华为技术有限公司 音频编码方法和装置
US20240283945A1 (en) * 2016-09-30 2024-08-22 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
US20240339119A1 (en) * 2014-05-01 2024-10-10 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US12431148B2 (en) 2014-03-31 2025-09-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding device, decoding device, encoding method, decoding method, and non-transitory computer-readable recording medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101756834B1 (ko) 2008-07-14 2017-07-12 삼성전자주식회사 오디오/스피치 신호의 부호화 및 복호화 방법 및 장치
TWI430263B (zh) * 2009-10-20 2014-03-11 弗勞恩霍夫爾協會 音訊信號編碼器、音訊信號解碼器、使用混疊抵消來將音訊信號編碼或解碼之方法
EP2372704A1 (fr) * 2010-03-11 2011-10-05 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Processeur de signal et procédé de traitement d'un signal
JP5704018B2 (ja) * 2011-08-05 2015-04-22 富士通セミコンダクター株式会社 オーディオ信号符号化方法および装置
CN110706715B (zh) 2012-03-29 2022-05-24 华为技术有限公司 信号编码和解码的方法和设备
WO2014046916A1 (fr) * 2012-09-21 2014-03-27 Dolby Laboratories Licensing Corporation Approche de codage audio spatial en couches
WO2020039000A1 (fr) 2018-08-21 2020-02-27 Dolby International Ab Codage d'événements transitoires denses avec compression-extension
WO2025227292A1 (fr) * 2024-04-28 2025-11-06 北京小米移动软件有限公司 Procédé et appareil de traitement audio, et support de stockage

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5537510A (en) * 1994-12-30 1996-07-16 Daewoo Electronics Co., Ltd. Adaptive digital audio encoding apparatus and a bit allocation method thereof
US5613035A (en) * 1994-01-18 1997-03-18 Daewoo Electronics Co., Ltd. Apparatus for adaptively encoding input digital audio signals from a plurality of channels
CN1388517A (zh) * 2002-06-05 2003-01-01 北京阜国数字技术有限公司 一种基于伪小波滤波的音频编/解码技术
CN1461112A (zh) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 一种基于极小化全局噪声掩蔽比准则和熵编码的量化的音频编码方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5613035A (en) * 1994-01-18 1997-03-18 Daewoo Electronics Co., Ltd. Apparatus for adaptively encoding input digital audio signals from a plurality of channels
US5537510A (en) * 1994-12-30 1996-07-16 Daewoo Electronics Co., Ltd. Adaptive digital audio encoding apparatus and a bit allocation method thereof
CN1388517A (zh) * 2002-06-05 2003-01-01 北京阜国数字技术有限公司 一种基于伪小波滤波的音频编/解码技术
CN1461112A (zh) * 2003-07-04 2003-12-10 北京阜国数字技术有限公司 一种基于极小化全局噪声掩蔽比准则和熵编码的量化的音频编码方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
P. P. VAIDYNATHAN: "Multirate Systems and Filter Banks", 1993, PRENTICE HALL

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006081A1 (en) * 2007-06-27 2009-01-01 Samsung Electronics Co., Ltd. Method, medium and apparatus for encoding and/or decoding signal
US12273696B2 (en) 2014-03-24 2025-04-08 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
CN108962266B (zh) * 2014-03-24 2023-08-11 杜比国际公司 对高阶高保真立体声信号应用动态范围压缩的方法和设备
US11838738B2 (en) 2014-03-24 2023-12-05 Dolby Laboratories Licensing Corporation Method and device for applying Dynamic Range Compression to a Higher Order Ambisonics signal
CN108962266A (zh) * 2014-03-24 2018-12-07 杜比国际公司 对高阶高保真立体声信号应用动态范围压缩的方法和设备
US12431148B2 (en) 2014-03-31 2025-09-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding device, decoding device, encoding method, decoding method, and non-transitory computer-readable recording medium
US20240339119A1 (en) * 2014-05-01 2024-10-10 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US12431151B2 (en) * 2014-05-01 2025-09-30 Nippon Telegraph And Telephone Corporation Coding device, decoding device, and method and program thereof
US20240283945A1 (en) * 2016-09-30 2024-08-22 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
US12309395B2 (en) * 2016-09-30 2025-05-20 The Mitre Corporation Systems and methods for distributed quantization of multimodal images
CN112530444A (zh) * 2019-09-18 2021-03-19 华为技术有限公司 音频编码方法和装置
CN112530444B (zh) * 2019-09-18 2023-10-03 华为技术有限公司 音频编码方法和装置
US12057129B2 (en) 2019-09-18 2024-08-06 Huawei Technologies Co., Ltd. Audio coding method and apparatus

Also Published As

Publication number Publication date
EP1873753A1 (fr) 2008-01-02

Similar Documents

Publication Publication Date Title
WO2005096274A1 (fr) Improved audio coding/decoding device and method
EP1914724B1 (fr) Dual-transform coding of audio signals
CN110310659B (zh) Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
JP4081447B2 (ja) Apparatus and method for encoding a time-discrete audio signal, and apparatus and method for decoding encoded audio data
CN101276587B (zh) Sound encoding device and method, and sound decoding device and method
AU2006332046B2 (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
KR101161866B1 (ko) Audio coding apparatus and method thereof
JP5395917B2 (ja) Multichannel digital audio encoding apparatus and method
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
US9037454B2 (en) Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
US20100023336A1 (en) Compression of audio scale-factors by two-dimensional transformation
CN102436819B (zh) Wireless audio compression and decompression methods, and audio encoder and audio decoder
EP1612772A1 (fr) Low bit-rate coding/decoding method and device
CN103329197A (zh) Improved stereo parametric encoding/decoding for channels in phase opposition
WO2006003891A1 (fr) Audio signal decoding device and audio signal encoding device
KR19990041073A (ko) Audio encoding/decoding method and apparatus with adjustable bit rate
KR20080035454A (ko) Fast lattice vector quantization
CN103366750B (zh) Sound encoding/decoding device and method thereof
WO2005096273A1 (fr) Improvements to an audio coding/decoding method and device
CN101162584A (zh) Method and apparatus for encoding and decoding an audio signal using bandwidth extension
CN101241701A (zh) Audio decoding
CN1677492A (zh) Enhanced audio encoding/decoding device and method
WO2005096508A1 (fr) Improved audio encoding and decoding equipment and method therefor
CN100555413C (zh) Method and apparatus for scalably encoding and decoding audio data
WO2006056100A1 (fr) Encoding/decoding method and device using intra-channel signal redundancy

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC - FORM EPO 1205A DATED 21-03-2007

WWE Wipo information: entry into national phase

Ref document number: 2005742018

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2005742018

Country of ref document: EP