WO2007028280A1 - Encoder and decoder for pre-echo control and method thereof - Google Patents
Encoder and decoder for pre-echo control and method thereof
- Publication number
- WO2007028280A1 (application PCT/CN2005/001435)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- module
- window function
- signal
- echo
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
- G10L19/025—Detection of transients or attacks for time/frequency resolution switching
Definitions
- the present invention relates to an apparatus and method for encoding and decoding with pre-echo control, and more particularly to an audio encoding and decoding apparatus and method that control pre-echo using a modified window function.
- a perceptual encoder is used to encode and compress audio information.
- a conventional perceptual encoder generally has a psychoacoustic module, whose function is to identify the "irrelevant components" in an audio signal. Once the "irrelevant components" have been identified, the quantization module processes them so that the audio signal reaches "perceptual transparency", that is, the processing has no influence on human perception or its influence is within an acceptable range.
- when the psychoacoustic module analyzes the "irrelevant components", it mainly exploits the masking phenomenon of the human ear. The so-called "masking phenomenon", as shown in Fig.
- Masking is further divided into simultaneous masking, pre-masking, and post-masking. Among them, pre-masking 2 and post-masking 3 occur in the time domain, which places an additional requirement on the time domain characteristics of the perceptual encoder: to achieve perceptually transparent coding quality, the quantization noise must also stay below a time-dependent masking threshold. This requirement is not easy for a practical perceptual encoder to meet.
- in a perceptual encoder, a block transform is used to map the audio time domain signal into the frequency domain. The quantization error introduced when the transformed spectral coefficients are quantized and coded spreads in the time domain once the signal is reconstructed by the synthesis filter.
- with filter designs such as a modified discrete cosine transform (MDCT) filter bank with a window length of 2048 sample points, applied to a signal with a sampling frequency of 48000 Hz, the quantization error spreads over about 42.7 ms after reconstruction by the synthesis filter. If the stronger energy of the signal in the analysis window is concentrated in a very small part of the window, the quantization noise spreads over the part of the window that precedes the appearance of the signal.
- MDCT modified discrete cosine transform
- the quantization noise is even higher than the energy level of the original signal. This is called the "pre-echo" phenomenon, as shown in Figure 2 and Figure 3.
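The 42.7 ms figure quoted above is simply the window length divided by the sampling frequency. The short sketch below reproduces that arithmetic and, for comparison, applies it to the 256-sample short window of the prior-art encoder described in this document; the function name `window_duration_ms` is a hypothetical helper introduced only for illustration.

```python
def window_duration_ms(window_length: int, sample_rate_hz: int) -> float:
    """Return the time span, in milliseconds, covered by one transform window."""
    return 1000.0 * window_length / sample_rate_hz


# Long window quoted in the text: 2048 samples at 48000 Hz.
long_ms = window_duration_ms(2048, 48000)
# Short window used by the prior-art window-switching encoder: 256 samples.
short_ms = window_duration_ms(256, 48000)

print(f"2048-sample window: {long_ms:.1f} ms")
print(f"256-sample window:  {short_ms:.1f} ms")
```

The shorter window confines the spread of quantization noise to a much shorter interval, which is why window switching reduces pre-echo at the cost of frequency resolution.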
- Fig. 2 is a time domain graph of an uncoded audio signal.
- Fig. 3 is a time domain graph of the encoded and reconstructed audio signal. The portion circled by an ellipse in Fig. 3 is the pre-echo 5. According to the characteristics of the human ear, if the coding noise lasts only a short time before the abrupt change point of the signal, the pre-echo can be masked by pre-masking; otherwise the coding noise will be perceived by the human ear.
- the time domain characteristics of the quantization noise should therefore be considered when designing the encoder, to ensure that the time domain masking condition is satisfied. The pre-echo phenomenon has always been a major difficulty when coding fast variable type signals (such as a castanets signal) at a limited code rate.
- the prior art includes the following. Bit pool (bit reservoir) control: the coding precision of the spectral coefficients in the filter bank window that covers the fast variable segment is increased. This greatly increases the number of bits required to code a fast variable frame, so the method cannot be used directly in fixed rate encoders. In the MPEG-1 standard, the bit pool method uses the bits left over from previous frames when the bit demand peaks, thereby maintaining a constant average code rate. In practice, however, a very fast-changing signal requires a very large bit pool, which cannot be realized.
- Adaptive window switching is used in many perceptual encoders. This method adaptively adjusts the size of the filter bank window according to the characteristics of the input signal: the steady-state or slowly changing part is coded with a long window, and the fast-changing part is coded with a short window.
- This approach increases the amount of encoder computation and complicates the encoder structure. Since different window lengths require different interpretations and normalizations of the psychoacoustic model, as well as different frequency band and noiseless coding structures, window switching significantly increases the complexity of the encoder. In addition, when overlap-add filter banks are used, window switching decisions require additional buffering and delay in the encoder, resulting in greater end-to-end delay. Finally, although the long and short windows have good time-frequency localization, the start and stop windows used for the transitions are coded less efficiently.
- Filter bank switching techniques are techniques that control the pre-echo using different filter bank modes. Specifically, in the slow-change signal type, a cosine-modulated filter bank with a high frequency resolution is used; in a fast-changing signal type, a wavelet filter bank is used. When the two filter bank modes are switched to each other, it is difficult to ensure complete reconstruction of the transition block.
- Time domain noise shaping (TNS): the signal type is judged after the signal has been transformed into frequency domain coefficients by the filter bank. For a fast variable type signal, the frequency domain coefficients are not quantized directly; instead, linear prediction is first applied to the frequency domain coefficients and the residual sequence is then quantized. The use of TNS increases the amount of side information, which affects the overall coding efficiency.
- the encoder includes: The psychoacoustic analysis module 201, the window function module 202, the time-frequency mapping module 203, the quantization and entropy encoding module 204, and the code stream multiplexing module 205.
- the psychoacoustic analysis module 201 is configured to calculate the perceptual entropy and the masking threshold of the input audio signal, and determine whether the audio signal frame signal type is a fast variable type signal or a slowly varying type signal according to the perceptual entropy.
- the length of the analysis window function of the window function module 202 is determined according to the signal type output by the psychoacoustic analysis module 201. Specifically, if the frame signal is a fast-changing signal type, a window with a length of 256 sample points, which has a higher temporal resolution and a lower frequency resolution, is used in order to prevent pre-echo; if the frame signal is a slowly varying signal type, a window with a length of 2048 sample points, which has a lower temporal resolution and a higher frequency resolution, is used to ensure encoding efficiency.
- the time-frequency mapping module 203 is configured to convert the time domain audio signal into frequency domain coefficients and output them to the quantization and entropy coding module 204; the quantization and entropy coding module 204, under the control of the masking threshold output by the psychoacoustic analysis module 201, quantizes and entropy encodes the frequency domain coefficients and outputs the result to the code stream multiplexing module 205; the code stream multiplexing module 205 is configured to multiplex the received data to form an audio coded code stream.
- the window function module 202 uses windows of different length sample points, so that the structural complexity of the entire encoder becomes higher.
- variance estimation is performed on a frame signal to obtain a standard deviation, which is then quantized.
- a DCT transform is performed on the input sequence to obtain V, which is quantized to obtain the quantized V.
- the quantized sequence is transmitted to the decoding end, where an inverse DCT transform is performed on it and the resulting sequence is multiplied by the quantized standard deviation to obtain the reconstructed speech signal. If this method is applied directly to audio coding, it remains powerless against the pre-echo problem that occurs in fast-changing signal frames.
- the transform-based speech coding method cannot change the characteristics of the frame signal 6 by dividing it by the quantized standard deviation; that is, the frame signal is still non-stationary, as shown in FIG. 5, which is a time domain graph of an audio signal to which the transform-based speech coding method is applied.
- as an improvement, one standard deviation can be estimated before the fast change point and a second standard deviation after the fast change point, and both are quantized, as shown in Fig. 6, which is a time domain graph of an audio signal to which the improved transform-based speech coding method is applied. In this way, first of all, fast change
- the improved method is designed for speech coding. For audio coding, it has difficulty eliminating fast effects, and its coding efficiency is very low.
- a primary object of the present invention is to provide an encoding apparatus for controlling pre-echo, which has a simple structure and can effectively control a pre-echo phenomenon during audio encoding.
- Another object of the present invention is to provide an encoding method for controlling pre-echo, which can effectively control the pre-echo phenomenon during audio encoding.
- the present invention provides an encoding apparatus for controlling pre-echo, which includes: a signal type analyzing module for judging the signal type of an input audio signal frame and outputting a fast change point position and a quantized mutation intensity parameter;
- a correction window function module, coupled to the signal type analysis module, configured to modify the analysis window function, window the input audio signal frame, and output the windowed time domain audio signal; a time-frequency mapping module, coupled to the correction window function module, configured to convert the windowed time domain audio signal into frequency domain coefficients; a psychoacoustic analysis module, configured to perform psychoacoustic processing on the input audio signal frame and output a masking threshold parameter of the scale factor band; a quantization and entropy coding module, connected to the time-frequency mapping module and the psychoacoustic analysis module respectively, configured to quantize and entropy encode the frequency domain coefficients output by the time-frequency mapping module according to the masking threshold parameter output by the psychoacoustic analysis module, and to output the encoded code stream;
- a code stream multiplexing module, coupled to the quantization and entropy coding module and the signal type analysis module, configured to multiplex the encoded code stream output by the quantization and entropy coding module with the output of the signal type analysis module to form an audio coded stream.
- the present invention provides an encoding method for controlling pre-echo, which includes the following steps:
- Step 1 the signal type analysis module determines whether the signal type of the input audio signal frame is a fast change type signal; if so, the signal type analysis module calculates the fast change point position parameter and the mutation intensity parameter of the audio signal frame, quantizes the mutation intensity parameter to obtain a quantized value of the mutation intensity, and then step 2 is performed; otherwise, the correction window function module uses the original analysis window function to window the audio signal frame to obtain a windowed time domain signal, and then step 4 is performed;
- Step 2 The correction window function module linearly transforms the analysis window function to obtain a modified analysis window function;
- Step 3 The correction window function module windows the audio signal frame using the modified analysis window function to obtain a windowed time domain signal;
- Step 4 The time-frequency mapping module performs time-frequency mapping processing on the windowed time domain signal to obtain frequency domain coefficients;
- Step 5 The quantization and entropy coding module quantizes and entropy encodes the frequency domain coefficients according to the masking threshold parameter of the scale factor band obtained by psychoacoustic processing of the audio signal frame by the psychoacoustic module, to obtain the encoded audio code stream;
- Step 6 The code stream multiplexing module multiplexes the encoded audio code stream and the result of the signal type analysis to obtain a compressed audio code stream.
- the present invention provides a decoding apparatus for controlling pre-echo, which includes:
- a code stream demultiplexing module configured to demultiplex the compressed audio code stream
- An inverse quantization and entropy decoding module is connected to the code stream demultiplexing module, configured to decode and inverse quantize the demultiplexed audio code stream, and output inverse quantized frequency domain coefficients; a frequency time mapping module, coupled to the inverse quantization and entropy decoding module, configured to transform the inverse quantized frequency domain coefficients into a time domain signal;
- a correction window function module is coupled to the frequency time mapping module for modifying the integrated window function and windowing the time domain signal.
- the present invention provides a decoding method for controlling pre-echo, which includes the following steps:
- Step 1 The code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information.
- Step 2 The inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on the demultiplexed audio code stream to obtain inversely quantized frequency domain coefficients;
- Step 3 The frequency time mapping module performs frequency-time mapping processing on the inverse-quantized frequency domain coefficients to obtain a time domain signal.
- Step 4 The correction window function module determines, according to the demultiplexed side information, whether the signal type of the audio signal frame is a fast change type, if yes, step 5 is performed; otherwise, step 6 is performed;
- Step 5 The correction window function module linearly transforms the integrated window function to obtain a modified integrated window function, and then uses the modified integrated window function to window the time domain signal to obtain the reconstructed audio signal;
- Step 6 The correction window function module uses the original integrated window function to window the time domain signal to obtain a reconstructed audio signal.
- the present invention has the following advantages: Since the window function employs a fixed window length, the structure of the encoding device is simplified, and the pre-echo phenomenon during audio encoding is controlled while the complete reconstruction of the audio signal is ensured.
- Figure 1 is a masking characteristic diagram of the human ear.
- Figure 2 is an uncoded audio signal time domain graph.
- Figure 3 is a time domain graph of the encoded audio signal after reconstruction.
- FIG. 4 is a structural block diagram of a prior art audio encoding device.
- FIG. 5 is an audio signal time domain graph to which a transform-based speech encoding method is applied.
- FIG. 6 is an audio signal time domain graph to which an improved transform-based speech encoding method is applied.
- Fig. 7 is a block diagram showing the configuration of a first embodiment of an apparatus for controlling pre-echo of the present invention.
- Fig. 8 is a flow chart showing the first embodiment of the encoding method for controlling pre-echo according to the present invention.
- Figure 9 is a schematic illustration of the original analysis window and the original synthesis window of the encoding and decoding method for controlling the pre-echo of the present invention.
- Figure 10 is a schematic illustration of a modified analysis window of the encoding method for controlling pre-echo in accordance with the present invention.
- Figure 11 is a schematic illustration of a modified integrated window of the decoding method for controlling the pre-echo of the present invention.
- Fig. 12 is a view showing the window function correction in accordance with the full reconstruction condition in the encoding method for controlling pre-echo of the present invention.
- Figure 13 is a diagram showing the modified analysis window function of the transition block of the encoding method of the pre-echo control of the present invention.
- Figure 14 is a diagram showing the modified integrated window function of the transition block of the decoding method for controlling the pre-echo of the present invention.
- Fig. 15 is a block diagram showing the configuration of a second embodiment of the apparatus for controlling pre-echo of the present invention.
- Fig. 16 is a flow chart showing the second embodiment of the encoding method for controlling pre-echo according to the present invention.
- Figure 17 is a block diagram showing the structure of a first embodiment of a decoding apparatus for controlling pre-echo of the present invention.
- Figure 18 is a flow chart showing Embodiment 1 of the decoding method for controlling pre-echo according to the present invention.
- Fig. 19 is a block diagram showing the configuration of a second embodiment of the decoding apparatus for controlling the pre-echo of the present invention.
- FIG. 20 is a flow chart showing Embodiment 2 of the decoding method for controlling pre-echo according to the present invention.
Detailed description
- the present invention uses a modified window function (MWF) to control the pre-echo signal that appears in audio coding, thereby controlling the pre-echo phenomenon during audio coding while ensuring complete reconstruction of the audio signal.
- MWF modified window function
- FIG. 7 is a structural block diagram of Embodiment 1 of an encoding apparatus for controlling pre-echo of the present invention.
- the device is composed of the following functional modules: a signal type analysis module 301, configured to determine the signal type of the input audio signal frame and output a fast change point position and a quantized mutation intensity parameter, wherein the signal type analysis module 301 includes: a signal type analyzer, configured to determine whether the input audio frame signal is a slowly varying type signal or a fast variable type signal; a fast change point positioner, coupled to the signal type analyzer, for calculating the position of the fast change point; a mutation intensity calculator, connected to the signal type analyzer, for calculating the mutation intensity of the signal; and a mutation intensity quantizer, coupled to the mutation intensity calculator, for quantizing the calculated mutation intensity;
- a correction window function module 302, connected to the signal type analysis module 301, configured to modify the analysis window function, window the input audio signal frame, and output a windowed time domain audio signal, thereby improving the time resolution with which the fast-changing signal is encoded; a time-frequency mapping module 304, connected to the correction window function module 302, configured to convert the windowed time domain audio signal into frequency domain coefficients; a psychoacoustic analysis module 303, configured to perform psychoacoustic processing on the input audio signal frame and output a masking threshold parameter of the scale factor band;
- the time-frequency mapping module 304 is composed of a filter bank, which may be a discrete Fourier transform (DFT) filter bank, a discrete cosine transform (DCT) filter bank, a modified discrete cosine transform (MDCT) filter bank, a cosine-modulated filter bank, and so on.
- DFT discrete Fourier transform
- DCT discrete cosine transform
- MDCT modified discrete cosine transform
- the window length of the analysis window function is equal to the audio signal frame length, and the window function may be a Hanning window, a Hamming window, or a Blackman window; when a modified discrete cosine transform (MDCT) filter bank is used, the window length of the analysis window function is twice the audio signal frame length, and the window function may be any window function that satisfies the conditions of the modified discrete cosine transform.
- the quantizer is composed of a set of sub-quantizers, each of which quantizes the frequency domain coefficients of a local region according to the masking threshold of the specific time-frequency region output by the psychoacoustic analysis module 303; this region is usually called the scale factor band.
- the quantizer can be a scalar quantizer or a vector quantizer, such as the Moving Picture Experts Group Advanced Audio Coding (MPEG AAC) nonlinear scalar quantizer or the MPEG TwinVQ (transform-domain weighted interleave vector quantization) vector quantizer.
- MPEG AAC Moving Picture Experts Group Advanced Audio Coding
- MPEG TwinVQ Moving Picture Experts Group transform-domain weighted interleave vector quantization
- Step 21 the signal type analysis module 301 determines whether the signal type of the input audio signal frame is a fast change type signal, if yes, go to step 22, otherwise go to step 25;
- Step 22 The signal type analysis module 301 calculates a parameter of the fast change point position and a mutation intensity parameter of the audio signal frame, and quantizes the mutation strength parameter to obtain a quantized value of the mutation intensity;
- Step 23 The correction window function module 302 proportionally reduces the function values of the analysis window function after the fast change point position, with a reduction factor equal to the quantized value of the mutation intensity, to obtain the modified analysis window function;
- Step 24 the correction window function module 302 uses the modified analysis window function to window the audio signal frame to obtain a windowed time domain signal, and step 26 is performed;
- Step 25 the correction window function module 302 uses the original analysis window function to window the audio signal frame to obtain a windowed time domain signal, and step 26 is performed;
- Step 26 The time-frequency mapping module 304 performs time-frequency mapping processing on the windowed time domain signal to obtain a frequency domain coefficient.
- Step 27 The quantization and entropy coding module 305 quantizes and entropy encodes the frequency domain coefficients according to the masking threshold parameter of the scale factor band obtained by the psychoacoustic module 303 performing psychoacoustic processing on the audio signal frame, to obtain the encoded audio code stream;
- Step 28 The code stream multiplexing module 306 multiplexes the encoded audio code stream and the result of the signal type analysis to obtain a compressed audio code stream.
- in step 21 of the above encoding method, while the signal type analysis module 301 determines the signal type of the input audio signal frame, the psychoacoustic module 303 performs psychoacoustic processing on the audio signal frame to obtain the masking threshold parameter of the scale factor band.
- the psychoacoustic processing calculates a masking curve for the current frame signal according to the hearing characteristics of the human ear; the masking threshold of a specific time-frequency region can be calculated from the masking curve and used to guide the quantization of the current audio frame signal. The psychoacoustic model can be the first or second type of psychoacoustic model used by MPEG AAC.
- the signal type analysis module 301 determines the signal type of the frame signal from pre-masking and post-masking effects, using an adaptive threshold and waveform prediction.
- the specific steps are: decompose the input frame into multiple sub-frames, and search each sub-frame for the local maximum points of the absolute value of the PCM data; select the sub-frame absolute peak among the local maximum points of each sub-frame; for a given sub-frame absolute peak, use the absolute peaks of a plurality of (typically 3) sub-frames in front of that sub-frame to predict a typical sample value a plurality of (typically 4) sub-frames ahead relative to that sub-frame; calculate the difference and the ratio between the absolute peak of the sub-frame and the predicted typical sample value; if the difference and the ratio are greater than the set thresholds, it is determined that the sub-frame contains a sudden signal and has a local maximum peak point capable of masking the pre-echo; if the front end of the sub-frame [...] the frame signal belongs to the fast-changing type signal, the sub-frame containing the sudden signal is taken as the position of the fast change point, and the ratio of the absolute peak of that sub-frame to the largest absolute peak of all the sub-frames before it is taken as the mutation intensity, which is then quantized.
- the quantization method may be rounding up, rounding down, rounding to the nearest integer, etc. If the difference and the ratio are not greater than the set thresholds, the above steps are repeated until the frame signal is determined to be a fast-changing type signal or the last sub-frame is reached; if the last sub-frame is reached without the frame signal being determined to be a fast-changing type signal, the frame signal belongs to the slowly varying type.
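The detection procedure above can be sketched as follows. This is a simplified illustration, not the patent's exact procedure: the waveform prediction is replaced by the mean of the preceding sub-frame peaks (an assumption), the sub-frame length and both thresholds are hypothetical values, and the function name `detect_fast_change` is introduced here only for the example.

```python
import math
from typing import List, Optional, Tuple


def detect_fast_change(frame: List[float],
                       subframe_len: int = 64,
                       history: int = 3,
                       ratio_threshold: float = 4.0,
                       diff_threshold: float = 0.1) -> Optional[Tuple[int, int]]:
    """Return (fast change position, quantized intensity), or None for a slowly varying frame."""
    eps = 1e-9  # guards against division by zero on silent sub-frames
    # Absolute peak of each sub-frame.
    peaks = [max(abs(s) for s in frame[i:i + subframe_len])
             for i in range(0, len(frame), subframe_len)]
    for idx in range(1, len(peaks)):
        previous = peaks[max(0, idx - history):idx]
        # Assumption: the "predicted typical value" is the mean of the preceding peaks.
        predicted = max(sum(previous) / len(previous), eps)
        diff = peaks[idx] - predicted
        ratio = peaks[idx] / predicted
        if diff > diff_threshold and ratio > ratio_threshold:
            # Intensity: this peak relative to the largest peak of all earlier sub-frames.
            reference = max(max(peaks[:idx]), eps)
            intensity = peaks[idx] / reference
            quantized = int(math.floor(intensity + 0.5))  # round to nearest
            return idx * subframe_len, quantized
    return None


# A 2048-sample frame whose amplitude jumps by a factor of 5 at sample 1280.
quiet = [0.125 if n % 2 == 0 else -0.125 for n in range(1280)]
loud = [0.625 if n % 2 == 0 else -0.625 for n in range(768)]
print(detect_fast_change(quiet + loud))         # fast-changing frame
print(detect_fast_change(quiet + quiet[:768]))  # slowly varying frame
```

For the first test frame the function reports a fast change point at sample 1280 with quantized intensity 5; for the second it returns `None`.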
- time-domain audio signals can be transformed into the frequency domain by time-frequency transforms such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the cosine-modulated filter bank, and the modified discrete cosine transform.
- the window length of the analysis window function is equal to the length of the audio signal frame, and the window function may be a Hanning window, a Hamming window, or a Blackman window; when the modified discrete cosine transform (MDCT) is used, the window length of the analysis window function is twice the frame length of the audio signal, and the window function may be any window function that satisfies the conditions of the modified discrete cosine transform.
- the analysis window function is a fixed length window function whose length is an integer greater than 1, preferably 2 to the power of N, where N is a natural number. For the selection of windows, see "Discrete Time Signal Processing (2nd edition)", Xi'an Jiaotong University Press, A.
- the time domain signals of the M samples of the previous frame and the M samples of the current frame are selected, and the resulting 2M samples of the two frames are windowed by the module 302.
- the analysis window is therefore twice as long as the frame; the windowed signal is then subjected to the MDCT transform by the time-frequency mapping module 304 to obtain M frequency domain coefficients.
- the impulse response of the MDCT analysis filter is:
- the MDCT transform is defined for 0 ≤ k ≤ M-1, where: w(n) is the window function; x(n) is the input time domain audio signal of the MDCT transform; X(k) is the output frequency domain coefficient of the MDCT transform.
- w (n) is a window function
- x(n) is the input time domain audio signal of MDCT transform
- X(k) is the output frequency domain of MDCT transform
- Sine window, KBD window, etc. can be selected as the window function.
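The framing and transform described above can be illustrated with a direct (unoptimized) MDCT. The formulas did not survive in this text, so the sketch below uses the common textbook MDCT definition with a sine window; the normalization factor 2/M in the inverse and the helper names `sine_window`, `mdct`, and `imdct` are assumptions for illustration, not the patent's own definitions. Each block holds M samples of the previous frame followed by M samples of the current frame.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    """Sine window of the given length (2M for a frame length of M)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: List[float], window: List[float]) -> List[float]:
    """Window 2M time samples and return M frequency domain coefficients."""
    M = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]


def imdct(coeffs: List[float], window: List[float]) -> List[float]:
    """Return 2M windowed, time-aliased samples ready for overlap-add."""
    M = len(coeffs)
    return [window[n] * (2.0 / M)
            * sum(coeffs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]


M = 8
w = sine_window(2 * M)
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(4 * M)]

# Three blocks of 2M samples, each overlapping the previous one by M samples.
out = [0.0] * (4 * M)
for start in range(0, 2 * M + 1, M):
    y = imdct(mdct(signal[start:start + 2 * M], w), w)
    for n in range(2 * M):
        out[start + n] += y[n]

# Samples M .. 3M-1 are covered by two blocks, so their aliasing cancels.
max_err = max(abs(out[n] - signal[n]) for n in range(M, 3 * M))
print(f"max reconstruction error: {max_err:.2e}")
```

With unmodified windows on every frame, the overlap-add of adjacent blocks cancels the time domain aliasing, which is the complete reconstruction property that the window correction has to preserve.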
- the Sine window is taken as an example to illustrate how the module 302 modifies the window function to control the pre-echo. It should be noted that the present invention is not limited to the Sine window and the KBD window; any window function that satisfies the MDCT transformation conditions can be corrected in the same way to control the pre-echo.
- Fig. 9 is a schematic diagram of the original analysis window function of the encoding and decoding method for controlling the pre-echo of the present invention. If the frame signal 9 is a fast-changing type signal, the original analysis window function is corrected. The correction is: the values of the window function after the fast change point are reduced proportionally, with a reduction factor equal to the quantized mutation intensity.
- the modified analysis window function is shown in Fig. 10, in which the fast change point position is the 1280th sample point 10 and the quantized mutation intensity is 5.
- the integrated window used at the time of decoding must also be corrected. The correction is: the values of the window function after the fast change point are amplified proportionally, with an amplification factor equal to the quantized mutation intensity.
- the modified integrated window function is shown in Fig. 11, in which the fast change point position is the 1280th sample point 11 and the quantized mutation intensity is 5.
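The two corrections can be sketched for the example values given above (a 2048-point window, fast change point at sample 1280, quantized mutation intensity 5). The sketch assumes a sine window, reads "reduced" and "amplified" as division and multiplication by the intensity, and treats the fast change point itself as belonging to the corrected part; the function name `modified_windows` is hypothetical.

```python
import math
from typing import List, Tuple


def sine_window(length: int) -> List[float]:
    """Sine window of the given length."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def modified_windows(window: List[float], change_pos: int,
                     intensity: float) -> Tuple[List[float], List[float]]:
    """Return (modified analysis window, modified integrated window)."""
    # Encoder side: attenuate the window from the fast change point onward.
    analysis = [v / intensity if n >= change_pos else v
                for n, v in enumerate(window)]
    # Decoder side: amplify the same region by the same factor.
    integrated = [v * intensity if n >= change_pos else v
                  for n, v in enumerate(window)]
    return analysis, integrated


w = sine_window(2048)
wa, ws = modified_windows(w, change_pos=1280, intensity=5)

# The product of the two modified windows matches that of the original windows,
# so a sample weighted by both is scaled exactly as in the unmodified scheme.
product_err = max(abs(a * s - v * v) for a, s, v in zip(wa, ws, w))
print(f"largest deviation of the window product: {product_err:.2e}")
```

This only checks the sample-by-sample product of the two windows. Cancellation of the aliasing terms between adjacent frames is a separate condition, which the document addresses with reference to Fig. 12.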
- FIG. 12 is a schematic diagram showing the window function correction in accordance with the full reconstruction condition in the encoding method for controlling pre-echo according to the present invention.
- the signals of four consecutive frames are shown in Fig. 12. The i-th frame has been judged to be a fast-changing frame, the position of the fast change point 12 is the point M+L, and the mutation intensity is scs, where scs is a real number.
- when the MDCT transform is performed on the i-th frame, the original analysis window and the original synthesis window are corrected as described above to obtain the modified analysis window and the modified integrated window, then:
- step 23 the function value after the fast change point position of the pair analysis window function is performed, etc.
- a transition block can be added near the fast change point to slowly change the value of the analysis window function near the fast change point.
- The window function with a transition block is constructed as follows. Assume that the window function is $w(n)$, where $0 \le n \le 2M-1$, the quantized mutation intensity is $g$, and the fast change point position is $L$. If no transition block is added, the modified window function is obtained by scaling every value after $L$ by $g$, as described above.
- FIG. 13 is a schematic diagram of the modified analysis window function with a transition block in the encoding method for controlling pre-echo of the present invention.
- FIG. 14 is a schematic diagram of the modified integrated window function with a transition block in the decoding method for controlling pre-echo of the present invention.
- The transition block 13 of FIG. 13 and the transition block 14 of FIG. 14 each have a length of 64 sample points.
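A transition block of this kind can be sketched as a per-sample gain $g(n)$ that ramps from 1 to the quantized mutation intensity instead of jumping at the fast change point. The linear ramp, the centering of the block on the fast change point, and the function names below are assumptions made for illustration; the text gives only the block length (64 sample points).

```python
def gain_profile(length, change_point, gain, transition_len=64):
    """Per-sample gain g(n): 1 before the transition block, `gain` after it,
    and a linear ramp inside it (the ramp shape is an assumption)."""
    half = transition_len // 2
    start, end = change_point - half, change_point + half  # block is [start, end)
    profile = []
    for n in range(length):
        if n < start:
            profile.append(1.0)
        elif n >= end:
            profile.append(float(gain))
        else:
            frac = (n - start + 1) / (transition_len + 1)
            profile.append(1.0 + frac * (gain - 1.0))
    return profile

def apply_transition(window, profile, analysis=True):
    """Divide (analysis) or multiply (synthesis) the window by g(n)."""
    if analysis:
        return [w / g for w, g in zip(window, profile)]
    return [w * g for w, g in zip(window, profile)]

# Example values taken from Figs. 10 and 11; the length 2048 is assumed.
profile = gain_profile(2048, 1280, 5)
flat = [1.0] * 2048
analysis_win = apply_transition(flat, profile, analysis=True)
synthesis_win = apply_transition(flat, profile, analysis=False)
```

Using the same profile for division on the analysis side and multiplication on the synthesis side keeps the two corrections matched sample by sample.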
- Quantization and entropy coding comprises two steps, nonlinear quantization and entropy coding, where the quantization process may employ scalar quantization or vector quantization.
- The scalar quantization can be the nonlinear scalar quantization used by MPEG AAC, and the vector quantization can be that of MPEG TwinVQ.
- The quantization process may also employ an audio coding method based on the minimum global noise-to-mask ratio criterion and entropy coding (patent application number 03146213.8). After the quantization process, entropy coding is used to further remove the statistical redundancy of the quantized coefficients and the side information, and finally the compressed audio code stream is obtained.
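As an illustration of nonlinear scalar quantization, the sketch below uses a power-law quantizer of the kind associated with MPEG AAC, where the coefficient magnitude is compressed with exponent 3/4 before rounding. It is a simplified sketch, not the standard's exact procedure: `step_exponent` is a hypothetical stand-in for a scale factor, and the rounding offset 0.4054 is the value commonly quoted for AAC reference encoders and should be treated as an assumption here.

```python
import math

ROUNDING_OFFSET = 0.4054  # commonly quoted AAC encoder rounding offset (assumption)

def quantize(coeff, step_exponent):
    """Power-law quantization of one frequency coefficient.

    Each increase of `step_exponent` by 4 doubles the step size.
    """
    scaled = abs(coeff) * 2.0 ** (-step_exponent / 4.0)
    index = int(scaled ** 0.75 + ROUNDING_OFFSET)
    return int(math.copysign(index, coeff))

def dequantize(index, step_exponent):
    """Inverse of the power-law mapping (without the rounding loss)."""
    value = abs(index) ** (4.0 / 3.0) * 2.0 ** (step_exponent / 4.0)
    return math.copysign(value, index)

fine = quantize(100.0, 0)     # finer step size
coarse = quantize(100.0, 8)   # step size four times larger
restored = dequantize(fine, 0)
```

The power law makes the quantization step grow with the coefficient magnitude, so large coefficients are represented more coarsely than small ones.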
- Fig. 15 is a structural block diagram of the second embodiment of the encoding apparatus for controlling pre-echo of the present invention.
- A subband analysis module 307 is added, which is connected to the signal type analysis module 301 and performs subband analysis on the input audio signal.
- With subband analysis, the window function can be modified differently for each frequency segment according to its mutation intensity and signal properties. For example, the low frequency band generally does not produce the pre-echo phenomenon, so the window function of the low frequency band can be left uncorrected, which gives more flexible control of the window function correction for different frequency segments.
- FIG. 16 is a flowchart of Embodiment 2 of the encoding method for controlling pre-echo according to the present invention.
- The steps are as follows:
- Step 40: the subband analysis module 307 performs subband analysis on the input audio signal frame; the subband analysis segments the frame by frequency, dividing it into multiple subband audio signals;
- Step 41: the signal type analysis module 301 determines, for each subband audio signal frame, whether the signal type is the fast-changing type; if so, step 42 is performed, otherwise step 45 is performed;
- Step 42: the signal type analysis module 301 calculates the fast change point position and the mutation intensity parameter of each subband audio signal frame, and quantizes each mutation intensity parameter to obtain the quantized mutation intensity value of each subband audio signal frame;
- Step 43: the correction window function module 302 proportionally reduces, for each subband, the function values after the fast change point position of the analysis window function, where the reduction factor equals the quantized mutation intensity value of that subband audio signal frame, to obtain the modified analysis window function;
- Step 44: the correction window function module 302 windows each subband audio signal frame with the modified analysis window function to obtain the windowed multi-channel subband time domain signals, and step 46 is performed;
- Step 45: the correction window function module 302 windows the subband audio signal frame with the original analysis window function to obtain the windowed time domain signal, and step 46 is performed;
- Step 46: the time-frequency mapping module 304 performs time-frequency mapping on the windowed multi-channel subband time domain signals to obtain multi-channel subband frequency domain coefficients;
- Step 47: the quantization and entropy coding module 305 integrates the multi-channel subband frequency domain coefficients;
- Step 48: the quantization and entropy coding module 305 quantizes and entropy-codes the frequency domain coefficients according to the masking threshold parameters of the scale factor bands, obtained by psychoacoustic processing of the audio signal frame by the psychoacoustic module 303, to obtain an encoded audio code stream;
- Step 49: the code stream multiplexing module 306 multiplexes the encoded audio code stream and the result of the signal type analysis to obtain a compressed audio code stream.
- In step 41 of the above encoding method, while the signal type analysis module 301 determines the signal type of the multi-channel subband audio signal frames, the psychoacoustic module 303 performs psychoacoustic processing on the input audio signal frame to obtain the masking threshold parameters of the scale factor bands.
- The psychoacoustic processing calculates a masking curve for the current frame signal according to the hearing characteristics of the human ear. From the masking curve, the masking threshold of a specific time-frequency region can be calculated and used to guide the quantization of the current audio frame signal. The psychoacoustic model can be the first or second psychoacoustic model used by MPEG AAC.
- The second embodiment of the encoding method of the present invention has the advantage that the window function can be modified differently according to the mutation intensity and signal properties of each frequency segment. For example, if the low frequency band does not produce the pre-echo phenomenon, the window function of the low frequency band can be left unmodified, which gives more flexible control of the window function correction for different frequency segments.
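The per-subband decision can be sketched as a small lookup that returns the correction factor for each subband. The data layout, the function name, and the choice of band 0 as the uncorrected low band are hypothetical; the text only states that each frequency segment may be corrected differently and that the low band may be left uncorrected.

```python
def subband_gains(subband_info, skip_bands=(0,)):
    """Return the window-correction factor to use for each subband.

    `subband_info` maps a subband index to a pair
    (is_fast_changing, quantized_intensity). Bands listed in `skip_bands`
    (here the lowest band) are never corrected. A factor of 1 means the
    original window is used unchanged.
    """
    gains = {}
    for band, (is_fast, intensity) in subband_info.items():
        if is_fast and band not in skip_bands:
            gains[band] = intensity
        else:
            gains[band] = 1
    return gains

# Hypothetical analysis result for four subbands.
info = {0: (True, 5), 1: (True, 3), 2: (False, 1), 3: (True, 7)}
gains = subband_gains(info)
```

In this example, band 0 keeps the original window even though it was flagged as fast-changing, band 2 keeps it because it is not fast-changing, and bands 1 and 3 use their own quantized intensities.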
- FIG. 17 is a structural block diagram of Embodiment 1 of a decoding apparatus for controlling pre-echo of the present invention.
- The device is composed of the following functional modules: a code stream demultiplexing module 401 for demultiplexing the compressed audio code stream; an inverse quantization and entropy decoding module 402, connected to the code stream demultiplexing module 401, for entropy-decoding and inverse-quantizing the demultiplexed audio code stream and outputting the inverse-quantized frequency domain coefficients; a frequency-time mapping module 403, connected to the inverse quantization and entropy decoding module 402, for transforming the inverse-quantized frequency domain coefficients into a time domain signal, where the frequency-time mapping module 403 is composed of a filter bank that is the inverse transform filter bank corresponding to the encoding device; and a correction window function module 404, connected to the frequency-time mapping module 403, for modifying the integrated window function and windowing the time domain signal.
- FIG. 18 is a flowchart of Embodiment 1 of the decoding method for controlling pre-echo according to the present invention.
- The steps are as follows:
- Step 31: the code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information;
- Step 32: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on the demultiplexed audio code stream to obtain inverse-quantized frequency domain coefficients;
- Step 33: the frequency-time mapping module performs frequency-time mapping on the inverse-quantized frequency domain coefficients to obtain a time domain signal;
- Step 34: the correction window function module determines, according to the demultiplexed side information, whether the signal type of the audio signal frame is the fast-changing type; if yes, go to step 35, otherwise go to step 37;
- Step 35: the correction window function module proportionally amplifies the function values after the fast change point position of the integrated window function, where the amplification factor equals the quantized value of the mutation intensity, to obtain the modified integrated window function;
- Step 36: the correction window function module windows the time domain signal with the modified integrated window function to obtain a reconstructed audio signal;
- Step 37: the correction window function module windows the time domain signal with the original integrated window function to obtain the reconstructed audio signal.
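The window selection in the last steps of this decoding flow can be sketched as follows. The dictionary layout of the side information and the function name are hypothetical; the text states only that the side information identifies fast-changing frames and that the amplification uses the quantized mutation intensity after the fast change point.

```python
def window_frame(time_signal, synthesis_window, side_info):
    """Choose the synthesis window from the side information, then window.

    `side_info` is a hypothetical dict with keys 'fast_changing' (bool),
    'change_point' (int) and 'intensity' (quantized mutation intensity).
    """
    if side_info.get("fast_changing"):
        point = side_info["change_point"]
        gain = side_info["intensity"]
        # Amplify the window values from the fast change point onward.
        window = [w * gain if n >= point else w
                  for n, w in enumerate(synthesis_window)]
    else:
        # Not a fast-changing frame: use the original window.
        window = synthesis_window
    return [x * w for x, w in zip(time_signal, window)]

flat_window = [0.5, 0.5, 0.5, 0.5]
fast = window_frame([1, 1, 1, 1], flat_window,
                    {"fast_changing": True, "change_point": 2, "intensity": 4})
slow = window_frame([1, 1, 1, 1], flat_window, {"fast_changing": False})
```

Only frames flagged as fast-changing pay the cost of building a modified window; all other frames reuse the original one.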
- The frequency-time mapping and correction window function processing applied to the frequency domain coefficients correspond to the time-frequency mapping and correction window function processing in the encoding method; the corresponding inverse mapping and window function are selected according to the coding control information in the compressed audio code stream.
- The frequency-time mapping can be implemented by the inverse discrete cosine transform (IDCT), the inverse discrete Fourier transform (IDFT), or the inverse modified discrete cosine transform (IMDCT).
- The following takes the inverse modified discrete cosine transform (IMDCT) as an example to illustrate the frequency-time mapping process. Since the frequency-time mapping and the windowing are inseparable, the frequency-time mapping and the correction window function are considered together here.
- An IMDCT is performed on the inverse-quantized spectrum to obtain the transformed time domain signal $x'$.
- The time domain signal obtained by the IMDCT is windowed in the time domain.
- The windowed time domain signals are overlap-added to obtain the time domain audio signal.
- The first $N/2$ samples of the signal obtained after the windowing operation are overlap-added with the last $N/2$ samples of the previous frame signal to obtain $N/2$ output time domain audio samples, $timeSam_{i,n} = preSam_{i,n} + preSam_{i-1,\,n+N/2}$, where $i$ is the frame number, $n$ is the sample number with $0 \le n < N/2$, and the length of the windowed signal is $N$.
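The IMDCT, windowing, and overlap-add sequence can be illustrated with a direct (slow) implementation. This sketch uses the original, uncorrected sine window for both analysis and synthesis and does not model the corrected windows; the MDCT phase convention, the $2/M$ scaling of the inverse, the window shape, and all names are assumptions chosen so that the round trip reconstructs the input, with $M = N/2$.

```python
import math

def mdct(frame, M):
    """Direct MDCT of a 2M-sample frame into M coefficients."""
    n0 = 0.5 + M / 2.0
    return [sum(frame[n] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, M):
    """Direct IMDCT of M coefficients into 2M time-aliased samples."""
    n0 = 0.5 + M / 2.0
    return [(2.0 / M) * sum(coeffs[k] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

def overlap_add_roundtrip(signal, M):
    """Window, transform, inverse-transform, window again, and overlap-add."""
    window = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        frame = [padded[start + n] * window[n] for n in range(2 * M)]
        y = imdct(mdct(frame, M), M)
        for n in range(2 * M):
            # Second half of one frame adds to the first half of the next.
            out[start + n] += y[n] * window[n]
    return out[M:-M]

M = 8
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(4 * M)]
reconstructed = overlap_add_roundtrip(signal, M)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
```

Each output sample is the sum of the windowed second half of one frame and the windowed first half of the next, which matches the overlap-add relation between $preSam_{i-1}$ and $preSam_i$ described above.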
- The integrated window function correction has been described above.
- The correction processing applied to the original integrated window function is to proportionally amplify the window function values after the fast change point, where the amplification factor equals the quantized mutation intensity.
- The modified integrated window function is shown in Fig. 11. The integrated window corresponding to the analysis window used in encoding still satisfies the complete reconstruction condition after the above correction, as proved in the foregoing.
- Corresponding to the correction processing at the time of encoding, when the function values after the fast change point of the integrated window function are amplified, a transition block can be added near the fast change point so that the value of the window function changes gradually there.
- The window function with a transition block is constructed as follows. Assume that the window function is $w(n)$, where $0 \le n \le 2M-1$, the quantized mutation intensity is $g$, and the fast change point position is $L$. If no transition block is added, the corrected window function is obtained by scaling every value after $L$ by $g$, as described above.
- If a transition block is added, the corrected window function within the block is:
- $w'(n) = w(n)/g(n), \quad L - I \le n \le L + I - 1$
- Fig. 14 is a schematic diagram of the modified integrated window function with a transition block in the decoding method for controlling pre-echo of the present invention; the transition block 14 in Fig. 14 has a length of 64 sample points. It has been proved that, as long as the linear transformation of the original window does not change the complete reconstruction property of the transform, an arbitrary linear transformation can also be applied to the analysis window function or the synthesis window according to the signal type.
- The window function with a transition block, or the arbitrary linear transformation of the window function, can be applied to the analysis window or the synthesis window used in the encoding and decoding methods of all the above embodiments.
- Fig. 19 is a structural block diagram of the second embodiment of the decoding apparatus for controlling pre-echo of the present invention.
- A subband synthesis module 405 is added, which is connected to the correction window function module 404 and performs subband synthesis on the multi-channel reconstructed subband time domain signals.
- FIG. 20 is a flowchart of Embodiment 2 of the decoding method for controlling pre-echo of the present invention.
- Step 51: the code stream demultiplexing module demultiplexes the input compressed audio code stream to obtain the demultiplexed audio code stream and side information;
- Step 52: the inverse quantization and entropy decoding module divides the demultiplexed audio code stream into multiple channels according to frequency;
- Step 53: the inverse quantization and entropy decoding module performs inverse quantization and entropy decoding on each channel of the demultiplexed audio code stream to obtain inverse-quantized frequency domain coefficients;
- Step 53: the frequency-time mapping module performs frequency-time mapping on the inverse-quantized frequency domain coefficients of each channel to obtain multi-channel time domain signals;
- Step 54: the correction window function module determines, according to the demultiplexed side information, whether the signal type of each channel's audio signal frame is the fast-changing type; if yes, step 55 is performed; otherwise, step 57 is performed;
- Step 55: the correction window function module proportionally amplifies, for each channel, the function values after the fast change point position of the integrated window function, where the amplification factor equals the quantized value of the mutation intensity, to obtain the modified integrated window functions;
- Step 56: the correction window function module windows each channel's time domain signal with the modified integrated window function to obtain multi-channel reconstructed audio signals, and then step 58 is performed;
- Step 57: the correction window function module windows each channel's time domain signal with the original integrated window function to obtain multi-channel reconstructed audio signals, and then step 58 is performed;
- Step 58: the subband synthesis module synthesizes the multi-channel reconstructed audio signals.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention relates to an encoder for pre-echo control, comprising a signal type analysis module, a modified window function module, a time-domain to frequency-domain transform module, a quantization and entropy coding module, and a code stream multiplexing module, which are connected in series, as well as a psychoacoustic analysis module connected to the quantization and entropy coding module. The invention also relates to an encoding method for pre-echo control, comprising the steps of: (1) determining whether the input audio frame is a transient frame and, if so, linearly modifying the window function values and then multiplying the audio frame by the modified window; (2) performing a time-domain to frequency-domain transform on the windowed audio frame, quantizing and entropy-coding the frequency-domain coefficients, and then multiplexing the encoded audio stream and the result of the signal type analysis. The invention further relates to a decoder for pre-echo control, comprising a demultiplexing module, an inverse quantization module and an entropy decoding module, a frequency-domain to time-domain transform module, and a modified window function module, which are connected in series. The invention also relates to a decoding method for pre-echo control.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2005/001435 WO2007028280A1 (fr) | 2005-09-08 | 2005-09-08 | Codeur et decodeur pour commande de pre echo et son procede |
| CN200580051158.0A CN101228574A (zh) | 2005-09-08 | 2005-09-08 | 一种控制前回声的编码和解码装置及方法 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2005/001435 WO2007028280A1 (fr) | 2005-09-08 | 2005-09-08 | Codeur et decodeur pour commande de pre echo et son procede |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2007028280A1 true WO2007028280A1 (fr) | 2007-03-15 |
Family
ID=37835353
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2005/001435 Ceased WO2007028280A1 (fr) | 2005-09-08 | 2005-09-08 | Codeur et decodeur pour commande de pre echo et son procede |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN101228574A (fr) |
| WO (1) | WO2007028280A1 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008141579A1 (fr) * | 2007-05-17 | 2008-11-27 | Spreadtrum Communications (Shanghai) Co., Ltd. | Procédé de codage et de décodage de signal audio transitoire |
| WO2009092309A1 (fr) * | 2008-01-16 | 2009-07-30 | Huawei Technologies Co., Ltd. | Procédé et appareil de commande destinés à quantifier une fuite de bruit |
| CN109783767A (zh) * | 2018-12-21 | 2019-05-21 | 电子科技大学 | 一种短时傅里叶变换窗长的自适应选择方法 |
| CN114002733A (zh) * | 2021-10-27 | 2022-02-01 | 武汉科技大学 | 微震波信号初至到时自动拾取方法及微震监测装置 |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| RU2481650C2 (ru) * | 2008-09-17 | 2013-05-10 | Франс Телеком | Ослабление опережающих эхо-сигналов в цифровом звуковом сигнале |
| CN103327201B (zh) * | 2012-03-20 | 2016-04-20 | 联芯科技有限公司 | 残留回声消除方法及系统 |
| CN115881139A (zh) * | 2021-09-29 | 2023-03-31 | 华为技术有限公司 | 编解码方法、装置、设备、存储介质及计算机程序 |
| CN114974270B (zh) * | 2022-04-15 | 2025-03-25 | 北京邮电大学 | 一种音频信息自适应隐藏方法 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5117228A (en) * | 1989-10-18 | 1992-05-26 | Victor Company Of Japan, Ltd. | System for coding and decoding an orthogonally transformed audio signal |
| CN1153369A (zh) * | 1995-10-05 | 1997-07-02 | 索尼公司 | 使用多通道音频信号的编码方法及装置 |
| JP2001265392A (ja) * | 2000-03-17 | 2001-09-28 | Victor Co Of Japan Ltd | 音声符号化装置及びその方法 |
| JP2003216188A (ja) * | 2002-01-25 | 2003-07-30 | Matsushita Electric Ind Co Ltd | オーディオ信号符号化方法、符号化装置、及び記憶媒体 |
-
2005
- 2005-09-08 WO PCT/CN2005/001435 patent/WO2007028280A1/fr not_active Ceased
- 2005-09-08 CN CN200580051158.0A patent/CN101228574A/zh active Pending
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008141579A1 (fr) * | 2007-05-17 | 2008-11-27 | Spreadtrum Communications (Shanghai) Co., Ltd. | Procédé de codage et de décodage de signal audio transitoire |
| WO2009092309A1 (fr) * | 2008-01-16 | 2009-07-30 | Huawei Technologies Co., Ltd. | Procédé et appareil de commande destinés à quantifier une fuite de bruit |
| CN109783767A (zh) * | 2018-12-21 | 2019-05-21 | 电子科技大学 | 一种短时傅里叶变换窗长的自适应选择方法 |
| CN109783767B (zh) * | 2018-12-21 | 2023-03-31 | 电子科技大学 | 一种短时傅里叶变换窗长的自适应选择方法 |
| CN114002733A (zh) * | 2021-10-27 | 2022-02-01 | 武汉科技大学 | 微震波信号初至到时自动拾取方法及微震监测装置 |
| CN114002733B (zh) * | 2021-10-27 | 2024-01-23 | 武汉科技大学 | 微震波信号初至到时自动拾取方法及微震监测装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101228574A (zh) | 2008-07-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240347067A1 (en) | Post-processor, pre-processor, audio encoder, audio decoder and related methods for enhancing transient processing | |
| US11380342B2 (en) | Hierarchical decorrelation of multichannel audio | |
| JP5539203B2 (ja) | 改良された音声及びオーディオ信号の変換符号化 | |
| CN101878504B (zh) | 使用时间分辨率能选择的低复杂性频谱分析/合成 | |
| RU2680352C1 (ru) | Способ и устройство для определения режима кодирования, способ и устройство для кодирования аудиосигналов и способ и устройство для декодирования аудиосигналов | |
| KR101425155B1 (ko) | 복소 예측을 이용한 다중 채널 오디오 신호를 처리하기 위한 오디오 인코더, 오디오 디코더, 및 관련 방법 | |
| CN102243873B (zh) | 分解滤波器组、合成滤波器组、编码器、解码器、混合器及会议系统 | |
| JP6704037B2 (ja) | 音声符号化装置および方法 | |
| US20090204397A1 (en) | Linear predictive coding of an audio signal | |
| CN110047500B (zh) | 音频编码器、音频译码器及其方法 | |
| KR20130133848A (ko) | 스펙트럼 도메인 잡음 형상화를 사용하는 선형 예측 기반 코딩 방식 | |
| CN101488344A (zh) | 一种量化噪声泄漏控制方法及装置 | |
| EP3069337B1 (fr) | Procédé et appareil destinés à l'encodage d'un signal audio | |
| CN101673545B (zh) | 一种编解码方法及装置 | |
| WO2007028280A1 (fr) | Codeur et decodeur pour commande de pre echo et son procede | |
| RU2803142C1 (ru) | Устройство повышающего микширования звука, выполненное с возможностью работы в режиме с предсказанием или в режиме без предсказания | |
| HK1143237B (en) | Improved transform coding of speech and audio signals |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 200580051158.0; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC - FORM EPO 1205A DATED 27-05-2008 |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 05783980; Country of ref document: EP; Kind code of ref document: A1 |