WO2008114078A1 - An encoder - Google Patents
An encoder
- Publication number
- WO2008114078A1 (PCT/IB2007/001712)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- signal
- synthesized audio
- dependent
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/0208—Subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
Definitions
- the present invention relates to coding, and in particular, but not exclusively to speech or audio coding.
- Audio signals, like speech or music, are encoded for example for enabling an efficient transmission or storage of the audio signals.
- Audio encoders and decoders are used to represent audio based signals, such as music and background noise. These types of coders typically do not utilise a speech model for the coding process, rather they use processes for representing all types of audio signals, including speech.
- Speech encoders and decoders are usually optimised for speech signals, and often operate at a fixed bit rate.
- An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may work with speech signals at a coding rate equivalent to a pure speech codec. At higher bit rates, the performance may be good with any signal including music, background noise and speech.
- a further audio coding option is an embedded variable rate speech or audio coding scheme, which is also referred to as a layered coding scheme.
- Embedded variable rate audio or speech coding denotes an audio or speech coding scheme, in which a bit stream resulting from the coding operation is distributed into successive layers.
- a base or core layer, which comprises primary coded data generated by a core encoder, is formed of the binary elements essential for the decoding of the binary stream, and determines a minimum quality of decoding. Subsequent layers make it possible to progressively improve the quality of the signal arising from the decoding operation, where each new layer brings new information.
- One of the particular features of layered based coding is the possibility offered of intervening at any level whatsoever of the transmission or storage chain, so as to delete a part of binary stream without having to include any particular indication to the decoder.
- the decoder uses the binary information that it receives and produces a signal of corresponding quality.
- International Telecommunication Union Telecommunication Standardization Sector (ITU-T) standardisation aims at a wideband codec of 50 to 7000 Hz with bit rates from 8 to 32 kbps.
- the codec core layer will either work at 8 kbps or 12 kbps, and additional layers with quite small granularity will increase the observed speech and audio quality.
- the proposed layers will have as a minimum target at least five bit rates of 8, 12, 16, 24 and 32 kbps available from the same embedded bit stream.
- the structure of the codecs tends to be hierarchical in form, consisting of multiple coding stages.
- different coding techniques are used for the core (or base) layer and the additional layers.
- the coding methods used in the additional layers are then used to either code those parts of the signal which have not been coded by previous layers, or to code a residual signal from the previous stage.
- the residual signal is formed by subtracting a synthetic signal i.e. a signal generated as a result of the previous stage from the original.
- the codec core layer is typically a speech codec based on the Code Excited Linear Prediction (CELP) algorithm or a variant such as adaptive multi-rate (AMR) CELP and variable multi-rate (VMR) CELP.
- VMR-WB: Variable Multi-Rate Wide Band codec. Details on the VMR-WB codec can be found in the 3GPP2 technical specification C.S0052-0. In a manner similar to the AMR family, the source controlled VMR-WB audio codec also uses ACELP coding as a core coder.
- the higher layers utilise time frequency transformations as described in the prior art "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation" by J.P. Princen et al. (IEEE Transactions on ASSP, Vol. ASSP-34, No. 5, October 1986) and found in many audio coding technologies in order to quantise the residual signal.
- the residual signal formed by subtracting the synthetic output signal from the original, may contain signal components which are highly correlated with the synthetic output, generated from the core codec. Exploiting this correlation can result in a more efficient quantisation scheme for the signal components representing the residual. This may take the shape of using the energy content in the synthetic signal to normalise corresponding signal components in the residual signal. This can have the effect of limiting the dynamic range of signal components thereby increasing the efficiency of any subsequent quantisation stages.
- Embodiments of the present invention aim to address the above problem.
- an encoder configured to receive an audio signal, wherein the encoder is further configured to: generate a synthesized audio signal and a difference signal; and scale the difference signal dependent on the synthesized audio signal.
- the encoder may comprise: a first encoder configured to receive the audio signal and output the synthesized audio signal dependent on the audio signal; a difference circuit configured to receive the audio signal and the synthesized audio signal and subtract one of the audio signal and the synthesized audio signal from the other to output the difference signal; and a second encoder configured to receive the difference signal and the synthesized audio signal, and to encode the difference signal to generate a scaled encoded signal.
- the second encoder may be configured to scale the difference signal dependent on the synthesized audio signal by generating at least one scaling factor dependent on the synthesized audio signal.
- the second encoder may be configured to generate at least one scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
- the second encoder may be further configured to estimate the at least one parameter.
- the second encoder may further be configured to: perform a time domain to frequency domain transform on the synthesized audio signal; and group the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; wherein the second encoder is configured to estimate at least one parameter for each synthesized audio signal sub-band group.
- the parameter is preferably at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
- the second encoder is preferably configured to: perform a time domain to frequency domain transform on the difference signal; and group the difference signal frequency coefficients from the transform into at least two difference signal sub-band groups.
- the number of synthesized audio signal sub-band groups is preferably equal to the number of difference signal sub-band groups, and the second encoder may be configured to scale each difference signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
- the time domain to frequency domain transformation may comprise an orthogonal discrete transform.
- the orthogonal discrete transform may comprise a modified discrete cosine transform.
- the encoder may be configured to further scale the difference signal dependent on a predetermined value.
- the encoder may be configured to further scale the difference signal dependent on parameters estimated from the encoded signal.
- a method for encoding an audio signal comprising: receiving an audio signal; generating a synthesized audio signal dependent on the received audio signal; generating a difference signal; and scaling the difference signal dependent on the synthesized audio signal.
- the generating a difference signal may further comprise: subtracting one of the audio signal and the synthesized audio signal from the other to generate the difference signal; and encoding the scaled difference signal to generate a scaled encoded signal.
- Scaling the difference signal may comprise generating at least one scaling factor dependent on the synthesized audio signal.
- Generating at least one scaling factor may comprise generating a scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
- Generating may further comprise estimating the at least one parameter.
- Estimating may comprise: performing a time domain to frequency domain transform on the synthesized audio signal; and grouping the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; and estimating at least one parameter for each synthesized audio signal sub-band group.
- Estimating at least one parameter for each synthesized audio signal sub-band group may comprise estimating at least one of: a root mean squared value of the synthesized audio signal frequency coefficients per synthesized signal sub-band group; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
- Encoding may comprise: performing a time domain to frequency domain transform to the difference signal; and grouping the difference signal frequency coefficients from the transform into at least two difference signal sub-band groups.
- the number of synthesized audio signal sub-band groups is preferably equal to the number of difference signal sub-band groups, and scaling may comprise scaling each difference signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
- the time domain to frequency domain transformation may comprise an orthogonal discrete transform.
- the orthogonal discrete transform may comprise a modified discrete cosine transform.
- the method may further comprise scaling the difference signal dependent on a predetermined value.
- the method may further comprise scaling the difference signal dependent on parameters estimated from the encoded signal.
- a decoder configured to receive an encoded signal and output an estimate of an audio signal, wherein the decoder is further configured to: generate a synthesized audio signal; and scale the encoded signal dependent on the synthesized audio signal.
- the decoder may comprise: a first decoder configured to receive at least part of the encoded signal and output the synthesized audio signal dependent on the encoded signal; a second decoder configured to receive a further part of the encoded signal and the synthesized audio signal, and to scale the further part of the encoded signal dependent on the synthesized audio signal.
- the second decoder is preferably configured to scale the encoded signal dependent on the synthesized audio signal by generating at least one scaling factor dependent on the synthesized audio signal.
- the second decoder is preferably configured to generate at least one scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
- the second decoder is preferably further configured to estimate the at least one parameter.
- the second decoder is preferably further configured to: perform a time domain to frequency domain transform on the synthesized audio signal; and group the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; wherein the second decoder is preferably configured to estimate at least one parameter for each synthesized audio signal sub-band group.
- the parameter is preferably at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are preferably synthesised audio signal frequency coefficients in a sub-band group.
- the second decoder is preferably configured to: group the encoded signal frequency coefficients from the transform into at least two encoded signal sub-band groups.
- the number of synthesized audio signal sub-band groups is preferably equal to the number of encoded signal sub-band groups, and the second decoder is preferably configured to scale each encoded signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
- the time domain to frequency domain transformation may comprise an orthogonal discrete transform.
- the orthogonal discrete transform may comprise a modified discrete cosine transform.
- the decoder is preferably configured to further scale the encoded signal dependent on a predetermined value.
- the decoder is preferably configured to further scale the encoded signal dependent on parameters estimated from the encoded signal.
- a method for decoding an audio signal comprising: receiving an encoded signal; generating a synthesized audio signal dependent on the received encoded signal; and scaling the encoded signal dependent on the synthesized audio signal.
- Scaling the encoded signal may comprise generating at least one scaling factor dependent on the synthesized audio signal.
- Generating at least one scaling factor may comprise generating a scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
- Generating at least one scaling factor may further comprise estimating the at least one parameter.
- Estimating may comprise: performing a time domain to frequency domain transform on the synthesized audio signal; and grouping the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; and estimating at least one parameter for each synthesized audio signal sub-band group.
- Estimating at least one parameter for each synthesized audio signal sub-band group may comprise estimating at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are preferably synthesised audio signal frequency coefficients in a sub-band group.
- the method may further comprise: grouping the encoded signal frequency coefficients from the transform into at least two encoded signal sub-band groups.
- the number of synthesized audio signal sub-band groups is preferably equal to the number of encoded signal sub-band groups, and scaling may comprise scaling each encoded signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
- the time domain to frequency domain transformation may comprise an orthogonal discrete transform.
- the orthogonal discrete transform may comprise a modified discrete cosine transform.
- the method may further comprise scaling the encoded signal dependent on a predetermined value.
- the method may further comprise scaling the encoded signal dependent on parameters estimated from the encoded signal.
- an apparatus comprising an encoder as described above.
- an apparatus comprising a decoder as described above.
- an electronic device comprising an encoder as described above.
- an electronic device comprising a decoder as described above.
- a computer program product configured to perform a method for encoding an audio signal comprising: receiving an audio signal; generating a synthesized audio signal dependent on the received audio signal; generating a difference signal; and scaling the difference signal dependent on the synthesized audio signal.
- a computer program product configured to perform a method for decoding an audio signal; comprising: receiving an encoded signal; generating a synthesized audio signal dependent on the received encoded signal; and scaling the encoded signal dependent on the synthesized audio signal.
- an encoder configured to receive an audio signal and output a scaled encoded signal, wherein the encoder comprises: means for generating a synthesized audio signal; means for generating a difference signal; and means for scaling the difference signal dependent on the synthesized audio signal.
- the encoder may further comprise: means for outputting a difference signal comprising the means for subtracting one of the audio signal and the synthesized audio signal from the other; and wherein the means for generating the encoded signal comprises: means for receiving the difference signal and the synthesized audio signal; and means for encoding the scaled difference signal to generate the encoded signal.
- a decoder configured to receive an encoded signal and output an estimate of an audio signal, comprising: means for generating a synthesized audio signal; and means for scaling the encoded signal dependent on the synthesized audio signal.
- the decoder may comprise means for generating the synthesized audio signal dependent on at least part of the encoded signal.
- Figure 1 shows schematically an electronic device employing embodiments of the invention
- Figure 2 shows schematically an audio encoder according to an embodiment of the present invention
- Figure 3 shows a flow diagram illustrating the operation of the audio encoder according to an embodiment of the present invention
- FIG. 4 shows schematically an audio decoder according to an embodiment of the present invention.
- Figure 5 shows a flow diagram illustrating the operation of an embodiment of the audio decoder according to the present invention.
- Figure 1 shows a schematic block diagram of an exemplary electronic device 610, which may incorporate a codec according to an embodiment of the invention.
- the electronic device 610 may for example be a mobile terminal or user equipment of a wireless communication system.
- the electronic device 610 comprises a microphone 611, which is linked via an analogue-to-digital converter 614 to a processor 621.
- the processor 621 is further linked via a digital-to-analogue converter 632 to loudspeakers 633.
- the processor 621 is further linked to a transceiver (TX/RX) 613, to a user interface (UI) 615 and to a memory 622.
- the processor 621 may be configured to execute various program codes.
- the implemented program codes comprise an audio encoding code for encoding a lower frequency band of an audio signal and a higher frequency band of an audio signal.
- the implemented program codes 623 further comprise an audio decoding code.
- the implemented program codes 623 may be stored for example in the memory 622 for retrieval by the processor 621 whenever needed.
- the memory 622 could further provide a section 624 for storing data, for example data that has been encoded in accordance with the invention.
- the encoding and decoding code may in embodiments of the invention be implemented in hardware or firmware.
- the user interface 615 enables a user to input commands to the electronic device 610, for example via a keypad, and/or to obtain information from the electronic device 610, for example via a display.
- the transceiver 613 enables a communication with other electronic devices, for example via a wireless communication network.
- a user of the electronic device 610 may use the microphone 611 for inputting speech that is to be transmitted to some other electronic device or that is to be stored in the data section 624 of the memory 622.
- a corresponding application has been activated to this end by the user via the user interface 615.
- This application, which may be run by the processor 621, causes the processor 621 to execute the encoding code stored in the memory 622.
- the analogue-to-digital converter 614 converts the input analogue audio signal into a digital audio signal and provides the digital audio signal to the processor 621.
- the processor 621 may then process the digital audio signal in the same way as described with reference to Figures 2 and 3.
- the resulting bit stream is provided to the transceiver 613 for transmission to another electronic device.
- the coded data could be stored in the data section 624 of the memory 622, for instance for a later transmission or for a later presentation by the same electronic device 610.
- the electronic device 610 could also receive a bit stream with correspondingly encoded data from another electronic device via its transceiver 613.
- the processor 621 may execute the decoding program code stored in the memory 622.
- the processor 621 decodes the received data, for instance in the same way as described with reference to Figures 4 and 5, and provides the decoded data to the digital-to-analogue converter 632.
- the digital-to-analogue converter 632 converts the digital decoded data into analogue audio data and outputs them via the loudspeakers 633. Execution of the decoding program code could be triggered as well by an application that has been called by the user via the user interface 615.
- the received encoded data could also be stored instead of an immediate presentation via the loudspeakers 633 in the data section 624 of the memory 622, for instance for enabling a later presentation or a forwarding to still another electronic device.
- Referring to figure 2, a schematic view of an encoder 200 (otherwise known as the coder) implementing an embodiment of the invention is shown.
- the operation of the embodiment encoder is described as a flow diagram in figure 3.
- the encoder may be divided into: a core encoder 271; a delay unit 207; a difference unit 209; a difference encoder 273; a difference encoder controller 275; and a multiplexer 215.
- the encoder 200 in step 301 receives the original audio signal.
- the audio signal is a digitally sampled signal.
- the audio input may be an analogue audio signal, for example from a microphone 6, which is analogue-to-digital (A/D) converted.
- the audio input is converted from a pulse code modulation digital signal to an amplitude modulation digital signal.
- the core encoder 271 receives the audio signal to be encoded and outputs the encoded parameters which represent the core level encoded signal, and also the synthesised audio signal (in other words the audio signal is encoded into parameters and then the parameters are decoded using the reciprocal process to produce the synthesised audio signal).
- the core encoder 271 may be divided into three parts (the pre-processor 201, core codec 203 and post-processor 205).
- the core encoder receives the audio input at the pre-processing element 201.
- the pre-processing stage 201 may perform a low pass filter followed by decimation in order to reduce the number of samples being coded. For example, if the input signal was originally sampled at 16 kHz, the signal may be down-sampled to 8 kHz using a linear phase FIR filter with a 3 decibel cut off around 3.6 kHz and then decimating the number of samples by a factor of 2.
- the pre-processing element 201 outputs a pre-processed audio input signal to the core codec 203. This operation is represented in step 303 of figure 3. Further embodiments may include core codecs operating at different sampling frequencies. For instance some core codecs can operate at the original sampling frequency of the input audio signal.
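- As a minimal illustration of this pre-processing stage, the Python sketch below (using scipy, an assumption of this example) low-pass filters a 16 kHz signal and decimates it by a factor of 2; the filter length of 64 taps is also an assumption, as the text only specifies a linear phase FIR filter with a 3 dB cut-off around 3.6 kHz.

```python
from scipy.signal import firwin, lfilter

def preprocess(audio_16k, num_taps=64, cutoff_hz=3600.0, fs=16000.0):
    """Low-pass filter a 16 kHz signal and decimate it to 8 kHz."""
    taps = firwin(num_taps, cutoff_hz, fs=fs)   # linear phase FIR low-pass
    filtered = lfilter(taps, 1.0, audio_16k)
    return filtered[::2]                        # decimate: keep every second sample
```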
- the core codec 203 receives the signal and may use any appropriate encoding technique.
- the core codec is an algebraic code excited linear prediction encoder (ACELP) which is configured to generate a bitstream of typical ACELP parameters as lower level signals, as depicted by R1 and/or R2.
- the parameter bitstream is output to the multiplexer 215.
- the encoder output bit stream may include typical ACELP encoder parameters.
- these parameters include LPC (linear predictive coding) parameters quantized in LSP (Line Spectral Pair) or ISP (Immittance Spectral Pair) domain describing the spectral content, LTP (long-term prediction) parameters describing the periodic structure, ACELP excitation parameters describing the residual signal after linear predictors, and signal gain parameters.
- the core codec 203 may, in some embodiments of the present invention, comprise a two-stage cascade code excited linear prediction (CELP) coder, such as VMR, producing R1 and/or R2 bitstreams at 8 kbit/s and/or 12 kbit/s respectively.
- This encoding of the pre-processed signal is shown in figure 3 by step 305.
- the core codec 203 furthermore outputs a synthesised audio signal (in other words the audio signal is first encoded into parameters such as those described above and then decoded back into an audio signal within the same core codec).
- This synthesised signal is passed to the post-processing unit 205. It is appreciated that the synthesised signal is different from the signal input to the core codec as the parameters are approximations to the correct values - the differences are because of the modelling errors and quantisation of the parameters.
- the post-processor 205 re-samples the synthesised audio output in order that the output of the post-processor has a sample rate equal to the input audio signal.
- the synthesised signal output from the core codec 203 is first up-sampled to 16 kHz and then filtered using a low-pass filter to prevent aliasing occurring.
- The post-processing of the synthesized signal is shown in figure 3 by step 309.
- the post-processor 205 outputs the re-sampled signal to the difference unit 209.
- the pre-processor 201 and post-processor 205 are optional elements and the core codec may receive and encode the digital signal directly.
- the core codec 203 receives an analogue or pulse width modulated signal directly and performs the parameterization of the audio signal outputting a synthesized signal to the difference unit 209.
- the audio input is also passed to the delay unit 207, which performs a digital delay equal to the delay produced by the core coder 271 in producing a synthesized signal, and then outputs the signal to the difference unit 209 so that the sample output by the delay unit 207 to the difference unit 209 is the same indexed sample as the synthesized signal output from the core coder 271 to the difference unit 209, that is, a state of time alignment is achieved.
- the delay of the audio signal is shown in figure 3 by step 310.
- the difference unit 209 calculates the difference between the input audio signal, which has been delayed by the delay unit 207, and the synthesised signal output from the core encoder 271.
- the difference unit outputs the difference signal to the difference encoder 273.
- The calculation of the difference between the delayed audio signal and the synthesized signal is shown in figure 3 by step 311.
- the difference encoder 273 comprises a modified discrete cosine transform (MDCT) processor 211 and a difference coder 213.
- the difference encoder receives the difference signal at the modified discrete cosine transform processor 211.
- the modified discrete cosine transform processor 211 receives the difference signal and performs a modified discrete cosine transform (MDCT) on the signal.
- MDCT is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped.
- the transform is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it can remove the time aliasing components which are a result of the finite windowing process.
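- A direct (O(N^2)) Python sketch of the MDCT described above is given below; windowing and the 50% overlap between consecutive blocks are left to the caller, and in practice a fast DCT-IV based implementation would be used instead of this illustrative form.

```python
import numpy as np

def mdct(block):
    """MDCT of a block of 2N time samples, producing N coefficients."""
    two_n = len(block)
    half_n = two_n // 2
    n = np.arange(two_n)
    k = np.arange(half_n)
    # Lapped cosine basis derived from the type-IV DCT:
    # cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    basis = np.cos(np.pi / half_n * (n[None, :] + 0.5 + half_n / 2.0)
                   * (k[:, None] + 0.5))
    return basis @ np.asarray(block)
```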
- embodiments of the present invention may generate the time to frequency transformation (and vice versa) by any discrete orthogonal transform.
- the coefficients of the forward transform are given by the weighting factor of each orthogonal basis function.
- the MDCT processing of the difference signal is shown in figure 3 by step 313a.
- the output of the modified discrete cosine transform processor 211 is passed to the difference coder 213.
- the difference coder may encode the components of the difference signal as a sequence of higher coding layers, where each layer may encode the signal at a progressively higher bit rate and quality level. In figure 2, this is depicted by the encoding layers R3, R4 and/or R5. It is to be understood that further embodiments may adopt a differing number of encoding layers, thereby achieving a different level of granularity in terms of both bit rate and audio quality.
- the difference coder 213 receives the MDCT coefficients output from the MDCT processor 211 and groups the coefficients into a number of sub-bands.
- Table 1 represents a grouping of coefficients according to a first embodiment of the invention.
- Table 1 represents grouping the coefficients according to a psycho-acoustical model. In the example shown each MDCT produces 280 critically sampled coefficient values. However it will be appreciated that depending on the sample size of the MDCT different numbers of coefficients per transform may be generated. Similarly table 1 represents only one non-limiting example of grouping the coefficients into groups of coefficients, and embodiments of the present invention may group the coefficients in other combinations.
- the first column represents the index of the sub-band or group
- the second column represents the starting coefficient index value from the MDCT unit
- the third column represents the length of the sub- band or group as a number of consecutive coefficients.
- Table 1 indicates that there are 280 coefficients in total, with the first sub-band (the sub-band with index 1) starting from coefficient 0 (the first coefficient) and being 4 coefficients in length, and the 21st sub-band (index 21) starting from coefficient 236 and being 44 coefficients in length.
- The grouping of the coefficients into sub-bands is shown in figure 3 by step 315a.
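- A grouping step of this kind can be sketched as follows; note that only the first and last rows of Table 1 are quoted in the text, so any complete band table supplied to this function is an assumption.

```python
def group_coefficients(coeffs, band_table):
    """Split MDCT coefficients into sub-band groups.

    band_table rows are (start_index, length) pairs in the style of
    Table 1, e.g. sub-band 1 = (0, 4) and sub-band 21 = (236, 44).
    """
    return [coeffs[start:start + length] for start, length in band_table]
```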
- the difference coder 213 is further arranged to process the coefficient values in order to scale the coefficient values.
- the difference coder 213 is arranged to perform a correlation related scaling on the coefficient values.
- the scaling factors are generated dependent on a sub- band parameter.
- One such sub-band parameter is the energy value per sub-band of the synthesized signal.
- a further sub-band parameter is the maximum coefficient value either within the sub-band of the synthesized signal or across the whole spectrum of the synthesized signal.
- the relationship between the sub-band parameter value per sub-band and the scaling factor per sub-band may be any mapping between the two.
- mapping between the sub-band parameter value and the scaling factor may be linear or non-linear.
- the scaling factor may in some embodiments of the present invention be generated by a look up table which receives the sub-band parameter value and outputs a scaling factor, or may be generated by a processor.
- more than one type of sub-band parameter is received and a combination of the sub-band parameter types is used to determine the scaling factor. For example a weighted average of the sub-band parameter type values may be used to generate the scaling factor.
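- Since the text leaves the parameter-to-factor mapping open, the sketch below assumes one simple choice: a weighted average of the supplied sub-band parameter values (for example RMS and maximum magnitude), inverted so that the factor normalises the corresponding difference-signal sub-band.

```python
def scale_factor(params, weights=None, eps=1e-12):
    """Map one or more sub-band parameter values to a scaling factor."""
    if weights is None:
        weights = [1.0] * len(params)
    combined = sum(w * p for w, p in zip(weights, params)) / sum(weights)
    # Invert so that multiplying by the factor normalises the sub-band;
    # eps guards against division by zero in silent sub-bands.
    return 1.0 / max(combined, eps)
```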
- the first scaling of the coefficients is shown in figure 3 by step 317a.
- the difference coder 213 is arranged to perform a predetermined scaling on the coefficient values.
- This predetermined value is known to both encoder 200 and decoder 400 of the codec.
- This second scaling of the coefficients is optional and a signal may be passed from the encoder to the decoder indicating the presence or absence of the predetermined factor coefficient scaling.
- the second scaling of the coefficients is shown in figure 3 by step 319.
- the difference coder 213 performs a sub-band factor scaling of the coefficients.
- This third scaling of the coefficients is also optional and a signal may be passed from the encoder to the decoder indicating the presence or absence of the sub-band factor coefficient scaling.
- the difference coder 213 furthermore does not carry out the sub-band factor indexing as described below and shown in figure 3 in step 323 but only carries out the indexing as described below on the coefficients themselves as shown in figure 3 in step 325.
- the difference coder 213 performs a first determination to calculate the scale factor per sub-band from data in the sub-band.
- This calculation step is shown in figure 3 by step 321a.
- the difference coder 213 may quantize the scale factors.
- the quantization of the scale factors may be performed using a 5-codeword quantizer. In such examples one codebook may be used for each sub-band.
- This quantization of the factors is shown in figure 3 by step 321b.
- the difference coder 213 then scales the coefficients according to the quantized scale factors.
- the difference coder 213 furthermore performs an indexing of both the quantized scaling factors and the coefficients.
- the indexing of the scaling factors is performed according to a psycho-acoustical analysis of the coefficients of the MDCT difference signal.
- the grouped sub-bands may be ordered in terms of significance - for example sub-bands with coefficients indicating loud signals are selected as having a high rank and neighbouring sub-bands are selected as having a much lower rank as loud audio signals in the human ear have a masking effect on neighbouring signals.
- An example of the indexing of the scales may be grouping the scales by three and creating a 7 bit index per group of scales.
- the value of 7 bits is derived in this example from 5 possible values for each of the 3 scales. Since there are 21 sub-bands in the examples above, 49 bits may be used for encoding the scales, 1 bit may be used as a flag to indicate the bit allocation and the rest of the 327 bits may be used for the scaled coefficients.
- the indexing of the quantized scaling factors is shown in figure 3 by step 323.
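- The 7-bit group index follows from 5^3 = 125 <= 2^7 = 128; a minimal sketch of the packing is shown below, where the digit order within a group is an assumption.

```python
def pack_scales(q0, q1, q2):
    """Pack three 5-level scale codeword indices into one 7-bit index."""
    assert all(0 <= q < 5 for q in (q0, q1, q2))
    return (q0 * 5 + q1) * 5 + q2        # values 0..124 fit in 7 bits

def unpack_scales(index):
    """Inverse of pack_scales."""
    q2 = index % 5
    q1 = (index // 5) % 5
    q0 = index // 25
    return q0, q1, q2
```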
- the difference coder 213 furthermore performs a quantization of the scaled coefficients, and as described above also performs an indexing of the quantized scaled coefficients.
- the MDCT coefficients corresponding to frequencies from 0 to 7000 Hz are quantized, the rest being set to zero.
- the quantization may be performed with 4-dimensional quantizers, so that the 280 length vector is divided into 70 4-dimensional vectors which are independently quantized.
- the codebook used for the quantization of each of the 70 vectors depends on the number of bits allocated to it. An embedded codebook like the one in Table 2 could be used.
- the codevectors are obtained as signed permutations of the leader vectors from Table 2. From the leader vector 3 only 7 signed permutations are considered, the eighth one being mapped to the leader vector 2 (the value +/-0.7 is changed to +/-1).
- the parity of the leader vector 4 may be equal to one, so that the number of negative signs in the codevectors may be even. For a parity value of -1, the number of negative components of the codevectors should be odd and for a null parity value, there are no constraints on the signs of the codevector components.
- the number of bits allocated for each of the 70 vectors may be in order from lower frequency to higher frequency coefficients:
- the choice of the bit allocation may be made based on an analysis of the energy of the original signal, or equally made as a predetermined decision.
- the nearest neighbor search algorithm may be performed according to the search on leaders algorithm known in the art.
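- A simplified sketch of this split-and-quantize step is given below; it uses a brute-force nearest neighbour search rather than the search-on-leaders algorithm, and the per-vector codebooks (built from the Table 2 leader vectors, which are not reproduced in this text) are assumed inputs.

```python
import numpy as np

def quantize_difference(coeffs, codebooks):
    """Quantize a 280-coefficient vector as 70 independent 4-D vectors.

    codebooks is a list of 70 arrays, each of shape (n_codevectors, 4),
    chosen according to the number of bits allocated to each vector.
    """
    vectors = np.asarray(coeffs).reshape(70, 4)
    indices = []
    for vec, cb in zip(vectors, codebooks):
        err = np.sum((cb - vec) ** 2, axis=1)  # squared distance to each codevector
        indices.append(int(np.argmin(err)))
    return indices
```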
- indexing of the codevectors (the quantised coefficients) is provided here below.
- the input of the enumeration algorithm may be for example
- the index IB is obtained such that its binary representation of M bits may include a 'zero' bit for each negative valued component and a 'one' bit for each positive valued component.
- the index IB is then calculated as:
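- The exact formula is elided in the text; a sketch consistent with the description is given below, where the bit ordering over the components is an assumption.

```python
def sign_index(components):
    """Build IB with one bit per component: 'one' if positive, 'zero' if negative."""
    ib = 0
    for c in components:
        ib = (ib << 1) | (1 if c > 0 else 0)
    return ib
```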
- the leader vector 1 describes a single vector.
- The quantization of the coefficients and the indexing of the quantized coefficients is shown in figure 3 by step 325.
- the difference coder 213, then passes the indexed quantized coefficient values, and may also pass the indexed quantized scaling factors, and any other indicators to the multiplexer 215.
- the multiplexer 215 merges the R1 and/or R2 bitstreams with the higher level signals R3, R4, and R5 generated from the difference encoder 273.
- the multiplexer 215 outputs a multiplex signal which may then be transmitted or stored.
- This multiplexing is shown in figure 3 by step 325.
- the difference encoder controller 275 is arranged to control the difference encoder 273 and the difference coder 213 in particular enabling the difference coder 213 to determine a series of scaling factors to be used on the MDCT coefficients of the difference signal.
- Embodiments of the invention may use the correlation between the synthesized signal and the difference signal to enable the difference signal to be more optimally processed, and may have the frequency domain coefficients and grouped coefficients more optimally scaled using this correlation between the signals.
- the difference encoder controller 275 receives the synthesized signal output from the core encoder 271.
- the synthesized signal is passed to a MDCT processor 251.
- the difference encoder controller MDCT processor 251 may be the same MDCT processor 211 used by the difference encoder 273.
- the MDCT processing of the synthesized signal step is shown in figure 3 by step 313b.
- the coefficients generated by the MDCT processor 251 are passed to a synthesized signal spectral processor 253.
- the operations of the synthesized signal spectral processor 253 may be performed by the difference coder 213.
- the synthesized signal spectral processor 253 groups the coefficients into sub- bands in a manner previously described above with respect to the difference signal transformed coefficients.
- the MDCT processor produces 280 synthesized signal coefficients and the same grouping as shown above in Table 1 may be applied to produce 21 sub-bands.
- This grouping step is shown in figure 3 in step 315b.
- the coefficients from each of the 21 sub-bands are then processed within the synthesized signal spectral processor 253 so that the root mean squared value for the MDCT synthesized signal coefficients per sub-band is calculated.
- the root mean square value calculation may take the form of summing the square of the value of each MDCT synthesised signal coefficient within a sub-band. This sum may then be normalised to the number of MDCT coefficients within the sub-band. The square root of the resulting value may then be taken to give the root mean square value. This calculated root mean square value may be considered to indicate the energy value of the synthesised signal for each sub-band.
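- As a minimal sketch of the per-sub-band root mean square calculation just described:

```python
import numpy as np

def rms_per_sub_band(sub_bands):
    """RMS of the MDCT synthesized-signal coefficients in each sub-band group."""
    return [float(np.sqrt(np.mean(np.square(band)))) for band in sub_bands]
```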
- This energy per sub band may then be passed to the difference coder 213 in the difference encoder.
- the difference coder then uses these energy values to calculate the scaling factors for each sub-band as described above and seen in figure 3 in step 317a.
- the scaling factor associated with each sub-band is then used to normalise those spectral coefficients lying within the same sub-band of the difference signal.
- the energy per sub-band of the synthesised audio signal may be represented by the average energy of the coefficients lying within the sub-band.
- the sum of the square of the spectral coefficients within the synthesised audio signal sub-band is calculated, and the sum of the square value may be optionally normalised to the number of spectral coefficients in the sub-band.
- the energy per sub-band of the synthesised audio signal may be represented by the average magnitude of the coefficients lying within the sub-band.
- the sum of the spectral coefficients within the synthesised audio signal sub-band is calculated, and then the sum of the spectral coefficients value may be optionally normalised to the number of spectral coefficients in the sub-band.
- the synthesised signal spectral processor 253 may locate a local maximum coefficient value within each sub-band, on a per sub-band basis. These values may then be used to provide the sub-band parameter value to the scaling step for each spectral sub-band within the difference signal.
- any one of the above parameter values may be combined in any numerical manner to obtain additional scaling steps for each sub-band based on the spectral coefficients of the synthesised audio signal.
- the decoder 400 receives the encoded signal and outputs a reconstructed audio output signal.
- the decoder comprises a demultiplexer 401, which receives the encoded signal and outputs a series of data streams.
- the demultiplexer 401 is connected to a core decoder 471 for passing the lower level bitstreams (R1 or/and R2).
- the demultiplexer 401 is also connected to a difference decoder 473 for outputting the higher level bitstreams (R3, R4, or/and R5).
- the core decoder is connected to a synthesized signal decoder 475 to pass a synthesized signal between the two.
- the core decoder 471 is connected to a summing device 413 via a delay element 410 which also receives a synthesized signal.
- the synthesized signal decoder is connected to the difference decoder 473 for passing root mean square values for sub-band coefficients.
- the difference decoder 473 is also connected to the summing device 413 to pass a difference signal to the summing device.
- the summing device 413 has an output which is an approximation of the original signal.
- the demultiplexer 401 receives the encoded signal, shown in figure 5 by step 501.
- the demultiplexer 401 is further arranged to separate the lower level signals (R1 or/and R2) from the higher level signals (R3, R4, or/and R5). This step is shown in figure 5 in step 503.
- the lower level signals are passed to the core decoder 471 and the higher level signals passed to the difference decoder 473.
- the core decoder 471, using the core codec 403, receives the lower level signal (the core codec encoded parameters) discussed above and performs a decoding of these parameters to produce an output the same as the synthesized signal output by the core codec 203 in the encoder 200.
- the synthesized signal is then up-sampled by the post processor 405 to produce a synthesized signal similar to the synthesized signal output by the core encoder 271 in the encoder 200. If however the core codec is operating at the same sampling rate as the eventual output signal, then this step is not required.
- the synthesized signal is passed to the synthesized signal decoder 475 and via the delay element 410 to the summing device 413.
- The generation of the synthesized signal is shown in figure 5 by step 505c.
- the synthesized signal decoder 475 receives the synthesized signal.
- the synthesized signal is processed in order to generate a series of energy per sub-band values (or other correlation factor) using the same process described above.
- the synthesized signal is passed to a MDCT processor 407.
- the MDCT step is shown in figure 5 in step 509.
- the MDCT coefficients of the synthesized signals are then grouped in the synthesized signal spectral processor 408 into sub-bands (using the predefined sub-band groupings - such as shown in Table 1).
- the grouping step is shown in figure 5 by step 513.
- the synthesized signal spectral processor 408 may calculate the root mean square value of the coefficients to produce an energy per sub-band value (in a manner shown above) which may be passed to the difference decoder 473. The calculation of the values is shown in figure 5 by step 515. As will be appreciated, in embodiments where different values are generated within the synthesized signal spectral processor 253 of the encoder 200, the same process is used in the synthesized signal spectral processor 408 of the decoder 400 so that the outputs of the two devices are the same or close approximations to each other.
- the difference decoder 473 passes the high level signals to the difference processor 409.
- the difference processor 409 demultiplexes from the high level signals the received scale factors and the quantized scaled MDCT coefficients.
- the difference processor then re-indexes the received scale factors and the quantized scaled MDCT coefficients.
- the re-indexing returns the scale factors and the quantized scaled MDCT coefficients into an order prior to the indexing carried out in the steps 323 and 325 with respect to the scale factors and coefficients.
- the decoding of the index I consists of the decoding of I_pos and of IB. To recover the position vector p from an index I_pos, the following algorithm may be used:
- the vector z may then be recovered by inserting the value 1 at the positions indicated in the vector p and the value 0 at all the other positions.
- The re-indexing of the coefficient values is shown in figure 5 as step 505a, and the re-indexing of the scaling factors as step 505b.
- the difference decoder 473 furthermore re-scales the coefficient values.
- In other words, the inverse to the third scaling process (step 321) is performed.
- This sub-band factor re-scaling is shown in figure 5 as step 507.
- the difference decoder 473 rescales the coefficients using the predetermined factor - in other words performing the inverse to the second scaling process (step 319).
- This pre-determined factor re-scaling is shown in figure 5 as step 511.
- the difference decoder 473 having received the RMS values of the sub-bands of the synthesized signal from the synthesized signal decoder 475 uses these values in a manner similar to that described above to generate a series of re-scaling factors to perform the inverse to the first scaling process (step 317a).
- This synthesized signal factor re-scaling operation is shown in figure 5 as step 517.
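- Assuming the encoder normalised each sub-band by the inverse of the synthesized-signal level (as in the encoder sketch earlier, which is itself an assumed mapping), the inverse operation is a per-sub-band multiplication:

```python
def rescale_sub_bands(scaled_bands, synth_rms, eps=1e-12):
    """Inverse of the first (correlation) scaling: restore sub-band levels."""
    return [band * max(r, eps) for band, r in zip(scaled_bands, synth_rms)]
```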
- not all of the re-scaling operations may be required on the coefficients.
- steps 507 or 511 may not be performed if one or other of the optional second or third scaling operations is not performed in the coding of the signal.
- the re-index and re-scale processing of the difference decoder 473 outputs the re-scaled and re-indexed MDCT coefficients representing the difference signal. These are then passed to an inverse MDCT processor 411 which outputs a time domain sampled version of the difference signal.
- This inverse MDCT process is shown in figure 5 as step 519.
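- A direct inverse to the earlier MDCT sketch is shown below; the 1/N normalisation matches that unwindowed forward transform, and reconstruction additionally requires 50% overlap-add of consecutive output blocks (windowing, if applied at analysis, must be matched here).

```python
import numpy as np

def imdct(coeffs):
    """Inverse MDCT: N coefficients back to 2N time samples for overlap-add."""
    half_n = len(coeffs)
    n = np.arange(2 * half_n)
    k = np.arange(half_n)
    basis = np.cos(np.pi / half_n * (n[:, None] + 0.5 + half_n / 2.0)
                   * (k[None, :] + 0.5))
    return (basis @ np.asarray(coeffs)) / half_n
```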
- the time domain sampled version of the difference signal is then passed from the difference decoder 473 to the summing device 413 which, in combination with the delayed synthesized signal from the core decoder 471 via the digital delay 410, produces a copy of the original digitally sampled audio signal.
- the MDCT (and IMDCT) is used to convert the signal from the time to frequency domain (and vice versa).
- any other appropriate time to frequency domain transform with an appropriate inverse transform may be implemented instead.
- Non-limiting examples of other transforms comprise: a discrete Fourier transform (DFT), a fast Fourier transform (FFT), a discrete cosine transform (DCT-I or DCT-III), and a discrete sine transform (DST).
- the embodiments of the invention described above describe the codec 10 in terms of separate encoder 200 and decoder 400 apparatus in order to assist the understanding of the processes involved.
- the apparatus, structures and operations may be implemented as a single encoder- decoder apparatus/structure/operation.
- the coder and decoder may share some/or all common elements.
- the core codec 403 and post processor 405 of the decoder may be implemented by using the core coder 203 and post processor 205.
- the synthesized signal decoder 475 similarly may be implemented by using the difference encoder controller 275 of the encoder.
- circuitry and/or programming objects or code may be reused whenever the same process is operated.
- the embodiment shown above provides a more accurate result: because the difference and synthesized signals are correlated, scaling factors dependent on the synthesized signal, when used to scale the difference signal MDCT coefficients, produce a better quantized result.
- the combination of the correlation scaling, the predetermined scaling and the sub-band factor scaling may produce a more accurate result than the prior art scaling processes at no additional signalling cost.
- the scaling factors are always part of the transmitted encoded signal even if some of the high level signals are not transmitted due to bandwidth capacity constraints.
- the additional scaling factors featured in embodiments described in the invention are not sent separately (unlike the factors sent separately in some prior art systems). Therefore embodiments of the invention may show a higher coding efficiency when compared with systems where multiple sets of scaling factors are transmitted separately, as a higher percentage of the transmitted signal is signal information (either core codec or encoded difference signal) rather than scaling information.
- the above describes embodiments of the invention operating within a codec within an electronic device 610.
- the invention as described above may be implemented as part of any variable rate/adaptive rate audio (or speech) codec where the difference signal (between a synthesized and real audio signal) may be quantized.
- embodiments of the invention may be implemented in an audio codec which may implement audio coding over fixed or wired communication paths.
- user equipment may comprise an audio codec such as those described in embodiments of the invention above.
- user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
- elements of a public land mobile network (PLMN) may also comprise audio codecs as described above.
- the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
- While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
- the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
- the design of integrated circuits is by and large a highly automated process.
- Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
- Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
- the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Abstract
An encoder configured to receive an audio signal. The encoder is further configured to generate a synthesized audio signal and a difference signal. The encoder is further configured to scale the difference signal dependent on the synthesized audio signal.
Description
An Encoder
Field of the Invention
The present invention relates to coding, and in particular, but not exclusively to speech or audio coding.
Background of the Invention
Audio signals, like speech or music, are encoded, for example, to enable efficient transmission or storage of the audio signals.
Audio encoders and decoders are used to represent audio based signals, such as music and background noise. These types of coders typically do not utilise a speech model for the coding process, rather they use processes for representing all types of audio signals, including speech.
Speech encoders and decoders (codecs) are usually optimised for speech signals, and often operate at a fixed bit rate.
An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may work with speech signals at a coding rate equivalent to pure speech codec. At higher bit rates, the performance may be good with any signal including music, background noise and speech.
A further audio coding option is an embedded variable rate speech or audio coding scheme, which is also referred to as a layered coding scheme. Embedded variable rate audio or speech coding denotes an audio or speech coding scheme in which a bit stream resulting from the coding operation is distributed into successive layers. A base or core layer, which comprises primary coded data generated by a core encoder, is formed of the binary elements essential for the decoding of the binary stream, and determines a minimum quality of decoding. Subsequent layers make it possible to progressively improve the quality of the signal arising from the decoding operation, where each new layer brings new information. One of the particular features of layered based coding is the possibility offered of intervening at any level whatsoever of the transmission or storage chain, so as to delete a part of the binary stream without having to include any particular indication to the decoder.
The decoder uses the binary information that it receives and produces a signal of corresponding quality. For instance, International Telecommunication Union Telecommunication Standardization Sector (ITU-T) standardisation aims at a wideband codec of 50 to 7000 Hz with bit rates from 8 to 32 kbps. The codec core layer will either work at 8 kbps or 12 kbps, and additional layers with quite small granularity will increase the observed speech and audio quality. The proposed layers will have as a minimum target at least five bit rates of 8, 12, 16, 24 and 32 kbps available from the same embedded bit stream.
By the very nature of layered, or scalable, coding schemes the structure of the codecs tends to be hierarchical in form, consisting of multiple coding stages. Typically different coding techniques are used for the core (or base) layer and the additional layers. The coding methods used in the additional layers are then used either to code those parts of the signal which have not been coded by previous layers, or to code a residual signal from the previous stage. The residual signal is formed by subtracting a synthetic signal, i.e. a signal generated as a result of the previous stage, from the original. By adopting this hierarchical approach, a combination of coding methods makes it possible to reduce the output to relatively low bit rates while retaining sufficient quality, whilst also producing good quality audio reproduction by using higher bit rates.
Typically techniques used for low bit rate coding do not perform well at higher bit rates and vice versa. This has resulted in structures using two different coding technologies. The codec core layer is typically a speech codec based on the Code Excited Linear Prediction (CELP) algorithm or a variant such as adaptive multi-rate (AMR) CELP and variable multi-rate (VMR) CELP.
Details of the AMR codec can be found in the 3GPP TS 26.090 technical specification, the AMR-WB codec 3GPP TS 26.190 technical specification, and the AMR-WB+ in the 3GPP TS 26.290 technical specification.
A similar scalable audio codec, the VMR-WB (Variable Multi-Rate Wideband) codec, was developed for the CDMA 2000 communication system.
Details on the VMR-WB codec can be found in the 3GPP2 technical specification C.S0052-0. In a manner similar to the AMR family, the source-controlled VMR-WB audio codec also uses ACELP coding as a core coder.
The higher layers utilise time frequency transformations as described in the prior art "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation" by J.P. Princen et al. (IEEE Transactions on ASSP, Vol. ASSP-34, No. 5. October 1986) and found in many audio coding technologies in order to quantise the residual signal.
However these higher level signals are not optimally coded. For example, the codec described in Ragot et al., "An 8-32 kbit/s scalable wideband speech and audio coding candidate for ITU-T G729EV standardisation", published in Acoustics, Speech and Signal Processing 2006 (ICASSP 2006) Proceedings, 2006 IEEE International Conference, Volume 1, pages I-1 to I-4, describes scalable wideband audio coding. However the higher layers within the codec are not optimally processed. Specifically the higher layer signals lose significant information by the application of quantisation thresholds.
Summary of the Invention
This invention proceeds from the consideration that the higher layers in a scalable layered coding scheme are not optimally encoded. Specifically the residual signal, formed by subtracting the synthetic output signal from the original, may contain signal components which are highly correlated with the synthetic output generated from the core codec. Exploiting this correlation can result in a more efficient quantisation scheme for the signal components representing the residual. This may take the form of using the energy content in the synthetic signal to normalise corresponding signal components in the residual signal. This can have the effect of limiting the dynamic range of signal components, thereby increasing the efficiency of any subsequent quantisation stages.
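As a purely illustrative sketch of this idea (the symbols here are introduced for explanation and are not taken from the description): if E_b denotes an energy estimate of the synthetic signal in sub-band b, and D_b(k) denotes the k-th residual frequency coefficient in that sub-band, the normalisation may take a form such as

    D'_b(k) = D_b(k) / f(E_b)

where f(.) is some mapping, for example a linear one, from the sub-band energy estimate to a scaling factor. Limiting the dynamic range of D'_b(k) in this way makes it cheaper to quantise than the raw residual.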
Embodiments of the present invention aim to address the above problem.
There is provided according to a first aspect of the invention an encoder configured to receive an audio signal, wherein the encoder is further configured to: generate a synthesized audio signal and a difference signal; and scale the difference signal dependent on the synthesized audio signal.
The encoder may comprise: a first encoder configured to receive the audio signal and output the synthesized audio signal dependent on the audio signal; a difference circuit configured to receive the audio signal and the synthesized audio signal and subtract one of the audio signal and the synthesized audio signal from the other to output the difference signal; and a second encoder configured to receive the difference signal and the synthesized audio signal, and to encode the difference signal to generate a scaled encoded signal.
The second encoder may be configured to scale the difference signal dependent on the synthesized audio signal by generating at least one scaling factor dependent on the synthesized audio signal.
The second encoder may be configured to generate at least one scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
The second encoder may be further configured to estimate the at least one parameter.
The second encoder may further be configured to: perform a time domain to frequency domain transform on the synthesized audio signal; and group the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; wherein the second encoder is configured to estimate at least one parameter for each synthesized audio signal sub-band group.
The parameter is preferably at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
The second encoder is preferably configured to: perform a time domain to frequency domain transform on the difference signal; and group the difference signal frequency coefficients from the transform into at least two difference signal sub-band groups.
The number of synthesized audio signal sub-band groups is preferably equal to the number of difference signal sub-band groups, and the second encoder may be configured to scale each difference signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
The time domain to frequency domain transformation may comprise an orthogonal discrete transform.
The orthogonal discrete transform may comprise a modified discrete cosine transform.
The encoder may be configured to further scale the difference signal dependent on a predetermined value.
The encoder may be configured to further scale the difference signal dependent on parameters estimated from the encoded signal.
According to a second aspect of the present invention there is provided a method for encoding an audio signal comprising: receiving an audio signal; generating a synthesized audio signal dependent on the received audio signal; generating a difference signal; and scaling the difference signal dependent on the synthesized audio signal.
The generating a difference signal may further comprise: subtracting one of the audio signal and the synthesized audio signal from the other to generate the difference signal; and encoding the scaled difference signal to generate a scaled encoded signal.
Scaling the difference signal may comprise generating at least one scaling factor dependent on the synthesized audio signal.
Generating at least one scaling factor may comprise generating a scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
Generating may further comprise estimating the at least one parameter.
Estimating may comprise: performing a time domain to frequency domain transform on the synthesized audio signal; and grouping the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio
signal sub-band groups; and estimating at least one parameter for each synthesized audio signal sub-band group.
Estimating at least one parameter for each synthesized audio signal sub-band group may comprise estimating at least one of: a root mean squared value of the synthesized audio signal frequency coefficients per synthesized signal sub-band group; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
Encoding may comprise: performing a time domain to frequency domain transform to the difference signal; and grouping the difference signal frequency coefficients from the transform into at least two difference signal sub-band groups.
The number of synthesized audio signal sub-band groups is preferably equal to the number of difference signal sub-band groups, and scaling may comprise scaling each difference signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
The time domain to frequency domain transformation may comprise an orthogonal discrete transform.
The orthogonal discrete transform may comprise a modified discrete cosine transform.
The method may further comprise scaling the difference signal dependent on a predetermined value.
The method may further comprise scaling the difference signal dependent on parameters estimated from the encoded signal.
According to a third aspect of the present invention there is provided a decoder configured to receive an encoded signal and output an estimate of an audio signal, wherein the decoder is further configured to: generate a synthesized audio signal; and scale the encoded signal dependent on the synthesized audio signal.
The decoder may comprise: a first decoder configured to receive at least part of the encoded signal and output the synthesized audio signal dependent on the encoded signal; a second decoder configured to receive a further part of the encoded signal and the synthesized audio signal, and to scale the further part of the encoded signal dependent on the synthesized audio signal.
The second decoder is preferably configured to scale the encoded signal dependent on the synthesized audio signal by generating at least one scaling factor dependent on the synthesized audio signal.
The second decoder is preferably configured to generate at least one scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
The second decoder is preferably further configured to estimate the at least one parameter.
The second decoder is preferably further configured to: perform a time domain to frequency domain transform on the synthesized audio signal; and group the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; wherein the second decoder is preferably configured to estimate at least one parameter for each synthesized audio signal sub-band group.
The parameter is preferably at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the
synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are preferably synthesised audio signal frequency coefficients in a sub-band group.
The second decoder is preferably configured to: group the encoded signal frequency coefficients from the transform into at least two encoded signal sub-band groups.
The number of synthesized audio signal sub-band groups is preferably equal to the number of encoded signal sub-band groups, and the second decoder is preferably configured to scale each encoded signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
The time domain to frequency domain transformation may comprise an orthogonal discrete transform.
The orthogonal discrete transform may comprise a modified discrete cosine transform.
The decoder is preferably configured to further scale the encoded signal dependent on a predetermined value.
The decoder is preferably configured to further scale the encoded signal dependent on parameters estimated from the encoded signal.
According to a fourth aspect of the present invention there is provided a method for decoding an audio signal comprising: receiving an encoded signal; generating a synthesized audio signal dependent on the received encoded signal; and scaling the encoded signal dependent on the synthesized audio signal.
Scaling the encoded signal may comprise generating at least one scaling factor dependent on the synthesized audio signal.
Generating at least one scaling factor may comprise generating a scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
Generating at least one scaling factor may further comprise estimating the at least one parameter.
Estimating may comprise: performing a time domain to frequency domain transform on the synthesized audio signal; and grouping the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; and estimating at least one parameter for each synthesized audio signal sub-band group.
Estimating at least one parameter for each synthesized audio signal sub-band group may comprise estimating at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are preferably synthesised audio signal frequency coefficients in a sub-band group.
The method may further comprise: grouping the encoded signal frequency coefficients from the transform into at least two encoded signal sub-band groups.
The number of synthesized audio signal sub-band groups is preferably equal to the number of encoded signal sub-band groups, and scaling may comprise scaling each encoded signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
The time domain to frequency domain transformation may comprise an orthogonal discrete transform.
The orthogonal discrete transform may comprise a modified discrete cosine transform.
The method may further comprise scaling the encoded signal dependent on a predetermined value.
The method may further comprise scaling the encoded signal dependent on parameters estimated from the encoded signal.
According to a fifth aspect of the present invention there is provided an apparatus comprising an encoder as described above.
According to a sixth aspect of the present invention there is provided an apparatus comprising a decoder as described above.
According to a seventh aspect of the present invention there is provided an electronic device comprising an encoder as described above.
According to an eighth aspect of the present invention there is provided an electronic device comprising a decoder as described above.
According to a ninth aspect of the present invention there is provided a computer program product configured to perform a method for encoding an audio signal comprising: receiving an audio signal; generating a synthesized audio signal dependent on the received audio signal; generating a difference signal; and scaling the difference signal dependent on the synthesized audio signal.
According to a tenth aspect of the present invention there is provided a computer program product configured to perform a method for decoding an audio signal; comprising: receiving an encoded signal; generating a synthesized audio signal dependent on the received encoded signal; and scaling the encoded signal dependent on the synthesized audio signal.
According to an eleventh aspect of the present invention there is provided an encoder configured to receive an audio signal and output a scaled encoded signal, wherein the encoder comprises: means for generating a synthesized audio signal; means for generating a difference signal; and means for scaling the difference signal dependent on the synthesized audio signal.
The encoder may further comprise: means for outputting a difference signal comprising the means for subtracting one of the audio signal and the synthesized audio signal from the other; and wherein the means for generating the encoded signal comprises: means for receiving the difference signal and the synthesized audio signal; and means for encoding the scaled difference signal to generate the encoded signal.
According to a twelfth aspect of the present invention there is provided a decoder configured to receive an encoded signal and output an estimate of an audio signal, comprising: means for generating a synthesized audio signal; and means for scaling the encoded signal dependent on the synthesized audio signal.
The decoder may comprise means for generating the synthesized audio signal dependent on at least part of the encoded signal.
Brief Description of Drawings
For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically an electronic device employing embodiments of the invention;
Figure 2 shows schematically an audio encoder according to an embodiment of the present invention; Figure 3 shows a flow diagram illustrating the operation of the audio encoder according to an embodiment of the present invention;
Figure 4 shows schematically an audio decoder according to an embodiment of the present invention; and
Figure 5 shows a flow diagram illustrating the operation of an embodiment of the audio decoder according to the present invention.
Description of Preferred Embodiments of the Invention
The following describes in more detail possible codec mechanisms for the provision of layered or scalable variable rate audio codecs. In this regard reference is first made to Figure 1 which shows a schematic block diagram of an exemplary electronic device 610, which may incorporate a codec according to an embodiment of the invention.
The electronic device 610 may for example be a mobile terminal or user equipment of a wireless communication system.
The electronic device 610 comprises a microphone 611, which is linked via an analogue-to-digital converter 614 to a processor 621. The processor 621 is further linked via a digital-to-analogue converter 632 to loudspeakers 633. The processor 621 is further linked to a transceiver (TX/RX) 613, to a user interface (UI) 615 and to a memory 622.
The processor 621 may be configured to execute various program codes. The implemented program codes comprise an audio encoding code for encoding a lower frequency band of an audio signal and a higher frequency band of an audio signal. The implemented program codes 623 further comprise an audio decoding
code. The implemented program codes 623 may be stored for example in the memory 622 for retrieval by the processor 621 whenever needed. The memory 622 could further provide a section 624 for storing data, for example data that has been encoded in accordance with the invention.
The encoding and decoding code may in embodiments of the invention be implemented in hardware or firmware.
The user interface 615 enables a user to input commands to the electronic device 610, for example via a keypad, and/or to obtain information from the electronic device 610, for example via a display. The transceiver 613 enables a communication with other electronic devices, for example via a wireless communication network.
It is to be understood again that the structure of the electronic device 610 could be supplemented and varied in many ways.
A user of the electronic device 610 may use the microphone 611 for inputting speech that is to be transmitted to some other electronic device or that is to be stored in the data section 624 of the memory 622. A corresponding application has been activated to this end by the user via the user interface 615. This application, which may be run by the processor 621, causes the processor 621 to execute the encoding code stored in the memory 622.
The analogue-to-digital converter 614 converts the input analogue audio signal into a digital audio signal and provides the digital audio signal to the processor 621.
The processor 621 may then process the digital audio signal in the same way as described with reference to Figures 2 and 3.
The resulting bit stream is provided to the transceiver 613 for transmission to another electronic device. Alternatively, the coded data could be stored in the data
section 624 of the memory 622, for instance for a later transmission or for a later presentation by the same electronic device 610.
The electronic device 610 could also receive a bit stream with correspondingly encoded data from another electronic device via its transceiver 613. In this case, the processor 621 may execute the decoding program code stored in the memory 622. The processor 621 decodes the received data, for instance in the same way as described with reference to Figures 4 and 5, and provides the decoded data to the digital-to-analogue converter 632. The digital-to-analogue converter 632 converts the digital decoded data into analogue audio data and outputs them via the loudspeakers 633. Execution of the decoding program code could be triggered as well by an application that has been called by the user via the user interface 615.
The received encoded data could also be stored instead of an immediate presentation via the loudspeakers 633 in the data section 624 of the memory 622, for instance for enabling a later presentation or a forwarding to still another electronic device.
It would be appreciated that the schematic structures described in figures 2 and 4 and the method steps in figures 3 and 5 represent only a part of the operation of a complete audio codec, as exemplarily shown implemented in the electronic device shown in figure 1. The general operation of audio codecs is known, and features of such codecs which do not assist in the understanding of the operation of the invention are not described in detail.
The audio codec according to an embodiment of the invention is now described in more detail with respect to Figures 2 to 5.
With respect to figures 2 and 3, an encoder (otherwise known as the coder) embodiment of the invention is shown.

With respect to figure 2, a schematic view of the encoder 200 implementing an embodiment of the invention is shown. Furthermore the operation of the encoder embodiment is described as a flow diagram in figure 3.
The encoder may be divided into: a core encoder 271; a delay unit 207; a difference unit 209; a difference encoder 273; a difference encoder controller 275; and a multiplexer 215.
The encoder 200 in step 301 receives the original audio signal. In a first embodiment of the invention the audio signal is a digitally sampled signal. In other embodiments of the present invention the audio input may be an analogue audio signal, for example from the microphone 611, which is analogue-to-digital (A/D) converted. In further embodiments of the invention the audio input is converted from a pulse code modulation digital signal to an amplitude modulation digital signal.
The core encoder 271 receives the audio signal to be encoded and outputs the encoded parameters which represent the core level encoded signal, together with the synthesised audio signal (in other words the audio signal is encoded into parameters and then the parameters are decoded using the reciprocal process to produce the synthesised audio signal). In the embodiment shown in figure 2 the core encoder 271 may be divided into three parts: the pre-processor 201, the core codec 203 and the post-processor 205.
In the embodiment shown in figure 2, the core encoder receives the audio input at the pre-processing element 201. The pre-processing stage 201 may perform low-pass filtering followed by decimation in order to reduce the number of samples being coded. For example, if the input signal was originally sampled at 16 kHz, the signal may be down-sampled to 8 kHz using a linear phase FIR filter with a 3 decibel cut-off around 3.6 kHz and then decimating the number of samples by a factor of 2. The pre-processing element 201 outputs a pre-processed audio input signal to the core codec 203. This operation is represented in step 303 of figure 3. Further embodiments may include core codecs operating at different sampling frequencies.
For instance some core codecs can operate at the original sampling frequency of the input audio signal.
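A minimal sketch of such a pre-processing stage in Python, assuming a 16 kHz input, a 3.6 kHz linear-phase FIR low-pass designed with scipy's firwin, and decimation by a factor of 2 (the filter length is an illustrative assumption):

    import numpy as np
    from scipy.signal import firwin, lfilter

    def preprocess(audio_16k, fs=16000, cutoff_hz=3600, numtaps=61):
        """Low-pass filter then decimate by 2 (e.g. 16 kHz -> 8 kHz)."""
        taps = firwin(numtaps, cutoff_hz, fs=fs)   # linear-phase FIR low-pass
        filtered = lfilter(taps, 1.0, audio_16k)   # filter before decimation
        return filtered[::2]                       # keep every second sample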
The core codec 203 receives the signal and may use any appropriate encoding technique. In the embodiment shown in figure 2 the core codec is an algebraic code excited linear prediction (ACELP) encoder which is configured to produce a bitstream of typical ACELP parameters as lower level signals, as depicted by R1 and/or R2. The parameter bitstream is output to the multiplexer 215.
If ACELP is used, the encoder output bit stream may include typical ACELP encoder parameters. Non-limiting examples of these parameters include LPC (linear predictive coding) parameters quantized in the LSP (Line Spectral Pair) or ISP (Immittance Spectral Pair) domain describing the spectral content, LTP (long-term prediction) parameters describing the periodic structure, ACELP excitation parameters describing the residual signal after the linear predictors, and signal gain parameters.
The core codec 203 may, in some embodiments of the present invention, comprise a configured two-stage cascade code excited linear prediction (CELP) coder, such as VMR, producing R1 and/or R2 bitstreams at 8 kbit/s and/or 12 kbit/s respectively. In some embodiments of the invention it is possible to have a single speech coding stage, such as G.729, defined by the ITU-T standard. It is to be understood that embodiments of the present invention could equally use any audio or speech based codec to represent the core layer.
This encoding of the pre-processed signal is shown in figure 3 by step 305.
The core codec 203 furthermore outputs a synthesised audio signal (in other words the audio signal is first encoded into parameters such as those described above and then decoded back into an audio signal within the same core codec). This synthesised signal is passed to the post-processing unit 205. It is appreciated that the synthesised signal is different from the signal input to the core codec as the parameters are approximations to the correct values - the differences are because of the modelling errors and quantisation of the parameters.
The decoding of the parameters is shown in figure 3 by step 307.
The post-processor 205 re-samples the synthesised audio output in order that the output of the post-processor has a sample rate equal to that of the input audio signal. Thus, using the example values described above with respect to the pre-processor 201 and the core codec 203, the synthesised signal output from the core codec 203 is first up-sampled to 16 kHz and then filtered using a low-pass filter to prevent aliasing occurring.
The post processing of the synthesized signal is shown in figure 3 by step 309.
The post-processor 205 outputs the re-sampled signal to the difference unit 209.
In some embodiments of the invention the pre-processor 201 and post-processor 205 are optional elements and the core codec may receive and encode the digital signal directly. In some embodiments of the invention the core codec 203 receives an analogue or pulse width modulated signal directly and performs the parameterization of the audio signal outputting a synthesized signal to the difference unit 209.
The audio input is also passed to the delay unit 207, which applies a digital delay equal to the delay produced by the core encoder 271 in producing a synthesized signal, and then outputs the signal to the difference unit 209, so that the sample output by the delay unit 207 to the difference unit 209 is the same indexed sample as the synthesized signal output from the core encoder 271 to the difference unit 209; that is, a state of time alignment is achieved.
The delay of the audio signal is shown in figure 3 by step 310.
The difference unit 209 calculates the difference between the input audio signal, which has been delayed by the delay unit 207, and the synthesised signal output from the core encoder 271. The difference unit outputs the difference signal to the difference encoder 273.
The calculation of the difference between the delayed audio signal and the synthesized signal is shown in figure 3 by step 311.
The difference encoder 273 comprises a modified discrete cosine transform (MDCT) processor 211 and a difference coder 213.
The difference encoder receives the difference signal at the modified discrete cosine transform processor 211. The modified discrete cosine transform processor 211 receives the difference signal and performs a modified discrete cosine transform (MDCT) on the signal. The MDCT is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped. The transform is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it can remove the time aliasing components which are a result of the finite windowing process.
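A direct, order N squared sketch of the forward MDCT in Python, using the textbook definition; the sine window is an assumption, as the description does not specify the window used:

    import numpy as np

    def mdct(block):
        """Forward MDCT: 2N time samples -> N frequency coefficients."""
        two_n = len(block)
        N = two_n // 2
        window = np.sin(np.pi / two_n * (np.arange(two_n) + 0.5))  # sine window
        x = block * window
        n = np.arange(two_n)
        k = np.arange(N).reshape(-1, 1)
        # X_k = sum_n x_n * cos[(pi/N) * (n + 1/2 + N/2) * (k + 1/2)]
        basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
        return basis @ x

Consecutive blocks would be taken with 50% overlap so that the time aliasing introduced by each block cancels on overlap-add at the decoder.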
It is to be understood that further embodiments may equally generate the difference signal within a frequency domain. For instance, the original signal and the core codec synthetic signal can be transformed into the frequency domain. The difference signal can then be generated by subtracting corresponding frequency coefficients.
It is to be further understood that embodiments of the present invention may generate the time to frequency transformation (and vice versa) by any discrete orthogonal transform, wherein the coefficients of the forward transform are given by the weighting factor of each orthogonal basis function.
The MDCT processing of the difference signal is shown in figure 3 by step 313a.
The output of the modified discrete cosine transform processor 211 is passed to the difference coder 213.
The difference coder may encode the components of the difference signal as a sequence of higher coding layers, where each layer may encode the signal at a progressively higher bit rate and quality level. In figure 2, this is depicted by the encoding layers R3, R4 and/or R5. It is to be understood that further embodiments may adopt a differing number of encoding layers, thereby achieving a different level of granularity in terms of both bit rate and audio quality.
The difference coder 213 receives the MDCT coefficients output from the MDCT processor 211 and groups the coefficients into a number of sub-bands. Table 1, below, represents a grouping of coefficients according to a first embodiment of the invention.
Table 1 represents grouping the coefficients according to a psycho-acoustical model. In the example shown each MDCT produces 280 critically sampled coefficient values. However it would be appreciated that, depending on the sample size of the MDCT, different numbers of coefficients per transform may be generated. Similarly Table 1 represents only one non-limiting example of grouping the coefficients, and embodiments of the present invention may group the coefficients in other combinations.
In the example provided by Table 1, the first column represents the index of the sub-band or group, the second column represents the starting coefficient index value from the MDCT unit, and the third column represents the length of the sub-band or group as a number of consecutive coefficients.
Thus, for example, Table 1 indicates that there are 280 coefficients in total, with the first sub-band (the sub-band with index 1) starting from coefficient 0 (the first coefficient) and being 4 coefficients in length, and the 21st sub-band (index 21) starting from coefficient 236 and being 44 coefficients in length.
Table 1
The grouping of the coefficients into sub-bands is shown in figure 3 by step 315a.
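A sketch of the grouping step; only the first and last entries of Table 1 are recoverable from the text here, so the band list below is a hypothetical stand-in for the full 21-entry table:

    # Hypothetical (start, length) pairs in the style of Table 1; the real
    # table has 21 entries covering all 280 coefficients.
    SUB_BANDS = [(0, 4), (4, 4), (8, 8), (236, 44)]

    def group_coefficients(coeffs, bands=SUB_BANDS):
        """Split one MDCT frame into psycho-acoustically motivated sub-bands."""
        return [coeffs[start:start + length] for start, length in bands]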
The difference coder 213 is further arranged to process the coefficient values in order to scale the coefficient values.
Firstly the difference coder 213 is arranged to perform a correlation related scaling on the coefficient values. The scaling factors are generated dependent on a sub-band parameter. One such sub-band parameter is the energy value per sub-band of the synthesized signal. A further sub-band parameter is the maximum coefficient value either within the sub-band of the synthesized signal or across the whole spectrum of the synthesized signal.
The relationship between the sub-band parameter value per sub-band and the scaling factor per sub-band may be any mapping between the two.
The mapping between the sub-band parameter value and the scaling factor may be linear or non-linear.
For example a linear mapping may simply be: scaling factor for a sub-band = k × sub-band parameter value for that sub-band, where k is a predefined constant.
The scaling factor may in some embodiments of the present invention be generated by a look up table which receives the sub-band parameter value and outputs a scaling factor, or may be generated by a processor.
In some embodiments of the invention more than one type of sub-band parameter is received and a combination of the sub-band parameter types is used to determine the scaling factor. For example a weighted average of the sub-band parameter type values may be used to generate the scaling factor.
The calculation of the sub-band parameter value used to determine the scaling factor for the sub-band and thus scale the difference frequency coefficient values is described in further detail below.
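A minimal sketch of this correlation-related scaling under the linear mapping described above; the constant k and the division floor are assumed values:

    import numpy as np

    def scale_factors(subband_params, k=1.0):
        """One scaling factor per sub-band: factor = k * parameter value."""
        return k * np.asarray(subband_params, dtype=float)

    def scale_difference(diff_bands, subband_params, k=1.0):
        """Normalise each difference-signal sub-band by its scaling factor."""
        factors = scale_factors(subband_params, k)
        # A small floor guards against division by zero in silent sub-bands.
        return [band / max(f, 1e-12) for band, f in zip(diff_bands, factors)]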
The first scaling of the coefficients is shown in figure 3 by step 317a.
Secondly the difference coder 213 is arranged to perform a predetermined scaling on the coefficient values. This predetermined value is known to both the encoder 200 and the decoder 400 of the codec. This second scaling of the coefficients is optional, and a signal may be passed from the encoder to the decoder indicating the presence or absence of the predetermined factor coefficient scaling.
The second scaling of the coefficients is shown in figure 3 by step 319.
Thirdly the difference coder 213 performs a sub-band factor scaling of the coefficients. This third scaling of the coefficients is also optional and a signal may be passed from the encoder to the decoder indicating the presence or absence of the sub-band factor coefficient scaling. Where the sub-band factor scaling is not carried out, the difference coder 213 furthermore does not carry out the sub-band factor indexing as described below and shown in figure 3 in step 323, but only carries out the indexing as described below on the coefficients themselves as shown in figure 3 in step 325.
The third scaling of the coefficients is shown in figure 3 by step 321.
The difference coder 213 performs a first determination to calculate the scale factor per sub-band from data in the sub-band.
This calculation step is shown in figure 3 by step 321a.
The difference coder 213 may quantize the scale factors. The quantization of the scale factors may be performed using a five-codeword quantizer. In such examples one codebook may be used for each sub-band.
This quantization of the factors is shown in figure 3 by step 321b.
The difference coder 213 then scales the coefficients according to the quantized scale factors.
This scaling by the quantized scale factors is shown in figure 3 by step 321c.
The difference coder 213 furthermore performs an indexing of both the quantized scaling factors and the coefficients. The indexing of the scaling factors is performed according to a psycho-acoustical analysis of the coefficients of the MDCT difference signal. In such psycho-acoustical modelling the grouped sub-bands may be ordered in terms of significance - for example sub-bands with coefficients indicating loud signals are selected as having a high rank and neighbouring sub-bands are selected as having a much lower rank, since loud audio signals have a masking effect on neighbouring signals in the human ear.
An example of the indexing of the scales may be grouping the scales by three and creating a 7-bit index per group of scales. The value of 7 bits is derived in this example from the 5 possible values for each of the 3 scales. Since there are 21 sub-bands in the examples above, 49 bits may be used for encoding the scales, 1 bit may be used as a flag to indicate the bit allocation, and the rest of the 327 bits may be used for the scaled coefficients.
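A sketch of this packing: three indices from a five-codeword quantizer give 5 × 5 × 5 = 125 combinations, which fit in 7 bits; the codebook values below are assumptions for illustration:

    import numpy as np

    SCALE_CODEBOOK = np.array([0.5, 0.75, 1.0, 1.5, 2.0])  # hypothetical codewords

    def quantize_scale(value):
        """Index (0..4) of the nearest of the five codewords."""
        return int(np.argmin(np.abs(SCALE_CODEBOOK - value)))

    def pack_three_scales(i0, i1, i2):
        """Pack three 5-level indices into one 7-bit index (0..124)."""
        return i0 + 5 * i1 + 25 * i2

    def unpack_three_scales(index):
        return index % 5, (index // 5) % 5, index // 25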
The indexing of the quantized scaling factors is shown in figure 3 by step 323.
The difference coder 213 furthermore performs a quantization of the scaled coefficients, and as described above also performs an indexing of the quantized scaled coefficients.
For completeness, an example only of the quantisation process is described below. It is to be understood that other quantisation processes known in the art may be used, including, inter alia, vector quantisation.
In the example the MDCT coefficients corresponding to frequencies from 0 to 7000 Hz are quantized, the rest being set to zero. For the sampling frequency of 16 kHz as described above, this corresponds to having to quantize 280 coefficients for each frame of 20 ms. The quantization may be performed with 4-dimensional quantizers, so that the 280-length vector is divided into 70 4-dimensional vectors which are independently quantized. The codebook used for the quantization of each of the 70 vectors depends on the number of bits allocated to it. An embedded codebook like the one in Table 2 could be used.
Table 2: List of leader vectors forming the embedded codebook

No.  Leader vector      Cardinality  Obs.        Cumulated no. of bits
1    (0,   0,   0, 0)    1                       0  (codebook at 0 bits)
2    (1,   0,   0, 0)    8                       4
3    (0.7, 0,   0, 0)    7           incomplete  4  (codebook at 4 bits)
4    (1,   1,   0, 0)   12           parity 1    5
5    (1,   1,   1, 0)   32                       6  (codebook at 6 bits)
6    (1,   0.7, 0, 0)   48                       7
7    (1,   1,   1, 1)   16                       7  (codebook at 7 bits)
The codevectors are obtained as signed permutations of the leader vectors from Table 2. From the leader vector 3 only 7 signed permutations are considered, the eighth one being mapped to the leader vector 2 (the value +/-0.7 is changed to +/-1). The parity of the leader vector 4 may be equal to one, so that the number of negative signs in the codevectors may be even. For a parity value of -1, the number of negative components of the codevectors should be odd, and for a null parity value there are no constraints on the signs of the codevector components.
In embodiments of the invention there may be several bit allocation arrangements. For example, the number of bits allocated for each of the 70 vectors may be, in order from lower frequency to higher frequency coefficients:

{7,7,7,7,7,7,7,7,6,6,6,6,6,6,6,6,6,6,6,6,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,0,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4} or

{6,6,6,6,6,6,6,6,6,6,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,6,4,4,4,4,4,4,4,4,4,4,4,0,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4}.
The choice of the bit allocation may be made based on an analysis of the energy of the original signal, or equally made as a predetermined decision.
The nearest neighbor search algorithm may be performed according to the search on leaders algorithm known in the art.
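A simplified sketch of a search-on-leaders style nearest-neighbour step for the 4-dimensional vectors; it ignores the parity and incomplete-class constraints noted above, so it approximates rather than reproduces the full algorithm:

    import numpy as np

    # Leader magnitudes from Table 2, each listed in descending order.
    LEADERS = [
        (0.0, 0.0, 0.0, 0.0),
        (1.0, 0.0, 0.0, 0.0),
        (0.7, 0.0, 0.0, 0.0),
        (1.0, 1.0, 0.0, 0.0),
        (1.0, 1.0, 1.0, 0.0),
        (1.0, 0.7, 0.0, 0.0),
        (1.0, 1.0, 1.0, 1.0),
    ]

    def nearest_leader_codevector(v):
        """Assign each leader's magnitudes to the largest-|v| positions with
        the signs of v, and keep the candidate with least squared error."""
        v = np.asarray(v, dtype=float)
        order = np.argsort(-np.abs(v))          # positions, largest |v| first
        best_q, best_err = None, np.inf
        for leader in LEADERS:
            q = np.zeros_like(v)
            for rank, mag in enumerate(leader):
                if mag:
                    pos = order[rank]
                    q[pos] = np.copysign(mag, v[pos]) if v[pos] else mag
            err = float(np.sum((v - q) ** 2))
            if err < best_err:
                best_q, best_err = q, err
        return best_q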
An example of the indexing of the codevectors (the quantised coefficients) is provided below. In this example a distinction may be made between an index Ipos issued from the positions of the non-null codevector components, and an index IB issued from the signs of the non-null components.

In order to obtain Ipos, the input of the enumeration algorithm may be, for example, a vector z of length n containing exactly M unitary components, at positions corresponding to the positions of the non-null components in the codevector. Additionally, a position vector p = (p0, ..., pM-1) ∈ {0, ..., n-1}^M is created which specifies the exact location of each non-null component. Since there are C(n, M) = n! / (M! (n-M)!) such vectors z, they can be enumerated like binomial coefficients.
The index IB is obtained such that its binary representation of M bits may include a 'zero' bit for each negative valued component and a 'one' bit for each positive valued component.
The index IB is then calculated as IB = sum over i = 1 to M of b_i · 2^(i-1), where b_i is 0 if the i-th non-null component is negative and 1 otherwise.
The final index I of the codevector is calculated as I = Ipos · 2^M + IB.
This example may be applied to the leader vectors 2, 3, 4, 5 and 7. The leader vector 1 describes a single vector. For the leader vector 6, a supplementary offset of 24 may be added to the index of the codevectors for which the value +/-0.7 is before the value +/-1.
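A sketch of the position/sign indexing using the combinatorial number system; the exact enumeration order is garbled in this copy of the description, so the lexicographic ranking below is an assumption, while the combination I = Ipos · 2^M + IB follows the formula above:

    from math import comb

    def position_index(positions, n):
        """Lexicographic rank of M ascending one-positions among C(n, M)."""
        M = len(positions)
        idx, prev = 0, -1
        for k, p in enumerate(positions):
            for j in range(prev + 1, p):         # combinations skipped before p
                idx += comb(n - 1 - j, M - 1 - k)
            prev = p
        return idx

    def sign_index(components):
        """IB: one bit per non-null component, 1 if positive, 0 if negative."""
        return sum((1 if c > 0 else 0) << i for i, c in enumerate(components))

    def codevector_index(codevector):
        n = len(codevector)
        positions = [i for i, c in enumerate(codevector) if c != 0]
        signs = [codevector[i] for i in positions]
        M = len(positions)
        return position_index(positions, n) * (1 << M) + sign_index(signs)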
The quantization of the coefficients and the indexing of the quantized coefficients is shown in figure 3 by step 325.
The difference coder 213, then passes the indexed quantized coefficient values, and may also pass the indexed quantized scaling factors, and any other indicators to the multiplexer 215.
The multiplexer 215 merges the R1 and/or R2 bitstreams with the higher level signals R3, R4, and R5 generated from the difference encoder 273.
The multiplexer 215 outputs a multiplex signal which may then be transmitted or stored.
This multiplexing is shown in figure 3 by step 325.
The difference encoder controller 275 is arranged to control the difference encoder 273, and the difference coder 213 in particular, enabling the difference coder 213 to determine a series of scaling factors to be used on the MDCT coefficients of the difference signal. Embodiments of the invention may use the correlation between the synthesized signal and the difference signal to enable the difference signal to be more optimally processed, and may have the frequency domain coefficients and grouped coefficients more optimally scaled using this correlation between the signals.
The difference encoder controller 275 receives the synthesized signal output from the core encoder 271.
The synthesized signal is passed to a MDCT processor 251. In some embodiments of the invention the difference encoder controller MDCT processor 251 may be the same MDCT processor 211 used by the difference encoder 273.
The MDCT processing of the synthesized signal step is shown in figure 3 by step 313b.
The coefficients generated by the MDCT processor 251 are passed to a synthesized signal spectral processor 253. In some embodiments of the invention the operations of the synthesized signal spectral processor 253 may be performed by the difference coder 213.
The synthesized signal spectral processor 253 groups the coefficients into sub-bands in a manner previously described above with respect to the difference signal transformed coefficients. In a first embodiment of the invention the MDCT processor produces 280 synthesized signal coefficients and the same grouping as shown above in Table 1 may be applied to produce 21 sub-bands.
This grouping step is shown in figure 3 in step 315b.
The coefficients from each of the 21 sub-bands are then processed within the synthesized signal spectral processor 253 so that the root mean squared value of the MDCT synthesized signal coefficients per sub-band is calculated. The root mean square value calculation may take the form of summing the squares of the MDCT synthesised signal coefficients within a sub-band. This sum may then be normalised to the number of MDCT coefficients within the sub-band. The square root of the resulting value may then be taken to give the root mean square value. This calculated root mean square value may be considered to indicate the energy value of the synthesised signal for each sub-band.
This root mean square calculation can be seen in figure 3 in step 317b.
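A sketch of the per-sub-band RMS computation, using (start, length) band descriptors of the kind sketched earlier:

    import numpy as np

    def subband_rms(coeffs, bands):
        """Root mean square of the MDCT coefficients in each sub-band."""
        coeffs = np.asarray(coeffs, dtype=float)
        return [float(np.sqrt(np.mean(coeffs[start:start + length] ** 2)))
                for start, length in bands]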
This energy per sub-band may then be passed to the difference coder 213 in the difference encoder.
The difference coder then uses these energy values to calculate the scaling factors for each sub-band as described above and seen in figure 3 in step 317a. The scaling factor associated with each sub-band is then used to normalise those spectral coefficients lying within the same sub-band of the difference signal.
In some embodiments of the present invention the energy per sub-band of the synthesised audio signal may be represented by the average energy of the coefficients lying within the sub-band. In this embodiment the sum of the square of the spectral coefficients within the synthesised audio signal sub-band is calculated, and the sum of the square value may be optionally normalised to the number of spectral coefficients in the sub-band.
In other embodiments of the present invention the energy per sub-band of the synthesised audio signal may be represented by the average magnitude of the coefficients lying within the sub-band. In this embodiment the sum of the magnitudes of the spectral coefficients within the synthesised audio signal sub-band is calculated, and then this sum may be optionally normalised to the number of spectral coefficients in the sub-band.
In further embodiments of the present invention the synthesised signal spectral processor 253 may locate a local maximum coefficient value within each sub-band, on a per sub-band basis. These values may then be used to provide the sub-band parameter value to the scaling step for each spectral sub-band within the difference signal.
In some embodiments any one of the above parameter values may be combined in any numerical manner to obtain additional scaling steps for each sub-band based on the spectral coefficients of the synthesised audio signal.
With respect to Figure 4, an example of a decoder 400 for the codec is shown. The decoder 400 receives the encoded signal and outputs a reconstructed audio output signal.
The decoder comprises a demultiplexer 401, which receives the encoded signal and outputs a series of data streams. The demultiplexer 401 is connected to a core decoder 471 for passing the lower level bitstreams (R1 and/or R2). The demultiplexer 401 is also connected to a difference decoder 473 for outputting the higher level bitstreams (R3, R4, and/or R5). The core decoder is connected to a synthesized signal decoder 475 to pass a synthesized signal between the two. Furthermore the core decoder 471 is connected to a summing device 413 via a delay element 410, so that the summing device also receives the synthesized signal. The synthesized signal decoder is connected to the difference decoder 473 for passing root mean square values for sub-band coefficients. The difference decoder 473 is also connected to the summing device 413 to pass a difference signal to the summing device. The summing device 413 has an output which is an approximation of the original signal.
With respect to figures 4 and 5, an example of the decoding of an encoded signal to produce an approximation of the original audio signal is shown.
The demultiplexer 401 receives the encoded signal, shown in figure 5 by step 501.
The demultiplexer 401 is further arranged to separate the lower level signals (R1 or/and R2) from the higher level signals (R3, R4, or/and R5). This step is shown in figure 5 in step 503.
The lower level signals are passed to the core decoder 471 and the higher level signals passed to the difference decoder 473.
The core decoder 471, using the core codec 403, receives the low level signal (the core codec encoded parameters) discussed above and performs a decoding of these parameters to produce the same output as the synthesized signal output by the core codec 203 in the encoder 200. The synthesized signal is then up-sampled by the post-processor 405 to produce a synthesized signal similar to the synthesized signal output by the core encoder 271 in the encoder 200. If however the core codec is operating at the same sampling rate as the eventual output signal, then this step is not required. The synthesized signal is passed to the synthesized signal decoder 475 and, via the delay element 410, to the summing device 413.
The generation of the synthesized signal step is shown in figure 5 by step 505c.
The synthesized signal decoder 475 receives the synthesized signal. The synthesized signal is processed in order to generate a series of energy per sub-band values (or other correlation factor) using the same process described above. Thus the synthesized signal is passed to a MDCT processor 407. The MDCT step is shown in figure 5 in step 509. The MDCT coefficients of the synthesized signal are then grouped in the synthesized signal spectral processor 408 into sub-bands (using the predefined sub-band groupings, such as shown in Table 1). The grouping step is shown in figure 5 by step 513. The synthesized signal spectral processor 408 may calculate the root mean square value of the coefficients to produce an energy per sub-band value (in a manner shown above) which may be passed to the difference decoder 473. The calculation of the values is shown in figure 5 by step 515. As will be appreciated, in embodiments where different values are generated within the synthesized signal spectral processor 253 of the encoder 200, the same process is used in the synthesized signal spectral processor 408 of the decoder 400 so that the outputs of the two devices are the same or close approximations to each other.
The difference decoder 473 passes the high level signals to the difference processor 409. The difference processor 409 demultiplexes from the high level signals the received scale factors and the quantized scaled MDCT coefficients. The difference processor then re-indexes the received scale factors and the quantized scaled MDCT coefficients. The re-indexing returns the scale factors and the quantized scaled MDCT coefficients to the order prior to the indexing carried out in steps 323 and 325 with respect to the scale factors and coefficients.
An example of re-indexing, with respect to the indexing process example described above, is shown here. The decoding of the index I consists of the decoding of Ipos and of IB. To recover the position vector p from an index Ipos, the positions may be recovered one at a time: for each non-null component in turn, the candidate position is advanced, and at each advance the binomial coefficient counting the combinations that begin at the current candidate position is subtracted from the remaining value of Ipos, until that count would exceed it; the position at which the search stops becomes the next entry of p.

The vector z may then be recovered by inserting the value 1 at the positions indicated in the vector p and the value 0 at all the other positions.
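A sketch of that recovery, written as the inverse of the lexicographic ranking assumed for the encoder-side indexing (again an assumption where the source text is garbled); it assumes a valid index:

    from math import comb

    def decode_positions(ipos, n, M):
        """Recover the M ascending positions of the unitary components of z."""
        positions = []
        j = 0
        for k in range(M):
            # Skip all combinations whose next position is j while the
            # remaining index still covers their count.
            while comb(n - 1 - j, M - 1 - k) <= ipos:
                ipos -= comb(n - 1 - j, M - 1 - k)
                j += 1
            positions.append(j)
            j += 1
        return positions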
The re-indexing of the coefficient values is shown in figure 5 as step 505a, and the re-indexing of the scaling factors as step 505b.
The difference decoder 473 furthermore re-scales the coefficient values.
Thus, using the re-indexed scale factors determined in step 505b, the inverse of the third scaling process (step 321) is performed.
This sub-band factor re-scaling is shown in figure 5 as step 507.
The difference decoder 473 rescales the coefficients using the predetermined factor - in other words performing the inverse of the second scaling process (step 319).
This pre-determined factor re-scaling is shown in figure 5 as step 511.
The difference decoder 473, having received the RMS values of the sub-bands of the synthesized signal from the synthesized signal decoder 475, uses these values in a manner similar to that described above to generate a series of re-scaling factors to perform the inverse of the first scaling process (step 317a).
This synthesized signal factor re-scaling operation is shown in figure 5 as step 517.
According to embodiments of the present invention only the re-scaling operations corresponding to scaling operations actually performed in the encoder are required on the coefficients. Thus where only the first scaling operation is carried out on the coefficients, only step 517 from steps 507, 511 and 517 is performed. Similarly step 507 or step 511 may not be performed if the optional third or second scaling operation, respectively, is not performed in the coding of the signal.
The difference decoder 473 then outputs the re-scaled and re-indexed MDCT coefficients representing the difference signal. These are passed to an inverse MDCT processor 411 which outputs a time domain sampled version of the difference signal.
This inverse MDCT process is shown in figure 5 as step 519.
The time domain sampled version of the difference signal is then passed from the difference decoder 473 to the summing device 413, which combines it with the delayed synthesized signal received from the core decoder 471 via the digital delay 410 to produce a copy of the original digitally sampled audio signal.
This combination is shown in figure 5 by the step 521.
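The final reconstruction of steps 519 and 521 may be sketched as below; the imdct callable stands in for the inverse MDCT processor 411, and frame windowing, overlap-add and the digital delay 410 alignment are assumed to be handled outside this function:

```python
import numpy as np

def reconstruct_frame(diff_coeffs, delayed_synth_frame, imdct):
    """Inverse transform the difference coefficients (step 519) and sum
    with the time-aligned synthesized frame (step 521, summing
    device 413) to recover the sampled audio frame."""
    diff_time = imdct(np.asarray(diff_coeffs, dtype=float))
    return delayed_synth_frame + diff_time
```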
The above describes a procedure using the example of a VMR audio codec. However, similar principles can be applied to any other speech or audio codec.
In the example of the present invention provided above, the MDCT (and IMDCT) is used to convert the signal from the time domain to the frequency domain (and vice versa). As would be appreciated, any other appropriate time to frequency domain transform with an appropriate inverse transform may be implemented instead. Non-limiting examples of other transforms comprise: a discrete Fourier transform (DFT), a fast Fourier transform (FFT), a discrete cosine transform (DCT-I, DCT-II, DCT-III, DCT-IV, etc.), and a discrete sine transform (DST).
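As a non-limiting sketch of this interchangeability, the pairs below use SciPy's orthonormal DCT and DST implementations, where each inverse exactly undoes its forward transform; substituting the MDCT's lapped structure would additionally require changes to the framing and overlap-add:

```python
from scipy.fft import dct, idct, dst, idst

# Forward/inverse transform pairs; with norm="ortho" each inverse
# reconstructs the input to numerical precision.
TRANSFORM_PAIRS = {
    "dct-ii": (lambda x: dct(x, type=2, norm="ortho"),
               lambda c: idct(c, type=2, norm="ortho")),
    "dct-iv": (lambda x: dct(x, type=4, norm="ortho"),
               lambda c: idct(c, type=4, norm="ortho")),
    "dst-ii": (lambda x: dst(x, type=2, norm="ortho"),
               lambda c: idst(c, type=2, norm="ortho")),
}
```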
The embodiments of the invention described above describe the codec 10 in terms of separate encoder 200 and decoder 400 apparatus in order to assist the understanding of the processes involved. However, it would be appreciated that the apparatus, structures and operations may be implemented as a single encoder-decoder apparatus/structure/operation. Furthermore, in some embodiments of the invention the coder and decoder may share some or all common elements.
For example the core codec 403 and post processor 405 of the decoder may be implemented by using the core coder 203 and post processor 205 of the encoder. The synthesized signal decoder 475 similarly may be implemented by using the synthesized signal encoder 275 of the encoder. Thus circuitry and/or programming objects or code may be reused whenever the same process is operated.
The embodiment shown above provides a more accurate result because of the correlation between the difference and synthesized signals: scaling factors dependent on the synthesized signal, when used to scale the difference signal MDCT coefficients, produce a better quantized result.
The combination of the correlation scaling, the predetermined scaling and the sub-band factor scaling may produce a more accurate result than the prior art scaling processes at no additional signalling cost.
Furthermore, as the synthesized signal is recreated from the low level signals, the scaling factors derived from it are always available to the decoder even if some of the high level signals are not transmitted due to bandwidth capacity constraints. In other words, by using the inherent information in the correlation between the synthesized signal generated from the core codec and the original difference signal, the additional scaling factors featured in the embodiments described are not sent separately (unlike the scale factors that are sent in some embodiments of the invention). Therefore embodiments of the invention may show a higher coding efficiency when compared with systems where multiple sets of scaling factors are transmitted separately, as a higher percentage of the transmitted signal is signal information (either core codec or encoded difference signal) rather than scaling information.
Although the above examples describe embodiments of the invention operating within a codec within an electronic device 610, it would be appreciated that the invention as described above may be implemented as part of any variable rate/adaptive rate audio (or speech) codec where the difference signal (between a synthesized and real audio signal) may be quantized. Thus, for example, embodiments of the invention may be implemented in an audio codec which may implement audio coding over fixed or wired communication paths.
Thus user equipment may comprise an audio codec such as those described in embodiments of the invention above.
It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
Furthermore elements of a public land mobile network (PLMN) may also comprise audio codecs as described above.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may
represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims.
However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Claims
1. An encoder configured to receive an audio signal, wherein the encoder is further configured to: generate a synthesized audio signal and a difference signal; and scale the difference signal dependent on the synthesized audio signal.
2. An encoder as claimed in claim 1, comprising: a first encoder configured to receive the audio signal and output the synthesized audio signal dependent on the audio signal; a difference circuit configured to receive the audio signal and the synthesized audio signal and subtract one of the audio signal and the synthesized audio signal from the other to output the difference signal; and a second encoder configured to receive the difference signal and the synthesized audio signal, and to encode the scaled difference signal to generate a scaled encoded signal.
3. An encoder as claimed in claim 2, wherein the second encoder is configured to scale the difference signal dependent on the synthesized audio signal by generating at least one scaling factor dependent on the synthesized audio signal.
4. An encoder as claimed in claim 3, wherein the second encoder is configured to generate at least one scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
5. An encoder as claimed in claim 4, wherein the second encoder is further configured to estimate the at least one parameter.
6. An encoder as claimed in claims 4 and 5, wherein the second encoder is further configured to: perform a time domain to frequency domain transform on the synthesized audio signal; and group the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; wherein the second encoder is configured to estimate at least one parameter for each synthesized audio signal sub-band group.
7. An encoder as claimed in claim 6, wherein the parameter is at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
8. An encoder as claimed in claims 4 to 7, wherein the second encoder is configured to: perform a time domain to frequency domain transform on the difference signal; and group the difference signal frequency coefficients from the transform into at least two difference signal sub-band groups.
9. An encoder as claimed in claim 8 when dependent on claim 6 or 7, wherein the number of synthesized audio signal sub-band groups is equal to the number of difference signal sub-band groups, and the second encoder is configured to scale each difference signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
10. An encoder as claimed in claims 6 to 9, wherein the time domain to frequency domain transformation comprises an orthogonal discrete transform.
11. An encoder as claimed in claim 10, wherein the orthogonal discrete transform comprises a modified discrete cosine transform.
12. An encoder as claimed in claims 1 to 11, wherein the encoder is configured to further scale the difference signal dependent on a predetermined value.
13. An encoder as claimed in claims 1 to 12, wherein the encoder is configured to further scale the difference signal dependent on parameters estimated from the difference signal.
14. A method for encoding an audio signal comprising: receiving an audio signal; generating a synthesized audio signal dependent on the received audio signal; generating a difference signal; and scaling the difference signal dependent on the synthesized audio signal.
15. A method as claimed in claim 14, wherein the generating a difference signal comprises: subtracting one of the audio signal and the synthesized audio signal from the other to generate the difference signal; and generating comprises encoding the scaled difference signal to generate a scaled encoded signal.
16. A method as claimed in claims 14 and 15, wherein scaling the difference signal comprises generating at least one scaling factor dependent on the synthesized audio signal.
17. A method as claimed in claim 16, wherein generating at least one scaling factor comprises generating a scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
18. A method as claimed in claim 17, wherein generating further comprises estimating the at least one parameter.
19. A method as claimed in claims 17 and 18, wherein estimating comprises: performing a time domain to frequency domain transform on the synthesized audio signal; and grouping the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; and estimating at least one parameter for each synthesized audio signal sub-band group.
20. A method as claimed in claim 19, wherein estimating at least one parameter for each synthesized audio signal sub-band group comprises estimating at least one of: a root mean squared value of the synthesized audio signal frequency coefficients per synthesized signal sub-band group; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
21. A method as claimed in claims 17 to 20, wherein encoding comprises: performing a time domain to frequency domain transform on the difference signal; and grouping the difference signal frequency coefficients from the transform into at least two difference signal sub-band groups.
22. A method as claimed in claim 21 when dependent on claim 19 or 20, wherein the number of synthesized audio signal sub-band groups is equal to the number of difference signal sub-band groups, and scaling comprises scaling each difference signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
23. A method as claimed in claims 19 to 22, wherein the time domain to frequency domain transformation comprises an orthogonal discrete transform.
24. A method as claimed in claim 23, wherein the orthogonal discrete transform comprises a modified discrete cosine transform.
25. A method as claimed in claims 14 to 24, further comprising scaling the difference signal dependent on a predetermined value.
26. A method as claimed in claims 14 to 25, further comprising scaling the difference signal dependent on parameters estimated from the difference signal.
27. A decoder configured to receive an encoded signal and output an estimate of an audio signal, wherein the decoder is further configured to: generate a synthesized audio signal; and scale the encoded signal dependent on the synthesized audio signal.
28. A decoder as claimed in claim 27, comprising: a first decoder configured to receive at least part of the encoded signal and output the synthesized audio signal dependent on the encoded signal; a second decoder configured to receive a further part of the encoded signal and the synthesized audio signal, and to scale the further part of the encoded signal dependent on the synthesized audio signal.
29. A decoder as claimed in claims 27 and 28, wherein the second decoder is configured to scale the encoded signal dependent on the synthesized audio signal by generating at least one scaling factor dependent on the synthesized audio signal.
30. A decoder as claimed in claim 29, wherein the second decoder is configured to generate at least one scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
31. A decoder as claimed in claim 30, wherein the second decoder is further configured to estimate the at least one parameter.
32. A decoder as claimed in claims 30 and 31, wherein the second decoder is further configured to: perform a time domain to frequency domain transform on the synthesized audio signal; and group the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; wherein the second decoder is configured to estimate at least one parameter for each synthesized audio signal sub-band group.
33. A decoder as claimed in claim 32, wherein the parameter is at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
34. A decoder as claimed in claims 30 to 33, wherein the second decoder is configured to: group the encoded signal frequency coefficients from the transform into at least two encoded signal sub-band groups.
35. A decoder as claimed in claim 34 when dependent on claim 32 or 33, wherein the number of synthesized audio signal sub-band groups is equal to the number of encoded signal sub-band groups, and the second decoder is configured to scale each encoded signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
36. A decoder as claimed in claims 31 to 35, wherein the time domain to frequency domain transformation comprises an orthogonal discrete transform.
37. A decoder as claimed in claim 36, wherein the orthogonal discrete transform comprises a modified discrete cosine transform.
38. A decoder as claimed in claims 27 to 37, wherein the decoder is configured to further scale the encoded signal dependent on a predetermined value.
39. A decoder as claimed in claims 27 to 38, wherein the decoder is configured to further scale the encoded signal dependent on parameters estimated from the encoded signal.
40. A method for decoding an audio signal comprising: receiving an encoded signal; generating a synthesized audio signal dependent on the received encoded signal; and scaling the encoded signal dependent on the synthesized audio signal.
41. A method as claimed in claim 40, wherein scaling the encoded signal comprises generating at least one scaling factor dependent on the synthesized audio signal.
42. A method as claimed in claim 41, wherein generating at least one scaling factor comprises generating a scaling factor dependent on at least one estimated parameter dependent on the synthesized audio signal.
43. A method as claimed in claim 42, wherein generating at least one scaling factor further comprises estimating the at least one parameter.
44. A method as claimed in claims 42 and 43, wherein estimating comprises: performing a time domain to frequency domain transform on the synthesized audio signal; and grouping the synthesized audio signal frequency coefficients from the transform into at least two synthesized audio signal sub-band groups; and estimating at least one parameter for each synthesized audio signal sub-band group.
45. A method as claimed in claim 44, wherein estimating at least one parameter for each synthesized audio signal sub-band group comprises estimating at least one of: a root mean squared value of the synthesized audio signal frequency coefficients; an average energy value of the synthesised audio signal frequency coefficients; an average magnitude value of the synthesised audio signal frequency coefficients; and a maximum magnitude value selected from a set of synthesised audio signal frequency coefficients, wherein the set members are synthesised audio signal frequency coefficients in a sub-band group.
46. A method as claimed in claims 42 to 45, further comprising: grouping the encoded signal frequency coefficients from the transform into at least two encoded signal sub-band groups.
47. A method as claimed in claim 46 when dependent on claim 44 or 45, wherein the number of synthesized audio signal sub-band groups is equal to the number of encoded signal sub-band groups, and scaling comprises scaling each encoded signal sub-band group by the scaling factor generated from a parameter from a corresponding synthesized audio signal sub-band group.
48. A method as claimed in claim 44, wherein the time domain to frequency domain transformation comprises an orthogonal discrete transform.
49. A method as claimed in claim 48, wherein the orthogonal discrete transform comprises a modified discrete cosine transform.
50. A method as claimed in claims 40 to 49, further comprising scaling the encoded signal dependent on a predetermined value.
51. A method as claimed in claims 40 to 50, further comprising scaling the encoded signal dependent on parameters estimated from the encoded signal.
52. An apparatus comprising an encoder as claimed in claims 1 to 13.
53. An apparatus comprising a decoder as claimed in claims 27 to 39.
54. An electronic device comprising an encoder as claimed in claims 1 to 13.
55. An electronic device comprising a decoder as claimed in claims 27 to 39.
56. A computer program product configured to perform a method for encoding an audio signal comprising: receiving an audio signal; generating a synthesized audio signal dependent on the received audio signal; generating a difference signal; and scaling the difference signal dependent on the synthesized audio signal.
57. A computer program product configured to perform a method for decoding an audio signal, comprising: receiving an encoded signal; generating a synthesized audio signal dependent on the received encoded signal; and scaling the encoded signal dependent on the synthesized audio signal.
58. An encoder configured to receive an audio signal and output a scaled encoded signal, wherein the encoder comprises: means for generating a synthesized audio signal; means for generating a difference signal; and means for scaling the difference signal dependent on the synthesized audio signal.
59. An encoder as claimed in claim 58, further comprising: means for outputting a difference signal comprising means for subtracting one of the audio signal and the synthesized audio signal from the other; and wherein the means for generating the encoded signal comprises: means for receiving the difference signal and the synthesized audio signal; and means for encoding the scaled difference signal to generate the encoded signal.
60. A decoder configured to receive an encoded signal and output an estimate of an audio signal, comprising: means for generating a synthesized audio signal; and means for scaling the encoded signal dependent on the synthesized audio signal.
61. A decoder as claimed in claim 60, comprising means for generating the synthesized audio signal dependent on at least part of the encoded signal.