HK1135791B

HK1135791B - Audio decoding

Info

Publication number: HK1135791B
Application number: HK10100423.5A
Authority: HK
Inventors: Lars F. Villemoes; Erik G. P. Schuijers
Original assignee: Koninklijke Philips Electronics N.V.; Dolby International Ab
Priority date: 2006-03-29
Filing date: 2007-03-23
Publication date: 2012-11-02

Description

Audio decoding

The present invention relates to audio decoding and in particular, but not exclusively, to MPEG surround signal decoding.

Digital encoding of various source signals has become increasingly important in the last decade as digital signal representation and communication have increasingly replaced analog representation and communication. For example, the distribution of media content, such as video and music, is increasingly based on digital content encoding.

Furthermore, the last decade has seen a trend towards multi-channel audio, in particular spatial audio extending beyond conventional stereo signals. For example, traditional stereo recordings include only two channels, whereas modern advanced audio systems typically use five or six channels, as is the case with popular 5.1 surround sound systems. This provides a more involving listening experience in which the user can be surrounded by sound sources.

Various techniques and standards have been developed to deliver such multi-channel signals. For example, six discrete channels representing a 5.1 surround system may be transmitted in accordance with a standard such as Advanced Audio Coding (AAC) or Dolby Digital.

However, in order to provide backward compatibility, it is well known to down-mix a larger number of channels to a smaller number of channels. In particular, a 5.1 surround sound signal is often down-mixed to a stereo signal, which allows a conventional (stereo) decoder to reproduce the stereo signal and a surround sound decoder to reproduce the 5.1 signal.

An example of this is the MPEG2 backward compatible encoding method. The multi-channel signal is down-mixed to a stereo signal. In the auxiliary data part the additional signal will be encoded as multi-channel data, thereby allowing an MPEG2 multi-channel decoder to generate a multi-channel signal representation. The MPEG1 decoder discards the auxiliary data and thus decodes only the stereo down-mix signal. The main disadvantage of this encoding method applied in MPEG2 is that the additional data rate required for the additional signal is in the same order of magnitude as the data rate required for the stereo signal. Thus, the additional bit rate for expanding stereo to multi-channel audio will be large.

Other existing methods for backward compatible multi-channel transmission without additional multi-channel information can be generally characterized as matrixed surround methods. Examples of matrix surround methods include methods such as Dolby Prologic II and Logic-7. The common principle of these methods is that they perform matrix multiplication on multiple channels of an input signal with an appropriate matrix, thereby producing an output signal with a smaller number of channels. In particular, prior to mixing the surround channels with the front and center channels, the matrix encoder typically applies a phase shift process to the surround channels.

Another reason for performing channel conversion is coding efficiency. For example, it has been found that surround sound audio signals can be encoded as stereo audio signals combined with a parametric bitstream describing the spatial properties of the audio signals. The decoder can then reproduce the audio signals with a very satisfactory accuracy. This can provide significant bit rate savings.

We can use several parameters to describe the spatial properties of an audio signal. One of these parameters is the inter-channel cross-correlation, e.g. the cross-correlation between the left and right channel of a stereo signal. Another parameter is the channel power ratio. In so-called (parametric) spatial audio coders, such as MPEG surround coders, these and other parameters are extracted from the original audio signal, resulting in a reduced number of channels, e.g. an audio signal with only a single channel, and a set of parameters describing the spatial properties of the original audio signal. In so-called (parametric) spatial audio decoders the spatial properties described by the transmitted spatial parameters are recovered.
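As a rough illustration of the two parameters named above, the following Python sketch computes a channel power ratio (expressed in dB) and a normalized inter-channel cross-correlation for one block of stereo samples. The function name, the block-wise time-domain computation and the dB scaling are assumptions made for this example and are not taken from the MPEG surround standard; the sketch also assumes that both channels have non-zero power.

```python
import math

def spatial_parameters(left, right):
    """Return (channel power ratio in dB, normalized cross-correlation).

    Illustrative helper only; assumes both channels have non-zero power.
    """
    p_left = sum(x * x for x in left)
    p_right = sum(x * x for x in right)
    cross = sum(l * r for l, r in zip(left, right))
    level_db = 10.0 * math.log10(p_left / p_right)
    icc = cross / math.sqrt(p_left * p_right)
    return level_db, icc

# Made-up test block: the right channel is the left channel at half amplitude.
left = [1.0, -2.0, 3.0, -4.0]
right = [0.5, -1.0, 1.5, -2.0]
level_db, icc = spatial_parameters(left, right)
```

For this block the two channels are fully correlated and differ only in level, so the cross-correlation is 1 and the power ratio is about 6 dB.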

Preferably, such spatial audio coding uses a cascaded or tree-based hierarchical structure containing standard units in the encoder and decoder. In the encoder these standard units may be down-mixers, e.g. 2-1, 3-2, etc. down-mixers, which down-mix the channels into a reduced number of channels, while in the decoder the corresponding standard units may be up-mixers, e.g. 1-2, 2-3 up-mixers, which separate the channels into a larger number of channels.

Fig. 1 depicts an example of an encoder for encoding a multi-channel audio signal in accordance with the method currently set by the MPEG standard and known as MPEG surround. MPEG surround systems encode a multi-channel signal into a mono or stereo down-mix signal accompanied by a set of parameters. This down-mix signal may be encoded by a conventional audio encoder, such as an MP3 or AAC encoder. These parameters then represent the spatial image of the multi-channel audio signal and can be encoded and embedded in a way that is backward compatible with conventional audio streams.

At the decoder side, the core bit stream is decoded first, as a result of which a mono or stereo down-mix signal is generated. Conventional decoders, i.e. decoders that do not support MPEG surround decoding, can still decode this down-mix signal. However, if an MPEG surround decoder is available, the spatial parameters are restored, thereby producing a multi-channel representation that is perceptually close to the original multi-channel input signal. An example of an MPEG surround decoder is depicted in fig. 2.

Apart from the basic spatial encoding/decoding process shown in fig. 1 and 2, the MPEG surround system provides a rich feature set that allows implementation in a large number of application domains. One of the most prominent features is called matrix compatibility or matrixed-surround compatibility.

Examples of conventional matrix surround systems are Dolby ProLogic I and II and CircleSurround. These systems operate in the manner shown in fig. 3. The multi-channel PCM input signal is transformed into a so-called matrixed down-mix signal, wherein the transformation is typically performed using a 5(.1)-to-2 matrix. The idea behind matrix surround systems is that, in the stereo down-mix signal, the front and surround (back) channels are mixed in-phase and out-of-phase, respectively. To some extent, this allows an inversion to be performed at the decoder side, thereby enabling multi-channel reconstruction.

In matrix surround systems, the stereo signal may be transmitted using conventional channels dedicated for stereo transmission. Thus, similar to the MPEG surround system, the matrix surround system also provides a backward compatibility. However, since stereo down-mix signals have certain phase properties due to matrix-mix encoding, these signals typically do not have a high sound quality when listened to as stereo signals from loudspeakers or headphones.

In a matrix surround decoder, a multi-channel PCM output signal is generated by applying an M-to-N matrix (where, for example, M is 2 and N is 5(.1)). However, in general, an N-to-M matrix (N > M) is not invertible. Matrix surround systems are therefore usually unable to reconstruct the original multi-channel PCM input signal accurately, and the reconstructed signal often exhibits significant artifacts.
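The non-invertibility can be seen in a small Python sketch: with any 5-to-2 matrix, different multi-channel inputs can produce exactly the same stereo down-mix, so no decoder matrix can tell them apart. The coefficients and the channel order below are hypothetical values chosen for easy arithmetic; they are not the coefficients of any particular matrix surround product.

```python
# Hypothetical 5-to-2 down-mix coefficients, chosen for easy arithmetic.
# Channel order: L, R, C, Ls, Rs; the surround channels enter with opposite signs.
DOWNMIX = [
    [1.0, 0.0, 0.5, 0.5, 0.0],    # left output
    [0.0, 1.0, 0.5, 0.0, -0.5],   # right output
]

def downmix(channels):
    """Apply the 2 x 5 down-mix matrix to one 5-channel sample."""
    return [sum(c * x for c, x in zip(row, channels)) for row in DOWNMIX]

front_only = [1.0, 1.0, 0.0, 0.0, 0.0]    # signal in L and R only
center_only = [0.0, 0.0, 2.0, 0.0, 0.0]   # signal in C only

# Two different inputs collapse onto the same stereo pair.
same_stereo = downmix(front_only) == downmix(center_only)
```

Both inputs map to the same stereo sample, which is why the text above notes that accurate reconstruction is generally not possible.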

In contrast to such conventional matrix surround systems, matrix surround compatibility in MPEG surround is achieved by applying a 2 x 2 matrix to complex sample values in the frequency subbands of the MPEG surround encoder after MPEG surround encoding. An example of such an encoder is depicted in fig. 4. Typically, the 2 x 2 matrix is a complex-valued matrix whose coefficients depend on the spatial parameters. The spatial parameters in the system are time- and frequency-variant, whereby this 2 x 2 matrix is also time- and frequency-variant. Accordingly, the complex matrix operations are typically applied per time/frequency tile.

Applying matrix surround compatibility in the MPEG surround encoder allows the resulting stereo signal to be compatible with the signals generated by conventional matrix surround encoders, e.g. Dolby Pro-Logic™. This would allow a conventional decoder to decode the surround signal. Furthermore, the operation of matrix surround compatibility can be reversed in a compatible MPEG surround decoder, thereby allowing the generation of high quality multi-channel signals.

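The matrix compatibility encoding matrix may be described as follows:

$$\begin{bmatrix} L_{MTX} \\ R_{MTX} \end{bmatrix} = H \begin{bmatrix} L \\ R \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} L \\ R \end{bmatrix}$$

where L, R is the conventional MPEG stereo down-mix signal, L_MTX, R_MTX is the matrix surround encoded down-mix signal, and h_xy are complex coefficients determined in response to the multi-channel parameters.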

The main advantage of providing a matrix-compatible stereo signal by means of a 2 x 2 matrix is that such matrices are invertible. Thus, the MPEG surround decoder can achieve the same output audio quality regardless of whether a matrix-compatible stereo downmix signal is used at the encoder or not. An example of a compatible MPEG surround decoder is depicted in fig. 5.

Thus, in a conventional MPEG surround decoder, the inverse process at the decoder end can be determined as follows:

$$\begin{bmatrix} L \\ R \end{bmatrix} = H^{-1} \begin{bmatrix} L_{MTX} \\ R_{MTX} \end{bmatrix}$$

Thus, since H is invertible, the operation of the matrix compatibility encoder can be reversed.
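The encoding by H and its exact reversal can be sketched in a few lines of Python for a single subband sample. The coefficient values and the sample values are hypothetical; in the actual system the coefficients h_xy are derived from the spatial parameters and vary over time and frequency.

```python
def apply2(m, v):
    """Multiply a 2 x 2 matrix by a 2-element column vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse2(m):
    """Invert a 2 x 2 matrix; raises ValueError if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical complex coefficients h_xy for one time/frequency tile.
H = [[1.0 + 0.0j, 0.0 + 0.3j],
     [0.0 - 0.3j, 1.0 + 0.0j]]

downmix = [0.8 + 0.1j, -0.2 + 0.5j]       # L, R in one complex subband
encoded = apply2(H, downmix)              # L_MTX, R_MTX
decoded = apply2(inverse2(H), encoded)    # recovered L, R

# Largest deviation between the recovered and the original samples.
error = max(abs(a - b) for a, b in zip(decoded, downmix))
```

With complex-valued processing on both sides, the recovered samples match the original down-mix up to floating-point rounding.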

In the MPEG surround system, processing including matrix compatibility operations is performed in the frequency domain. More specifically, a so-called complex-exponential-modulated Quadrature Mirror Filter (QMF) bank is used in order to divide the frequency axis into a plurality of bands.

In many respects, such a QMF bank may be regarded as equivalent to an overlap-add Discrete Fourier Transform (DFT) bank, or to its efficient counterpart, the Fast Fourier Transform (FFT). The QMF bank and DFT bank share the following desirable properties for signal processing:

The frequency domain representation is oversampled. Due to this property, processing such as equalization (scaling of individual bands) can be applied without introducing aliasing distortion. Critically sampled representations, such as the well-known Modified Discrete Cosine Transform (MDCT) used in AAC, do not have this property. Thus, time-varying and frequency-varying modifications of MDCT coefficients applied prior to synthesis will introduce aliasing, which in turn may cause audible artifacts in the output signal.

The values of the frequency domain representation are complex. Compared to real-valued representations, complex-valued representations allow the phase of the signal to be modified in a simple manner.
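The second property can be illustrated with a minimal Python sketch: on a complex-valued subband sample, a phase modification is a single multiplication by a unit-magnitude complex number. The helper name and the sample value are assumptions made for the example.

```python
import cmath
import math

def rotate_phase(sample, phi):
    """Rotate the phase of a complex subband sample by phi radians."""
    return sample * cmath.exp(1j * phi)

s = 1.0 + 1.0j                      # made-up complex subband sample
r = rotate_phase(s, math.pi / 2)    # advance the phase by 90 degrees

magnitude_change = abs(abs(r) - abs(s))
phase_change = cmath.phase(r) - cmath.phase(s)
```

The magnitude is unchanged and only the phase moves. A real-valued sample carries no imaginary part to rotate into, so the same operation has no direct one-multiplication equivalent there.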

Although such oversampled complex-valued representations offer many signal processing advantages over critically sampled real-valued representations, a significant disadvantage is their computational complexity. A major part of the complexity of an MPEG surround decoder is due to the QMF analysis and synthesis filter banks and the corresponding processing of complex-valued signals.

Accordingly, it has been proposed to perform part of the processing in the real-valued domain for so-called low-power (LP) decoders. For this purpose, the complex modulated filter bank is replaced by a real-valued cosine modulated filter bank, followed by a partial extension to the complex-valued domain for the lower frequency bands. Such a filter bank is depicted in fig. 6.

In the conventional mode of operation, the MPEG surround decoder applies real-valued processing to complex-valued subband domain samples. For LP, the decoder applies this processing to real-valued subband domain samples. However, the matrix compatibility feature involves phase rotations in the decoder in order to recover the original stereo down-mix signal in the frequency domain. These phase rotations are performed by means of complex-valued processing. In other words, the matrix compatibility decoding matrix H^-1 is itself complex-valued, thereby introducing the required phase rotation. Accordingly, in this system, the matrix surround compatibility operation cannot be inverted in the real-valued part of the LP frequency domain representation, which results in a degradation of the decoding quality.

It would therefore be advantageous to have an improved audio decoding process.

Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages singly or in any combination.

According to a first aspect of the present invention, there is provided an audio decoder comprising: means for receiving input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, thereby having complex valued subband coding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal; means for generating frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands; determining means for determining a real-valued subband decoding matrix for compensating an encoding matrix application in response to parametric multi-channel data; means for generating downmix data corresponding to the downmix signal by performing a matrix multiplication on the real-valued subband decoding matrix and data of the N-channel signal in at least some of the real-valued frequency subbands.

The invention may provide an improved and/or facilitated decoding process. In particular, the present invention can greatly reduce complexity while achieving high audio quality. For example, the invention allows the effect of complex-valued subband matrix multiplication to be at least partially reversed at the decoder using real-valued frequency subbands.

As a specific example, the present invention may allow the MPEG matrix-compatible encoding process to be partially reversed in an MPEG surround decoder using real-valued frequency subbands.

The decoder may comprise means for generating a down-mix signal in response to the down-mix data and may further comprise means for generating an M-channel audio signal in response to the down-mix data and the parametric multi-channel data. In such embodiments, the present invention may generate an accurate multi-channel audio signal based at least in part on real-valued frequency subbands.

A different decoding matrix may be determined for each frequency sub-band.

According to an optional feature of the invention, the determining means is arranged to determine an inverse matrix of the complex-valued subband of the encoding matrix, and to determine the decoding matrix in response to the inverse matrix.

This may provide a particularly efficient implementation and/or an improved decoding quality.

According to an optional feature of the invention, the determining means is arranged to determine each real valued matrix coefficient of the decoding matrix in response to an absolute value of a respective matrix coefficient of the inverse matrix.

This may provide a particularly efficient implementation and/or an improved decoding quality. Each real-valued matrix coefficient of the decoding matrix may be determined only in response to the absolute value of the corresponding matrix coefficient in the inverse matrix, without taking into account any other matrix coefficients. The corresponding matrix coefficients may be matrix coefficients of the same position in the inverse matrix for the same frequency sub-band.

According to an optional feature of the invention, the determining means is arranged to determine each real valued matrix coefficient to be substantially the absolute value of the corresponding matrix coefficient in the inverse matrix.
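The optional feature above, in which each real-valued coefficient is the absolute value of the corresponding coefficient of the inverse matrix, can be sketched in Python as follows. The encoding matrix values are hypothetical, and the function names are introduced only for this illustration.

```python
def inverse2(m):
    """Invert a 2 x 2 matrix; raises ValueError if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def real_decoding_matrix(h):
    """Real-valued decoding matrix: element-wise magnitude of the inverse of h."""
    q = inverse2(h)
    return [[abs(c) for c in row] for row in q]

# Hypothetical complex-valued encoding matrix for one subband.
H = [[1.0 + 0.0j, 0.0 + 0.3j],
     [0.0 - 0.3j, 1.0 + 0.0j]]

G = real_decoding_matrix(H)
```

Each entry of G depends only on the coefficient at the same position of the inverse matrix, and all entries are plain real numbers, so G can be applied to real-valued subband samples.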

According to an optional feature of the invention, the determining means is arranged to determine the decoding matrix in response to a subband transformation matrix, wherein the subband transformation matrix is a product of the respective decoding matrix and the encoding matrix.

This may provide a particularly efficient implementation and/or an improved decoding quality. The corresponding decoding and encoding matrices may be encoding and decoding matrices for the same frequency sub-band. In particular, the determining means may be arranged to select the coefficient values of the decoding matrix such that the transformation matrix has the desired properties.

According to an optional feature of the invention, the determining means is arranged to determine the decoding matrix only in response to an amplitude measure of the transform matrix.

This may provide a particularly efficient implementation and/or an improved decoding quality. In particular, the determining means may be arranged to ignore the phase metric when determining the decoding matrix. This may reduce complexity while keeping perceptual audio quality degradation low.

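According to an optional feature of the invention, the transform matrix for each sub-band is given by:

$$P = G \cdot H = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}$$

wherein G is the subband decoding matrix, H is the subband encoding matrix, and the determining means is arranged to select the matrix coefficients $g_{11}$, $g_{12}$, $g_{21}$, $g_{22}$ such that a power measure of $p_{12}$ and $p_{21}$ satisfies a criterion.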

This may provide a particularly efficient implementation and/or an improved decoding quality. By selecting a decoding matrix, a power metric below a certain threshold (which may be determined in response to a constraint or other parameter) may be generated, or the decoding matrix may be selected as the decoding matrix that generates the smallest power metric, for example.

According to an optional feature of the invention, the amplitude measure is determined in response to $|p_{12}|^2 + |p_{21}|^2$.

According to an optional feature of the invention, the determining means is further arranged to select the matrix coefficients under the constraint that the magnitudes of $p_{11}$ and $p_{22}$ are substantially equal to 1.
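The selection criterion can be made concrete with a small Python sketch that forms the product of a real-valued decoding matrix G and a complex-valued encoding matrix H and evaluates the off-diagonal (cross-talk) terms. The sketch assumes that the power measure is the sum of the squared magnitudes of the two off-diagonal coefficients; the matrix values, the candidate G and the function names are hypothetical and serve only as an illustration.

```python
def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def crosstalk_power(g, h):
    """Assumed power measure: squared magnitudes of the off-diagonal terms of g*h."""
    p = matmul2(g, h)
    return abs(p[0][1]) ** 2 + abs(p[1][0]) ** 2

def diagonal_magnitudes(g, h):
    """Magnitudes of the diagonal terms of g*h (ideally close to 1)."""
    p = matmul2(g, h)
    return abs(p[0][0]), abs(p[1][1])

# Hypothetical complex-valued encoding matrix for one subband.
H = [[1.0 + 0.0j, 0.2 + 0.3j],
     [0.2 - 0.3j, 1.0 + 0.0j]]

G_identity = [[1.0, 0.0], [0.0, 1.0]]      # no compensation at all
G_candidate = [[1.0, -0.2], [-0.2, 1.0]]   # real-valued candidate

power_identity = crosstalk_power(G_identity, H)
power_candidate = crosstalk_power(G_candidate, H)
diag_candidate = diagonal_magnitudes(G_candidate, H)
```

In this example the real-valued candidate cancels the real part of the cross-talk terms but not their imaginary part, so the measure drops from about 0.26 to about 0.18 rather than to zero, while the diagonal magnitudes stay close to 1. A determining means could compare such values across candidate matrices, or against a threshold, when selecting the coefficients.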

According to an optional feature of the invention, the downmix signal and the parametric multi-channel data are in accordance with an MPEG surround standard.

The invention may provide a particularly efficient, low complexity and/or improved audio quality decoding process for MPEG surround compatible signals.

According to an optional feature of the invention, the encoding matrix is an MPEG matrix surround compatibility encoding matrix and the first N channel signal is an MPEG matrix surround compatibility signal.

The invention may provide a particularly efficient, low complexity and/or improved audio quality, and in particular may provide a low complexity decoding process to compensate for MPEG matrix surround compatibility operations performed at the encoder.

According to another aspect of the present invention, there is provided an audio decoding method including: receiving input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, thereby having complex valued subband encoding matrices applied to frequency subbands and parametric multi-channel data associated with the down-mix signal; generating frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands; determining a real-valued subband decoding matrix for compensating an application of an encoding matrix in response to the parametric multi-channel data; and generating downmix data corresponding to the downmix signal by performing matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

According to another aspect of the present invention, there is provided a receiver for receiving an N-channel signal, the receiver comprising: means for receiving input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, thereby having complex valued subband coding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal; means for generating frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands; determining means for determining a real-valued subband decoding matrix for compensating an encoding matrix application in response to parametric multi-channel data; means for generating downmix data corresponding to the downmix signal by performing a matrix multiplication on the real-valued subband decoding matrix and the N-channel signal data in at least some of the real-valued frequency subbands.

According to another aspect of the present invention, there is provided a transmission system for transmitting an audio signal, the transmission system comprising: a transmitter, wherein the transmitter comprises: means for generating an N-channel down-mix signal for an M-channel audio signal, where M > N, means for generating parametric multi-channel data associated with the down-mix signal, means for generating a first N-channel signal by applying a complex valued subband coding matrix to the N-channel down-mix signal in frequency subbands, means for generating a second N-channel signal, where the second N-channel signal comprises the first N-channel signal and the parametric multi-channel data, and means for transmitting the second N-channel signal to a receiver; and a receiver, wherein the receiver comprises: means for receiving a second N-channel signal, means for generating frequency subbands for the first N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands, means for determining a real-valued subband decoding matrix for compensating an application of an encoding matrix in response to the parametric multi-channel data, and means for generating downmix data corresponding to the N-channel downmix signal by performing a matrix multiplication on the real-valued subband decoding matrix and N-channel signal data in at least some of the real-valued frequency subbands.

The second N-channel signal may have additional associated channels, wherein the channels comprise parametric multi-channel data.

According to another aspect of the present invention, there is provided a method for receiving an audio signal from a scalable bitstream, the method comprising: receiving input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, and M > N, thereby having complex valued subband coding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal; generating frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands; determining a real-valued subband decoding matrix for compensating an application of an encoding matrix in response to the parametric multi-channel data; and generating downmix data corresponding to the downmix signal by performing matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

According to another aspect of the present invention, there is provided a method for transmitting and receiving an audio signal, the method including: performing the following steps at the transmitter: generating an N-channel down-mixed signal of an M-channel audio signal, wherein M > N, generating parametric multi-channel data associated with the down-mixed signal, generating a first N-channel signal by applying a complex valued subband coding matrix to the N-channel down-mixed signal in frequency subbands, generating a second N-channel signal comprising the first N-channel signal and the parametric multi-channel data, and transmitting the second N-channel signal to a receiver; and performing the following steps at the receiver: receiving a second N channel signal; generating frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands; determining a real-valued subband decoding matrix for compensating an application of an encoding matrix in response to the parametric multi-channel data; and generating downmix data corresponding to the downmix signal by performing matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to one or more embodiments described hereinafter.

Embodiments of the invention will now be described by way of example with reference to the accompanying drawings, in which:

FIG. 1 illustrates an example of an encoder for encoding a multi-channel audio signal according to the prior art;

FIG. 2 illustrates an example of a decoder for decoding a multi-channel audio signal according to the prior art;

FIG. 3 illustrates an example of a matrix surround encoding/decoding system in accordance with the prior art;

FIG. 4 illustrates an example of an encoder for encoding a multi-channel audio signal in accordance with the prior art;

FIG. 5 illustrates an example of a decoder for decoding a multi-channel audio signal in accordance with the prior art;

FIG. 6 illustrates an example of a filter bank for generating complex and real-valued frequency subbands;

FIG. 7 illustrates a transmission system for communicating audio signals in accordance with some embodiments of the present invention;

FIG. 8 illustrates a decoder according to some embodiments of the invention;

FIGS. 9-14 illustrate performance characteristics of a decoder according to some embodiments of the present invention; and

FIG. 15 illustrates a decoding method according to some embodiments of the invention.

The following description focuses on embodiments of the present invention applicable to a decoder for decoding an MPEG surround encoded signal including a matrix surround compatibility encoding. It will be appreciated that the invention is not limited to this application but may be applied to numerous other coding standards.

Fig. 7 depicts a transmission system 700 for communicating audio signals in accordance with some embodiments of the present invention. The transmission system 700 comprises a transmitter 701 coupled to a receiver 703 via a network 705, which may in particular be the internet.

In the specific example, the transmitter 701 is a signal recording device and the receiver 703 is a signal player device, but it should be appreciated that in other embodiments, the transmitter and receiver may be used in other applications and for other purposes.

In the specific embodiment supporting the signal recording function, the transmitter 701 comprises a digitizer 707 receiving an analog multi-channel signal, which is converted to a digital PCM (pulse code modulation) multi-channel signal by sampling and analog-to-digital conversion.

The digitizer 707 is coupled to an encoder 709 of fig. 1, which encodes the PCM signal according to an MPEG surround coding algorithm that includes functionality for matrix surround compatibility coding. The encoder 709 may be, for example, the prior art encoder of fig. 4. In particular, in this example, the encoder 709 produces an MPEG matrix surround compatible stereo down-mix signal.

Thus, the encoder 709 generates a signal as given below:

$$\begin{bmatrix} L_{MTX} \\ R_{MTX} \end{bmatrix} = H \begin{bmatrix} L \\ R \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} L \\ R \end{bmatrix}$$

where L, R is the conventional MPEG stereo down-mix signal and L_MTX, R_MTX is the matrix surround compatibly encoded down-mix signal output by the encoder 709. In addition, the signal generated by the encoder 709 includes multi-channel parametric data generated by the MPEG surround encoding process. The coefficients h_xy are complex coefficients determined in response to the multi-channel parameters. As will be readily appreciated by those skilled in the art, the processing performed by the encoder 709 is performed using complex operations in complex-valued subbands.

The encoder 709 is coupled to a network transmitter 711, which receives the encoded signal and interfaces with the network 705. The network transmitter 711 may transmit the encoded signal to the receiver 703 over the network 705.

The receiver 703 comprises a network interface 713 which interfaces with the network 705 and is arranged to receive the encoded signal from the transmitter 701.

The network interface 713 is coupled to a decoder 715. The decoder 715 receives the encoded signal and decodes the signal in accordance with a decoding algorithm. In this example, the decoder 715 regenerates the original multi-channel signal. In particular, the decoder 715 first generates a compensated stereo downmix signal corresponding to a downmix signal generated by MPEG surround encoding prior to performing an MPEG matrix surround compatible operation. A decoded multi-channel signal is then generated from this down-mix signal and the received multi-channel parametric data.

In the specific example supporting the signal playing function, the receiver 703 further comprises a signal player 717, which receives the decoded multi-channel audio signal from the decoder 715 and presents it to the user. In particular, the signal player 717 may include a digital-to-analog converter, amplifiers and speakers as required to output the decoded audio signal.

Fig. 8 depicts the decoder 715 in more detail.

The decoder 715 includes a receiver 801 for receiving the signal generated by the encoder 709. As previously mentioned, the signal is a stereo signal corresponding to a down-mix signal that has been processed by multiplying complex sample values in complex-valued frequency subbands by a complex-valued encoding matrix H. Furthermore, the received signal comprises multi-channel parametric data corresponding to the down-mix signal. In particular, the received signal is an MPEG surround encoded signal with matrix surround compatibility processing.

Receiver 801 also provides core decoding processing with respect to the received signal to produce a down-mixed PCM signal.

The receiver 801 is coupled to a parametric data processor 803 which extracts multi-channel parametric data from the received signal.

The receiver 801 is further coupled to a subband filter bank 805 which transforms the received stereo signal to the frequency domain. In particular, the subband filter bank 805 generates a plurality of frequency subbands. At least some of the frequency sub-bands are real-valued frequency sub-bands. In particular, the subband filter bank 805 may correspond to the function shown in fig. 6. Thus, the subband filter bank 805 may generate K complex-valued subbands and M-K real-valued subbands. Real-valued subbands are typically higher frequency subbands, e.g., subbands higher than 2 kHz. By using real-valued subbands, the subband generation process and the operations performed on the samples in these subbands can be greatly facilitated. Thus, in decoder 715, the M-K subbands are processed as real-valued data and operations rather than complex-valued data and operations, which greatly reduces complexity and cost.

The subband filter bank 805 is coupled to a compensation processor 807 that generates downmix data corresponding to the downmix signal. In particular, the compensation processor 807 compensates for the matrix surround compatibility operation by attempting to reverse the multiplication by the encoding matrix H performed in the frequency subbands at the encoder 709. This compensation is performed by multiplying the data values of each subband by the subband decoding matrix G. However, in contrast to the processing at the encoder 709, the matrix multiplication in the real-valued subbands of the decoder 715 is performed only in the real domain. Thus, not only are the sample values real-valued samples, but the matrix coefficients of the decoding matrix G are also real-valued coefficients.

The compensation processor 807 is coupled to a matrix processor 809 which determines the decoding matrices to be applied in the subbands. For the K complex-valued subbands, the decoding matrix G may simply be determined as the inverse of the encoding matrix H in the same subband. However, for the real-valued subbands, the matrix processor 809 determines real-valued matrix coefficients that provide an efficient compensation for the encoding matrix operation.

Thus, the output of the compensation processor 807 corresponds to a sub-band representation of the MPEG surround encoded down-mix signal. Accordingly, the effects of matrix surround compatibility operations may be greatly reduced or eliminated.

The compensation processor 807 is coupled to a synthesis subband filter bank 811 that generates a time domain PCM MPEG surround decoder downmix signal from the subband representation. Thus, in the specific example, the synthesis subband filter bank 811 will form the counterpart of the subband filter bank 805 in the process of back-converting the signal to the time domain.

The output of the synthesis subband filter bank 811 is fed to a multi-channel decoder 813 which is further coupled to the parametric data processor 803. The multi-channel decoder 813 receives the time domain PCM down-mix signal and the multi-channel parametric data and reconstructs the original multi-channel signal.

In this example, the synthesis subband filter bank 811 transforms the subband signals on which the matrix operation has been performed to the time domain. Thus, the multi-channel decoder 813 receives an MPEG surround encoded signal that is comparable to the signal that would have been received if no matrix surround compatibility operation had been applied at the encoder. Thus, the same MPEG multi-channel decoding algorithm can be used for both matrix surround compatible signals and non-matrix surround compatible signals. However, in other embodiments, the multi-channel decoder 813 may operate directly on the subband samples following the compensation by the compensation processor 807. In this case, the synthesis subband filter bank 811 may be omitted, or some functions of the synthesis subband filter bank 811 may be integrated with the multi-channel decoder 813.

Indeed, to reduce complexity, it is generally preferable to remain in the subband domain when providing the compensated signal to the multi-channel decoder 813. The complexity of both the synthesis subband filter bank 811 and the analysis filter bank that would otherwise form part of the multi-channel decoder 813 can then be avoided.

In practice, it is generally preferable not to move back and forth between the frequency and time domains, if possible, because this is computationally expensive. Thus, in a decoder according to some embodiments of the invention, once the signal has been converted into the subband (frequency) domain (by decoding the core bitstream and applying a filter bank to the resulting PCM signal), the matrix surround inversion is applied in the compensation processor 807 (if appropriate, that is, if signaled in the bitstream), and the resulting subband domain signal is then used directly to reconstruct the multi-channel (subband domain) signal. Finally, a synthesis filter bank is applied in order to obtain the time domain multi-channel signal.

Thus, in the system of FIG. 7, the encoder 709 may generate a matrix surround compatible signal, and this signal may be decoded by a conventional matrix surround decoder such as a Dolby Pro Logic™ decoder. Although this entails a distortion of the original MPEG surround encoded downmix signal by the matrix surround compatibility operation, this operation can be effectively removed in an MPEG multi-channel decoder, thereby allowing the parametric data to be used to generate an accurate representation of the original multi-channel signal.

Furthermore, the decoder 715 allows compensation for matrix surround compatibility operations to be performed in real-valued frequency subbands without the need for complex-valued frequency subbands, thereby greatly reducing the complexity of the decoder 715 while also achieving high audio quality.

An example for determining appropriate matrix coefficients for the decoding matrix will be described below.

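The encoder 709 performs a matrix surround compatibility operation by applying the following complex-valued encoding matrix in each subband (it is understood that each subband has a different encoding matrix):

$$\begin{bmatrix} L_{MTX} \\ R_{MTX} \end{bmatrix} = H \begin{bmatrix} L \\ R \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} L \\ R \end{bmatrix}$$

where L, R is the conventional stereo downmix signal and L_MTX, R_MTX is the matrix surround encoded downmix signal. The encoder matrix H is given as follows: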

where w₁ and w₂ depend on the spatial parameters generated by the MPEG surround encoding process. In particular:

where w₁T and w₂T are non-normalized weights, defined as follows:

where CLD_l and CLD_r represent the channel level differences (in dB) for the left front/left surround and right front/right surround channel pairs, respectively. c_1,MTX and c_2,MTX are determined as functions of the prediction coefficients c₁ and c₂, which are used in the decoder to derive the intermediate left L, center C and right R signals from the left L_DMX and right R_DMX downmix signals:

c_1,MTX and c_2,MTX are respectively determined as follows:

where x is {0, 1 }.

Alternatively, the MPEG surround decoder supports a mode in which the coefficients c₁ and c₂ represent the power ratios of left to left mid and right to right mid, respectively. In this case, different functions are applied for c_1,MTX and c_2,MTX.

Thus, for each time/frequency tile, a complex-valued encoding matrix H is applied to the complex sample values. If the front signals dominate the original multi-channel input signal, the weights w₁ and w₂ will be close to zero, and the matrix surround downmix signal is then close to the input stereo downmix signal. If the surround (rear) signals dominate the original multi-channel input signal, the weights w₁ and w₂ will be close to 1, and the matrix surround downmix signal will then contain a highly out-of-phase version of the original stereo downmix signal provided by the MPEG surround encoder.

The main advantage of providing a matrix-compatible stereo signal by means of 2 x 2 matrices is that these matrices are invertible. Thus, the MPEG surround decoder can achieve the same output audio quality regardless of whether or not the encoder uses a matrix-compatible stereo down-mix signal.

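Thus, in an MPEG surround decoder where all frequency subbands are complex-valued subbands (e.g. using a complex-modulated QMF bank), the inverse processing at the decoder end is given by:

$$\begin{bmatrix} L \\ R \end{bmatrix} = H^{-1} \begin{bmatrix} L_{MTX} \\ R_{MTX} \end{bmatrix}$$

where

$$H^{-1} = \frac{1}{N} \begin{bmatrix} h_{22} & -h_{12} \\ -h_{21} & h_{11} \end{bmatrix}$$

and where

$$N = h_{11}h_{22} - h_{12}h_{21}$$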
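As an illustration of the complex-valued case, the following sketch inverts a 2 x 2 complex matrix using the determinant N = h₁₁h₂₂ - h₁₂h₂₁ and forms the product of the inverse with the original matrix, which should be close to the identity matrix. The coefficient values of H used here are hypothetical placeholders chosen only so that the matrix is invertible; they are not the coefficients defined by the encoder, and the function names are likewise assumptions of this sketch.

```python
from typing import Tuple

CMatrix2 = Tuple[Tuple[complex, complex], Tuple[complex, complex]]


def invert_2x2(h: CMatrix2) -> CMatrix2:
    """Closed-form inverse of a 2 x 2 complex matrix."""
    (h11, h12), (h21, h22) = h
    n = h11 * h22 - h12 * h21  # determinant N
    if n == 0:
        raise ValueError("encoding matrix is singular")
    return ((h22 / n, -h12 / n), (-h21 / n, h11 / n))


def multiply_2x2(a: CMatrix2, b: CMatrix2) -> CMatrix2:
    """Matrix product a * b for 2 x 2 matrices."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


# Hypothetical encoding matrix for one subband (illustrative values only).
H_EXAMPLE: CMatrix2 = ((0.5 + 0.4j, 0.3j), (-0.3j, 0.5 - 0.4j))
H_INV = invert_2x2(H_EXAMPLE)
P_IDEAL = multiply_2x2(H_INV, H_EXAMPLE)  # approximately the identity matrix
```

The inverse generally has complex coefficients even when N happens to be real, which is why this operation requires complex arithmetic in the subband.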

However, this inverse operation requires the use of complex values and thus will not be usable in the decoder 715 of fig. 7, since the decoder uses (at least in part) real-valued subbands. Accordingly, the matrix processor 809 generates a real-valued decoding matrix, and this matrix can be used to significantly reduce the impact of the encoding matrix.

In each subband, the overall effect of the encoding and decoding matrices can be represented by a transformation matrix P given as:

$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = G \cdot H$$

where H represents the encoder matrix and G represents the decoder matrix.

Ideally, G = H⁻¹, so that P = H⁻¹H = I, i.e., the identity matrix. However, since the coefficients h_xy of the encoder matrix H are all complex-valued, the matrix cannot be inverted in the decoder for the real-valued subbands.

Typically, the real-valued subbands are at higher frequencies, e.g., subbands above 2 kHz. At these frequencies, the phase relationship is perceptually less important, and thus the matrix processor 809 determines decoding matrix coefficients with appropriate amplitude (power) characteristics, regardless of the phase characteristics. In particular, the matrix processor 809 may determine real-valued coefficients that yield low amplitude or low power values for the crosstalk terms p₁₂ and p₂₁ under the assumption or constraint that |p₁₁| ≈ 1 and |p₂₂| ≈ 1.

In some embodiments, the matrix processor 809 may determine the complex-valued subband inverse matrix H⁻¹ of the encoding matrix and then determine the real-valued decoding matrix G from the matrix coefficients of this inverse matrix. In particular, each coefficient of G may be determined from the coefficient of H⁻¹ in the same position. For example, each real-valued coefficient may be determined from the magnitude of the corresponding coefficient of H⁻¹. Indeed, in some embodiments, the matrix processor may determine H⁻¹ and then set each coefficient of G to the absolute value of the corresponding matrix coefficient of the inverse matrix H⁻¹.

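Thus, the matrix processor 809 may determine the decoding matrix G as follows:

$$G = \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} = \frac{1}{|N|} \begin{bmatrix} |h_{22}| & |h_{12}| \\ |h_{21}| & |h_{11}| \end{bmatrix}$$

where

$$N = h_{11}h_{22} - h_{12}h_{21}$$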
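A minimal sketch of this approach is given below, assuming a hypothetical encoding matrix H; the function names and the numerical values are illustrative assumptions only. The decoding matrix G is formed from the absolute values of the coefficients of H⁻¹, and the resulting transformation matrix P = G · H is evaluated in terms of the main term level and the crosstalk level in dB.

```python
import math
from typing import Tuple

CMatrix2 = Tuple[Tuple[complex, complex], Tuple[complex, complex]]
RMatrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def abs_inverse_matrix(h: CMatrix2) -> RMatrix2:
    """Real-valued G whose coefficients are the magnitudes of those of H^-1."""
    (h11, h12), (h21, h22) = h
    n = h11 * h22 - h12 * h21  # determinant N
    if n == 0:
        raise ValueError("encoding matrix is singular")
    return ((abs(h22 / n), abs(h12 / n)), (abs(h21 / n), abs(h11 / n)))


def transform_matrix(g: RMatrix2, h: CMatrix2) -> CMatrix2:
    """Overall encoder/decoder transformation P = G * H."""
    return (
        (g[0][0] * h[0][0] + g[0][1] * h[1][0], g[0][0] * h[0][1] + g[0][1] * h[1][1]),
        (g[1][0] * h[0][0] + g[1][1] * h[1][0], g[1][0] * h[0][1] + g[1][1] * h[1][1]),
    )


def power_db(p: complex) -> float:
    """10*log10(|p|^2); minus infinity for a vanishing term."""
    power = abs(p) ** 2
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")


# Hypothetical encoding matrix for one subband (illustrative values only).
H_EXAMPLE: CMatrix2 = ((0.5 + 0.4j, 0.3j), (-0.3j, 0.5 - 0.4j))
G_EXAMPLE = abs_inverse_matrix(H_EXAMPLE)
P_EXAMPLE = transform_matrix(G_EXAMPLE, H_EXAMPLE)
main_term_db = power_db(P_EXAMPLE[0][0])  # 0 dB would mean |p11| = 1
crosstalk_db = power_db(P_EXAMPLE[1][0])  # level of the crosstalk term p21
```

Only the determinant and four magnitudes have to be computed per subband, so this choice of G is inexpensive compared with an explicit optimization of the crosstalk.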

It can be shown that in the special cases of w₁ = w₂ = 0 and w₁ = w₂ = 1, this solution perfectly satisfies the above constraints (|p₁₁| = |p₂₂| = 1 and |p₁₂| = |p₂₁| = 0).

FIGS. 9, 10 and 11 illustrate, for this solution, the magnitude of the main transformation matrix term (10 log₁₀|p₁₁|²), the phase angle of p₁₁, and the magnitude of the crosstalk term (10 log₁₀|p₂₁|²), respectively.

In particular, FIG. 9 shows, in dB and as a function of w₁ and w₂, the deviation of the magnitude of the main matrix term p₁₁ from the ideal value |p₁₁| = 1. It can be observed that the maximum deviation from the ideal case is less than 1 dB. FIG. 10 shows the angle of p₁₁ as a function of w₁ and w₂. Phase differences of up to 90 degrees relative to the ideal complex-valued case can occur. FIG. 11 shows the magnitude of the crosstalk matrix term p₂₁, measured in dB, as a function of the weights w₁ and w₂. It should be noted that the other transformation matrix elements may be obtained by interchanging w₁ and w₂.

In some embodiments, the matrix processor 809 may determine the decoding matrix G for a subband in response to the subband transformation matrix P = G · H. In particular, the matrix processor may select the coefficient values of G so as to achieve specified characteristics for P.

Again, since the phase values for the real-valued subbands tend to have a very low perceptual weight, the exemplary decoder 715 considers only the amplitude characteristics of P. High quality performance may be achieved by a matrix processor 809 that selects the decoding matrix coefficients such that a power measure of p₁₂ and p₂₁ satisfies a criterion, e.g. such that the power measure is minimized or is below a specified threshold. For example, the matrix processor 809 may search over a range of possible real-valued coefficients and select those that yield the lowest power measure for p₁₂ and p₂₁. In addition, such an evaluation may be subject to other constraints, such as the magnitudes of p₁₁ and p₂₂ being substantially equal to 1 (for example between 0.9 and 1.1).
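Such a search can be sketched as a simple grid search, as shown below. The grid range, the step count, the function names and the example matrix H are assumptions made purely for illustration. Because the first row of G only affects p₁₁ and p₁₂ and the second row only affects p₂₁ and p₂₂, the two rows are searched independently.

```python
from typing import Tuple

CMatrix2 = Tuple[Tuple[complex, complex], Tuple[complex, complex]]
Pair = Tuple[complex, complex]


def search_row(
    main: Pair, cross: Pair,
    lo: float = -2.0, hi: float = 2.0, steps: int = 81, tol: float = 0.1,
) -> Tuple[float, float, float]:
    """Grid search for real (g1, g2) giving the lowest crosstalk power
    |g1*cross[0] + g2*cross[1]|^2 while |g1*main[0] + g2*main[1]| stays
    within 1 +/- tol. Returns (g1, g2, crosstalk_power)."""
    best = None
    for i in range(steps):
        g1 = lo + (hi - lo) * i / (steps - 1)
        for k in range(steps):
            g2 = lo + (hi - lo) * k / (steps - 1)
            if abs(abs(g1 * main[0] + g2 * main[1]) - 1.0) > tol:
                continue  # main term not close enough to unity magnitude
            power = abs(g1 * cross[0] + g2 * cross[1]) ** 2
            if best is None or power < best[2]:
                best = (g1, g2, power)
    if best is None:
        raise ValueError("no grid point satisfies the main term constraint")
    return best


def search_decoding_matrix(h: CMatrix2):
    """Search both rows of G for the encoding matrix H of one subband."""
    (h11, h12), (h21, h22) = h
    g11, g12, _ = search_row((h11, h21), (h12, h22))  # p11 main, p12 crosstalk
    g21, g22, _ = search_row((h12, h22), (h11, h21))  # p22 main, p21 crosstalk
    return ((g11, g12), (g21, g22))


# Hypothetical encoding matrix for one subband (illustrative values only).
H_EXAMPLE: CMatrix2 = ((0.5 + 0.4j, 0.3j), (-0.3j, 0.5 - 0.4j))
G_SEARCHED = search_decoding_matrix(H_EXAMPLE)
```

Since only magnitudes enter the criterion, the sign of the main terms is left unconstrained by this sketch; a practical implementation would presumably add a sign or phase convention and a finer or adaptive grid.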

In some embodiments, the matrix processor 809 may perform a mathematical algorithm to determine appropriate real coefficient values for the decoding matrix. One specific example of this is described below, where the algorithm seeks to minimize the total crosstalk |p₁₂|² + |p₂₁|² under the constraints |p₁₁|² = 1 and |p₂₂|² = 1.

This problem can be solved with standard tools of multivariate analysis. In particular, the method of Lagrange multipliers is suitable; for each row vector v of G, it leads to a matrix eigenvalue problem of the form vA = λvB with a normalization requirement q(v) = 1 given by a quadratic form q. The matrices A and B and the quadratic form q depend on the entries of the complex matrix H.

In the following, the solution for v = [g₁₁ g₁₂] is described. The solution for v = [g₂₁ g₂₂] is then obtained trivially by interchanging the variables w₁ and w₂ in the following solution. The Lagrangian matrices A and B are defined as follows:

where q₁ and q₂ are defined as:

The eigenvalues are then found from:

$$\det(A - \lambda B) = 0,$$

which yields the roots of a quadratic polynomial:

where

Two candidate solutions can now be determined from:

$$(A - \lambda_{1,2} B)\, v_{1,2} = 0$$

The final solution is given by v = c_i·v_i, where i is 1 or 2, chosen such that |p₁₁|² = 1 and the crosstalk is minimal. First, c_i is calculated as follows:

Then, the crosstalk |p₁₂|² of the two solutions is calculated as follows:

The index i that yields the minimum crosstalk gives the solution v = c_i·v_i. Without further proof, it is noted that the index i is always equal to 2, irrespective of the variables w₁ and w₂.
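The procedure above can be sketched for one row of G as follows. The quadratic forms used in the sketch are derived directly from p₁₁ = g₁₁h₁₁ + g₁₂h₂₁ and p₁₂ = g₁₁h₁₂ + g₁₂h₂₂ and are an assumption of this sketch; they are not necessarily identical to the matrices A and B of the description. The function names, the tolerance, the sign convention and the handling of the degenerate case are likewise illustrative assumptions.

```python
import math
from typing import Tuple

Pair = Tuple[complex, complex]


def solve_row(main: Pair, cross: Pair, eps: float = 1e-12) -> Tuple[float, float, float]:
    """Real (g1, g2) minimizing |g1*cross[0] + g2*cross[1]|^2 subject to
    |g1*main[0] + g2*main[1]|^2 = 1. Returns (g1, g2, crosstalk_power)."""
    u1, u2 = complex(main[0]), complex(main[1])
    x1, x2 = complex(cross[0]), complex(cross[1])
    # Constraint |main term|^2 = v B v^T and crosstalk |cross term|^2 = v A v^T.
    b11, b12, b22 = abs(u1) ** 2, (u1 * u2.conjugate()).real, abs(u2) ** 2
    a11, a12, a22 = abs(x1) ** 2, (x1 * x2.conjugate()).real, abs(x2) ** 2
    # det(A - lam*B) = qa*lam^2 + qb*lam + qc
    qa = b11 * b22 - b12 ** 2
    qb = -(a11 * b22 + a22 * b11 - 2.0 * a12 * b12)
    qc = a11 * a22 - a12 ** 2
    if abs(qa) > eps:
        root = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
        lambdas = [(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)]
    elif abs(qb) > eps:
        lambdas = [-qc / qb]  # B is singular: only one finite eigenvalue
    else:
        raise ValueError("degenerate problem: no usable eigenvalue")
    best = None
    for lam in lambdas:
        d11, d12, d22 = a11 - lam * b11, a12 - lam * b12, a22 - lam * b22
        # Null vector of the (nearly) singular symmetric matrix A - lam*B.
        v1, v2 = max([(d12, -d11), (d22, -d12)], key=lambda c: c[0] ** 2 + c[1] ** 2)
        q = b11 * v1 * v1 + 2.0 * b12 * v1 * v2 + b22 * v2 * v2
        if q <= eps:
            continue  # candidate cannot be scaled to a unit-magnitude main term
        c = 1.0 / math.sqrt(q)  # normalization constant
        g1, g2 = c * v1, c * v2
        if (g1 * u1 + g2 * u2).real < 0.0:
            g1, g2 = -g1, -g2  # sign convention (assumption): Re(main term) >= 0
        xt = a11 * g1 * g1 + 2.0 * a12 * g1 * g2 + a22 * g2 * g2
        if best is None or xt < best[2]:
            best = (g1, g2, xt)
    if best is None:
        raise ValueError("no candidate satisfies the normalization")
    return best


def crosstalk_minimizing_matrix(h):
    """Apply solve_row to both rows of G for the encoding matrix H."""
    (h11, h12), (h21, h22) = h
    g11, g12, _ = solve_row((h11, h21), (h12, h22))  # p11 main, p12 crosstalk
    g21, g22, _ = solve_row((h12, h22), (h11, h21))  # p22 main, p21 crosstalk
    return ((g11, g12), (g21, g22))
```

The sketch evaluates the crosstalk of both candidates and keeps the smaller one rather than relying on a fixed index, which keeps it independent of the ordering of the two roots.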

For completeness, a complete solution for G in terms of analytical equations is given below. The following variables are defined:

s = q₁ + q₂,

Then, the variable b is calculated as follows:

The two roots r_α and r_β for the two rows of the matrix G are calculated as follows:

Then, the unnormalized solutions v_temp,1 and v_temp,2 may be determined as:

The normalization constant c is calculated as follows:

Finally, the matrix G is given by:

FIGS. 12, 13 and 14 illustrate the performance of this solution. FIG. 12 shows, in dB and as a function of w₁ and w₂, the deviation of the magnitude of the main matrix term p₁₁ from the ideal value |p₁₁| = 1. It can be observed that the magnitude is always equal to the ideal value |p₁₁| = 1, since this was imposed as a constraint for this solution.

FIG. 13 shows the angle of p₁₁ as a function of w₁ and w₂. It should be noted that, as a limitation common to all the real-valued solutions, the phase difference may here be up to 90 degrees.

FIG. 14 shows the magnitude of the crosstalk matrix term p₂₁, measured in dB, as a function of the weights w₁ and w₂.

As can be seen, the solution of setting the decoding matrix coefficients to the absolute values of the inverse encoding matrix coefficients differs from the more complex crosstalk minimization method by only +/-1 dB in terms of main term gain and crosstalk suppression.

FIG. 15 illustrates an audio decoding method according to some embodiments of the invention.

In step 1501, a decoder receives input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, to which complex-valued subband encoding matrices have been applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal.

Step 1501 is followed by step 1503 in which frequency subbands are generated for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands.

Step 1503 is followed by step 1505 in which real-valued subband decoding matrices for compensating the application of the encoding matrices are determined in response to the parametric multi-channel data.

Step 1505 is followed by step 1507 of generating downmix data corresponding to the downmix signal by performing a matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

It will be appreciated that in the above description, for clarity, embodiments of the invention have been described with reference to different functional units and processors. It will be apparent, however, that any suitable manner of distributing functionality between different functional units or processors may be used without detracting from the invention. For example, functions described as being performed by separate processors or controllers may be performed by the same processor or controller. Thus, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.

Although the present invention has been described in connection with certain embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the invention is limited only by the attached claims. Furthermore, while a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term "comprising" does not exclude the presence of other elements or steps.

Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by a single unit or processor. Furthermore, although individual features may be included in different claims, these may possibly be advantageously combined, and their inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Furthermore, the inclusion of a feature in one category of claims does not imply a limitation to that category, but rather indicates that the feature is equally applicable to other claim categories, as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked, and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. Furthermore, singular references do not exclude a plurality. Thus, references to "a," "an," "first," "second," etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims.

Claims

1. An audio decoder (715), comprising:

means (801) for receiving input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, the down-mix signal having complex-valued subband encoding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal;

means (805) for generating frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands;

determining means (809) for determining a real-valued subband decoding matrix in response to the parametric multi-channel data to compensate for the application of the encoding matrix; and

means (807) for generating downmix data corresponding to the downmix signal by performing a matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

2. Audio decoder (715) according to claim 1, wherein the determining means (809) is arranged to determine a complex valued subband inverse matrix of the encoding matrix and to determine the decoding matrix in response to the inverse matrix.

3. Audio decoder (715) according to claim 2, wherein the determining means (809) is arranged for determining each real valued matrix coefficient of the decoding matrix in response to the absolute value of the corresponding matrix coefficient of the inverse matrix.

4. Audio decoder (715) according to claim 3, wherein the determining means (809) is arranged for determining each real-valued matrix coefficient substantially as the absolute value of the corresponding matrix coefficient of the inverse matrix.

5. Audio decoder (715) according to claim 1, wherein the determining means (809) is arranged for determining the decoding matrix in response to a subband transform matrix, wherein the subband transform matrix is a product of the respective decoding matrix and the encoding matrix.

6. Audio decoder (715) according to claim 5, wherein the determining means (809) is arranged for determining the decoding matrix only in response to the magnitude measure of the transform matrix.

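7. The audio decoder (715) of claim 6, wherein the transform matrix for each subband is given by:

$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = G \cdot H$$

where G is the subband decoding matrix and H is the subband encoding matrix, and wherein the determining means (809) is arranged to select the matrix coefficients of G such that a magnitude measure of p₁₂ and p₂₁ satisfies a criterion.

8. The audio decoder (715) of claim 7, wherein the magnitude measure is determined in response to: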

9. The audio decoder (715) according to claim 7, wherein the determining means (809) is further arranged to determine the decoding matrix such that the magnitudes of p₁₁ and p₂₂ are substantially equal to 1.

10. Audio decoder in accordance with claim 1, in which the downmix signal and the parametric multi-channel data are in accordance with the MPEG surround standard.

11. The audio decoder (715) of claim 1, wherein the encoding matrix is an MPEG matrix surround compatibility encoding matrix and the first N-channel signal is an MPEG matrix surround compatibility signal.

12. A method of audio decoding, the method comprising:

receiving (1501) input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, the down-mix signal having complex-valued subband encoding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal;

generating (1503) frequency subbands for the N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands;

determining (1505) a real-valued subband decoding matrix for compensating an encoding matrix application in response to the parametric multi-channel data; and

generating (1507) downmix data corresponding to the downmix signal by performing a matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

13. A receiver (703) for receiving N channel signals, the receiver (703) comprising:

means (801) for receiving input data, wherein the input data comprises an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, the down-mix signal having complex-valued subband encoding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal;

14. A transmission system (700) for transmitting an audio signal, the transmission system comprising:

a transmitter (701), the transmitter comprising:

means (709) for generating an N-channel down-mix signal of an M-channel audio signal, where M > N,

means (709) for generating parametric multi-channel data associated with the downmix signal,

means (709) for generating a first N-channel signal by applying a complex-valued subband coding matrix to the N-channel down-mixed signals in the frequency subbands,

means (709) for generating a second N-channel signal, said second N-channel signal comprising the first N-channel signal and parametric multi-channel data, and

means (711) for transmitting the second N channel signal to the receiver;

and

a receiver (703), the receiver comprising:

means (801) for receiving a second N-channel signal,

means (805) for generating frequency sub-bands for a first N channel signal, at least some of the frequency sub-bands being real-valued frequency sub-bands,

determining means (809) responsive to the parametric multi-channel data for determining a real-valued subband decoding matrix for compensating the application of the encoding matrix, and

15. A method for receiving an audio signal, the method comprising:

receiving (1501) input data comprising an N-channel signal corresponding to a down-mix signal of an M-channel audio signal, M > N, the down-mix signal having complex-valued subband encoding matrices applied in frequency subbands, and parametric multi-channel data associated with the down-mix signal;

16. A method for transmitting and receiving an audio signal, the method comprising:

performing the following steps at the transmitter (701):

generating an N-channel downmix signal of an M-channel audio signal, wherein M > N,

generating parametric multi-channel data associated with the downmix signal,

generating a first N-channel signal by applying a complex-valued subband encoding matrix to the N-channel downmix signal in frequency subbands,

generating a second N-channel signal comprising the first N-channel signal and the parametric multi-channel data, and

transmitting the second N-channel signal to a receiver (703);

and

performing the following steps at the receiver (703):

receiving (1501) a second N-channel signal;

generating (1503) frequency subbands for the first N-channel signal, wherein at least some of the frequency subbands are real-valued frequency subbands;

generating (1507) downmix data corresponding to the N-channel downmix signal by performing a matrix multiplication on the real-valued subband decoding matrices and the N-channel signal data in at least some of the real-valued frequency subbands.

17. An audio playback device (703) comprising a decoder (715) according to claim 1.