
HK1071271B - Method of decoding two-channel matrix encoded audio to reconstruct multichannel audio - Google Patents


Info

Publication number
HK1071271B
Authority
HK
Hong Kong
Prior art keywords
audio
channel
sub
sound field
band
Prior art date
Application number
HK05104189.8A
Other languages
Chinese (zh)
Other versions
HK1071271A1 (en)
Inventor
W. P. Smith
S. M. Smith
Ming Yan
Original Assignee
DTS (BVI) Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/680,737 (US7003467B1)
Application filed by DTS (BVI) Limited
Publication of HK1071271A1
Publication of HK1071271B

Description

Decoding method for reconstructing two-channel matrix encoded audio into multi-channel audio
Technical Field
The present invention relates to multi-channel audio, and more particularly, to a decoding method for reconstructing two-channel matrix-encoded audio into multi-channel audio that more closely simulates a discrete surround sound presentation.
Background
Multi-channel audio has become the standard for cinema and home theater, is rapidly gaining acceptance in music, automotive, computer, gaming and other audio applications, and is being considered for broadcast television. Multi-channel audio provides a surround sound environment that greatly enhances the listening experience and the overall presentation of any audio-visual system. The move from stereo to multi-channel audio has been driven by a number of factors, the most significant of which is consumer demand for higher quality audio presentation. Higher quality means not merely more channels but higher fidelity channels and better separation between the channels. Another factor of equal importance to consumers and manufacturers is maintaining backward compatibility with existing speaker systems and encoded content while enhancing the audio presentation of those existing systems and content.
Early multi-channel systems combined multiple audio channels, for example left, right, center and surround (L, R, C, S), by matrix encoding them into left-total and right-total (Lt, Rt) channels that were recorded in the standard stereo format. Although these two-channel matrix encoded systems (e.g., Dolby Prologic™) provide surround sound audio, the presentation is not discrete but is characterized by crosstalk and phase distortion. The matrix decoding algorithm identifies a single dominant signal and positions that signal in a five-point sound field to reconstruct the L, R, C and S signals. The result can be a mushy audio presentation in which different signals are not clearly separated in space, and in which less dominant but important signals may be lost altogether.
The current standard in consumer applications is discrete 5.1 channel audio, which splits the surround channel into left and right surround channels and adds a subwoofer channel (L, R, C, Ls, Rs, Sub). Each channel is compressed independently and the channels are then combined in a 5.1 format, which maintains the discreteness of each signal. Dolby AC-3™, Sony SDDS™ and DTS Coherent Acoustics™ are examples of 5.1 systems. Recently, 6.1 channel audio, which adds a center surround channel Cs, has also been introduced. True discrete audio provides clear spatial separation of the audio channels and supports multiple dominant signals, and thus provides a richer and more natural sound presentation.
Once consumers have grown accustomed to discrete multi-channel audio and have 5.1 speaker systems installed in their homes, they are reluctant to accept a markedly inferior surround sound presentation. Unfortunately, only a small portion of current content is available in the 5.1 format. Most content is available only in a two-channel matrix encoded format, predominantly Dolby Prologic™. Because of the large installed base of Prologic decoders, it is expected that 5.1 content will continue to be encoded in the Prologic format as well. Accordingly, there remains an unmet need in the industry for a method of decoding two-channel matrix encoded audio to reconstruct multi-channel audio that more closely approximates discrete multi-channel audio.
Dolby Prologic™ was one of the earliest two-channel matrix encoded multi-channel systems. Prologic squeezes four channels (L, R, C, S) into two channels (Lt, Rt) by introducing a phase-shifted surround term. These two channels are then encoded into the existing two-channel formats. Decoding is a two-step process in which the existing decoder receives Lt, Rt and a Prologic decoder then expands Lt, Rt into L, R, C, S. Because four signals are carried on only two channels, the Prologic decoding operation is only an approximation and cannot provide true discrete multi-channel audio.
As shown in FIG. 1, a recording studio 2 mixes a number of audio sources, for example 48, to provide a four-channel mix (L, R, C, S). The Prologic encoder 4 matrix encodes this mix as follows:
Lt=L+.707C+S(+90°), and (1)
Rt=R+.707C+S(-90°) (2)
which are carried on two discrete channels, encoded into the existing two-channel format and recorded on a medium 6 such as film, CD or DVD.
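The encoding relations (1) and (2) can be checked with single-frequency phasors, where a +90° shift is multiplication by 1j and a -90° shift is multiplication by -1j. The following is only a sketch under that steady-state assumption; the function name `matrix_encode` is hypothetical and does not come from the patent.

```python
def matrix_encode(L, R, C, S):
    """Equations (1)-(2) for single-frequency phasors (complex amplitudes)."""
    Lt = L + 0.707 * C + 1j * S   # surround shifted by +90 degrees
    Rt = R + 0.707 * C - 1j * S   # surround shifted by -90 degrees
    return Lt, Rt

# A center-only source lands equally and in phase in Lt and Rt ...
Lt_c, Rt_c = matrix_encode(0, 0, 1, 0)
# ... while a surround-only source lands with opposite phase.
Lt_s, Rt_s = matrix_encode(0, 0, 0, 1)

center_diff = Lt_c - Rt_c    # center cancels in Lt - Rt
surround_sum = Lt_s + Rt_s   # surround cancels in Lt + Rt
```

This is why decoders look at the sum Lt+Rt to find center content and at the difference Lt-Rt to find surround content.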
A Prologic matrix decoder 8 decodes the two discrete channels Lt, Rt and expands them into four discrete reconstructed channels Lr, Rr, Cr and Sr, which are amplified and distributed to a five-speaker system 10.
Many different proprietary algorithms are used to perform this dynamic decoding. All of them compute the gain coefficients Gi from measurements of the power of Lt+Rt, Lt-Rt, Lt and Rt, such that:
Lr=G1*Lt+G2*Rt (3)
Rr=G3*Lt+G4*Rt (4)
Cr=G5*Lt+G6*Rt, and (5)
Sr=G7*Lt+G8*Rt. (6)
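Equations (3) to (6) amount to one multiply-add pair per output channel. The sketch below applies them with a fixed gain set; the values in `PASSIVE_GAINS` are an assumption (a plain passive matrix chosen for illustration), whereas the decoders described here adapt the gains dynamically.

```python
# Hypothetical fixed gain set G1..G8 (assumed passive-matrix values,
# not taken from the patent).
PASSIVE_GAINS = (1.0, 0.0,       # Lr = G1*Lt + G2*Rt
                 0.0, 1.0,       # Rr = G3*Lt + G4*Rt
                 0.707, 0.707,   # Cr = G5*Lt + G6*Rt
                 0.707, -0.707)  # Sr = G7*Lt + G8*Rt

def matrix_decode(Lt, Rt, g=PASSIVE_GAINS):
    """Apply equations (3)-(6) to one pair of samples."""
    Lr = g[0] * Lt + g[1] * Rt
    Rr = g[2] * Lt + g[3] * Rt
    Cr = g[4] * Lt + g[5] * Rt
    Sr = g[6] * Lt + g[7] * Rt
    return Lr, Rr, Cr, Sr

# A center-only source (C = 1) encodes to Lt = Rt = 0.707.
Lr, Rr, Cr, Sr = matrix_decode(0.707, 0.707)
```

With these fixed gains the center source is recovered in Cr but also appears in Lr and Rr at 0.707, which is the kind of crosstalk the dynamic gain adjustment tries to reduce.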
More specifically, Dolby provides a set of gain coefficients for a null point at the center of a 5-point sound field 11, as shown in FIG. 2. The decoder measures the absolute power of the two-channel matrix encoded signals Lt and Rt and calculates power levels for the L, R, C and S channels according to:
Lpow(t)=C1*Lt+C2*Lpow(t-1) (7)
Rpow(t)=C1*Rt+C2*Rpow(t-1) (8)
Cpow(t)=C1*(Lt+Rt)+C2*Cpow(t-1) (9)
Spow(t)=C1*(Lt-Rt)+C2*Spow(t-1) (10)
where C1 and C2 are coefficients that control the degree of time averaging, and the parameter (t-1) is the respective power level at the previous instant.
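The recursive averaging in equations (7) to (10) can be sketched as a one-pole smoother. Two things here are assumptions rather than statements from the text: instantaneous power is taken as the squared sample, and the values of `c1` and `c2` are arbitrary. The surround level is driven by the difference Lt-Rt, as in equation (17).

```python
def smooth_power(samples, c1=0.1, c2=0.9):
    """Recursive average in the form pow(t) = C1*x(t) + C2*pow(t-1).

    x(t) is taken as the squared sample (an assumption; the text only says
    that the decoder measures absolute power).
    """
    level = 0.0
    history = []
    for x in samples:
        level = c1 * (x * x) + c2 * level
        history.append(level)
    return history

lt = [1.0, 1.0, 1.0, 1.0]
rt = [0.5, 0.5, 0.5, 0.5]
l_pow = smooth_power(lt)
r_pow = smooth_power(rt)
c_pow = smooth_power([a + b for a, b in zip(lt, rt)])  # Lt + Rt
s_pow = smooth_power([a - b for a, b in zip(lt, rt)])  # Lt - Rt
```

A larger `c2` gives a longer averaging time and therefore slower, steadier level estimates.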
These power levels are then used to calculate the L/R and C/S dominance vectors according to:
If Lpow(t)>Rpow(t), Dom L/R=1-Rpow(t)/Lpow(t),
Else Dom L/R=Lpow(t)/Rpow(t)-1, (11)
and
If Cpow(t)>Spow(t), Dom C/S=1-Spow(t)/Cpow(t),
Else Dom C/S=Cpow(t)/Spow(t)-1. (12)
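The dominance calculation can be written as one small function, using the symmetric form that equations (18) and (19) also take. This is a sketch: the function name is hypothetical, and the guard for two zero power levels is an added assumption, since the text does not say what happens in that case.

```python
def dominance(a_pow, b_pow):
    """Dominance in [-1, 1]: positive when a dominates, negative when b does.

    Both inputs are assumed to be non-negative power levels.
    """
    if a_pow > b_pow:
        return 1.0 - b_pow / a_pow
    if b_pow == 0.0:   # both levels are zero: treat as no dominance (assumed)
        return 0.0
    return a_pow / b_pow - 1.0

dom_lr = dominance(4.0, 1.0)   # left four times stronger than right
dom_cs = dominance(1.0, 2.0)   # surround twice as strong as center
```

Equal power levels give a dominance of zero, and the value approaches +1 or -1 as one side vanishes.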
The vector sum of the L/R and C/S dominance vectors defines a dominance vector 12 in the 5-point sound field, from which the single dominant signal appears to emanate. The decoder adjusts the set of gain coefficients at the null point according to the dominance vectors as follows:
[G]Dom=[G]Null+Dom L/R*[G]R+Dom C/S*[G]C (13)
where [G] represents the set of gain coefficients G1, G2, … G8.
This assumes that the dominant point lies in the R/C quadrant of the 5-point sound field. In general, the appropriate gain sets are inserted into the equation according to the quadrant in which the dominant point lies. The L, R, C and S channels are then reconstructed with the [G]Dom coefficients according to equations 3-6 and passed to the amplifiers and speaker arrangement.
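Equation (13) is an element-by-element update of the null-point gain set. The sketch below applies it literally; the two-element gain sets are made-up numbers chosen only to make the arithmetic visible (real sets hold G1 to G8), and the function name is hypothetical.

```python
def steer_gains(g_null, g_r, g_c, dom_lr, dom_cs):
    """Equation (13), element by element, for a dominant point in the R/C quadrant."""
    return [n + dom_lr * r + dom_cs * c for n, r, c in zip(g_null, g_r, g_c)]

# Hypothetical, shortened gain sets (illustration only).
g_dom = steer_gains([1.0, 0.0], [0.5, 0.5], [0.0, 1.0], dom_lr=0.5, dom_cs=0.25)
```

When both dominance values are zero the result is the null-point set itself.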
The disadvantages are evident when compared with a discrete 5.1 system. The surround sound presentation suffers from crosstalk and phase distortion and, at best, only approximates a discrete audio presentation. Signals other than the single dominant signal, whether they originate from different locations or occupy different spectral bands, tend to be washed out by the single dominant signal.
5.1 surround sound systems such as Dolby AC-3™, Sony SDDS™ and DTS Coherent Acoustics™ maintain the discreteness of the multi-channel audio and thus provide a richer and more natural sound presentation. As shown in FIG. 3, the studio 20 provides a 5.1 channel mix. A 5.1 encoder 22 compresses each signal or channel independently, multiplexes them together, packs the audio data into the particular 5.1 format, and records it on a suitable medium 24 such as a DVD. A 5.1 decoder 26 decodes the bit stream a frame at a time by extracting the audio data, demultiplexing it into the 5.1 channels and decompressing each channel to reproduce the signals (Lr, Rr, Cr, Lsr, Rsr, Sub). These 5.1 discrete channels, which carry the 5.1 discrete audio signals, are directed to the appropriate discrete speakers in a speaker arrangement 28 (subwoofer not shown).
Disclosure of Invention
In view of the above, the present invention provides a method of decoding two-channel matrix encoded audio to reconstruct multi-channel audio that more closely approximates a discrete surround sound presentation.
This is accomplished by subband filtering the two-channel matrix encoded audio, steering each subband signal in an expanded sound field to produce multi-channel subband signals, and synthesizing those subband signals into reconstructed multi-channel audio. By steering the subbands separately about an expanded sound field, different sounds can be positioned at different points about the sound field simultaneously, which allows more accurate placement and more distinct definition of each sound element.
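The filter, steer, synthesize structure can be illustrated with a deliberately tiny filter bank. The sketch below uses a two-band Haar split as a stand-in for a real multi-band filter bank (that substitution is an assumption made to keep the example short), and the `steer` callback is a placeholder for the per-subband steering.

```python
def analyze(x):
    """Split an even-length signal into low and high subbands (2-band Haar)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Inverse of analyze: interleave the sum and difference of the subbands."""
    out = []
    for lo, hi in zip(low, high):
        out.extend([lo + hi, lo - hi])
    return out

def decode(lt, rt, steer):
    """Filter into subbands, steer each subband pair, then resynthesize."""
    bands = zip(analyze(lt), analyze(rt))   # (Lt band, Rt band) per subband
    steered = [steer(lb, rb) for lb, rb in bands]
    n_ch = len(steered[0])
    # steered[band][channel]: regroup per output channel, then synthesize.
    return [synthesize(steered[0][ch], steered[1][ch]) for ch in range(n_ch)]

def passthrough(lt_band, rt_band):
    """Placeholder steering (an assumption): pass both channels unchanged."""
    return lt_band, rt_band

out_l, out_r = decode([1.0, 3.0, 2.0, 6.0], [0.0, 4.0, 8.0, 2.0], passthrough)
```

With pass-through steering the output equals the input, which confirms that the analysis and synthesis stages are transparent and that any change in the output comes from the steering applied in each subband.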
The subband filtering process allows for multiple dominant signals, up to one in each subband. Signals that are important to the audio presentation but would otherwise be masked by a single dominant signal are therefore retained in the surround sound presentation, provided that they lie in different subbands. To optimize the trade-off between performance and computation, a Bark filter approach, in which the subbands are tuned to the sensitivity of the human ear, may be preferred.
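One way to picture the Bark approach is to assign each uniform subband to the Bark critical band that contains its center frequency. The sketch below does that for 64 uniform subbands; the 48 kHz sample rate, the use of the conventional Zwicker band edges, and the rule for frequencies above the last edge are all assumptions, not details from the patent.

```python
import bisect

# Upper edges (Hz) of the 24 Bark critical bands (Zwicker). Frequencies above
# the last edge are placed in the last band here (an assumption).
BARK_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400,
                 7700, 9500, 12000, 15500]

def bark_band_of(freq_hz):
    """Index (0-23) of the Bark band containing freq_hz."""
    return min(bisect.bisect_right(BARK_EDGES_HZ, freq_hz), len(BARK_EDGES_HZ) - 1)

def group_subbands(n_subbands=64, sample_rate=48000):
    """Map each uniform subband to a Bark band by its center frequency."""
    width = (sample_rate / 2) / n_subbands
    groups = {}
    for k in range(n_subbands):
        center = (k + 0.5) * width
        groups.setdefault(bark_band_of(center), []).append(k)
    return groups

groups = group_subbands()
```

At high frequencies many uniform subbands fall into the same Bark band, so steering decisions are made for far fewer than 64 groups, which is where the computational saving comes from.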
By expanding the sound field, the decoder can position audio signals in the sound field more accurately. As a result, signals that would otherwise appear to emanate from the same location can be separated so that they appear more discrete. To optimize performance, the expanded sound field is preferably matched to the multi-channel input. For example, a nine-point sound field provides discrete points, each with a set of optimized gain coefficients, including a point for each of the L, R, C, Ls, Rs and Cs channels.
Some of the other features and advantages of the present invention will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses preferred embodiments.
Drawings
FIG. 1, as mentioned above, is a block diagram of a two-channel matrix-coded surround sound system;
FIG. 2, as mentioned above, is an illustration of a 5-point sound field;
FIG. 3, as mentioned above, is a block diagram of a 5.1 surround sound system;
FIG. 4 is a block diagram of a decoder for reconstructing multi-channel audio from two-channel matrix encoded audio according to the present invention;
FIG. 5 is a flow chart illustrating the steps of the present invention for reconstructing multi-channel audio from two-channel matrix encoded audio;
FIGS. 6a and 6b illustrate, respectively, the subband filter and the synthesis filter shown in FIG. 4 for reconstructing discrete multi-channel audio;
FIG. 7 illustrates a particular Bark subband filter; and
FIG. 8 illustrates a nine-point expanded sound field matched to a discrete multi-channel audio presentation.
Detailed Description
The present invention satisfies the need in the industry for a method of decoding two-channel matrix encoded audio to reconstruct multi-channel audio that more closely approximates discrete multi-channel audio. This technique can be incorporated into a multi-channel audio/video receiver so that a single device handles both true 5.1 (or 6.1) multi-channel audio and two-channel matrix encoded audio. Although inferior to true discrete multi-channel audio, the resulting surround sound presentation of two-channel matrix encoded content provides a richer and more natural sound experience. This is achieved by subband filtering the two-channel audio, steering the subbands in an expanded sound field that includes a discrete point with optimized gain coefficients for each speaker location, and resynthesizing the multi-channel subbands to reconstruct the multi-channel audio. Although the preferred implementation uses both subband filtering and the expanded sound field, the two features can be used independently.
As illustrated in FIG. 4, a decoder 30 receives a two-channel matrix encoded signal 32 (Lt, Rt) and reconstructs a multi-channel signal 34, which is then amplified and distributed to speakers 36 to present a more natural and richer surround sound experience. The decoding algorithm is independent of the particular two-channel matrix encoding, so the signal 32 (Lt, Rt) may represent a standard Prologic mix (L, R, C, S), a 5.0 mix (L, R, C, Ls, Rs), a 6.0 mix (L, R, C, Ls, Rs, Cs), or another mix. The reconstruction of the multi-channel audio depends on the user's speaker configuration. For example, for a 6.0 signal the decoder generates a discrete center surround Cs channel if a Cs speaker is present; otherwise that signal is mixed into the Ls and Rs channels to provide a phantom center surround. Similarly, if the user has fewer than five speakers, the decoder mixes down accordingly. Note that the subwoofer or .1 channel is not included in the mix. Bass response is provided by separate software that extracts a low-frequency signal from the reconstructed channels, and is not part of the present invention.
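The speaker-dependent handling of the Cs channel can be sketched as a small fold-down step. The function name, the dictionary layout and the 0.707 fold gain are assumptions for illustration; the text only states that Cs is mixed into Ls and Rs when no Cs speaker is present.

```python
def fit_to_speakers(ch, has_cs_speaker, fold_gain=0.707):
    """Fold the Cs channel into Ls and Rs when no Cs speaker is present.

    ch maps channel names to sample lists. fold_gain is an assumed value;
    the text does not specify one.
    """
    if has_cs_speaker or "Cs" not in ch:
        return dict(ch)
    out = {name: list(sig) for name, sig in ch.items() if name != "Cs"}
    out["Ls"] = [ls + fold_gain * cs for ls, cs in zip(ch["Ls"], ch["Cs"])]
    out["Rs"] = [rs + fold_gain * cs for rs, cs in zip(ch["Rs"], ch["Cs"])]
    return out

six = {"L": [0.0], "R": [0.0], "C": [0.0], "Ls": [0.0], "Rs": [0.0], "Cs": [1.0]}
five = fit_to_speakers(six, has_cs_speaker=False)
```

Feeding the same signal equally to Ls and Rs is what creates the phantom image between the two surround speakers.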
The decoder 30 comprises a subband filter 38, a matrix decoder 40 and a synthesis filter 42, which together decode the two-channel matrix encoded audio Lt and Rt and reconstruct the multi-channel audio. As shown in FIG. 5, decoding and reconstruction involve the following sequence of steps:
1. A block of samples, for example 64 samples, is extracted for each input channel (Lt, Rt) (step 50).
2. Each block is filtered with a multi-band filter bank 38, for example a 64-band polyphase filter bank 52 of the type shown in FIG. 6a, to form subband audio signals (step 54).
3. The resulting subband samples are (optionally) grouped into the nearest Bark bands 56, as shown in FIG. 7 (step 58). The Bark bands may be further combined to reduce the computational load.
4. The power level of each Lt and Rt sub-band is measured (step 60).
5. The power level for each of the L, R, C and S subbands is calculated (step 62).
Lpow(t)i=C1*Lti+C2*Lpow(t-1)i (14)
Rpow(t)i=C1*Rti+C2*Rpow(t-1)i (15)
Cpow(t)i=C1*(Lti+Rti)+C2*Cpow(t-1)i (16)
Spow(t)i=C1*(Lti-Rti)+C2*Spow(t-1)i (17)
Where i represents a subband, C1 and C2 are time-averaged coefficients, and (t-1) represents the previous instant.
6. The L/R and C/S dominance vectors are calculated for each subband (step 64).
If Lpow(t)i>Rpow(t)i, Dom L/Ri=1-Rpow(t)i/Lpow(t)i
else Dom L/Ri=Lpow(t)i/Rpow(t)i-1 (18)
and
If Cpow(t)i>Spow(t)i, Dom C/Si=1-Spow(t)i/Cpow(t)i
else Dom C/Si=Cpow(t)i/Spow(t)i-1 (19)
7. The L/R and C/S dominance vectors for each subband are averaged using both a slow and a fast average, and a threshold determines which average is used to calculate the matrix variables (step 66). This allows fast steering where appropriate, i.e. for large changes, while preventing unintended wandering.
8. The Lt, Rt subband signals are mapped into an expanded sound field 68 of the type shown in FIG. 8, which matches the motion picture/DVD channel configuration for speaker placement (step 70). A grid of nine points (which can be expanded given greater processor power) identifies locations in acoustic space. Each point corresponds to a set of gain values G1, G2, … G12, denoted [G], that has been determined to produce the best output at each speaker when the L/R and C/S dominance vectors define a signal vector 72 corresponding to that point.
As defined by equations 18 and 19 above, Dom L/R and Dom C/S each take a value in the range [-1, 1]. The signs of the dominance vectors indicate the quadrant in which the signal vector 72 lies, and their magnitudes indicate the relative position of the vector within that quadrant for each subband.
The gain coefficients for the signal vector 72 in each subband are preferably calculated from the gain coefficient values at the four corners of the quadrant in which the signal vector 72 lies. One approach is to interpolate the gain coefficients at that point from the coefficient values at the corner points.
The following is a generalized interpolation equation for a point in the upper left quadrant:
[G]vectori=D1i*[G]Null+D2i*[G]L+D3i*[G]C+D4i*[G]UL (20)
where D1, D2, D3 and D4 are linear interpolation coefficients given by:
D1i=1-distance between Null (0, 0) and vector 72,
D2i=1-distance between L (0, 1) and vector 72,
D3i=1-distance between C (1, 0) and vector 72, and
D4i=1-distance between UL (1, 1) and vector 72,
where "distance" is any suitable distance measure.
Although higher order functions may be used, initial testing shows that a simple first order or linear interpolation is preferred, with the coefficients being provided by:
D1i=(1-|Dom LRi|-|Dom CSi|+|Dom LRi|*|Dom CSi|)
D2i=(|Dom LRi|-|Dom LRi|*|Dom CSi|)
D3i=(|Dom CSi|-|Dom LRi|*|Dom CSi|)
D4i=(|Dom LRi|*|Dom CSi|)
where |·| denotes magnitude and i represents a subband.
If the signal vector 72 coincides with the null point, the coefficients default to the null-point coefficients. If the point lies at the center of a quadrant, (1/2, 1/2), each of the four corner points contributes one quarter of its value. If the point lies closer to one corner, that corner is weighted more heavily, in a linear fashion. For example, if the point lies at (1/4, 1/4), close to the null point, the contributions are 9/16 [G]Null, 3/16 [G]L, 3/16 [G]C and 1/16 [G]UL.
9. The multi-channel subband audio signals are reconstructed according to (step 74):
Lri=G1i*Lti+G2i*Rti (21)
Rri=G3i*Lti+G4i*Rti (22)
Cri=G5i*Lti+G6i*Rti, (23)
Lsri=G7i*Lti+G8i*Rti, (24)
Rsri=G9i*Lti+G10i*Rti, and (25)
Csri=G11i*Lti+G12i*Rti (26)
where [G]vectori provides G1i, G2i, … G12i.
10. The multi-channel subband audio signals are passed through a synthesis filter 42 of the type shown in FIG. 6b, for example an inverse polyphase filter 76, to produce the reconstructed multi-channel audio (step 78). Depending on the audio content, the reconstructed audio may contain multiple dominant signals, up to one per subband.
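Steps 8 and 9 can be sketched for a single subband as follows. The interpolation weights follow the linear formulas for D1 to D4 and the reconstruction follows equations (21) to (26); the four 12-element gain sets are made-up placeholder values, and the function names are hypothetical.

```python
def interpolation_weights(dom_lr, dom_cs):
    """D1..D4 from the linear interpolation formulas, using magnitudes only."""
    a, b = abs(dom_lr), abs(dom_cs)
    return (1 - a - b + a * b,   # D1: null point
            a - a * b,           # D2: L/R corner of the quadrant
            b - a * b,           # D3: C/S corner of the quadrant
            a * b)               # D4: diagonal corner (e.g. UL)

def interpolate_gains(weights, corners):
    """Equation (20): weighted sum of the four corner gain sets."""
    return [sum(w * g[k] for w, g in zip(weights, corners))
            for k in range(len(corners[0]))]

def reconstruct(lt, rt, gains):
    """Equations (21)-(26): gains holds (G1, G2), (G3, G4), ... in order."""
    return [gains[k] * lt + gains[k + 1] * rt for k in range(0, len(gains), 2)]

# Hypothetical gain sets for the corners of the upper left quadrant, ordered
# as Lr, Rr, Cr, Lsr, Rsr, Csr pairs. Real tables are tuned for each point.
G_NULL = [1, 0, 0, 1, .5, .5, .5, 0, 0, .5, 0, 0]
G_L = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
G_C = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
G_UL = [1, 0, 0, 0, .5, .5, 0, 0, 0, 0, 0, 0]

weights = interpolation_weights(0.25, 0.25)
gains = interpolate_gains(weights, [G_NULL, G_L, G_C, G_UL])
channels = reconstruct(1.0, 0.0, gains)   # one subband sample pair
```

For the point (1/4, 1/4) the weights come out as 9/16, 3/16, 3/16 and 1/16, matching the worked example in the text, and the four weights always sum to one.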
Compared with known steered matrix systems such as Prologic, the present approach has two main advantages:
1. By steering the subbands separately, different sounds can be positioned simultaneously at different points about the matrix, which allows more accurate placement and more distinct definition of each sound element.
2. The present matrix matches the motion picture/DVD channel configuration of three front channels and two or three rear channels. Optimal use is therefore made of a single speaker layout for both discrete 5.1/6.1 DVD playback and Lt/Rt playback through the matrix.
While various embodiments of the present invention have been shown and described, many modifications and changes will occur to those skilled in the art. Such modifications and variations are anticipated and can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (16)

1. A method of reconstructing two-channel matrix encoded audio (32) into multi-channel audio (34) that approximates a discrete surround sound presentation, comprising:
subband filtering (54) the two-channel matrix encoded audio into a plurality of two-channel subband audio signals;
directing (70) the two-channel sub-band audio signals separately in a sound field (68) to form multi-channel sub-band audio signals; and
synthesizing (78) the multi-channel sub-band audio signals in the sub-bands to reconstruct the multi-channel audio.
2. The method of claim 1, wherein the reconstructed multi-channel audio comprises a plurality of dominant audio signals.
3. The method of claim 2, wherein the dominant audio signals are present in different sub-bands.
4. A method according to claim 3, wherein directing the two-channel sub-band audio signal comprises calculating (64) a dominance vector (72) for each of said sub-bands in said sound field, said dominance vector being determined by the dominant audio signal in the sub-band.
5. The method of claim 1, wherein the subband filtering groups (58) the subband audio signal into a plurality of bark bands.
6. A method as claimed in claim 1, characterized in that the two-channel matrix encoded audio comprises at least left, right, center, left surround and right surround (L, R, C, Ls, Rs) audio channels, said two-channel sub-band audio signals being directed in an expanded sound field (68) comprising a discrete point for each of said audio channels.
7. The method of claim 6, wherein each of said discrete points corresponds to a set of gain values, said gain values being predetermined to produce an optimum audio output at each of the L, R, C, Ls, Rs speakers, respectively, when the two-channel sub-band audio signal is directed to that point in the expanded sound field.
8. The method of claim 7, wherein each of said discrete points further includes a gain value, said gain value being predetermined to produce an optimal audio output at the center surround (Cs) speaker when the two-channel sub-band audio signal is directed to the point in the expanded sound field.
9. The method of claim 7, wherein directing the audio signal comprises:
calculating (64) for each of said sub-bands a dominance vector in the sound field, the dominance vector being determined by the dominant audio signal in the sub-band;
calculating a set of gain values for each subband using the dominance vector and the predetermined gain values for each discrete point; and
calculating a multi-channel sub-band audio signal using the two-channel sub-band audio signal and the gain value.
10. A method as claimed in claim 9, characterized by calculating the gain values for each sub-band by linear interpolation of the predetermined gain values around the dominance vector to define the set of gain values for the point in the sound field indicated by the dominance vector.
11. The method of claim 1, wherein the expanded sound field comprises a 9-point sound field, each of said discrete points corresponding to a set of gain values, said gain values being predetermined to produce an optimum audio output at each of the L, R, C, Ls, Rs speakers, respectively, when the two-channel sub-band audio signal is directed to that point in the expanded sound field.
12. A method of decoding two-channel matrix encoded audio (32) to reconstruct multi-channel audio (34) that approximates a discrete surround sound presentation, comprising:
providing two-channel matrix encoded audio comprising at least left, right, center, left surround and right surround (L, R, C, Ls, Rs) audio channels;
subband filtering (54) the two-channel matrix encoded audio into a plurality of two-channel subband audio signals;
directing (70) said two-channel sub-band audio signals separately in an expanded sound field (68) to form multi-channel sub-band audio signals, said sound field having a discrete point for each audio channel, each said discrete point corresponding to a set of gain values, said gain values being predetermined to produce a respective optimum audio output at each of the L, R, C, Ls, Rs speakers when the two-channel sub-band audio signals are directed to that point in the expanded sound field; and
synthesizing (78) the multi-channel sub-band audio signals in the sub-bands to reconstruct the multi-channel audio.
13. The method of claim 12, wherein the reconstructed multi-channel audio comprises a plurality of dominant audio signals present in different sub-bands.
14. The method of claim 12, wherein the subband filtering groups (58) the subband audio signal into a plurality of bark bands.
15. The method of claim 12, wherein each of said discrete points further includes a gain value, said gain value being predetermined to produce an optimal audio output at a center surround (Cs) speaker when the two channel sub-band audio signal is directed to the point in the expanded sound field.
16. The method of claim 12, wherein the expanded sound field comprises a 9-point sound field.
HK05104189.8A 2000-10-06 2001-10-04 Method of decoding two-channel matrix encoded audio to reconstruct multichannel audio HK1071271B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/680,737 2000-10-06
US09/680,737 US7003467B1 (en) 2000-10-06 2000-10-06 Method of decoding two-channel matrix encoded audio to reconstruct multichannel audio
PCT/US2001/030997 WO2002032186A2 (en) 2000-10-06 2001-10-04 Method of decoding two-channel matrix encoded audio to reconstruct multichannel audio

Publications (2)

Publication Number Publication Date
HK1071271A1 (en) 2005-07-08
HK1071271B (en) 2009-10-23
