HK40021957B - Cross product enhanced harmonic transposition - Google Patents
- Publication number
- HK40021957B (application HK42020011718.2A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- subband
- analysis
- synthesis
- frequency
- signal
Description
This application is a European divisional application of European patent application 13164569.9 (reference: D08072EP02), for which EPO Form 1001 was filed 19 April 2013. EP 13164569.9 is itself a European divisional application of Euro-PCT patent application EP 10701342.7 (reference: D08072EP01), filed 15 January 2010 and granted as EP 2380172 on 24 July 2013.
The present invention relates to audio coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR).
HFR technologies, such as the Spectral Band Replication (SBR) technology, significantly improve the coding efficiency of traditional perceptual audio codecs. In combination with MPEG-4 Advanced Audio Coding (AAC), SBR forms a very efficient audio codec, which is already in use within the XM Satellite Radio system and Digital Radio Mondiale. The combination of AAC and SBR is called aacPlus. It is part of the MPEG-4 standard where it is referred to as the High Efficiency AAC Profile. In general, HFR technology can be combined with any perceptual audio codec in a backward and forward compatible way, thus offering the possibility to upgrade already established broadcasting systems like the MPEG Layer-2 used in the Eureka DAB system. HFR transposition methods can also be combined with speech codecs to allow wide band speech at ultra low bit rates.
The basic idea behind HFR is the observation that there is usually a strong correlation between the characteristics of the high frequency range of a signal and the characteristics of the low frequency range of the same signal. Thus, a good approximation of the original high frequency range of a signal can be achieved by a signal transposition from the low frequency range to the high frequency range.
This concept of transposition was established in WO 98/57436, as a method to recreate a high frequency band from a lower frequency band of an audio signal. A substantial saving in bit-rate can be obtained by using this concept in audio coding and/or speech coding. In the following, reference will be made to audio coding, but it should be noted that the described methods and systems are equally applicable to speech coding and to unified speech and audio coding (USAC).
In a HFR based audio coding system, a low bandwidth signal is presented to a core waveform coder and the higher frequencies are regenerated at the decoder side using transposition of the low bandwidth signal and additional side information, which is typically encoded at very low bit-rates and which describes the target spectral shape. For low bit-rates, where the bandwidth of the core coded signal is narrow, it becomes increasingly important to recreate a high band, i.e. the high frequency range of the audio signal, with perceptually pleasant characteristics. Two variants of harmonic frequency reconstruction methods are mentioned in the following, one is referred to as harmonic transposition and the other one is referred to as single sideband modulation.
The principle of harmonic transposition defined in WO 98/57436 is that a sinusoid with frequency ω is mapped to a sinusoid with frequency Tω where T > 1 is an integer defining the order of the transposition. An attractive feature of the harmonic transposition is that it stretches a source frequency range into a target frequency range by a factor equal to the order of transposition, i.e. by a factor equal to T. The harmonic transposition performs well for complex musical material. Furthermore, harmonic transposition exhibits low cross over frequencies, i.e. a large high frequency range above the cross over frequency can be generated from a relatively small low frequency range below the cross over frequency.
In contrast to harmonic transposition, a single sideband modulation (SSB) based HFR maps a sinusoid with frequency ω to a sinusoid with frequency ω + Δω where Δω is a fixed frequency shift. It has been observed that, given a core signal with low bandwidth, a dissonant ringing artifact may result from the SSB transposition. It should also be noted that for a low cross-over frequency, i.e. a small source frequency range, harmonic transposition will require a smaller number of patches in order to fill a desired target frequency range than SSB based transposition. By way of example, if the high frequency range of (ω, 4ω] should be filled, then using an order of transposition T = 4 harmonic transposition can fill this frequency range from a low frequency range of (ω/4, ω]. On the other hand, a SSB based transposition using the same low frequency range must use a frequency shift of Δω = 3ω/4 and it is necessary to repeat the process four times in order to fill the high frequency range (ω, 4ω].
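The patch count in this example can be checked with a few lines of arithmetic. The following is a minimal sketch, with ω normalized to 1; the helper names `harmonic_source_range` and `ssb_patch_count` are hypothetical and are not part of the described system.

```python
import math

def harmonic_source_range(target_lo, target_hi, T):
    """Source range that an order-T harmonic transposition stretches onto (target_lo, target_hi]."""
    return (target_lo / T, target_hi / T)

def ssb_patch_count(src_lo, src_hi, target_lo, target_hi):
    """Number of fixed-shift copies of the source range needed to tile the target range."""
    return math.ceil((target_hi - target_lo) / (src_hi - src_lo))

w = 1.0                                               # normalized cross-over frequency (assumption)
src = harmonic_source_range(w, 4 * w, T=4)            # one harmonic patch of order 4
patches = ssb_patch_count(src[0], src[1], w, 4 * w)   # SSB copies of the same source range
print(src, patches)
```

One harmonic patch of order 4 covers the whole target range, whereas the same source range has to be shifted and copied several times under SSB.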
On the other hand, as already pointed out in WO 02/052545 A1, harmonic transposition has drawbacks for signals with a prominent periodic structure. Such signals are superimpositions of harmonically related sinusoids with frequencies Ω, 2Ω, 3Ω, ..., where Ω is the fundamental frequency.
Upon harmonic transposition of order T, the output sinusoids have frequencies TΩ, 2TΩ, 3TΩ, ..., which, in case of T > 1, is only a strict subset of the desired full harmonic series. In terms of resulting audio quality a "ghost" pitch corresponding to the transposed fundamental frequency TΩ will typically be perceived. Often the harmonic transposition results in a "metallic" sound character of the encoded and decoded audio signal. The situation may be alleviated to a certain degree by adding several orders of transposition T = 2, 3, ..., Tmax to the HFR, but this method is computationally complex if most spectral gaps are to be avoided.
An alternative solution for avoiding the appearance of "ghost" pitches when using harmonic transposition has been presented in WO 02/052545 A1. The solution consists in using two types of transposition, i.e. a typical harmonic transposition and a special "pulse transposition". The described method teaches to switch to the dedicated "pulse transposition" for parts of the audio signal that are detected to be periodic with pulse-train like character. The problem with this approach is that the application of "pulse transposition" on complex music material often degrades the quality compared to harmonic transposition based on a high resolution filter bank. Hence, the detection mechanisms have to be tuned rather conservatively such that pulse transposition is not used for complex material. Inevitably, single pitch instruments and voices will sometimes be classified as complex signals, thereby invoking harmonic transposition and therefore missing harmonics. Moreover, if switching occurs in the middle of a single pitched signal, or a signal with a dominating pitch in a weaker complex background, the switching itself between the two transposition methods having very different spectrum filling properties will generate audible artifacts. Another variant for performing harmonic frequency reconstruction is proposed in US 2004/0028244 A1.
The invention is defined as in the attached independent claims. Further embodiments are defined in the dependent claims.
The present invention provides a method and system to complete the harmonic series resulting from harmonic transposition of a periodic signal. Frequency domain transposition comprises the step of mapping nonlinearly modified subband signals from an analysis filter bank into selected subbands of a synthesis filter bank. The nonlinear modification comprises a phase modification or phase rotation which in a complex filter bank domain can be obtained by a power law followed by a magnitude adjustment. Whereas prior art transposition modifies one analysis subband at a time separately, the present invention teaches to add a nonlinear combination of at least two different analysis subbands for each synthesis subband. The spacing between the analysis subbands to be combined may be related to the fundamental frequency of a dominant component of the signal to be transposed.
In the most general form, the mathematical description of the invention is that a set of frequency components ω1, ω2, ..., ωK are used to create a new frequency component T1ω1 + T2ω2 + ... + TKωK, where the coefficients T1, T2, ..., TK are integer transposition orders whose sum is the total transposition order T = T1 + T2 + ... + TK. This effect is obtained by modifying the phases of K suitably chosen subband signals by the factors T1, T2, ..., TK and recombining the result into a signal with phase equal to the sum of the modified phases. It is important to note that all these phase operations are well defined and unambiguous since the individual transposition orders are integers, and that some of these integers could even be negative as long as the total transposition order satisfies T ≥ 1.
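The phase recombination described above can be illustrated on unit-magnitude phasors. This is a sketch under stated assumptions: the function names `combine_phases` and `phase_error` are hypothetical, the inputs are ideal complex sinusoids rather than filter bank subband signals, and magnitude handling is left out entirely.

```python
import cmath

def combine_phases(samples, orders):
    """Unit-magnitude output whose phase is the sum of T_k times the phase of samples[k]."""
    total_phase = sum(T * cmath.phase(x) for x, T in zip(samples, orders))
    return cmath.exp(1j * total_phase)

def phase_error(freqs, orders, t):
    """Distance between the combined phasor and a sinusoid at sum_k T_k * w_k, at time t."""
    samples = [cmath.exp(1j * w * t) for w in freqs]
    target = cmath.exp(1j * sum(T * w for T, w in zip(orders, freqs)) * t)
    return abs(combine_phases(samples, orders) - target)

# K = 2 with orders (2, 1), and a case with one negative order (3, -1); both have T >= 1.
err_positive = phase_error([1.3, 2.1], [2, 1], t=5.0)
err_negative = phase_error([1.3, 2.1], [3, -1], t=5.0)
print(err_positive, err_negative)
```

Because the orders are integers, the 2π ambiguity of each measured phase cancels in the complex exponential, which is why both errors stay at rounding level.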
The prior art methods correspond to the case K=1, and the current invention teaches to use K≥2 .
The descriptive text treats mainly the case K=2, T≥2 as it is sufficient to solve most specific problems at hand. But it should be noted that the cases K > 2 are considered to be equally disclosed and covered by the present document.
The present invention will now be described by way of illustrative examples, not limiting the scope of the invention. It will be described with reference to the accompanying drawings, in which:
- Fig. 1 illustrates the operation of an HFR enhanced audio decoder;
- Fig. 2 illustrates the operation of a harmonic transposer using several orders;
- Fig. 3 illustrates the operation of a frequency domain (FD) harmonic transposer;
- Fig. 4 illustrates the operation of the inventive use of cross term processing;
- Fig. 5 illustrates prior art direct processing;
- Fig. 6 illustrates prior art direct nonlinear processing of a single sub-band;
- Fig. 7 illustrates the components of the inventive cross term processing;
- Fig. 8 illustrates the operation of a cross term processing block;
- Fig. 9 illustrates the inventive nonlinear processing contained in each of the MISO systems of Fig. 8;
- Figs. 10 - 18 illustrate the effect of the invention for the harmonic transposition of exemplary periodic signals;
- Fig. 19 illustrates the time-frequency resolution of a Short Time Fourier Transform (STFT);
- Fig. 20 illustrates the exemplary time progression of a window function and its Fourier transform used on the synthesis side;
- Fig. 21 illustrates the STFT of a sinusoidal input signal;
- Fig. 22 illustrates the window function and its Fourier transform according to Fig. 20 used on the analysis side;
- Figs. 23 and 24 illustrate the determination of appropriate analysis filter bank subbands for the cross-term enhancement of a synthesis filter band subband;
- Figs. 25, 26, and 27 illustrate experimental results of the described direct-term and cross-term harmonic transposition method;
- Figs. 28 and 29 illustrate embodiments of an encoder and a decoder, respectively, using the enhanced harmonic transposition schemes outlined in the present document; and
- Fig. 30 illustrates an embodiment of a transposition unit shown in Figs. 28 and 29.
The below-described embodiments are merely illustrative for the principles of the present invention for the so-called CROSS PRODUCT ENHANCED HARMONIC TRANSPOSITION. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
This may also be written as:
$$y = g\,|x|\left(\frac{x}{|x|}\right)^{T}.$$
In words, the phase of the complex subband signal x is multiplied by the transposition order T and the amplitude of the complex subband signal x is modified by the gain parameter g.
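The direct rule stated in words above (phase multiplied by T, magnitude scaled by g) can be sketched for a single complex subband sample as follows. The function name `direct_transpose` and the handling of a zero sample are assumptions made for illustration.

```python
import cmath

def direct_transpose(x: complex, T: int, g: float = 1.0) -> complex:
    """Multiply the phase of x by the integer order T and scale its magnitude by g."""
    if x == 0:
        return 0j                      # phase undefined for a zero sample (assumed convention)
    return g * abs(x) * (x / abs(x)) ** T

x = 2.0 * cmath.exp(0.3j)              # magnitude 2, phase 0.3 rad
y = direct_transpose(x, T=3, g=0.5)
print(abs(y), cmath.phase(y))
```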
In relation to the usage of cross term processing, the following remarks should be considered. The pitch parameter Ω does not have to be known with high precision, and certainly not with better frequency resolution than the frequency resolution obtained by the analysis filter bank 301. In fact, in some embodiments of the present invention, the underlying cross product enhancement pitch parameter Ω is not entered in the decoder at all. Instead, the chosen pair of integer index shifts (p1, p2) is selected from a list of possible candidates by following an optimization criterion such as the maximization of the cross product output magnitude, i.e. the maximization of the energy of the cross product output. By way of example, for given values of T and r, a list of candidates given by the formula (p1, p2) = (rl, (T−r)l), l ∈ L, where L is a list of positive integers, could be used. This is shown in further detail below in the context of formula (11). In principle, any positive integer is an admissible candidate for l. In some cases pitch information may help to identify which l to choose as appropriate index shifts.
Furthermore, even though the example cross product processing illustrated in Fig. 8 suggests that the applied index shifts (p1, p2) are the same for a certain range of output subbands, e.g. synthesis subbands (n−1), n and (n+1) are composed from analysis subbands having a fixed distance p1 + p2, this need not be the case. As a matter of fact, the index shifts (p1, p2) may differ for each and every output subband. This means that for each subband n a different value Ω of the cross product enhancement pitch parameter may be selected.
This may also be written as:
$$y = \mu(|u_1|,|u_2|)\left(\frac{u_1}{|u_1|}\right)^{T-r}\left(\frac{u_2}{|u_2|}\right)^{r},$$
where μ(|u1|,|u2|) is a magnitude generation function. In words, the phase of the complex subband signal u1 is multiplied by the transposition order T − r and the phase of the complex subband signal u2 is multiplied by the transposition order r. The sum of those two phases is used as the phase of the output y whose magnitude is obtained by the magnitude generation function. Comparing with the formula (2) the magnitude generation function is expressed as the geometric mean of magnitudes modified by the gain parameter g, that is μ(|u1|,|u2|) = g·|u1|^(1−r/T)·|u2|^(r/T). By allowing the gain parameter to depend on the inputs this of course covers all possibilities.
It should be noted that the formula (2) results from the underlying target that a pair of sinusoids with frequencies (ω, ω+Ω) are to be mapped to a sinusoid with frequency Tω+rΩ, which can also be written as (T−r)ω+r(ω+Ω).
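The cross product rule and its target mapping can be sketched on a pair of ideal complex sinusoids. The function name `cross_product` is hypothetical, the magnitude is taken as the gain-scaled geometric mean mentioned in the text, and the inputs are not real filter bank outputs.

```python
import cmath

def cross_product(u1: complex, u2: complex, T: int, r: int, g: float = 1.0) -> complex:
    """Phase (T-r)*arg(u1) + r*arg(u2); magnitude g*|u1|^(1-r/T)*|u2|^(r/T)."""
    if u1 == 0 or u2 == 0:
        return 0j                      # phase undefined for a zero input (assumed convention)
    magnitude = g * abs(u1) ** (1 - r / T) * abs(u2) ** (r / T)
    return magnitude * (u1 / abs(u1)) ** (T - r) * (u2 / abs(u2)) ** r

omega, pitch, T, r = 1.0, 0.4, 3, 1    # example values chosen for illustration
errors = []
for t in (0.0, 0.7, 2.5, 9.1):
    u1 = cmath.exp(1j * omega * t)                # sinusoid at omega
    u2 = cmath.exp(1j * (omega + pitch) * t)      # sinusoid at omega + pitch
    target = cmath.exp(1j * (T * omega + r * pitch) * t)
    errors.append(abs(cross_product(u1, u2, T, r) - target))
print(max(errors))
```

When both inputs are the same sample, the rule reduces to the direct rule with phase factor T and magnitude g|x|.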
In the following text, a mathematical description of the present invention will be outlined. For simplicity, continuous time signals are considered. The synthesis filter bank 303 is assumed to achieve perfect reconstruction from a corresponding complex modulated analysis filter bank 301 with a real valued symmetric window function or prototype filter w(t). The synthesis filter bank will often, but not always, use the same window in the synthesis process. The modulation is assumed to be of an evenly stacked type, the stride is normalized to one and the angular frequency spacing of the synthesis subbands is normalized to π. Hence, a target signal s(t) will be achieved at the output of the synthesis filter bank if the input subband signals to the synthesis filter bank are given by synthesis subband signals yn(k),
$$y_n(k) = \int_{-\infty}^{\infty} s(t)\,w(t-k)\,\exp[-in\pi(t-k)]\,dt. \tag{3}$$
Note that formula (3) is a normalized continuous time mathematical model of the usual operations in a complex modulated subband analysis filter bank, such as a windowed Discrete Fourier Transform (DFT), also denoted as a Short Time Fourier Transform (STFT). With a slight modification in the argument of the complex exponential of formula (3), one obtains continuous time models for complex modulated (pseudo) Quadrature Mirror Filterbank (QMF) and complexified Modified Discrete Cosine Transform (CMDCT), also denoted as an oddly stacked windowed DFT. The subband index n runs through all nonnegative integers for the continuous time case. For the discrete time counterparts, the time variable t is sampled at step 1/N, and the subband index n is limited by N, where N is the number of subbands in the filter bank, which is equal to the discrete time stride of the filter bank. In the discrete time case, a normalization factor related to N is also required in the transform operation if it is not incorporated in the scaling of the window.
For a real valued signal, there are as many complex subband samples out as there are real valued samples in for the chosen filter bank model. Therefore, there is a total oversampling (or redundancy) by a factor two. Filter banks with a higher degree of oversampling can also be employed, but the oversampling is kept small in the present description of embodiments for the clarity of exposition.
The main steps involved in the modulated filter bank analysis corresponding to formula (3) are that the signal is multiplied by a window centered around time t = k, and the resulting windowed signal is correlated with each of the complex sinusoids exp[―inπ(t―k)] . In discrete time implementations this correlation is efficiently implemented via a Fast Fourier Transform. The corresponding algorithmic steps for the synthesis filter bank are well known for those skilled in the art, and consist of synthesis modulation, synthesis windowing, and overlap add operations.
For a sinusoid, s(t) = A cos(ωt+θ) = Re{C exp(iωt)}, the subband signals of (3) are for sufficiently large n with good approximation given by
$$y_n(k) = C\exp(ik\omega)\,\hat{w}(n\pi-\omega), \tag{4}$$
where the hat denotes the Fourier transform, i.e. ŵ is the Fourier transform of the window function w. Strictly speaking, formula (4) is only true if one adds a term with -ω instead of ω. This term is neglected based on the assumption that the frequency response of the window decays sufficiently fast, and that the sum of ω and n is not close to zero.
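The analysis step described above (window centered at t = k, then correlation with exp[−inπ(t−k)]) can be approximated numerically to see how a sinusoid behaves in a single subband. This is a sketch only: the Gaussian window, the Riemann-sum integration, the step size and the name `subband_sample` are all assumptions, and no window normalization for perfect reconstruction is attempted.

```python
import cmath
import math

def gaussian_window(tau):
    """Real valued symmetric window (assumed shape, not normalized)."""
    return math.exp(-0.5 * tau * tau)

def subband_sample(s, n, k, window=gaussian_window, half_width=6.0, step=1.0 / 64):
    """Riemann-sum approximation of the windowed correlation with exp(-i*n*pi*(t-k))."""
    m = int(half_width / step)
    total = 0j
    for j in range(-m, m + 1):
        tau = j * step                              # tau = t - k
        total += s(k + tau) * window(tau) * cmath.exp(-1j * n * math.pi * tau)
    return total * step

omega = 10.2 * math.pi                              # sinusoid close to subband n = 10
s = lambda t: math.cos(omega * t + 0.4)
ratio = subband_sample(s, 10, 3) / subband_sample(s, 10, 2)
print(ratio, cmath.exp(1j * omega))
```

From one stride to the next, the subband sample rotates by the angle ω, as long as the negative-frequency term is negligible for the chosen subband.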
The synthesis subband signals yn (k) can also be determined as a result of the analysis filter bank 301 and the non-linear processing, i.e. harmonic transposer 302 illustrated in Fig. 3 . On the analysis filter bank side, the analysis subband signals xn (k) may be represented as a function of the source signal z(t). For a transposition of order T, a complex modulated analysis filter bank with window wT (t)= w(t/T)/T, a stride one, and a modulation frequency step, which is T times finer than the frequency step of the synthesis bank, is applied on the source signal z(t). Fig. 22 illustrates the appearance of the scaled window wT 2201 and its Fourier transform ŵT 2202. Compared to Fig. 20 , the time window 2201 is stretched out and the frequency window 2202 is compressed.
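The analysis by the modified filter bank gives rise to the analysis subband signals xn(k):
$$x_n(k) = \int_{-\infty}^{\infty} z(t)\,w_T(t-k)\,\exp\!\left[-i\frac{n\pi}{T}(t-k)\right]dt. \tag{5}$$
For a sinusoid, z(t) = B cos(ξt+ϕ) = Re{D exp(iξt)}, one finds that the subband signals of (5) for sufficiently large n with good approximation are given by
$$x_n(k) = D\exp(ik\xi)\,\hat{w}(n\pi-T\xi). \tag{6}$$
Hence, submitting these subband signals to the harmonic transposer 302 and applying the direct transposition rule (1) to (6) yields
$$\tilde{y}_n(k) = g\left(\frac{D}{|D|}\right)^{T}|D|\,\exp(ikT\xi)\,\hat{w}(n\pi-T\xi)\left(\frac{\hat{w}(n\pi-T\xi)}{|\hat{w}(n\pi-T\xi)|}\right)^{T-1}. \tag{7}$$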
The synthesis subband signals yn(k) given by formula (4) and the nonlinear subband signals obtained through harmonic transposition ỹn(k) given by formula (7) ideally should match.
For odd transposition orders T, the factor containing the influence of the window in (7) is equal to one, since the Fourier transform of the window is real valued by assumption, and T − 1 is an even number. Therefore, formula (7) can be matched exactly to formula (4) with ω = Tξ, for all subbands, such that the output of the synthesis filter bank with input subband signals according to formula (7) is a sinusoid with a frequency ω = Tξ, amplitude A = gB, and phase θ = Tϕ, wherein B and ϕ are determined from the formula D = B exp(iϕ), which upon insertion yields g(D/|D|)^T |D| = gB exp(iTϕ). Hence, a harmonic transposition of order T of the sinusoidal source signal z(t) is obtained.
For even T, the match is more approximate, but it still holds on the positive valued part of the window frequency response ŵ, which for a symmetric real valued window includes the most important main lobe. This means that also for even values of T a harmonic transposition of the sinusoidal source signal z(t) is obtained. In the particular case of a Gaussian window, ŵ is always positive and consequently, there is no difference in performance for even and odd orders of transposition.
Similarly to formula (6), the analysis of a sinusoid with frequency ξ+Ω, i.e. the sinusoidal source signal z(t) = B' cos((ξ+Ω)t+ϕ') = Re{E exp(i(ξ+Ω)t)}, is
$$x'_n(k) = E\exp[ik(\xi+\Omega)]\,\hat{w}(n\pi-T(\xi+\Omega)). \tag{8}$$
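Therefore, feeding the two subband signals u1 = x_{n−p1}(k), which corresponds to the signal 801 in Fig. 8, and u2 = x'_{n+p2}(k), which corresponds to the signal 802 in Fig. 8, into the cross product processing 800-n illustrated in Fig. 8 and applying the cross product formula (2) yields the output subband signal 803
$$\tilde{y}_n(k) = g\,|D|^{1-r/T}|E|^{r/T}\left(\frac{D}{|D|}\right)^{T-r}\left(\frac{E}{|E|}\right)^{r}\exp[ik(T\xi+r\Omega)]\,M(n,\xi), \tag{9}$$
where
$$M(n,\xi) = \left|\hat{w}((n-p_1)\pi-T\xi)\right|^{1-r/T}\left|\hat{w}((n+p_2)\pi-T(\xi+\Omega))\right|^{r/T}\left(\frac{\hat{w}((n-p_1)\pi-T\xi)}{|\hat{w}((n-p_1)\pi-T\xi)|}\right)^{T-r}\left(\frac{\hat{w}((n+p_2)\pi-T(\xi+\Omega))}{|\hat{w}((n+p_2)\pi-T(\xi+\Omega))|}\right)^{r}. \tag{10}$$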
From formula (9) it can be seen that the phase evolution of the output subband signal 803 of the MISO system 800-n follows the phase evolution of an analysis of a sinusoid of frequency Tξ+rΩ. This holds independently of the choice of the index shifts p 1 and p 2 . In fact, if the subband signal (9) is fed into a subband channel n corresponding to the frequency Tξ + rΩ, that is if nπ ≈ Tξ + rΩ, then the output will be a contribution to the generation of a sinusoid at frequency Tξ + rΩ. However, it is advantageous to make sure that each contribution is significant, and that the contributions add up in a beneficial fashion. These aspects will be discussed below.
Given a cross product enhancement pitch parameter Ω, suitable choices for index shifts p1 and p2 can be derived in order for the complex magnitude M(n,ξ) of (10) to approximate ŵ(nπ−(Tξ+rΩ)) for a range of subbands n, in which case the final output will approximate a sinusoid at the frequency Tξ + rΩ. A first consideration on main lobes imposes all three values of (n−p1)π−Tξ, (n+p2)π−T(ξ+Ω), nπ−(Tξ+rΩ) to be small simultaneously, which leads to the approximate equalities
$$p_1 \approx \frac{r\,\Omega}{\pi}, \qquad p_2 \approx \frac{(T-r)\,\Omega}{\pi}. \tag{11}$$
This means that when knowing the cross product enhancement pitch parameter Ω, the index shifts may be approximated by formula (11), thereby allowing a simple selection of the analysis subbands. A more thorough analysis of the effects of the choice of the index shifts p1 and p2 according to formula (11) on the magnitude of the parameter M(n,ξ) according to formula (10) can be performed for important special cases of window functions w(t) such as the Gaussian window and a sine window. One finds that the desired approximation to ŵ(nπ−(Tξ+rΩ)) is very good for several subbands with nπ ≈ Tξ + rΩ.
It should be noted that the relation (11) is calibrated to the exemplary situation where the analysis filter bank 301 has an angular frequency subband spacing of π/T. In the general case, the resulting interpretation of (11) is that the cross term source span p1 + p2 is an integer approximating the underlying fundamental frequency Ω, measured in units of the analysis filter bank subband spacing, and that the pair (p1, p2) is chosen as a multiple of (r, T−r).
For the determination of the index shift pair (p 1 ,p 2) in the decoder the following modes may be used:
- 1. A value of Ω may be derived in the encoding process and explicitly transmitted to the decoder in a sufficient precision to derive the integer values of p 1 and p 2 by means of a suitable rounding procedure, which may follow the principles that
- ∘ p 1 + p 2 approximates Ω/ Δω, where Δω is the angular frequency spacing of the analysis filter bank; and
- ∘ p 1 / p 2 is chosen to approximate r/(T―r).
- 2. For each target subband sample, the index shift pair (p 1 ,p 2) may be derived in the decoder from a pre-determined list of candidate values such as (p 1 ,p 2)=(rl,(T―r)l),l∈L , r∈{1,2,...,T―1}, where L is a list of positive integers. The selection may be based on an optimization of cross term output magnitude, e.g. a maximization of the energy of the cross term output.
- 3. For each target subband sample, the index shift pair (p 1,p 2) may be derived from a reduced list of candidate values by an optimization of cross term output magnitude, where the reduced list of candidate values is derived in the encoding process and transmitted to the decoder.
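For mode 1, one possible rounding procedure that respects both principles (span close to Ω/Δω, ratio close to r/(T−r)) is to round Ω to a whole multiple l of T·Δω and take (r·l, (T−r)·l). The sketch below is an illustration of that idea, not the procedure prescribed by the document; the function name `index_shifts_from_pitch` and the lower bound l ≥ 1 are assumptions.

```python
import math

def index_shifts_from_pitch(pitch, T, r, analysis_spacing):
    """Return (p1, p2) = (r*l, (T-r)*l) with l the nearest positive integer to pitch/(T*spacing)."""
    l = max(1, round(pitch / (T * analysis_spacing)))   # assumed: always use at least l = 1
    return r * l, (T - r) * l

# Example values (assumptions): analysis spacing pi/3 and a pitch of pi, with T = 3, r = 1.
p1, p2 = index_shifts_from_pitch(pitch=math.pi, T=3, r=1, analysis_spacing=math.pi / 3)
print(p1, p2)
```

With these values the span p1 + p2 equals Ω/Δω = 3 and the ratio p1/p2 equals r/(T−r) = 1/2.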
It should be noted that phase modification of the subband signals u 1 and u 2 is performed with a weighting (T―r) and r , respectively, but the subband index distance p 1 and p 2 are chosen proportional to r and (T―r), respectively. Thus the closest subband to the synthesis subband n receives the strongest phase modification.
An advantageous method for the optimization procedure for the modes 2 and 3 outlined above may be to consider the Max-Min optimization:
$$\max\Big\{\min\big\{|x_{n-p_1}|,\,|x_{n+p_2}|\big\} : (p_1,p_2) = (rl,(T-r)l),\ l\in L,\ r\in\{1,2,\dots,T-1\}\Big\}, \tag{12}$$
and to use the winning pair together with its corresponding value of r to construct the cross product contribution for a given target subband index n. In the decoder search oriented modes 2 and partially also 3, the addition of cross terms for different values r is preferably done independently, since there may be a risk of adding content to the same subband several times. If, on the other hand, the fundamental frequency Ω is used for selecting the subbands as in mode 1 or if only a narrow range of subband index distances are permitted as may be the case in mode 2, this particular issue of adding content to the same subband several times may be avoided.
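The Max-Min selection over the candidate list (p1, p2) = (rl, (T−r)l) can be sketched as an exhaustive search over subband magnitudes for one time index. The function name `max_min_search`, the skipping of out-of-range indices, and the toy magnitude values are assumptions made for illustration.

```python
def max_min_search(mags, n, T, L):
    """Return (score, p1, p2, r) maximizing min(|x[n-p1]|, |x[n+p2]|) over the candidates."""
    best = None
    for r in range(1, T):
        for l in L:
            p1, p2 = r * l, (T - r) * l
            lo, hi = n - p1, n + p2
            if lo < 0 or hi >= len(mags):
                continue                      # candidate falls outside the filter bank (assumed rule)
            score = min(mags[lo], mags[hi])
            if best is None or score > best[0]:
                best = (score, p1, p2, r)
    return best

# Toy analysis magnitudes: weak background with two strong subbands at indices 14 and 26.
mags = [0.01] * 40
mags[14], mags[26] = 1.0, 0.8
best = max_min_search(mags, n=18, T=3, L=range(2, 9))
print(best)
```

Taking the minimum of the two magnitudes favors pairs in which both source subbands carry energy, so a single strong subband paired with noise does not win.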
Furthermore, it should also be noted that for the embodiments of the cross term processing schemes outlined above an additional decoder modification of the cross product gain g may be beneficial. For instance, it is referred to the input subband signals u1, u2 to the cross products MISO unit given by formula (2) and the input subband signal x to the transposition SISO unit given by formula (1). If all three signals are to be fed to the same output synthesis subband as shown in Fig. 4, where the direct processing 401 and the cross product processing 402 provide components for the same output synthesis subband, it may be desirable to set the cross product gain g, i.e. the gain of the gain unit 902 of Fig. 9, to zero if
$$|x| > q\,\min(|u_1|,|u_2|) \tag{13}$$
for a pre-defined threshold q > 1. In other words, the cross product addition is only performed if the direct term input subband magnitude |x| is small compared to both of the cross product input terms. In this context, x is the analysis subband sample for the direct term processing which leads to an output at the same synthesis subband as the cross product under consideration. This may be a precaution in order to not enhance further a harmonic component that has already been furnished by the direct transposition.
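The cancellation rule can be sketched as a small gating function, reading the condition as "the direct term magnitude exceeds q times the smaller of the two cross product input magnitudes". That reading, the function name `gated_cross_gain` and the example numbers are assumptions based on the verbal description.

```python
def gated_cross_gain(x: complex, u1: complex, u2: complex, g: float, q: float) -> float:
    """Return 0 when |x| > q*min(|u1|, |u2|), otherwise the unmodified cross product gain g."""
    if abs(x) > q * min(abs(u1), abs(u2)):
        return 0.0                     # direct term already dominates this synthesis subband
    return g

weak_direct = gated_cross_gain(x=0.1, u1=1.0, u2=0.5, g=1.0, q=2.0)
strong_direct = gated_cross_gain(x=3.0, u1=1.0, u2=0.5, g=1.0, q=2.0)
print(weak_direct, strong_direct)
```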
In the following, the harmonic transposition method outlined in the present document will be described for exemplary spectral configurations to illustrate the enhancements over the prior art. Fig. 10 illustrates the effect of direct harmonic transposition of order T = 2 . The top diagram 1001 depicts the partial frequency components of the original signal by vertical arrows positioned at multiples of the fundamental frequency Ω. It illustrates the source signal, e.g. at the encoder side. The diagram 1001 is segmented into a left sided source frequency range with the partial frequencies Ω,2Ω,3Ω,4Ω,5Ω and a right sided target frequency range with partial frequencies 6Ω,7Ω,8Ω. The source frequency range will typically be encoded and transmitted to the decoder. On the other hand, the right sided target frequency range, which comprises the partials 6Ω,7Ω,8Ω above the cross over frequency 1005 of the HFR method, will typically not be transmitted to the decoder. It is an object of the harmonic transposition method to reconstruct the target frequency range above the cross-over frequency 1005 of the source signal from the source frequency range. Consequently, the target frequency range, and notably the partials 6Ω,7Ω,8Ω in diagram 1001 are not available as input to the transposer.
As outlined above, it is the aim of the harmonic transposition method to regenerate the signal components 6Ω,7Ω,8Ω of the source signal from frequency components available in the source frequency range. The bottom diagram 1002 shows the output of the transposer in the right sided target frequency range. Such transposer may e.g. be placed at the decoder side. The partials at frequencies 6Ω and 8Ω are regenerated from the partials at frequencies 3Ω and 4Ω by harmonic transposition using an order of transposition T = 2 . As a result of a spectral stretching effect of the harmonic transposition, depicted here by the dotted arrows 1003 and 1004, the target partial at 7Ω is missing. This target partial at 7Ω can not be generated using the underlying prior art harmonic transposition method.
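The gap at 7Ω can be reproduced by listing partial indices in units of Ω. The sketch below also lists what cross terms between neighboring source partials would add under the mapping (T−r)ω + r(ω+Ω); the helper names and the restriction to adjacent partials are assumptions, and filter bank effects are ignored.

```python
def direct_partials(source, T):
    """Partial indices produced by direct order-T transposition of each source partial."""
    return {T * k for k in source}

def cross_partials(source, T, r):
    """Partial indices (T-r)*k + r*(k+1) formed from neighboring source partials k and k+1."""
    present = set(source)
    return {(T - r) * k + r * (k + 1) for k in source if k + 1 in present}

source = [1, 2, 3, 4, 5]               # partials below the cross-over frequency
target = [6, 7, 8]                     # partials to be regenerated
direct = direct_partials(source, T=2)
cross = cross_partials(source, T=2, r=1)
missing_direct = [k for k in target if k not in direct]
missing_with_cross = [k for k in target if k not in direct | cross]
print(missing_direct, missing_with_cross)
```

The cross term formed from the partials at 3Ω and 4Ω lands at 2·3Ω + Ω = 7Ω, the partial that direct transposition of order 2 leaves out.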
The bottom diagram 1202 shows the regenerated partials 6Ω and 8Ω superimposed with the stylized frequency responses, e.g. reference sign 1207, of selected synthesis filter bank subbands. As described earlier, these subbands have a T = 2 times coarser frequency spacing. Correspondingly, also the frequency responses are scaled by the factor T = 2 . As outlined above, the prior art direct term processing method modifies the phase of each analysis subband, i.e. of each subband below the cross-over frequency 1205 in diagram 1201, by a factor T = 2 and maps the result into the synthesis subband with the same index, i.e. a subband above the cross-over frequency 1205 in diagram 1202. This is symbolized in Fig. 12 by diagonal dotted arrows, e.g. arrow 1208 for the analysis subband 1206 and the synthesis subband 1207. The result of this direct term processing for subbands with subband indexes 9 to 16 from the analysis subband 1201 is the regeneration of the two target partials at frequencies 6Ω and 8Ω in the synthesis subband 1202 from the source partials at frequencies 3Ω and 4Ω. As can be seen from Fig. 12 , the main contribution to the target partial 6Ω comes from the subbands with the subband indexes 10 and 11, i.e. reference signs 1209 and 1210, and the main contribution to the target partial 8Ω comes from the subband with subband index 14, i.e. reference sign 1211.
As can be seen from Fig. 13 , the partial 7Ω is placed primarily within the subband 1315 with index 12 and only secondarily in the subband 1316 with index 13. Consequently, for more realistic filter responses, there will be more direct and/or cross terms around synthesis subband 1315 with index 12 which add beneficially to the synthesis of a high quality sinusoid at frequency (T―r)ω+r(ω+Ω)=Tω+rΩ=6Ω+Ω=7Ω than terms around synthesis subband 1316 with index 13. Furthermore, as highlighted in the context of formula (13), a blind addition of all cross terms with p 1=p 2=2 could lead to unwanted signal components for less periodic and academic input signals. Consequently, this phenomenon of unwanted signal components may require the application of an adaptive cross product cancellation rule such as the rule given by formula (13).
The prior art direct term processing modifies the phase of the subband signals by a factor T = 3 for each analysis subband and maps the result into the synthesis subband with the same index, as symbolized by the diagonal dotted arrows. The result of this direct term processing for subbands 6 to 11 is the regeneration of the two target partial frequencies 6Ω and 9Ω from the source partials at frequencies 2Ω and 3Ω. As can be seen from Fig. 16 , the main contribution to the target partial 6Ω comes from subband with index 7, i.e. reference sign 1606, and the main contributions to the target partial 9Ω comes from subbands with index 10 and 11, i.e. reference signs 1607 and 1608, respectively.
As shown in Fig. 17 , the synthesis subband with index 8, i.e. reference sign 1710, is obtained from a cross product formed from the analysis subbands with index (n―p 1)=8―1=7, i.e. reference sign 1706, and (n+p 2)=8+2=10, i.e. reference sign 1708. For the synthesis subband with index 9, a cross product is formed from analysis subbands with index (n―p 1)=9―1=8, i.e. reference sign 1707, and (n+p 2)=9+2=11, i.e. reference sign 1709. This process of forming cross products is symbolized by the diagonal dashed/dotted arrow pairs, i.e. arrow pair 1712, 1713 and 1714, 1715, respectively. It can be seen from Fig. 17 that the partial frequency 7Ω is positioned more prominently in subband 1710 than in subband 1711. Consequently, it is to be expected that for realistic filter responses, there will be more cross terms around synthesis subband with index 8, i.e. subband 1710, which add beneficially to the synthesis of a high quality sinusoid at frequency (T―r)ω+r(ω+Ω)=Tω+rΩ=6Ω+Ω=7Ω.
In the following, reference is made to Figures 23 and 24 which illustrate the Max-Min optimization based selection procedure (12) for the index shift pair (p1,p2) and r according to this rule for T=3. The chosen target subband index is n = 18 and the top diagram furnishes an example of the magnitude of a subband signal for a given time index. The list of positive integers is given here by the seven values L={2,3,...,8}.
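A Max-Min selection of this kind might be sketched as follows. Formula (12) is not reproduced in this part of the document, so the sketch rests on two assumptions: that the candidate index shifts are p1 = r·l and p2 = (T-r)·l for l in L and 1 ≤ r < T (which is consistent with the pair (p1,p2) = (1,2) used for T = 3 in the Fig. 17 example), and that the chosen candidate maximizes the smaller of the two analysis subband magnitudes. The function name and the magnitude values are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def max_min_select(mags: Sequence[float], n: int, T: int,
                   L: Sequence[int]) -> Optional[Tuple[float, int, int, int]]:
    """Return (score, p1, p2, r) maximizing min(|x[n-p1]|, |x[n+p2]|).

    Assumption: candidates are p1 = r*l and p2 = (T-r)*l for l in L, 1 <= r < T.
    Candidates that fall outside the available subbands are skipped.
    """
    best = None
    for l in L:
        for r in range(1, T):
            p1, p2 = r * l, (T - r) * l
            lo, hi = n - p1, n + p2
            if lo < 0 or hi >= len(mags):
                continue
            score = min(mags[lo], mags[hi])
            if best is None or score > best[0]:
                best = (score, p1, p2, r)
    return best

# Hypothetical subband magnitudes for one time index: a low noise floor
# with two strong partials in subbands 13 and 28.
mags = [0.1] * 40
mags[13] = 1.0
mags[28] = 0.8

print(max_min_select(mags, n=18, T=3, L=range(2, 9)))  # (0.8, 5, 10, 1)
```

With these made-up magnitudes the winning candidate uses l = 5 and r = 1, i.e. analysis subbands 13 and 28 for the target subband 18. A cancellation rule such as formula (13) would then decide whether the selected cross term is actually added; that step is not sketched here.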
It should furthermore be noted that when the input signal z(t) is a harmonic series with a fundamental frequency Ω, i.e. with a fundamental frequency which corresponds to the cross product enhancement pitch parameter, and Ω is sufficiently large compared to the frequency resolution of the analysis filter bank, the analysis subband signals x_n(k) given by formula (6) and x'_n(k) given by formula (8) are good approximations of the analysis of the input signal z(t), where the approximation is valid in different subband regions. It follows from a comparison of formulas (6) and (8-10) that a harmonic phase evolution along the frequency axis of the input signal z(t) will be extrapolated correctly by the present invention. This holds in particular for a pure pulse train. For the output audio quality, this is an attractive feature for signals of pulse train like character, such as those produced by human voices and some musical instruments.
In the following, reference is made to Fig. 28 and Fig. 29 which illustrate an exemplary encoder 2800 and an exemplary decoder 2900, respectively, for unified speech and audio coding (USAC). The general structure of the USAC encoder 2800 and decoder 2900 is described as follows: First there may be a common pre/postprocessing consisting of an MPEG Surround (MPEGS) functional unit to handle stereo or multi-channel processing and an enhanced SBR (eSBR) unit 2801 and 2901, respectively, which handles the parametric representation of the higher audio frequencies in the input signal and which may make use of the harmonic transposition methods outlined in the present document. Then there are two branches, one consisting of a modified Advanced Audio Coding (AAC) tool path and the other consisting of a linear prediction coding (LP or LPC domain) based path, which in turn features either a frequency domain representation or a time domain representation of the LPC residual. All transmitted spectra for both AAC and LPC may be represented in the MDCT domain following quantization and arithmetic coding. The time domain representation uses an ACELP excitation coding scheme.
The enhanced Spectral Band Replication (eSBR) unit 2801 of the encoder 2800 may comprise the high frequency reconstruction systems outlined in the present document. In particular, the eSBR unit 2801 may comprise an analysis filter bank 301 in order to generate a plurality of analysis subband signals.
These analysis subband signals may then be transposed in a non-linear processing unit 302 to generate a plurality of synthesis subband signals, which may then be input to a synthesis filter bank 303 in order to generate a high frequency component. In the eSBR unit 2801, on the encoding side, a set of information may be determined on how to generate a high frequency component from the low frequency component which best matches the high frequency component of the original signal. This set of information may comprise information on signal characteristics, such as a predominant fundamental frequency Ω, on the spectral envelope of the high frequency component, and it may comprise information on how to best combine analysis subband signals, i.e. information such as a limited set of index shift pairs (p1,p2). Encoded data related to this set of information is merged with the other encoded information in a bitstream multiplexer and forwarded as an encoded audio stream to a corresponding decoder 2900.
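How a non-linear processing unit might combine two analysis subband samples into one synthesis subband sample can be sketched as below. The phase rule, a weighted sum with weights (T-r) and r, follows from the synthesis frequency (T-r)ω+r(ω+Ω) used throughout the document. The magnitude rule is an assumption: the claims only require a generalized mean of the two magnitudes, and the weighted geometric mean used here is one possible choice. The function name `cross_product` is hypothetical.

```python
import cmath

def cross_product(u1: complex, u2: complex, T: int, r: int) -> complex:
    """Combine two analysis subband samples into one synthesis subband sample.

    Phase: (T - r)*arg(u1) + r*arg(u2), a weighted sum of the input phases.
    Magnitude: |u1|**(1 - r/T) * |u2|**(r/T), a weighted geometric mean
    (an assumed instance of a generalized mean).
    """
    rho = r / T
    magnitude = abs(u1) ** (1.0 - rho) * abs(u2) ** rho
    phase = (T - r) * cmath.phase(u1) + r * cmath.phase(u2)
    return magnitude * cmath.exp(1j * phase)

# Two hypothetical subband samples with phases 0.10 and 0.25 radians.
u1 = 2.0 * cmath.exp(1j * 0.10)
u2 = 0.5 * cmath.exp(1j * 0.25)
y = cross_product(u1, u2, T=3, r=1)
print(abs(y), cmath.phase(y))  # magnitude 2**(1/3), phase 2*0.10 + 0.25 = 0.45
```

Because the phases add with weights (T-r) and r, two inputs that rotate at ω and ω+Ω produce an output that rotates at Tω+rΩ.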
The decoder 2900 shown in Fig. 29 also comprises an enhanced Spectral Band Replication (eSBR) unit 2901. This eSBR unit 2901 receives the encoded audio bitstream or the encoded signal from the encoder 2800 and uses the methods outlined in the present document to generate a high frequency component of the signal, which is merged with the decoded low frequency component to yield a decoded signal. The eSBR unit 2901 may comprise the different components outlined in the present document. In particular, it may comprise an analysis filter bank 301, a non-linear processing unit 302 and a synthesis filter bank 303. The eSBR unit 2901 may use information on the high frequency component provided by the encoder 2800 in order to perform the high frequency reconstruction. Such information may be a fundamental frequency Ω of the signal, the spectral envelope of the original high frequency component and/or information on the analysis subbands which are to be used in order to generate the synthesis subband signals and ultimately the high frequency component of the decoded signal.
Furthermore, Figs. 28 and 29 illustrate possible additional components of a USAC encoder/decoder, such as:
- a bitstream payload demultiplexer tool, which separates the bitstream payload into the parts for each tool, and provides each of the tools with the bitstream payload information related to that tool;
- a scalefactor noiseless decoding tool, which takes information from the bitstream payload demultiplexer, parses that information, and decodes the Huffman and DPCM coded scalefactors;
- a spectral noiseless decoding tool, which takes information from the bitstream payload demultiplexer, parses that information, decodes the arithmetically coded data, and reconstructs the quantized spectra;
- an inverse quantizer tool, which takes the quantized values for the spectra, and converts the integer values to the non-scaled, reconstructed spectra; this quantizer is preferably a companding quantizer, whose companding factor depends on the chosen core coding mode;
- a noise filling tool, which is used to fill spectral gaps in the decoded spectra, which occur when spectral values are quantized to zero e.g. due to a strong restriction on bit demand in the encoder;
- a rescaling tool, which converts the integer representation of the scalefactors to the actual values, and multiplies the un-scaled inversely quantized spectra by the relevant scalefactors;
- an M/S tool, as described in ISO/IEC 14496-3;
- a temporal noise shaping (TNS) tool, as described in ISO/IEC 14496-3;
- a filter bank / block switching tool, which applies the inverse of the frequency mapping that was carried out in the encoder; an inverse modified discrete cosine transform (IMDCT) is preferably used for the filter bank tool;
- a time-warped filter bank / block switching tool, which replaces the normal filter bank / block switching tool when the time warping mode is enabled; the filter bank preferably is the same (IMDCT) as for the normal filter bank, additionally the windowed time domain samples are mapped from the warped time domain to the linear time domain by time-varying resampling;
- an MPEG Surround (MPEGS) tool, which produces multiple signals from one or more input signals by applying a sophisticated upmix procedure to the input signal(s) controlled by appropriate spatial parameters; in the USAC context, MPEGS is preferably used for coding a multichannel signal, by transmitting parametric side information alongside a transmitted downmixed signal;
- a Signal Classifier tool, which analyses the original input signal and generates from it control information which triggers the selection of the different coding modes; the analysis of the input signal is typically implementation dependent and will try to choose the optimal core coding mode for a given input signal frame; the output of the signal classifier may optionally also be used to influence the behaviour of other tools, for example MPEG Surround, enhanced SBR, time-warped filterbank and others;
- an LPC filter tool, which produces a time domain signal from an excitation domain signal by filtering the reconstructed excitation signal through a linear prediction synthesis filter; and
- an ACELP tool, which provides a way to efficiently represent a time domain excitation signal by combining a long term predictor (adaptive codeword) with a pulse-like sequence (innovation codeword).
In Fig. 30 the low frequency component 3013 is fed into a QMF filter bank, in order to generate QMF frequency bands. These QMF frequency bands are not to be confused with the analysis subbands outlined in this document. The QMF frequency bands are used for the purpose of manipulating and merging the low and high frequency components of the signal in the frequency domain, rather than in the time domain. The low frequency component 3014 is fed into the transposition unit 3004 which corresponds to the systems for high frequency reconstruction outlined in the present document. The transposition unit 3004 may also receive additional information 3011, such as the fundamental frequency Ω of the encoded signal and/or possible index shift pairs (p1,p2) for subband selection. The transposition unit 3004 generates a high frequency component 3012, also known as highband, of the signal, which is transformed into the frequency domain by a QMF filter bank 3003. Both the QMF transformed low frequency component and the QMF transformed high frequency component are fed into a manipulation and merging unit 3005. This unit 3005 may perform an envelope adjustment of the high frequency component and combines the adjusted high frequency component and the low frequency component. The combined output signal is re-transformed into the time domain by an inverse QMF filter bank 3001.
Typically the QMF filter banks comprise 64 QMF frequency bands. It should be noted, however, that it may be beneficial to down-sample the low frequency component 3013, such that the QMF filter bank 3002 only requires 32 QMF frequency bands. In such cases, the low frequency component 3013 has a bandwidth of ƒs /4, where ƒs is the sampling frequency of the signal. On the other hand, the high frequency component 3012 has a bandwidth of ƒs /2.
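The band counts and bandwidths above can be related by simple arithmetic. The sketch assumes uniformly spaced QMF bands and a down-sampling factor of 2, neither of which is stated explicitly here; the function name `band_width` is hypothetical and frequencies are expressed as fractions of ƒs.

```python
from fractions import Fraction

def band_width(num_bands: int, sample_rate: Fraction) -> Fraction:
    """Width of one band when num_bands uniform bands cover 0 .. sample_rate/2."""
    return (sample_rate / 2) / num_bands

FS = Fraction(1)  # sampling frequency, normalized to 1

full = band_width(64, FS)             # 64 bands at the full rate
downsampled = band_width(32, FS / 2)  # 32 bands after down-sampling by 2

print(full, downsampled)  # both 1/128 of fs, so the band grids line up
print(32 * downsampled)   # 1/4, i.e. the low frequency component spans fs/4
```

Under these assumptions the 32-band filter bank has the same band spacing as the 64-band one, which is why its output can be merged with the high frequency bands in unit 3005.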
The method and system described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the internet. Typical devices making use of the method and system described in the present document are set-top boxes or other customer premises equipment which decode audio signals.
On the encoding side, the method and system may be used in broadcasting stations, e.g. in video headend systems.
The present document has outlined a method and a system for performing high frequency reconstruction of a signal based on the low frequency component of that signal. By using combinations of subbands from the low frequency component, the method and system allow the reconstruction of frequencies and frequency bands which may not be generated by transposition methods known from the art. Furthermore, the described HFR method and system allow the use of low cross over frequencies and/or the generation of large high frequency bands from narrow low frequency bands.
Claims (12)
- A system for decoding an audio signal, the system comprising: a core decoder (101) for decoding a low frequency component of the audio signal; an analysis filter bank (301) for providing a plurality of analysis subband signals of the low frequency component of the audio signal; a subband selection reception unit for receiving information associated with a fundamental frequency Ω of the audio signal, and for selecting, in response to the information, a first (801) and a second (802) analysis subband signal from the plurality of analysis subband signals, from which a synthesis subband signal (803) is generated; a non-linear processing unit (302) to generate the synthesis subband signal with a synthesis frequency, a magnitude and a phase by: determining the magnitude of the synthesis subband signal from a generalized mean value of the magnitudes of the first and the second analysis subband signals, and determining the phase of the synthesis subband signal from a weighted sum of the phases of the first and the second analysis subband signals; and a synthesis filter bank (303) for generating a high frequency component of the audio signal from the synthesis subband signal; wherein the information associated with a fundamental frequency Ω of the audio signal is received in an encoded bit stream.
- The system according to claim 1, wherein the analysis filter bank (301) has N analysis subbands at an essentially constant subband spacing of Δω; an analysis subband is associated with an analysis subband index n, with n∈{1,...,N}; the synthesis filter bank (303) has a synthesis subband; the synthesis subband is associated with a synthesis subband index n; and the synthesis subband and the analysis subband with index n each comprise frequency ranges which relate to each other through a factor T.
- The system according to claim 2, further comprising: an analysis window (2001), which isolates a pre-defined time interval of the low frequency component around a pre-defined time instance k; and a synthesis window (2201), which isolates a pre-defined time interval of the high frequency component around the pre-defined time instance k.
- The system according to claim 3, wherein the synthesis window (2201) is a time-scaled version of the analysis window (2001).
- The system according to claim 1, further comprising: an upsampler (104) for performing an upsampling of the low frequency component to yield an upsampled low frequency component; an envelope adjuster (103) to shape the high frequency component; and a component summing unit to determine a decoded audio signal as the sum of the upsampled low frequency component and the adjusted high frequency component.
- The system according to claim 5, further comprising an envelope reception unit for receiving information related to the envelope of the high frequency component of the audio signal.
- The system according to claim 6, further comprising: an input unit for receiving the audio signal, comprising the low frequency component; and an output unit for providing the decoded audio signal, comprising the low and the generated high frequency component.
- The system according to claim 1, wherein the non-linear processing unit (302) comprises a multiple-input-single-output unit (800-n) of a first and second transposition order for generating the synthesis subband signal (803) with the synthesis frequency from the first (801) and the second (802) analysis subband signals with a first and a second analysis frequency, respectively; wherein the synthesis frequency corresponds to the first analysis frequency multiplied by the first transposition order plus the second analysis frequency multiplied by the second transposition order.
- The system according to claim 8, wherein: the first analysis frequency is ω; the second analysis frequency is (ω+Ω); the first transposition order is (T-r); the second transposition order is r; T>1; and 1 ≤ r < T; such that the synthesis frequency is (T-r)·ω+r·(ω+Ω).
- The system according to claim 1, wherein the analysis filter bank (301) exhibits a frequency spacing which is associated with the fundamental frequency Ω of the audio signal.
- A method for decoding an audio signal, the method comprising: decoding a low frequency component of the audio signal; providing a plurality of analysis subband signals of the low frequency component of the audio signal; receiving information associated with a fundamental frequency Ω of the audio signal which allows the selection of a first (801) and a second (802) analysis subband signal from the plurality of analysis subband signals; generating a synthesis subband signal with a synthesis frequency, a magnitude and a phase by: determining the magnitude of the synthesis subband signal from a mean value of the magnitudes of the first and the second analysis subband signals, and determining the phase of the synthesis subband signal from a weighted sum of the phases of the first and second analysis subband signals; and generating (303) a high frequency component of the audio signal from the synthesis subband signal; wherein the information associated with a fundamental frequency Ω of the audio signal is received in an encoded bit stream.
- A storage medium comprising a software program adapted for execution on a processor and for performing the method steps of claim 11 when carried out on a computing device.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US61/145,223 | 2009-01-16 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40021957A (en) | 2020-11-13 |
| HK40021957B (en) | 2022-04-08 |