HK1143240B - Efficient design of mdct / imdct filterbanks for speech and audio coding applications - Google Patents
Efficient design of mdct / imdct filterbanks for speech and audio coding applications Download PDFInfo
- Publication number
- HK1143240B HK1143240B HK10109678.8A HK10109678A HK1143240B HK 1143240 B HK1143240 B HK 1143240B HK 10109678 A HK10109678 A HK 10109678A HK 1143240 B HK1143240 B HK 1143240B
- Authority
- HK
- Hong Kong
- Prior art keywords
- windowing
- transform
- factors
- dct
- piece
- Prior art date
Links
Abstract
A more efficient encoder/decoder is provided in which an N-point MDCT transform is mapped into smaller sized N/2 -point DCT-IV and/or DCT-II transforms with isolated pre-multiplications which can be moved to a prior or subsequent windowing stage. That is, the windowing operations may be merged with first/last stage multiplications in the core MDCT/IMDCT functions, respectively, thus reducing the total number of multiplications. Additionally, the MDCT may be systematically decimated by factor of 2 by utilizing a uniformly scaled 5-point DCT-II core function as opposed to the DCT-IV or FFT cores used in many existing MDCT designs in audio codecs. The modified windowing stage merges factors from a transform stage and windowing stage to obtain piece-wise symmetric windowing factors, which can be represented by a sub-set of the piece-wise symmetric windowing factors to save storage space. Such features offer appreciable reduction in complexity and less memory usage than the prior art.
Description
Claiming priority in accordance with 35U.S.C. § 119
The present patent application claims priority of united states provisional application No. 60/973,709 entitled "Design of Fast MDCT/IMDCT Module for g.vbr Codec" filed on 9/19/2007 and united states provisional application No. 60/989,400 entitled "Design of Fixed-Point MDCT/IMDCT Module for g.vbr Codec" filed on 11/20/2007, both of which are assigned to their assignee and are expressly incorporated herein by reference.
Technical Field
The following description relates generally to encoders and decoders, and in particular, to an efficient MDCT/IMDCT implementation for speech and audio codecs.
Background
One goal of audio coding is to compress an audio signal to a desired limited amount of information while preserving as much of the original sound quality as possible. In the encoding process, the audio signal in the time domain is transformed into the frequency domain, and the corresponding decoding process is reversed to this operation.
As part of this encoding process, the signal may be processed by a Modified Discrete Cosine Transform (MDCT). Modified Discrete Cosine Transform (MDCT) is a fourier-related transform based on the class IV discrete cosine transform (DCT-IV), with the additional property that blocks are overlapped so that the end of a block coincides with the beginning of the next block. This overlap helps to avoid aliasing artifacts (aliasing artifacts) and, together with the energy-concentrating quality of the DCT, makes MDCT particularly attractive for signal compression applications.
MDCT transforms have also been applied in speech compression. The ITU-T G.722.1 and G.722.1C vocoders apply the MDCT to the input speech signal, which more recent ITU-T G.729.1 and G.718 algorithms use to process the residual signal that remains after the use of a Code Excited Linear Prediction (CELP) encoder. The vocoder mentioned above operates at an input sampling rate of 8kHz or 16kHz, and 10 or 20 millisecond frames. Therefore, its MDCT filter bank is a 160-point or 320-point transform.
However, if future speech coders will support block switching functionality, support for decimation sizes (160 points, 80 points, 40 points) may also be required.
Disclosure of Invention
The following presents a simplified summary of one or more embodiments in order to provide a basic understanding of some embodiments. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor delineate the scope of any or all embodiments. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later.
An encoding method and/or apparatus is provided for computing transform values. Time domain input values representing an audio signal are received. A modified windowing function may be generated or obtained that combines factors from the transform operation and the windowing operation to obtain a piece-wise symmetric windowing factor. A subset of the piece-wise symmetric windowing factors is stored, from which a complete set of piece-wise symmetric windowing factors can be reconstructed. The stored subset of segment-wise symmetric windowing factors may include at least half of the unique factors for each segment-wise symmetric set of windowing factors. The complete set of reconstructed piece-wise symmetric windowing factors may be applied to the input values before transforming the input values. Input values may be transformed into spectral coefficients using a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT-IV), a class II discrete cosine transform (DCT-II), or both DCT-IV and DCT-II, with each such transform having dimensions smaller than the MDCT, with at least some multiplication operations of the MDCT being merged with prior windowing operations applied to the input values. DCT-II may be a 5-point transform that may implement MDCTs of different sizes. The MDCT may implement at least two of 320-point, 160-point, 80-point, 40-point transforms using the same DCT-II. For fixed-point implementations, dynamic range estimation and renormalization may also be performed on the output from the windowing function.
Decoding methods and/or apparatus are provided for computing transform values. Spectral coefficients representing an audio signal are received. The spectral coefficients may be transformed into time-domain output values using an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT-IV), a class II inverse discrete cosine transform (IDCT-II), or both IDCT-IV and IDCT-II, wherein each such inverse transform has a smaller dimension than the IMDCT, wherein at least some multiplication operations of the IMDCT are combined with subsequent windowing operations applied to the output values. For example, IDCT-II is a 5-point inverse transform that implements IMDCTs of different sizes. IMDCT may implement at least two of 320-point, 160-point, 80-point, 40-point inverse transforms using the same core IDCT-II. In addition, a modified windowing function may be generated that combines factors from the transform operation and the windowing operation to obtain piece-wise symmetric windowing factors. A subset of the piece-wise symmetric windowing factors may be stored, from which a complete set of piece-wise symmetric windowing factors may be reconstructed. The stored subset of segment-wise symmetric windowing factors includes at least half of the unique factors for each segment-wise symmetric set of windowing factors. The complete set of reconstructed piece-wise symmetric windowing factors may be applied to the output values after the spectral coefficients are transformed. For fixed-point implementations, dynamic range estimation and renormalization may be performed on the output from the windowing function.
Yet another example demonstrates a method and/or apparatus for performing a windowing operation. A modified windowing function may be generated that combines factors from the transform stage and the windowing stage to obtain a piece-wise symmetric windowing factor. The piece-wise symmetric windowing factors may be split to obtain a subset of the piece-wise symmetric windowing factors and reduce the total number of unique factors. A subset of the piece-wise symmetric windowing factors may be stored, from which a complete set of piece-wise symmetric windowing factors may be reconstructed. The stored subset of segment-wise symmetric windowing factors includes at least half of the unique factors for each segment-wise symmetric set of windowing factors. Subsequently, an input value representing the audio signal may be received. The complete set of reconstructed piece-wise symmetric windowing factors may be applied to the input values to provide windowed output values.
In one example, the windowing stage may occur prior to the transform stage. The transform stage may implement a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT IV), a class II discrete cosine transform (DCT II), or both DCT IV and DCT II, wherein each such transform has smaller dimensions than the MDCT. The transform level factor may be a cosine factor.
In another example, the windowing stage may occur after the transform stage. The transform stage may implement an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT IV), or both IDCT IV and IDCT II, with each such transform having smaller dimensions than the IMDCT.
Drawings
Various features, properties and advantages may be apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout.
Fig. 1 is a block diagram illustrating an example of an encoder that may include an MDCT analysis filter bank.
Fig. 2 is a block diagram illustrating an example in which a decoder including an IMDCT synthesis filter bank may be implemented.
Fig. 3 illustrates the manner in which an MDCT transform may be implemented based on an N/2-point DCT-IV core function.
FIG. 4 illustrates the manner in which the IMDCT transform may be implemented based on the N/2-point IDCT-IV core function.
FIG. 5 is a diagram illustrating a 5-point DCT-II transform that may be implemented as part of an encoder MDCT transform.
Fig. 6 is a diagram illustrating a 5-point IDCT-II transform that may be implemented as part of a decoder IMDCT transform.
Fig. 7 is a block diagram illustrating an example of a manner in which a DCT-IV transform of length N-10 points may be implemented using two DCT-II transforms.
Fig. 8 is a block diagram illustrating an example of a manner in which an IDCT-IV transform of length N-10 points may be implemented using two IDCT-II transforms.
FIG. 9 is a graph illustrating the piece-wise symmetric nature of the windowing function.
FIG. 10 is a block diagram illustrating an apparatus for computing transform values.
Fig. 11 illustrates an example of a method for encoding a signal using an MDCT transform based on a core DCT-II transform.
FIG. 12 is a block diagram illustrating an apparatus for computing transform values.
Fig. 13 illustrates an example of a method for decoding a signal using an IMDCT transform based on a core IDCT-II transform.
FIG. 14 is a block diagram illustrating a device for performing a windowing operation.
FIG. 15 illustrates an example of a method for performing a windowing operation.
Detailed Description
Various embodiments are now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more embodiments.
Overview
One feature provides for implementing an N-point MDCT transform by mapping it to smaller size N/2-point DCT-IV and DCT-II transforms with isolated pre-multiplications that can be moved to subsequent windowing stages. That is, the windowing operations may be merged with the first/last stage multiplications in the core MDCT/IMDCT functions, respectively, thus reducing the total number of multiplications. In addition, the MDCT can be systematically decimated by a factor of 2 by utilizing a uniformly scaled 5-point DCT-II core function (using up to 5 non-ordinary multiplications) as opposed to DCT-IV or FFT cores used for many existing MDCT designs in audio codecs.
The modified windowing stage provides a piece-wise symmetry factor that can be stored using a half factor. Such features provide significant reductions in complexity and less memory usage than the prior art.
Codec structure
Fig. 1 is a block diagram illustrating an example of an encoder that may include an MDCT analysis filter bank. The encoder 102 may receive an input audio signal 104. The MDCT analysis filter bank 106 (i.e., a class IV discrete cosine transform-based modified discrete cosine transform) operates to decompose the time-domain input audio signal 104 into a plurality of subband signals, and to transform the signal into the frequency domain, where each subband signal is converted into transform coefficients per subband per block. The resulting signal is then quantized by quantizer 108 and encoded by entropy encoder 110 to produce a bitstream 112 of digitized audio signals. According to one example, the MDCT analysis filter bank 106 may be implemented by a windowing function 114, a transform 116 (e.g., time domain to frequency domain), and/or a scaling function 118. The MDCT analysis filter bank 106 including the windowing function 114, the transform 116, and/or the scaling function 116 may be implemented in hardware (e.g., as a processor, circuitry, programmable logic device, etc.), software (e.g., instructions executable by a processor), and/or a combination thereof.
Fig. 2 is a block diagram illustrating an example in which a decoder including an IMDCT synthesis filter bank may be implemented. Decoder 202 may receive bit stream 204. The entropy decoder 206 decodes the bitstream 204, and the bitstream 204 is then dequantized by a dequantizer 208 to produce a frequency-domain signal. The IMDCT synthesis filter bank 210 (i.e., the inverse modified discrete cosine transform based on the class IV discrete cosine transform) operates to convert the frequency domain signal 104 back to the time domain audio signal 212. The IMDCT synthesis filter bank 210 may reverse the operation of the MDCT analysis filter bank 106. According to one example, IMDCT synthesis filter bank 210 may be implemented by scaling function 214, inverse transform 216 (e.g., frequency domain to time domain), and windowing emphasis and addition functions 218. The IMDCT synthesis filter bank 210 including the scaling function 214, the inverse transform 216, and/or the windowing function 218 may be implemented in hardware (e.g., as a processor, circuitry, programmable logic device, etc.), software (e.g., as instructions executable by a processor), and/or a combination thereof.
MDCT implementation using DCT-IV and DCT-II
According to a feature, transform 116 (fig. 1) and inverse transform 216 (fig. 2) may be decimated and implemented by one or more DCT-IV (and IDCT-IV) transforms, which may be implemented as one or more DCT-II (and IDCT-II) transforms, respectively.
The Modified Discrete Cosine Transform (MDCT) may be defined by the following equation:
k is 0, 1,.., N/2-1. (equation 1)
Similarly, the inverse mdct (imdct) may be defined by the following equation:
n-1, 0, 1. (equation 2)
Where { x (N), N-0, 1, … N-1, represents the input sequence of samples, N indicates the frame length, x (k) is the resulting MDCT coefficients, andrepresenting the reconstructed output.
Using the matrix notation, the MDCT transform can be represented by the matrix M:
wherein i is 0, 1,.., N/2-1; j-0, 1. (equation 3)
Thus, X is Mx andwhere x represents a matrix of input samples [ x (0) ], x (N-1)]TX denotes a matrix of the resulting MDCT coefficientsAnd isMatrix representing reconstructed outputs
To implement the MDCT transform, the MDCT transform may be mapped into an N/2-point DCT-IV core function. For example, the transform 116 of FIG. 1 may be implemented as one or more N/2-point DCT-IV transforms.
The DCT-IV transform may be defined as:
k-0, 1. (equation 4)
Meanwhile, the IDCT-IV transform may be defined as:
n-1, 0, 1. (equation 5)
The MDCT transform may be mapped to an N/2-point DCT-IV transform, such as
(equation 6)
And may map the IMDCT transform to an N/2-point IDCT-IV transform, e.g.
(equation 7)
Wherein
(equation 8)
Wherein IN/4Is an N/4 XN/4 unit matrix, and JN/4Is an N/4 XN/4 reverse order matrix and defines the matrix S as
(equation 9)
And CN/.2 IVIs an N/2 XN/2 DCT-IV matrix, canIt is defined as
i, j ═ 0, 1.., N/2-1 (equation 10)
The DCT-IV matrix may be mapped to a DCT-II transform by using the symmetry and involution properties of the DCT-IV matrix. The DCT-II transform may be defined as:
k-0, 1. (equation 11)
Likewise, the IDCT-II transform may be defined as:
n-1, 0, 1. (equation 12)
Wherein if k is 0, thenOtherwise is1。
Fig. 3 illustrates the manner in which an MDCT transform may be implemented based on an N/2-point DCT-IV core function. The MDCT transform may be implemented as part of an encoder that transforms time-domain input samples into frequency-domain output samples. For an input sequence X (3N/4) to X (N/4)304, the MDCT transform may be represented by a cosine factor 306 and a subsequent DCT-IV transform 302 to produce an output 308. As discussed below, the cosine factor 306 may be absorbed in a previous windowing stage/function within the encoder.
Similarly, FIG. 4 illustrates the manner in which the IMDCT transform may be implemented based on the N/2-point IDCT-IV core function. The IMDCT transform may be implemented as part of a decoder that transforms frequency-domain input samples into time-domain output samples. For the input sequence X (0) to X (N/2-1)404, the IMDCT transform may be represented by an IDCT-IV transform 402 followed by a cosine factor 406 to produce an output 408. As discussed below, the cosine factor 406 may be absorbed in a subsequent windowing stage/function within the decoder. Note that the IMDCT mapping and cosine factors illustrated in fig. 4 are used to reverse the operation of the MDCT mapping (fig. 3) assuming that the same windowing functions are used at both the encoder and the decoder.
The use of cosine factors 306 and 406 in both of these mappings (fig. 3 and 4) provides numerical stability at or near zero values, which may not be achieved with other types of factors (e.g., inverse cosine factors).
Note that the inputs to the MDCT and IMDCT transforms may be processed as frames or blocks having multiple data points. Therefore, in order for a vocoder (e.g., g.vbr codec) to support data blocks with frame lengths less than 320, a transform of the decimated size is required. For blocks with frame lengths 160, 80, 40, etc., it is observed that these sizes are all multiples of 5. Thus, the last non-reducible (by decimation technique) block size may use a transform of size 5. It was observed that in terms of decimation techniques, it is more efficient to design a 5-point DCT-II transform than a DCT-IV or FFF transform.
The DCT-IV transform may be mapped to a DCT-II transform, e.g.
(equation 13)
Wherein D is a diagonal matrix having the following elements
i ═ 0, 1,., N/2-1, (equation 14)
(equation 15)
And CN/2 IIMay be an N/2 XN/2 DCT-II matrix, which may be defined as
i, j ═ 0, 1.., N/2-1. (equation 16)
Fig. 5 is a diagram illustrating the factorization of a 5-point DCT-II transform that may be implemented as part of an encoder MDCT transform. Note that the factor a in this transform is a dyadic rational number, and thus multiplication by it is only a binary shift operation. This 5-point transform can be implemented using plane rotation and 5 multiplications or using 4 multiplications by factoring the plane rotation, or using lifting steps. For a 5-point sequence of input x 502, the output C of a 5-point DCT-II transform may be generated using 4 non-ordinary multiplications and 13 additions or 5 multiplications and 13 additionsII504. DCT-II transform output CIIThe generation is:
f1=x(0)+x(4);f2=x(4)-x(0)
f3=x(3)-x(1);f4=x(3)+x(1)
f5=f1+f4;f6=f4-f1
g1=x(2)-αf5;g2=x(2)+f5
g3=βf2+γf3;g4=βf3-γf2
CII(0)=g2;CII(1)=g4;CII(2)=δf6-g1
CII(3)=g3;CII(4)=g1-δf6
fig. 6 is a diagram illustrating a 5-point IDCT-II transform that may be implemented as part of a decoder IMDCT transform. That is, this IDCT-II transform may be used to implement an IDCT-IV transform on the decoder IMDCT transform (fig. 4). It can be implemented using plane rotation and 5 multiplications or using 4 multiplications by factoring the plane rotation, or using lifting steps. For input CII602, the output of a 5-point IDCT-II transform may be generated using 4 non-ordinary multiplications and 13 additions or 5 non-ordinary multiplications and 12 additions as illustratedIDCT-II transform outputThe generation is:
a1=CII(2)+CII(4);a2=CII(4)-CII(2)
b1=CII(0)+a2;b2=CII(0)-αa2
b3=βCII(1)+γCII(3);b4=-βCII(3)+γCII(1)
b5=b2+δa1;b6=b2-δa1
fig. 7 is a block diagram illustrating an example of a manner in which a DCT-IV transform of length N-10 points may be implemented using two DCT-II transforms (N-5 points). For a sequence of ten input points x (0), …, x (9)702, a 10-point DCT-IV transform may be performed by two 5-point DCT-II transforms 704 and 706 and a factor 708 to produce an output coefficient CII(0),…,CII(9)710. In this way, a core 5-point DCT-II transform may be used to implement a transform capable of handling 160, 80, 40, etc. frame lengths.
Fig. 8 is a block diagram illustrating an example of a manner in which an IDCT-IV transform of length N-10 points may be implemented using two IDCT-II transforms (N-5 points). For ten input points CII(0),…,CII(9)802, a 10-point IDCT-IV transform may be implemented by two 5-point DCT-II transforms 804 and 806 and a factor 808 to produce output coefficients(9)810. In this way, a core 5-point IDCT-II transform may be used to implement a transform capable of handling frame lengths of 160, 80, 40, etc.
Merging multiplication factors in windowing stages
MDCT transforms are often used in speech and audio coding algorithms (e.g., g.vbr codecs), and are essentially scaled MDCTs combined with a windowing function h (n):
k-0,.., N-1. (equation 17)
Where f (n) indicates the input data samples, h (n) is a windowing function, and f (k) indicates the output MDCT spectral coefficients. For example, the windowing function h (n) may be a sine function:
(equation 18)
As previously discussed, mapping the DCT-IV transform to pre-multipliers involved in the MDCT transform (e.g., 306 in fig. 3), and mapping the IDCT-IV transform to post-multipliers involved in the IMDCT transform (e.g., 406 in fig. 4) may be incorporated in their respective windowing stages. For example, the windowing function may be a sinusoidal function, e.g., it may be defined as:
(equation 19)
The combination of this windowing function h (n) and the transform factor yields a modified windowing function:
(equation 20)
For 0 ≦ N < N/4:
(equation 21)
For N/4. ltoreq. N < 3N/4:
(equation 22)
These combined or combined windowing factors may be pre-computed and/or stored. In the case of cosine factor 306 in fig. 3 and cosine factor 406 in fig. 4, these provide the piecewise factor that was combined. Thus, only a subset (e.g., half) of the factors need to be stored for the modified windowing function. During the windowing operation on the values, a subset of the stored factors may be retrieved and used according to their piece-wise symmetric nature.
FIG. 9 is a graph illustrating the piece-wise symmetric nature of the windowing function (equation 20). In contrast to symmetric sinusoidal windows, since only half of the windowing factors 902 and 904 are stored, the same amount of memory can be used to store the windowing factors. In this example, for N-640 samples/factor and the illustrated piece-wise symmetric window, the first set of 160 samples (i.e., 0-N/4-1) may be represented by only the first 80 samples or factors 902 (since this is the symmetric portion). Likewise, the second set of 480 samples/factor (i.e., N/4 to N) may be represented by only the first 240 samples or factors 904. Thus, only half of the factor is stored, thereby saving memory space. In addition, since this reduction in sample points reduces the number of memory accesses to retrieve the windowing factor, it may also result in faster processing.
Example of encoding using MDCT transform
FIG. 10 is a block diagram illustrating an apparatus for computing transform values. The apparatus 1002 may include an input module 1006, a windowing module 1010, and/or a transformation module 1014. The input module 1006 may be adapted to receive the audio signal 1004 and provide a time domain input value 1008 representative of the audio signal. The windowing module 1010 may generate a modified windowing function that combines factors from the transform operation and the windowing operation to obtain piece-wise symmetric windowing factors. For example, the windowing module 1010 may include a merging module 1018, a factor splitting module 1019, a storage module 1020, and/or a windowing function 1022. The merge module 1018 may perform a function that merges factors from the transform operation and the windowing operation to obtain a piece-wise symmetric windowing factor. For example, the cosine factor 306 (FIG. 3) may be combined with other windowing function factors. The factor splitting module 1019 may then obtain a subset of segment-by-segment symmetric windowing factors (as illustrated in fig. 9). The storage module 1020 may then store a subset of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors may be reconstructed. For example, the subset of piece-wise symmetric windowing factors may comprise at least half of the unique factors for each piece-wise symmetric set of windowing factors. The windowing module 1010 (via windowing function 1022) may be further configured to apply the full set of reconstructed piece-wise symmetric windowing factors to the input values 1008 (thereby obtaining windowed input values 1012) prior to transforming the input values.
The transform module 1014 may transform the windowed input values 1012 into spectral coefficients 1016 using, for example, a Modified Discrete Cosine Transform (MDCT). The MDCT may be recursively split into at least one of a class IV discrete cosine transform (DCT-IV), a class II discrete cosine transform (DCT-II), or both, where each such transform has a smaller dimension than the MDCT, with at least some multiplication operations of the MDCT being merged with previous windowing operations applied to the input values. In one example, the DCT-II may be a 5-point transform that implements MDCTs of different sizes, such as the DCT-II illustrated in FIG. 5. The MDCT may implement at least two of 320-point, 160-point, 80-point, 40-point transforms using the same core DCT-II. The components of device 1002 may be implemented as hardware, software, and/or combinations thereof. For example, the device 1002 may be a processor and/or circuitry that implements the functionality of its components or modules.
Fig. 11 illustrates an example of a method for encoding a signal using an MDCT transform based on a core DCT-II transform. A time domain input value 1102 representing an audio signal may be received. For example, an analog audio signal (e.g., a voice signal, music, video, etc.) may be sampled to obtain input values.
In one example, a modified windowing function 1104 may be generated that combines factors from the transform operation and the windowing operation to obtain piece-wise symmetric windowing factors. A subset of the piece-wise symmetric windowing factors is then stored from which a complete set 1106 of piece-wise symmetric windowing factors can be reconstructed. The complete set of reconstructed piece-wise symmetric windowing factors may be applied to the input values 1108 prior to transforming the input values. For example, the cosine factor illustrated in fig. 3 (reference 306) for the MDCT transform may be applied at a previous windowing operation. The subset of piece-wise symmetric windowing factors may include at least half of the unique factors for each piece-wise symmetric set of windowing factors.
The resulting (windowed) input values (from the windowing operation) may be transformed into spectral coefficients using a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT-IV), a class II discrete cosine transform (DCT-II), or both DCT-IV and DCT-II, where each such transform has a smaller dimension than the MDCT, with at least some multiplication operations of the MDCT being merged 1110 with previous windowing operations applied to the input values. For example, the MDCT may be implemented based on a class IV discrete cosine transform (DCT-IV) that is implemented based on a core DCT-II (e.g., the transform in FIG. 5). DCT-II may be a 5-point transform that implements MDCTs of different sizes. For example, the MDCT may implement at least two of 320-point, 160-point, 80-point, 40-point transforms using the same core DCT-II. The core DCT-II may include five (5) multiplications and 12 additions or four (4) multiplications and 13 additions.
Additionally, for fixed-point implementations, dynamic range estimation and/or renormalization 1112 may be performed on the output from the windowing function. In one example, renormalization may be performed by shifting all remaining intermediate values (buffers), leaving at least one bit as headspace to prevent overflow in subsequent stages in the transform.
Example of decoding Using IMDCT transform
FIG. 12 is a block diagram illustrating an apparatus for computing transform values. The apparatus 1202 may include an input module 1206, an inverse transform module 1208, and/or a windowing module 1212. The inverse transform module 1208 may be adapted to transform the spectral coefficients 1204 into output values 1210. For example, the inverse transform module may transform the spectral coefficients into time-domain output values 1210 using an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT-IV), a class II inverse discrete cosine transform (IDCT-II), or both IDCT-IV and IDCT-II, wherein each such inverse transform has a smaller dimension than the IMDCT, wherein at least some multiplication operations of the IMDCT are merged with a subsequent windowing operation 1212 applied to the output values 1210.
The windowing module 1212 may generate a modified windowing function that combines factors from the transform operation and the windowing function to obtain a piece-wise symmetric windowing factor. For example, the windowing module 1212 may include a merge module 1218, a factor-split module 1219, a storage module 1220, and/or a windowing function 1222. The merge module 1218 may perform a function that merges the factors from the inverse transform operation and the windowing operation to obtain a piece-wise symmetric windowing factor. For example, the cosine factor 406 (FIG. 4) may be combined with other windowing function factors. The factor splitting module 1219 may then obtain a subset of the piece-wise symmetric windowing factors (as illustrated in fig. 9). The storage module 1220 may then store a subset of the piece-wise symmetric windowing factors from which a complete set of piece-wise symmetric windowing factors may be reconstructed. For example, the subset of piece-wise symmetric windowing factors may comprise at least half of the unique factors for each piece-wise symmetric set of windowing factors. The windowing module 1212 (via windowing function 1222) may be further configured to apply the complete set of reconstructed piece-wise symmetric windowing factors to the output values 1210 (thereby obtaining windowed output values 1214) after the transform of the spectral coefficients 1204. The components of the apparatus 1202 may be implemented as hardware, software, and/or combinations thereof. For example, the device 1202 may be a processor and/or circuitry that implements the functionality of its components or modules.
Fig. 13 illustrates an example of a method for decoding a signal using an IMDCT transform based on a core IDCT-II transform. Spectral coefficients 1302 representing an audio signal are received or obtained. The spectral coefficients may be transformed into time-domain output values using an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT-IV), a class II inverse discrete cosine transform (IDCT-II), or both IDCT-IV and IDCT-II, with each such inverse transform having a smaller dimension than the IMDCT, with at least some multiplication operations of the IMDCT being merged 1104 with subsequent windowing operations applied to the output values. The core IDCT-II may be a 5-point inverse transform that implements IMDCTs of different sizes. IMDCT implements at least two of 320-point, 160-point, 80-point, 40-point inverse transforms using the same core IDCT-II. In various embodiments, IDCT-II may include up to five (5) multiply operations and 12 additions or four (4) multiply operations and 13 additions.
Additionally, a modified windowing function 1106 may be generated that combines factors from the transform operation and the windowing operation to obtain piece-wise symmetric windowing factors. A subset of the piece-wise symmetric windowing factors may be stored from which the complete set of piece-wise symmetric windowing factors may be reconstructed 1308. The stored subset of segment-wise symmetric windowing factors may include at least half of the unique factors for each segment-wise symmetric set of windowing factors. The complete set of reconstructed piece-wise symmetric windowing factors may then be applied to the output values 1310 after the spectral coefficients are transformed.
Optionally, for a fixed-point implementation, dynamic range estimation and renormalization 1305 may be performed on the input to the windowing function. The dynamic range estimation and renormalization may be performed after all recursively processed inter-coefficient subtractions in the MDCT to DCT-IV mapping. Renormalization may be performed by shifting all remaining intermediate values (bit-shifting), leaving at least two bits as headroom to prevent overflow in subsequent transform stages. To compensate for the dynamic range extension, all intermediate stages in the IMDCT transform may perform a right shift of their resulting number by one bit.
Storage of segment-by-segment symmetric windowing factors
FIG. 14 is a block diagram illustrating a device for performing a windowing operation. The apparatus 1402 may include a merge module 1404, a factor split module 1405, a storage module 1406, a receiver module 1408, and/or a windowing module 1410. The merge module 1404 may configure or generate a modified windowing function that merges the factor 1412 from the transform stage and the factor 1414 from the windowing stage to obtain a piece-wise symmetric windowing factor 1420. Factor splitting module 1405 may split the complete set of piece-wise symmetric windowing factors 1420 into subsets 1423 of piece-wise symmetric windowing factors. Such splitting of factors is illustrated, for example, in fig. 9. The storage module 1406 may store a subset 1423 of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors 1420 may be reconstructed. The receiver module 1408 may receive input values 1416 representing an audio signal. The windowing module 1410 may apply the (reconstructed) complete set of reconstructed piece-wise symmetric windowing factors to the input values 1416 and provide windowed output values 1418. Thus, since only a subset of the windowing factors are stored, this saves storage space and makes the windowing device more efficient. The components of device 1402 may be implemented as hardware, software, and/or combinations thereof. For example, the device 1402 may be a processor and/or circuitry that implements the functionality of a component or module.
FIG. 15 illustrates an example of a method for performing a windowing operation. A modified windowing function 1502 may be generated that combines factors from the transform stage and the windowing stage to obtain piece-wise symmetric windowing factors. The set of piece-wise symmetric windowing factors may be split to obtain a subset of the piece-wise symmetric windowing factors and reduce the total number of unique factors 1504. A subset of the piece-wise symmetric windowing factors is stored from which a complete set of piece-wise symmetric windowing factors 1506 can be reconstructed. An input value 1508 representing an audio signal may be received. The complete set of reconstructed piece-wise symmetric windowing factors may be applied to the input values and provide windowed output values 1510. The subset of piece-wise symmetric windowing factors may include at least half of the unique factors for each piece-wise symmetric set of windowing factors.
In one example, the windowing stage occurs before the transform stage. In such cases, the transform stage may implement a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT IV), a class II discrete cosine transform (DCT II), or both a DCT IV and a DCT II, with each such transform having smaller dimensions than the MDCT. For example, the transform level factor may be the cosine factor of fig. 3.
In another example, the windowing stage may occur after the transform stage. The transform stage may implement an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT IV), a class II inverse discrete cosine transform (IDCT II), or both IDCT IV and IDCT II, with each such transform having smaller dimensions than the IMDCT. For example, the transform level factor may be the cosine factor of fig. 4.
In addition to the examples provided herein, any other transform that is a multiple of two may be implemented using the algorithm described herein that implements a decimation transform. Additionally, it should be noted that the techniques described herein may be applied to various types of signals including audio, voice, video, data, and so on.
Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, and the like referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The various illustrative logical blocks, modules, and circuits described herein, and algorithm steps may be implemented or performed in electronic hardware, software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Note that a configuration may be described as a process, which is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. Additionally, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, etc. When a procedure corresponds to a function, the termination of the procedure corresponds to the return of the function to the calling function or the main function.
When implemented in hardware, the various examples may employ a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device designed to perform the functions described herein, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
When implemented in software, various examples may employ firmware, middleware, or microcode. The program code or code segments to perform the necessary tasks may be stored in a computer readable medium such as a storage medium or other memory. The processor may perform the necessary tasks. A code segment may represent a program, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity (either hardware, firmware, a combination of hardware and software, or software in execution). For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from a component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
In one or more examples herein, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Software may comprise a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media. An exemplary storage medium may be coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the described embodiments, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
One or more of the components, steps, and/or functions illustrated in fig. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and/or 15 may be rearranged and/or combined into a single component, step, or function or implemented in several components, steps, or functions. Additional elements, components, steps, and/or functions may also be added. The apparatus, devices, and/or components illustrated in fig. 1, 2, 10, 12, and 14 may be configured or adapted to perform one or more of the methods, features, or steps described in fig. 3-9, 11, 13, and 15. For example, the algorithms described herein may be effectively implemented in software and/or embedded hardware.
It should be noted that the foregoing configuration is merely an example and should not be construed as limiting the claims. The description of the configuration is intended to be illustrative, and not to limit the scope of the claims. Also, the teachings of the present disclosure may be readily applied to other types of apparatuses and many alternatives, modifications, and variations will be apparent to those skilled in the art.
Claims (31)
1. A method of computing transform values, comprising:
receiving a time domain input value representing an audio signal;
transforming the input values into spectral coefficients using a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT-IV), a class II discrete cosine transform (DCT-II), or both a DCT-IV and a DCT-II, wherein each such transform has a smaller dimensionality than the MDCT, wherein at least some multiplication operations of the MDCT are merged with prior windowing operations applied to the input values; and
generating a modified windowing function, said modified windowing function combining factors from said transformation operation and said windowing operation to obtain piecewise symmetric windowing factors.
2. The method of claim 1, wherein the DCT-II is a 5-point transform that may implement MDCTs of different sizes.
3. The method of claim 1, wherein the DCT-II is defined by a factorization characterized by a transform:
wherein
4. The method of claim 1, wherein the MDCT implements at least two of 320-point, 160-point, 80-point, 40-point transforms using the same DCT-II.
5. The method of claim 1, wherein the DCT-II comprises a maximum of five multiplication operations.
6. The method of claim 1, further comprising:
storing a subset of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors can be reconstructed.
7. The method of claim 6, further comprising:
applying the complete set of reconstructed piece-wise symmetric windowing factors to the input values prior to transforming the input values.
8. The method of claim 6, wherein the subset of the piece-wise symmetric windowing factors comprises at least half of the unique factors for each piece-wise symmetric set of windowing factors.
9. The method of claim 1, further comprising:
dynamic range estimation and renormalization are performed on the output from the windowing function.
10. A device for computing transform values, comprising:
an input module that receives an audio signal and provides a time domain input value representative of the audio signal;
a transform module that transforms the input values into spectral coefficients using a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT-IV), a class II discrete cosine transform (DCT-II), or both DCT-IV and DCT-II, wherein each such transform has a dimension that is smaller than the MDCT, wherein at least some multiplication operations of the MDCT are merged with previous windowing operations applied to the input values; and
a windowing module for generating a modified windowing function that combines factors from the transformation operation and the windowing operation to obtain piecewise symmetric windowing factors.
11. The device of claim 10, wherein the DCT-II is a 5-point transform that implements MDCTs of different sizes.
12. The device of claim 10, wherein the DCT-II is defined by a factorization characterized by a transform:
wherein
13. The device of claim 10, wherein the MDCT implements at least two of 320-point, 160-point, 80-point, 40-point transforms using the same core DCT-II.
14. The device of claim 10, further comprising:
a storage module for storing a subset of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors can be reconstructed.
15. The device of claim 14, wherein the windowing module is further configured to apply the complete set of reconstructed piece-wise symmetric windowing factors to the input values prior to transforming the input values.
16. The device of claim 14, wherein the subset of the piece-wise symmetric windowing factors comprises at least half of the unique factors for each piece-wise symmetric set of windowing factors.
17. A device for computing transform values, comprising:
means for receiving a time domain input value representing an audio signal;
means for transforming the input values into spectral coefficients using a Modified Discrete Cosine Transform (MDCT) that is recursively split into at least one of a class IV discrete cosine transform (DCT-IV), a class II discrete cosine transform (DCT-II), or both DCT-IV and DCT-II, wherein each such transform has a dimension that is smaller than the MDCT, wherein at least some multiplication operations of the MDCT are merged with previous windowing operations applied to the input values; and
means for generating a modified windowing function that combines factors from the transformation operation and the windowing operation to obtain piecewise symmetric windowing factors.
18. The device of claim 17, wherein the DCT-II is a 5-point transform that may implement MDCTs of different sizes.
19. The device of claim 17, further comprising:
means for storing a subset of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors can be reconstructed; and
means for applying the complete set of reconstructed piece-wise symmetric windowing factors to the input values prior to transforming the input values.
20. A method of providing a decoder, comprising:
receiving spectral coefficients representing an audio signal;
transforming the spectral coefficients into time-domain output values using an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT-IV), a class II inverse discrete cosine transform (IDCT-II), or both IDCT-IV and IDCT-II, wherein each such inverse transform has a smaller dimension than the IMDCT, wherein at least some multiplication operations of the IMDCT are combined with subsequent windowing operations applied to the output values; and
generating a modified windowing function, said modified windowing function combining factors from said transformation operation and said windowing operation to obtain piecewise symmetric windowing factors.
21. The method of claim 20, wherein the IDCT-II is a 5-point inverse transform that implements IMDCTs of different sizes.
22. The method of claim 20, wherein the IDCT-II is defined by a factorization characterized by a transform:
wherein
23. The method of claim 20, wherein the IMDCT implements at least two of 320-point, 160-point, 80-point, 40-point inverse transforms using the same core IDCT-II.
24. The method of claim 20, further comprising:
storing a subset of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors can be reconstructed.
25. The method of claim 24, further comprising:
applying the complete set of reconstructed piece-wise symmetric windowing factors to the output values after transforming the spectral coefficients.
26. The method of claim 24, wherein the subset of the piece-wise symmetric windowing factors comprises at least half of the unique factors for each piece-wise symmetric set of windowing factors.
27. The method of claim 24, further comprising:
dynamic range estimation and renormalization are performed on the output from the windowing function.
28. A device for computing transform values, comprising:
an input module for receiving spectral coefficients representing an audio signal;
an inverse transform module for transforming the spectral coefficients into time-domain output values using an inverse modified discrete cosine transform, IMDCT, that is recursively split into at least one of class IV inverse discrete cosine transform, IDCT-IV, class II inverse discrete cosine transform, IDCT-II, or both IDCT-IV and IDCT-II, wherein each such inverse transform has a smaller dimension than the IMDCT, wherein at least some multiplication operations of the IMDCT are combined with subsequent windowing operations applied to the output values; and
a merging module for generating a modified windowing function that merges factors from the transformation operation and the windowing operation to obtain piecewise symmetric windowing factors.
29. The device of claim 28, wherein the IDCT-II is a 5-point inverse transform that implements IMDCTs of different sizes.
30. The device of claim 28, further comprising:
a storage module for storing a subset of the piece-wise symmetric windowing factors from which a complete set of the piece-wise symmetric windowing factors can be reconstructed; and
a windowing module that applies the complete set of reconstructed piece-wise symmetric windowing factors to the output values after transforming the spectral coefficients.
31. A device for computing transform values, comprising:
means for receiving spectral coefficients representing an audio signal;
means for transforming the spectral coefficients into time-domain output values using an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively split into at least one of a class IV inverse discrete cosine transform (IDCT-IV), a class II inverse discrete cosine transform (IDCT-II), or both IDCT-IV and IDCT-II, wherein each such inverse transform has a smaller dimension than the IMDCT, wherein at least some multiplication operations of the IMDCT are combined with subsequent windowing operations applied to the output values; and
means for generating a modified windowing function that combines factors from the transformation operation and the windowing operation to obtain piecewise symmetric windowing factors.
Applications Claiming Priority (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US97370907P | 2007-09-19 | 2007-09-19 | |
| US60/973,709 | 2007-09-19 | ||
| US98940007P | 2007-11-20 | 2007-11-20 | |
| US60/989,400 | 2007-11-20 | ||
| US12/212,920 | 2008-09-18 | ||
| US12/212,920 US8548815B2 (en) | 2007-09-19 | 2008-09-18 | Efficient design of MDCT / IMDCT filterbanks for speech and audio coding applications |
| PCT/US2008/077129 WO2009039451A2 (en) | 2007-09-19 | 2008-09-19 | Efficient design of mdct / imdct filterbanks for speech and audio coding applications |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1143240A1 HK1143240A1 (en) | 2010-12-24 |
| HK1143240B true HK1143240B (en) | 2014-01-17 |
Family
ID=
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US8548815B2 (en) | Efficient design of MDCT / IMDCT filterbanks for speech and audio coding applications | |
| US8631060B2 (en) | Fast algorithms for computation of 5-point DCT-II, DCT-IV, and DST-IV, and architectures | |
| US8392200B2 (en) | Low complexity spectral band replication (SBR) filterbanks | |
| WO2005073959A1 (en) | Audio signal decoding using complex-valued data | |
| JP2004531151A (en) | Method and apparatus for processing time discrete audio sample values | |
| TWI420511B (en) | Method, device, and circuit of providing an analysis filterbank and a synthesis filterbank, and machine-readable medium | |
| RU2451998C2 (en) | Efficient design of mdct/imdct filterbank for speech and audio coding applications | |
| HK1143240B (en) | Efficient design of mdct / imdct filterbanks for speech and audio coding applications | |
| JP7275217B2 (en) | Apparatus and audio signal processor, audio decoder, audio encoder, method and computer program for providing a processed audio signal representation | |
| HK1145563A (en) | Fast algorithms for computation of 5-point dct-ii, dct-iv, and dst-iv, and architectures |