EP1112625B1 - Method for coding an information signal - Google Patents
Method for coding an information signal
- Publication number
- EP1112625B1 (application EP99943854A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- positions
- pulse
- pulses
- speech
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
- G10L19/107—Sparse pulse excitation, e.g. by using algebraic codebook
Definitions
- the present invention relates, in general, to communication systems and, more particularly, to coding information signals in such communication systems.
- CDMA communication systems are well known.
- One exemplary CDMA communication system is the so-called IS-95 which is defined for use in North America by the Telecommunications Industry Association (TIA).
- TIA Telecommunications Industry Association
- EIA Electronic Industries Association
- a variable rate speech codec, and specifically Code Excited Linear Prediction (CELP) codec, for use in communication systems compatible with IS-95 is defined in the document known as IS-127 and titled Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, September 1996. IS-127 is also published by the Electronic Industries Association (EIA), 2001 Eye Street, N.W., Washington, D.C. 20006.
- the invention is defined by a method claim 1.
- constraints on position combinations among two or more pulses are implemented. By placing constraints on position combinations, certain combinations of pulses are prohibited which allows the most significant pulses to always be coded, thereby improving speech quality.
- a list of pulse pairs (codebook) which can be indexed using a single, predetermined bit length codeword is produced. The codeword is transmitted to a destination where it is used by a decoder to reconstruct the original information signal.
- a method for coding an information signal comprises the steps of dividing the information signal into blocks and deriving a target signal based on a block of the information signal.
- the method further includes the steps of coding the target signal using pulse positioning techniques based on an error criteria, wherein the allowable positions of a given pulse are dependent on the positions of one or more other pulses, to produce coded pulse positions and transmitting the coded pulse positions to a destination.
- the information signal further comprises a speech signal or an audio signal and a block of the information signals further comprise a frame or a subframe of the information signals.
- the error criteria further comprises a perceptually weighted squared error criteria and the allowable pulse positions are determined using an arbitrary closed-form expression F(λ), in which at least one of the conditions within the expression pertain to at least two of the elements within λ.
- FIG. 1 generally depicts a Code Excited Linear Prediction (CELP) decoder 100 as is known in the art.
- CELP Code Excited Linear Prediction
- the excitation sequence or "codevector" c_k is generated from a fixed codebook 102 (FCB) using the appropriate codebook index k.
- This signal is scaled using the FCB gain factor γ and combined with a signal E(n) output from an adaptive codebook 104 (ACB) and scaled by a factor β, which is used to model the long term (or periodic) component of a speech signal (with period τ).
- the signal E_t(n), which represents the total excitation, is used as the input to the LPC synthesis filter 106, which models the coarse short term spectral shape, commonly referred to as "formants".
- the output of the synthesis filter 106 is then perceptually postfiltered by perceptual postfilter 108, in which the coding distortions are effectively "masked" by amplifying the signal spectra at frequencies that contain high speech energy and attenuating those frequencies that contain less speech energy. Additionally, the total excitation signal E_t(n) is used as the adaptive codebook for the next block of synthesized speech.
- FIG. 2 generally depicts a CELP encoder 200.
- H_ZS(z) is the "zero state" response of H(z) from filter 206, in which the initial state of H(z) is all zeroes.
- H_ZIR(z) is the "zero input response" of H(z) from filter 210, in which the previous state of H(z) is allowed to evolve with no input excitation.
- the initial state used for generation of H_ZIR(z) is derived from the total excitation E_t(n) from the previous subframe.
- FCB fixed codebook
- Eq. 4 can also be expressed in vector-matrix form as: $\min_k \{ (x_w - \gamma_k H c_k)^T (x_w - \gamma_k H c_k) \}, \; 0 \le k < M$, where c_k and x_w are length L column vectors, H is the L x L zero-state convolution matrix, and T denotes the appropriate vector or matrix transpose.
- index k corresponding to the codevector c_k that results in the minimum squared error between the perceptually weighted target signal x_w(n) and the perceptually weighted excitation signal x̂_w(n) can be found by maximizing the term in Eq. 12.
- the FCB utilizes a multipulse configuration in which the excitation vector c_k contains very few non-zero, unit magnitude values.
- This configuration is known in the art as Algebraic CELP, or ACELP. Since there are very few non-zero elements within c_k, the computational complexity involved with Eq. 12 is relatively low.
- an associated "track" defines the allowable positions for each of the three pulses within c_k (3 bits per pulse plus 1 bit for composite sign of +, -, + or -, +, -).
- pulse 1 can occupy positions 0, 7, 14, ..., 49
- pulse 2 can occupy positions 2, 9, 16, ..., 51
- pulse 3 can occupy positions 4, 11, 18, ..., 53. This is known as "interleaved pulse permutation", which is well known in the art.
- the sign bit is then set according to the sign of the gain term γ_k.
- Table 1 generally depicts pulse positions defined for IS-127 Rate 1/2.
- the excitation codevector c k can contain "holes" in which certain positions are not represented by the vector space. That is, an optimal match to the target vector may require a pulse at position 12, but the definitions of the pulse positions in Table I does not allow a pulse to be located at that position.
- the constraints on positions may cause the pulse to be placed either at locations close to the optimal position, or worse, the energy of the target signal may be completely missed at that position. This can cause distortion, and possibly audible artifacts in the synthesized speech signal.
- the bit allocation of 16 bits would be divided between the four tracks equally so that each track would receive four bits.
- the four bits per track would further be composed of three bits for position (comprising 8 different positions) and one sign bit to indicate the polarity of the pulse.
- $\text{Codeword} = 11 \lfloor p_i / 5 \rfloor + \lfloor p_j / 5 \rfloor$, where p_i and p_j are the positions of the i-th and j-th pulses, and ⌊x⌋ represents the largest integer ≤ x.
- all positions are not adequately represented by the vector space which would allow efficient, low rate coding of pulse positions.
- design of an efficient 16 bit, 4 pulse, 56 position codebook (with all positions representable) is not readily achievable in the prior art.
- a method is presented which allows all pulse positions to be coded, while maintaining the design constraints as presented in the previous example.
- the present invention provides a general flexibility which allows efficient solutions to a wide variety of design constraints.
- the present invention solves the aforementioned problems by placing constraints on position combinations among two or more pulses.
- the allowable positions for a given pulse are jointly dependent on the associated positions of one or more other pulses.
- FIG. 3 where a joint interleaved pulse permutation matrix in accordance with the invention is shown.
- the respective positions of pulse 0 are shown along the horizontal axis, and the positions of pulse 1 are shown along the vertical axis.
- the "forbidden" pulse combinations are designated by the shaded regions while the allowable combinations are unshaded.
- FIG. 4 generally depicts a flow chart describing how the codebook is generated in accordance with the invention.
- the flowchart shows a basic nested loop structure in which all permutations of 0 ≤ i < M and 0 ≤ j < N are generated.
- N and M are the total number of allowable positions for each pulse.
- This function returns a value of 1 for cases when the absolute value of the difference of i and j is an element of the given set; otherwise, a zero is returned. This is shown in step 403.
- the elements of the given set correspond to the distances between the diagonal shaded elements of FIG. 3, and the expression is therefore sufficient in describing all necessary shaded regions.
- the codebook index k is incremented at step 404, and the process continues until the entire codebook is filled via steps 400-401 and 405-408.
- a similar technique would be used for generating position information for pulses p2 and p3 of the given example.
- FIG. 5 generally depicts a joint interleaved pulse permutation matrix for pulses p2 and p3 in accordance with the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
- Paper (AREA)
- Control Of El Displays (AREA)
- Control Of Motors That Do Not Use Commutators (AREA)
- Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
Abstract
Description
- The present invention relates, in general, to communication systems and, more particularly, to coding information signals in such communication systems.
- Code-division multiple access (CDMA) communication systems are well known. One exemplary CDMA communication system is the so-called IS-95, which is defined for use in North America by the Telecommunications Industry Association (TIA). For more information on IS-95, see TIA/EIA/IS-95, Mobile Station-Base-station Compatibility Standard for Dual Mode Wideband Spread Spectrum Cellular System, January 1997, published by the Electronic Industries Association (EIA), 2001 Eye Street, N.W., Washington, D.C. 20006. A variable rate speech codec, and specifically a Code Excited Linear Prediction (CELP) codec, for use in communication systems compatible with IS-95 is defined in the document known as IS-127 and titled Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, September 1996. IS-127 is also published by the Electronic Industries Association (EIA), 2001 Eye Street, N.W., Washington, D.C. 20006.
- Another example of a speech codec is disclosed in the document by Cheng Deguan, "An 8 kb/s Low Complexity ACELP Speech Codec", in Proceedings of ICSP'96, October 1996, XP 10209596. In this document the linear prediction error signal is encoded using pulses having positions preset in so-called pulse tracks.
- In modern CELP codecs, there is a problem with maintaining high quality speech reproduction at low bit rates. The problem arises because there are too few bits available to appropriately model the "excitation" sequence or "codevector" which is used as the stimulus to the CELP synthesizer. Thus, a need exists for an improved method and apparatus which overcomes the deficiencies of the prior art.
-
- FIG. 1 generally depicts a CELP decoder as is known in the prior art.
- FIG. 2 generally depicts a Code Excited Linear Prediction (CELP) encoder as is known in the prior art.
- FIG. 3 generally depicts a joint interleaved pulse permutation matrix in accordance with the invention.
- FIG. 4 generally depicts a flow chart describing how the codebook is generated in accordance with the invention.
- FIG. 5 generally depicts a joint interleaved pulse permutation matrix for pulses 3 and 4 in accordance with the present invention.
- The invention is defined by method claim 1.
- Stated generally, to achieve high quality speech reconstruction at low bit rates, constraints on position combinations among two or more pulses are implemented. By placing constraints on position combinations, certain combinations of pulses are prohibited, which allows the most significant pulses to always be coded, thereby improving speech quality. After all valid combinations are considered, a list of pulse pairs (codebook) which can be indexed using a single, predetermined bit length codeword is produced. The codeword is transmitted to a destination where it is used by a decoder to reconstruct the original information signal.
- Stated specifically, a method for coding an information signal comprises the steps of dividing the information signal into blocks and deriving a target signal based on a block of the information signal. The method further includes the steps of coding the target signal using pulse positioning techniques based on an error criteria, wherein the allowable positions of a given pulse are dependent on the positions of one or more other pulses, to produce coded pulse positions and transmitting the coded pulse positions to a destination.
- In the preferred embodiment, the information signal further comprises a speech signal or an audio signal and a block of the information signals further comprise a frame or a subframe of the information signals. The error criteria further comprises a perceptually weighted squared error criteria and the allowable pulse positions are determined using an arbitrary closed-form expression F(λ), in which at least one of the conditions within the expression pertain to at least two of the elements within λ.
- FIG. 1 generally depicts a Code Excited Linear Prediction (CELP) decoder 100 as is known in the art. In modern CELP decoders, there is a problem with maintaining high quality speech reproduction at low bit rates. The problem arises because there are too few bits available to appropriately model the "excitation" sequence or "codevector" c_k which is used as the stimulus to the CELP decoder 100.
- As shown in FIG. 1, the excitation sequence or "codevector" c_k is generated from a fixed codebook 102 (FCB) using the appropriate codebook index k. This signal is scaled using the FCB gain factor γ and combined with a signal E(n) output from an adaptive codebook 104 (ACB) and scaled by a factor β, which is used to model the long term (or periodic) component of a speech signal (with period τ). The signal E_t(n), which represents the total excitation, is used as the input to the LPC synthesis filter 106, which models the coarse short term spectral shape, commonly referred to as "formants". The output of the synthesis filter 106 is then perceptually postfiltered by perceptual postfilter 108, in which the coding distortions are effectively "masked" by amplifying the signal spectra at frequencies that contain high speech energy and attenuating those frequencies that contain less speech energy. Additionally, the total excitation signal E_t(n) is used as the adaptive codebook for the next block of synthesized speech.
- FIG. 2 generally depicts a
CELP encoder 200. Within CELP encoder 200, the goal is to code the perceptually weighted target signal x_w(n), which can be represented in general terms by the z-transform:
  $$X_w(z) = S(z)W(z) - \beta E(z)H_{ZS}(z) - H_{ZIR}(z) \qquad (1)$$
where W(z) is the transfer function of the perceptual weighting filter 208, and is of the form:
  $$W(z) = \frac{A(z/\lambda_1)}{A(z/\lambda_2)} \qquad (2)$$
and H(z) is the transfer function of the perceptually weighted synthesis filters 206 and 210, and is of the form:
  $$H(z) = \frac{A(z/\lambda_1)}{A_q(z)\,A(z/\lambda_2)} \qquad (3)$$
and where A(z) are the unquantized direct form LPC coefficients, A_q(z) are the quantized direct form LPC coefficients, and λ1 and λ2 are perceptual weighting coefficients. Additionally, H_ZS(z) is the "zero state" response of H(z) from filter 206, in which the initial state of H(z) is all zeroes, and H_ZIR(z) is the "zero input response" of H(z) from filter 210, in which the previous state of H(z) is allowed to evolve with no input excitation. The initial state used for generation of H_ZIR(z) is derived from the total excitation E_t(n) from the previous subframe.
- To solve for the parameters necessary to generate x_w(n), a fixed codebook (FCB) closed loop analysis in accordance with the invention is described. Here, the codebook index k is chosen to minimize the mean square error between the perceptually weighted target signal x_w(n) and the perceptually weighted excitation signal x̂_w(n). This can be expressed in time domain form as:
  $$\min_k \left\{ \sum_{n=0}^{L-1} \left( x_w(n) - \gamma_k\, c_k(n) * h(n) \right)^2 \right\}, \quad 0 \le k < M \qquad (4)$$
where c_k(n) is the codevector corresponding to FCB codebook index k, γ_k is the optimal FCB gain associated with codevector c_k(n), h(n) is the impulse response of the perceptually weighted synthesis filter H(z), M is the codebook size, L is the subframe length, * denotes the convolution process and x̂_w(n) = γ_k c_k(n) * h(n). In the preferred embodiment, speech is coded every 20 milliseconds (ms) and each frame includes three subframes of length L.
- Eq. 4 can also be expressed in vector-matrix form as:
  $$\min_k \left\{ (x_w - \gamma_k H c_k)^T (x_w - \gamma_k H c_k) \right\}, \quad 0 \le k < M \qquad (5)$$
where c_k and x_w are length L column vectors, H is the L x L zero-state convolution matrix:
  $$H = \begin{bmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h(L-1) & h(L-2) & \cdots & h(0) \end{bmatrix} \qquad (6)$$
and T denotes the appropriate vector or matrix transpose. Eq. 5 can be expanded to:
  $$\min_k \left\{ x_w^T x_w - 2\gamma_k\, x_w^T H c_k + \gamma_k^2\, c_k^T H^T H c_k \right\}, \quad 0 \le k < M \qquad (7)$$
and the optimal codebook gain γ_k for codevector c_k can be derived by setting the derivative (with respect to γ_k) of the above expression to zero:
  $$x_w^T H c_k - \gamma_k\, c_k^T H^T H c_k = 0 \qquad (8)$$
and then solving for γ_k to yield:
  $$\gamma_k = \frac{x_w^T H c_k}{c_k^T H^T H c_k} \qquad (9)$$
Substituting this quantity into Eq. 7 produces:
  $$\min_k \left\{ x_w^T x_w - \frac{(x_w^T H c_k)^2}{c_k^T H^T H c_k} \right\}, \quad 0 \le k < M \qquad (10)$$
Since the first term in Eq. 10 is constant with respect to k, it can be written as:
  $$\max_k \left\{ \frac{(x_w^T H c_k)^2}{c_k^T H^T H c_k} \right\}, \quad 0 \le k < M \qquad (11)$$
From Eq. 11, it is important to note that much of the computational burden associated with the search can be avoided by precomputing the terms in Eq. 11 which do not depend on k; namely, by letting d^T = x_w^T H and Θ = H^T H. When this is done, Eq. 11 reduces to:
  $$\max_k \left\{ \frac{(d^T c_k)^2}{c_k^T \Theta\, c_k} \right\}, \quad 0 \le k < M \qquad (12)$$
which is equivalent to equation 4.5.7.2-1 of IS-127. The process of precomputing these terms is known as "backward filtering". The result is that the index k corresponding to the codevector c_k that results in the minimum squared error between the perceptually weighted target signal x_w(n) and the perceptually weighted excitation signal x̂_w(n) can be found by maximizing the term in Eq. 12.
- In the IS-127 half rate case (4.0 kbps), the FCB utilizes a multipulse configuration in which the excitation vector c_k contains very few non-zero, unit magnitude values. This configuration is known in the art as Algebraic CELP, or ACELP. Since there are very few non-zero elements within c_k, the computational complexity involved with Eq. 12 is relatively low. For the IS-127 three "pulse" case, there are only 10 bits allocated for the pulse positions and associated signs for each of the three subframes (of length L = 53, 53, 54). In this configuration, an associated "track" defines the allowable positions for each of the three pulses within c_k (3 bits per pulse plus 1 bit for composite sign of +, -, + or -, +, -). As shown in Table 4.5.7.4-1 of IS-127,
pulse 1 can occupy positions 0, 7, 14, ..., 49, pulse 2 can occupy positions 2, 9, 16, ..., 51, and pulse 3 can occupy positions 4, 11, 18, ..., 53. This is known as "interleaved pulse permutation", which is well known in the art. The positions of the three pulses are optimized jointly so Eq. 12 is executed 8^3 = 512 times. The sign bit is then set according to the sign of the gain term γ_k.

  Table 1
  Pulse   Positions
  p0      0   7  14  21  28  35  42  49
  p1      2   9  16  23  30  37  44  51
  p2      4  11  18  25  32  39  46  53

- Table 1 generally depicts pulse positions defined for IS-127 Rate 1/2. One problem in the above scenario is that the excitation codevector c_k can contain "holes" in which certain positions are not represented by the vector space. That is, an optimal match to the target vector may require a pulse at position 12, but the definitions of the pulse positions in Table 1 do not allow a pulse to be located at that position. The constraints on positions may cause the pulse to be placed at locations close to the optimal position or, worse, the energy of the target signal may be completely missed at that position. This can cause distortion, and possibly audible artifacts in the synthesized speech signal.
- In a similar example, a design requirement may be to have four pulses with one pulse on each of four separate tracks, with subframe sizes of L = [53, 53, 54], and a bit allocation of 16 bits per subframe. In this scenario, the tracks would be configured as 4 pulses x 14 positions = 56 total positions, which could be positioned according to the prior art as in Table 2, which depicts examples of pulse positions as used in the prior art. Here, the bit allocation of 16 bits would be divided between the four tracks equally so that each track would receive four bits. The four bits per track would further be composed of three bits for position (comprising 8 different positions) and one sign bit to indicate the polarity of the pulse.
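- As an illustration only, the joint search of Eq. 12 over the three tracks of Table 1 can be sketched in Python as follows. The function names, the impulse response `h`, and the target vector `x_w` are hypothetical stand-ins chosen for the sketch; they are not taken from IS-127 or from the patent. The sketch precomputes the backward-filtered target d and the needed elements of Θ, evaluates all 8^3 = 512 candidates with the +, -, + sign pattern, and returns the gain of Eq. 9, whose sign would select between the two composite sign patterns.

```python
import itertools

# Track layout copied from Table 1 (IS-127 Rate 1/2).
TRACKS = [
    [0, 7, 14, 21, 28, 35, 42, 49],
    [2, 9, 16, 23, 30, 37, 44, 51],
    [4, 11, 18, 25, 32, 39, 46, 53],
]
SIGNS = (1.0, -1.0, 1.0)  # composite pattern +, -, +; a negative gain flips it


def backward_filter(x_w, h):
    """d[n] = sum_m x_w[m] * h[m - n], i.e. d^T = x_w^T H."""
    L = len(x_w)
    return [sum(x_w[m] * h[m - n] for m in range(n, L)) for n in range(L)]


def theta(h, L, i, j):
    """Element (i, j) of Theta = H^T H for the zero-state convolution matrix H."""
    return sum(h[n - i] * h[n - j] for n in range(max(i, j), L))


def acelp_search(x_w, h):
    """Return (positions, gain) maximizing (d^T c)^2 / (c^T Theta c)."""
    L = len(x_w)
    d = backward_filter(x_w, h)
    used = [p for track in TRACKS for p in track if p < L]
    th = {(i, j): theta(h, L, i, j) for i in used for j in used}
    best_metric, best_pos, best_gain = -1.0, None, 0.0
    for pos in itertools.product(*TRACKS):  # 8^3 = 512 candidates
        if max(pos) >= L:  # position 53 does not exist in a 53-sample subframe
            continue
        num = sum(s * d[p] for s, p in zip(SIGNS, pos))
        den = sum(sa * sb * th[pa, pb]
                  for sa, pa in zip(SIGNS, pos)
                  for sb, pb in zip(SIGNS, pos))
        if num * num / den > best_metric:
            best_metric, best_pos, best_gain = num * num / den, pos, num / den
    return best_pos, best_gain


def squared_error(x_w, h, pos, gain):
    """Direct evaluation of Eq. 5 for one candidate, used only as a cross-check."""
    y = [sum(s * h[n - p] for s, p in zip(SIGNS, pos) if n >= p)
         for n in range(len(x_w))]
    return sum((x - gain * v) ** 2 for x, v in zip(x_w, y))
```

  Because Θ and d do not depend on k, each candidate costs only a handful of multiply-adds, which is the saving attributed to backward filtering above.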

  Table 2
  Pulse   Positions
  p0      0   7  14  21  28  35  42  49
  p1      2   9  16  23  30  37  44  51
  p2      3  10  17  24  31  38  45  52
  p3      5  12  19  26  33  40  47  54

- As can be seen from this example, there are still holes in the vector space since all of the pulse positions cannot be adequately represented. One solution would be to allow all fourteen positions to be valid, e.g., the positions of pulse p0 would be [0, 4, 8, ..., 52], p1 would be [1, 5, 9, ..., 53], etc. The problem with this method is that four bits would be required to encode the position information of each pulse, thereby violating the 16 bit per subframe requirement (4 tracks x (4 position bits + 1 sign bit) = 20 bits).
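- The trade-off described above can be checked with a short Python sketch. The helper names `holes` and `bits_per_subframe` are hypothetical; the track layouts are those of Table 2 and of the fourteen-position alternative, and the bit count assumes one sign bit per track as in the text.

```python
# Table 2: four tracks of eight positions each (prior-art layout from the text).
TABLE_2 = [
    [0, 7, 14, 21, 28, 35, 42, 49],
    [2, 9, 16, 23, 30, 37, 44, 51],
    [3, 10, 17, 24, 31, 38, 45, 52],
    [5, 12, 19, 26, 33, 40, 47, 54],
]
# Alternative: all fourteen positions per track, p = 4 * lambda + track number.
ALL_14 = [[4 * lam + n for lam in range(14)] for n in range(4)]


def holes(tracks, subframe_len):
    """Sample positions in the subframe that no track can represent."""
    covered = {p for track in tracks for p in track}
    return [p for p in range(subframe_len) if p not in covered]


def bits_per_subframe(tracks):
    """Position bits for the longest track plus one sign bit, for every track."""
    pos_bits = max((len(t) - 1).bit_length() for t in tracks)
    return len(tracks) * (pos_bits + 1)
```

  With a 54-sample subframe, `holes(TABLE_2, 54)` is non-empty while `bits_per_subframe(TABLE_2)` meets the 16 bit budget; `ALL_14` removes the holes but needs 20 bits.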
- Another method for pulse coding that is known in the prior art deals with multiplexing the indices of two pulses into a single codeword. For example, in the IS-127 Rate 1 case (8.5 kbps), there are 11 possible pulse positions on each of five tracks. Rather than using four bits for each pulse position, the positions of two pulses can be coded jointly using only seven bits. This is accomplished by considering that the total number of position combinations for two pulses is 11 x 11 = 121, which is less than the total number of combinations that can be coded with seven bits (2^7 = 128). Details of the coding can then be expressed as:
  $$\text{Codeword} = 11 \lfloor p_i / 5 \rfloor + \lfloor p_j / 5 \rfloor$$
where p_i and p_j are the positions of the i-th and j-th pulses, and ⌊x⌋ represents the largest integer ≤ x.
- The pulse positions can then be extracted at the decoder by:
  $$\lambda_i = \lfloor \text{Codeword} / 11 \rfloor, \qquad \lambda_j = \text{Codeword} - 11 \lambda_i$$
where λ_i and λ_j are the decimated positions within the appropriate track, which can be decoded using Table 2, where the value of λ corresponds to the column in the table. The problem with using this method for the 14 position case in Table 2 is that a 14 x 14 = 196 position multiplex would still require 8 bits (2^8 = 256 possible positions), so there is no savings over simply using four bits per pulse. Clearly, none of the above prior art methods represents all positions in the vector space in a way that allows efficient, low rate coding of pulse positions.
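- A minimal Python sketch of the two-pulse multiplex described above, assuming pulse positions of the form p = 5λ + track number with 11 decimated positions per track. The function names are hypothetical and the decoder is simply the arithmetic inverse of the encoder.

```python
def multiplex(p_i, p_j, n_tracks=5, n_pos=11):
    """Pack two pulse positions into one codeword: 11*floor(p_i/5) + floor(p_j/5)."""
    lam_i, lam_j = p_i // n_tracks, p_j // n_tracks
    return n_pos * lam_i + lam_j


def demultiplex(codeword, n_pos=11):
    """Recover the decimated positions (lambda_i, lambda_j) from the codeword."""
    return divmod(codeword, n_pos)


def bits_needed(n_combinations):
    """Smallest number of bits that can index n_combinations distinct values."""
    return (n_combinations - 1).bit_length()
```

  `bits_needed(11 * 11)` gives 7, one bit fewer than coding the two positions separately, whereas `bits_needed(14 * 14)` gives 8, which is why the same trick yields no saving for the 14 position tracks.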
- The present invention solves the aforementioned problems by placing constraints on position combinations among two or more pulses. For example, the allowable positions for a given pulse are jointly dependent on the associated positions of one or more other pulses. This can be seen for the 14 position track example in FIG. 3, where a joint interleaved pulse permutation matrix in accordance with the invention is shown. In this embodiment, the matrix depicted in FIG. 3 is for pulses 0 and 1, and the subframe length is L = 54. In this figure, the respective positions of pulse 0 are shown along the horizontal axis, and the positions of pulse 1 are shown along the vertical axis. The "forbidden" pulse combinations are designated by the shaded regions while the allowable combinations are unshaded. As one may notice, the number of unshaded regions is exactly the number of combinations that can be represented by the given number of bits, in this case 2^7 = 128, and the number of shaded regions is exactly the total number of decimated positions of pulse 0 times the total number of decimated positions of pulse 1 minus the number of combinations that can be represented by the given number of bits, i.e., (14 x 14) - 128 = 68.
- As the various pulse position codevectors are searched (via Eq. 12), when pulse p1 is placed at λ1 = 0 (corresponding to position (0 x 4) + 1 = 1), then the allowable positions for pulse p0 would be [4, 8, 16, 20, 28, 32, 40, 48, 52]. Likewise, when pulse p1 is placed at position 5 (λ1 = 1), the allowable positions for pulse p0 would be [0, 8, 12, 20, 24, 32, 36, 44, 52], and so on. After considering all valid combinations, a 128 x 2 list of pulse pairs (codebook) that can be indexed using a single 7 bit codeword is produced in accordance with the invention. This codeword is suitable for transmission to a destination for decoding and reconstruction. Furthermore, this codebook can be generated algebraically at run time, stored in volatile memory (RAM), or stored in nonvolatile memory (ROM).
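- The counts and position lists quoted above can be reproduced with a short Python sketch. The set of forbidden differences used here, {0, 3, 6, 9, 11}, is an assumption inferred from the two worked examples in the text (it is the only set of |i - j| values consistent with the allowable positions listed for λ1 = 0); the names are hypothetical.

```python
# Assumption: differences |i - j| between decimated positions that are forbidden,
# inferred from the allowable p0 positions listed in the text for lambda_1 = 0.
FORBIDDEN_DIFFS = {0, 3, 6, 9, 11}
N_POS = 14      # decimated positions per track
N_TRACKS = 4    # pulse p0 sits on track 0, so its position is 4 * lambda_0


def forbidden(i, j):
    """True when the pair of decimated positions (i, j) is a shaded cell."""
    return abs(i - j) in FORBIDDEN_DIFFS


def allowed_p0(lambda_1):
    """Sample positions pulse p0 may take once pulse p1 is fixed at lambda_1."""
    return [N_TRACKS * i for i in range(N_POS) if not forbidden(i, lambda_1)]


n_allowed = sum(not forbidden(i, j) for i in range(N_POS) for j in range(N_POS))
n_forbidden = N_POS * N_POS - n_allowed
```

  Under this assumption `n_allowed` is 128 and `n_forbidden` is 68, and `allowed_p0(0)` and `allowed_p0(1)` match the two lists given in the text.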
- FIG. 4 generally depicts a flow chart describing how the codebook is generated in accordance with the invention. First, the flowchart shows a basic nested loop structure in which all permutations of 0 ≤ i < M and 0 ≤ j < N are generated. In this example, N and M are the total number of allowable positions for each pulse. The decision in the innermost loop simply checks for forbidden combinations [i,j] according to function F(i,j) at step 402, which in the example of FIG. 3 is described as:
  $$F(i,j) = \begin{cases} 1, & |i - j| \in \{0, 3, 6, 9, 11\} \\ 0, & \text{otherwise} \end{cases}$$
This function returns a value of 1 for cases when the absolute value of the difference of i and j is an element of the given set; otherwise, a zero is returned. This is shown in step 403. The elements of the given set correspond to the distances between the diagonal shaded elements of FIG. 3, and the expression is therefore sufficient in describing all necessary shaded regions. For allowed pulse combinations, the respective positions are calculated using the following expression:
  $$p = N_{tracks}\, \lambda + n$$
where λ is the decimated track position, N tracks is the number of tracks, and n is the track number. Once the codebook entry has been generated atstep 403, the codebook index k is incremented atstep 404, and the process continues until the entire codebook is filled via steps 400-401 and 405-408. A similar technique would be used for generating position information for pulses p2 and p3 of the given example. - Although the previous example shows the forbidden regions to be strict upper left to lower right diagonal, any pattern utilizing 128 unshaded regions is feasible and assumed to be within the scope of the invention. Another aspect of the preferred embodiment is explained as follows: there are 4 x 14 = 56 total possible pulse positions. The length of a subframe, however, is not greater than 54 samples. Therefore, dedicating positions to locations greater than 53 (or 52 for subframes one and two) results in reduced coding efficiency, and thus, degraded quality. FIG. 5 generally depicts a joint interleaved pulse permutation matrix for pulses p2 and p3 in accordance with the present invention. As shown in FIG. 5, the
54 and 55 are omitted by the shaded regions, which allows more combinations to be represented in the valid vector space since the total number of unshaded regions is still 128. This can be observed by comparing the relative spacing between the diagonals in FIG. 3 and FIG. 5, where FIG. 3 has generally two spaces between forbidden diagonals while FIG. 5 has three spaces. The closed form expression for the forbidden combinations of FIG. 5 can be expressed as:positions As one may observe, the example in FIG. 5 is inherently less restrictive and therefore results in higher coding accuracy. - As one skilled in the art will appreciate, it is possible to form upper right to lower left diagonals and a number of various other patterns that may benefit a specific application using the techniques described herein in accordance with the invention. Furthermore, it is possible to extend the dimension of the number of pulses to beyond two so that any closed-form expression F(λ) is allowed, where λ = [λ0,λ1,...,λ n-1] is the vector of candidate pulse positions, and n is the number of pulses.
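By way of illustration only, the nested-loop codebook generation described for FIG. 4 may be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment itself: the function names (`make_forbidden`, `track_position`, `generate_codebook`) are hypothetical, the interleaved mapping `N_tracks * λ + n` is assumed from the definitions of λ, N_tracks and n given above, and the 4 x 4 grid with forbidden distance set {1} is a toy example that does not reproduce the shaded regions of FIG. 3 or FIG. 5.

```python
from typing import Callable, List, Set, Tuple


def make_forbidden(distances: Set[int]) -> Callable[[int, int], int]:
    """Build F(i, j): returns 1 when |i - j| is in `distances`, else 0.

    The distance set is supplied by the caller; the set used in the
    figures is not reproduced here.
    """
    def forbidden(i: int, j: int) -> int:
        return 1 if abs(i - j) in distances else 0
    return forbidden


def track_position(lam: int, n_tracks: int, track: int) -> int:
    """Map a decimated track position to a sample position.

    ASSUMPTION: an interleaved layout, position = n_tracks * lam + track.
    """
    return n_tracks * lam + track


def generate_codebook(
    m: int,
    n: int,
    forbidden: Callable[[int, int], int],
    n_tracks: int,
    track_i: int,
    track_j: int,
) -> List[Tuple[int, int]]:
    """Enumerate all [i, j] with 0 <= i < m and 0 <= j < n, skip the
    forbidden combinations, and store the positions of the allowed ones.
    The list index of each entry plays the role of codebook index k.
    """
    codebook: List[Tuple[int, int]] = []
    for i in range(m):                  # outer loop over the first pulse
        for j in range(n):              # inner loop over the second pulse
            if forbidden(i, j):         # decision on F(i, j)
                continue
            codebook.append((
                track_position(i, n_tracks, track_i),
                track_position(j, n_tracks, track_j),
            ))                          # appending advances index k
    return codebook


# Toy usage: 4 x 4 candidate grid, adjacent decimated positions forbidden.
toy = generate_codebook(4, 4, make_forbidden({1}), n_tracks=4, track_i=0, track_j=1)
print(len(toy))  # 10 allowed combinations out of 16
```

Because the forbidden combinations are given by a predicate rather than a stored table, a decoder could regenerate the same codebook from the same predicate. Extending the predicate to accept a vector of more than two candidate positions would follow the same pattern.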
- While the invention has been particularly shown and described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as defined by the claims.
Claims (4)
- A method for coding a speech or audio signal based on linear prediction comprising the steps of: a) dividing the speech or audio signal into blocks; b) deriving a target signal based on a representation of the difference between a weighted version of said speech or audio signal and a weighted synthesized version of said signal derived by linear prediction from a block of the information signal; c) characterized by coding the target signal using pulse positioning techniques based on an error criteria, wherein the allowable positions of a given pulse are dependent on the positions of one or more other pulses, to produce coded pulse positions; and d) transmitting the coded pulse positions to a destination.
- The method in claim 1, wherein a block of the information signals further comprise a frame or a subframe of the information signals.
- The method in claim 1, wherein the error criteria further comprises a perceptually weighted squared error criteria.
- The method in claim 1, wherein the allowable pulse positions are determined using an arbitrary closed-form expression F(λ), in which at least one of the conditions within the expression pertain to at least two of the elements within λ.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15143098A | 1998-09-11 | 1998-09-11 | |
| US151430 | 1998-09-11 | | |
| PCT/US1999/019217 WO2000016501A1 (en) | 1998-09-11 | 1999-08-24 | Method and apparatus for coding an information signal |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| EP1112625A1 (en) | 2001-07-04 |
| EP1112625A4 (en) | 2004-06-16 |
| EP1112625B1 (en) | 2006-05-31 |
Family
ID=22538745
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| EP99943854A | Expired - Lifetime | EP1112625B1 (en) | 1998-09-11 | 1999-08-24 | Method for coding an information signal |
Country Status (6)
| Country | Link |
|---|---|
| EP (1) | EP1112625B1 (en) |
| JP (1) | JP4460165B2 (en) |
| KR (1) | KR100409167B1 (en) |
| AT (1) | ATE328407T1 (en) |
| DE (1) | DE69931641T2 (en) |
| WO (1) | WO2000016501A1 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6539349B1 (en) * | 2000-02-15 | 2003-03-25 | Lucent Technologies Inc. | Constraining pulse positions in CELP vocoding |
| US7889103B2 (en) * | 2008-03-13 | 2011-02-15 | Motorola Mobility, Inc. | Method and apparatus for low complexity combinatorial coding of signals |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| FR2579356B1 (en) * | 1985-03-22 | 1987-05-07 | Cit Alcatel | LOW-THROUGHPUT CODING METHOD OF MULTI-PULSE EXCITATION SIGNAL SPEECH |
| CA2032520C (en) * | 1989-05-11 | 1996-09-17 | Tor Bjorn Minde | Excitation pulse positioning method in a linear predictive speech coder |
| US5754976A (en) * | 1990-02-23 | 1998-05-19 | Universite De Sherbrooke | Algebraic codebook with signal-selected pulse amplitude/position combinations for fast coding of speech |
| JP3057907B2 (en) * | 1992-06-16 | 2000-07-04 | 松下電器産業株式会社 | Audio coding device |
| KR950011967B1 (en) * | 1992-07-31 | 1995-10-12 | 임홍식 | Memory rearangement device for semiconductor recorder |
| JP3196595B2 (en) * | 1995-09-27 | 2001-08-06 | 日本電気株式会社 | Audio coding device |
| JP4063911B2 (en) * | 1996-02-21 | 2008-03-19 | 松下電器産業株式会社 | Speech encoding device |
| US5970444A (en) * | 1997-03-13 | 1999-10-19 | Nippon Telegraph And Telephone Corporation | Speech coding method |
| US5963897A (en) * | 1998-02-27 | 1999-10-05 | Lernout & Hauspie Speech Products N.V. | Apparatus and method for hybrid excited linear prediction speech encoding |
| JP3180762B2 (en) * | 1998-05-11 | 2001-06-25 | 日本電気株式会社 | Audio encoding device and audio decoding device |
| JP3824810B2 (en) * | 1998-09-01 | 2006-09-20 | 富士通株式会社 | Speech coding method, speech coding apparatus, and speech decoding apparatus |
1999
- 1999-08-24 EP EP99943854A patent/EP1112625B1/en not_active Expired - Lifetime
- 1999-08-24 AT AT99943854T patent/ATE328407T1/en not_active IP Right Cessation
- 1999-08-24 JP JP2000570919A patent/JP4460165B2/en not_active Expired - Fee Related
- 1999-08-24 KR KR10-2001-7003129A patent/KR100409167B1/en not_active Expired - Fee Related
- 1999-08-24 WO PCT/US1999/019217 patent/WO2000016501A1/en not_active Ceased
- 1999-08-24 DE DE69931641T patent/DE69931641T2/en not_active Expired - Lifetime
Also Published As
| Publication number | Publication date |
|---|---|
| JP4460165B2 (en) | 2010-05-12 |
| KR100409167B1 (en) | 2003-12-12 |
| EP1112625A1 (en) | 2001-07-04 |
| EP1112625A4 (en) | 2004-06-16 |
| DE69931641D1 (en) | 2006-07-06 |
| JP2002525667A (en) | 2002-08-13 |
| DE69931641T2 (en) | 2006-10-05 |
| ATE328407T1 (en) | 2006-06-15 |
| KR20010073146A (en) | 2001-07-31 |
| WO2000016501A1 (en) | 2000-03-23 |
Legal Events
| Code | Title | Description |
|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20010411 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| A4 | Supplementary search report drawn up and despatched | Effective date: 20040506 |
| RIC1 | Information provided on ipc code assigned before grant | Ipc: 7G 10L 19/10 B; Ipc: 7H 04B 7/216 A |
| GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| RTI1 | Title (correction) | Free format text: METHOD FOR CODING AN INFORMATION SIGNAL |
| GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060531; Ref country code: LI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060531; Ref country code: CH Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060531; Ref country code: BE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060531; Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060531 |
| REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D; Ref country code: CH Ref legal event code: EP |
| REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
| REF | Corresponds to: | Ref document number: 69931641 Country of ref document: DE Date of ref document: 20060706 Kind code of ref document: P |
| REG | Reference to a national code | Ref country code: SE Ref legal event code: TRGR |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060824 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060831; Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060831 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: IT Payment date: 20060831 Year of fee payment: 8 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060911 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20061031 |
| NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | |
| REG | Reference to a national code | Ref country code: CH Ref legal event code: PL |
| ET | Fr: translation filed | |
| PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| 26N | No opposition filed | Effective date: 20070301 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060901 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060824 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060531 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20070824 |
| REG | Reference to a national code | Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20110127 AND 20110202 |
| REG | Reference to a national code | Ref country code: DE Ref legal event code: R081 Ref document number: 69931641 Country of ref document: DE Owner name: MOTOROLA MOBILITY, INC. ( N.D. GES. D. STAATES, US Free format text: FORMER OWNER: MOTOROLA, INC., SCHAUMBURG, ILL., US Effective date: 20110324 |
| REG | Reference to a national code | Ref country code: FR Ref legal event code: TP Owner name: MOTOROLA MOBILITY, INC., US Effective date: 20110912 |
| REG | Reference to a national code | Ref country code: FR Ref legal event code: PLFP Year of fee payment: 18 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FI Payment date: 20160831 Year of fee payment: 18; Ref country code: GB Payment date: 20160830 Year of fee payment: 18; Ref country code: DE Payment date: 20160826 Year of fee payment: 18 |
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR Payment date: 20160825 Year of fee payment: 18; Ref country code: SE Payment date: 20160829 Year of fee payment: 18 |
| REG | Reference to a national code | Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20170831 AND 20170906 |
| REG | Reference to a national code | Ref country code: FR Ref legal event code: TP Owner name: GOOGLE TECHNOLOGY HOLDINGS LLC, US Effective date: 20171214; Ref country code: FR Ref legal event code: CD Owner name: GOOGLE TECHNOLOGY HOLDINGS LLC, US Effective date: 20171214 |
| REG | Reference to a national code | Ref country code: DE Ref legal event code: R119 Ref document number: 69931641 Country of ref document: DE |
| GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20170824 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170825; Ref country code: FI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170824 |
| REG | Reference to a national code | Ref country code: FR Ref legal event code: ST Effective date: 20180430 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170824; Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180301 |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170831 |
| P01 | Opt-out of the competence of the unified patent court (upc) registered | Effective date: 20230520 |