HK1067222B - Apparatus and method for reducing memory requirements of a codebook search - Google Patents
Apparatus and method for reducing memory requirements of a codebook search Download PDFInfo
- Publication number
- HK1067222B HK1067222B HK04110238.7A HK04110238A HK1067222B HK 1067222 B HK1067222 B HK 1067222B HK 04110238 A HK04110238 A HK 04110238A HK 1067222 B HK1067222 B HK 1067222B
- Authority
- HK
- Hong Kong
- Prior art keywords
- pulse
- vector
- cross
- codebook
- correlation
- Prior art date
Links
Description
Technical Field
The present invention relates generally to communication systems, and more particularly to speech processing within communication systems.
Background
The field of wireless communications has wide application including, for example, cordless telephones, paging, wireless local loops, Personal Digital Assistants (PDAs), internet telephony, and satellite communication systems. One particularly important application is cellular (cellular) mobile telephone systems provided for mobile subscribers. As used herein, the term "cellular" system encompasses cellular and Personal Communications Service (PCS) frequencies. Various over-the-air interfaces have been developed for such mobile telephone systems including, for example, Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), and Code Division Multiple Access (CDMA) systems. Various national and international standards have been established in connection therewith, including, for example, "advanced mobile phone service" (AMPS), "global system for mobile" (GSM), and "interim standard 95" (IS-95). In particular, the "Telecommunications industry Association" (TIA) and other well-known standards bodies have published IS-95 and its derivatives IS-95A, IS-95B, ANSI J-STD-008 (often collectively referred to herein as "IS-95"), and proposed high data rate systems for data, and the like.
Mobile telephone systems configured in accordance with the use of the IS-95 standard employ CDMA signal processing techniques in order to provide very efficient and robust mobile telephone service. Exemplary mobile telephone systems configured substantially in accordance with the use of the IS-95 standard are described in U.S. patent nos. 5,103,459 and 4,901,307, which are assigned to the assignee of the present invention and are incorporated herein by reference. An exemplary system utilizing CDMA techniques is the "CDMA 2000 ITU-R Radio Transmission Technology (RTT) candidate proposal," issued by the TIA (referred to herein as "CDMA 2000"). The standard for cdma2000 IS provided in the draft version of IS-2000 and IS approved by the TIA. The Cdma2000 proposal IS compatible with IS-95 systems in many respects. E.g. documents with numbers 3G TS 25.211, 3G TS 25.212, 3G TS 25.213 and 3G TS 25.2143 rd generation partnership project "3 GPP"Another CDMA standard embodied in this publication is the W-CDMA standard.
With the development of digital communication systems, there is a continuing need to efficiently use frequencies. One method for improving system efficiency is: the compressed signal is transmitted. In conventional landline telephone systems, a sampling rate of 64 kilobits per second (kbps) is used to reconstruct the quality of the analog voice signal during digital transmission. However, by using compression techniques that exploit redundancy of the speech signal, the amount of information transmitted over the air can be reduced while still maintaining high quality.
Typically, the conversion from an analog sound signal to a digital signal is performed by an encoder, while the conversion of the digital signal back to a sound signal is performed by a decoder. In an exemplary CDMA system, a vocoder comprising an encoding portion and a decoding portion is located within a remote station and a base station. An exemplary vocoder is described in U.S. patent No. 5,414,796, entitled variable rate vocoder, which is assigned to the assignee of the present invention and is incorporated herein by reference. In a vocoder, the encoding portion extracts parameters relating to a model of human speech generation. The decoding section re-synthesizes the speech using the parameters received on the transmission path. The model is constantly changing in order to accurately model the time-varying speech signal. Thus, the speech is divided into blocks of time or analysis frames during which the parameters are calculated. These parameters are then updated for each new frame. As used herein, the word "decoder" refers to any device or any portion of a device that may be used to convert a digital signal that has been received over a transmission medium. The word "encoder" refers to any device or any portion of a device that can be used to convert an acoustic signal into a digital signal. Thus, the various embodiments described herein may be performed using a vocoder of a CDMA system or, alternatively, an encoder and decoder of a non-CDMA system.
Among the various speech coder classes, the "code excited linear predictive coding" (CELP) coder, the "random coding" coder or the "vector excited speech coding" coder belong to one class. An example of this particular class of encoding algorithms IS described in "interim standard 127" (IS-127) entitled Enhanced Variable Rate Coder (EVRC). Another example of this particular class of encoder is described in the pending proposal draft alternate mode vocoder service option for wideband spread spectrum communication systems (document number 3GPP2 c.p 9001). The vocoder functions as: the digitized speech signal is compressed into a low bit rate signal by removing all natural redundancies inherent in speech. In a CELP encoder, a short-term formant (or LPC) filter is used to remove redundancy. Once these redundancies are removed, the resulting residual signal can be modeled as either white Gaussian noise (white Gaussian noise) or white periodic signal (white periodic signal), which must also be encoded. Thus, by using speech analysis, followed by appropriate encoding, transmission and re-synthesis at the receiver, the data rate can be greatly reduced.
The encoding parameters for a given speech frame are determined by first determining the coefficients of a Linear Predictive Coding (LPC) filter. Proper selection of the coefficients will remove short-term redundancy of the speech signal in the frame. By determining the pitch lag L and pitch gain g of the speech signalpLong-term periodic redundancy in the signal can be removed. Possible combinations of pitch lag values and pitch gain values are stored as vectors in the adaptive codebook. Then, an excitation signal is selected from a number of waveforms stored in an excitation waveform codebook. If this appropriate excitation signal is excited by a given pitch lag and pitch gain and then input to the LPC filter, a signal very close to the original speech can be generated. Thus, by transmitting the LPC filter coefficients, the identity of the adaptive codebook vector, and the identity of the fixed codebook excitation vector, compressed speech transmission may be performed.
The effective excitation codebook structure is called an "algebraic codebook". The actual structure of algebraic codebooks is well known in the art and is described in the paper "fast CELP coding based on algebraic coding" (ICASSP proceedings of 4, 6-9, 1987) by J.P.Adoul et al. The use of algebraic coding is further disclosed in us patent No. 5,444,816 entitled "dynamic codebook for efficient speech coding based on algebraic coding", the disclosure of which is incorporated by reference.
Due to the intensive computational and memory requirements for performing codebook searches on optimal excitation vectors, it is often desirable to reduce the memory requirements involved in performing codebook searches.
Disclosure of Invention
Novel methods and apparatus for performing fast code-vector search in an encoder are presented. In one aspect, a method is presented for reducing the storage requirements needed to search a codebook for vectors.
In another aspect, an apparatus for selecting an optimal pulse vector from a pulse vector codebook is presented, wherein a linear prediction encoder uses the optimal pulse vector to encode a residual waveform. The device includes: an impulse response generator for generating an impulse response; a cross-correlation element configured to determine a cross-correlation vector relating the impulse response to a plurality of samples of the target signal from the filter and to use the cross-correlation vector to determine a plurality of pulse positions so that if the plurality of pulse positions are inserted into the cross-correlation vector, a predetermined number of high cross-correlation values are provided; a pulse codebook generator configured to receive an indication signal from the cross-correlation element indicative of the plurality of pulse positions and to output a plurality of pulse vectors in response to the indication signal, wherein the plurality of pulse vectors is a subset of the pulse vector codebook; and a calculation element for determining an autocorrelation submatrix from the impulse response and the determined pulse position, wherein the autocorrelation submatrix and the cross-correlation vector are used to select an optimal impulse vector from the codebook.
In another aspect, an apparatus for reducing memory requirements for codebook searches is presented. The device includes: an impulse response generator for generating an impulse response signal; a cross-correlation element configured to determine a cross-correlation vector that relates the impulse response signal to a target signal; a selection element configured to receive the cross-correlation vector, identify an optimal set of pulse positions using the cross-correlation vector and generate an indication signal carrying an identification of the optimal set of pulse positions; a pulse codebook generator configured to receive the indication signal from the selection element and generate a plurality of pulse vectors, wherein the plurality of pulse vectors are generated according to the optimal set of pulse positions carried by the indication signal; and a computing element for determining an autocorrelation submatrix from the impulse response signal and the identified set of pulse positions, wherein the autocorrelation submatrix is used in place of the autocorrelation matrix, thereby reducing memory requirements for codebook searches.
In another aspect, a method for selecting a best-fit pulse vector from a plurality of pulse vectors for encoding a residual waveform is presented, the method comprising: determining an optimal set of pulse positions based on a cross-correlation vector between the target signal and the impulse response; determining a plurality of pulse vectors corresponding to the optimal set of pulse positions, wherein the plurality of pulse vectors are smaller than a pulse waveform codebook; calculating an autocorrelation submatrix from only the impulse response and the determined set of impulse positions; determining a plurality of energy values using the autocorrelation submatrix, wherein each energy value corresponds to a pulse vector of the plurality of pulse vectors; and selecting the pulse vector with the highest standard value from the plurality of pulse vectors as the best-fit pulse vector, wherein the highest standard value is determined from the plurality of energy values and the cross-correlation vector.
In another aspect, a method for selecting an optimal pulse vector from a codebook is presented. The method comprises the following steps: determining a cross-correlation vector between the target signal and the impulse response, wherein each component of the cross-correlation vector corresponds to a location in the analysis frame; determining P pulse positions corresponding to the P largest components of the cross-correlation vector; selecting a plurality of pulse vectors from the codebook to form a subcodebook (subcodebook), wherein each pulse vector of the plurality of pulse vectors corresponds to at least one of the P pulse positions; determining an autocorrelation sub-matrix according to the P pulse positions; and selecting an optimal pulse vector from the plurality of pulse vectors based on the autocorrelation submatrices and the cross-correlation vectors.
In another aspect, an apparatus for selecting an optimal pulse vector from a codebook is presented, comprising: means for determining a cross-correlation vector between the target signal and the impulse response, wherein each component of the cross-correlation vector corresponds to a location in the analysis frame; means for determining the P pulse positions corresponding to the P largest components of the cross-correlation vector; means for generating a plurality of pulse vectors from a codebook to form a sub-codebook, wherein each pulse vector of the plurality of pulse vectors corresponds to at least one of the P pulse positions; means for determining an autocorrelation sub-matrix based on the P pulse positions; and means for selecting the best pulse vector from the plurality of pulse vectors based on the autocorrelation submatrices and cross-correlation vectors.
Drawings
Fig. 1 is a block diagram of an exemplary communication system.
Fig. 2 is a block diagram of a conventional apparatus for performing codebook search.
Fig. 3 is a flow chart of method steps for pre-selecting a subset of pulse vectors from a pulse codebook.
Fig. 4 is a block diagram of an apparatus for performing codebook search by pre-selecting and searching a sub-codebook.
FIG. 5 is a block diagram of an apparatus for performing codebook search in an encoder using pitch-enhanced impulse responses.
Fig. 6 is a block diagram of an apparatus for performing codebook search in an encoder using a pitch-enhanced impulse response by pre-selecting and searching a sub-codebook.
FIG. 7 is a flowchart of method steps for performing a fast codebook search by using a look-up table.
Detailed Description
As shown in fig. 1, a wireless communication network 10 typically includes a plurality of remote stations (also referred to as "mobile stations" or "subscriber units" or "user equipment") 12a-12d, a plurality of base stations (also referred to as "base station transceivers (BTSs)" or "node bs") 14a-14c, a Base Station Controller (BSC) (also referred to as a "radio network controller" or "packet control function 16"), a Mobile Switching Center (MSC) or switch 18, a Packet Data Serving Node (PDSN) or internetworking function (IWF)20, a Public Switched Telephone Network (PSTN)22 (typically a telephone company), and an "internet protocol" (IP) network 24 (typically the internet). For simplicity, four remote stations 12a-12d, three base stations 14a-14c, one BSC 16, one MSC 18, and one PDSN 20 are shown. Those skilled in the art will understand that: there may be any number of remote stations 12, base stations 14, BSCs 16, MSCs 18, and PDSNs 20.
In one embodiment, the wireless communication network 10 is a packet data service network. The remote stations 12a-12d may be any of a number of different types of wireless communication devices, such as a portable telephone, a mobile telephone connected to a portable computer running an IP-based Web browser application, a mobile telephone with associated hands-free car kits, a Personal Data Assistant (PDA) running an IP-based Web browser application, a wireless communication module incorporated into a portable computer, or a fixed location communication module such as might be found in a wireless local loop or meter reading system. In the most general embodiment, the remote station may be any type of communication unit.
The remote stations 12a-12d may be configured to perform one or more wireless packet data protocols such as those described in the EIA/TIA/IS-707 standard. In particular embodiments, the remote stations 12a-12d generate IP packets destined for the IP network 24 and encapsulate the IP packets into frames using a Point-to-Point protocol (PPP).
In one embodiment, the IP network 24 is coupled to the PDSN 20, the PDSN 20 is coupled to the MSC 18, the MSC 18 is coupled to the BSC 16 and the PSTN 22, and the BSC 16 is coupled to the base stations 14a-14c, in accordance with any of several known protocols, including, for example, E1, T1, "asynchronous transfer mode" (ATM), IP, "frame Relay", HDSL, ADSL, or xDSL, and via wireline configured for transmission of voice and/or data packets. In another embodiment, the BSC 16 is coupled directly to the PDSN 20, while the MSC 18 is not coupled to the PDSN 20. In another embodimentRemote stations 12a-12d communicate with base stations 14a-14c over an RF interface; to be published as TIA/EIA/IS-2000-2-AGeneration 3 is combined Scheduling 2 "3 GPP 2"And physical layer standard for cdma2000 spread spectrum systems (3 GPP2 document numbered c.p0002-a, TIA PN-4694) (draft, editions release 30) (11/19 1999), which is fully incorporated herein by reference, are defined for this RF interface. In another embodiment, the remote stations 12a-12d communicate with the base stations 14a-14c over an RF interface; in3 rd generation partnership project “3GPP”The RF interface is defined in documents with numbers 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214.
During typical operation of the wireless communication network 10, the base stations 14a-14c receive sets of reverse link signals from respective remote stations 12a-12d engaged in processing telephone calls, Web browsing, or other data communications, and demodulate the reverse link signals. Each reverse link signal received by a given base station 14a-14c is processed within that base station 14a-14 c. Each base station 14a-14c may communicate with a plurality of remote stations 12a-12d by modulating and transmitting sets of forward link signals to the remote stations 12a-12 d. For example, as shown in FIG. 1, the base station 14a communicates with both a first remote station 12a and a second remote station 12b, and the base station 14c communicates with both a third remote station 12c and a fourth remote station 12 d. The resulting packets are sent to the BSC 16, and the BSC 16 will provide call resource allocation and mobility management functions (including direction of soft handoffs (soft handoffs) for calls provided from one base station 14a-14c to another base station 14a-14c for a particular remote station 12a-12 d). For example, the remote station 12c is communicating with two base stations 14a and 14c simultaneously. Finally, when the remote station 12c is far enough away from one of the base stations 14c, the call will be handed off to the other base station 14 b.
If the transmission is a conventional telephone call, the BSC 16 routes the received data to the MSC 18, and the MSC 18 provides additional routing services for interfacing with the PSTN 22. If the transmission is a packet-based transmission (e.g., a data call destined for the IP network 24), the MSC 18 will route the data packets to the PDSN 20, and the PDSN 20 will send the packets to the IP network 24. Alternatively, the BSC 16 will route the packets directly to the PDSN 20, and the PDSN 20 sends the packets to the IP network 24.
As described above, the speech signal may be segmented into frames and then modeled using LPC filter coefficients, adaptive codebook vectors, and fixed codebook vectors. In order to create an optimal model of the speech signal, the difference between the actual speech and the reconstructed speech must be minimal. One technique for determining whether the difference is minimal is: a correlation value between the actual speech and the reconstructed speech is determined, and then a set of components having the largest correlation properties is selected.
Reducing memory requirements for encoders that do not use pitch enhancement
Fig. 2 is a block diagram of an apparatus in a conventional encoder for selecting a best excitation vector from a codebook. Such an encoder is designed to minimize the computational complexity involved in searching a waveform codebook by convolving the input signal with the impulse response of a filter, which complexity is further increased by the need to search through multiple waveforms in order to determine which waveform will result in the closest match to the target signal. The storage requirement for the convolution is M × M, where M is the size of the analysis frame.
The speech samples s (n) of a frame are filtered by perceptual weight filter 230 to produce target signal x (n). The design and implementation of perceptual weight filters is described in the aforementioned U.S. patent No. 5,414,796. The impulse response generator 210 generates an impulse response h (n). By using the impulse response h (n) and the target signal x (n) and according to the following relation, a cross-correlation vector d (n) is generated at the calculation element 290:
for j 0 to M-1
The computation element 250 also generates an autocorrelation matrix using the impulse response h (n):
for i is more than or equal to j
Each entry of the autocorrelation matrix phi is sent to the computation element 240. Pulse codebook generator 200 generates a plurality of pulse vectors { c }k,k=1,…,CBsizeThese pulse vectors are also input to the calculation element 240. CB (CB)sizeIs the size of the codebook from which the best codebook vector is to be selected. N is a radical ofpIs a value representing the number of pulses in the pulse vector. Can respond to a plurality of pulse position signals pi k,i=0,….,Np-1 (not shown in the figures) to generate an excitation waveform codebook (alternatively referred to herein as a "pulse waveform codebook" or "pulse codebook"), wherein p isi kIs a pulse vector ckThe position of the ith unit pulse in (1). For each pulse pi kWill correspond to the symbol si kIs assigned to the pulse. The following equation provides the resulting code vector ck:
0≤j≤M-1
The calculation element 240 filters the pulse vectors using the autocorrelation matrix phi according to the following equation:
the pulse vector c is also used by the calculation element 290 according to the following equationk,k=1,…,CBsizeDetermine the cross-correlation between d (n) and ck (n):
once E is knownyyAnd ExyThe calculation element 260 determines the value T using the following relationshipk:
And TkIs selected as the best vector to encode the residual waveform.
The various embodiments described herein may be used to reduce the storage requirements of the above scheme. The embodiments described herein do make any codebook search computationally more efficient. In one embodiment, the number of computations required to select the best codebook vector is reduced by a step of: preselecting a subset of pulse vectors from a complete codebook, and then performing a search only on the preselected subset. In one embodiment, the pre-selection is determined by a cross-correlation vector d (n). If pre-selected, a correspondingly smaller autocorrelation matrix φ is used to determine the energy values Eyy. The use of a small, incomplete autocorrelation matrix phi may seem undesirable to those of ordinary skill in the art because computationally efficient methods of utilizing recursion may not be used. Recursion typically relies on past values to calculate future values. Deliberately omitting certain values in the recursion will lead to undesirable results.
However, embodiments herein require the use of smaller autocorrelation matrices to reduce the memory requirements of codebook searches at the expense of the ability to use recursion in the computation. When the size of the pre-selected subset is small, the gains in memory reduction far outweigh the cost of increased computational complexity.
Fig. 3 is a flow diagram of an embodiment in which a subset of pulse vectors are preselected from the pulse codebook. In step 300, a cross-correlation vector d (n) is determined for 0 ≦ n ≦ M-1, where M is the dimension of the vector, which corresponds to the length of the analysis frame. In step 302, P (such that P < M) positions in a target signal of length M are selected based on the P highest values of the vector d (n) (0. ltoreq. n. ltoreq.M-1). For purposes of illustration, these pre-selected sets of pulse positions are denoted by P'. Let p 'be denoted by a symbol for further convenience'i kBecomes a pulse vector ckTo p'i kBelonging to the set P'. In addition, let P '(i) (0 ≦ i ≦ P-1) represent each element in set P'. For example, in a frame of size M80, P20 positions (P '(i), 0 ≦ i ≦ 19) in the frame may be selected in advance so that d (P' (i)) is within the highest 20 values of d (n) (0 ≦ n ≦ 79).
In step 304, a plurality of codevectors are selected from the codebook based on whether they contain only pulses at P' (i) (0 ≦ i ≦ P-1). In step 306, a submatrix φ' of size P is determined according to the following equation:
0≤i,j≤P-1
in step 308, an autocorrelation submatrix is used to determine the energy term E for the pulse vectors in the subcodebookyy. No energy determination need be performed for the unselected pulse vectors in the codebook. In step 310, a criterion value T is determined for each pulse vector of the subcodebookk. In step 312, with TkThe pulse vector of the sub-codebook corresponding to the maximum value of (a) is selected as the best pulse vector for encoding the speech signal. The various method steps described herein may be interchanged without affecting the scope of the embodiments described herein.
By using the above-described embodiments, the memory space required for codebook vector search is reduced from (M × M) to (P × P). For example, if the analysis frame is 80 samples long, then when the sub-codebook is selected based on 20 pulse positions, the requirement for 80 × 80 — 6400 positions of the analysis frame is reduced to only 20 × 20 — 400. The choice of P is an implementation detail that may vary depending on the memory constraints of the encoder in which the embodiments are implemented. Thus, the range of possible values for P may vary from 1 to M.
FIG. 4 is an apparatus configured to: the codebook search is performed by pre-selecting and searching a sub-codebook. The speech samples s (n) of a frame are filtered by perceptual weighting filter 430 to produce a target signal x (n). The impulse response generator 410 generates an impulse response h (n). By using the impulse response h (n) and the target signal x (n) and according to the following relation, a cross-correlation vector d (n) is generated at the calculation element 415:
for j 0 to M-1
Using the pulse vectors generated by pulse codebook generator 400, selection element 425 determines pulse positions P '(i) (0 ≦ i ≦ P-1) for which d (P' (i)) has P maxima for d (n). The calculation element 435 uses the pulse position p' (i) to determine a cross-correlation value (E) according to the following equationxy’)2:
It should be noted that the number of pulses is still NpHowever, these pulse positions are only taken from the set P'.
In one embodiment, the cross-correlation element 490 is configured to perform the functions of the computation elements 415, 435 and the selection element 425. In another embodiment, the apparatus may be configured such that the function of the selection element 425 is performed by a component separate from the component that performs the function of the computation elements 415, 435. Many component configurations may be provided within the device without affecting the scope of the embodiments described herein.
The calculation element 450 further uses the pulse position P ' (i) to determine an autocorrelation sub-matrix phi ' of dimension P x P, and the pulse codebook generator 400 further uses the pulse position P ' (i) to determine search parameters for the sub-codebook.
The calculation element 450 generates the autocorrelation submatrix phi ' using the pulse position p ' (i) ' and the impulse response h (n) according to the following equation:
0≤i,j≤P-1
the entries of the autocorrelation submatrix phi' are sent to the computation element 440.
Responsive to a plurality of pulse position signals { p'i k,i=0,….,Np-1} a pulse subcodebook, p'i kIs a pulse vector ckOf the ith unit pulse, so that p'i kIs an element of the set P'. N is a radical ofpIs a value representing the number of pulses in the pulse vector. Pulse codebook generator 400 generates a plurality of pulse vectors ck,k=1,…,CB1sizeWherein, as a result of the pre-selection, CB1sizeLess than CBsize。
The calculation element 440 filters the pulse vectors using the autocorrelation submatrix phi' according to the following equation:
the computation element 490 also uses the pulse vector ck,k=1,…,CB1size) To determine d (n) and c as described abovek(n) cross-correlation between the two.
Once E is knownyyAnd ExyThe calculation element 460 determines the value T using the following relationshipk:
And TkIs selected as the best vector to encode the residual waveform. In one embodiment, the pulse positions are not indexed for all positions in the frame during the search for the best codebook vector. Instead, by only pre-selectionPosition to index these pulse positions.
In another embodiment, a single processor and memory may be configured to perform all of the functions of the individual components of FIG. 4.
Reducing storage requirements of an encoder using pitch enhancement
In new generation coders, such as "enhanced variable rate multimedia digital signal codec" (EVRC) and "selectable mode vocoder" (SMV), the pitch periodicity contribution of these codebook pulses is enhanced by adding gain-adjusted forward and backward pitch sharpening processes to the analysis frame of the speech signal.
An example of pitch sharpening is to form a composite impulse response from h (n) according to the following relationship
Where P is the number of pitch lags (all or part) of length L contained in the subframe (subframe), L is the pitch lag, gpIs the pitch gain.
FIG. 5 is a block diagram of an apparatus for searching an excitation codebook in which the impulse response of the filter has been pitch enhanced. The speech samples s (n) of a frame are filtered by perceptual weighting filter 530 to produce a target signal x (n). The impulse response generator 510 generates an impulse response h (n). The impulse responses h (n) are input to the pitch sharpener element 570 and a composite impulse response is generatedWill synthesize the impulse responseAnd target signal x (n) are input to the calculation element 590 to determine the crossover according to the following relationshipCorrelation vector d (n):
for j 0 to M-1
The computational element 550 also uses the composite impulse responseGenerating an autocorrelation matrix:
for i is more than or equal to j
The individual entries of the autocorrelation matrix phi are sent to a calculation element 540. Pulse codebook generator 500 generates a plurality of pulse vectors { c }k,k=1,…,CBsizeThese pulse vectors are also input to the calculation element 540. CB (CB)sizeIs the size of the codebook from which the best codebook vector is to be selected. N is a radical ofpIs a value representing the number of pulses in the pulse vector. The autocorrelation matrix is used by the calculation element 540 to filter the pulse vectors according to the following equation:
the computation element 590 also uses the pulse vector ck,k=1,…,CBsizeTo determine the cross-correlation between d (n) and ck (n) according to the following formula:
once E is knownyyAnd ExyThe calculation element 560 determines the value T using the following relationshipk:
And TkIs selected as the best vector to encode the residual waveform.
FIG. 6 is a block diagram of an apparatus that will perform fast codebook search of an encoder that incorporates pitch enhancement in the impulse response. The speech samples s (n) of a frame are filtered by perceptual weighting filter 630 to produce a target signal x (n). The impulse response generator 610 generates an impulse response h (n). The impulse responses h (n) are input to the pitch sharpener element 670 and a composite impulse response is generatedWill synthesize the impulse responseAnd target signal x (n) are input to the calculation element 615 to determine the cross-correlation vector d (n) according to the following relationship:
for j 0 to M-1
Using the pulse vectors generated by pulse codebook generator 600, selection element 625 determines pulse positions P '(i) (0 ≦ i ≦ P-1) for which d (P' (i)) has P maxima for d (n). The pulse position p' (i) is used by the calculation element 635 to determine a cross-correlation value (E) according to the following formulaxy’)2:
In one embodiment, the cross-correlation element 690 is configured to perform the functions of the computation elements 615, 635 and the selection element 625. In another embodiment, the apparatus may be configured such that the function of the selection element 625 is performed by a component separate from the component performing the function of the computing elements 615, 635. Many component configurations may be provided within the device without affecting the scope of the various embodiments described herein.
The computing element 650 further uses the pulsesThe position P ' (i) determines the autocorrelation sub-matrix phi ' of dimension P x P and pulse codebook generator 600 further uses the pulse position P ' (i) to determine the search parameters for that sub-codebook. The calculation element 650 uses the pulse position p' (i) and the composite impulse response according to the following formulaGenerating an autocorrelation submatrix phi':
0≤i,j≤P-1
the entries of the autocorrelation submatrix phi' are sent to the computation element 640.
Responsive to a plurality of pulse position signals { p'i k,i=0,….,Np-1), generating a pulse subcodebook, p'i kIs a pulse vector ckIs p'i kIs an element of the set P'. N is a radical ofpIs a value representing the number of pulses in the pulse vector. Pulse codebook generator 600 generates a plurality of pulse vectors ck,k=1,…,CB1size}。
The calculation element 640 filters these pulse vectors using the autocorrelation submatrix phi' according to the following equation:
the calculation element 635 also uses the pulse vector ck,k=1,…,CB1sizeD (n) and c as described abovekCross-correlation between (n) Exy。
Once E is knownyyAnd ExyThe calculation element 660 determines the value T using the following relationshipk:
And TkIs selected as the best vector to encode the residual waveform. EyyThe advantages of the above calculation are: forward and backward pitch sharpening is added to codebook search without memory intensive computations. Thus, these embodiments convert existing requirements for M memory space to requirements for only P memory space.
Reducing complexity of 2-pulse codebook search
In another embodiment, E is calculated by pre-calculatingyyMatrix instead of autocorrelation matrix phi to reduce 2 pulses (N)p2) complexity of the search. This embodiment is described in comparison to the various embodiments described above for fig. 6, but it should be noted that this embodiment can be practiced separately without undue experimentation. The notation in the description of fig. 6 is used for illustration purposes only.
Fig. 7 is a flow chart illustrating the use of a stored lookup table (rather than being computationally intensive) to determine the best code-vector. In step 700, the pulse of the LPC filter is usedThe impulse response h (n) and the target signal x (n) to determine the cross-correlation vector d (n). In step 702, an energy vector E is determined according to the following formulayy:
Eyy(p′(i),p′(j))=
φ′(p′(i),i)+φ′(p′(j),p′(j))+2c(p′(i))c(p′(i))φ′(p′(i),p′(j)),
Where 0 ≦ i, j ≦ P-1, and the value of φ' (i, j) is calculated according to the following formula:
0≤i,j≤P-1
thus, instead of computing the entire matrix φ ', special entries of the matrix φ' are computed and used to generate the matrix Eyy. In step 704, the stored value E is usedyy(i, j) to perform a search for the best codevector.By using a device with stored EyyLookup tables of values can reduce the complexity of the search because the system no longer needs to add many values of the matrix phi to determine E for each pulse vector being searched in the codebookyyThe value is obtained.
Those skilled in the art will understand that: information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those skilled in the art will further understand that: the various illustrative logical blocks, modules, circuits, and operational steps described in connection with the embodiments described herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly show this interchangeability of hardware and software, components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments described herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The various steps of a method or operation described in connection with the embodiments described herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art; and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. An apparatus for selecting an optimal pulse vector from a pulse vector codebook, wherein a linear prediction coder uses the optimal pulse vector to encode a residual waveform, the apparatus comprising:
an impulse response generator for generating an impulse response;
a cross-correlation element configured to determine a cross-correlation vector relating the impulse response to a plurality of samples of the target signal from the filter and to use the cross-correlation vector to determine a plurality of pulse positions such that if the plurality of pulse positions are inserted into the cross-correlation vector, a predetermined number of high cross-correlation values are provided;
a pulse codebook generator configured to receive an indication signal from the cross-correlation element indicative of the pulse positions and to output a plurality of pulse vectors in response to the indication signal, wherein the plurality of pulse vectors is a subset of the pulse vector codebook; and the number of the first and second groups,
a calculation element for determining an autocorrelation sub-matrix based on the impulse response and the determined pulse position, wherein the optimal impulse vector is selected from the codebook using the autocorrelation sub-matrix and a cross-correlation vector.
2. The apparatus of claim 1, wherein the cross-correlation element comprises:
at least one computing element for determining the cross-correlation vector; and the number of the first and second groups,
a selection element for determining the plurality of pulse positions and for generating the indication signal.
3. An apparatus for reducing memory requirements for codebook searches, comprising:
an impulse response generator for generating an impulse response signal;
a cross-correlation element configured to determine a cross-correlation vector that relates the impulse response signal to a target signal;
a selection element configured to receive the cross-correlation vector, to identify an optimal set of pulse positions using the cross-correlation vector and to generate an indication signal carrying an identification of the optimal set of pulse positions;
a pulse codebook generator configured to receive the indication signal from the selection element and generate a plurality of pulse vectors, wherein the plurality of pulse vectors are generated based on an identification of a set of optimal pulse positions carried by the indication signal; and the number of the first and second groups,
a computing element for determining an autocorrelation submatrix from the impulse response signal and the identified set of pulse positions, wherein the autocorrelation submatrix is used in place of the autocorrelation matrix, thereby reducing the memory requirements of the codebook search.
4. A method for selecting a best-fit pulse vector from a plurality of pulse vectors for encoding a residual waveform, the method comprising:
determining an optimal set of pulse positions based on a cross-correlation vector between the target signal and the impulse response;
determining a plurality of pulse vectors corresponding to the optimal set of pulse positions, wherein the plurality of pulse vectors are smaller than a pulse waveform codebook;
calculating an autocorrelation submatrix from only the impulse response and the determined set of impulse positions;
determining a plurality of energy values using the autocorrelation submatrix, wherein each energy value corresponds to one pulse vector of the plurality of pulse vectors; and
selecting the pulse vector with the highest standard value from the plurality of pulse vectors as the best-fit pulse vector, wherein the highest standard value is determined from the plurality of energy values and the cross-correlation vector.
5. A method for selecting a best pulse vector from a codebook, comprising:
determining a cross-correlation vector between the target signal and the impulse response, wherein each component of the cross-correlation vector corresponds to a location in the analysis frame;
determining P pulse positions corresponding to the P largest components of the cross-correlation vector;
selecting a plurality of pulse vectors from the codebook to form a subcodebook, wherein each pulse vector of the plurality of pulse vectors corresponds to at least one of the P pulse positions;
determining an autocorrelation sub-matrix from the P pulse positions; and the number of the first and second groups,
the best pulse vector is selected from the plurality of pulse vectors based on the autocorrelation submatrices and cross-correlation vectors.
6. An apparatus for selecting a best pulse vector from a codebook, comprising:
means for determining a cross-correlation vector between the target signal and the impulse response, wherein each component of the cross-correlation vector corresponds to a location in the analysis frame;
means for determining the P pulse positions corresponding to the P largest components of the cross-correlation vector;
means for generating a plurality of pulse vectors from a codebook to form a sub-codebook, wherein each pulse vector of the plurality of pulse vectors corresponds to at least one of the P pulse positions;
means for determining an autocorrelation sub-matrix from the P pulse positions; and the number of the first and second groups,
means for selecting the best pulse vector from the plurality of pulse vectors based on the autocorrelation submatrices and cross-correlation vectors.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/876,352 US6789059B2 (en) | 2001-06-06 | 2001-06-06 | Reducing memory requirements of a codebook vector search |
| US09/876,352 | 2001-06-06 | ||
| PCT/US2002/017816 WO2002099788A1 (en) | 2001-06-06 | 2002-06-05 | Reducing memory requirements of a codebook vector search |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1067222A1 HK1067222A1 (en) | 2005-04-01 |
| HK1067222B true HK1067222B (en) | 2008-05-23 |
Family
ID=
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN100336101C (en) | Reducing memory requirements of codebook vector search | |
| CN1306473C (en) | Fast code-vector searching | |
| CN1223989C (en) | Frame Erasure Compensation Method in Variable Rate Speech Coder and Device Using the Method | |
| CN1192356C (en) | Decoding method and system including adaptive post filter | |
| JP5280480B2 (en) | Bandwidth adaptive quantization method and apparatus | |
| CN1655236A (en) | Method and apparatus for predictively quantizing voiced speech | |
| CN1347550A (en) | CELP transcoding | |
| US7698132B2 (en) | Sub-sampled excitation waveform codebooks | |
| CN1145930C (en) | Method and device for linear spectral information quantization method in interleaved speech coder | |
| CN1272939A (en) | Speech coding apparatus and speech decoding apparatus | |
| CN1279510C (en) | Method and apparatus for subsampling phase spectrum information | |
| HK1067222B (en) | Apparatus and method for reducing memory requirements of a codebook search | |
| HK1066901B (en) | Fast code-vector searching apparatus and method | |
| HK1055174B (en) | Frame erasure compensation method in a variable rate speech coder and apparautus using the same |