HK1109950B

HK1109950B - Method and device for low bit rate speech coding

Info

Publication number: HK1109950B
Application number: HK08104262.5A
Authority: HK
Inventors: B. Bessette
Original assignee: Nokia Technologies Oy
Priority date: 2004-11-03
Filing date: 2005-11-02
Publication date: 2012-09-21

Description

Method and apparatus for low bit rate speech coding

Technical Field

The present invention relates to digital coding of sound signals, in particular but not exclusively speech signals, with a view to transmitting and synthesizing such sound signals. More particularly, the present invention relates to an efficient low bit rate coding method for sound signals based on the code excited linear prediction coding paradigm.

Background

In various application fields such as teleconferencing, multimedia, and wireless communications, demand is increasing for efficient digital narrowband and wideband speech coding techniques that offer a good trade-off between subjective quality and bit rate. Until recently, speech coding applications mainly used a telephone bandwidth limited to the 200-3400 Hz range. Wideband speech applications, however, provide greater intelligibility and naturalness in communication than the traditional telephone bandwidth. A bandwidth in the range of 50-7000 Hz has been found sufficient to deliver good quality and give the impression of face-to-face communication. For general audio signals, this bandwidth gives acceptable subjective quality, but it is still lower than the quality of FM radio or CD, which operate in the ranges of 20-16000 Hz and 20-20000 Hz, respectively.

A speech encoder converts a speech signal into a digital bit stream that is transmitted over a communication channel or stored in a storage medium. The speech signal is digitized, i.e., sampled and quantized, typically with 16 bits per sample. The function of the speech encoder is to represent these digital samples with a smaller number of bits while maintaining good subjective speech quality. A speech decoder or synthesizer operates on the transmitted or stored bit stream and converts it back into a sound signal.

Code Excited Linear Prediction (CELP) coding is a well-known technique that achieves a good compromise between subjective quality and bit rate. This coding technique is the basis of many speech coding standards, in both wireless and wireline applications. In CELP coding, the sampled speech signal is processed in successive blocks of L samples, usually called frames, where L is a predetermined number typically corresponding to 10-30 ms. A Linear Prediction (LP) filter is computed and transmitted every frame. The computation of the LP filter typically requires a lookahead, e.g., a 5-15 ms speech segment from the subsequent frame. The L-sample frame is divided into smaller blocks called subframes. Typically the number of subframes is three or four, resulting in subframes of 4-10 ms. In each subframe, the excitation signal is usually obtained from two components: the past excitation and the innovative, fixed codebook excitation. The component formed from the past excitation is often referred to as the adaptive codebook or pitch excitation. The parameters characterizing the excitation signal are coded and transmitted to the decoder, where the reconstructed excitation signal is used as the input of the LP filter.
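By way of non-limiting illustration, the framing described above may be sketched in Python as follows. The function name split_into_subframes and the example values (frames of 256 samples divided into four subframes of 64 samples) are assumptions chosen for illustration only.

```python
def split_into_subframes(samples, frame_len, n_subframes):
    """Split a sample sequence into frames, then each frame into equal subframes.

    Only complete frames are returned; a trailing partial frame is dropped.
    """
    if frame_len % n_subframes != 0:
        raise ValueError("frame_len must be a multiple of n_subframes")
    sub_len = frame_len // n_subframes
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Cut the frame into consecutive subframes of sub_len samples each.
        frames.append([frame[i:i + sub_len] for i in range(0, frame_len, sub_len)])
    return frames


# Assumed example: 600 dummy samples, 256-sample frames, four subframes per frame.
frames = split_into_subframes(list(range(600)), frame_len=256, n_subframes=4)
```

In this sketch the 88 samples left over after the second frame are simply discarded; a practical encoder would instead buffer them for the next frame and for the LP lookahead.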

In wireless systems using Code Division Multiple Access (CDMA) technology, the use of source-controlled Variable Bit Rate (VBR) speech coding significantly improves the system capacity. In source-controlled VBR coding, the codec operates at several bit rates, and a rate selection module determines the bit rate used to encode each speech frame based on the nature of the speech frame (e.g., voiced, unvoiced, transient, background noise). The goal is to attain the best speech quality at a given average bit rate, also referred to as the Average Data Rate (ADR). The codec can operate in different modes by tuning the rate selection module to attain a different ADR in each mode, where codec performance improves as the ADR increases. The mode of operation is imposed by the system depending on channel conditions. This gives the codec a mechanism for trading off speech quality against system capacity.
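The notion of an average data rate may be illustrated by the following minimal Python sketch. The bit rates are the rate set I values quoted later in this description; the function name average_data_rate and the frame mix in the example are hypothetical and serve only to show the arithmetic.

```python
# Rate set I bit rates in bit/s, as quoted later in this description.
RATE_SET_I = {"FR": 8550, "HR": 4000, "QR": 2000, "ER": 800}


def average_data_rate(frame_counts, rates=RATE_SET_I):
    """Average bit rate over equal-length frames.

    frame_counts maps a rate name to the number of frames coded at that rate.
    """
    total_frames = sum(frame_counts.values())
    if total_frames == 0:
        raise ValueError("at least one frame is required")
    total = sum(rates[name] * count for name, count in frame_counts.items())
    return total / total_frames


# Hypothetical mix of 100 frames; not measured data.
adr = average_data_rate({"FR": 30, "HR": 20, "QR": 10, "ER": 40})
```

Shifting frames from the full rate to the half rate lowers the result, which is why the quality achieved at the half rate matters for the overall trade-off.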

Typically, in VBR coding for CDMA systems, the eighth rate is used for encoding frames without speech activity (silence or noise-only frames). When the frame is stationary voiced or stationary unvoiced, the half rate or quarter rate is used, depending on the operating mode. If the half rate can be used, a CELP model without the pitch codebook is used in the unvoiced case, and signal modification is used in the voiced case to enhance the periodicity and reduce the number of bits for the pitch indices. If the operating mode imposes the quarter rate, waveform matching is generally not possible because the number of bits is insufficient, and some parametric coding is typically applied. The full rate is used for onsets, transient frames, and mixed voiced frames (a typical CELP model is usually used). In addition to the source-controlled codec operation in CDMA systems, the system can limit the maximum bit rate in some speech frames in order to send in-band signaling information (called dim-and-burst signaling), or during poor channel conditions (such as near the cell boundaries) in order to improve the codec robustness. This is referred to as half-rate max.
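The rate selection rules just described may be sketched as follows. This is a simplified illustration only: the frame class labels, the function name select_rate, and its parameters are assumptions, and an actual rate selection module relies on signal classification that is not shown here.

```python
RATE_ORDER = ["eighth", "quarter", "half", "full"]  # lowest to highest bit rate


def select_rate(frame_class, low_rate="half", half_rate_max=False):
    """Pick a rate for one frame from its class and the operating mode.

    low_rate is the rate the operating mode allows for stationary frames
    ("half" or "quarter"); half_rate_max models the system-imposed cap.
    """
    if frame_class == "inactive":
        rate = "eighth"          # silence or noise-only frames
    elif frame_class in ("stationary_voiced", "stationary_unvoiced"):
        rate = low_rate          # half or quarter rate, depending on the mode
    else:
        rate = "full"            # onsets, transients, mixed voiced frames
    if half_rate_max and RATE_ORDER.index(rate) > RATE_ORDER.index("half"):
        rate = "half"            # cap imposed for signaling or poor channels
    return rate
```

For example, select_rate("onset") yields the full rate, while the same frame under half_rate_max=True is coded at the half rate.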

As can be seen from the above description, efficient low bit rate coding (at half rate) is important for VBR coding, so that the average data rate is reduced while maintaining good sound quality, and also to maintain good performance when the codec is forced to operate at maximum half rate.

Disclosure of Invention

The present invention relates to a method for low bit rate CELP coding. The method is suitable for encoding the half-rate modes (generic and voiced) in a source-controlled variable rate speech coding system. The foregoing and other problems are overcome, and other advantages are realized, in accordance with the presently described embodiments of these teachings.

According to one aspect, the present invention is a method for encoding a speech signal. In the method, a speech signal is divided into a plurality of frames, and at least one of the frames is divided into at least two subframe units. Searching for the fixed codebook contribution and the adaptive codebook contribution is performed for subframe units. At least one subframe unit is selected to be encoded without using a fixed codebook contribution.

Another embodiment is an encoder. The encoder has a first input coupled to a codebook and a second input for receiving a speech signal. The encoder is operative to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for the received speech signal, and to output the speech signal as a frame comprising at least two subframe units. The encoder encodes at least one subframe unit of the frame without using the fixed codebook contribution.

According to another aspect, the present invention is a program of machine-readable instructions, tangibly embodied on an information bearing medium and executable by a digital data processor, to perform actions directed toward encoding a speech frame. The actions include dividing a speech signal into a plurality of frames, and dividing at least one of the plurality of frames into at least two subframe units. A fixed codebook contribution and an adaptive codebook contribution are searched for the subframe units. At least one subframe unit is selected to be encoded without using the fixed codebook contribution.

According to another aspect, the invention is an encoding device. The device has means for dividing a speech signal into a plurality of frames and means for dividing at least one of the plurality of frames into at least two subframe units; these means may be an encoder. The device further has means for searching for a fixed codebook contribution and an adaptive codebook contribution for a subframe unit, such as a processor coupled to the encoder and to a computer-readable memory that stores a codebook. The device further has means for selecting at least one subframe unit to be encoded without using the fixed codebook contribution; the selecting means is preferably also the processor.

According to another aspect, the invention is a communication system having an encoder and a decoder. The encoder includes a first input coupled to a codebook and a second input for receiving a speech signal to be transmitted. The encoder is operative to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for the received speech signal, and to output the speech signal (or at least a portion thereof) as a frame comprising at least two subframe units. The encoder is further operative to encode at least one subframe unit of the frame without using the fixed codebook contribution. The decoder of the communication system includes a first input coupled to a codebook and a second input for receiving encoded frames of the speech signal over a channel. An encoded speech frame includes at least two subframe units. The decoder is operative to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for a received encoded speech frame, and to decode at least one subframe unit without the fixed codebook contribution.

Further details regarding various embodiments and implementations are described in detail below.

Drawings

The foregoing and other aspects of these principles will become more apparent upon reading the following detailed description in conjunction with the drawings in which:

Figs. 1 and 2 are block diagrams of a mobile station and of internal elements of the mobile station, respectively, according to embodiments of the present invention.

Fig. 3 is a process flow diagram according to the first embodiment of the present invention.

Fig. 4 is a process flow diagram according to a second embodiment of the present invention.

Detailed Description

The use of source-controlled VBR speech coding significantly improves the capacity of many communication systems, particularly wireless systems using CDMA technology. In source-controlled VBR coding, the codec operates at several bit rates, and a rate selection module determines the bit rate used to encode each speech frame based on the nature of the speech frame (e.g., voiced, unvoiced, transient, background noise). In this regard, reference may be made to commonly owned U.S. patent application No. 10/608,943, entitled "Low-Density Parity-Check Codes for Multiple Code Rates", filed by Victor Stolpman on June 26, 2003, the contents of which are hereby incorporated by reference. In VBR coding, the goal is to attain the best speech quality at a given average data rate. The codec can operate in different modes by tuning the rate selection module to attain a different ADR in each mode, where codec performance improves as the ADR increases. In some systems, the mode of operation is imposed by the system depending on channel conditions. This gives the codec a mechanism for trading off speech quality against system capacity.

In a cdma2000 system, two sets of bit rate configurations are defined. In rate set I, the bit rates are Full Rate (FR) at 8.55 kbit/s, Half Rate (HR) at 4 kbit/s, Quarter Rate (QR) at 2 kbit/s, and Eighth Rate (ER) at 0.8 kbit/s. In rate set II, the bit rates are FR at 13 kbit/s, HR at 6.2 kbit/s, QR at 2.7 kbit/s, and ER at 1 kbit/s.
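To give a sense of the bit budget these rates imply, the following sketch converts each rate into bits per frame. The frame duration of 20 ms is an assumption that is not stated in this passage, the bit rates are the rounded values quoted above, and the names RATE_SETS and bits_per_frame are illustrative only.

```python
# Bit rates in bit/s, taken from the rounded figures quoted above.
RATE_SETS = {
    "I": {"FR": 8550, "HR": 4000, "QR": 2000, "ER": 800},
    "II": {"FR": 13000, "HR": 6200, "QR": 2700, "ER": 1000},
}

FRAME_MS = 20  # assumption: frame duration in milliseconds


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of whole bits available for one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


budget = {
    rate_set: {name: bits_per_frame(rate) for name, rate in rates.items()}
    for rate_set, rates in RATE_SETS.items()
}
```

Under this assumption a rate set I half-rate frame has 80 bits available, which illustrates why every bit spent on pitch and codebook indices matters at this rate.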

In an exemplary embodiment of the present invention, the disclosed method for low bit rate coding is applied to half-rate coding in rate set I operation. In particular, an embodiment is shown in which the disclosed method is incorporated into a variable bit rate wideband speech codec for encoding generic HR frames and voiced HR frames at 4 kbit/s. This is detailed in particular beginning with fig. 3.

Fig. 1 shows an exemplary schematic diagram of a mobile station MS 20 in which the present invention may be embodied. The present invention may be disposed in any host computing device having a variable rate encoder, whether or not the device is mobile and whether or not it is coupled to a cellular or other data network. The MS 20 is a handheld portable device capable of wirelessly accessing a communication network, such as a mobile telephony network of base stations coupled to a public switched telephone network. Cellular telephones and Personal Digital Assistants (PDAs) having Internet or other two-way communication capabilities are examples of the MS 20. Portable wireless devices include mobile stations as well as other handheld devices such as walkie-talkies and devices that access only local networks such as a Wireless Local Area Network (WLAN) or a WIFI network.

The component blocks shown in fig. 1 are functional, and the functions described below may or may not be performed by a single physical entity as described with reference to fig. 1. A display driver 22, such as a circuit board for driving a graphical display screen, and an input driver 24, such as a circuit board for converting inputs from user-actuated buttons and/or joysticks into electrical signals, are provided together with a display screen and a button/joystick array (not shown) for interfacing with a user. The input driver 24 may also convert user inputs at the display screen when the display screen is touch sensitive, as is known in the art. The MS 20 further includes a power source 26, such as a self-contained battery, that provides electrical power to a central processor 28 that controls functions within the MS 20. Functions such as digital sampling, decimation, interpolation, encoding and decoding, modulation and demodulation, encryption and decryption, spreading and despreading (for a CDMA compatible MS 20), and other signal processing functions known in the art are performed within the processor 28.

Voice or other audible input is received at a microphone 30, which may be coupled to the processor 28 through a buffer memory 32. Computer programs such as algorithms to modulate, encode and decode, data arrays such as those for coders/decoders (codecs) and look-up tables, and the like are stored in a main memory storage medium 34, which may be an electronic, optical, or magnetic memory storage medium as is known in the art for storing computer-readable instructions, programs, and data. The main memory 34 is typically partitioned into volatile and non-volatile portions, and is commonly dispersed among different storage units, some of which may be removable. The MS 20 communicates over a network link, such as a mobile telephony link, via one or more antennas 36 that may be selectively coupled via a T/R switch 38, or a duplex filter, to a transmitter 40 and a receiver 42. The MS 20 may additionally have secondary transmitters and receivers for communicating over additional networks, such as WLAN, WIFI, or Bluetooth, or for receiving digital video broadcasts. Known antenna types include monopole antennas, dipole antennas, Planar Inverted Folded Antennas (PIFA), and others. The various antennas may be mounted primarily externally (e.g., a whip antenna) or, as shown, completely within the housing of the MS 20. Audible output from the MS 20 is transduced at a speaker 44. Most of the above components, and in particular the processor 28, are disposed on a main wiring board (not shown). Typically, the main wiring board includes a ground plane to which the one or more antennas 36 are electrically coupled.

Fig. 2 is a schematic block diagram of processes and circuits performed within, for example, the MS 20 of fig. 1, according to an embodiment of the present invention. The speech signal output from the microphone is digitized at a digitizer and encoded at an encoder 48 using a codebook 50 stored in the memory 34. The codebook, or mother code, has both fixed and adaptive portions for variable rate coding. A sampler 52 and a rate selector 54 obtain the code rate by sampling and interpolation/decimation, or by other means known in the art. The rate may vary between frames, as described above. The data is parsed into subframes at block 56; the subframes are divided by type and combined into frames by any of the methods disclosed below. Typically, the processor 28 combines the various types of subframes into a single frame in such a way as to minimize an error measure. In some embodiments this is iterative: the processor determines a gain using only the adaptive portion of the codebook 50 and applies that gain to one or more subframes in a frame, and applies a gain derived from both the fixed and adaptive codebook portions to the other subframes of the frame. The result is considered a first calculation. The second calculation is the inverse: the gain derived from only the adaptive codebook portion is applied to the other subframes, and the gain derived from both the fixed and adaptive codebook portions is applied to the original subframes. Whichever of the first and second calculations minimizes the error measure represents how the subframes are excited by the linear prediction filter 58. The excitation comes from the processor, which iteratively determines the optimal excitation on a subframe-by-subframe basis. Other techniques are disclosed below. In some embodiments, energy feedback 60 from the excitation of the frame immediately preceding the current frame is used to determine a fixed pitch gain that is applied to one of the subframes in the frame. The value of that energy may simply be stored in the memory 34 and re-accessed by the processor 28. Various other hardware arrangements that operate on speech signals as described herein may be devised without departing from these teachings.

A detailed description of embodiments of the present invention is given with reference to the accompanying text, which corresponds to a description of a variable-rate multimode wideband coder currently submitted for standardization in 3GPP2 [3GPP2 C.S0052-A: "Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options 62 and 63 for Spread Spectrum Systems"], which is incorporated herein by reference. A new enhancement to the standard is the addition of an operating mode that uses a rate set I configuration, which requires the design of Voiced HR and Generic HR coding types at 4 kbps. To reduce the bit rate while keeping the same codec structure and with limited use of extra memory, the inventive idea described in detail below is incorporated.

According to a first embodiment, the speech coding system uses a linear predictive coding technique. The speech frame is divided into several subframe units or subframes, and the excitation of the Linear Prediction (LP) synthesis filter is calculated in each subframe. Preferably, a subframe unit may be a half frame or a quarter frame. In a conventional linear predictive coder, the excitation consists of an adaptive codebook contribution and a fixed codebook contribution, each scaled by its respective gain. In an embodiment of the invention, to maintain good performance while reducing the bit rate, K subframes are grouped and the pitch lag is calculated once for the K subframes. Then, when the excitation is determined in the individual subframes, some subframes use no fixed codebook contribution, and the pitch gain for those subframes is fixed to a particular value. The remaining subframes use both fixed and adaptive codebook contributions. In a preferred embodiment, several iterations are performed, and in each iteration different subframes are designated as having no fixed codebook contribution, so as to obtain several combinations of subframes with a fixed codebook contribution and subframes without one; the best combination is determined by minimizing an error measure. In addition, the index of the best combination, i.e., the one yielding the smallest error, is encoded.

In a variant, the pitch gain in the subframes with no fixed codebook contribution is set to a value given by the ratio between the energies of the LP synthesis filters of the previous and current frames. This is shown in fig. 3.

In fig. 3, each subframe is assigned a type 301. The pitch gain is calculated once for all subframes of a particular type and stored 302. The processor 28 then iteratively combines the different types of subframes into a frame in various combinations 304 using the calculated pitch gains. For subframes of the first type, whose excitation uses only one contribution, namely the adaptive codebook contribution, the pitch gain is set at block 306 to g_f, which is proportional to the LP synthesis filter energy as described above and in further detail below. An error measure for the particular combination is determined and stored at block 308. The calculation repeats 310 for a small number of iterations so as not to delay transmission, preferably bounded by the number of subframes or by a time constraint. Once all iterations are complete, the minimum error is determined 312, and the individual subframes are excited 314 by the linear prediction filter according to the gains that yielded the minimum error measure, and transmitted 316. Note that the encoder may perform each of steps 301 to 314 of fig. 3, where the encoder is read broadly to include the calculations by the processor and the excitation by the filter, even if the processor and filter are separate from the encoding circuitry. In all embodiments, the functional blocks of fig. 2 are not meant to imply separate components; many such blocks may be combined in one encoder.
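The iterative selection of fig. 3 may be sketched in Python as follows, purely by way of illustration. The sketch enumerates every way of choosing which subframes are coded without the fixed codebook and keeps the combination with the smallest error. The function names, the callback error_of (which stands in for the actual excitation search and error computation), and the per-subframe costs in the example are all hypothetical.

```python
from itertools import combinations


def candidate_patterns(n_subframes, n_adaptive_only):
    """Yield tuples of booleans, one per subframe.

    True marks a subframe coded with the adaptive codebook only
    (no fixed codebook contribution, pitch gain forced to g_f).
    """
    for chosen in combinations(range(n_subframes), n_adaptive_only):
        yield tuple(i in chosen for i in range(n_subframes))


def select_pattern(n_subframes, n_adaptive_only, error_of):
    """Return (index, pattern, error) of the combination with the smallest error."""
    best = (None, None, float("inf"))
    for index, pattern in enumerate(candidate_patterns(n_subframes, n_adaptive_only)):
        error = error_of(pattern)
        if error < best[2]:
            best = (index, pattern, error)
    return best


# Hypothetical cost of dropping the fixed codebook in each of four subframes.
drop_cost = [3.0, 1.0, 2.0, 5.0]


def toy_error(pattern):
    return sum(cost for cost, dropped in zip(drop_cost, pattern) if dropped)


best_index, best_pattern, best_error = select_pattern(4, 2, toy_error)
```

The returned index is what an encoder along these lines would transmit so that the decoder knows which subframes carry no fixed codebook contribution. With two subframes and one adaptive-only subframe there are only two patterns, so a single bit suffices.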

The decoder according to the invention operates similarly, although it does not need to iteratively determine how the subframe units are arranged in a frame, since it receives the frame already arranged over the channel. The decoder determines which subframe units were encoded without the fixed codebook contribution, preferably from a bit set in the frame at the transmitter. The decoder has a first input coupled to a codebook and a second input for receiving an encoded frame of the speech signal. As at the transmitter, the encoded frame includes at least two subframe units. Similar to the encoder, the decoder searches the codebook for a fixed codebook contribution and an adaptive codebook contribution. The decoder decodes at least one of the subframe units without using the fixed codebook contribution.
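A decoder along these lines may be sketched as follows for a pair of subframes. This is a simplified illustration under stated assumptions: the flag bit is taken to be the index (0 or 1) of the subframe that uses the fixed codebook, the adaptive codebook vectors are passed in ready-made (in practice the second one depends on the excitation of the first subframe), and all names are hypothetical.

```python
def decode_subframe_pair(flag_bit, adaptive, fixed, g_p, g_c, g_f):
    """Rebuild the excitation of two subframes from decoded parameters.

    flag_bit : index of the subframe that uses the fixed codebook (assumption)
    adaptive : two adaptive codebook vectors, one per subframe
    fixed    : the single fixed codebook vector of the pair
    g_p, g_c : decoded pitch gain and fixed codebook gain
    g_f      : forced pitch gain of the adaptive-only subframe
    """
    excitations = []
    for sub in (0, 1):
        if sub == flag_bit:
            # Conventional subframe: adaptive plus fixed codebook contributions.
            exc = [g_p * v + g_c * c for v, c in zip(adaptive[sub], fixed)]
        else:
            # Adaptive-only subframe: pitch gain forced to g_f, no fixed codebook.
            exc = [g_f * v for v in adaptive[sub]]
        excitations.append(exc)
    return excitations


# Toy two-sample vectors chosen so that the arithmetic is exact.
pair = decode_subframe_pair(
    flag_bit=1,
    adaptive=[[1.0, 2.0], [3.0, 4.0]],
    fixed=[1.0, -1.0],
    g_p=0.5, g_c=2.0, g_f=0.5,
)
```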

According to a second embodiment, schematically shown in fig. 4, the subframes in a frame of two subframes are grouped. The pitch lag is calculated over the two subframes 402. The excitation is then calculated in each subframe with the pitch gain forced to a specific value g_f in either the first or the second subframe. The subframe in which the pitch gain is forced to g_f uses no fixed codebook (its excitation is based on the adaptive codebook contribution only). Which subframe has its pitch gain forced to g_f is determined in a closed loop 402 by trying both combinations and selecting the one that minimizes the weighted error over the two subframes. In the first iteration 406, the pitch gain, the adaptive codebook excitation, and the fixed codebook excitation and gain are computed in the first subframe 408a; in the second subframe, the pitch gain is forced to g_f and the adaptive codebook excitation is computed 410a with no fixed codebook contribution. In the second iteration 412, in the first subframe, the pitch gain is forced to g_f and the adaptive codebook excitation is computed 410b with no fixed codebook contribution; in the second subframe, the pitch gain, the adaptive codebook excitation, and the fixed codebook excitation and gain are computed 408b. A weighted error is calculated for both iterations 412a, 412b, and the iteration that minimizes the error is retained 414 and selected for transmission 416. One bit per two subframes may be used to indicate the index of the subframe that uses the fixed codebook contribution.

In a third embodiment, the fixed codebook contribution is used in one out of every two subframes. In subframes with no fixed codebook contribution, the pitch gain is forced to a certain value g_f. This value is determined as the ratio between the energies of the impulse responses of the LP synthesis filters in the previous and current frames, constrained to be less than or equal to one. The value of g_f is given by:

$$g_f = \frac{\sum_{n} h_{LPold}^{2}(n)}{\sum_{n} h_{LPnew}^{2}(n)}, \quad \text{constrained to } g_f \le 1 \qquad (1)$$

where h_LPold(n) and h_LPnew(n) denote the impulse responses of the LP synthesis filters of the previous and current frames, respectively. For a stable vocal tract, g_f is close to one. When the current frame becomes more resonant, the value of g_f determined by the above ratio forces the pitch gain to a low value. This avoids an unnecessary increase in energy. The process is similar to that shown in fig. 4, but with the pitch gain given specifically as described above.
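The computation of g_f may be illustrated by the following minimal Python sketch, which takes the ratio of impulse-response energies literally as stated in the text and caps it at one. Several details are assumptions made only for this illustration: the impulse responses are truncated to 64 samples, the LP filter is written as 1/A(z) with A(z) = 1 + a_1 z^-1 + ..., and the one-tap filters in the example are toy values. Whether any further scaling is applied to the ratio cannot be recovered from this text.

```python
def impulse_response(lp_coeffs, length=64):
    """Impulse response of the synthesis filter 1/A(z).

    lp_coeffs holds a_1..a_M of A(z) = 1 + a_1 z^-1 + ... + a_M z^-M.
    """
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0          # unit impulse input
        for k, a_k in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                value -= a_k * h[n - k]         # recursive (all-pole) part
        h.append(value)
    return h


def fixed_pitch_gain(h_old, h_new):
    """Ratio of previous-frame to current-frame impulse-response energy, capped at 1."""
    e_old = sum(x * x for x in h_old)
    e_new = sum(x * x for x in h_new)
    if e_new == 0.0:
        return 1.0                              # degenerate case: fall back to the cap
    return min(1.0, e_old / e_new)


# Toy example: the current frame is more resonant than the previous one.
h_prev = impulse_response([-0.5])
h_curr = impulse_response([-0.9])
g_f = fixed_pitch_gain(h_prev, h_curr)
```

In this toy case the current filter has roughly four times the impulse-response energy of the previous one, so g_f comes out well below one; swapping the two filters gives a ratio above one, which the constraint clips to exactly 1.0.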

Which subframe has its pitch gain forced to g_f is determined in a closed loop by trying both combinations and selecting the one that minimizes the weighted error over the half-frame. The excitation in every two subframes is determined in two iterations. In the first iteration, the excitation in the first subframe is determined as usual: the adaptive codebook excitation and the pitch gain are determined, the target signal for the fixed codebook search is then updated, the fixed codebook excitation and gain are computed, and the adaptive and fixed codebook gains are jointly quantized. In the second subframe, the adaptive codebook memory is updated with the total excitation from the first subframe, the pitch gain is then forced to g_f, and the adaptive codebook excitation is computed with no fixed codebook contribution. Thus, in the first iteration, the total excitation in the first subframe is given by:

$$u_1(n) = \hat{g}_p\, v_1(n) + \hat{g}_c\, c_1(n), \quad n = 0, \ldots, 63 \qquad (2)$$

and the total excitation in the second subframe is given by:

$$u_2(n) = g_f\, v_2(n), \quad n = 0, \ldots, 63 \qquad (3)$$

where v(n) and c(n) denote the adaptive and fixed codebook excitations of the subframe indicated by the subscript, and ĝ_p and ĝ_c denote the quantized pitch gain and fixed codebook gain.

Before starting the second iteration, the memories of the synthesis and weighting filters and the adaptive codebook memory are saved for the two subframes.

In the second iteration, in the first subframe, the pitch gain is forced to g_f and the adaptive codebook excitation is computed with no fixed codebook contribution. The total excitation in the first subframe is given by:

$$u_1(n) = g_f\, v_1(n), \quad n = 0, \ldots, 63 \qquad (4)$$

The memory of the adaptive codebook and the memories of the filters are then updated based on the excitation from the first subframe.

In the second subframe, the target signal is calculated and the adaptive codebook excitation and pitch gain are determined. The target signal is then updated and the fixed codebook excitation and gain are calculated. The adaptive and fixed codebook gains are jointly quantized. Thus, the total excitation in the second subframe is given by:

$$u_2(n) = \hat{g}_p\, v_2(n) + \hat{g}_c\, c_2(n), \quad n = 0, \ldots, 63 \qquad (5)$$

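Finally, to decide which iteration to select, the weighted error over the two subframes is calculated for both iterations, and the total excitation corresponding to the iteration that yields the smaller mean-squared weighted error is retained. One bit per half-frame is used to indicate the index of the subframe in which the fixed codebook contribution is used (or vice versa).

The weighted error over the two subframes in the first iteration is given by:

$$e^{(1)} = \sum_{n=0}^{63} \left( x_1(n) - \hat{g}_p\, y_1(n) - \hat{g}_c\, z_1(n) \right)^2 + \sum_{n=0}^{63} \left( x_2(n) - g_f\, y_2(n) \right)^2$$

and the weighted error over the two subframes in the second iteration is given by:

$$e^{(2)} = \sum_{n=0}^{63} \left( x_1(n) - g_f\, y_1(n) \right)^2 + \sum_{n=0}^{63} \left( x_2(n) - \hat{g}_p\, y_2(n) - \hat{g}_c\, z_2(n) \right)^2$$

where x(n) is the target signal, y(n) and z(n) are the filtered adaptive codebook and the filtered fixed codebook contributions, respectively, ĝ_p and ĝ_c are the quantized pitch and fixed codebook gains, and the subscript indicates the subframe.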
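The comparison of the two iterations may be sketched in Python as follows. This is an illustration only: the error is taken to be the sum of squared differences between the target and the gain-scaled filtered contributions, the function name weighted_error and the tuple layout are hypothetical, and the two-sample vectors are toy values rather than real subframes of 64 samples.

```python
def weighted_error(subframes):
    """Sum of squared errors over a group of subframes.

    Each entry is (x, y, g_y, z, g_z): target, filtered adaptive contribution
    and its gain, filtered fixed contribution and its gain. z is None for a
    subframe coded without the fixed codebook.
    """
    total = 0.0
    for x, y, g_y, z, g_z in subframes:
        for n in range(len(x)):
            synth = g_y * y[n]
            if z is not None:
                synth += g_z * z[n]
            total += (x[n] - synth) ** 2
    return total


g_f = 1.0                                        # forced pitch gain (toy value)
x1, y1, z1 = [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]  # first subframe
x2, y2, z2 = [3.0, 0.0], [1.0, 0.0], [0.0, 1.0]  # second subframe

# Iteration 1: fixed codebook in the first subframe, g_f in the second.
e1 = weighted_error([(x1, y1, 1.0, z1, 1.0), (x2, y2, g_f, None, 0.0)])
# Iteration 2: g_f in the first subframe, fixed codebook in the second.
e2 = weighted_error([(x1, y1, g_f, None, 0.0), (x2, y2, 3.0, z2, 0.0)])

# Index of the subframe that keeps the fixed codebook (bit convention assumed).
flag_bit = 0 if e1 <= e2 else 1
```

In an actual encoder the targets and filtered vectors differ between the two iterations because the adaptive codebook memory differs; the sketch reuses the same vectors only to keep the arithmetic easy to follow.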

If the first iteration is retained, the saved memories are copied back into the filter memories and the adaptive codebook buffer for use in the next two subframes (since, after both iterations have been performed, the filter memories and adaptive codebook buffer correspond to the second iteration).
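The handling of the memories around the two iterations may be sketched as follows. The sketch is a hypothetical illustration: the codec memories are modeled as a plain dictionary, run_iteration stands in for the actual subframe encoding and is expected to update the memories in place, the bit convention is assumed, and restarting the second iteration from the same initial memories is an assumption implied by, but not spelled out in, the text.

```python
import copy


def search_two_subframes(state, run_iteration):
    """Run both iterations and leave `state` matching the retained one.

    state         : dict of mutable codec memories (filters, adaptive codebook)
    run_iteration : callable(state, fixed_in_first) -> (excitation, error)
    Returns (flag_bit, excitation, error); flag_bit 0 means the fixed codebook
    is used in the first subframe (assumed convention).
    """
    start = copy.deepcopy(state)                # memories before either iteration
    exc_1, err_1 = run_iteration(state, True)   # iteration 1
    after_first = copy.deepcopy(state)          # saved before iteration 2 starts
    state.clear()
    state.update(copy.deepcopy(start))          # iteration 2 starts from the same point
    exc_2, err_2 = run_iteration(state, False)  # iteration 2
    if err_1 <= err_2:
        # The live memories now correspond to iteration 2: copy the saved ones back.
        state.clear()
        state.update(after_first)
        return 0, exc_1, err_1
    return 1, exc_2, err_2


def toy_iteration(state, fixed_in_first):
    """Stand-in encoder: records which iteration touched the memories."""
    state["history"].append("fixed-first" if fixed_in_first else "fixed-second")
    return [0.0] * 128, (2.0 if fixed_in_first else 5.0)


memories = {"history": []}
flag_bit, excitation, error = search_two_subframes(memories, toy_iteration)
```

When the second iteration wins, no copy is needed because the live memories already correspond to it, which matches the parenthetical remark above.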

Various embodiments of the present invention may be implemented by computer software executable by a data processor, such as processor 28, of mobile station 20 or other host device, or by hardware, or by a combination of software and hardware. Further in this regard it will be understood that the blocks of the various figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.

The memory or memories 34 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The one or more data processors 28 may be of any type suitable to the local technical environment, and may include by way of non-limiting example: one or more of a general purpose computer, a special purpose computer, a microprocessor, a Digital Signal Processor (DSP), and a processor based on a multi-core processor architecture.

In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, although the invention is not limited in this respect, certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is basically a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, California, and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

While described in the context of particular embodiments, it will be apparent to those skilled in the art that many modifications and various changes to these teachings are possible. Thus, while the invention has been particularly shown and described with respect to one or more embodiments thereof, it will be understood by those skilled in the art that certain modifications or changes may be made therein without departing from the scope and spirit of the invention as set forth above and without departing from the scope of the claims below, particularly when such changes are effected by a similar set of process steps or a similar or equivalent arrangement of hardware.

Claims

1. A method for encoding a speech signal, the method comprising:

dividing a speech signal into a plurality of frames;

dividing at least one of the plurality of frames into at least two subframe units;

searching for a fixed codebook contribution and an adaptive codebook contribution for a subframe unit;

preparing two or more different combinations of subframe units for encoding a given frame, wherein in each combination at least one subframe unit is encoded without the fixed codebook contribution and with the adaptive codebook contribution, and at least one subframe unit is encoded with both the fixed codebook contribution and the adaptive codebook contribution; and

selecting one of the combinations having the minimized error measure and outputting the selected combination for transmission.
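The preparing and selecting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `subframe_error`, `candidate_combinations`, and `select_combination` are hypothetical, the codebook contributions are assumed to be already gain-scaled vectors, the error measure is assumed to be a plain sum of squared differences, and the dependence of a later subframe unit's adaptive codebook on the preceding excitation is ignored for brevity.

```python
from itertools import product

def subframe_error(target, adaptive, fixed, use_fixed):
    """Squared error of one subframe unit.

    `adaptive` and `fixed` are assumed to be already gain-scaled
    contributions; `fixed` is left out when use_fixed is False.
    """
    synth = [a + (f if use_fixed else 0.0) for a, f in zip(adaptive, fixed)]
    return sum((t - s) ** 2 for t, s in zip(target, synth))

def candidate_combinations(num_units):
    """On/off patterns in which at least one subframe unit uses the fixed
    codebook contribution and at least one does not."""
    return [c for c in product((True, False), repeat=num_units)
            if any(c) and not all(c)]

def select_combination(targets, adaptives, fixeds):
    """Return (combination, error) with the smallest total squared error."""
    best = None
    for combo in candidate_combinations(len(targets)):
        err = sum(subframe_error(t, a, f, u)
                  for t, a, f, u in zip(targets, adaptives, fixeds, combo))
        if best is None or err < best[1]:
            best = (combo, err)
    return best

# Toy frame of two subframe units, two samples each (illustrative values only).
targets = [[1.0, 2.0], [0.5, 0.5]]
adaptives = [[0.5, 1.0], [0.5, 0.5]]
fixeds = [[0.5, 1.0], [0.0, 0.0]]
combo, err = select_combination(targets, adaptives, fixeds)
```

In this toy frame the first subframe unit needs the fixed codebook contribution to match its target while the second does not, so the combination that keeps the fixed codebook only in the first unit is selected.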

2. The method according to claim 1, wherein a fixed pitch gain is applied to the at least one subframe unit that is not coded using the fixed codebook contribution in the selected combination.

3. The method of claim 2, wherein the fixed pitch gain is calculated based on energies of a current frame and a previous frame, the current frame including the selected combination.

4. The method of claim 3, wherein the fixed pitch gain g_f is calculated by:

in which g_f is subject to the constraint g_f ≤ 1;

wherein h_LPold(n) and h_LPnew(n) represent the impulse responses of the previous frame and the current frame, respectively.

5. The method of claim 1, wherein the steps of preparing and selecting comprise:

combining at least one subframe unit having the fixed codebook contribution followed by at least one subframe unit not having the fixed codebook contribution to form a first combination, and combining at least one subframe unit not having the fixed codebook contribution followed by at least one subframe unit having the fixed codebook contribution to form a second combination; and

selecting only one of the first and second combinations for transmission.

6. The method of claim 5, wherein combining the first and second combinations comprises combining subframe units so as to minimize an error measure for the frame comprising the selected combination.

7. The method of claim 6, wherein combining subframe units so as to minimize the error measure comprises: iteratively combining different combinations of subframe units and selecting for transmission a particular combination that minimizes the error measure for the frame.

8. The method of claim 5, wherein, prior to said selecting, the method comprises calculating an error measure for each of the first and second combinations of subframe units so as to select the one of the combinations having the smallest error measure.

9. The method of claim 1, wherein the minimized error measure is a mean square error.

10. The method of claim 1, further comprising setting at least one bit in a frame comprising the selected combination to indicate which at least one subframe is not encoded using a fixed codebook contribution.
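The signalling of claim 10 can be sketched as follows for a frame of two subframe units. This is only an illustration: the bit convention (0 when the first unit omits the fixed codebook contribution, 1 when the second does), the placement of the bit at the start of the frame, and the names `mode_bit` and `pack_frame` are assumptions, not taken from the claims.

```python
def mode_bit(combo):
    """Map a two-unit combination to one signalling bit.

    combo[i] is True when subframe unit i uses the fixed codebook.
    Assumed convention: the bit is the index of the unit that omits it.
    """
    if len(combo) != 2 or combo[0] == combo[1]:
        raise ValueError("expected exactly one unit without the fixed codebook")
    return 0 if not combo[0] else 1

def pack_frame(combo, payload_bits):
    """Prepend the signalling bit to the remaining frame bits (assumed layout)."""
    return [mode_bit(combo)] + list(payload_bits)

frame_bits = pack_frame((True, False), [1, 0, 1])
```

With more than two subframe units, more than one bit would be needed to identify the units coded without the fixed codebook contribution, which is consistent with the "at least one bit" wording of the claim.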

11. The method of claim 1, wherein the subframe units comprise half-frames.

12. The method of claim 1, wherein the subframe units comprise quarter frames.
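The framing steps of claim 1, with the half-frame and quarter-frame subframe units of claims 11 and 12, can be sketched as below. The helper names `split_frames` and `split_subframes` are hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption made only to keep the sketch short.

```python
def split_frames(samples, frame_len):
    """Divide a sample sequence into consecutive frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped (assumption).
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def split_subframes(frame, units):
    """Divide one frame into `units` equal subframe units
    (2 for half-frames, 4 for quarter frames)."""
    if len(frame) % units:
        raise ValueError("frame length must be a multiple of the unit count")
    n = len(frame) // units
    return [frame[i:i + n] for i in range(0, len(frame), n)]

frames = split_frames(list(range(10)), 4)
halves = split_subframes(frames[0], 2)
quarters = split_subframes(frames[0], 4)
```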

13. An encoder, comprising:

a first input interfaced to a codebook; and

a second input configured to receive a speech signal;

wherein the encoder is configured to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for the received speech signal and to output the speech signal as a frame comprising at least two subframe units, and the encoder is further configured to encode at least one subframe unit of the frame without the fixed codebook contribution, to prepare two or more different combinations of subframe units for encoding a given frame, wherein in each combination at least one subframe unit is encoded without the fixed codebook contribution and with the adaptive codebook contribution, and at least one subframe unit is encoded with both the fixed codebook contribution and the adaptive codebook contribution, and to select for output one of the combinations having the minimized error measure.

14. The encoder of claim 13, wherein the encoder is configured to prepare and select by combining at least one subframe unit with the fixed codebook contribution followed by at least one subframe unit without the fixed codebook contribution to form a first combination, and combining at least one subframe unit without the fixed codebook contribution followed by at least one subframe unit with the fixed codebook contribution to form a second combination; and

the encoder is configured to output only one of the first combination and the second combination.

15. The encoder of claim 14, wherein the encoder is configured to combine the first combination and the second combination by combining subframe units so as to minimize an error measure for a frame comprising the selected combination.

16. The encoder of claim 15, wherein combining subframe units so as to minimize the error measure comprises: iteratively combining different combinations of subframe units and selecting for transmission a particular combination that minimizes the error measure for the frame.

17. The encoder of claim 13, wherein the minimized error measure is a mean square error.

18. An encoding apparatus comprising:

means for dividing a speech signal into a plurality of frames;

means for dividing at least one of the plurality of frames into at least two subframe units;

means for searching for a fixed codebook contribution and an adaptive codebook contribution for a subframe unit;

means for preparing two or more different combinations of subframe units for encoding a given frame, wherein in each combination at least one subframe unit is encoded without the fixed codebook contribution and with the adaptive codebook contribution, and at least one subframe unit is encoded with both the fixed codebook contribution and the adaptive codebook contribution; and

means for selecting, for transmission, one of the combinations having a minimized error measure.

19. The encoding apparatus of claim 18, further comprising: gain means for applying a fixed pitch gain to the at least one subframe unit that is not encoded using the fixed codebook contribution in the selected combination.

20. The encoding apparatus of claim 19, further comprising: processing means for calculating the fixed pitch gain based on the energy of a current frame and a previous frame, the current frame comprising the selected combination.

21. The encoding apparatus of claim 20, wherein the processing means calculates the fixed pitch gain g_f by:

in which g_f is subject to the constraint g_f ≤ 1;

22. The encoding apparatus of claim 18, further comprising means for setting at least one bit in a frame comprising the selected combination to indicate which at least one subframe is not encoded using a fixed codebook contribution.

23. The encoding apparatus of claim 18, wherein the subframe units comprise half-frames.

24. The encoding apparatus of claim 18, wherein the subframe units comprise quarter frames.

25. A decoder, comprising:

a first input interfaced to a codebook; and

a second input configured to receive an encoded frame of a speech signal, the encoded frame comprising at least two subframe units;

wherein the decoder is configured to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for the received encoded frame and to decode at least one of the subframe units that does not use the fixed codebook contribution and to which a fixed pitch gain has been applied and to decode another of the at least two subframe units that is encoded using both the fixed codebook contribution and the adaptive codebook contribution, wherein the decoder is configured to read one bit in the frame and to determine, based on the bit, which subframe unit will not use the fixed codebook contribution for decoding.
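The decoder behaviour of claim 25 can be sketched for a frame of two subframe units. The sketch is illustrative only: the function name `decode_frame`, the gain names `g_f`, `g_p`, and `g_c`, the bit convention (the bit gives the index of the unit decoded without the fixed codebook contribution), and the assumption that the codevectors are already available as plain lists are all hypothetical.

```python
def decode_frame(mode_bit, adaptive, fixed, g_f, g_p, g_c):
    """Rebuild the excitation of a two-unit frame.

    mode_bit : index (0 or 1) of the unit decoded without the fixed codebook
               (assumed convention).
    adaptive, fixed : per-unit codevectors (lists of samples).
    g_f : fixed pitch gain applied to the adaptive-only unit.
    g_p, g_c : pitch gain and fixed codebook gain for the other unit.
    """
    excitation = []
    for idx in range(2):
        if idx == mode_bit:
            # Adaptive codebook contribution only, scaled by the fixed pitch gain.
            excitation.append([g_f * a for a in adaptive[idx]])
        else:
            # Both contributions, each with its own gain.
            excitation.append([g_p * a + g_c * f
                               for a, f in zip(adaptive[idx], fixed[idx])])
    return excitation

excitation = decode_frame(
    1,
    adaptive=[[1.0, 2.0], [2.0, 4.0]],
    fixed=[[1.0, 1.0], [9.0, 9.0]],
    g_f=0.5, g_p=1.0, g_c=2.0,
)
```

Here the bit value 1 selects the second unit as the one decoded without the fixed codebook contribution, so its fixed codevector is ignored entirely.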

26. The decoder of claim 25, wherein the subframe units comprise half-frames.

27. The decoder of claim 25, wherein the subframe units comprise quarter-frames.

28. A communication system comprising an encoder and a decoder, wherein the encoder comprises:

a first input configured to interface to a codebook; and

a second input configured to receive a speech signal to be transmitted;

wherein the encoder is configured to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for the received speech signal and to output the speech signal as a frame comprising at least two subframe units, and the encoder is further operative to encode at least one subframe unit of the frame without the fixed codebook contribution, to prepare two or more different combinations of subframe units for encoding a given frame, wherein in each combination at least one subframe unit is encoded without the fixed codebook contribution and with the adaptive codebook contribution and at least one subframe unit is encoded with both the fixed codebook contribution and the adaptive codebook contribution, and to select for transmission one of the combinations having a minimized error measure;

and wherein the decoder comprises:

a first input configured to interface to a codebook; and

a second input configured to receive an encoded frame of the received speech signal over a channel, the encoded frame comprising at least two subframe units;

wherein the decoder is configured to search the codebook for a fixed codebook contribution and an adaptive codebook contribution for the received encoded frame and to decode at least one of the subframe units of the encoded frame that does not use the fixed codebook contribution and to which a fixed pitch gain has been applied and another of the at least two subframe units that is encoded using both the fixed codebook contribution and the adaptive codebook contribution, wherein the decoder is configured to read one bit in the frame and to determine based on the bit which subframe unit will not use the fixed codebook contribution for decoding.

29. The communication system of claim 28, further comprising an amplifier for applying the fixed pitch gain to the at least one subframe unit without a fixed codebook contribution.

30. The communication system of claim 29, wherein the fixed pitch gain is calculated based on the energy of a current frame and a previous frame.

31. The communication system of claim 28, wherein the encoder is configured to combine at least one subframe unit with the fixed codebook contribution followed by at least one subframe unit without the fixed codebook contribution to form a first combination, and to combine at least one subframe unit without the fixed codebook contribution followed by at least one subframe unit with the fixed codebook contribution to form a second combination; and to output only one of the first combination and the second combination having the minimized error measure.

32. The communication system of claim 31, wherein the encoder is configured to set a bit in the frame indicating which subframe units are to be encoded without the fixed codebook contribution.

33. The communication system of claim 28, wherein the minimized error measure is a mean square error.

34. The communication system of claim 28, wherein the subframe units comprise half-frames.

35. The communication system of claim 28, wherein the subframe units comprise quarter frames.