HK1130558B - Method and device for cdma wireless systems - Google Patents
- Publication number
- HK1130558B (HK09110429.1A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- signal
- rate
- parameters
- coding parameters
- communication mode
Description
This application is a divisional application of Chinese patent application No. 03820762.1, application date 2003-27.6.15, entitled "Method and apparatus for efficient in-band half-blank-burst sequence signaling and half-rate maximum operation in variable bit rate wideband speech coding in CDMA wireless systems".
Technical Field
The invention relates to a method for interworking a first station employing a first communication scheme and comprising a first encoder and a first decoder with a second station employing a second communication scheme and comprising a second encoder and a second decoder, wherein communication between the first and second stations is performed by transferring signal coding parameters from the encoder of one of the first and second stations to the decoder of the other of said first and second stations.
Background
The demand for efficient digital narrowband and wideband speech coding techniques with a good compromise between subjective quality and bit rate is increasing in application areas such as teleconferencing, multimedia and wireless communications. Until recently, speech coding applications mainly used a telephone bandwidth limited to the 200-3400 Hz range. However, wideband speech applications provide improved intelligibility and fidelity in communications compared to the traditional telephone bandwidth. A bandwidth in the range of 50-7000 Hz has been found sufficient to deliver good quality, giving the impression of face-to-face communication. For a typical audio signal, this bandwidth provides acceptable subjective quality, but the quality is still lower than that of FM radio or CD, which operate in the ranges of 20-16000 Hz and 20-20000 Hz, respectively.
A speech encoder converts speech signals into a digital bit stream that is transmitted over a communication channel or stored in a storage medium. The speech signal is digitized, i.e., sampled and quantized, typically using 16 bits per sample. The role of the speech coder is to represent these digital samples with a smaller number of bits while maintaining a good subjective quality of speech. A speech decoder or synthesizer operates on the transmitted or stored bit stream and re-converts it to a speech signal.
Code Excited Linear Prediction (CELP) coding is one of the best prior art techniques for obtaining a good compromise between subjective quality and bit rate. This coding technique forms the basis of several speech coding standards in both wireless and wireline applications. In CELP coding, the sampled speech signal is processed in successive blocks of N samples, commonly referred to as frames, where N is a predetermined number typically corresponding to 10-30 ms. A Linear Prediction (LP) filter is computed and transmitted every frame. The computation of the LP filter typically requires a lookahead, i.e., a 5-15 ms speech segment from the subsequent frame. The N-sample frame is divided into smaller blocks called subframes. The number of subframes in a frame is typically three (3) or four (4), resulting in 4-10 ms subframes. In each subframe, the excitation signal is typically obtained from two components: the past excitation and the innovative, fixed-codebook excitation. The component formed from the past excitation is often referred to as the adaptive codebook or pitch excitation. The parameters characterizing the excitation signal are encoded and transmitted to the decoder, where the reconstructed excitation signal is used as the input of the LP filter.
In wireless systems employing Code Division Multiple Access (CDMA) technology, the use of source-controlled Variable Bit Rate (VBR) speech coding significantly improves system capacity. In source-controlled VBR coding, the codec operates at several bit rates, and a rate selection module determines the bit rate used to encode each speech frame based on the nature of that frame (e.g., voiced, unvoiced, transient, background noise). The goal is to obtain the best speech quality at a given average bit rate, also referred to as the Average Data Rate (ADR). By tuning the rate selection module to obtain different ADRs, the codec can operate in different modes, with codec performance improving as the ADR increases. This provides a mechanism for the codec to trade off speech quality against system capacity. In CDMA systems (e.g., CDMA-one and CDMA2000), 4 bit rates are typically used, which are referred to as Full Rate (FR), Half Rate (HR), Quarter Rate (QR), and Eighth Rate (ER). Two rate sets are supported in these systems, referred to as rate set I and rate set II. In rate set II, a variable rate codec with a rate selection mechanism operates at source-coding bit rates of 13.3 (FR), 6.2 (HR), 2.7 (QR), and 1.0 (ER) kbits/sec, corresponding to total bit rates of 14.4, 7.2, 3.6, and 1.8 kbits/sec once the bits added for error detection are included.
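The rate set II figures quoted above can be checked by converting each bit rate into bits per frame. The following is a minimal sketch, assuming 20 ms frames (an assumption of this example; the paragraph above does not state the frame duration). The names `RATE_SET_II`, `bits_per_frame` and `frame_budget` are illustrative only.

```python
# Rate set II bit rates quoted in the text, in bits per second.
# FRAME_MS = 20 is an assumption of this sketch.
FRAME_MS = 20

RATE_SET_II = {
    # name: (source-coding rate, total rate including error-detection bits)
    "FR": (13300, 14400),
    "HR": (6200, 7200),
    "QR": (2700, 3600),
    "ER": (1000, 1800),
}


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of bits carried in one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


def frame_budget(name):
    """Return (source bits, total bits, overhead bits) per frame for one rate."""
    source_bps, total_bps = RATE_SET_II[name]
    source_bits = bits_per_frame(source_bps)
    total_bits = bits_per_frame(total_bps)
    return source_bits, total_bits, total_bits - source_bits


if __name__ == "__main__":
    for name in RATE_SET_II:
        print(name, frame_budget(name))
```

Under the 20 ms assumption, the full rate carries 266 source bits in a 288-bit frame, which is the budget any full-rate coding mode has to fit into.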
In CDMA systems, the half rate may be imposed in place of the full rate in some speech frames in order to send in-band signaling information; this is called half-blank-burst sequence signaling (known in the CDMA literature as dim-and-burst signaling). The system may also enforce the half rate as the maximum bit rate during poor channel conditions (e.g., near cell boundaries) in order to improve codec robustness. This is called half-rate max. In VBR coding, the half rate is typically used when a frame is either stationary voiced or stationary unvoiced. A different codec structure is used for each of these two signal types: for unvoiced frames, a CELP model without the pitch codebook is used, whereas for voiced frames, signal modification is used to enhance periodicity and reduce the number of bits needed for the pitch indices. The full rate is used for onsets, transient frames, and mixed voiced frames, usually with a typical CELP model. When the rate selection module chooses to encode a frame as a full-rate frame but the system enforces a half-rate frame, speech performance degrades because the half-rate modes cannot efficiently encode onsets and transient signals.
A wideband codec known as the adaptive multi-rate wideband (AMR-WB) speech codec was recently selected by the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) for several wideband speech telephony technologies and services, and by the 3GPP (Third Generation Partnership Project) for GSM and W-CDMA third-generation wireless systems. The AMR-WB codec comprises nine (9) bit rates ranging from 6.6 to 23.85 kbits/sec. An advantage of designing an AMR-WB-based source-controlled VBR codec for CDMA2000 systems is that it enables interworking between CDMA2000 and other systems employing the AMR-WB codec. The AMR-WB bit rate of 12.65 kbits/sec is the closest rate that fits within the 13.3 kbits/sec full rate of rate set II. This rate can be used as a common rate between the CDMA2000 wideband VBR codec and AMR-WB to achieve interoperability without the need for transcoding (which degrades speech quality). A half rate of 6.2 kbits/sec must be added to the CDMA2000 VBR wideband solution to achieve efficient operation within the rate set II framework. The codec can then operate in a few CDMA2000-specific modes and include a mode that employs the AMR-WB codec to achieve interoperability with systems using that codec. However, in a cross-system tandem-free operation call between CDMA2000 and another system employing AMR-WB, the CDMA2000 system may force the use of the half rate, as previously described (e.g., for half-blank-burst sequence signaling). Since the AMR-WB codec does not recognize the 6.2 kbits/sec half rate of the CDMA2000 wideband codec, the forced half-rate frames are interpreted as erased frames. This degrades the performance of the connection.
Disclosure of Invention
According to a first aspect of the present invention, there is provided:
-a method for interworking a first station employing a first communication scheme and comprising a first encoder and a first decoder with a second station employing a second communication scheme and comprising a second encoder and a second decoder, wherein communication between the first and second stations is performed by transferring signal coding parameters from the encoder of one of the first and second stations to the decoder of the other of said first and second stations, the method comprising: receiving a request to transmit signal-coding parameters from said one station to another station using a communication mode designed to reduce the bit rate during transmission of the signal-coding parameters; in response to the request, discarding a portion of the signal-coding parameters from the encoder of said one station and transmitting the remaining signal-coding parameters to the decoder of the other station; and regenerating the part of the signal encoding parameters and decoding the signal encoding parameters in a decoder of the other station.
-a system for interworking a first station employing a first communication scheme and comprising a first encoder and a first decoder with a second station employing a second communication scheme and comprising a second encoder and a second decoder, wherein communication between the first and second stations is performed by transferring signal coding parameters from the encoder of one of the first and second stations to the decoder of the other of said first and second stations, the system comprising: means for receiving a request to transmit signal encoding parameters from said one station to another station using a communication mode designed to reduce the bit rate during transmission of the signal encoding parameters; means for discarding a portion of the signal encoding parameters from the encoder of said one station and transmitting remaining signal encoding parameters to the decoder of the other station in response to the request; and means for regenerating the part of the signal encoding parameters and a decoder of the further station for decoding the signal encoding parameters.
According to a second aspect of the present invention, there is provided:
-a method for interworking a first station employing a first communication scheme and comprising a first encoder and a first decoder with a second station employing a second communication scheme and comprising a second encoder and a second decoder, wherein communication between the first station and the second station is performed by transferring signal coding parameters related to sound signals from the encoder of one of the first station and the second station to the decoder of the other of the first station and the second station, the method comprising: classifying the sound signal to determine whether the signal-coding parameters should be transmitted from the encoder of said one station to the decoder of the other station in a first communication mode in which full bit rate is used for transmission of the signal-coding parameters; receiving a request to transmit signal-coding parameters from the encoder of said one station to the decoder of the other station using a second communication mode designed to reduce the bit rate during transmission of the signal-coding parameters; when the classification of the sound signal determines that the signal-coding parameters should be transmitted in the first communication mode, and when a request for transmission of the signal-coding parameters in the second communication mode is received, a portion of the signal-coding parameters from the encoder of the one station is discarded, and the remaining signal-coding parameters are transmitted to the decoder of the other station in the second communication mode.
-a system for interworking a first station employing a first communication scheme and comprising a first encoder and a first decoder with a second station employing a second communication scheme and comprising a second encoder and a second decoder, wherein communication between the first station and the second station is performed by transferring signal coding parameters related to sound signals from the encoder of one of the first station and the second station to the decoder of the other of the first station and the second station, the system comprising: means for classifying the sound signal to determine whether the signal-coding parameters should be transmitted from the encoder of the one station to the decoder of the other station in a first communication mode in which full bit rate is used for transmission of the signal-coding parameters; means for receiving a request to transmit signal-coding parameters from the encoder of said one station to the decoder of the other station using a second communication mode designed to reduce the bit rate during transmission of the signal-coding parameters; means for discarding a portion of the signal encoding parameters from the encoder of the one station and transmitting the remaining signal encoding parameters to the decoder of the other station using the second communication mode when the classification of the sound signal determines that the signal encoding parameters should be transmitted using the first communication mode and when a request to transmit the signal encoding parameters using the second communication mode is received.
According to a third aspect of the present invention, there is provided:
-a method for transmitting signal coding parameters from a first station to a second station, comprising: encoding, in one of the first station and the second station, the sound signal according to a full rate communication mode; receiving a request to transmit signal-coding parameters from said one station to the other of the first and second stations using a second communication mode designed to reduce the bit rate during transmission of the signal-coding parameters; in response to the request, converting signal encoding parameters encoded in the full-rate communication mode into signal encoding parameters encoded in the second communication mode; and transmitting the signal encoding parameters encoded in the second communication mode to the other of the first and second stations.
-a system for transmitting signal coding parameters from a first station to a second station, comprising: an encoder in one of the first station and the second station for encoding the sound signal according to a full rate communication mode; means for receiving a request to transmit signal-coding parameters from said one station to the other of the first and second stations in a second communication mode designed to reduce the bit rate during transmission of the signal-coding parameters; means for converting, in response to the request, signal encoding parameters encoded in the full-rate communication mode into signal encoding parameters encoded in the second communication mode; and means for communicating the signal-encoding parameters encoded in the second communication mode to the other of the first and second stations.
The above and other objects, advantages and features of the present invention will become more apparent upon reading of the following non-restrictive description of illustrative embodiments thereof, given by way of example only with reference to the accompanying drawings.
Drawings
FIG. 1 is a schematic block diagram of a non-limiting example of a voice communication system in which the present invention may be used;
FIG. 2 is a functional block diagram of a non-limiting example of a variable bit rate codec including rate determination logic;
FIG. 3 is a functional block diagram of a non-limiting example of a variable bit rate codec including rate determination logic that employs a normal HR for low energy frames;
FIG. 4 is a schematic block diagram of a non-limiting example of a variable bit rate codec according to FIG. 3 including a half rate system request within rate determination logic;
FIG. 5 is a functional block diagram of one example of a variable bit rate codec according to a non-limiting illustrative embodiment of the present invention, including a half rate system request at the packet level (or bitstream level) within the rate determination logic;
FIG. 6 is an example configuration of a half-blank-burst sequence signaling method according to a non-limiting illustrative embodiment of the present invention, in the interoperable mode of VBR-WB during a mobile-to-mobile call or other call;
FIG. 7 is a schematic block diagram of a non-limiting example of a wideband encoding device, more specifically an AMR-WB encoder; and
FIG. 8 is a schematic block diagram of a non-limiting example of a wideband decoding device, more specifically an AMR-WB decoder.
Detailed Description
Although the following description describes illustrative embodiments of the invention in connection with speech signals, it should be kept in mind that the inventive concept is equally applicable to other types of signals, in particular, but not exclusively, other types of sound signals.
Fig. 1 illustrates a speech communication system 100 depicting the use of speech encoding and decoding apparatus. The voice communication system 100 of fig. 1 supports the transmission of voice signals over a communication channel 101. The communication channel 101 typically comprises, at least in part, a radio frequency link, although it may comprise, for example, a wire, an optical link, or a fiber optic link. Radio frequency links typically support multiple simultaneous voice communications that require shared bandwidth resources, such as found in cellular telephone systems. Although not shown, in a single device implementation of system 100, communication channel 101 may be replaced by a storage device that records and stores the encoded speech signal for later playback.
In the voice communication system 100 of fig. 1, a microphone 102 produces an analog voice signal 103, which is provided to an analog-to-digital (A/D) converter 104 for conversion to a digital voice signal 105. A speech encoder 106 encodes the digital speech signal 105 to produce a set of signal encoding parameters 107, which are encoded into binary form and passed to a channel encoder 108. The optional channel encoder 108 adds redundancy to the binary representation of the signal encoding parameters 107 before transmitting them over the communication channel 101.
In the receiver, a channel decoder 109 uses the redundant information in the received bit stream 111 to detect and correct channel errors that occurred during transmission. The speech decoder 110 converts the bit stream 112 received from the channel decoder 109 back into a set of signal coding parameters and creates a digitally synthesized speech signal 113 from the recovered signal coding parameters. The digitally synthesized speech signal 113 reconstructed in the speech decoder 110 is converted into analog form 114 by a digital-to-analog (D/A) converter 115 and played back through a speaker unit 116.
Source controlled variable bit rate speech coding
Fig. 2 illustrates a non-limiting example of a variable bit rate codec configuration including rate determination logic for controlling four encoding bit rates. In this example, the set of bit rates includes a dedicated codec bit rate for inactive speech frames (eighth-rate (CNG) encoding module 208), a bit rate for unvoiced speech frames (half-rate unvoiced encoding module 207), a bit rate for stationary voiced frames (half-rate voiced encoding module 206), and a bit rate for other types of frames (full-rate encoding module 205).
The rate determination logic is based on signal classification performed in three steps (201, 202 and 203) on a frame basis, the operation of which is well known to those of ordinary skill in the art.
First, a Voice Activity Detector (VAD)201 distinguishes between active and inactive speech frames. If an inactive speech frame (background noise signal) is detected, the signal classification chain ends and the frame is encoded in the encoding module 208 as an eighth rate frame with Comfort Noise Generation (CNG) at the decoder (1.0 kbit/s according to CDMA2000 rate set II). If an active speech frame is detected, the frame passes through a second classifier 202.
The second classifier 202 is dedicated to making the voicing decision. If the classifier 202 classifies the frame as an unvoiced speech frame, the classification chain ends and the frame is encoded in block 207 with a half rate optimized for unvoiced signals (6.2 kbits/sec according to CDMA2000 rate set II). Otherwise, the speech frame is processed through a "stationary voiced" classifier 203.
If the frame is classified as a stationary voiced frame, the frame is encoded in block 206 with a half rate optimized for stationary voiced signals (6.2 kbits/sec according to CDMA2000 rate set II). Otherwise, the frame is likely to contain a non-stationary speech segment, such as a voiced onset or a rapidly evolving voiced speech signal. These frames typically require a high bit rate to maintain good subjective quality. Thus, in this case, the speech frame is encoded in block 205 as a full-rate frame (13.3 kbits/sec according to CDMA2000 rate set II).
In the non-limiting alternative implementation shown in FIG. 3, if the frame is not classified as "stationary voiced", it is processed by the low energy frame classifier 311. This classifier detects frames that were not taken into account by the VAD 201. If the frame energy is below a certain threshold, the frame is encoded using the normal half rate encoder 312; otherwise, the frame is encoded as a full-rate frame in block 205.
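The classification chain of FIG. 2 and its FIG. 3 variant can be sketched as a simple decision cascade. This is only an illustration of the control flow described above: the `FrameFeatures` fields, the function name and the energy threshold are hypothetical, and the classifiers themselves (201, 202, 203, 311) are assumed to have already produced their decisions.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    """Hypothetical per-frame classifier outputs (not defined in the text)."""
    is_active: bool             # decision of VAD 201
    is_unvoiced: bool           # decision of classifier 202
    is_stationary_voiced: bool  # decision of classifier 203
    energy: float = 0.0         # frame energy, used only by classifier 311


def select_encoder(f, low_energy_threshold=None):
    """Return the encoding module chosen by the classification chain.

    With low_energy_threshold=None the chain follows FIG. 2; with a number,
    it adds the low energy frame classifier 311 of FIG. 3.
    """
    if not f.is_active:
        return "eighth-rate CNG (208)"
    if f.is_unvoiced:
        return "half-rate unvoiced (207)"
    if f.is_stationary_voiced:
        return "half-rate voiced (206)"
    if low_energy_threshold is not None and f.energy < low_energy_threshold:
        return "normal half-rate (312)"
    return "full-rate (205)"


if __name__ == "__main__":
    onset = FrameFeatures(True, False, False, energy=0.9)
    print(select_encoder(onset))                            # FIG. 2 chain
    print(select_encoder(onset, low_energy_threshold=1.0))  # FIG. 3 chain
```

Each test short-circuits the chain, so only frames that survive every earlier classifier reach the full-rate encoder.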
The signal classification modules 201, 202, 203 and 311 are well known to those of ordinary skill in the art and thus will not be described in this description. In the non-limiting example of fig. 3, the coding blocks, i.e., blocks 205, 206, 207, 208 and 312, which assume different bit rates, are based on Code Excited Linear Prediction (CELP) coding techniques, as are well known to those of ordinary skill in the art. For example, the bit rate is set according to rate set II of the CDMA2000 system described herein above.
Non-limiting illustrative embodiments of the present invention are described herein with reference to a wideband speech codec that has been standardized by the International Telecommunication Union (ITU) as Recommendation G.722.2 and is referred to as the AMR-WB codec [ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)", Geneva, 2002]. This codec has also been selected by the Third Generation Partnership Project (3GPP) for wideband telephony in third-generation wireless systems [3GPP TS 26.190, "AMR Wideband Speech Codec: Transcoding Functions", 3GPP Technical Specification]. AMR-WB can operate at 9 bit rates from 6.6 to 23.85 kbits/sec. Here, the bit rate of 12.65 kbits/sec is used as an example of the full rate.
Of course, the non-limiting illustrative embodiments of the present invention are applicable to other types of codecs.
For the convenience of the reader, an overview of the AMR-WB codec is provided below.
Overview of AMR-WB encoder
Referring to fig. 7, a sampled speech signal is block-wise encoded by the encoding apparatus 700 of fig. 7, wherein the encoding apparatus 700 is decomposed into eleven modules numbered from 701 to 711.
Thus, the input speech signal 712 is processed block by block, i.e. in the above-mentioned L-sample blocks called frames.
Referring to fig. 7, the sampled input speech signal 712 is downsampled in a downsampler module 701. The signal is downsampled from 16 kHz to 12.8 kHz using techniques well known to those of ordinary skill in the art. Downsampling improves coding efficiency because a smaller frequency bandwidth is coded. It also reduces algorithmic complexity, since the number of samples in a frame is reduced. After downsampling, the 20 ms frame of 320 samples is reduced to a frame of 256 samples (a downsampling ratio of 4/5).
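The frame sizes quoted above follow directly from the sampling rates and the 20 ms frame duration. A minimal sketch of that arithmetic is given below; the function names are illustrative only, and no actual resampling filter is implemented.

```python
from fractions import Fraction

FRAME_MS = 20  # frame duration given in the text


def samples_per_frame(sample_rate_hz, frame_ms=FRAME_MS):
    """Number of samples in one frame at the given sampling rate."""
    return sample_rate_hz * frame_ms // 1000


def resampling_ratio(fs_in_hz, fs_out_hz):
    """Output-to-input sample ratio as an exact fraction."""
    return Fraction(fs_out_hz, fs_in_hz)


if __name__ == "__main__":
    print(samples_per_frame(16000))        # samples per frame before downsampling
    print(samples_per_frame(12800))        # samples per frame after downsampling
    print(resampling_ratio(16000, 12800))  # downsampling ratio
```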
The input frame is then provided to an optional pre-processing module 702. The pre-processing module 702 may be comprised of a high pass filter with a 50Hz cutoff frequency. The high pass filter 702 eliminates undesired sound components below 50 Hz.
The downsampled, preprocessed signal is denoted $s_p(n)$, $n = 0, 1, 2, \ldots, L-1$, where $L$ is the frame length (256 at a sampling frequency of 12.8 kHz). This signal $s_p(n)$ is pre-emphasized using a pre-emphasis filter 703 having the following transfer function:
$P(z) = 1 - \mu z^{-1}$
where $\mu$ is a pre-emphasis factor with a value between 0 and 1 (a typical value is $\mu = 0.7$). The function of the pre-emphasis filter 703 is to enhance the high frequency content of the input speech signal. It also reduces the dynamic range of the input speech signal, which makes it more suitable for fixed-point implementation. Pre-emphasis also plays an important role in achieving a proper overall perceptual weighting of the quantization error, which contributes to improved sound quality.
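In the time domain, the transfer function P(z) = 1 - μz^-1 corresponds to the difference equation s(n) = s_p(n) - μ s_p(n-1). The following is a minimal sketch of that filter; the function name and the `prev_sample` argument (the last sample of the previous frame, taken as zero by default) are assumptions of this example.

```python
def pre_emphasis(s_p, mu=0.7, prev_sample=0.0):
    """Apply P(z) = 1 - mu * z^-1, i.e. s(n) = s_p(n) - mu * s_p(n-1).

    prev_sample is the last input sample of the previous frame
    (filter memory); it is assumed to be zero for the first frame.
    """
    out = []
    prev = prev_sample
    for x in s_p:
        out.append(x - mu * prev)
        prev = x
    return out


if __name__ == "__main__":
    # A constant (low-frequency) input is attenuated after the first sample,
    # while an alternating (high-frequency) input is amplified.
    print(pre_emphasis([1.0, 1.0, 1.0], mu=0.5))
    print(pre_emphasis([1.0, -1.0, 1.0, -1.0], mu=0.5))
```

The two inputs illustrate the high-frequency emphasis described above: the filter has gain 1 - μ at zero frequency and 1 + μ at half the sampling frequency.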
The output of the pre-emphasis filter 703 is denoted $s(n)$. This signal is used to perform LP analysis in block 704. LP analysis is a technique well known to those of ordinary skill in the art. In the example of fig. 7, the autocorrelation method is employed. In the autocorrelation method, the signal $s(n)$ is first windowed using a Hamming window, typically having a length of about 30-40 ms. The autocorrelations are computed from the windowed signal, and the Levinson-Durbin recursion is used to compute the LP filter coefficients $a_i$, where $i = 1, \ldots, p$ and $p$ is the LP order, typically 16 in wideband coding. The parameters $a_i$ are the coefficients of the transfer function $A(z)$ of the LP filter, given by the following relation:
$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}$
The LP analysis is performed in block 704, which also performs quantization and interpolation of the LP filter coefficients. The LP filter coefficients are first converted to another equivalent domain that is more suitable for quantization and interpolation purposes. The Line Spectral Pair (LSP) and Immittance Spectral Pair (ISP) domains are two domains in which quantization and interpolation can be performed efficiently. The 16 LP filter coefficients $a_i$ can be quantized with about 30 to 50 bits using split or multi-stage quantization, or a combination thereof. The purpose of the interpolation is to enable the LP filter coefficients to be updated every subframe while being transmitted once per frame, which improves encoder performance without increasing the bit rate. Quantization and interpolation of the LP filter coefficients are considered well known to those of ordinary skill in the art and are therefore not described further in this description.
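The autocorrelation method described above can be sketched in a few lines. This is a textbook illustration under stated assumptions, not the codec's actual implementation: it uses a plain symmetric Hamming window, assumes a non-silent frame (r[0] > 0), and omits refinements a standardized codec may apply. The function names are illustrative. The coefficients follow the sign convention A(z) = 1 + sum of a_i z^-i.

```python
import math


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of the sequence x."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return ([1, a_1, ..., a_p], prediction error energy).

    Assumes r[0] > 0. Coefficients follow A(z) = 1 + sum_i a_i z^-i.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_analysis(s, order=16):
    """Window the signal, compute autocorrelations, run the recursion."""
    w = hamming(len(s))
    windowed = [si * wi for si, wi in zip(s, w)]
    return levinson_durbin(autocorrelation(windowed, order), order)


if __name__ == "__main__":
    # Autocorrelation of a first-order process: the second coefficient vanishes.
    print(levinson_durbin([1.0, 0.5, 0.25], 2))
```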
The following paragraphs describe the remaining encoding operations, which are performed on a subframe basis. The input frame is divided into 4 subframes of 5 ms (64 samples at a sampling frequency of 12.8 kHz). In the following description, the filter $A(z)$ denotes the unquantized interpolated LP filter of the subframe, and the filter $\hat{A}(z)$ denotes the quantized interpolated LP filter of the subframe. The filter $\hat{A}(z)$ is supplied every subframe to a multiplexer 713 for transmission over the communication channel.
In the analysis-by-synthesis encoder, the best pitch and innovation parameters are searched for by minimizing the mean-square error between the input speech signal 712 and the synthesized speech signal in the perceptually weighted domain. The weighted signal $s_w(n)$ is computed in the perceptual weighting filter 705 in response to the signal $s(n)$ from the pre-emphasis filter 703. A perceptual weighting filter 705 with a fixed denominator, suited for wideband signals, is used. An example of a transfer function of the perceptual weighting filter 705 is given by the following relation:
$W(z) = A(z/\gamma_1) / (1 - \gamma_2 z^{-1})$, where $0 < \gamma_2 < \gamma_1 \le 1$
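The weighting filter can be read as a bandwidth-expanded LP filter A(z/γ1), whose coefficients are a_i γ1^i, followed by a one-pole filter 1/(1 - γ2 z^-1). The sketch below applies it sample by sample. The default values of `gamma1` and `gamma2` are illustrative choices that merely satisfy the stated constraint, zero initial filter state is assumed, and the function name is hypothetical.

```python
def perceptual_weighting(s, a, gamma1=0.92, gamma2=0.68):
    """Filter s(n) through W(z) = A(z/gamma1) / (1 - gamma2 * z^-1).

    a = [1, a_1, ..., a_p] are the coefficients of A(z) = 1 + sum_i a_i z^-i.
    Zero initial filter state is assumed in this sketch.
    """
    if not 0.0 < gamma2 < gamma1 <= 1.0:
        raise ValueError("require 0 < gamma2 < gamma1 <= 1")
    # Coefficients of A(z/gamma1): a_i * gamma1**i
    w = [a_i * gamma1 ** i for i, a_i in enumerate(a)]
    out = []
    prev_out = 0.0
    for n in range(len(s)):
        fir = sum(w[i] * s[n - i] for i in range(len(w)) if n - i >= 0)
        y = fir + gamma2 * prev_out   # one-pole denominator 1 / (1 - gamma2 z^-1)
        out.append(y)
        prev_out = y
    return out


if __name__ == "__main__":
    # With A(z) = 1 - 0.5 z^-1, gamma1 = 1 and gamma2 = 0.5 the numerator and
    # denominator cancel, so an impulse passes through unchanged.
    print(perceptual_weighting([1.0, 0.0, 0.0], [1.0, -0.5], gamma1=1.0, gamma2=0.5))
```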
To simplify the pitch analysis, an open-loop pitch lag $T_{OL}$ is first estimated in the open-loop pitch search module 706 from the weighted speech signal $s_w(n)$. The closed-loop pitch analysis, performed on a subframe basis in the closed-loop pitch search module 707, is then restricted to the neighborhood of the open-loop pitch lag $T_{OL}$, which greatly reduces the search complexity of the LTP parameters T (pitch lag) and b (pitch gain). The open-loop pitch analysis is typically performed in module 706 once every 10 ms (two subframes) using techniques well known to those skilled in the art.
The target vector x for LTP (long term prediction) analysis is first computed. This is usually done by subtracting the zero-input response $s_0$ of the weighted synthesis filter $W(z)/\hat{A}(z)$ from the weighted speech signal $s_w(n)$. This zero-input response $s_0$ is computed by a zero-input response calculator 708 in response to the quantized interpolated LP filter $\hat{A}(z)$ from the LP analysis, quantization and interpolation module 704, and to the initial states of the weighted synthesis filter $W(z)/\hat{A}(z)$ stored in the memory update module 711 in response to the LP filters $A(z)$ and $\hat{A}(z)$ and the excitation vector u. This operation is well known to those skilled in the art and will not be described.
The N-dimensional impulse response vector h of the weighted synthesis filter $W(z)/\hat{A}(z)$ is computed in the impulse response generator 709 using the coefficients of the LP filters $A(z)$ and $\hat{A}(z)$ from module 704. This operation is also well known to those skilled in the art and will therefore not be described in this description.
The closed-loop pitch (or pitch codebook) parameters b, T, and j are computed in the closed-loop pitch search module 707, which uses the target vector x, the impulse response vector h, and the open-loop pitch lag $T_{OL}$ as inputs.
The pitch search consists of finding the best pitch lag T and gain b that minimize the mean-square weighted pitch prediction error between the target vector x and the scaled, filtered version of the past excitation, $b\,y$, for example:
$e^{(j)} = \|x - b^{(j)} y^{(j)}\|^2$, where $j = 1, 2, \ldots$
More specifically, the pitch (pitch codebook) search consists of three stages.
In the first stage, an open-loop pitch lag $T_{OL}$ is estimated in the open-loop pitch search module 706 in response to the weighted speech signal $s_w(n)$. As described above, this open-loop pitch analysis is typically performed once every 10 ms (two subframes) using techniques well known to those skilled in the art.
In the second stage, a search criterion $C$ is evaluated in the closed-loop pitch search module 707 for integer pitch lags around the estimated open-loop pitch lag $T_{OL}$ (typically ±5), which greatly simplifies the search procedure. A simple procedure is used to update the filtered codevector $y_T$ (this vector is defined in the following description) without the need to compute the convolution for every pitch lag. An example of the search criterion $C$ is given by:
$C = \dfrac{x^t y_T}{\sqrt{y_T^t y_T}}$
where $t$ denotes vector transposition.
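The link between a correlation-based criterion and the mean-square pitch prediction error can be made explicit. The short derivation below is a sketch that assumes the criterion has the normalized-correlation form $C = x^t y_T / \sqrt{y_T^t y_T}$ and that the gain $b$ is chosen optimally for each candidate lag.

```latex
\begin{align*}
e &= \lVert x - b\,y_T \rVert^2
   = x^t x - 2b\,x^t y_T + b^2\,y_T^t y_T \\
\frac{\partial e}{\partial b} = 0 \;&\Rightarrow\;
   b = \frac{x^t y_T}{y_T^t y_T} \\
e_{\min} &= x^t x - \frac{(x^t y_T)^2}{y_T^t y_T}
   = x^t x - C^2,
\qquad C = \frac{x^t y_T}{\sqrt{y_T^t y_T}}
\end{align*}
```

Since $x^t x$ does not depend on the lag, the lag that maximizes $C^2$ minimizes the error; maximizing $C$ itself additionally restricts the choice to lags with positive correlation, and hence positive gain.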
Once the best integer pitch lag is found in the second stage, the third stage of the search (block 707) tests the fractions around that best integer pitch lag by means of the search criterion C. For example, the AMR-WB standard uses 1/4 and 1/2 subsample resolution.
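The second (integer-lag) stage can be sketched as follows. This is an illustration only: it assumes a normalized-correlation criterion C = x^t y_T / sqrt(y_T^t y_T), recomputes the full convolution for every candidate lag rather than using the fast update mentioned in the text, omits the fractional third stage and the frequency shaping filters, and uses hypothetical function names.

```python
import math


def delayed_excitation(past_exc, lag, n):
    """Past excitation at delay `lag`, repeated periodically when lag < n."""
    start = len(past_exc) - lag
    v = []
    for i in range(n):
        v.append(past_exc[start + i] if i < lag else v[i - lag])
    return v


def convolve(v, h):
    """First len(v) samples of v convolved with the impulse response h."""
    return [
        sum(h[i - j] * v[j] for j in range(max(0, i - len(h) + 1), i + 1))
        for i in range(len(v))
    ]


def search_integer_lag(x, past_exc, h, t_ol, delta=5):
    """Return (best lag, best C) among integer lags t_ol - delta .. t_ol + delta."""
    best_lag, best_c = None, -math.inf
    lo = max(1, t_ol - delta)
    hi = min(len(past_exc), t_ol + delta)
    for lag in range(lo, hi + 1):
        y = convolve(delayed_excitation(past_exc, lag, len(x)), h)
        energy = sum(s * s for s in y)
        if energy == 0.0:
            continue  # silent candidate: criterion undefined
        c = sum(a * b for a, b in zip(x, y)) / math.sqrt(energy)
        if c > best_c:
            best_lag, best_c = lag, c
    return best_lag, best_c


if __name__ == "__main__":
    # Pulse train with a period of 40 samples; the open-loop estimate is off by 2.
    past = [1.0 if i % 40 == 0 else 0.0 for i in range(120)]
    x = [1.0 if (120 + i) % 40 == 0 else 0.0 for i in range(64)]
    print(search_integer_lag(x, past, [1.0], t_ol=42))
```

In the usage example the closed-loop search recovers the true period even though the open-loop estimate is slightly off, which is the purpose of searching a small neighborhood around $T_{OL}$.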
In a wideband signal, the harmonic structure is present only up to a certain frequency, depending on the speech segment. Thus, in order to obtain an efficient representation of tonal components in voiced segments of a wideband speech signal, flexibility is required to vary the amount of periodicity over the wideband spectrum. This is achieved by processing the pitch codevectors through a plurality of frequency shaping filters, such as low pass or band pass filters. Selecting the mean square weighted error e defined above(j)Is the smallest frequency shaping filter. The selected frequency shaping filter is identified by the index j.
The pitch codebook index T is encoded and supplied to a multiplexer 713 for transmission over a communication channel. The pitch gain b is quantized and supplied to the multiplexer 713. Additional bits are used to encode the index j, and they are also supplied to the multiplexer 713.
Once the pitch, or LTP (long-term prediction), parameters b, T, and j are determined, the next step is to search for the best innovative excitation in the innovative excitation search module 710 of FIG. 7. First, the target vector x is updated by subtracting the LTP contribution:
$x' = x - b y_T$
where b is the pitch gain and $y_T$ is the filtered pitch codebook vector (the past excitation at delay T, filtered with the selected frequency shaping filter (index j) and convolved with the impulse response h).
The innovative excitation search in CELP is performed in an innovative codebook to find the best excitation codevector $c_k$ and gain g that minimize the mean-square error E between the target vector x' and the scaled, filtered version of the codevector $c_k$, for example:
$E = \lVert x' - g H c_k \rVert^2$
where H is the lower triangular convolution matrix derived from the impulse response vector h. The index k of the innovative codebook corresponding to the best codevector $c_k$ found, and the gain g, are supplied to the multiplexer 713 for transmission over a communication channel.
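As a short worked step, the joint search over $c_k$ and g can be reduced to a search over $c_k$ alone. This is the standard least-squares argument, written in the notation above (t denotes transposition); it is a derivation sketch, not a description of module 710.

```latex
% For a fixed codevector c_k, setting dE/dg = 0 gives the optimal gain:
g = \frac{x'^{t} H c_k}{c_k^{t} H^{t} H c_k}

% Substituting this gain back into E gives the minimum error for that codevector:
E_{\min} = x'^{t} x' - \frac{\left( x'^{t} H c_k \right)^{2}}{c_k^{t} H^{t} H c_k}
```

Since $x'^{t} x'$ does not depend on k, minimizing E is equivalent to maximizing the ratio $(x'^{t} H c_k)^2 / (c_k^{t} H^{t} H c_k)$ over the codebook.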
It should be noted that the innovative codebook used may be a dynamic codebook consisting of an algebraic codebook followed by an adaptive pre-filter F(z) that enhances given spectral components in order to improve the quality of the synthesized speech, according to U.S. Patent No. 5,444,816 issued to Adoul et al. on August 22, 1995. More specifically, the innovative codebook search may be performed in block 710 by means of an algebraic codebook as described in the following U.S. patents: No. 5,444,816 (Adoul et al.), issued August 22, 1995; No. 5,699,482, issued to Adoul et al. on December 17, 1997; No. 5,754,976, issued to Adoul et al. on May 19, 1998; and No. 5,701,392 (Adoul et al.), issued December 23, 1997.
Overview of AMR-WB decoder
The speech decoder 800 of FIG. 8 illustrates various steps performed between a digital input 822 (input bitstream to demultiplexer 817) and an output sampled speech signal 823 (output of adder 821).
Demultiplexer 817 extracts signal-coding parameters from binary information (input bit stream 822) received from a digital input channel. From each received binary frame, the extracted signal coding parameters are:
- the quantized, interpolated LP coefficients $\hat{A}(z)$ (line 825), also called short-term prediction (STP) parameters, produced once per frame;
- the long-term prediction (LTP) parameters T, b, and j (for each subframe); and
- the innovative excitation index k and gain g (for each subframe).
The current speech signal is synthesized based on these parameters, as will be explained below.
The innovative excitation codebook 818 produces an innovative codevector $c_k$ in response to the index k, and this codevector is scaled by the decoded innovative excitation gain g through amplifier 824. As described above, the innovative codebook 818 described in U.S. Patent Nos. 5,444,816, 5,699,482, 5,754,976, and 5,701,392 is used to produce the innovative codevector $c_k$.
The scaled codevector $g c_k$ generated at the output of amplifier 824 is processed by a frequency-dependent pitch enhancer 805.
Enhancing the periodicity of the excitation signal u improves the quality of voiced segments. The periodicity enhancement is achieved by filtering the innovative codevector $c_k$ from the innovative (fixed) excitation codebook through an innovation filter F(z) (pitch enhancer 805) whose frequency response emphasizes the higher frequencies more than the lower frequencies. The coefficients of the innovation filter F(z) are related to the amount of periodicity in the excitation signal u.
An efficient way to derive the coefficients of the innovation filter F(z) is to relate them to the amount of pitch contribution in the total excitation signal u. This yields a frequency response that depends on the periodicity of the subframe, with the higher frequencies emphasized more strongly (stronger overall slope) for higher pitch gains. The effect of the innovation filter 805 is to lower the energy of the innovative codevector $c_k$ at lower frequencies when the excitation signal u is more periodic, which enhances the periodicity of the excitation signal u more at lower frequencies than at higher frequencies. A suggested form of the innovation filter 805 is as follows:
$F(z) = -\alpha z + 1 - \alpha z^{-1}$
where α is a periodicity factor derived from the level of periodicity of the excitation signal u. The periodicity factor α is computed in the voicing factor generator 804. First, a voicing factor $r_v$ is computed in the voicing factor generator 804 as follows:
$r_v = (E_v - E_c) / (E_v + E_c)$
where $E_v$ is the energy of the scaled pitch codevector $b v_T$ and $E_c$ is the energy of the scaled innovative codevector $g c_k$. That is:
$E_v = b^2 v_T^t v_T$
and
$E_c = g^2 c_k^t c_k$
Note that the value of $r_v$ lies between -1 and 1 (1 corresponds to a purely voiced signal and -1 corresponds to a purely unvoiced signal).
The scaled pitch codevector $b v_T$ mentioned above is produced by applying the pitch delay T to the pitch codebook 801 to produce a pitch codevector. This pitch codevector is then processed through a low-pass or band-pass filter 802, whose cut-off frequency is selected in relation to the index j from the demultiplexer 817, to produce the filtered pitch codevector $v_T$. The filtered pitch codevector $v_T$ is then amplified by the pitch gain b through an amplifier 826 to produce the scaled pitch codevector $b v_T$.
The periodicity factor α is then computed in the voicing factor generator 804 according to the following equation:
$\alpha = 0.125 (1 + r_v)$
This corresponds to a value of 0 for a purely unvoiced signal and 0.25 for a purely voiced signal.
The enhanced signal $c_f$ is therefore computed by filtering the scaled innovative codevector $g c_k$ through the innovation filter 805 (F(z)).
The enhanced excitation signal u' is computed by adder 820 according to the following equation:
$u' = c_f + b v_T$
It should be noted that this process is not performed in the encoder 700. It is therefore essential to update the content of the pitch codebook 801 with the past values of the excitation signal u without enhancement, stored in memory 803, in order to keep the encoder 700 and the decoder 800 synchronized. Accordingly, the excitation signal u is used to update the memory 803 of the pitch codebook 801, and the enhanced excitation signal u' is used at the input of the LP synthesis filter 806.
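The enhancement chain described above (voicing factor, periodicity factor, innovation filter, enhanced excitation) can be sketched in Python as follows. The function names are chosen for this sketch, samples outside the subframe are assumed to be zero when applying F(z), and the total energy is assumed to be non-zero; none of these details are specified by the text.

```python
def voicing_factor(b, v_t, g, c_k):
    """r_v = (E_v - E_c) / (E_v + E_c); assumes E_v + E_c > 0."""
    e_v = b * b * sum(s * s for s in v_t)   # energy of scaled pitch codevector
    e_c = g * g * sum(s * s for s in c_k)   # energy of scaled innovative codevector
    return (e_v - e_c) / (e_v + e_c)

def periodicity_factor(r_v):
    """alpha = 0.125 (1 + r_v): 0 for purely unvoiced, 0.25 for purely voiced."""
    return 0.125 * (1.0 + r_v)

def innovation_filter(gc, alpha):
    """Apply F(z) = -alpha z + 1 - alpha z^-1 to the scaled codevector gc.

    Samples outside the subframe are treated as zero (sketch assumption).
    """
    n = len(gc)
    out = []
    for i in range(n):
        nxt = gc[i + 1] if i + 1 < n else 0.0
        prv = gc[i - 1] if i > 0 else 0.0
        out.append(gc[i] - alpha * (nxt + prv))
    return out

def enhanced_excitation(b, v_t, g, c_k):
    """u' = c_f + b v_T, where c_f is the filtered scaled innovative codevector."""
    alpha = periodicity_factor(voicing_factor(b, v_t, g, c_k))
    c_f = innovation_filter([g * s for s in c_k], alpha)
    return [cf + b * v for cf, v in zip(c_f, v_t)]
```

For a purely unvoiced subframe α is 0 and the filter leaves the codevector unchanged; the more voiced the subframe, the more the neighbouring samples are subtracted, which attenuates the low frequencies of the innovation.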
The synthesized signal s' is computed by filtering the enhanced excitation signal u' through the LP synthesis filter 806, which has the form $1/\hat{A}(z)$, where $\hat{A}(z)$ is the quantized, interpolated LP filter in the current subframe. As can be seen in FIG. 8, the quantized, interpolated LP coefficients $\hat{A}(z)$ on line 825 from the demultiplexer 817 are supplied to the LP synthesis filter 806 to adjust its parameters accordingly. The de-emphasis filter 807 is the inverse of the pre-emphasis filter 703 of FIG. 7. The transfer function of the de-emphasis filter 807 is given by:
$D(z) = 1 / (1 - \mu z^{-1})$
where μ is a pre-emphasis factor with a value between 0 and 1 (a typical value is μ = 0.7). A higher-order filter may also be used.
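In the time domain, D(z) corresponds to the first-order recursion y(n) = s(n) + μ y(n-1). The following Python sketch implements it; the companion `pre_emphasis` function assumes the first-order form $1 - \mu z^{-1}$ implied by the inverse relationship and is included only to check that the two filters cancel. Function names and the `mem` state argument are choices made for this sketch.

```python
def pre_emphasis(s, mu=0.7, mem=0.0):
    """Apply 1 - mu z^-1 (assumed form of the pre-emphasis filter)."""
    out = []
    prev = mem                      # previous input sample
    for sample in s:
        out.append(sample - mu * prev)
        prev = sample
    return out

def de_emphasis(s, mu=0.7, mem=0.0):
    """Apply D(z) = 1 / (1 - mu z^-1), i.e. y(n) = s(n) + mu y(n-1)."""
    out = []
    prev = mem                      # previous output sample
    for sample in s:
        prev = sample + mu * prev
        out.append(prev)
    return out
```

In frame-based use, the last input (for pre-emphasis) or last output (for de-emphasis) of one frame would be passed as `mem` for the next frame so that the filter state carries over.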
The vector s' is filtered through the de-emphasis filter D(z) 807 to obtain the vector $s_d$, which is processed through the high-pass filter 808 to remove the unwanted frequencies below 50 Hz and obtain $s_h$.
The oversampler 809 performs the inverse of the process of the downsampler 701 of FIG. 7. For example, oversampling converts the 12.8 kHz sampling rate back to the original 16 kHz sampling rate, using techniques well known to those of ordinary skill in the art. The oversampled synthesized signal is also referred to as the synthesized wideband intermediate signal.
The oversampled synthesized signal does not contain the higher-frequency components that were lost during the downsampling process in the encoder 700 (block 701 of FIG. 7). This gives the synthesized speech signal a low-pass perception. To restore the full band of the original signal, a high frequency generation procedure is performed in block 810, which requires input from the voicing factor generator 804 (FIG. 8).
The resulting band-pass filtered noise sequence z from the high frequency generation module 810 is added to the oversampled synthesized speech signal by adder 821 to obtain the final reconstructed output speech signal $s_{out}$ on output 823. An example of a high frequency regeneration process is described in international PCT patent application WO 00/25305, published on May 4, 2000.
Referring again to FIG. 3, in the full-rate communication mode a codec according to the AMR-WB standard operates at 12.65 kbit/s and is used with the bit allocation given in Table 1. Using the 12.65 kbit/s rate of the AMR-WB codec makes it possible to design a variable bit rate codec for the CDMA2000 system that can interoperate with other systems employing the AMR-WB codec standard. An additional 13 bits are added to fit the 13.3 kbit/s full rate of CDMA2000 Rate Set II. These bits are used to improve the robustness of the codec in the case of erased frames. More details about the AMR-WB codec can be found in ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)" (Geneva, 2002). The codec is based on the algebraic code-excited linear prediction (ACELP) model optimized for wideband signals. It operates on 20 ms speech frames at a sampling frequency of 16 kHz. The LP filter parameters are encoded once per frame using 46 bits. The frame is then divided into four subframes, and the adaptive and fixed codebook indices and gains are encoded once per subframe. The fixed codebook is constructed using an algebraic codebook structure in which the 64 positions of a subframe are divided into four tracks of interleaved positions, with two signed pulses placed in each track. The two pulses of each track are encoded with nine bits, giving a total of 36 bits per subframe.
Table 1. Bit allocation of the AMR-WB standard at 12.65 kbit/s (20 ms frame comprising four subframes).
| Parameter | Bits/frame |
| VAD flag | 1 |
| LP parameters | 46 |
| Pitch delay | 30 = 9 + 6 + 9 + 6 |
| Pitch filtering | 4 = 1 + 1 + 1 + 1 |
| Gains | 28 = 7 + 7 + 7 + 7 |
| Algebraic codebook | 144 = 36 + 36 + 36 + 36 |
| Total | 253 bits |
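The figures in Table 1 can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary keys are labels chosen here, and the check uses the fact that bits per 20 ms frame divided by 20 gives the rate in kbit/s.

```python
FRAME_MS = 20  # frame duration in milliseconds

# Per-frame bit allocation from Table 1 (key names are labels for this sketch).
TABLE_1_BITS = {
    "vad_flag": 1,
    "lp_parameters": 46,
    "pitch_delay": 9 + 6 + 9 + 6,
    "pitch_filtering": 1 + 1 + 1 + 1,
    "gains": 7 + 7 + 7 + 7,
    "algebraic_codebook": 36 + 36 + 36 + 36,
}

def total_bits(allocation):
    """Sum the bits of all parameters in one frame."""
    return sum(allocation.values())

def rate_kbit_per_s(bits, frame_ms=FRAME_MS):
    """Bits per frame divided by the frame length in ms equals kbit/s."""
    return bits / frame_ms

amr_wb_bits = total_bits(TABLE_1_BITS)   # AMR-WB frame at 12.65 kbit/s
full_rate_bits = amr_wb_bits + 13        # 13 extra bits fill the Rate Set II full rate
```

With these values, `rate_kbit_per_s(amr_wb_bits)` gives the AMR-WB rate and `rate_kbit_per_s(full_rate_bits)` gives the CDMA2000 Rate Set II full rate quoted in the text.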
Based on the AMR-WB codec at 12.65 kbit/s, the variable bit rate wideband (VBR-WB) solution can operate in several communication modes, one of which interoperates with AMR-WB at 12.65 kbit/s. Two versions of the full rate (FR) are therefore used: the interoperable FR, in which 13 unused bits are added to reach 13.3 kbit/s; and the normal or CDMA-specific FR, in which the VAD bit and the 13 additional available bits are used to convey information that improves the robustness of the codec to frame erasures (FER). The bit allocations of the two FR versions are shown in Table 2. Note that no additional bits are required for the frame classification information. The 14 FER protection bits include 6 bits of energy information. Only 63 levels are used to quantize the energy, and the last level, corresponding to the value 63, is reserved to indicate the use of the interoperable mode. Thus, in the case of the interoperable FR, the energy information index is set to 63.
Table 2. Bit allocation of the normal and interoperable full rates for CDMA2000 Rate Set II, based on the AMR-WB standard at 12.65 kbit/s.
In the case of a stable voiced frame, the half-rate voiced coding block 206 is used. The half-rate voiced bit allocation is given in Table 3. Since the frames encoded in this communication mode are characteristically very periodic, a substantially lower bit rate is enough to maintain good subjective quality compared with, for example, transient frames. Signal modification is used, which allows efficient coding of the delay information using only nine bits per 20 ms frame and saves a considerable part of the bit budget for the other signal coding parameters. In signal modification, the signal is forced to follow a certain pitch contour that can be transmitted with 9 bits per frame. The good performance of the long-term prediction makes it possible to use only 12 bits per 5 ms subframe for the fixed codebook excitation without sacrificing subjective speech quality. The fixed codebook is an algebraic codebook comprising two tracks with one pulse each, and each track has 32 possible positions.
Table 3. Bit allocation of the normal, voiced, and unvoiced half rates for CDMA2000 Rate Set II.
In the case of unvoiced frames, the adaptive codebook (or pitch codebook) is not used. A 13-bit Gaussian codebook is used in each subframe, and the codebook gain is encoded with 6 bits per subframe. Note that when the average bit rate needs to be reduced further, an unvoiced quarter rate can be used for stable unvoiced frames.
The normal half rate mode (312) is used for the low energy segments as shown in fig. 3. This normal HR mode may also be used for maximum half rate operation as will be explained later. The bit allocation for normal HR is shown in table 3 above.
As an example of the classification information for the different HR coders, in the case of normal HR, 1 bit is used to indicate whether the frame is normal HR or another HR. In the case of unvoiced HR, 2 bits are used for classification: the first bit indicates that the frame is not normal HR, and the second bit indicates that it is unvoiced HR rather than voiced HR or interoperable HR (explained later). In the case of voiced HR, 3 bits are used: the first 2 bits indicate that the frame is neither normal nor unvoiced HR, and the third bit indicates whether the frame is voiced HR or interoperable HR.
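The classification scheme above is a variable-length prefix code using 1, 2, 3, and 3 bits. The following Python sketch illustrates the idea; the text gives only the code lengths, so the actual bit values in `HR_CLASS_CODES` are hypothetical, as are the names used.

```python
# Hypothetical bit patterns: only the lengths (1, 2, 3, 3) come from the text.
HR_CLASS_CODES = {
    "normal": "0",          # 1 bit: normal HR versus any other HR
    "unvoiced": "10",       # 2 bits: not normal, unvoiced
    "voiced": "110",        # 3 bits: not normal, not unvoiced, voiced
    "interoperable": "111", # 3 bits: not normal, not unvoiced, interoperable
}

def parse_hr_class(bits):
    """Return (class name, number of class bits) read from the start of `bits`."""
    for name, code in HR_CLASS_CODES.items():
        if bits.startswith(code):
            return name, len(code)
    raise ValueError("bit string too short to hold a class code")
```

Because no code is a prefix of another, the decoder can read the class from the start of the frame without a length field, and the most frequent short code costs a single bit.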
An eighth rate (CNG) encoding module 208 is used to encode inactive speech frames (silence or background noise). In this case, the LP filter parameters are coded with 14 bits per frame and the gains are coded with 6 bits per frame. These parameters are used for Comfort Noise Generation (CNG) at the decoder. The bit allocation is shown in table 4.
Table 4. Bit allocation of the eighth rate at 1.0 kbit/s (20 ms frame).
| Parameter | Bits/frame |
| LP parameters | 14 |
| Gain | 6 |
| Total | 20 bits/frame = 1.0 kbit/s |
System enforced half rate operation
In the CDMA coding scheme, the system can force the use of half rate instead of full rate in some speech frames in order to send in-band signaling information. This is called half-blank-burst sequence signaling. The system can also enforce half rate as the maximum bit rate during poor channel conditions (for example, near cell boundaries) in order to improve codec robustness. This is called half-rate maximum operation. In the VBR coding configuration described above, half rate is used when the frame is stationary voiced or stationary unvoiced. Full rate is used for onsets, transient frames, and mixed voiced frames. When the rate selection module selects a frame for full-rate encoding and the system enforces a half-rate frame, speech performance degrades because the half-rate communication modes cannot efficiently encode onsets and transient frames.
Furthermore, in a cross-system tandem-free operation call between CDMA2000 using an AMR-WB based VBR Rate Set II solution and another system using standard AMR-WB, the CDMA2000 system may occasionally force the use of half rate, as described above (for example, for half-blank-burst sequence signaling). Since the AMR-WB codec does not recognize the 6.2 kbit/s half rate of the CDMA2000 wideband codec, the forced half-rate frames are interpreted as erased frames. This degrades the performance of the connection.
The non-limiting illustrative embodiment of the present invention implements a novel technique that improves the performance of a variable bit rate speech codec operating in a CDMA wireless system in the case where half rate is enforced by the system. Furthermore, this novel technique improves the performance of cross-system tandem-free operation between CDMA2000 and other systems employing AMR-WB codecs when the CDMA2000 system mandates the use of half rate.
In half-blank-burst sequence signaling or half-rate maximum operation, when the system requests half rate while the classification mechanism has selected full rate, the frame is neither unvoiced nor stationary voiced and is likely to contain a non-stationary speech segment such as a voiced onset or a rapidly evolving voiced signal. Using a half rate optimized for unvoiced or stationary voiced signals therefore degrades speech performance. A new half-rate mode is needed for this case, and the normal HR has been introduced for it. Thus, in half-rate maximum or half-blank-burst sequence operation, the encoder uses the normal HR if the frame is not classified as voiced or unvoiced HR. However, CDMA2000 systems also support an operation known as packet-level signaling, in which the signaling information is not provided to the encoder and the system may force the use of HR after the frame has been encoded. If a frame has already been encoded as FR and the system requires HR, the frame is declared erased. Furthermore, in the interoperable mode, where the VBR encoder interworks with AMR-WB at 12.65 kbit/s, the normal HR cannot be used for half-rate maximum and half-blank-burst sequence operation because it is not part of AMR-WB. To avoid erasing the frame in these cases (packet-level signaling, or half-blank-burst sequence and half-rate maximum operation in the interoperable mode), a non-limiting illustrative embodiment of the present invention uses a half-rate mode derived directly from the full-rate mode by discarding a portion of the signal coding parameters, for example the fixed codebook indices, after the frame has been encoded as a full-rate frame. At the decoder side, the discarded portion of the signal coding parameters, for example the fixed codebook indices, can be randomly generated, and the decoder then operates as if it were in full rate.
This half-rate mode is called signaling HR or interoperable HR, since both encoding and decoding are performed at full rate. The bit allocation of the interoperable half-rate mode according to a non-limiting illustrative embodiment of the present invention is given in Table 5. In this non-limiting illustrative embodiment, the full rate is based on the AMR-WB standard at 12.65 kbit/s, and the half rate is derived by discarding the 144 bits required for the indices of the algebraic fixed codebook. The signaling HR differs from the interoperable HR in that it is used for packet-level signaling operation in the CDMA2000 system and can still use FER protection bits. The signaling HR is derived directly from the normal FR shown in Table 1 by discarding the 144 bits of the algebraic codebook indices. Three bits are added for the class information and only six bits are used for FER protection, leaving five unused bits. The interoperable HR is derived from the interoperable FR by discarding the 144 bits of the algebraic codebook indices. Three bits are added for the class information, leaving 12 unused bits. As mentioned earlier in the discussion of the classification information for the different half rates, three bits are used in the case of voiced HR or interoperable HR. No extra information is sent to distinguish the signaling HR from the interoperable HR. The last level of the 6-bit energy information is used for this purpose, as in the case of FR: only 63 levels are used to quantize the energy, and the last level, corresponding to the value 63, is reserved to indicate the use of the interoperable mode. Thus, in the case of the interoperable HR, the energy information index is set to 63.
Table 5. Bit allocation of the signaling and interoperable half rates at 6.2 kbit/s.
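Two details of the preceding description lend themselves to a small Python sketch: the interoperable HR bit budget and the use of energy index 63 as a mode flag. The numbers come from the text (253-bit AMR-WB frame, 144 algebraic codebook bits, 3 class bits, 12 unused bits); the function and constant names, and the representation of the quantized energy as an integer level from 0 to 62, are assumptions made for this sketch.

```python
FRAME_MS = 20
AMR_WB_BITS = 253            # 12.65 kbit/s frame (Table 1)
ALGEBRAIC_BITS = 144         # fixed codebook indices that are discarded
CLASS_BITS = 3               # half-rate class information
UNUSED_BITS = 12             # padding in the interoperable HR

# Interoperable HR frame size: drop the algebraic indices, add class bits and padding.
interoperable_hr_bits = AMR_WB_BITS - ALGEBRAIC_BITS + CLASS_BITS + UNUSED_BITS

INTEROPERABLE_INDEX = 63     # last value of the 6-bit energy index, reserved as a flag

def encode_energy_index(level, interoperable):
    """Return the 6-bit energy index; 63 signals the interoperable mode."""
    if interoperable:
        return INTEROPERABLE_INDEX
    if not 0 <= level <= 62:
        raise ValueError("only 63 levels (0..62) are available for the energy")
    return level

def is_interoperable(energy_index):
    """Decoder-side check of the reserved index."""
    return energy_index == INTEROPERABLE_INDEX
```

The budget works out to 124 bits per 20 ms frame, which matches the 6.2 kbit/s half rate. Reserving one of the 64 index values costs a single quantization level but avoids spending an extra bit on the mode.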
FIG. 4 extends the schematic block diagram of FIG. 3 by adding a half-rate system request to the rate determination logic. The configuration in FIG. 3 is valid for operation in a CDMA2000 system. At the end of the rate determination chain, module 404 checks whether there is a half-rate system request. If the rate determination logic indicates that the frame is an active speech frame (block 201) and is neither unvoiced (block 202) nor stationary voiced (block 203) nor a low-energy frame (block 311), but the system requests half-rate operation (block 404), the frame is encoded with the normal half rate in block 312.
Otherwise (no half-rate system request is present), the speech frame is encoded in block 205 as a full-rate frame (13.3 kbit/s according to CDMA2000 Rate Set II).
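The decision chain of FIG. 4 can be summarized by the following Python sketch. It is a simplified reading of the description: the boolean inputs stand in for the outputs of the classification blocks, the enum and function names are chosen here, and options such as the unvoiced quarter rate are left out.

```python
from enum import Enum

class Rate(Enum):
    FULL = "FR"
    HALF_NORMAL = "normal HR"
    HALF_VOICED = "voiced HR"
    HALF_UNVOICED = "unvoiced HR"
    EIGHTH = "ER (CNG)"

def select_rate(active, unvoiced, stable_voiced, low_energy, half_rate_requested):
    """Simplified rate determination with a half-rate system request."""
    if not active:
        return Rate.EIGHTH         # inactive speech: comfort noise
    if unvoiced:
        return Rate.HALF_UNVOICED
    if stable_voiced:
        return Rate.HALF_VOICED
    if low_energy:
        return Rate.HALF_NORMAL
    if half_rate_requested:
        return Rate.HALF_NORMAL    # system request overrides the full-rate choice
    return Rate.FULL
```

The system request is checked last, so it only changes the outcome for frames that would otherwise have been encoded at full rate.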
In the non-limiting illustrative embodiment of the invention shown in FIG. 5, the rate determination logic and the variable rate encoding are the same as in FIG. 3. However, after the frame has been encoded and the bits are ready for transmission, a test is performed in block 514 to check whether the system requests half-rate operation. If so, and the frame is an FR frame, a portion of the signal coding parameters, for example the fixed codebook indices, is discarded to obtain a signaling half-rate frame (block 510). Note that in this non-limiting illustrative embodiment, one to three bits are used to indicate the half-rate mode (normal, voiced, unvoiced, or interoperable). Thus, 3 bits indicating the signaling or interoperable half rate are added after the portion of the signal coding parameters (the fixed codebook indices) has been discarded. The bits in the frame are allocated according to Table 5.
The choice of discarding fixed codebook indices is due to the fact that: these bits are the least sensitive to errors and their random generation has little impact on performance. It should be kept in mind, however, that other bits may be discarded in order to get interoperable or signaling half rate without loss of generality.
In this non-limiting illustrative embodiment, in signaling or interoperable half-rate operation the encoder operates as a full-rate encoder. The fixed codebook search is performed as usual according to the AMR-WB standard at 12.65 kbit/s [ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)", Geneva, 2002] [3GPP TS 26.190, "AMR Wideband Speech Codec: Transcoding Functions", 3GPP Technical Specification], and the fixed codebook excitation found is used to update the adaptive codebook content and the filter memories for subsequent frames. Therefore, no random codebook indices are used in the encoder operation. This is evident in the implementation of FIG. 5, where the half-rate system request is checked (block 514) after the frame has been encoded by normal full-rate operation.
In signaling or interoperable half-rate operation at the decoder side, the discarded portion of the signal coding parameters, for example the indices of the fixed codebook, is randomly generated. The decoder then operates as in full rate. Other methods of generating the discarded portion of the signal coding parameters may be used. For example, the discarded parameters may be obtained by copying parts of the received bitstream. Note that a mismatch may occur between the memories on the encoder and decoder sides, because the discarded portion of the signal coding parameters, for example the fixed codebook excitation, is not identical on both sides. However, this mismatch does not seem to affect performance, especially in the case of half-blank-burst sequence signaling during interworking between CDMA2000 VBR and AMR-WB, where the typical rate is about 2%.
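The discard-and-regenerate mechanism can be sketched in Python as follows. The frame is modelled as a dictionary of already-quantized parameters; the field names, the class bit pattern "111", and the representation of each 36-bit subframe index as one integer are hypothetical choices for this sketch, not the bitstream format of the codec.

```python
import random

SUBFRAMES = 4
ALGEBRAIC_BITS_PER_SUBFRAME = 36

def discard_fixed_codebook(full_rate_frame):
    """Encoder side: derive a half-rate frame from an encoded full-rate frame."""
    half = {k: v for k, v in full_rate_frame.items() if k != "algebraic_indices"}
    half["class_bits"] = "111"   # hypothetical 3-bit interoperable/signaling HR marker
    return half

def regenerate_fixed_codebook(half_rate_frame, rng=None):
    """Decoder side: fill in random indices and return a full-rate-shaped frame."""
    rng = rng or random.Random()
    full = {k: v for k, v in half_rate_frame.items() if k != "class_bits"}
    full["algebraic_indices"] = [
        rng.getrandbits(ALGEBRAIC_BITS_PER_SUBFRAME) for _ in range(SUBFRAMES)
    ]
    return full
```

All parameters other than the fixed codebook indices pass through unchanged, so the regenerated frame can be handed to an ordinary full-rate decoder; only the innovation differs from what the encoder used, which is the source of the memory mismatch mentioned above.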
The performance of the proposed method in half-blank-burst sequence operation is almost transparent compared with the case without half-rate system requests. In many cases the rate determination logic has already determined that the frame is to be encoded at eighth rate, quarter rate, or half rate (normal, voiced, or unvoiced). In such cases the half-rate system request is ignored, since the encoder already satisfies it and the type of signal in the frame is suitable for encoding at half rate or lower.
It should be noted that the classification logic adapts to the mode of operation. Thus, to improve performance in half-rate maximum mode and half-blank-burst sequence signaling, the classification logic can be relaxed so that the specific half-rate coders (half-rate voiced and unvoiced) are used more often than in normal operation. This is an extension of multi-mode operation, in which the classification logic is more relaxed in modes with lower average data rates.
Tandem-free operation between CDMA2000 system and other systems employing AMR-WB standard
As described earlier, designing the variable bit rate wideband (VBR-WB) codec of the CDMA2000 system around the AMR-WB codec has the advantage of enabling tandem-free operation (TFO) or packet-switched operation between the CDMA2000 system and other systems employing the AMR-WB standard, such as a mobile GSM system or a W-CDMA third-generation wireless system. However, in a cross-system tandem-free operation call between CDMA2000 and another system employing AMR-WB, the CDMA2000 system may force the use of half rate, as described above (for example, for half-blank-burst sequence signaling). Since the AMR-WB codec does not recognize the 6.2 kbit/s half rate of the CDMA2000 wideband codec, the forced half-rate frames are interpreted as erased frames. This degrades the performance of the connection. Using the interoperable half-rate mode disclosed earlier greatly improves performance, because this mode can interwork with the 12.65 kbit/s rate of the AMR-WB standard.
As disclosed herein above, the interoperable half-rate is essentially a pseudo-full-rate, where the codec operates as if it were a full-rate mode. The difference is that a part of the signal coding parameters, e.g. the algebraic codebook index, is eventually discarded and not transmitted. At the decoder side, the discarded portions of the signal-coding parameters, e.g., algebraic codebook indices, are randomly generated, and the decoder then operates as if it were in full-rate mode.
FIG. 6 illustrates a configuration demonstrating the use of the interoperable half-rate mode during in-band transmission of signaling information (that is, the half-blank-burst sequence condition) on the CDMA2000 system side, according to a non-limiting illustrative embodiment of the present invention. In this figure, the other side is a system employing the AMR-WB standard, with a 3GPP wireless system given as an example.
In the link from CDMA2000 to 3GPP or another system employing AMR-WB, when the multiplexing sublayer indicates a request for half-rate mode (see half-blank-burst sequence system request 601), the VBR-WB encoder 602 operates at the interoperable half rate (I-HR) described earlier. At the IP-based system interface 604, when an I-HR frame is received, module 603 inserts randomly generated algebraic codebook indices into the bitstream in order to output a 12.65 kbit/s frame. The decoder 605 on the 3GPP side interprets it as an ordinary 12.65 kbit/s frame.
In the reverse direction, that is, the link from 3GPP or another system employing AMR-WB to CDMA2000, if a half-rate request is received at the system interface 606 (see half-blank-burst sequence system request 607), module 608 discards the algebraic codebook indices and inserts the 3 bits indicating the I-HR frame type. The decoder 609 on the CDMA2000 side then operates on an I-HR frame type, which is an integral part of the VBR-WB solution.
This proposal requires only minimal logic at the system interface, and it greatly improves performance compared with forcing half-blank-burst sequence frames to be treated as blank-burst sequence frames (erased frames).
Another issue in interworking is the handling of background noise frames. On the AMR-WB side, the encoder 610 supports DTX (discontinuous transmission) and CNG (comfort noise generation) operation. Inactive speech frames (silence or background noise) are either encoded with 35 bits as SID (silence descriptor) frames or not transmitted at all (no data). On the CDMA2000 side, inactive speech frames are encoded at eighth rate (ER). Since the 35 bits of a SID frame cannot be carried by an ER frame, SID frames are sent from the AMR-WB side to the CDMA2000 side using a CNG quarter rate (QR). The untransmitted no-data frames on the AMR-WB side are converted to ER frames (in the illustrative embodiment, all bits are set to 1). On the CDMA2000 side in the interoperable mode, ER frames are handled by the decoder as frame erasures.
In interworking from the CDMA2000 side to the AMR-WB side, CNG QR is used at the beginning of an inactive speech segment, and ER frames are used afterwards. In a non-limiting illustrative embodiment of the present invention, the operation is similar to the VAD/DTX/CNG operation in AMR-WB, where a SID frame is sent every eighth frame. In this case, the first inactive speech frame is encoded as a CNG QR frame and the following 7 frames are encoded as ER frames. At the system interface, CNG QR frames are converted to AMR-WB SID frames, and ER frames are not transmitted (no-data frames).
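The frame-type handling for inactive speech can be sketched in Python. The sketch assumes that the "one CNG QR frame followed by 7 ER frames" pattern simply repeats every eight frames for the duration of the inactive segment, which is one reading of the description; the string labels and function names are chosen for this sketch.

```python
def inactive_segment_frames(n_frames):
    """CDMA2000-side frame types for an inactive segment of n_frames frames."""
    # First frame of every group of eight is a CNG quarter-rate frame,
    # the remaining seven are eighth-rate frames (assumed repeating pattern).
    return ["CNG_QR" if i % 8 == 0 else "ER" for i in range(n_frames)]

# Frame-type conversion at the system interface, in both directions.
CDMA2000_TO_AMR_WB = {"CNG_QR": "SID", "ER": "NO_DATA"}
AMR_WB_TO_CDMA2000 = {"SID": "CNG_QR", "NO_DATA": "ER"}

def to_amr_wb(frame_type):
    """Convert a CDMA2000 inactive-speech frame type to its AMR-WB counterpart."""
    return CDMA2000_TO_AMR_WB[frame_type]

def to_cdma2000(frame_type):
    """Convert an AMR-WB inactive-speech frame type to its CDMA2000 counterpart."""
    return AMR_WB_TO_CDMA2000[frame_type]
```

For example, a three-frame inactive segment maps to one SID frame followed by two no-data frames on the AMR-WB side. Only the frame types are modelled here; converting the comfort-noise parameters themselves between the two formats is outside the scope of the sketch.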
Bit allocation for CNG QR and CNG ER frames is shown in table 6.
Table 6. Bit allocation of CNG QR at 2.7 kbit/s and CNG ER at 1 kbit/s (20 ms frame).
Although the invention has been described in the foregoing specification with respect to a non-limiting illustrative embodiment thereof, this illustrative embodiment can be modified within the scope of the appended claims without departing from the scope and spirit of the invention. For example, bits other than those involving fixed codebook indices, especially bits with less error sensitivity, may be discarded in order to obtain interoperable half-rate frames.
Claims (13)
1. A method for encoding speech, comprising:
receiving signal coding parameters representing a sound signal coded according to a full rate communication mode of a CDMA2000 VBR-WB communication scheme;
receiving a request to transmit the signal coding parameters using a half-rate communication mode of the CDMA2000 VBR-WB communication scheme to reduce a bit rate during transmission of the signal coding parameters;
discarding a portion of the signal-coding parameters in response to the request to enable transmission of remaining signal-coding parameters using the half-rate communication mode of the CDMA2000 VBR-WB communication scheme, wherein the discarded portion of the signal-coding parameters is a fixed codebook index of an algebraic codebook; and
inserting an identification of an interoperable half-rate communication mode to be transmitted with the residual signal encoding parameters.
2. The method of claim 1, further comprising:
generating replacement signal encoding parameters to replace the discarded portions of the signal encoding parameters.
3. The method of claim 2, wherein generating replacement signal encoding parameters comprises randomly generating the fixed codebook index.
4. The method of claim 1, further comprising:
transmitting the residual signal coding parameters using a half-rate communication mode of the CDMA2000 VBR-WB communication scheme;
generating replacement signal encoding parameters to replace the discarded portions of the signal encoding parameters; and
decoding signal coding parameters including the replaced portion of the signal coding parameters according to a full rate communication mode of an AMR-WB communication scheme.
5. The method of claim 1, further comprising: an initial step of encoding the sound signal according to a full rate communication mode of the CDMA2000 VBR-WB communication scheme.
6. A method for encoding speech, comprising:
receiving an indication that signal coding parameters representing sound signals encoded according to a half-rate communication mode of a CDMA2000 VBR-WB communication scheme have been transmitted using the half-rate communication mode of the CDMA2000 VBR-WB communication scheme instead of a full-rate communication mode of the CDMA2000 VBR-WB communication scheme to reduce a bit rate during transmission of the signal coding parameters; and
in response to the indication, generating replacement signal coding parameters to replace a portion of the signal coding parameters that was discarded to reduce a bit rate during transmission, wherein the discarded portion of the signal coding parameters is a fixed codebook index of an algebraic codebook.
7. The method of claim 6, further comprising receiving the signal coding parameters and decoding the sound signal using the replacement signal coding parameters.
8. A system for encoding speech comprising a first station using a CDMA2000VBR-WB communication scheme and a second station using an AMR-WB communication scheme, a full-rate communication mode of said CDMA2000VBR-WB communication scheme interoperable with a full-rate communication mode of said AMR-WB communication scheme;
the first station includes:
means for encoding the sound signal to generate signal encoding parameters according to a full rate communication mode of a CDMA2000VBR-WB communication scheme;
means for receiving a request to transmit the signal coding parameters using a half rate communication mode of a CDMA2000VBR-WB communication scheme;
means for discarding, in response to the request, a portion of the signal-coding parameters encoded in accordance with the full-rate communication mode of the CDMA2000VBR-WB communication scheme, wherein the discarded portion of the signal-coding parameters is a fixed codebook index of an algebraic codebook; and
means for transmitting the remaining signal coding parameters using the half-rate communication mode of the CDMA2000VBR-WB communication scheme;
the second station includes:
means for receiving the remaining signal coding parameters;
means for generating replacement signal coding parameters to replace the discarded portion of the signal coding parameters; and
means for decoding the signal coding parameters using the remaining signal coding parameters and the generated replacement signal coding parameters.
9. An apparatus for encoding speech, comprising:
means for receiving signal coding parameters representing sound signals encoded in accordance with a full rate communication mode of a CDMA2000VBR-WB communication scheme;
means for receiving a request to transmit the signal coding parameters using a half-rate communication mode of the CDMA2000VBR-WB communication scheme to reduce a bit rate during transmission of the signal coding parameters;
means for discarding a portion of the signal coding parameters to enable transmission of remaining signal coding parameters using the half-rate communication mode of the CDMA2000VBR-WB communication scheme, wherein the discarded portion of the signal coding parameters is a fixed codebook index of an algebraic codebook;
means for inserting an identification of an interoperable half-rate communication mode to be transmitted with the remaining signal coding parameters; and
means for transmitting the remaining signal coding parameters in accordance with the half-rate communication mode of the CDMA2000VBR-WB communication scheme.
10. An apparatus for encoding speech, comprising:
means for receiving an indication that signal coding parameters representing sound signals encoded in accordance with a half-rate communication mode of a CDMA2000VBR-WB communication scheme have been transmitted using the half-rate communication mode of the CDMA2000VBR-WB communication scheme instead of a full-rate communication mode of the CDMA2000VBR-WB communication scheme to reduce a bit rate during transmission of the signal coding parameters; and
means for generating, in response to the indication, replacement signal coding parameters for replacing a portion of the signal coding parameters that was discarded to reduce a bit rate during transmission, wherein the discarded portion of the signal coding parameters is a fixed codebook index of an algebraic codebook.
11. The apparatus of claim 10, wherein the means for generating replacement signal coding parameters is arranged to randomly generate the replacement signal coding parameters.
12. The apparatus of claim 11, wherein:
the randomly generated replacement signal encoding parameters comprise a randomly generated replacement fixed codebook index.
13. The apparatus of claim 10, further comprising means for receiving the signal coding parameters, and means for decoding the sound signal using the replacement signal coding parameters.
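The encoder-side and decoder-side steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation. The frame layout, the constants (four subframes per frame, 36 index bits per subframe), the mode label `INTEROP_HR` and all class and function names are assumptions introduced only for illustration. The sketch shows a full-rate frame being reduced to a half-rate frame by discarding the fixed (algebraic) codebook indices and tagging the frame with a mode identification, and the receiving side regenerating those indices at random before full-rate decoding.

```python
import random
from dataclasses import dataclass
from typing import List

SUBFRAMES = 4          # assumption: subframes per frame
FCB_INDEX_BITS = 36    # assumption: algebraic codebook index bits per subframe
INTEROP_HR = "interoperable-half-rate"  # hypothetical mode identification


@dataclass
class FullRateFrame:
    other_params: bytes      # all parameters except the fixed codebook indices (opaque here)
    fcb_indices: List[int]   # one algebraic codebook index per subframe


@dataclass
class HalfRateFrame:
    mode_id: str             # identification sent along with the remaining parameters
    other_params: bytes


def to_half_rate(frame: FullRateFrame) -> HalfRateFrame:
    """Discard the fixed codebook indices and tag the frame with the mode identification."""
    return HalfRateFrame(mode_id=INTEROP_HR, other_params=frame.other_params)


def restore_full_rate(frame: HalfRateFrame, rng: random.Random) -> FullRateFrame:
    """Randomly generate replacement fixed codebook indices for a tagged half-rate frame."""
    if frame.mode_id != INTEROP_HR:
        raise ValueError("frame is not marked as interoperable half-rate")
    indices = [rng.getrandbits(FCB_INDEX_BITS) for _ in range(SUBFRAMES)]
    return FullRateFrame(other_params=frame.other_params, fcb_indices=indices)
```

In this sketch the remaining parameters pass through unchanged, so a full-rate decoder on the receiving side would see a frame of the usual shape whose fixed codebook indices simply carry no transmitted information.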
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CA002392640A CA2392640A1 (en) | 2002-07-05 | 2002-07-05 | A method and device for efficient in-band dim-and-burst signaling and half-rate max operation in variable bit-rate wideband speech coding for cdma wireless systems |
| CA2,392,640 | 2002-07-05 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1130558A1 (en) | 2009-12-31 |
| HK1130558B (en) | 2013-07-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP5173939B2 (en) | | Method and apparatus for efficient in-band dim-and-burst signaling and half-rate max processing during variable bit rate wideband speech coding for CDMA radio systems |
| EP1554718B1 (en) | | Methods for interoperation between adaptive multi-rate wideband (amr-wb) and multi-mode variable bit-rate wideband (vmr-wb) speech codecs |
| KR100732659B1 (en) | | Method and device for gain quantization in variable bit rate wideband speech coding |
| US7657427B2 (en) | | Methods and devices for source controlled variable bit-rate wideband speech coding |
| JP2006525533A5 (en) | | |
| EP1808852A1 (en) | | Method of interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs |
| HK1130558B (en) | | Method and device for cdma wireless systems |
| CA2491623C (en) | | Method and device for efficient in-band dim-and-burst signaling and half-rate max operation in variable bit-rate wideband speech coding for cdma wireless systems |
| HK1084227A (en) | | Method and device for efficient in-band dim-and-burst signaling and half-rate max operation in variable bit-rate wideband speech coding for cdma wireless systems |
| Paksoy | | Variable rate speech coding with phonetic classification |
| HK1082315B (en) | | Method and device for gain quantization in variable bit rate wideband speech coding |