HK1022548A - Method and apparatus for improving the voice quality of tandemed vocoders - Google Patents
Description
Technical Field
The present invention relates to a method and apparatus for transmitting digitized voice signals in a communications environment that may be wireless in nature, and more particularly to a method and apparatus for improving the quality of an audio signal that has been compressed or encoded using digital signal processing techniques as the signal travels from one end of a communication network to the other.
Background
In recent years, the telecommunications industry has seen a dramatic increase in the variety of digital vocoders deployed to meet the bandwidth requirements of different wireline and wireless communication systems. The name "vocoder" originates from the fact that these devices are applied primarily to the encoding and decoding of voice signals. Vocoders are typically integrated into the mobile phones and base stations of a communication network, where they provide speech compression and the inverse transformation of digitized voice signals. Typically, the speech signal is first digitized by one of many quantization techniques; examples of these techniques are Pulse Amplitude Modulation (PAM), Pulse Code Modulation (PCM) and delta modulation. For the purposes of this description, we consider PCM as the input format of a vocoder. Thus, the vocoder comprises a coding stage that accepts the digitized voice signal as input and outputs a compressed signal, with a possible compression ratio of 8:1. As the inverse transform, the vocoder has a decoding stage that accepts a compressed speech signal and outputs a digitized signal, e.g., PCM samples.
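The bandwidth saving implied by the 8:1 compression ratio above can be illustrated with a short calculation (the 8 kHz / 8-bit figures are the conventional telephony PCM parameters, not values taken from this patent):

```python
# Illustrative arithmetic: standard telephony PCM is 8000 samples/s at
# 8 bits/sample, i.e. 64 kbit/s; an 8:1 vocoder reduces this to 8 kbit/s.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8

pcm_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 64000 bit/s on the trunk
compression_ratio = 8
vocoder_rate = pcm_rate // compression_ratio  # 8000 bit/s over the air

print(pcm_rate, vocoder_rate)  # 64000 8000
```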
The main advantage of compressing speech is that it uses less of the limited available channel bandwidth for transmission; the main disadvantage is a loss of speech quality.
Most modern low-bit-rate vocoders are based on a linear prediction model, which decomposes the speech signal into a set of linear prediction coefficients, a residual signal and various other parameters. In general, speech can be recovered from these components with good quality. However, when speech passes through more than one vocoder stage in succession, a quality degradation is introduced.
The rapid growth in network diversity and in the number of users of these networks has led to an increasing number of occasions where two vocoders are placed in series to serve a single connection. In such a case, a first encoder is used to compress the speech of the first mobile user. The compressed speech is transmitted to the base station serving the first mobile subscriber, where it is decompressed (converted to PCM-format samples). The resulting PCM samples travel via the digital trunks of the telephone network to the base station serving the second mobile terminal, where a second encoder compresses the signal for transmission to the second mobile terminal. A speech decoder at the second mobile terminal decompresses the received compressed speech data to synthesize the original speech signal from the first mobile terminal. One particular example of such a situation is a call from a wireless terminal operating in accordance with the North American Time Division Multiple Access (TDMA) system to a European-standard Global System for Mobile communications (GSM) mobile phone.
In an attempt to eliminate the vocoder tandem condition, a method called bypass has been proposed in the past. The basic idea behind this approach is to provide a digital signal processor comprising a vocoder and a bypass mechanism, the bypass mechanism being put into use when the format of the input signal is compatible with the vocoder. In use, a digital signal processor associated with a first base station receiving RF signals from a first mobile terminal determines, through signalling and control, that an identical digital signal processor is present at the second base station serving the mobile terminal to which the call is directed. The digital signal processor associated with the first base station then does not convert the compressed speech signal into PCM samples; instead, it activates the bypass mechanism and outputs the compressed speech into the communication network. When the compressed speech signal reaches the digital signal processor associated with the second base station, it is likewise routed so as to bypass the local vocoder. Decompression of the signal takes place only at the second mobile terminal. The "bypass" method is described in international application No. PCT95CA704, filed December 13, 1995, the disclosure of which is incorporated herein by reference.
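The bypass decision described above can be sketched as follows. All names are illustrative, and the stub functions merely stand in for real vocoder-specific processing; this is not an implementation of the referenced application:

```python
# Sketch of the bypass routing decision (illustrative names only).
def decode_to_pcm(frame, vocoder):
    # Placeholder: a real implementation runs the vocoder's decoder.
    return [0] * 160

def encode_from_pcm(pcm, vocoder):
    # Placeholder: a real implementation runs the vocoder's encoder.
    return bytes(len(pcm) // 8)

def route_frame(frame, local_vocoder, remote_vocoder):
    """Pass compressed frames through untouched when both ends use the
    same vocoder (bypass); otherwise decode and re-encode (tandem)."""
    if local_vocoder == remote_vocoder:
        return frame                               # bypass: no PCM round trip
    pcm = decode_to_pcm(frame, local_vocoder)      # tandem path:
    return encode_from_pcm(pcm, remote_vocoder)    # quality loss occurs here

frame = b"\x01\x02"
print(route_frame(frame, "IS54", "IS54") is frame)  # True
```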
However, this solution works only when the tandemed vocoders are exactly the same. With the rapid expansion of networks, the variety of vocoders in use also grows rapidly. Thus, the bypass solution is useful only for the small number of connections whose concatenated vocoders are identical.
Thus, there is an industry need to provide an apparatus that improves voice quality during connections that include incompatible tandem vocoders.
Objects and Statements of the Invention
An object of the present invention is to provide an apparatus for processing an audio signal, which can reduce signal quality degradation when signals are exchanged between two vocoders in a communication network.
It is another object of the present invention to provide a method for reducing audio signal degradation when signals are transmitted from one vocoder to another vocoder in a communication network.
As embodied and broadly described herein, the present invention provides an apparatus for processing an audio signal, said apparatus comprising an input and an output, said apparatus being responsive to a frame of compressed audio data in a first format applied to said input to produce at said output a frame of compressed audio data in a second format, the frame in the first format having a coefficient segment and an excitation segment, the frame in the second format having a coefficient segment and an excitation segment, said apparatus comprising:
a) first processing means coupled to said input for receiving coefficient segments of compressed audio data frames in a first format and for emitting coefficient segments of compressed audio data frames in a second format at said output;
b) second processing means coupled to said input for generating an excitation segment of a compressed audio data frame of a second format from a compressed audio data frame of a first format.
In a preferred embodiment of the invention, a pair of transcoders is provided to effect conversion of the compressed audio signal from one format to a different format. Each transcoder has a pseudo-decoder that transforms the incoming compressed audio signal to a common format, which is then transmitted over the telephone company network to the second transcoder. A pseudo-encoder at the remote transcoder processes the common-format signal and converts it to a compressed audio signal having a different format from that of the original compressed audio signal sent to the first transcoder. For full-duplex operation, each transcoder has both a pseudo-decoder for generating the common-format signal and a pseudo-encoder for converting the common-format signal into a compressed audio signal.
The system is particularly advantageous when the telephone network contains a wide variety of different vocoders. To enable the exchange of speech signals from one vocoder to another, irrespective of whether they are the same or different, the compressed audio signal emitted by the local vocoder undergoes a transformation into a common format that can be processed by the distant pseudo-encoder. This common format can be defined as an intermediate form of the compressed audio signal, designed so that the important parameter information produced by the pseudo-decoder associated with the local vocoder is conveyed directly to the pseudo-encoder associated with the distant vocoder. Such parameter information includes the coefficient segment and the parameters describing the excitation segment of the transmitted speech signal. An important element of the common format is that it preserves the basic frame structure of the audio signal as encoded by any one of the vocoders in the network that may be interconnected during a given call. In particular, the common-format frame includes a coefficient segment and an excitation segment, as explained below. It is important to point out that the common-format structure is not intended to reduce the audio signal to PCM samples or an equivalent form. That would be undesirable because transforming the compressed signal into PCM and then transforming the PCM samples back into a compressed form causes a significant reduction in signal quality, which should be avoided as much as possible. The present invention has found that, by devising a common-format configuration that preserves the basic structure of the audio signal as encoded by the vocoder, these quality degradations are greatly reduced.
In this specification, the term "coefficient segment" is considered to be any set of coefficients that uniquely specifies a filter function mimicking the human vocal system, as well as any type of information format from which the coefficients can be extracted indirectly. In a typical vocoder, several different types of coefficients are known, including reflection coefficients, the arcsine of the reflection coefficients, line spectral pairs and log-area ratios. These different types of coefficients are usually related by mathematical transformations and have different properties that make them suitable for different applications. Thus, the term "coefficient segment" is intended to encompass any of these types of coefficients.
An "excitation segment" may be considered as the information that must be associated with a coefficient segment in order to provide a complete representation of the audio signal, as well as any type of information format from which the excitation can be extracted indirectly. The excitation segment complements the coefficient segment when synthesizing the signal to obtain a signal in uncompressed form, such as PCM samples. Such an excitation segment may comprise parametric information describing the periodicity of the speech signal, the excitation signal computed by the pseudo-decoder, speech frame control signals, the pitch period, the pitch lag, and the gain and relative gain values that ensure synchronization in the pseudo-encoder associated with the distant vocoder. The coefficient segment and the excitation segment may be represented in various ways when transmitting signals through the telephone company network. One possibility is to send the information explicitly, in other words a sequence of bits representing the values of the parameters to be communicated. Another possibility is to send a list of indices that does not convey the parameters of the common-format signal themselves, but simply points to entries in a database or codebook, allowing the pseudo-encoder to look up this database and extract the appropriate information according to the various indices received in order to reconstruct the common-format signal.
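A minimal sketch of such a common-format frame is given below. The field names and types are illustrative assumptions; the patent requires only that the frame preserve a coefficient segment and an excitation segment:

```python
from dataclasses import dataclass, field

# Hypothetical common-format frame: one coefficient segment plus the
# parametric excitation information described above.
@dataclass
class CommonFormatFrame:
    coefficients: list        # coefficient segment (e.g. line spectral pairs)
    pitch_lag: int            # excitation-segment parameter
    gains: list               # gain / relative-gain values
    excitation: list = field(default_factory=list)  # optional excitation signal

frame = CommonFormatFrame(coefficients=[0.1] * 10, pitch_lag=40, gains=[0.5, 0.8])
print(len(frame.coefficients), frame.pitch_lag)  # 10 40
```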
The expressions "first format", "second format" and "third format", when used to describe audio signals in compressed form, whether in the common format or in a given vocoder format, refer to signals that, although they share a common basic structure (in other words, each is divided into a coefficient segment and an excitation segment), are in general mutually incompatible. Thus, a vocoder capable of processing signals in a first format will, in general, not be able to process signals expressed in any format other than the first format.
In a preferred embodiment, the conversion of the compressed form of the audio signal to the common format is achieved in two steps. The first step is to process the coefficient segments of the compressed audio signal data frames to produce the coefficient segments of the common format. Generally, the transformation from one type of coefficient to another is achieved by well-known mathematical algorithms. Depending on the type of vocoder associated with the pseudo-decoder, this transformation may be accomplished simply by re-quantizing the coefficients from the compressed audio signal data frame to the new coefficient values making up the common-format data frame. Next, the excitation segment of the common-format data frame is obtained by processing the frame energy, gain values, lag values and codebook information (typically as part of the vocoder's decoding operation) and quantizing the resulting excitation signal prior to forming the common-format data frame.
The conversion from the common-format data frame to the compressed audio signal by the pseudo-encoder is carried out in a similar manner. The coefficient segments of the common-format data frame are processed first to produce the coefficient segments of the compressed audio signal data frame. The excitation segment of the compressed audio signal data frame is obtained by first synthesizing the speech signal from the common-format excitation segment through a filter whose coefficients are also taken from the common-format frame. Typically, this synthesized signal is then applied to the encoding portion of the vocoder.
Another possibility for obtaining the excitation segment of one format from the data frame of another format, without synthesizing the audio signal and re-analyzing it, is to recalculate the excitation segment directly from the data available in the excitation segment of the source data frame. The choice between this method and the method described above will depend on the intended application or the type of transformation required. In particular, compressed audio signals of certain formats can easily be transformed to the common format by recalculating the segments of each frame independently of each other. In other cases, however, it is more practical to obtain the excitation segment using an analysis-by-synthesis method.
As embodied and broadly described herein, the present invention also provides an apparatus for transmitting compressed audio information of a data frame, the apparatus comprising:
a) a first transcoder including a first input and a first output, said first transcoder being responsive to a frame of compressed audio data in a first format applied to said input to produce at said output a frame of compressed audio data in a second format, the frame in the first format having a coefficient segment and an excitation segment, the frame in the second format having a coefficient segment and an excitation segment;
b) a second transcoder including a second input and a second output, said second input coupled to said first output for receiving frames of compressed audio data in a second format, said second transcoder being responsive to frames of compressed audio data in the second format applied to said second input for generating frames of compressed audio data in a third format at said second output, the frames in the third format having coefficient segments and excitation segments.
As embodied and broadly described herein, the present invention provides a method of processing a data frame representing audio information in digitized and compressed form, the data frame comprising a coefficient segment and an excitation segment and being in a first format, said method comprising the steps of:
a) processing the coefficient segments of the first format data frame to generate coefficient segments of a second format data frame;
b) processing the first format data frame to generate an excitation section of a second format data frame;
c) combining the coefficient segments of the second-format data frame generated in step a with the excitation segments generated in step b to produce a second-format data frame representing the audio information contained in the first-format data frame.
As embodied and broadly described herein, the present invention provides a method of transmitting a data frame representing audio information in digitized and compressed form, the data frame including a coefficient segment and an excitation segment, the data frame being in a first format, the method comprising the steps of:
a) processing the data frame in the first format at the first location to generate a data frame in a second format, the data frame in the second format including a coefficient segment and an excitation segment;
b) transmitting the data frames in the second format to a second location remote from the first location;
c) and processing the data frame of the second format at the second location to generate a data frame of a third format, wherein the data frame of the third format comprises a coefficient section and an excitation section.
As embodied and broadly described herein, the present invention provides a method of transmitting an audio signal between incompatible vocoders, the method comprising the steps of:
a) receiving a data frame of a first format from a first vocoder, the data frame comprising a coefficient segment and an excitation segment;
b) transforming the data frame of the first format into a data frame of an intermediate format, comprising the sub-steps of:
i) processing the coefficient segments of the first format data frame to generate coefficient segments of the intermediate format data frame;
ii) processing the first format data frame to generate an excitation segment of the intermediate format data frame;
iii) combining the coefficient segments of the intermediate format data frame with the excitation segments of the intermediate format data frame to produce an intermediate format data frame representing the audio information contained in the first format data frame;
c) transforming the intermediate format data frame into a third format data frame, comprising the sub-steps of:
i) processing the coefficient segments of the intermediate format data frames to generate coefficient segments of a third format data frame;
ii) processing the data frame of the intermediate format to generate an excitation segment of the data frame of the third format;
iii) combining the coefficient segments of the third format data frame with the excitation segments of the third format data frame to produce a third format data frame representing the audio information contained in the first format and intermediate format data frames.
d) sending the third format data frame to a second vocoder.
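The two-stage transformation of steps (b) and (c) above can be sketched as follows. The conversion helpers here are identity placeholders standing in for the real re-quantization and excitation recalculation, and all names are illustrative:

```python
# Sketch of the first-format -> intermediate -> third-format pipeline.
def convert_coeffs(coeffs):
    # Placeholder for coefficient re-quantization / transformation.
    return list(coeffs)

def derive_excitation(frame):
    # Placeholder for excitation recalculation or analysis-by-synthesis.
    return list(frame["excitation"])

def to_intermediate(first_fmt_frame):
    return {"coeffs": convert_coeffs(first_fmt_frame["coeffs"]),  # sub-step i
            "excitation": derive_excitation(first_fmt_frame)}     # sub-steps ii-iii

def to_third_format(intermediate_frame):
    return {"coeffs": convert_coeffs(intermediate_frame["coeffs"]),
            "excitation": derive_excitation(intermediate_frame)}

frame_a = {"coeffs": [0.2, 0.4], "excitation": [1.0, -1.0]}
frame_c = to_third_format(to_intermediate(frame_a))
print(frame_c["coeffs"])  # [0.2, 0.4]
```

Note that, as the specification stresses, neither stage passes through PCM samples; both frames retain the coefficient-segment/excitation-segment structure throughout.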
As embodied and broadly described herein, the present invention provides a machine-readable storage medium containing program portions for instructing a computer to process audio signals, said computer including an input and an output, said program portions causing said computer to respond to frames of compressed audio data of a first format applied to said input, and to generate frames of compressed audio data of a second format at said output, the frames of the first format having a coefficient segment and an excitation segment, the frames of the second format having a coefficient segment and an excitation segment, said program portions embodied in functional blocks of said computer including:
a) first processing means coupled to said input for receiving coefficient segments of frames of compressed audio data in a first format and for issuing coefficient segments of frames of compressed audio data in a second format at said output;
b) second processing means coupled to the input for generating an excitation segment of a data frame of the second format compressed audio data from a data frame of the first format compressed audio data.
As embodied and broadly described herein, the present invention further provides an interface node between vocoders for converting a frame of a first format compressed audio signal into a frame of a second format compressed audio signal, the first format frame having a coefficient section and an excitation section, the second format frame having a coefficient section and an excitation section, the node comprising:
a) a first transcoder including a first input and a first output, said first transcoder being responsive to frames of first format compressed audio data applied to said input to produce frames of intermediate format compressed audio data at said output, the intermediate format frames having a coefficient segment and an excitation segment;
b) a second transcoder including a second input and a second output, said second input coupled to said first output for receiving frames of intermediate format compressed audio data, said second transcoder generating frames of second format compressed audio data at said second output in response to frames of intermediate format compressed audio data being applied to said second input.
Brief Description of Drawings
FIG. 1 is a block diagram of a CELP vocoder encoding stage;
FIG. 2 is a block diagram of a CELP vocoder decoding stage;
FIG. 3a is a schematic representation of a communication line between a wireless mobile terminal and a fixed (wired) terminal;
FIG. 3b is a schematic representation of a communication line between two wireless mobile terminals with an embodiment of the present invention including two transcoders;
FIG. 3c is a schematic representation of a communication line between two wireless mobile terminals with an embodiment of the present invention including a cross-transcoding node;
FIG. 4 is a block diagram of a system constructed in accordance with the present invention for converting a compressed speech signal from one format to another format via a common format without requiring decompression of the signal to a PCM type of digitization technique;
FIG. 5 is a more detailed block diagram of the system depicted in FIG. 4;
FIG. 6 is a block diagram of a cross-transcoding node, which constitutes a variation of the system depicted in FIG. 5;
FIG. 7a shows a data frame in IS54 format;
FIG. 7b shows a data frame of a common format produced by the transcoder of FIG. 5 or the transcoder of FIG. 6;
FIG. 7c shows a data frame in IS641 format;
FIG. 8 is a flowchart of the operation of transforming a compressed speech data frame in IS54 format into the common format;
FIG. 9 is a flowchart of the operation of transforming a common format data frame into a compressed speech frame in IS641 format;
FIG. 10 is a block diagram of an apparatus that implements the functionality of a pseudo-encoder of the type shown in FIG. 5;
FIG. 11 is a functional block diagram of the device shown in FIG. 10; and
FIG. 12 is a functional block diagram of a variation of the apparatus shown in FIG. 10.
Description of the Preferred Embodiments
The following is a description of Linear Predictive Coding (LPC) vocoder technology currently employed in wireless telecommunications, one application of particular interest being the wireless transmission of signals between mobile terminals and fixed base stations. Another application is voice transmission over interconnected communication networks, where different vocoders may be used in separate components of the wireless network.
In communication applications where channel bandwidth is a premium, it is necessary to utilize as small a portion of the transmission channel as possible. One common solution is to quantize and compress the voice signal emitted by the user prior to transmission.
Typically, the voice signal is first digitized by one of many quantization techniques. Examples of these techniques are Pulse Amplitude Modulation (PAM), Pulse Code Modulation (PCM) and delta modulation, PCM being perhaps the most common. Basically, in PCM, samples of an analog signal are taken at a particular rate (typically 8KHz) and quantized into discrete values for representation in a digital format.
For optimal use of the transmission channel, a codec comprising encoding and decoding stages is used to compress and decompress the digital signal at the source and sink points, respectively. Codecs that are specific to speech signals are called vocoders (for voice coders). By encoding only the necessary characteristics of the speech signal, fewer bits are required for transmission than would be needed to reproduce the original waveform, and this can be done without significantly degrading speech quality; a lower transmission bit rate is thereby achievable.
Currently, the lowest bit rate vocoders are the Linear Predictive Coding (LPC) family, which extracts the appropriate speech features from the time domain waveform. The vocoder has two main components: an encoder and a decoder, the encoder section processing the digitized speech signal to compress it, and the decoder section expanding the compressed speech into a digitized audio signal.
The LPC-type vocoder estimates the current sample (s_n) using a weighted sum of the past P speech samples (s_{n-k}). The number P determines the order of the model; the higher the order, the better the speech quality, with a typical model order in the range of 10 to 15. The equation for the speech sample can be written as follows:

s_n = a_1·s_{n-1} + a_2·s_{n-2} + ... + a_P·s_{n-P} + e_n

where a_k is the coefficient determining the contribution of the past sample s_{n-k}, and e_n is the error signal of the current sample. Using s_n and e_n to define a prediction filter, we obtain:

A(z) = 1 - a_1·z^{-1} - a_2·z^{-2} - ... - a_P·z^{-P}

The corresponding synthesis filter 1/A(z) has only poles and is therefore referred to as an all-pole filter.
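A worked example of the prediction equation above, using arbitrary illustrative coefficients (P = 3), may make the weighted sum concrete:

```python
# LPC prediction: estimate the current sample as a weighted sum of the
# past P samples; e_n is the difference between the actual and predicted
# sample. All numeric values here are arbitrary illustrations.
def lpc_predict(history, coeffs):
    """Return sum(a_k * s_{n-k}) for k = 1..P, where history holds the
    past samples in time order (oldest first)."""
    return sum(a * s for a, s in zip(coeffs, reversed(history)))

past = [0.5, 0.25, 0.125]   # s_{n-3}, s_{n-2}, s_{n-1}
a = [0.9, -0.2, 0.05]       # a_1, a_2, a_3
predicted = lpc_predict(past, a)
actual = 0.1
error = actual - predicted  # e_n, the residual the vocoder must convey
print(round(predicted, 4))  # 0.0875
```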
Fig. 1 is a block diagram of the encoding portion of the generic CELP vocoder model. It can be seen from this figure that the input to the encoder section's pitch domain analysis block 100 consists of PCM samples. The output consists of an LPC filter coefficient segment and an excitation segment comprising several parameters that represent the prediction error signal (also called the residual). This output is forwarded to the remote communication channel.
The number of LPC filter coefficients in the coefficient segment is determined by the order P of the model. Examples of excitation segment parameters are: the nature of the excitation (voiced or unvoiced), the pitch period (for the case of voiced excitation), the gain factor, the energy, the pitch prediction gain, etc. Code Excited Linear Prediction (CELP) vocoders are the most common type of vocoder currently used in telephony. Instead of sending the excitation parameters themselves, the CELP vocoder sends index information identifying a set of vectors in an adaptive random codebook; that is, for each speech signal, the encoder searches its codebook for the set of vectors that, when used as the excitation for the LPC synthesis filter, best matches the perceived sound.
The speech frames containing this information are recalculated every T seconds. The usual value of T is 20 ms; a 20 ms compressed speech frame represents 160 PCM samples taken at an 8 kHz rate.
Fig. 2 is a block diagram of the decoding portion of the generic CELP vocoder model. Compressed speech frames are received from the telecommunication channel 210 and passed to the LPC synthesis filter 220. The LPC synthesis filter 220 utilizes the LPC filter coefficient segment and the excitation segment to produce an output speech signal, typically in the form of PCM samples.
A technique called interpolation is used to enhance the vocoder. It subdivides the 20 ms speech frame into 5 ms subframes and interpolates their predictor coefficients, a technique useful for avoiding undesirable "pop" or "click" noises in the resulting speech signal, which are typically the result of rapid changes in the predictor coefficients from one signal frame to the next. Specifically, each signal frame is divided into four subframes, labeled subframe (1), subframe (2), subframe (3) and subframe (4) for reference. The predictor coefficients used to generate the speech signal on the first subframe, subframe (1), are a combination of the predictor coefficients of the previous frame and the current frame in a ratio of 75%/25%; for subframe (2) the ratio changes to 50%/50%; for subframe (3) it becomes 25%/75%; and for the last subframe, subframe (4), it is 0%/100%, in other words only the coefficients of the current frame are used.
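The interpolation schedule just described can be sketched as follows (a minimal illustration of the blending ratios, not a production interpolator):

```python
# Subframe interpolation of predictor coefficients: each subframe blends
# the previous frame's coefficients with the current frame's in the
# ratios 75/25, 50/50, 25/75 and 0/100.
WEIGHTS = [(0.75, 0.25), (0.50, 0.50), (0.25, 0.75), (0.00, 1.00)]

def interpolate(prev_coeffs, curr_coeffs, subframe):
    """Blend coefficient vectors for subframe 1..4."""
    w_prev, w_curr = WEIGHTS[subframe - 1]
    return [w_prev * p + w_curr * c for p, c in zip(prev_coeffs, curr_coeffs)]

prev, curr = [0.0, 0.0], [1.0, 2.0]
print(interpolate(prev, curr, 2))  # [0.5, 1.0]
print(interpolate(prev, curr, 4))  # [1.0, 2.0]  (current frame only)
```

In practice such interpolation is often performed on a representation like line spectral pairs rather than on the direct predictor coefficients, since blended direct-form coefficients are not guaranteed to yield a stable filter; the sketch above only illustrates the blending ratios.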
Figs. 3a, 3b and 3c are schematics depicting telephone communications that include wireless communication links and embody CELP vocoder technology.
Fig. 3a is a schematic representation of the communication link between a wireless mobile terminal 300 and a fixed (wired) terminal 330. The speech is compressed (encoded) by a vocoder located in the mobile terminal 300 and transmitted over a wireless communication line (RF channel) to the base station 310, where it is decoded into PCM samples by the decoder of a second vocoder. The signal is then directed through various switches in the digital trunks of the telecommunications company network 315 to the central office 320 to which the fixed terminal 330 is physically connected, where the digital signal is converted to analog format and transmitted to the terminal 330. In this scenario, the voice is compressed and decompressed only once.
Fig. 3b is a schematic representation of the communication line between two wireless mobile terminals 340 and 380 with an embodiment of the invention comprising two transcoders. Speech is compressed (encoded) by a vocoder located in mobile terminal A 340 and transmitted over the wireless communication line (RF channel A) to base station A 350, where it is decoded into PCM samples by the decoder of a second vocoder. The PCM samples are then transmitted through the telecommunications company network 360 to the second mobile terminal's base station B 370, where they are compressed (encoded) a second time by the base station's vocoder. The compressed signal is transmitted over the wireless communication line (RF channel B) to the mobile terminal 380, where it is decoded a second time by the vocoder of the second mobile terminal. Audible speech is then available at mobile terminal 380. Fig. 3b also shows an embodiment of the present invention that includes two transcoders 392 and 394, which are described in more detail below.
Fig. 3c is a schematic representation of a communication line between two wireless mobile terminals with an embodiment of the present invention including a cross-transcoding node 390, which will be described in detail below.
This vocoder arrangement is an example of what is known as tandem vocoding. Other examples of tandem vocoders arise where a wireless mobile terminal communicates with a fixed wireless terminal, or where any type of wireless terminal retrieves messages from a central voice-mail system that uses a vocoder to compress speech before the data is stored. In such cases, the speech passes through vocoder compression and decompression algorithms more than once, and the quality of the speech is typically degraded when vocoders are tandemed in this manner.
To compensate for the degradation of the speech signal caused by the concatenation of low-bit-rate codecs (vocoders), a method called bypass was developed to eliminate the double decoding/encoding performed by the vocoders in base stations 350 and 370. The basic idea behind this approach is that base station A 350, knowing through signaling and control that the vocoder in mobile terminal B 380 is the same as the vocoder in mobile terminal A 340, bypasses its vocoder, allowing the signal data frames to pass directly through the digital trunk 360 without conversion. Likewise, base station 370, knowing that it has received a compressed speech data frame, simply sends the signal on to mobile terminal B 380 without any encoding operation. This bypass method is fully described in the international application referred to in the earlier part of this specification.
However, this solution is only effective when the two vocoders are identical. With the rapid expansion of networks, the variety of deployed vocoders also increases rapidly; the bypass solution is therefore useful for only a small fraction of connections involving tandem vocoding operations.
The present invention provides a method and system for reducing the signal degradation that occurs when vocoders are connected in tandem during a call. The system is characterized by mechanisms and protocols for converting compressed voice data frames to an intermediate common representation, applicable both between two mobile terminals and between a mobile terminal and a fixed wireless terminal.
Fig. 4 shows a block diagram of a system constructed in accordance with the present invention for converting a compressed speech signal from one format to another format through a common format, without the need to decompress the signal to a PCM-type digitized representation.
A particular embodiment of the system is depicted in fig. 5, a block diagram showing a modular cross-transcoding system 510 having two transcoders with identical functional blocks for implementing the method according to the invention. The transcoders are separate devices installed at the ends of a communication path to provide signal transformation functions, which may differ depending on the communication standard in use by the network. In a typical application each transcoder may be associated with a base station of the network, so that a signal sent by one transcoder is transmitted over the telephone network to a second transcoder to be processed. Since, as will be described in detail later, both transcoders have identical functional blocks, only one transcoder is described herein for simplicity; the description applies equally to the other unit.
Transcoder 510 includes a signal and control block 520, an encoding block 530 and a decoding block 540. The primary function of the signal and control block 520 is to determine whether:
a) this connection is connected to an identical LPC-type vocoder,
b) this connection is connected to a different LPC-type vocoder,
c) this connection is made to an entity not included in a) or b) above (i.e., another family of vocoder, a new LPC vocoder, a wireless terminal, etc.)
The decode block 540 includes a decoder 542, a dummy decoder 544 and a bypass portion 546, and under control of the signal and control block 520, the decode block 540 will perform one of the following tasks:
a) when the connection is to an identical LPC-type vocoder, the compressed voice signal from mobile terminal A is sent through bypass portion 546, which will pass the compressed voice data, possibly after reformatting, for delivery to bypass portion 586 of transcoder 550 towards mobile terminal B;
b) when the connection is to a different LPC-type vocoder available to the transcoding module, pseudo-decoder 544 is applied to convert the compressed speech data from mobile terminal A to a common format signal for transmission to the pseudo-encoder 584 of transcoder 550; or
c) When the connection is made to an entity not comprised by a) or b) above (i.e., another family type of vocoder, a new type of LPC vocoder, a wireless terminal, etc.), a voice decoder 542 is employed to convert the compressed voice data from mobile terminal a into PCM samples for transmission to the encoder 582 or central office 590 of transcoder 550.
The encoding block 530 includes an encoder 532, a dummy encoder 534 and a bypass portion 536. Under control of the signal and control block 520, the encoding block 530 will perform one of the following tasks:
a) when the connection originates from an identical LPC-type vocoder, the speech signal received from the bypass portion 576 of the transcoder 550 is sent to the bypass portion 536, which will pass the compressed speech data, possibly after reformatting, for delivery to the mobile terminal A to which the transcoder 510 is connected;
b) when there is a different LPC-type vocoder available at the connection source to the transcoding module, the pseudo-encoder 534 is caused to convert the common format signal received from the pseudo-decoder portion 574 of the transcoder 550 into compressed speech data and to forward this signal to the mobile terminal a;
c) when the connection is to an entity not included in a) or b) above (i.e., another family of vocoder, a new LPC vocoder, a wireless terminal, etc.), voice encoder 532 is applied to convert the PCM format samples received from the decoder 572 of the transcoder 550 or the central office 590 into compressed voice data and forward the compressed voice data to mobile terminal A.
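The three-way decision performed by the decoding and encoding blocks can be sketched as a simple dispatch routine. The names, placeholder transforms and return values below are purely illustrative and are not taken from the patent:

```python
from enum import Enum, auto

class ConnectionType(Enum):
    SAME_LPC = auto()       # case a): identical LPC-type vocoder
    DIFFERENT_LPC = auto()  # case b): different LPC-type vocoder
    OTHER = auto()          # case c): other entity (another family, wireless terminal, ...)

def pseudo_decode(frame):
    # Placeholder for the pseudo-decoder (compressed -> common format).
    return ("common", frame)

def decode_to_pcm(frame):
    # Placeholder for the voice decoder (compressed -> PCM samples).
    return ("pcm", frame)

def route_frame(frame, connection):
    """Dispatch an incoming compressed frame the way the decoding block does
    under control of the signal and control block."""
    if connection is ConnectionType.SAME_LPC:
        return ("bypass", frame)      # bypass portion: pass through, possibly reformatted
    if connection is ConnectionType.DIFFERENT_LPC:
        return pseudo_decode(frame)   # common-format path to the partner pseudo-encoder
    return decode_to_pcm(frame)       # full decode to PCM for the encoder or central office
```

The encoding block performs the mirror-image dispatch in the opposite direction.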
Signaling and control block 520 in transcoder 510 is designed to send messages to transcoder 550 and also to receive messages from transcoder 550, enabling the operation of the transcoder to be appropriately adjusted in accordance with the data received from transcoder 550 or sent to transcoder 550. Communication between two transcoders is effected via a communication channel established between them.
During PCM transmission, a bit stealing method is used. In this approach, certain bits of certain speech samples are used to convey signaling information. The positions of the signaling bits and the bit stealing rate are selected to reduce the perceived effect of the bit substitution, so that the audible signal at either mobile terminal is not significantly affected. The receiving vocoder knows the positions of the signaling bits in the speech samples and is therefore able to decode the message.
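A minimal sketch of such an embedding might substitute the least significant bit of selected samples. The LSB choice and the helper names are assumptions for illustration; the patent only requires that the positions be chosen to minimize audible impact:

```python
def steal_bits(samples, message_bits, positions):
    """Embed signaling bits by substituting the least significant bit
    of the PCM samples at the agreed positions."""
    out = list(samples)
    for pos, bit in zip(positions, message_bits):
        out[pos] = (out[pos] & ~1) | (bit & 1)
    return out

def recover_bits(samples, positions):
    """The receiving transcoder, knowing the positions, re-reads the bits."""
    return [samples[pos] & 1 for pos in positions]
```

Because only the LSB changes, each stolen sample differs from the original by at most one quantization step.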
The handshaking step between transcoders 510 and 550 includes exchanging different messages to enable one transcoder to identify the partner transcoder, so that each unit can be set to the mode that allows the best possible voice quality to be produced. The handshaking step includes exchanging the following messages:
a) the transmitter of signal and control block 520 embeds an identifier into the PCM speech signal emitted by transcoder 510. This identifier enables any remote transcoder to accurately determine the type of vocoder connected to the originating transcoder, i.e., transcoder 510, through a database search operation, as will be described below.
b) Signal and control block 560 examines the data frames received by transcoder 550 and extracts in-band signaling information by observing bit values at predefined locations in the data frames. If the message is a transcoder identifier, a database (not shown) is consulted to determine the type of vocoder connected to the transcoder that sent the message. Depending on the content of the message, the following possibilities arise.
1) The default mode of encoding blocks 530 and 580 and decoding blocks 540 and 570 is such that encoders 532 and 582 and decoders 542 and 572 are active, while the remaining functional blocks, namely dummy encoders 534 and 584, dummy decoders 544 and 574, and bypass portions 536, 546, 576 and 586, are inactive. This means that if transcoder 510 (or 550) does not recognize the presence of a partner transcoder in the network, it will act as a normal vocoder, i.e., it will convert the compressed speech data received from mobile terminal A into PCM samples for input to the transmission network. Likewise, the transcoder will expect to receive PCM samples from the transmission network and transform these samples into a compression format compatible with the vocoder of the mobile terminal served by the transcoder;
2) if the signal and control block 520 has recognized that there is a remote transcoder, the transcoder identifier is checked in the local database to determine the type of transcoder that sent the message, as follows:
i) the transcoders are the same; in other words, the vocoder connected to the distant transcoder operates according to the same frame format or standard as the vocoder connected to transcoder 510. Signal and control block 520 then causes the decoding block to enable bypass portion 546 and to disable decoder 542 and dummy decoder 544. Thus, any compressed voice data received from the distant transcoder will be directed to mobile terminal A without decoding operations. This mode of operation allows the best possible voice quality to be achieved because no concatenation of vocoders occurs. Signal and control block 520 will also switch encoding block 530 to a state in which bypass portion 536 is active and encoder 532 and dummy encoder 534 are inactive. Therefore, the compressed voice data received from mobile terminal A will pass through transcoder 510 without any encoding operation. It should be noted that switching encoding block 530 to the bypass mode is based on the assumption that signal and control block 560 of the remote transcoder 550 has received the identifier of transcoder 510 and has also set decoding block 570 and encoding block 580 to the bypass mode. In this case, a full duplex connection is established between the transcoders, which exchange compressed speech signals;
ii) the transcoders are different, i.e. the distant transcoder indicates that the vocoder with which mobile terminal B is associated is of a different LPC type. Signal and control block 520 then sets decoding block 540 to activate pseudo-decoder 544 and to disable decoder 542 and bypass portion 546. In this mode of operation, signaling and control block 520 expects to receive speech signals encoded in the common format, which pseudo-decoder 544 will convert to the format of the vocoder associated with mobile terminal A. Signaling and control block 520 also switches encoding block 530 to a mode in which pseudo-encoder 534 is active and encoder 532 and bypass portion 536 are inactive. Thus, the data sent by transcoder 510 is in the common format, and dummy encoder 584 will encode the data in the format of the vocoder associated with mobile terminal B.
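The outcome of the handshake, covering the default mode and cases i) and ii) above, can be summarized as a small selection function. The mapping of identifiers to vocoder types stands in for the local database of the text; all names here are illustrative:

```python
def select_mode(remote_id, local_vocoder, id_database):
    """Mode selection performed after the handshake. id_database maps a
    received transcoder identifier to the vocoder type behind it."""
    if remote_id is None:
        return "default"          # no partner transcoder recognized: act as a normal vocoder
    remote_vocoder = id_database.get(remote_id)
    if remote_vocoder == local_vocoder:
        return "bypass"           # case i): same frame format, no transcoding at all
    if remote_vocoder is not None:
        return "common-format"    # case ii): different LPC type, pseudo-decoder/encoder path
    return "default"              # unknown identifier: fall back to the PCM tandem
```

Each transcoder runs this selection independently; the full duplex bypass or common-format connection is established once both sides have recognized each other.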
A cross-transcoding node as shown in fig. 6 is another embodiment of the present invention. Note that only half of the complete cross-transcoding node is shown for clarity; the other half is identical and provides communication capability in the opposite direction. Cross-transcoding node 600 acts as a centralized interface between different speech codecs. In general, transcoding node 600 can be viewed as two pairs of physically interconnected transcoders; however, instead of the separate signaling and control blocks used for each transcoder in the previous embodiment, a single signaling and control stage 610 is used. The transcoding node 600 also includes a decoding block 620, an encoding block 630 and a switch 640.
The primary function of the signal and control block 610 is to communicate (or attempt to communicate) with an entity on the other end of the communication line to determine whether:
a) the connection is to an identical LPC-type vocoder;
b) the connection is to a different LPC-type vocoder available to a transcoding module;
c) the connection is to an entity not included in a) or b) above (i.e., another family of vocoder, a new LPC vocoder, a wireless terminal, etc.);
The timing and synchronization information is used to control the decoding block 620 and the encoding block 630. The control information is used to select the correct position for switch 640 to route the correct signal.
The decoding block 620 includes a decoder 622, a dummy decoder 624 and a bypass portion 626. The encoding block 630 includes a bypass portion 632, a dummy encoder 634 and an encoder 636.
When two vocoders are connected to each other, the cross-transcoding node will function as described below. Under control of the signal and control block 610, the decode block 620 will perform one of the following tasks:
a) when the connection is to an identical LPC-type vocoder, the compressed speech signal is sent to bypass portion 626 and then through bypass portion 632; the speech data will be passed, possibly after reformatting, for transmission to the identical LPC-type vocoder;
b) when the connection is connected to different LPC-type vocoders available to a transcoding module, the compressed speech data is converted to a common format signal using the pseudo decoder 624, then the signal is transferred to the pseudo encoder 634, the common format is converted back to a compressed signal, and finally the compressed speech signal is transferred to different LPC-type vocoders; or
c) When connected to an entity not included in a) or b) above (i.e., another family type of vocoder, a new LPC vocoder, a wireless terminal, etc.), a voice decoder 622 is used to convert the compressed voice data into PCM samples, which are then passed to an encoder 636 for conversion back into a compressed voice signal, which is then transmitted to the end entity.
When connected to a wireless terminal, the cross-transcoding node functions as described below. When a PCM signal is being input, it is passed to switch 640; signal and control block 610 sets the switch to pass the signal to encoder 636, where it is converted to compressed speech. Finally, the compressed speech is sent to an external vocoder. When the wireless terminal is on the receiving end of the communication line and compressed speech is being input, the signal is passed to decoder 622, where it is converted to PCM format, and the signal and control block sets the switch to hand the signal over to the wireless terminal.
The following description provides a specific example of how the pseudo-encoder and pseudo-decoder units perform the transformation from a compressed signal to a common format signal and vice versa, i.e., from the common format back to a compressed signal. In particular, consider the case where speech signals are transformed when sent from mobile terminal MT A 340 to MT B 380. In this example, MT A uses the Vector Sum Excited Linear Prediction (VSELP) vocoder of the IS54 wireless telephony communication standard; fig. 7a depicts the frame format of IS54. The signals are transformed to the common format shown in fig. 7b. The receiving station MT B uses the Enhanced Full Rate Codec (EFRC) of the IS641 standard; fig. 7c shows the frame format of IS641.
Referring to figs. 3b and 5, for the conversion in this example, the voice signal is compressed (encoded) by the VSELP vocoder located in MT A 340 according to the IS54 standard and transmitted to base station A 350 over a wireless communication link (RF channel A), where it is transformed into the common format by dummy decoder 544 of transcoder 510 (shown in fig. 5). The common format data frames are then sent through the telecommunications company network 360 to transcoder 550, where they are transformed by pseudo-encoder 584 into compressed speech of the IS641 standard. The compressed signal is sent over a wireless communication link (RF channel B) to MT B 380, where it is decoded by the EFRC vocoder of the second MT. Audible speech is then available on MT B 380.
The pseudo-decoder 544 receives the voice data frames of the IS54 format shown in fig. 7a and transforms them as described below and as shown in the flow chart of fig. 8. The pseudo-decoder 544 recalculates, using its own quantizer, the 10-dimensional vector representing the LPC reflection coefficients for the 20 ms frame of data. Four sets of interpolated LPC coefficient vectors are then determined for the 4 sub-frames using the 10-dimensional vector; the interpolation method is the same as previously described. This portion of the common format data frame is stored by the pseudo-decoder 544 for later use. The pseudo-decoder 544 then reads the 4 lag values (pitch lags) from the compressed format and stores them for future insertion into the common format. The pseudo-decoder 544 then uses the code book information, gain factors and pitch lags for the 4 sub-frames, together with the frame energy, to create a synthesized excitation signal (4 by 40 samples) for the common format. Finally, a common format data frame is formed by combining the excitation signal with the stored LPC filter coefficients and pitch lags. This data frame is sent to the pseudo-encoder 584 of the next base station. Note that in fig. 7b, information bits have been reserved in the common format frame for energy and pitch prediction gain information; this information is not calculated in this particular example.
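The per-sub-frame interpolation step can be sketched as a weighted blend between the previous and current frame coefficient vectors. The weight schedule below is an assumption for illustration; IS54 defines its own interpolation rule:

```python
def interpolate_subframe_lpc(prev_frame, curr_frame):
    """Derive one reflection-coefficient vector per sub-frame by linear
    interpolation between the previous and current 20 ms frame vectors."""
    weights = [0.25, 0.5, 0.75, 1.0]  # assumed weight per 5 ms sub-frame
    return [
        [(1.0 - w) * p + w * c for p, c in zip(prev_frame, curr_frame)]
        for w in weights
    ]
```

The last sub-frame uses the current frame's vector unchanged, so successive frames join smoothly.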
As shown in fig. 9, the dummy encoder 584 receives the common format voice data frame and must now convert it to the IS641 compressed voice format so that the EFRC in MT B can decode it correctly. The dummy encoder 584 reads the LPC coefficients for the 4 sub-frames, discards the coefficients for the first three sub-frames, and retains only the coefficients for the fourth sub-frame. Note that this is the LPC reflection coefficient vector computed for the entire frame; the first three vectors are not needed in this particular transformation because the EFRC vocoder in MT B will interpolate the first three sub-frame vectors according to the IS641 interpolation scheme. However, all four vectors may be used where the transformation involves other types of vocoders. Next, the dummy encoder 584 re-quantizes the fourth sub-frame LPC reflection coefficients using its own quantizer. Before the pseudo-encoder submits the 10 LPC reflection coefficients to its quantizer, they must first be transformed into LP (linear prediction) coefficients, then into Line Spectral Pair (LSP) coefficients, and finally into line spectral frequencies (the LSF vector). The LSF vector is then quantized and transformed into a quantized LSP vector. This quantized LSP vector is part of the IS641 format and is stored as such. The pseudo-encoder 584 then transforms the quantized LSP vector into quantized LP coefficients and interpolates the LP coefficients for the first three sub-frames. This set of LP coefficient vectors is used in the next step.
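The first conversion in the chain above, reflection coefficients to direct-form LP coefficients, is the classical step-up recursion. A minimal sketch follows; the plus-sign convention is an assumption, as codecs differ on signs:

```python
def reflection_to_lp(k):
    """Step-up recursion converting reflection coefficients k[0..m-1]
    into direct-form LP coefficients a[1..m]."""
    a = []
    for i, ki in enumerate(k):
        # a_j(i) = a_j(i-1) + k_i * a_{i-j}(i-1) for j < i, then a_i(i) = k_i
        a = [a[j] + ki * a[i - 1 - j] for j in range(i)] + [ki]
    return a
```

The subsequent LP-to-LSP and LSP-to-LSF conversions are standard but lengthier root-finding steps and are omitted here.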
The pseudo-encoder 584 reconstructs the speech signal by passing each of the four 40-sample sub-frames of the common format excitation signal through a synthesis filter that uses the quantized and interpolated LP coefficients as tap coefficients. The pseudo-encoder 584 then calculates (in the same manner as a conventional EFRC encoder) the pitch lag, gain, and excitation values (the algebraic codes used in the MT B code book) from the speech signal, using the previously calculated 10 LSP coefficients. Finally, an IS641 compressed speech format frame is composed using the quantized pitch lag, gain and excitation values and the stored LSP vector. This speech data frame is sent to the EFRC decoder in MT B, where it is converted to a speech signal as usual.
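The synthesis filtering step that reconstructs the speech from the excitation can be sketched as a direct-form all-pole filter. The minus-sign convention is assumed here; some formulations use a plus:

```python
def synthesis_filter(excitation, lp, memory=None):
    """All-pole synthesis: s[n] = e[n] - sum_k lp[k] * s[n-1-k]."""
    mem = list(memory) if memory is not None else [0.0] * len(lp)
    out = []
    for e in excitation:
        s = e - sum(a * m for a, m in zip(lp, mem))
        out.append(s)
        mem = [s] + mem[:-1]   # shift the filter memory by one sample
    return out
```

In the transcoder, one such filter pass per 40-sample sub-frame produces the reconstructed signal that the analysis stage then re-encodes.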
Note that the pitch lag information from the common format is not used in this example; instead, the pitch lag is calculated from the generated speech signal using known algorithms. In other transformations, however, the pitch lag information from the common format can be used directly.
In general, the pseudo-decoder 544 transforms the input compressed speech signal into a common format having a coefficient portion and an excitation portion. The common format is then used by the pseudo-encoder to reconstruct the compressed speech, but in a format different from the compressed speech format input to the pseudo-decoder 544. Specifically, the pseudo-encoder 584 generates the coefficients of the compressed speech signal it outputs from the coefficient portion of the common format signal. On the basis of the common format signal, the speech signal is reconstructed and used to extract any excitation or other information that, together with the coefficients calculated for the compressed speech signal, represents the speech information.
It is noted that the pseudo-encoder and pseudo-decoder of transcoder 510 are designed according to the type of vocoder with which they will interact. What they have in common is that each pseudo-decoder will accept a compressed speech signal and emit a common format signal, which in turn will be converted by a pseudo-encoder to another compressed speech signal format. This feature makes the system very flexible, especially when new vocoders are introduced: it is sufficient to design a dummy encoder and a dummy decoder that translate between the new vocoder signal format and the common format. There is no need to alter existing transcoders in any way, since the common format used by the system remains the same.
From a structural point of view, the apparatus shown in FIG. 10 can be used to implement the functionality of the pseudo encoder 584, the operation of which has been described in detail above in connection with FIG. 9. The apparatus includes an input signal line 910, a signal output line 912, a processor 914 and a memory 916. Memory 916 is used to store instructions for operating processor 914 and also to store data used by processor 914 in executing such instructions. The bus 918 is used to exchange information between the memory 916 and the processor 914.
The instructions stored in the memory 916 enable the device to operate in accordance with the functional block diagram shown in fig. 11. The apparatus comprises a coefficient segment transformer that transforms the coefficient segments from the common format frame into the coefficient segments of the compressed audio signal frame by known mathematical processes, as described in connection with fig. 9; in this example, the transformation is to the IS641 frame format. The apparatus also includes a synthesis filter, which receives the quantized LPC coefficients for the four sub-frames from the coefficient segment transformer and uses them as tap coefficients. The synthesis filter also receives an excitation signal from the excitation segment of the common format frame and composes an audio signal. This signal is then input to an analysis-by-synthesis process that generates the excitation segment of the IS641 frame format using the quantized LSP vector output by the coefficient segment transformer.
Fig. 12 shows a block diagram of the dummy decoder 544 shown in fig. 5. The apparatus includes two main functional blocks. The first is a coefficient segment transformer, which receives coefficient segments from a data frame in IS54 format and transforms them into coefficient segments of a data frame in the common format. The second is an excitation segment transformer, which transforms the excitation segment of the IS54 format data frame into the excitation segment of the common format data frame. All segments of the compressed audio signal data frame are processed in this way to form the common format data frame.
When designing a transcoder for a particular application, the pseudo-encoder and pseudo-decoder may be constructed using one of the devices shown in figs. 11 and 12. Which of the two structures to choose depends on the particular format transformation to be implemented. When the formats of the compressed audio signal (the source data frame and the destination data frame) are such that the coefficient segments and excitation segments of the source data frame can be processed independently to effect the transformation to the destination data frame, the apparatus shown in fig. 12 may be most suitable. On the other hand, when it is appropriate to reconstruct the audio signal, the apparatus shown in fig. 11 should be employed.
As regards the encoder, decoder and bypass stages that make up each transcoder, they can be constructed according to systems that are currently well known to those skilled in the art. In particular, the encoder and decoder may be constructed according to the block diagrams of figs. 1 and 2, respectively, while the bypass mechanism may be designed according to the content of the aforementioned international application.
The above-described preferred embodiment should not be construed in any limiting sense as many variations or modifications are possible without departing from the spirit of the invention. The scope of the invention is defined in the appended claims and their equivalents.
Claims (35)
1. An apparatus for processing an audio signal, said apparatus comprising an input and an output, said apparatus being responsive to frames of compressed audio data in a first format applied to said input to produce frames of compressed audio data in a second format at said output, the frames in the first format having coefficient segments and excitation segments, the frames in the second format having coefficient segments and excitation segments, said apparatus comprising:
a) first processing means coupled to said input for receiving coefficient segments of frames of compressed audio data in the first format and for emitting coefficient segments of frames of compressed audio data in the second format at said output;
b) second processing means coupled to said input for generating excitation segments of frames of compressed audio data in a second format from frames of compressed audio data in a first format.
2. An apparatus according to claim 1, wherein said first processing means is arranged to emit the coefficient segments of the compressed audio data frames in the second format without any substantial use of the excitation segments of the compressed audio data frames in the first format.
3. An apparatus according to claim 2, wherein said first processing means comprises a quantizer.
4. An apparatus according to claim 1, wherein said second processing means comprises a quantizer.
5. An apparatus according to claim 1, wherein said second processing means calculates excitation segments of data frames of compressed audio data in the second format without any substantial use of the coefficient segments of data frames of compressed audio data in the first format.
6. An apparatus according to claim 1, wherein said second processing means comprises a filter.
7. An apparatus according to claim 6, wherein said filter comprises a first input for receiving the reconstructed audio signal and a second input for receiving coefficient segments of data frames of the second format compressed audio data.
8. An apparatus according to claim 1, wherein the first format is IS54.
9. An apparatus according to claim 1, wherein the second format is IS641.
10. An apparatus for transmitting compressed audio information data frames, said apparatus comprising:
a) a first transcoder including a first input and a first output, said first transcoder being responsive to frames of compressed audio data in a first format applied to said input to produce frames of compressed audio data in a second format at said output, the frames of compressed audio data in the first format having coefficient segments and excitation segments, the frames of the second format having coefficient segments and excitation segments;
b) a second transcoder, including a second input and a second output, said second input coupled to said first output, receiving compressed audio data frames in a second format, said second transcoder being responsive to compressed audio data frames in the second format applied to said second input to generate compressed audio data frames in a third format at said second output, the third format frames having coefficient segments and excitation segments.
11. An apparatus according to claim 10, wherein said first transcoder comprises:
a) first processing means coupled to said first input for receiving coefficient segments of frames of compressed audio data in a first format and for emitting coefficient segments of frames of compressed audio data in a second format at said first output;
b) second processing means coupled to said first input for generating an excitation segment of a data frame of compressed audio data of a second format from a data frame of compressed audio data of a first format.
12. An apparatus according to claim 11, wherein said first processing means is arranged to emit the coefficient segments of the compressed audio data frames in the second format without any substantial use of the excitation segments of the compressed audio data frames in the first format.
13. An apparatus according to claim 12, wherein said first processing means comprises a quantizer.
14. An apparatus according to claim 12, wherein said second processing means comprises a quantizer.
15. An apparatus according to claim 12, wherein said second processing means calculates excitation segments of data frames of compressed audio data in the second format without any substantial use of the coefficient segments of data frames of compressed audio data in the first format.
16. An apparatus according to claim 12, wherein said second processing means comprises a filter.
17. An apparatus according to claim 16, wherein said filter comprises a first input for receiving the reconstructed audio signal and a second input for receiving coefficient segments of data frames of the second format compressed audio data.
18. An apparatus according to claim 10, wherein said second transcoder comprises:
a) third processing means coupled to said second input for receiving coefficient segments of frames of compressed audio data in a second format and for outputting coefficient segments of frames of compressed audio data in a third format on said second output;
b) fourth processing means coupled to said second input for generating an excitation segment of a data frame of third format compressed audio data from a data frame of second format compressed audio data.
19. An apparatus according to claim 18, wherein said third processing means is arranged to emit the coefficient segments of the frames of compressed audio data in the third format without any substantial use of the excitation segments of the frames of compressed audio data in the second format.
20. An apparatus according to claim 19, wherein said third processing means comprises a quantizer.
21. An apparatus according to claim 19, wherein said fourth processing means comprises a quantizer.
22. An apparatus according to claim 18, wherein said fourth processing means calculates excitation segments of data frames of compressed audio data in the third format without any substantial use of the coefficient segments of data frames of compressed audio data in the second format.
23. An apparatus according to claim 18, wherein said fourth processing means comprises a filter.
24. An apparatus according to claim 23, wherein said filter comprises an input for receiving the reconstructed audio signal and an input for receiving the coefficient segments of the data frames of the third format compressed audio data.
25. A method for processing a data frame representing audio information in digitized and compressed form, the data frame including a coefficient segment and an excitation segment, the data frame being in a first format, said method comprising the steps of:
a) processing the coefficient segments of the first format data frame to generate coefficient segments of a second format data frame;
b) processing the first format data frame to generate an excitation section of a second format data frame;
c) combining the coefficient segments of the second-format data frames generated at steps a) and b), respectively, with the excitation segments of the second-format data frames to generate second-format data frames representing the audio information included in the first-format data frames.
26. A method according to claim 25 wherein the step of generating an excitation segment of the second format data frame comprises the steps of:
a) synthesizing an audio signal based at least in part on information contained in an excitation segment of a data frame;
b) analyzing the audio signal synthesized in step a) to generate at least part of the excitation segment of the second format data frame.
27. A method according to claim 26, comprising the steps of passing the audio signal synthesized in step a) of claim 26 through a filter and providing tap coefficients in the coefficient section of said second format data frame to said filter.
28. A method according to claim 25 wherein the excitation segment of the second format data frame is generated by a transformation of the excitation segment of the first format data frame only.
29. A method according to claim 25, wherein the generation of the coefficient segments of the second format data frame is achieved solely by transformation of the coefficient segments of the first format data frame.
30. A method of transmitting a data frame representing audio information in digitized and compressed form, the data frame including a coefficient segment and an excitation segment, the data frame being in a first format, said method comprising the steps of:
a) processing the data frame in the first format at the first location to generate a data frame in a second format, the data frame in the second format including a coefficient segment and an excitation segment;
b) transmitting the data frames in the second format to a second location remote from said first location;
c) processing the data frame of the second format at the second location to generate a data frame of a third format, the data frame of the third format including a coefficient segment and an excitation segment.
31. A method according to claim 30, comprising the steps of:
a) processing the coefficient segments of the first format data frame at the first location to generate coefficient segments of the second format data frame;
b) processing the first format data frame at the first location to generate an excitation segment of the second format data frame;
c) combining the coefficient segments of the second format data frame generated in step a) with the excitation segments of the second format data frame generated in step b) to produce second format data frames representing the audio information contained in the first format data frames.
32. A method according to claim 31, comprising the steps of:
a) processing the coefficient segments of the second format data frame at said second location to generate coefficient segments of a third format data frame;
b) processing the data frame of the second format at said second location to generate an excitation segment of a data frame of a third format;
c) combining the coefficient segments of the third format data frame generated in step a) with the excitation segments of the third format data frame generated in step b) to produce a third format data frame representing the audio information contained in the first format and second format data frames.
33. A method of communicating an audio signal between two incompatible vocoders, said method comprising the steps of:
a) receiving a first format data frame from a first vocoder, the data frame comprising a coefficient segment and an excitation segment;
b) transforming the first format data frame into an intermediate format data frame, comprising the sub-steps of:
i) processing the coefficient segments of the first format data frame to generate coefficient segments of the intermediate format data frame;
ii) processing the first format data frame to generate an excitation segment of the intermediate format data frame;
iii) combining the coefficient segments of the intermediate format data frame with the excitation segments of the intermediate format data frame to produce an intermediate format data frame representing the audio information contained in the first format data frame,
c) transforming the intermediate format data frame into a third format data frame, comprising the sub-steps of:
i) processing the coefficient segments of the intermediate format data frames to generate coefficient segments of a third format data frame;
ii) processing the intermediate format data frame to generate an excitation segment of a third format data frame;
iii) combining the coefficient segments of the third format data frame with the excitation segments of the third format data frame to produce a third format data frame representing the audio information contained in the first format and intermediate format data frames,
d) transmitting the third format data frame to the second vocoder.
34. A machine readable storage medium containing program portions for instructing a computer to process an audio signal, said computer including an input and an output, said program portions causing said computer to generate frames of compressed audio data in a second format at said output in response to frames of compressed audio data in a first format applied to said input, the frames of compressed audio data in the first format having coefficient segments and excitation segments, the frames of compressed audio data in the second format having coefficient segments and excitation segments, said program portions implementing functional blocks in said computer comprising:
a) first processing means coupled to said input for receiving coefficient segments of frames of compressed audio data in a first format and for emitting coefficient segments of frames of compressed audio data in a second format at said output;
b) second processing means coupled to said input for generating an excitation segment of a data frame of second format compressed audio data from a data frame of first format compressed audio data.
35. An interface node between vocoders for converting frames of a compressed audio signal of a first format into frames of a compressed audio signal of a second format, the frames of the first format having a coefficient segment and an excitation segment, said node comprising:
a) a first transcoder including a first input and a first output, said first transcoder being responsive to frames of compressed audio data in a first format applied to said input to produce frames of compressed audio data in an intermediate format at said output, the intermediate format frames having a coefficient segment and an excitation segment;
b) a second transcoder including a second input coupled to said first output for receiving frames of intermediate format compressed audio data and a second output for generating frames of second format compressed audio data at said second output in response to frames of intermediate format compressed audio data being applied to said second input.
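Claims 25 through 33 repeat one transcoding pattern: the coefficient segment of an incoming frame is mapped directly into the target format, while the excitation segment is regenerated by synthesizing an audio signal from the source frame and re-analyzing that signal against the new coefficients. The sketch below is a minimal numeric illustration of that pattern, not the patented method: the `Frame` type, the truncate-or-pad coefficient mapping, and the toy all-pole synthesis with matching inverse-filter analysis are all simplifying assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    coeffs: List[float]      # coefficient segment (e.g. short-term filter taps)
    excitation: List[float]  # excitation segment (residual samples)

def map_coefficients(coeffs: List[float], target_order: int) -> List[float]:
    # Toy stand-in for coefficient-domain transformation:
    # truncate or zero-pad the taps to the target format's order.
    out = coeffs[:target_order]
    return out + [0.0] * (target_order - len(out))

def synthesize(excitation: List[float], coeffs: List[float]) -> List[float]:
    # All-pole synthesis: y[n] = e[n] + sum_k a[k] * y[n-1-k]
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += a * y[n - 1 - k]
        y.append(acc)
    return y

def analyze(audio: List[float], coeffs: List[float]) -> List[float]:
    # Inverse filter: e[n] = y[n] - sum_k a[k] * y[n-1-k]
    e: List[float] = []
    for n, y_n in enumerate(audio):
        acc = y_n
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc -= a * audio[n - 1 - k]
        e.append(acc)
    return e

def transcode(frame: Frame, target_order: int) -> Frame:
    # Step a): process the coefficient segment directly into the target format.
    new_coeffs = map_coefficients(frame.coeffs, target_order)
    # Step b): synthesize audio from the source frame, then re-analyze it
    # with the target coefficients to obtain the new excitation segment.
    audio = synthesize(frame.excitation, frame.coeffs)
    new_excitation = analyze(audio, new_coeffs)
    # Step c): combine the two segments into the target-format frame.
    return Frame(new_coeffs, new_excitation)
```

When the source and target coefficients coincide, inverse filtering exactly undoes the synthesis and the original excitation is recovered, which illustrates why operating in the parameter domain can avoid the quality loss of a full decode-to-PCM and re-encode tandem.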
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US08/883,353 | 1997-06-26 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1022548A true HK1022548A (en) | 2000-08-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP3542610B2 (en) | | Audio signal processing apparatus and audio information data/frame processing method |
| KR100919868B1 (en) | | Packet loss compensation |
| KR100837451B1 (en) | | Method and apparatus for improved quality voice transcoding |
| KR100923891B1 (en) | | Method and apparatus for providing interoperability between voice transmission systems during voice inactivity |
| US6970479B2 (en) | | Encoding and decoding of a digital signal |
| JP2006504300A (en) | | Method and apparatus for DTMF search and speech mixing in CELP parameter domain |
| CN101006495A (en) | | Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method |
| JP5340965B2 (en) | | Method and apparatus for performing steady background noise smoothing |
| KR20070038041A (en) | | Method and apparatus for speech trans-rating in a multi-rate speech coder for telecommunications |
| CN1200404C (en) | | Relative pulse position of code-excited linear predict voice coding |
| KR100434275B1 (en) | | Apparatus for converting packet and method for converting packet using the same |
| EP1020848A2 (en) | | Method for transmitting auxiliary information in a vocoder stream |
| US20110040557A1 (en) | | Transmitter and receiver for speech coding and decoding by using additional bit allocation method |
| US7684978B2 (en) | | Apparatus and method for transcoding between CELP type codecs having different bandwidths |
| JP4937746B2 (en) | | Speech coding apparatus and speech coding method |
| US7715365B2 (en) | | Vocoder and communication method using the same |
| US20050091047A1 (en) | | Method and apparatus for network communication |
| HK1022548A (en) | | Method and apparatus for improving the voice quality of tandemed vocoders |
| HK1140304A1 (en) | | Method and apparatus for rate reduction of coded voice traffic |
| HK1084227A (en) | | Method and device for efficient in-band dim-and-burst signaling and half-rate max operation in variable bit-rate wideband speech coding for cdma wireless systems |