WO2009099381A1

WO2009099381A1 - Robust speech transmission

Info

Publication number: WO2009099381A1
Application number: PCT/SE2009/050014
Authority: WO
Inventors: Daniel ENSTRÖM; Hans Hannu; Per Synnergren
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2008-02-05
Filing date: 2009-01-12
Publication date: 2009-08-13
Anticipated expiration: 2010-08-05

Abstract

In transmission of speech data from a mobile transmitter over an air interface at least two different parallel speech data frames from a stream of speech data are encoded. The transmitter selects one of the at least two speech data frames for transmission based on one or more pre-defined parameters. Hereby the transport layer will have the possibility to send the speech data frame using the encoding that has the best chance of arriving at the receiver without bit errors. As a result the speech quality is enhanced in poor radio conditions.

Description

ROBUST SPEECH TRANSMISSION

TECHNICAL FIELD

The present invention relates to a method and a device for transmitting speech over a circuit switched connection.

BACKGROUND

Cellular Circuit Switched (CS) telephony was the first service introduced in the first generation of mobile networks. Since then CS telephony has become the largest service in the world.

Today, it is the second generation (2G) Global System for Mobile Communication (GSM) network that dominates the world in terms of installed base. The third generation (3G) networks are slowly increasing in volume, but the early predictions that the 3G networks would start to replace the 2G networks within a few years of introduction and come to dominate sales have proven to be wrong.

There are many reasons for this, mostly related to the costs of the different systems and terminals. But another reason may be that the early 3G networks were unable to provide end users with the performance they needed for IP services such as web surfing and peer-to-peer IP traffic. Another reason may also be the significantly worse battery lifetime of a 3G phone compared to a 2G phone. Some 3G users actually turn off the 3G access, in favor of the 2G access, to save battery. Later 3G network releases include High Speed Packet Access (HSPA). HSPA enables end users to have bit rates comparable to the bit rates provided by fixed broadband transport networks like DSL. Since the introduction of HSPA, a rapid increase of data traffic has occurred in the 3G networks. This traffic increase is mostly driven by laptop usage where the 3G telephone acts as a modem. In this case battery consumption is of less interest since the laptop powers the phone.

After HSPA was introduced, battery consumption became a focus area in the standardization. This led to the opening of a working item in the 3rd Generation Partnership Project (3GPP) called Continuous Packet Connectivity (CPC). This working item aimed to introduce a mode of operation where the phone could be in an active state but still have reasonably low battery consumption. Such a state could for instance give the end user a low response time when clicking a link in a web page but still give a long standby time.

The features developed in the CPC working item were successfully included in the 3GPP Release 7 specifications. But, the gain of CPC could only be utilized when running HSPA. This means that battery lifetime increase cannot be achieved for users using the CS telephony service.

In order to increase the talk time of CS telephony, another working item has been opened that aims to make CS telephony over HSPA possible.

From a high-level perspective the CS over HSPA solution can be depicted as in Fig. 1. An originating mobile station connects via HSPA to the base station NodeB. The base station is connected to a Radio Network Controller (RNC) comprising a jitter buffer. The RNC is via a Mobile Switching Center (MSC)/Media Gateway (MGW) connected to an RNC of the terminating mobile station. The terminating mobile station is connected to its RNC via a local base station (NodeB). The mobile station on the terminating side also comprises a jitter buffer.

In the scenario depicted in Fig. 1, the air interface is using Wideband Code Division Multiple Access (WCDMA) HSPA, which means that:

- The uplink is High Speed Uplink Packet Access (HSUPA) running a 2 ms Transmission Time Interval (TTI) and with Dedicated Physical Control Channel (DPCCH) gating.

- The downlink is High Speed Downlink Packet Access (HSDPA) and can utilize Fractional Dedicated Physical Channel (F-DPCH) gating and Shared Control Channel for HS-DSCH (HS-SCCH) less operation, where the abbreviation HS-DSCH stands for High Speed Downlink Shared Channel.

- Both uplink and downlink use Hybrid Automatic Repeat Request (HARQ) to enable fast retransmissions of damaged voice packets.

The use of fast retransmissions for robustness, and HSDPA scheduling, requires a jitter buffer to cancel the delay variations that can occur due to the (HARQ) retransmissions, and scheduling delay variations. Two jitter buffers are needed, one at the originating RNC and one in the terminating terminal. The jitter buffers use a time stamp that is created by the originating terminal or the terminating RNC to de-jitter the packets.

The timestamp will be included in the Packet Data Convergence Protocol (PDCP) header of a special PDCP packet type. A PDCP header is depicted in Fig. 2.

The jitter buffer typically needs sequence number information as well to handle reordering. The sequence number used is the RLC sequence number that is passed on to higher layers.

It has been proposed that the CS over HSPA solution that is being standardized in 3GPP R7 and R8 should include the concept of rate adaptation. The concept of rate adaptation means that an Adaptive Multi Rate (AMR) codec mode is changed in response to the current radio conditions. As a result lower AMR codec modes, such as 4.75, 5.15, and 5.9 kbit/s can be used when the User Equipment (UE) is transmitting in bad radio conditions.

An adaptive scheme exists today in the Global System for Mobile communication (GSM) system, but the scheme is associated with some problems. A first problem is that the adaptation loop is too slow to be able to handle the rapid changes of the radio conditions. As a comparison, the GSM link adaptation can change the AMR codec mode, and thus the transport format, with a maximum frequency of 50 Hz, while WCDMA uses a link adaptation based on fast power control that can change the transmission power with a frequency of 1500 Hz.
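The difference between the two adaptation rates quoted above can be made concrete with a short calculation. The following is only an illustrative sketch; the variable names are chosen for this example and do not come from any specification.

```python
# Rates quoted in the comparison above.
GSM_ADAPTATION_HZ = 50          # maximum AMR codec mode change rate in GSM
WCDMA_POWER_CONTROL_HZ = 1500   # fast power control update rate in WCDMA

# Time between two possible adaptations, in milliseconds.
gsm_period_ms = 1000 / GSM_ADAPTATION_HZ
wcdma_period_ms = 1000 / WCDMA_POWER_CONTROL_HZ

# Number of WCDMA power control updates per GSM adaptation period.
updates_per_gsm_period = WCDMA_POWER_CONTROL_HZ // GSM_ADAPTATION_HZ

print(f"GSM adaptation period:   {gsm_period_ms:.2f} ms")
print(f"WCDMA update period:     {wcdma_period_ms:.2f} ms")
print(f"WCDMA updates per GSM adaptation period: {updates_per_gsm_period}")
```

In other words, the WCDMA power control delivers 30 fresh channel observations in the time the GSM loop needs for a single codec mode change.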

There is a constant desire to improve the quality of existing communication methods. Hence there is a need for an improved method for transmitting Circuit Switched calls.

SUMMARY

It is an object of the present invention to improve the transmission of circuit switched calls and in particular circuit switched calls transmitted to a mobile terminal over HSPA.

At least one of the above objects and others is obtained by the method and system as set out in the appended claims.

Thus, in accordance with the present invention speech data is transmitted from a mobile transmitter over an air interface. When generating frames to transmit, at least two different parallel speech data frames from a stream of speech data are encoded. The transmitter selects one of the at least two speech data frames for transmission based on one or more predefined parameters. Hereby the transport layer will have the possibility to send the speech data frame using the encoding that has the best chance of arriving at the receiver without bit errors. As a result the speech quality is enhanced in poor radio conditions.

In accordance with one embodiment the selection is based on available transmission power.

In accordance with one embodiment the selection is based on current radio link conditions.

In accordance with one embodiment the parallel encoded speech data frames are encoded as Adaptive Multi Rate frames.

In accordance with one embodiment the parallel encoded speech data frames are encoded as different combinations of classes of bits.

In accordance with one embodiment the speech data is transmitted over an air interface employing High Speed Packet Access transmission.

The invention also extends to a transmitter adapted to employ a transmission scheme as described above.

By using the present invention the transport layer will be enabled to send a speech data frame using the encoding that has the best chance of arriving at the receiver without bit errors. As a result the speech quality is enhanced in poor radio conditions.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described in more detail by way of non-limiting examples and with reference to the accompanying drawings, in which:

- Fig. 1 is a view illustrating CS transmission over HSPA,

- Fig. 2 is a view of a PDCP header,

- Fig. 3 is a view illustrating the scheduling grant that corresponds to a transport format (TF),

- Fig. 4 is a view of parallel encoding of speech,

- Fig. 5 is a flowchart illustrating procedural steps performed when providing parallel encoding,

- Fig. 6 is a transmitter adapted to use parallel encoding,

- Fig. 7 is a flowchart illustrating procedural steps performed when providing parallel encoding in one embodiment, and

- Fig. 8 is a flowchart illustrating procedural steps performed when providing parallel encoding in another embodiment.

DETAILED DESCRIPTION

The radio condition in Wideband Code Division Multiple Access (WCDMA) needs to be well known in order to set the correct output power for the transmission. In order to have this control, WCDMA utilizes an outer loop power control and a fast inner loop power control. The outer loop power control adapts the power relationship between the data channel and the control channels by investigating the current bit error rate. The inner power control loop adapts the terminal's total output power in the transmissions by checking the current radio condition with the help of pilot bits. The inner power control is fast and adapts the output power at 1500 Hz.

The operator needs to provide the end user with a sufficiently good quality when the user is running a telephony service like CS over HSPA. The basic quality of the service is given by the target AMR codec mode that is used. Typically in a network running a full-rate voice service, AMR 12.2 is used as the target codec.

However, in some cases, the quality of the air link deteriorates so much that it is not possible to send the bits encoded by the AMR 12.2 codec mode error-free over the air link. In this case, from a speech quality perspective, it may have been better to send fewer bits but with higher protection, i.e. stronger channel coding over the air link. For example AMR 5.9 can be used in such circumstances. Such adaptation is the type of rate adaptation of AMR used in GSM today. The drawback of the solution in GSM is that the information given to the AMR encoder comes from an adaptation loop that is too slow, running at 50 Hz.

In WCDMA HSPA the link adaptation is much faster. In WCDMA HSUPA (EUL), the User Equipment (UE) is granted a maximum bit rate (the scheduling grant) that in the case of CS over HSPA can be set equal to the bit rate of the AMR 12.2 mode if this mode is the target mode. The scheduling grant that corresponds to a transport format (TF) is depicted in Fig. 3.

The fast power control will give an instant value of the power limit, which in the case of Fig. 3 is lower than the scheduling grant. Therefore, to have a high probability of an error-free transmission, a lower TF can be selected. In this case TF2 is selected. In accordance with the present invention an encoded frame that equals the size of the transport format is then provided when it is time to transmit the data. In the scenario above, this means that an AMR encoded frame corresponding to TF2 is selected. Hence, in accordance with one embodiment, parallel coding of speech using a set of different encoder modes is performed. The encoder can for example be an AMR codec. The result is that there will be a set of speech frames, all with a different compression ratio, that can be sent when it is time to transmit the speech. This is depicted in Fig. 4.

In Fig. 4 the use of parallel encoding of speech is depicted. The example is given in the context of CS over HSPA, but other transmission standards are envisaged. In Fig. 4 speech is received by the microphone and encoded in parallel, for example into bit-streams using the AMR 12.2, AMR 7.95, AMR 5.9 and AMR 4.75 modes. All these frames are then fed to and available for the mechanism that chooses the speech frame to send. The radio/transmission layer sends information about the current radio conditions to the mechanism that selects the speech frame to send based on the available speech frames and information about the current radio conditions. In one embodiment the information about the current radio conditions can be power control information. In the case of WCDMA that information is updated at 1500 Hz, thereby providing a good estimate of the channel for a particular time. This is a significant improvement compared to existing rate adapted transmission schemes. In the speech frame selection mechanism one codec mode is selected depending on the instant radio condition information. In the example depicted in Fig. 4, AMR 12.2 is selected.

In Fig. 5 a flowchart illustrating procedural steps performed in accordance with one embodiment of the invention is shown, where speech data is to be transmitted over a radio interface. First, in a step 501, parallel encoding of speech data is performed, thereby generating at least two speech data frames available for transmission. Next, in a step 503, one of the parallel encoded speech data frames is selected for transmission based on one or more parameters fed to a selector. The parameter(s) can be any suitable parameter(s) available, including but not limited to available power and current radio link conditions.

In Fig. 6 a transmitter 600 for performing parallel encoding in accordance with the above is depicted. The transmitter 600 comprises a parallel encoding block 601 for performing parallel encoding of speech data, thereby generating at least two speech data frames available for transmission. The transmitter 600 also comprises a selector 603 for selecting one of the parallel encoded speech data frames for transmission. The selector may employ any suitable criteria for selecting a speech data frame. For example, the selection can be based on one or more parameters fed to the selector. The parameter(s) can be any suitable parameter(s) available, including but not limited to available power and current radio link conditions.
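The division of the transmitter into a parallel encoding block and a selector can be sketched in code as follows. This is only a minimal illustration under stated assumptions: the class and function names (`Transmitter`, `SpeechFrame`, `truncating_encoder`, `largest_that_fits`) and the `max_bytes` parameter are hypothetical, and the "encoders" are placeholders that merely truncate the input; they are not speech codecs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class SpeechFrame:
    mode: str       # label of the encoder mode that produced the frame
    payload: bytes  # encoded bits (placeholder content in this sketch)


Encoder = Callable[[Sequence[int]], bytes]
Selector = Callable[[List[SpeechFrame], dict], SpeechFrame]


class Transmitter:
    """Hypothetical counterpart of transmitter 600."""

    def __init__(self, encoders: Dict[str, Encoder], selector: Selector) -> None:
        if len(encoders) < 2:
            raise ValueError("parallel encoding needs at least two encoder modes")
        self._encoders = encoders   # role of parallel encoding block 601
        self._selector = selector   # role of selector 603

    def encode_parallel(self, samples: Sequence[int]) -> List[SpeechFrame]:
        # Encode the same speech samples once per mode.
        return [SpeechFrame(mode, enc(samples)) for mode, enc in self._encoders.items()]

    def next_frame(self, samples: Sequence[int], params: dict) -> SpeechFrame:
        # The selector sees every candidate frame plus the current parameters.
        return self._selector(self.encode_parallel(samples), params)


def truncating_encoder(n_bytes: int) -> Encoder:
    """Placeholder encoder (assumption): keeps the first n_bytes samples."""
    def encode(samples: Sequence[int]) -> bytes:
        return bytes(s & 0xFF for s in samples[:n_bytes])
    return encode


def largest_that_fits(frames: List[SpeechFrame], params: dict) -> SpeechFrame:
    """Example criterion (assumption): largest frame within params['max_bytes']."""
    fitting = [f for f in frames if len(f.payload) <= params["max_bytes"]]
    if fitting:
        return max(fitting, key=lambda f: len(f.payload))
    return min(frames, key=lambda f: len(f.payload))


tx = Transmitter(
    {"full": truncating_encoder(8), "reduced": truncating_encoder(4)},
    largest_that_fits,
)
frame = tx.next_frame(list(range(16)), {"max_bytes": 5})
print(frame.mode)  # prints "reduced"
```

Passing the selection criterion in as a function reflects the statement that the selector may employ any suitable criteria; a different rule can be substituted without touching the encoding block.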

In Fig. 7 a flowchart illustrates the procedural steps performed in accordance with one embodiment of the invention. First, in a step 701, the power needed to transmit the bits of the target AMR codec rate is calculated. Information from the fast power control will then tell if the UE is power limited or not. This is performed in a check in a step 703. In the case the UE is not power limited it sends the speech using the target rate in a step 705. In case of power limitation, the UE will calculate how many bits can be sent in a step 707. In this case it may be approximately 160-180 bits, and then the UE checks how large the individual encoded AMR frames are in the set of parallel encoded speech frames. Based on the information on which transport format is possible to send, the UE takes the encoded AMR frame that matches this TF and transmits it in a step 709.
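The decision flow of Fig. 7 can be sketched as follows. This is a simplified illustration, not the claimed implementation: the power calculation of step 701 is abstracted into a single `bit_budget` value (the number of bits the UE can currently send), the function name `select_amr_frame` is hypothetical, and the fallback to the smallest frame when nothing fits is an assumption not stated in the text. The frame sizes are the speech bits per 20 ms frame of the respective AMR modes.

```python
# Speech bits per 20 ms frame for the AMR modes named in the description.
AMR_FRAME_BITS = {
    "AMR 12.2": 244,
    "AMR 7.95": 159,
    "AMR 5.9": 118,
    "AMR 4.75": 95,
}


def select_amr_frame(available_modes, target_mode, bit_budget):
    """Return the AMR mode whose parallel encoded frame should be sent.

    available_modes: modes for which a parallel encoded frame exists.
    target_mode:     the target codec mode, e.g. "AMR 12.2".
    bit_budget:      bits that can be sent right now (assumed to be derived
                     from the fast power control information).
    """
    # Steps 701/703: is the UE power limited with respect to the target rate?
    if bit_budget >= AMR_FRAME_BITS[target_mode]:
        return target_mode  # step 705: send using the target rate

    # Steps 707/709: pick the largest encoded frame that fits the budget.
    fitting = [m for m in available_modes if AMR_FRAME_BITS[m] <= bit_budget]
    if not fitting:
        # Assumption: fall back to the smallest available frame.
        return min(available_modes, key=AMR_FRAME_BITS.get)
    return max(fitting, key=AMR_FRAME_BITS.get)


modes = list(AMR_FRAME_BITS)
print(select_amr_frame(modes, "AMR 12.2", 300))  # prints "AMR 12.2"
print(select_amr_frame(modes, "AMR 12.2", 170))  # prints "AMR 7.95"
```

With a budget in the 160-180 bit range mentioned above, the 159-bit AMR 7.95 frame is the largest one that fits.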

In accordance with one embodiment, the TF sizes are set to exactly match the different codec modes that are used in the procedure.

Further, the number of different AMR encoding modes is implementation specific. The number can be any suitable number, i.e. two or more.

In accordance with one embodiment the parallel coding of speech using a set of different encoder modes is performed by coding different classes of bits. For example, A-class, B-class and C-class bits are coded in different combinations, and one of the resulting modes is then selected for transmission.

In accordance with one exemplary embodiment different classes of bits are coded in accordance with the following:

In a first mode A-class, B-class and C-class bits are coded. In a second mode A-class bits and the B-class bits are coded. In a third mode only A-class bits are coded.

The transmission mode is then selected based on, for example, available power. The choice can for example depend on current radio link conditions. In one embodiment the choice is made based on the importance of the B- and C-class bits. Such an embodiment is illustrated in Fig. 8. In the exemplary embodiment of Fig. 8 the power needed to transmit the bits of the target AMR codec rate is first calculated in a step 801. Information from the fast power control will then tell if the UE is power limited or not. This is performed in a check in a step 803. In the case the UE is not power limited it sends the speech using all bits, i.e. in the first mode, in a step 805. In case of power limitation, the UE will calculate how many bits can be sent in a step 807. In case there is room to transmit both A and B bits, this will be done by selecting the second mode in a step 809. In case there is only room to transmit the A bits, this mode, i.e. the third mode, is selected in a step 811.
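The mode choice of Fig. 8 can be sketched in the same way. This is an illustration under stated assumptions: the target codec is taken to be AMR 12.2, whose 244 speech bits are divided into 81 A-class, 103 B-class and 60 C-class bits; the power calculation is again abstracted into a `bit_budget`; the function name `select_class_mode` is hypothetical; and sending the third mode even when the A-class bits do not fit is an assumption.

```python
# Class division of one AMR 12.2 speech frame (244 bits in total).
CLASS_BITS = {"A": 81, "B": 103, "C": 60}

# The three modes of the exemplary embodiment, most complete first.
MODES = [
    ("first", ("A", "B", "C")),
    ("second", ("A", "B")),
    ("third", ("A",)),
]


def mode_size(classes):
    """Number of bits needed to send the given bit classes."""
    return sum(CLASS_BITS[c] for c in classes)


def select_class_mode(bit_budget):
    """Return (mode name, classes) for the bits that can currently be sent."""
    # Steps 803/805: not power limited means the first mode fits.
    # Steps 807-811: otherwise take the most complete mode that fits.
    for name, classes in MODES:
        if mode_size(classes) <= bit_budget:
            return name, classes
    # Assumption: if not even the A-class bits fit, still use the third mode.
    return MODES[-1]


print(select_class_mode(244))  # prints ('first', ('A', 'B', 'C'))
print(select_class_mode(200))  # prints ('second', ('A', 'B'))
print(select_class_mode(100))  # prints ('third', ('A',))
```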

Using the embodiment described in conjunction with Fig. 8 calls for a protocol enhancement: either a higher layer protocol (higher than the PDCP layer) or the PDCP protocol itself must be used to indicate that only a partial frame is sent. In the case depicted below, a layer higher than the PDCP is used to inform the encoder:

010 AMR counter value X

Partial speech frame follows

AMR: Partial speech Frame Pn

And in the case depicted below, the PDCP protocol is used:

011 AMR counter value X

AMR: Partial speech Frame Pn (only A and B-class bits)

In accordance with the present invention the transport layer will have the possibility to send the speech data frame using the encoding that has the best chance of arriving at the receiver without bit errors. As a result the speech quality is enhanced in poor radio conditions.

Claims

1. A method of transmitting speech data from a mobile transmitter over an air interface, the method comprising the steps of:

- generating (501) at least two different parallel speech data frames from a stream of speech data, and

- selecting (503) one of the at least two speech data frames for transmission based on one or more pre-defined parameters.

2. The method according to claim 1, wherein the selection is based on available transmission power.

3. The method according to any of claims 1 or 2, wherein the selection is based on current radio link conditions.

4. The method according to any of claims 1 - 3, wherein the parallel encoded speech data frames are Adaptive Multi Rate frames.

5. The method according to any of claims 1 - 4 wherein the parallel encoded speech data frames are different combinations of classes of bits.

6. The method according to claim 5, wherein the classes of bits are A, B and C classes of bits.

7. The method according to any of claims 1 - 6, wherein the speech data is transmitted over an air interface employing High Speed Packet Access transmission.

8. A transmitter (600) for transmitting speech data from a mobile transmitter over an air interface, the transmitter comprising:

- an encoder (601) for generating at least two different parallel speech data frames from a stream of speech data, and

- a selector (603) for selecting one of the at least two speech data frames for transmission based on one or more pre-defined parameters.

9. The transmitter according to claim 8, wherein the selector is adapted to base a selection on available transmission power.

10. The transmitter according to any of claims 8 or 9, wherein the selector is adapted to base a selection on current radio link conditions.

11. The transmitter according to any of claims 8 - 10, wherein the encoder is adapted to generate the parallel encoded speech data frames as Adaptive Multi Rate frames.

12. The transmitter according to any of claims 8 - 11 wherein the encoder is adapted to generate the parallel encoded speech data frames as different combinations of classes of bits.

13. The transmitter according to claim 12, wherein the classes of bits are A, B and C classes of bits.

14. The transmitter according to any of claims 8 - 13, wherein the transmitter is adapted to transmit the speech data over an air interface employing High Speed Packet Access transmission.