GB2641375A

GB2641375A - Audio coding adaptive to wireless conditions

Info

Publication number: GB2641375A
Application number: GB2407557.4A
Authority: GB
Inventors: Law Malcolm; S J Wood Alan
Original assignee: Lenbrook Industries Ltd
Current assignee: Lenbrook Industries Ltd
Priority date: 2024-05-28
Filing date: 2024-05-28
Publication date: 2025-12-03
Also published as: GB202407557D0

Abstract

A method for adaptively transmitting and receiving audio data over a wireless link. At 204, a transmitter, 200, chooses a value k for the number of message packets to be transmitted in a radio interval, e.g. the time for transmission of five packets. A radio interval’s worth of audio is supplied to audio encoder 201 producing e.g. k = 3 message packets of encoded audio. These are expanded to e.g. n = 5 codeword packets by an (n, k) erasure code 202, where n > k. At a receiver, 300, only n' of the codewords are received as two are lost over the wireless link, i.e. n' < n and, for this example, n' = 3. The (n, k) erasure code 302 is used to decode k' message packets from the n' codeword packets received if n' ≥ k. The k' message packets are decoded by audio decoder 301 and output. Quality of service statistics, 304, may be fed back from the receiver to the transmitter which uses the statistics to decide what value of k should be used for future radio intervals. Thus, the datarate used for audio data is maximised for an acceptably low chance of message packet loss.

Description

AUDIO CODING ADAPTIVE TO WIRELESS CONDITIONS

Field of Invention

The present invention relates to methods and devices for transmitting audio over wireless links of variable quality.

Background to the Invention

Bluetooth LE is widely used to convey audio signals wirelessly, but the audio quality is typically degraded to a greater or lesser degree in doing so. One of the limiting factors is the datarate available for the audio codec to use.

The potentially available datarate is not constant but varies according to environmental conditions such as range and any interfering signals. Many audio codecs choose to operate at a fixed datarate, and this rate has to be chosen conservatively for reliable operation.

For many applications, degradation of audio quality is undesirable, and it is preferable to minimise it by operating the system at as high a datarate as practical in the prevailing conditions.

There are existing systems that operate adaptive datarates over Bluetooth, for example Qualcomm's Snapdragon Sound (comprising aptX Adaptive and Qualcomm's aptX Lossless) and Sony's LDAC, but we are unaware of either Qualcomm or Sony publishing their methods.

There is a need for a method of operating an audio codec over a packet based radio channel at a datarate (and associated audio quality) that adapts to the prevailing conditions and preferably does not require access to the internal details of the protocol stack.

Summary of the Invention

According to a first aspect of the present invention, there is provided a method for transmitting audio across a wireless link comprising the steps of: choosing a number k of message packets; encoding the audio for a time interval to k message packets; expanding the k message packets to n codeword packets where n > k; and transmitting the n codeword packets over the wireless link.

In this way, audio can be transmitted in k message packets at a datarate and hence quality level that the link can sustain whilst transmitting excess codeword packets allows link capacity to be probed for the possibility of better performance.

Preferably, there is an additional step of maintaining an evaluation of potential wireless link throughput, wherein k is chosen in dependence on the current evaluation of the link throughput.

In this way, the transmitter can track what channel performance is currently achievable and adjust the datarate used for audio accordingly, taking advantage of good channel conditions to operate at high audio quality and backing off in poor channel conditions to mitigate the risks of irrecoverable message packets.

In some embodiments, the k message packets are expanded to n codeword packets using a (n, k) erasure code. Use of an erasure code allows for integrity of message packets in the face of lost codeword packets.

In some embodiments, the (n, k) erasure code is optimal. An optimal erasure code ensures that loss of n - k codeword packets can always be tolerated.

In some embodiments, the (n, k) erasure code is systematic, which allows useful recovery of message packets even when more than n - k codeword packets are lost. Preferably, n = k + 1 and the non-systematic codeword packet is the exclusive or (XOR) of the message packets. Appending an XOR packet is both optimal and systematic whilst being extremely simple.

In some embodiments, n - k is constant across different time intervals. Expanding by a constant number of probe packets allows effective channel capacity probing with an acknowledgement based wireless protocol.

Alternatively, in other embodiments, n is constant across different time intervals. Using a constant number of codeword packets allows available airtime to be fully utilised in a broadcast protocol.

Preferably, the codeword packets comprise a field indicating the value of k. A field specifying k helps a receiver to select the appropriate configuration for reconstituting and decoding message packets.

The evaluation of wireless link throughput may be maintained in dependence on how many codeword packets are flushed in the transmitter. Observing how many codeword packets are flushed in the encoder gives useful information about how many acknowledgements were not received.

Alternatively, or additionally, the evaluation of wireless link throughput is maintained in dependence on messages from the receiver comprising statistics derived from how many packets are successfully received. A feedback channel from the receiver gives accurate information about packet loss.

Preferably, the method of encoding the audio for the time interval is adapted to producing a variable number of packets. An audio codec adapted to producing a varying number of packets per time period allows efficient matching to an (n, k) packet erasure code where k varies with channel capacity.

In some embodiments, the method of encoding the audio for the time interval uses a scalable codec producing base layer data and enhancement layer data; enhancement layer data from the scalable codec is pushed into a FIFO buffer; each of the k message packets contains an integer number of encoded base layer blocks; and remaining space in the k message packets is filled with enhancement data pulled from the FIFO buffer.

In this way, the integer number of base layer blocks included in each of the k packets can vary, decoupling the packet frequency from the fixed block duration and implementing an audio codec adapted to a variable number of packets per time period. Enhancement data flowing into the remaining packet space ensures efficient utilisation of available packet capacity. The inclusion of an integral number of base layer blocks in each message packet ensures that no message packet is critical; base layer decode of successfully received message packets can still proceed even whilst other message packets are lost.

Preferably, the remaining space in the k message packets is filled backwards such that earlier bits pulled are placed later in the packet than subsequent bits pulled. In this way, the codec avoids the need to utilise message data on a field specifying where enhancement data starts in a packet.
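By way of illustration only, the following Python sketch shows one possible way of filling a message packet with whole base layer blocks and then filling the remaining space backwards from the enhancement FIFO, together with a matching unpacking step. The function names, the use of byte rather than bit granularity, the zero padding when the FIFO runs dry, and the `base_len` argument (standing in for the position at which base layer decoding finishes) are all assumptions made for this example.

```python
from collections import deque


def build_packet(base_blocks: list, fifo: deque, packet_size: int) -> bytes:
    """Assemble one message packet from whole base blocks plus enhancement bytes.

    Base blocks fill the packet from the front. The remaining space is filled
    backwards from the end with bytes pulled from the enhancement FIFO, so the
    first byte pulled lands in the last position of the packet.
    """
    base = b"".join(base_blocks)
    spare = packet_size - len(base)
    if spare < 0:
        raise ValueError("base blocks do not fit in the packet")
    # Assumption: pad with zeros if the FIFO holds too little enhancement data.
    pulled = [fifo.popleft() if fifo else 0 for _ in range(spare)]
    return base + bytes(reversed(pulled))


def unpack_enhancement(packet: bytes, base_len: int, fifo: deque) -> None:
    """Push enhancement bytes into the receiving FIFO, reading from the packet end backwards."""
    for position in range(len(packet) - 1, base_len - 1, -1):
        fifo.append(packet[position])


tx_fifo = deque(b"\x01\x02\x03\x04\x05")
packet = build_packet([b"BB", b"CC"], tx_fifo, packet_size=7)

rx_fifo = deque()
unpack_enhancement(packet, base_len=4, fifo=rx_fifo)
```

Because the enhancement bytes are read from the end of the packet towards the base data, the receiver in this sketch recovers them in the order they were pulled without any field marking where the enhancement data starts.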

According to a second aspect of the present invention, there is provided a method for receiving audio data across a wireless link comprising the steps of: determining a number n of codeword packets transmitted over the wireless link corresponding to a time interval of encoded audio; receiving, after packet loss, n' of the codeword packets over the wireless link where n' < n; determining a number k of message packets corresponding to the time interval of encoded audio; decoding k' message packets from the n' codeword packets; and decoding the time interval of encoded audio from the k' message packets, wherein the step of decoding the message packets from the codeword packets uses an (n, k) erasure code if n' ≥ k.

In this way a receiver can determine a packet erasure code used by the transmitter and use it to recover all transmitted message packets if packet loss is not excessive.

In some embodiments, the (n, k) erasure code is optimal. An optimal erasure code ensures that loss of n - k codeword packets can always be tolerated.

In some embodiments, the (n, k) erasure code is systematic, which allows useful recovery of message packets even when more than n - k codeword packets are lost. Preferably, n = k + 1 and the non-systematic codeword packet is the exclusive or (XOR) of the message packets. Appending an XOR packet is both optimal and systematic whilst being extremely simple.

In some embodiments, n - k is constant across different time intervals. Expanding by a constant number of probe packets allows effective channel capacity probing with an acknowledgement based wireless protocol.

Alternatively, in other embodiments, n is constant across different time intervals. Using a constant number of codeword packets allows available airtime to be fully utilised in a broadcast protocol.

Preferably, the step of determining a number k of message packets comprises reading a field indicating the value of k from a header in any codeword packet and removing the header from all received codeword packets. Reading a header field specifying k allows the receiver to select the appropriate configuration for reconstituting and decoding message packets.

In some embodiments, the method comprises not receiving further codeword packets after successful reception of k codeword packets. Switching off the receiver radio when sufficient codeword packets have been successfully received saves power.

Preferably, the method comprises the steps of: accumulating statistics containing information about the number n' of received codeword packets across one or more time intervals; and communicating those accumulated statistics back to the transmitter.

In this way the receiver can feed back statistics to the transmitter allowing it to adapt its behaviour to the actual packet loss experienced.

The step of decoding the time interval of encoded audio may be adapted to consuming a variable number of packets. Use of an audio codec adapted to producing a varying number of packets per time period allows efficient matching to an (n, k) packet erasure code where k varies with channel capacity.

Preferably, the step of decoding the time interval of encoded audio uses a scalable codec consuming both base layer data and enhancement layer data; first enhancement data from each of the k message packets is pushed into a FIFO buffer; an integer number of base layer blocks are decoded from each message packet; and second enhancement data is pulled from the FIFO buffer and supplied to the scalable codec to improve the decoded audio quality.

In this way, the integer number of base layer blocks decoded from each of the k packets can vary, decoupling the packet frequency from the fixed block duration and implementing an audio codec adapted to a variable number of packets per time period. First enhancement data pulled from the remaining packet space ensures efficient utilisation of available packet capacity whilst the FIFO buffer applies a variable delay allowing second enhancement data to be supplied to the scalable codec alongside the corresponding base data. The inclusion of an integral number of base layer blocks in each message packet ensures that no message packet is critical; base layer decode of successfully received message packets can still proceed even whilst other message packets are lost.

More preferably, first enhancement data towards the end of a message packet is pushed into the FIFO buffer before first enhancement data earlier in the packet.

In this way, the codec avoids the need to utilise message data on a field specifying where enhancement data starts in a packet.

According to a third aspect of the present invention, there is provided an encoder adapted to perform the method of the first aspect.

According to a fourth aspect of the present invention, there is provided a decoder adapted to perform the method of the second aspect.

According to a fifth aspect of the present invention, there is provided a codec comprising an encoder according to the third aspect in combination with a decoder according to the fourth aspect.

According to a sixth aspect of the present invention, there is provided a computer readable medium comprising instructions that, when executed by one or more processors, cause said one or more processors to perform the method of the first and/or second aspect.

As will be appreciated by those skilled in the art, the present invention is capable of various implementations according to the application, as will be apparent from the following discussion.

Brief Description of the Drawings

Embodiments of the invention will now be described by way of example with reference to the accompanying figures in which:

Figure 1 shows a first scheme for transmitting packets whilst using a probe packet to detect channel capacity;

Figure 2 shows a second scheme for transmitting packets whilst using an XOR packet to detect channel capacity;

Figure 3 shows a third scheme which uses a low datarate backchannel for flow control instead of acknowledgements;

Figure 4 shows how an audio encoder can be constructed to produce packets at a variable frequency;

Figure 5 shows variable frequency packetization from the perspective of how a varying number of message packets are populated;

Figure 6 shows the corresponding decoder to the encoder of Figure 4; and

Figure 7 shows how the delay to enhancement data in each FIFO buffer varies but their total remains constant.

Detailed Description

The goal of adapting the codec's datarate to best exploit the currently available bandwidth across a wireless link (such as Bluetooth LE) can be broken down into a number of subgoals:

* Determining the currently available throughput of the wireless data link

* Adjusting the amount of useful information sent over the wireless data link

* Adjusting the data rate produced by the audio codec

* Communicating the current configuration to the receiving end of the communications

Bluetooth LE provides a mechanism that can provide an adjustable datarate over a connected isochronous stream (CIS). The CIS can be configured for maximum throughput but fewer packets can be actually transmitted in the isochronous interval than the schedule allows for. Thus airtime is left available for any required retransmissions due to unacknowledged packets. The receiver can determine how many distinct packets were actually received and whether this constituted the complete number transmitted or whether any packets were dropped.

However this mechanism still leaves open the question of how the codec might determine the currently available throughput of the wireless link.

The codec operates at the application layer, but packet retries happen deep inside the protocol stack at the link layer. Without access to the protocol stack implementation, the application layer will usually be unaware of whether any or how many packet retransmissions happened. Operating the codec at too high a datarate for prevailing conditions will be detectable at the application layer by observing dropped packets. But operating the codec at a lower datarate than could be achieved in the prevailing conditions is not so easily detectable.

This mechanism also leaves open the question of how the audio codec might seamlessly adjust its datarate because audio codec packets generally represent a fixed duration of audio and the channel provides a variable frequency of fixed size packets, not a fixed frequency of variable sized packets.

Terminology

Radio interval: We are presenting techniques that operate over repeating fixed length time intervals, and this term describes the interval over which we perform them. Over Bluetooth LE it will often correspond to an isochronous interval, but we have used a different term because multiple isochronous intervals might be combined to make one radio interval.

Message packets: Packets of data produced by an audio encoder or decoded by an audio decoder.

Codeword packets: Packets of data transmitted by the wireless protocol.

Flushed: Packets are said to be flushed if the transmitter wireless protocol gives up on attempting to communicate them to the receiver and discards them.

k denotes the number of message packets in a radio interval. Datarate is adapted by varying k.

n denotes the number of codeword packets in a radio interval. In some embodiments it will be constant for all radio intervals, for others it will vary.

n' denotes the number of codeword packets successfully received by the receiver. This may be less than n.

Determining potential throughput at the application layer

Potential throughput can be determined without link layer information by transmitting extra packets in a radio interval beyond the message packets required to communicate data from the audio codec.

The use of additional packets allows wireless link performance to be evaluated at higher datarates than currently being used by the audio codec, allowing the audio codec datarate to be adjusted whilst maintaining a suitable margin of safety.

Consequently poor conditions can be detected and datarate for the audio codec reduced if this margin of safety diminishes before suffering the consequences of actual dropped message packets that would leave gaps in the decoded audio requiring lost packet concealment. In the other direction, good conditions can be detected and datarate for the audio codec increased if this margin of safety increases sufficiently that it would still be adequate at a higher codec datarate.

The link quality and margin of safety could either be monitored in the transmitter or monitored in the receiver.

If the protocol stack reports any packets which get flushed after a lack of acknowledgement, then this is information available at the application level in the transmitter about wireless link performance. It can be aggregated over a period of time to gather statistics about how much of a margin of safety currently exists.

Even if the protocol stack does not report such packets, the receiver will definitely be aware of which packets have successfully arrived, which is a measure of quality of service (QoS). The receiver can aggregate these observations over a suitable period and report its observations on link performance back to the encoder. This report could be included in acknowledgement packets, but it could also be sent back over the wireless link as occasional messages on a separate channel. Such a back channel is extremely low datarate and the back channel's physical characteristics could be chosen to prioritise reliability of communication (low modulation rate, high forward error correction) over throughput.

Regardless of whether information on packet loss is measured by transmitter or receivers, it is the transmitter which will need to decide how much datarate is wise to allocate to transmitting audio data. To make this decision, it should maintain an evaluation of potential wireless link throughput and choose datarates that fit within that throughput with a safety margin. Such an evaluation should be informed by information the transmitter gleans from all sources, such as observing flushed packets or receiving feedback from receivers. It might be a single number, slowly increasing if no problems are observed but dropping on report of a problem, or it might comprise several figures such as a central estimate and error bars.
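By way of illustration, the single-number evaluation described above might be sketched in Python as follows. The class name, the creep and backoff constants, and the rule for deriving k from the estimate are assumptions chosen for this example rather than values taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ThroughputEstimate:
    """Transmitter-side estimate of how many packets per radio interval the link sustains."""
    packets_per_interval: float   # current estimate
    n_max: int                    # airtime limit: most packets that fit in one interval
    creep: float = 0.05           # assumed slow increase per problem-free interval
    backoff: float = 0.7          # assumed multiplicative drop on a reported problem

    def update(self, problem_observed: bool) -> None:
        if problem_observed:
            # Drop quickly when a flushed packet or a poor receiver report is seen.
            self.packets_per_interval = max(1.0, self.packets_per_interval * self.backoff)
        else:
            # Otherwise creep upwards, never beyond the available airtime.
            self.packets_per_interval = min(float(self.n_max),
                                            self.packets_per_interval + self.creep)

    def choose_k(self, margin: int = 1) -> int:
        """Pick k so that k plus a safety margin fits within the estimate."""
        return max(1, int(self.packets_per_interval) - margin)


est = ThroughputEstimate(packets_per_interval=4.0, n_max=5)
k_before = est.choose_k()           # estimate 4.0, margin 1
est.update(problem_observed=True)   # estimate drops after a reported problem
k_after = est.choose_k()
```

A richer evaluation, such as a central estimate with error bars, would replace the single float with several fields while keeping the same update and choose steps.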

Recovering packet loss

There are several ways in which extra packets can be utilised to mitigate issues of packet loss. The characteristics of the wireless link are important in deciding which to use.

In a first embodiment, the wireless protocol might resend the first packet in a radio interval until it receives an acknowledgement before doing likewise with the next packet. This would guarantee that if an attempt is made to send n packets in a radio interval and only k are acknowledged, then it will be the last n - k packets that get flushed at the end of the radio interval. (Bluetooth LE does not behave like this: the definition of flush point means an unacknowledged packet may be flushed part way through an isochronous interval.)

In this case, since the initial packets in the radio interval are the most robust, they will be used to carry message packets from the audio codec. The latter packets are probe packets, whose primary function is to indicate how much spare capacity the wireless link has compared to the throughput currently used by the audio codec. Their data payload is incidental: it could have no relationship to the audio or it could be used as an enhancement layer in a scalable codec.

Figure 1 illustrates this scheme.

The radio interval 101 defines the periodicity of the scheme, and is drawn as having sufficient airtime to transmit up to 5 packets. The first axes 102 show packets being transmitted, the second axes 103 show acknowledgements being received.

In this example, three message packets (labelled A, B & C) are sent and two transmissions failed to be acknowledged, leading to retransmissions of packets A & C. Consequently there was no remaining airtime to transmit the fourth packet, probe packet P, which was flushed at the end of the radio interval. The encoder can deduce from the non-transmission of P that the link is currently being operated near capacity without dropping any message packets. If the probe packet was consistently transmitted and acknowledged for an extended period, the encoder could conclude there is capacity to increase k. Conversely, if the probe packet often failed to be acknowledged then the encoder could conclude link capacity was marginal and consider reducing k.
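The Figure 1 example can be reproduced with a small simulation. The following Python sketch is a simplified model of the first scheme, assuming one transmission attempt per airtime slot and strict in-order retries; the function name and the way failures are specified are hypothetical.

```python
def simulate_interval(packets, slots, failed_attempts):
    """Simulate one radio interval of in-order retransmission on non-acknowledgement.

    packets: labels to send in order, message packets first, probe packets last.
    slots: number of transmission opportunities in the interval.
    failed_attempts: set of 0-based slot indices whose transmission is not acknowledged.
    Returns (acknowledged, flushed) lists of labels.
    """
    acknowledged = []
    queue = list(packets)
    for slot in range(slots):
        if not queue:
            break                                  # all delivered, remaining airtime unused
        if slot not in failed_attempts:
            acknowledged.append(queue.pop(0))      # acknowledged: move on to the next packet
        # otherwise the same packet is retried in the next slot
    return acknowledged, queue                     # anything left is flushed at the interval end


# Five slots; the first attempt at A and the first attempt at C go unacknowledged.
acked, flushed = simulate_interval(["A", "B", "C", "P"], slots=5, failed_attempts={0, 3})
```

In this model only trailing packets can be flushed, so a transmitter observing that P was flushed learns that the link had no spare capacity in that interval even though all message packets arrived.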

In a second embodiment, the wireless protocol may behave like Bluetooth LE and define the flush point such that retries on non-acknowledgement are limited for each packet, with the effect that the likelihood of flushing is comparable for each packet within an isochronous interval.

In this case, we can set n = k + c where c is a small constant and use an (n, k) erasure code to expand the k message packets into n codeword packets. The use of an optimal erasure code means the k message packets can be recovered after losing any c of the codeword packets.

Thus our n codeword packets comprise k message packets from the audio codec and c check packets. If c or fewer packets are lost, there should still be enough data for the receiver to recover the k message packets.

Figure 2 illustrates this scheme. Four codeword packets consisting of three message packets (A, B & C) and one check packet (XOR) were sent, but packet A was flushed before it was acknowledged. The decoder can reconstruct packet A from packets B, C & XOR and so suffers no loss in information from packet A failing. As in Figure 1, the encoder is aware packet A was flushed and can conclude link capacity is marginal.

In this second scheme, retransmissions on non-acknowledgment (automatic repeat request, ARQ) are our primary mechanism for addressing packet loss and it is sensible to choose the erasure code so that there are k message packets and a small constant number of check packets. A (k + 1, k) erasure code can be particularly simple, with the first k codeword packets being precisely the message packets and the last radio packet being a parity packet, namely the exclusive or (XOR) of the message packets.
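A (k + 1, k) parity code of this kind can be sketched in a few lines of Python. The function names are hypothetical, lost packets are represented by `None`, and all packets are assumed to have equal length.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive or of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_encode(messages: list) -> list:
    """Return k + 1 codeword packets: the k message packets followed by their XOR."""
    return list(messages) + [reduce(xor_bytes, messages)]


def parity_decode(received: list) -> Optional[list]:
    """Recover the k message packets from k + 1 slots, where None marks a lost packet.

    Returns None if more than one codeword packet was lost.
    """
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) > 1:
        return None
    repaired = list(received)
    if missing:
        present = [packet for packet in received if packet is not None]
        # The XOR of the k surviving packets equals the single missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]   # drop the parity packet


codewords = parity_encode([b"AAAA", b"BBBB", b"CCCC"])
codewords[0] = None                      # packet A flushed, as in Figure 2
recovered = parity_decode(codewords)
```

The same XOR that builds the check packet also rebuilds a missing packet, which is why this code is both systematic and optimal for a single loss.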

In a third embodiment, we can dispense with acknowledgements and just use erasure codes for dealing with packet loss. Bluetooth LE provides broadcast mode which allows for each packet to be sent exactly once without acknowledgements.

In this case we would choose n as large as possible to maximise airtime for transmitting and choose k to configure the audio codec datarate. A receiver that successfully receives n' codeword packets can hope to recover all k message packets if n' ≥ k.

Figure 3 illustrates this third embodiment, transmitting packets from a transmitter 200, over a wireless channel 100 without acknowledgements to a receiver 300.

Once again the figure is drawn with the radio interval having time for n = 5 packets and the audio codec actually using k = 3 packets.

A radio interval's worth of audio is supplied to the encoder 201 of an audio codec producing k = 3 message packets of encoded audio. These are expanded to n = 5 codeword packets by an (n, k) erasure code 202 and a header 203 indicating k = 3 is prepended to the codeword packets for the radio interval.

The codeword packets are transmitted across the wireless channel where two of them are drawn as being lost, so the receiver successfully receives n' = 3 codeword packets. The header is read and stripped 303 from these codeword packets and the decoder recovers the message packets using the (n, k) erasure code 302 and feeds them to the decoder 301 of the audio codec.

If there were insufficient received packets to decode the erasure code then there will be a need for missing packet concealment. Preferably the erasure code is systematic which means k of the codeword packets are actually message packets. Any message packets that were successfully received are decoded, with the gaps covered by missing packet concealment.

The receiver notes that only n' packets were received out of the n sent and aggregates these into QoS stats 304. From time to time these are reported back 110 over the wireless channel to the transmitter. This back channel communication is not time critical and extremely low datarate, so it can be transmitted over a physical layer that prioritises robustness over throughput.
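As an illustration of the receiver-side aggregation, the following Python sketch tallies n' against n for each radio interval and produces a compact report. The class name, the report fields and the reset-on-report behaviour are assumptions for this example; the disclosure does not specify the format of the statistics.

```python
from collections import Counter


class QosStats:
    """Receiver-side tally of how many codeword packets arrived per radio interval."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.intervals = 0
        self.sent = 0
        self.received = 0
        self.histogram = Counter()   # packets lost in an interval -> number of intervals

    def record(self, n: int, n_received: int) -> None:
        """Note that n_received of the n codeword packets arrived in one interval."""
        self.intervals += 1
        self.sent += n
        self.received += n_received
        self.histogram[n - n_received] += 1

    def report(self) -> dict:
        """Build a small report for the back channel and start a fresh tally."""
        summary = {
            "intervals": self.intervals,
            "loss_rate": 1 - self.received / self.sent if self.sent else 0.0,
            "worst_loss": max(self.histogram, default=0),
        }
        self.reset()
        return summary


stats = QosStats()
stats.record(n=5, n_received=3)   # the Figure 3 interval: two packets lost
stats.record(n=5, n_received=5)
summary = stats.report()
```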

The encoder gathers up statistics reports and uses them to decide what value of k should be used for future radio intervals so that the datarate used for audio is maximised subject to an acceptably low chance of message packet loss.
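One simple way to turn a reported loss rate into a choice of k is sketched below in Python. It assumes an optimal (n, k) erasure code and, as a strong simplification, that codeword packets are lost independently with probability `p_loss`; real wireless loss is often bursty, so this model is illustrative only. The function names and the target value are hypothetical.

```python
from math import comb


def failure_probability(n: int, k: int, p_loss: float) -> float:
    """Probability that fewer than k of n codeword packets arrive (independent losses)."""
    p_ok = 1.0 - p_loss
    return sum(comb(n, j) * p_ok**j * p_loss**(n - j) for j in range(k))


def choose_k(n: int, p_loss: float, target: float) -> int:
    """Largest k < n whose failure probability is at or below the target, at least 1."""
    for k in range(n - 1, 0, -1):
        if failure_probability(n, k, p_loss) <= target:
            return k
    return 1


# With n = 5 and 10% packet loss, find the largest k that keeps the chance of
# an unrecoverable radio interval at or below 1%.
k = choose_k(n=5, p_loss=0.1, target=0.01)
```

Under these assumptions k = 4 would fail in roughly 8% of intervals whereas k = 3 fails in under 1%, so the sketch selects k = 3.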

If there are multiple receivers, then not all need to report their QoS, though of course the transmitter can only accommodate the experience of those who do report.

This third embodiment might use more airtime than acknowledgement based schemes, because the transmitter does not know when the receiver has successfully received sufficient packets and cannot abort surplus later packets. But it does have several advantages over the ARQ based schemes:

* With ARQ, retransmissions may be performed even when the packet was received correctly if the receiver's acknowledgement failed to be successfully received by the transmitter. This consumes airtime on unhelpful data, duplicating information the receiver already has. It is especially likely if the receiver is operating its radio at low transmit power to conserve battery.

* Multiple receivers require multiple acknowledgements, which is impractical (and not supported by Bluetooth LE).

* Multiple receivers may fail to receive different packets. With ARQ every packet that any receiver failed to receive needs retransmission. With erasure codes, every codeword packet helps all receivers that failed to receive a packet, whichever packet it was.

Decoders can reduce power consumption by turning off their radios after receiving k packets successfully. This actually reduces power consumption to a lower level than is achievable with ARQ, since in both cases they need to operate until they have successfully received k packets but with ARQ they additionally need to transmit k acknowledgements.

Communicating configuration

In the above schemes, the receiver needs to be able to determine k to work out which erasure code to apply and how many packets it is reconstituting in each radio interval.

This can be done by prepending to each codeword packet a header field indicating the value of k used for that radio interval.
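A minimal Python sketch of such a header is given below. The one-byte header layout and the function names are assumptions for illustration; the disclosure only requires that a field indicating k be present.

```python
def add_header(codewords: list, k: int) -> list:
    """Prepend a one-byte header carrying k to every codeword packet."""
    if not 0 < k < 256:
        raise ValueError("k must fit in one byte")
    return [bytes([k]) + payload for payload in codewords]


def strip_header(received: list) -> tuple:
    """Read k from any received packet and remove the header from all of them."""
    if not received:
        raise ValueError("no packets received in this radio interval")
    k = received[0][0]
    return k, [packet[1:] for packet in received]


packets = add_header([b"aa", b"bb", b"cc", b"dd", b"ee"], k=3)
# Suppose the first two codeword packets are lost over the wireless link.
k, payloads = strip_header(packets[2:])
```

Because every codeword packet carries the field, the receiver can learn k from whichever packets happen to arrive.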

Sometimes this might not be necessary, if the receiver can deduce the value by other means.

For example in the first embodiment above, the probe packets could be distinguished by being of smaller length than the message packets.

Choice of erasure code

There is ample literature on erasure codes. Two properties of erasure codes that are particularly useful to this application are:

* Optimal (n, k) erasure codes can recover the k message packets from any selection of k codeword packets out of the n. A non-optimal code cannot always do this and might sometimes require k + 1 codeword packets or more to recover the message packets.

* Systematic (n, k) erasure codes arrange that k of the codeword packets are actually message packets (with the remaining n - k codeword packets being check packets).

Whilst any erasure code could be used, preferably the code should be both optimal and systematic. The benefit of optimality is obvious, but a systematic erasure code means that if packet loss is unexpectedly high and insufficient codeword packets are received to recover the complete set of message packets, any received systematic packets can still be fed to the audio decoder. With a nonsystematic code, failure to reconstruct the message packets would cause packet loss for the whole radio interval.

There are a couple of particularly simple systematic optimal erasure codes:

* A repetition code simply repeats a single message packet n times. It is a systematic optimal (n, 1) erasure code.

* A parity code appends a single check packet to k message packets, where the check packet is the exclusive or (XOR) of all the message packets. It is a systematic optimal (k + 1, k) erasure code.

For more general (n, k) pairs, Reed-Solomon is optimal and has a systematic variant. It operates on multi-bit symbols, typically octets. Its only defect is computational cost as it gets large. However in this application, acceptable latency considerations mean that n will not be large and the computational costs of implementing Reed-Solomon are likely to be small compared to those of the audio codec.
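To illustrate how a systematic optimal code for general (n, k) can work, the following Python sketch implements a Reed-Solomon style code by polynomial interpolation. It is a teaching sketch under stated assumptions: it works over the prime field GF(257) rather than the GF(2^8) arithmetic on octets that practical implementations typically use, so check symbols can take the value 256 and do not always fit in one octet. It encodes one symbol per packet position, and a packet code would apply it independently at each symbol position. All names are hypothetical.

```python
P = 257  # small prime field chosen for illustration only


def interpolate(points: list, x: int) -> int:
    """Evaluate at x the unique polynomial through the given (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8 or later).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(message: list, n: int) -> list:
    """Systematic encode: symbols 0..k-1 are the message, k..n-1 are check symbols."""
    points = list(enumerate(message))
    return list(message) + [interpolate(points, x) for x in range(len(message), n)]


def decode(received: dict, k: int) -> list:
    """Recover the k message symbols from any k received {index: value} entries."""
    points = list(received.items())[:k]
    if len(points) < k:
        raise ValueError("fewer than k symbols received")
    return [interpolate(points, x) for x in range(k)]


codeword = encode([10, 200, 33], n=5)
survivors = {1: codeword[1], 3: codeword[3], 4: codeword[4]}   # symbols 0 and 2 lost
recovered = decode(survivors, k=3)
```

Any k surviving symbols determine the degree k - 1 polynomial, which is what makes the code optimal; keeping the message as the first k evaluations is what makes it systematic.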

There are plenty of alternative erasure codes, but they mostly aim to reduce the computational cost of decoding for large n, often at the expense of the code no longer being optimal or systematic.

Audio Codec adaptive packet frequency

We have disclosed how the wireless protocol can be organised to adapt the number of message packets per radio interval to suit conditions. It still remains to discuss how the audio codec can utilise that capability to achieve audio quality commensurate with the datarate.

One possibility is to use a codec whose packets each represent one radio interval but where the amount of data in the packet is adjustable corresponding to the currently available throughput. This packet is then segmented into k message packets for transmission over a wireless link.

This solution however has the defect that if packet losses exceed expectations and the k message packets cannot be recovered at the decoder then the remaining packet fragments are likely to be useless and packet loss concealment needs applying to the whole radio interval.

Preferably, a scalable codec can be used, with a base layer corresponding to the datarate available from a small value of k and enhancement layers corresponding to datarates available from greater numbers of packets per radio interval. This strategy is a good fit to the first scheme above where the earlier packets are the most robust to difficult radio conditions so the base layer would be conveyed in the early packets, the first enhancement layer in the next packet and so on.

But, whilst better than segmenting a large packet, in the other schemes an irrecoverable lost packet may happen to be the base layer packet in which case once again the whole radio interval will need packet loss concealment.

Preferably still, an audio codec is used in which each packet can represent a variable duration of audio and this can be adjusted without causing artifacts.

With such a codec, each packet might represent 1/k of the radio interval, and if the erasure code fails then any remaining systematic packets can be used, leaving only the gaps corresponding to the irrecoverable message packets requiring error concealment.

Typically audio codecs produce packets that represent a fixed duration of audio. For example an MDCT based codec would naturally produce packets that represent a duration in samples corresponding to the number of frequency points produced by the MDCT. Even if an audio codec has a configurable packet duration, it may not be adjustable mid-stream, especially not without artifacts on the change.

An audio codec offering this desirable adjustable packet frequency property is described in co-pending patent application PCT/GB2023/053071. We will now discuss how it comes to offer flexible packet frequency.

Figure 4 shows how an audio codec with this desirable property can be constructed from a scalable audio codec. As a scalable codec, the encoder has two data outputs, one a base layer which can decode on its own to a certain quality level and the other an enhancement layer which can be decoded in combination with the base layer to a higher quality level.

It operates on configurable but constant duration blocks of audio, but the encoded data from these blocks is gathered up into message packets in a flexible way that allows a variable number of packets to be emitted in a fixed time interval such as a radio interval. Being scalable, each block of audio encodes to both an encoded base block and an encoded enhancement block.

Figure 4 shows the scalable audio encoder (401) producing two outputs, base layer data blocks and enhancement layer data blocks. Encoded base layer blocks are treated as whole indivisible units and sent to a delay line (402). Encoded enhancement data is treated as a stream of bits and pushed into a first-in-first-out (FIFO) buffer (403).

Each packet (410) comprises an integer number (3 blocks are shown in Figure 4) of base layer blocks (420, 421 & 422, combining to 412) taken from the output of the delay line (402). The space (413) left in the packet after a packet header (411) and base layer data (412) is filled up with enhancement data pulled as a stream of bits from the FIFO buffer. Typically this will not align with the boundaries of enhancement data blocks, so in Figure 4 this is drawn starting with a partial enhancement block 431B, the remaining amount of data after the initial portion of enhancement block 431 was transmitted in the previous packet. 3 full blocks (432, 433, 434) are then drawn plus an initial portion (435A) of the next block.

Preferably (as shown in Figure 4), the enhancement data (413) does not fill the packet starting at the end of the base layer data (412), but backwards in reverse order starting from the end of the packet. This arrangement allows the decoder to process the enhancement data before it has determined the boundary between regions 412 and 413 and so saves the overhead of a packet header field indicating the boundary.
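The packet assembly described above might be sketched as follows, purely as an illustration. For simplicity the sketch treats the enhancement data as a stream of bytes rather than bits, and the name `build_packet`, the fixed `packet_size` argument and the choice to raise an error when the FIFO holds too little data are all assumptions of this sketch rather than features of the disclosure.

```python
from collections import deque


def build_packet(header, base_blocks, enh_fifo, packet_size):
    """Assemble one fixed-size message packet.

    header      -- bytes of the packet header (411)
    base_blocks -- list of whole encoded base layer blocks (bytes each)
    enh_fifo    -- deque of byte values holding buffered enhancement data
    packet_size -- total packet length in bytes (assumed fixed)
    """
    base = b"".join(base_blocks)
    space = packet_size - len(header) - len(base)
    if space < 0:
        raise ValueError("header and base layer data exceed the packet size")
    if len(enh_fifo) < space:
        # Behaviour on underflow is not specified in the text; assumed here.
        raise ValueError("not enough enhancement data buffered")
    # Pull enhancement data in FIFO order, then place it backwards from the
    # end of the packet, so the earliest pulled byte is the last packet byte.
    pulled = bytes(enh_fifo.popleft() for _ in range(space))
    return header + base + pulled[::-1]
```

With this layout a decoder can begin reading enhancement data from the final byte of the packet and work backwards without first locating the boundary between the base layer region and the enhancement region.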

Figure 5 is another illustration of how this variable frequency packetization operates.

A radio interval 101 is divided up into 5 audio blocks 500-504. (For clarity, the number 5 is an arbitrary choice for drawing the figure and need not match n or k).

Each of these audio blocks is encoded by the scalable audio encoder into an encoded base layer block 510-514 and an encoded enhancement layer block 520-524.

Two arrangements of this data into k message packets are shown, one into k = 3 packets 530-532 and one into k = 2 packets 540 & 541. Each fixed size message packet contains a variable but integer number of encoded base layer blocks, and the remaining space is filled with enhancement data from Figure 4's FIFO buffer 403.

Clearly there is less data capacity in k = 2 packets than there is in k = 3 packets.

The encoder needs to reduce its encoded datarate to match the reduced channel capacity at lower k, which will correspond to either shorter base layer blocks or shorter enhancement layer blocks. It is generally straightforward to operate an audio codec at a different datarate (typically by adjusting the degree of approximation error).

Figure 6 shows the corresponding decoder architecture. On receiving a packet (410) the decoder component of the scalable codec decodes the base layer blocks (412) it comprises. The remaining data in the packet (413) is enhancement data which is pushed into a FIFO buffer (406). The enhancement data corresponding to the base layer blocks is pulled from the FIFO buffer and used to enhance the decoding to the full quality level.

Since each packet contains an integer number of base layer blocks, the scalable codec can decode the corresponding duration of audio to an acceptable level, even if the buffered enhancement data required for best decode is unavailable, perhaps having been transmitted in an earlier lost packet. This is the advantage over splitting a large packet covering the whole radio interval into k message packets for transmission. In such a scheme loss of one message packet is liable to cause failed decode for the whole radio interval.

Figure 7 shows a different perspective on how the differential delay from the FIFO buffering varies and decouples the encoder output from the packet capacity.

The base layer blocks encounter N blocks of delay in the encoder delay line 402 and no delay in the decoder. The variable delay in both the encoder and decoder FIFOs sums to a constant delay duration, matching the N blocks of delay applied to the base layer data. Since the base layer blocks are delayed by N blocks in the encoder and the enhancement data by a variable amount of up to N, enhancement data is advanced in the packets compared to base layer data. The decoder FIFO buffer supplies the required delay to enhancement data to realign them.

In Figure 7a, the encoder FIFO buffer 403 is almost empty, so there is little delay in the encoder FIFO applied to the enhancement data. We label this delay f (which is not usually integer) blocks. In contrast, the decoder FIFO 406 is almost full and imposes a delay of N-f blocks, so that the total delay is N blocks matching the constant delay in the base layer path.

Suppose the encoder produces data faster than the channel capacity for a short time. Data is pushed into the encoder FIFO 403 faster than the channel can transmit it and the encoder FIFO 403 fills up to the situation illustrated in Figure 7b. At the decoder side, the decoder is consuming data faster than the transmission channel is pushing data into the decoder FIFO, and so the decoder FIFO 406 empties, but the total delay across both FIFOs remains constant.

A field in the packet header indicates how many base layer blocks are included in each specific packet. Another packet header field also indicates the low significance bits of a cumulative count of base layer blocks. This second field allows the decoder to deduce after a missing packet how many blocks it described and thus how much of an audio gap needs missing packet concealment before decoding the next received packet.
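As a sketch of how the second header field could be used, the following assumes that the cumulative count in each header includes the blocks carried by that packet, that the field is 8 bits wide, and that fewer than 256 blocks are lost in any one gap. None of these details is stated in the text; the function name `missing_blocks` is likewise hypothetical.

```python
def missing_blocks(prev_count_lsb, cur_count_lsb, cur_blocks, field_bits=8):
    """Number of base layer blocks lost between two received packets.

    prev_count_lsb -- low bits of the cumulative block count from the last
                      packet received before the gap
    cur_count_lsb  -- low bits of the cumulative block count from the packet
                      just received (assumed to include its own blocks)
    cur_blocks     -- number of base layer blocks in the packet just received
    field_bits     -- assumed width of the cumulative count field
    """
    modulus = 1 << field_bits
    # Modular subtraction handles wrap-around of the truncated counter.
    return (cur_count_lsb - cur_blocks - prev_count_lsb) % modulus
```

For example, if packets carrying 3, 4 and 3 blocks follow a cumulative count of 250, their header counts are 253, 1 (after wrapping) and 4. If the middle packet is lost, `missing_blocks(253, 4, 3)` evaluates to 4, so four blocks of audio need concealment before decoding resumes.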

Some packet headers also comprise a field describing how many bits should be in the decoder's FIFO buffer 406. Decoders use this field to ensure that their FIFO buffer contains the correct amount of data and thus ensure that enhancement data is correctly parsed.

This arrangement decouples the encoded packets from the fixed blocksize, allowing packets to be emitted at a variable frequency. For example, if the radio interval were 10 ms, the blocksize 1 ms, and circumstances currently called for k = 3, then two of the three message packets would contain 3 base layer blocks and one would contain 4. If one of the three packets were to be irrecoverably lost, 3 or 4 ms of audio would need error concealment, not the whole 10 ms. Enhancement layer data would be lost for a longer duration but, by the nature of scalable coding, acceptable audio output is produced for that period, and the loss of enhancement is of small consequence compared to the duration that needs packet loss concealment.
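The arithmetic of the example above can be sketched as follows. The function name `blocks_per_packet` and the choice to place the larger packets last are assumptions of this sketch; the text only requires that each packet hold an integer number of base layer blocks.

```python
def blocks_per_packet(num_blocks, k):
    """Split num_blocks base layer blocks across k message packets.

    Each packet receives an integer number of blocks, with the counts
    differing by at most one. Putting the larger packets last is an
    arbitrary choice made for this illustration.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(num_blocks, k)
    return [base + (1 if i >= k - extra else 0) for i in range(k)]


def concealment_ms(num_blocks, k, lost_index, block_ms=1):
    """Duration of audio needing concealment if one message packet is lost."""
    return blocks_per_packet(num_blocks, k)[lost_index] * block_ms
```

With a 10 ms radio interval, 1 ms blocks and k = 3, `blocks_per_packet(10, 3)` gives `[3, 3, 4]`, so losing any single message packet leaves 3 or 4 ms of audio to conceal.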

Claims

1. A method for transmitting audio across a wireless link comprising the steps of: choosing a number k of message packets; encoding the audio for a time interval to k message packets; expanding the k message packets to n codeword packets where n > k; and transmitting the n codeword packets over the wireless link.
2. A method according to claim 1, comprising an additional step of maintaining an evaluation of potential wireless link throughput, wherein k is chosen in dependence on the current evaluation of the link throughput.
3. A method according to claim 1 or claim 2, wherein the k message packets are expanded to n codeword packets using an (n, k) erasure code.
4. A method according to claim 3, wherein the (n, k) erasure code is optimal.
5. A method according to claim 3, wherein the (n, k) erasure code is systematic.
6. A method according to claim 5, wherein n = k + 1 and the non-systematic codeword packet is the exclusive or (XOR) of the message packets.
7. A method according to any of claims 1 to 5, wherein n - k is constant across different time intervals.
8. A method according to any of claims 1 to 5, wherein n is constant across different time intervals.
9. A method according to any of claims 1 to 8, wherein the codeword packets comprise a field indicating the value of k.
10. A method according to any of claims 1 to 9, wherein the evaluation of wireless link throughput is maintained in dependence on how many codeword packets are flushed in the transmitter.
11. A method according to any of claims 1 to 10, wherein the evaluation of wireless link throughput is maintained in dependence on messages from the receiver comprising statistics derived from how many packets are successfully received.
12. A method according to any of claims 1 to 11, wherein the method of encoding the audio for the time interval is adapted to producing a variable number of packets.
13. A method according to claim 12, wherein: the method of encoding the audio for the time interval uses a scalable codec producing base layer data and enhancement layer data; enhancement layer data from the scalable codec is pushed into a first-in-first-out (FIFO) buffer; each of the k message packets contains an integer number of encoded base layer blocks; and remaining space in the k message packets is filled with enhancement data pulled from the FIFO buffer.
14. A method according to claim 13, wherein the remaining space in the k message packets is filled backwards such that bits pulled earlier are placed later in the packet than subsequently pulled bits.
15. A method for receiving audio data across a wireless link comprising the steps of: determining a number n of codeword packets transmitted over the wireless link corresponding to a time interval of encoded audio; receiving, after packet loss, n' of the codeword packets over the wireless link where n' < n; determining a number k of message packets corresponding to the time interval of encoded audio; decoding k' message packets from the n' codeword packets; and decoding the time interval of encoded audio from the k' message packets, wherein the step of decoding the message packets from the codeword packets uses an (n, k) erasure code if n' ≥ k.
16. A method according to claim 15, wherein the (n, k) erasure code is optimal.
17. A method according to claim 15 or 16, wherein the (n, k) erasure code is systematic.
18. A method according to claim 17, wherein n = k + 1 and the non-systematic codeword packet is the exclusive or (XOR) of the message packets.
19. A method according to any of claims 15 to 17, wherein n - k is constant across different time intervals.
20. A method according to any of claims 15 to 17, wherein n is constant across different time intervals.
21. A method according to any of claims 15 to 20, wherein the step of determining a number k of message packets comprises reading a field indicating the value of k from a header in any codeword packet and removing the header from all received codeword packets.
22. A method according to any of claims 15 to 21, further comprising the step of: not receiving further codeword packets after successful reception of k codeword packets.
23. A method according to any of claims 15 to 22, further comprising the steps of: accumulating statistics containing information about the number n' of received codeword packets across one or more time intervals; and communicating those accumulated statistics back to the transmitter.
24. A method according to any of claims 15 to 23, wherein the step of decoding the time interval of encoded audio is adapted to consuming a variable number of packets.
25. A method according to claim 24, wherein: the step of decoding the time interval of encoded audio uses a scalable codec consuming both base layer data and enhancement layer data and first enhancement data from each of the k message packets is pushed into a first-in-first-out (FIFO) buffer; and an integer number of base layer blocks are decoded from each message packet; and second enhancement data is pulled from the FIFO buffer and supplied to the scalable codec to improve the decoded audio quality.
26. A method according to claim 25, wherein: first enhancement data towards the end of a message packet is pushed into the FIFO buffer before first enhancement data earlier in the packet.
27. An encoder adapted to perform the method of any of claims 1 to 14.
28. A decoder adapted to perform the method of any of claims 15 to 26.
29. A codec comprising an encoder according to claim 27 in combination with a decoder according to claim 28.
30. A computer readable medium comprising instructions that, when executed by one or more processors, cause said one or more processors to perform the method of any of claims 1 to 26.