HK1145217B

HK1145217B - Generating and communicating source identification information to enable reliable communications

Info

Publication number: HK1145217B
Application number: HK10111715.9A
Authority: HK
Inventors: S.陈; M．G．卢比; M．普拉萨德; W．锡德; T．斯托克汉姆
Original assignee: 数字方敦股份有限公司
Priority date: 2007-09-12
Filing date: 2008-09-12
Publication date: 2014-04-11

Description

Generating and communicating source identification information to enable reliable communication

Priority

This application claims the benefit of non-provisional U.S. application S/n.60/971,884 filed on 12.9.2007 and U.S. application S/n.61/093,277 filed on 29.8.2008, according to 35u.s.c.119 (e). All of these applications are incorporated herein by reference for all purposes.

Cross-referencing

The following references are included herein and incorporated by reference for all purposes:

U.S. Pat. No.6,307,487 entitled "Information Additive Code Generator and Decoder for communication Systems" to Luby (hereinafter referred to simply as "Luby I");

U.S. Pat. No.7,418,651 entitled "File Download and Streaming System" (hereinafter referred to simply as "Luby II") to Luby et al;

U.S. patent No.7,068,729 entitled "Multi-Stage Code Generator and decoder for Communication Systems" (hereinafter referred to simply as "Shokrollahi I") to Shokrollahi et al;

U.S. published patent application No.2006/0036930 (hereinafter referred to as "Luby III") entitled "Method and Apparatus for fast Encoding Data Symbols to Half-Weight Codes", published by Luby et al, 16.2.2006; and

U.S. Pat. No.6,856,263 entitled "Systems and Processes for Decoding chain reaction Codes Through Inactivation" (hereinafter referred to simply as "Shokrollahi II") to Shokrollahi et al.

Watson et al, published on 27.12.2007, entitled "Code Generator and decoder for Communications Systems Operating Using Hybrid Codes to Low for multiple Efficient Users Communications Systems" (Code generators and decoders for use in Communications Systems) U.S. published patent application No.2007/0300127 (hereinafter "Watson" for short).

U.S. patent No.7,249,291 (hereinafter referred to simply as "Rasmussen") entitled "System and method for reproducing the Content of a Live Data Stream" to Jens Rasmussen et al, published 24/7/2007.

Technical Field

The present invention relates to encoding and decoding data in a communication system, and more particularly, to a communication system that encodes and decodes data to account for errors and gaps in communication.

Background

The transfer of files and streams between a sender and a receiver over a communication channel has been the subject of many documents. The receiver preferably wishes to receive with a certain degree of certainty an exact copy of the data transmitted by the sender over the channel. Where the channel does not have perfect fidelity, which characterizes most physically realizable systems, one concern is how to deal with data lost or corrupted in transmission. Lost data (erasures) tend to be easier to handle than corrupted data (errors) because the recipient does not always realize that the transmitted data has been corrupted.

Many error correcting codes have been developed to correct erasures and/or errors. The particular code used is typically selected based on some information about the distortion of the channel over which the data is being transmitted, as well as the characteristics of the data being transmitted. For example, where the channel is known to have a long period of distortion, burst error codes may be best suited for this application. Where only short, infrequent errors are expected, a simple parity code may be best.

Another common technique for providing reliable data delivery is to use data retransmission. For example, the well-known TCP/IP protocol ensures reliable delivery of data using packet retransmission of lost or missing packets. Another example is the HTTP protocol, which builds on top of TCP/IP and uses the reliability of the TCP/IP protocol to provide reliable data delivery. Enhancements to other protocols, such as the RTP protocol, using retransmissions have also been proposed as a way to cope with lost or missing packets at the receiver.

Another technique that has been proposed for improving streaming applications is to send the initial data of a stream to a receiver in one channel and then move to sending the main data stream to the receiver in a second channel. For example, Rasmussen proposes such a method. As another example, initial data may be sent to a receiver via a unicast connection to ensure that the receiver has enough data to quickly begin playing out a video or multimedia stream, and then the receiver may switch to a multicast connection to receive more data for the stream.

"communication," as used herein, refers to the transfer of data through space and/or time, such as the transfer of data from one location to another, or the storage of data at one time and the use of data at another time. A channel is a thing that separates a sender and a receiver. A channel may be spatially a line, network, fiber, wireless medium, etc., between a sender and a receiver. The channel may be a data storage device in time. In a realizable channel, the likelihood that data sent or stored by the sender becomes different when received or read by the receiver is often not zero, and those differences may be due to errors introduced in the channel.

Data transmission is straightforward when the transmitter and receiver have all the computational power and electrical power required for communication, and the channel between the transmitter and receiver is reliable enough to allow relatively error-free communication. Data transmission becomes more difficult when the channel is in a hostile environment or the transmitter and/or receiver has limited capabilities. In some applications, uninterrupted, error-free communication is required over long periods of time. For example, in a digital television system, it is desirable to receive a transmission error-free for many hours at once. In these cases, the problem of data transmission is difficult, even under conditions of relatively low error levels.

Another scenario where data communication is difficult is where a single transmission is directed to multiple recipients that may experience widely different data loss conditions. Furthermore, the conditions experienced by a given recipient may vary widely or may be relatively constant in time.

One solution to dealing with data loss (errors and/or erasures) is to use Forward Error Correction (FEC) techniques, in which data is encoded at the transmitter in such a way that the receiver can correct transmission erasures and errors. Where feasible, the reverse channel from the receiver to the transmitter enables the receiver to relay information about these errors to the transmitter, which can then adjust its transmission accordingly. However, reverse channels are often unavailable or infeasible, or have limited capacity available. For example, in the case where a transmitter is transmitting to a large number of receivers, the transmitter may not be able to maintain the reverse channels from all receivers. In another example, the communication channel may be a storage medium.

For example, data may be transmitted chronologically forward through time, and causal relationships exclude a reverse channel that can fix errors before they occur. As a result, it is often desirable to design communication protocols that have no reverse channel or only a limited capacity of reverse channels, and as such, the transmitter may have to deal with widely differing channel conditions without a priori knowledge of those channel conditions. One example is a broadcast or multicast channel where reverse communication is not provided, or if provided is only very limited or expensive. Another example of such a situation is a storage application, where data is stored in a state encoded using FEC, and then at some later point in time the data is recovered, possibly using FEC decoding.

Another solution is based on retransmissions, which are based on the receiver knowing which packets were not received and then sending a request to the sender to retransmit those missing packets. Identifying which packets are missing is often based on the sequence numbers carried within the packets. Examples of such protocols include TCP/IP, NORM, RTP with retransmission, and so forth.

Another solution is based on a combination of FEC and retransmission. In this case, the FEC may be sent proactively, and then the receiver requests retransmission of packets or transmission of additional FEC packets, for example, only if the data lost by the receiver exceeds what the FEC decoder can recover, in order to provide enough packets to the FEC decoder for recovery of the original source packets. As another example, FEC may not be sent initially, but only in the event of packet loss, the receiver will request additional packets, which may be FEC packets, that can be used to recover the original source packets. This may be an interesting solution, for example, in case the original source stream is sent via multicast, and then the requested packets are sent either also in the same stream or in a different multicast stream. For example, different receivers may lose different numbers of packets, and then the sender sending the requested packets may for example send the maximum number of FEC packets requested by all receivers, which would allow all receivers to recover the original source regardless of their individual packet loss patterns.

In the case of a packet protocol for data transmission over a channel that can lose packets, a file, stream, or other block of data to be transmitted over a packet network is divided into source symbols (which may all be of equal size or may differ in size depending on block size or other factors). Using the FEC code, code symbols are generated from the source symbols and placed in packets and transmitted in the packets. The "size" of a symbol can be measured in bits, regardless of whether the symbol is actually broken into a bitstream, where the size of a symbol is M bits when the symbol is selected from an alphabet of 2M symbols. In such packet-based communication systems, a packet-oriented erasure FEC coding scheme may also be suitable.

File transfers are said to be reliable if they enable the intended recipient to recover an exact copy of the original file despite erasure and/or other corruption of the data transferred over the network. Streaming is said to be reliable if it enables the intended recipient to recover an exact copy of each part of the stream in a timely manner despite erasures and/or corruption that occur within the network. However, both file transfer and streaming may not be completely reliable, but rather reliable to the extent that some parts of the file or stream are not recoverable, or for streaming, some parts of the stream may be recoverable, but not in a timely manner. The goal is often to provide as high a reliability as possible depending on certain constraints, where examples of constraints may be timely delivery for streaming applications, or the type of network conditions over which the solution is expected to work.

Packet loss often occurs because sporadic congestion can cause the buffering mechanism in the router to reach its tolerance, forcing it to drop incoming packets. Other causes of packet loss include weak signals, intermittent signals, and noise interference, where corrupted packets are discarded. Protection mechanisms against erasures occurring during transmission have been the subject of much research.

In systems where a single transmission is directed to more than one receiver and different receivers experience widely different conditions, the transmission is often configured for some set of conditions between the transmitter and any receiver, and any receiver in poor conditions may not be able to reliably receive the transmission.

Erasure codes are known that provide excellent recovery of lost packets in such scenarios. For example, Reed-Solomon (Reed-Solomon) codes are well known and can be adapted for this purpose. However, a known disadvantage of reed-solomon codes is their relatively high computational complexity. Chain reaction codes, including LT^TMChain reaction code and Raptor^TMMulti-level chain reaction ("MSCR") codes, provide excellent recovery of lost packets and are highly adaptable to changing channel conditions. See, for example, Luby I, which describes some aspects of chain reaction codes, and Shokrollahi I, which describes some aspects of multi-stage chain reaction codes. Herein, the term "chain reaction codes" should be understood to include chain reaction codes or multi-stage chain reaction codes, unless otherwise indicated.

Retransmission protocols are also known to be a good way to recover lost packets. Retransmission protocols based on TCP/IP, NORM, UDP and RTP are all examples of such retransmission protocols. Furthermore, using a combination of erasure code protocols and retransmission protocols can be useful for recovering lost packets. A retransmission protocol includes any protocol in which a receiver requests a particular packet or a particular number of packets to be sent to the receiver, and in response the sender may send the particular packet or the particular number of packets to the receiver or a group of receivers.

Some communication systems use transport protocols, such as RTP, that include information that can be used to identify the location or sequence of source packets relative to other source packets in the same stream. By providing sequence information for each packet, the receiver can detect and correct received out-of-network-order packets. The receiver is also able to detect that packets have been lost and when combined with application layer FEC techniques, such as those of DVB-IPI (see, e.g., the description in Watson), the receiver will be able to recover the lost packets and effectively mask network reliability flaws. By referring to the DVB-IPI standard: "DVB Bluebook A086r4(03/07)/ETSI Technical Specification 102034v 1.3.1(DVB blue book A086r4(03/07)/ETSI Technical Specification 102034 version 1.3.1)", the details of DVB can be found in the following URL: http:// www.dvb.org/technology/standards/a086r4.dTS 102034.V1.3.1. pdf.

Many deployed communication systems use transmission level protocols that do not include any form of timing or sequence information. For example, it is common practice in IPTV networks to deliver MPEG2 transport stream packets over "raw UDP". The unique sequence information available to the receiver is embedded in the audio and video elementary streams, which may be difficult to access, or unreliable, and is generally not available at the transmission level. Thus, there is no inherent mechanism at the transport level for the original UDP stream to allow the receiver to identify packets received out of network order or to identify missing packets. Although well-known mechanisms such as application layer FEC ("AL-FEC") can be used to efficiently recover lost packets, the lack of transport-level sequence information on source stream packets limits the direct applicability of such recovery techniques. These same problems apply to retransmission solutions and combinations of retransmission and AL-FEC solutions. Thus, this is a generalization of some prior approaches.

Another problem with some communication systems is that the components are interrelated. In some situations, it may be necessary or desirable to improve the reliability of the communication system after deployment. However, while it may be desirable to improve network reliability, it is often not feasible to replace or upgrade all receiving devices in the network immediately or at all. For example, the result may be that actual network packet loss is higher than originally planned due to degradation in network reliability, increased traffic load, expansion and/or change of the network, etc., or that quality of service requirements may need to be increased to rival competing services, but it may be impractical to spread new receivers immediately across all nodes of the communication system, or to distribute these new receivers over time and retire some of the receiving stations until the new receivers arrive.

To ensure that legacy devices are not affected by protocol enhancements used by upgraded or new receivers, such as AL-FEC or retransmissions or a combination of AL-FEC and retransmissions, it is necessary to continue to deliver source packets using the same transport level protocol. Furthermore, to ensure that source packets delivered to legacy devices do not become more susceptible to burst loss, the same source packet timing distribution or inter-packet timing needs to be maintained. In some communication systems, source packets within a source block may be transmitted with a small inter-packet gap to allow repair packets to be delivered immediately after the associated source packet; such a technique would increase the chance of exposing the source stream to burst loss and thus degrade the transmission efficiency of legacy devices.

In order to provide the best possible service at the lowest cost, the communication system must simultaneously balance conflicting resource constraints. Network bandwidth is a critical resource constraint. Transmission and reception devices need to enable network bandwidth to be used efficiently to support reliable services. The CPU processing available on the receiving device is often a severe limitation, which means that any transmission reliability enhancement method must only require a modest amount of computational effort. Furthermore, it is often desirable to limit the incremental latency associated with reliable transmission methods so that end users do not perceive a reduction in system responsiveness, especially for streaming media.

Disclosure of Invention

According to one embodiment of the present invention, a method of generating source identification information or data, referred to as source identification data or information, from a source stream is described. The method can act on an original source stream and generate relatively compact source identification information or data that can be used by a receiving device to identify the location of source packets relative to other source packets, as well as identify lost or damaged source packets. The source identification information or data generation method enables a compact representation of the source packet identification, which enables the source identification information of a source packet block to be delivered using a relatively small amount of network bandwidth. The computational load of deriving the source identification information is small.

The source identification data may be combined with forward error correction ("FEC") techniques to enable recovery of lost or damaged source packets. The source identification data may also be combined with retransmission techniques, and with a combination of FEC techniques and retransmission techniques, to enable recovery of lost or corrupted source packets. The source identification data may also be used with techniques in the receiver that identify the order in which a sequence of source packets was sent from a sender to the receiver, even when some of the sent packets were lost, corrupted, or reordered.

In another embodiment of the invention, methods and devices may be provided that enable reliable delivery of source identification data, so that the identity or relative location of source packets can be determined at the receiving device, and by extension, the absence of received source packets can also be deduced. To provide reliable delivery of the source identification, packets carrying the source identification can be transmitted in such a way as to minimize the chance of exposure to burst loss or periodic loss patterns. Since the source identification data is derived from the source packet content, two or more source packets may share the same source identification data, and variations of these methods enable protection against such source identification collisions at the receiver.

According to another embodiment of the present invention, there is provided a method of transmitting data over a communication channel, the method comprising the steps of: the original unmodified source stream is transmitted and as an additional stream FEC repair data and source identification data can be transmitted which can be used to efficiently recover lost or damaged packets in the original unmodified source stream.

According to another embodiment of the present invention, there is provided a method of transmitting data over a communication channel, the method comprising the steps of: the original unmodified source stream is transmitted and as an additional stream source identification data is transmitted which can be used to identify which data is lost or corrupted from the received data and to request retransmission. In a further enhancement of this embodiment, the retransmitted data is also identified using source identification information. In a further enhancement of this embodiment, source identification information is used to identify and reorder out of order data receptions. In a further enhancement, the source identification information is used both to recover lost or corrupted data in the original unmodified source stream by using FEC, and to identify which data is lost or corrupted from the original source stream and to request retransmission of the lost or corrupted data, or to transmit additional repair data that can be used to recover the data in the unmodified source stream. In a further enhancement of this embodiment, the first repair stream is transmitted with the original unmodified source stream. In a further enhancement of this embodiment, the second repair stream is transmitted in response to any retransmission request. In a further enhancement of this embodiment, one or more of the various streams described above are cached.

According to another embodiment of the invention, the source identification information identifies which data block within the unmodified source data stream the source packet is associated with. In this case, it may not be possible to generate a unique identifier for each unmodified source packet, i.e., source identification information may be used to identify the data block with which the source packet is associated, but it may not be possible to distinguish the source packets within a block from each other.

According to another embodiment of the present invention, there is provided a method of receiving data transmitted over a communication channel, the method comprising the steps of: (1) receiving an original unmodified source stream for an un-FEC-enabled receiver or for zero or more FEC-enabled receivers; (2) for one or more FEC-enabled receivers, the original unmodified source stream, and some or all of the additional FEC repair data and source identification data streams are received and used to efficiently recover packets lost from the original source stream.

According to another embodiment of the present invention, there is provided a method of receiving data transmitted over a communication channel, the method comprising the steps of: (1) receiving an original unmodified source stream for a retransmission-not-enabled receiver or for zero or more retransmission-enabled receivers; (2) for one or more retransmission-enabled receivers, the original unmodified source stream, and some or all of the additional retransmission repair data and source identification data streams are received and used to efficiently recover packets lost from the original source stream.

According to another embodiment of the present invention, there is provided a method of receiving data transmitted over a communication channel, the method comprising the steps of: (1) receiving an original unmodified source stream for an FEC-not-enabled and retransmission-not-enabled receiver or for zero or more FEC-enabled and retransmission receivers; (2) for one or more FEC and retransmission enabled receivers, the original unmodified source stream, and some or all of the additional FEC and retransmission repair data and source identification data streams are received and used to efficiently recover packets lost from the original source stream.

According to another embodiment of the present invention, there is provided a method of receiving data transmitted over a communication channel, the method comprising the steps of: (1) for an FEC-enabled or retransmission-enabled receiver, determining whether a particular stream being received has associated FEC repair data or retransmission data available; (2) if FEC repair data or retransmission data is available, some or all of the additional FEC and retransmission repair data streams are received as needed and the source identification data is received and used to efficiently recover the packets lost from the original source stream.

According to another embodiment of the present invention, there is provided a method of transmitting data over a communication channel, the method comprising the steps of: the original unmodified source stream is transmitted, and additional associated data streams are transmitted that can be used to efficiently recover lost packets in the original unmodified source stream.

As will be clear to those skilled in the art upon reading this disclosure, the methods described herein can naturally extend to many variations of the above schemes, including sending multiple layers of FEC repair data, sending a source stream to one IP address and additional streams to other IP addresses, sending retransmission data individually to receivers, sending the same retransmission data to multiple receivers via broadcast or multicast, sending a combination of FEC repair data and retransmission data to receivers, sending FEC repair data to one group of receivers and sending retransmissions to a second group of receivers, sending all streams to the same IP address, but differentiated by using different port numbers within the packet, sending only some streams to some of the receivers and not to other receivers, and so on.

A better understanding of the features and advantages of the embodiments disclosed herein may be realized by reference to the remaining portions of the specification.

Brief description of the drawings

Fig. 1 is a block diagram of a communication system capable of generating and using one or more source identification techniques described herein in conjunction with FEC repair techniques.

Fig. 2 is a block diagram of a communication system supporting both legacy set-top boxes and FEC-enabled set-top boxes.

Fig. 3 is a state diagram for an FEC-enabled receiver to determine whether repairs are available for a particular stream.

Fig. 4 is a flow diagram of a possible FEC-enabled receiver.

FIG. 5A is a flow chart showing an example of one of the methods of source identification data.

FIG. 5B is a block diagram showing an example of one of the source identification data methods.

Fig. 6A is a block diagram showing an example of the composition of some source identification data.

Fig. 6B is a flow chart showing how the receiver can calculate the identification information of a received unmodified source packet using source identification data.

Fig. 7 is a block diagram showing an example of the composition of some source identification data.

FIG. 8A is a block diagram showing an example of some further details of the composition of source identification data.

FIG. 8B is a block diagram showing an example of a possible source identification data format.

Fig. 9 is a block diagram of an example of source block boundaries and overlap with neighboring source blocks in a composition of source identification data.

FIG. 10 is a block diagram showing an example of another possible source identification data format.

Fig. 11A is a block diagram showing an example of two identical packets hashed to the same value.

Fig. 11B is a block diagram showing a sequence-position mapping that can handle the same packet.

Fig. 12 is a block diagram showing an example of non-identical packets hashed to the same value.

Fig. 13 is a block diagram showing how the source identification data is FEC protected.

Fig. 14 is a block diagram showing how FEC repair symbols are generated from unmodified source packets.

Fig. 15 is a block diagram showing a possible format of a packet including both FEC repair data and source identification data.

Fig. 16 is a block diagram of a possible partial reconstruction of source identification information from source identification data carried in a packet.

Fig. 17 is a flow chart showing an example of how a receiver may process received source data using source identification information.

Fig. 18 is a block diagram showing lost packets spanning two source blocks during a receive timeout at the receiver.

Fig. 19 is a block diagram showing lost packets occurring early in the source block during a reception timeout at the receiver.

Fig. 20 is a block diagram showing delayed packets within a source block during reception at a receiver.

Fig. 21 is a block diagram showing delayed packets between source blocks during reception at a receiver.

Fig. 22 is a block diagram showing the spanning of delayed packets outside the spreading block during reception at the receiver.

Fig. 23 is a block diagram showing the reception of a duplicate packet within an extension block at a receiver, an example of which is given in fig. 23A and 23B.

Fig. 24 is a block diagram illustrating the reception of duplicate packets at the receiver beyond the extension block.

Fig. 25 is a table showing some example results of protecting a stream with FEC repair data and source identification data.

Fig. 26 is a block diagram showing an example of protecting a stream with FEC repair data and source identification data and retransmission repair data.

A further understanding of the nature and advantages of the inventions disclosed herein may be realized by reference to the remaining portions of the specification and the attached drawings.

Detailed description of the specific embodiments

It is to be understood that the various functional blocks described herein may be implemented by combinations of hardware and/or software, and that some or all of the functionality of certain blocks may be combined in a particular implementation. Similarly, it should also be understood that the various methods described herein may be implemented by a combination of hardware and/or software. Thus, where computing steps are performed (which may be described as "then we proceed to step X"), it should be understood that such description includes electronic hardware and/or software or the like that performs those steps, typically as part of a communications process, without involving human or manual interaction.

In some embodiments described herein, data to be encoded is segmented into "source blocks," each block comprising a number of data packets referred to as "source packets," the number of source packets in a block may vary from block to block. For each source block, several "FEC repair" packets are generated by the encoder, the number of which may also vary from block to block. The "source packet" and "FEC repair" (or "repair") packet have the characteristics of: a receiver that receives a combination of source and repair packets greater in number than or equal to the number of source packets has some non-zero probability of recovering all the source packets. Source identification data type and usage

In some embodiments herein, source identification data may refer to data that: from these data, the location of the source symbols (which can be transmitted in packets or other transmission units (hereinafter more generically referred to as "source data units") relative to other source symbols), as well as the location of lost source symbols, can be identified at the receiving device. As used herein, a source symbol map or source system map (both of which can be referred to as "SSM") is a form of source identification data or information. For the purposes of this disclosure, source symbol mapping and source system mapping are used interchangeably with source identification data. In some embodiments herein, source identification data may refer to data that: from these data, the location of the transmitted packet in the packet stream relative to other packets, including both the location of the received packet and the location of the lost packet, can be identified at the receiving device. In some embodiments herein, source identification data may refer to data used at the sender to identify packets or symbols or other data units requested for retransmission from the receiver. The source identification data may be used at the receiver to recover lost symbols or packets or other data units from the combination of the source and repair symbols or packets. The source identification data can be used to identify which symbols or packets or other data units are lost in order to request retransmission of them. The source identification data can also be used at the receiver to determine the original transmission order of packets or symbols relative to other packets or symbols. The source identification data may be considered a single portion, divided into sub-portions, or may also be considered a stream, and a single portion, sub-portions, or stream of source identification data may be used to identify all or some portion of the source data units, e.g., a portion of the source identification data may be used to identify those source data units within a source block. As will be appreciated by those skilled in the art, there are many other possible uses for the source identification data that are not enumerated herein, but can be easily inferred using the methods and processes disclosed herein. Possible different types and uses of source identification data include, but are not limited to, the following:

(1) method of determining identification tags of known source data units-the ability to determine a short identification tag for each known source data unit from the source identification data. One example use is that the receiver can send the sender the label of a received source data unit, so that source data units associated with no received label can be transmitted from the sender to the receiver. A variant of this approach allows the identification tag to be determined for a restricted set of possible source data units, such as those belonging to a particular source block, or those sent within a specified time period.

(2) Method of determining identification tags of all source data units-the ability to determine short identification tags of all source data units from source identification data, whether the source data units are known or unknown. One example use is that the receiver can send the sender the tags of the missing source data units, so that the source data units associated with the received tags can be transmitted from the sender to the receiver. A variant of this approach allows identification tags to be determined for a restricted set of possible source data units, such as those source data units belonging to a particular source block, or those source data units sent within a specified time period.

(3) Method of determining positioning-the ability to determine the position of each source data unit relative to the ordering of all source data units, for both known and unknown source data units, wherein the ordering is determined by the source identification data, but may be different from the order of transmission of these data units, or from the order of the source data units within a stream or file. One example use is that the sender and receiver can use this information to determine where to place source data units within the source block for FEC encoding and FEC decoding. A variant of this method allows determining a location for a restricted set of possible source data units, such as those source data units belonging to a particular source block or those source data units sent within a specified time period.

(4) Method of determining sequential order-for known and unknown source data units, the position of each source data unit relative to a sequence of transmission orders of all source data units or other source-dependent order sequence (such as the order of the source data units in a file, or the order of the data in an audio-video stream) is determined from the source identification data. One example use is that the recipient can use this ordering to determine in what order to post data units to the application, e.g., in what order to write data units to a file, or in what order to send data units of a media stream to a media player. A variant of this method allows determining a location for a restricted set of possible source data units, such as those source data units belonging to a particular source block or those source data units sent within a specified time period.

When the method of determining position location is used in conjunction with the FEC method, the source symbols within the source block may be ordered according to the position from the method of determining position location, and in this case the source identification method may interact with the manner of operation of the FEC encoding and decoding method. On the other hand, when the method of determining the sequential order is used in conjunction with the FEC method, the source symbols within the source block may be ordered according to the sequential order, and since the sequential order is generally not determined by the method of determining the sequential order, in this case the source identification method may be largely independent of the way in which the FEC encoding and decoding method operates. In some cases, a source identification method that combines the method of determining position location with the method of determining sequential order may be used, for example, there may be some initial source identification data used to determine the position of the source symbols within the source block, then FEC decoding is performed to recover the source symbols based on the position of the source symbols, then there may be some additional source identification data that is part of the source block that is used to determine the sequential order of the recovered source symbols for delivery to the application.

When the method of determining position location is used in conjunction with a retransmission method, the source packet that the receiver requests for retransmission can be identified by its location, e.g., the receiver can send the sender the location of the source packet that requested for retransmission. In these cases, since the sequential order may be different from the position order, the sender may transmit the retransmission packets according to their sequential order, and the receiver may determine the sequential order based on the original reception order of the source packets that are not lost and the reception order of the retransmitted source packets. When the method of determining the sequential order is used in conjunction with a retransmission method, the source packets that the receiver requests for retransmission can be identified by their sequential order, e.g., the receiver can send the sender the sequence number of the source packet requesting for retransmission. In some cases, a source identification method that combines the method of determining position location with the method of determining sequential order may be used, e.g., there may be some initial source identification data sent during the original transmission of the source packet that is used to determine the location within the source block, and then there may be some additional source identification data sent during the retransmission, e.g., the sequence number of the retransmitted source packet, that is used to determine the sequential order in which the source packets are delivered to the application.

As will be appreciated by those skilled in the art, there are many variations of the above scheme. For example, the source identification method can be used in combination with the FEC method and the retransmission method, in which case the retransmission request can be for a particular source packet, for a particular repair packet, or for a specified number of repair packets instead of for a particular repair packet.

Example of Using Source identification method with FEC method

Fig. 1 is a block diagram of a communication system capable of generating and using the source identification techniques described herein in conjunction with FEC repair methods.

In the communication system 100, an input stream or file 110 is provided to a source identification generator 120. Source identification generator 120 generates source identification data that contains information about the relative position of source packets or symbols within a packet or symbol stream. All active symbols or packets may have the same size, which is typically determined using communication system 100, or the symbol size or packet size may vary from use to use, or within a stream. The source identification data may be generated independently of the source symbol size or packet size.

The input stream or file 110 may be provided to an FEC encoder 130. In some embodiments, FEC encoder 130 performs systematic FEC encoding that generates FEC repair symbols based on one or more input source symbols and their positions, and often places these FEC repair symbols in repair packets for transmission.

Notably, there may be source packets constructed from the input stream or file 110 for transmission over the communication channel, the construction of which may be independent of source symbol size. There may or may not be a one-to-one mapping between source packets and source symbols. The source packets may or may not have variable lengths.

Source packets from an input stream or file 110, source identification data from a source identification generator 120, and FEC repair symbols from FEC encoder 130 are provided to transmission module 140. The transmission module 140 may transmit the source identification data and FEC repair symbols in packets to the channel 145; during this process, the source identification data and FEC repair symbols may or may not be combined in the packet. These packets are transmitted along with the source packets over channel 145.

Channel 145 may be an erasure channel, but this is not a requirement for communication system 100 to operate properly. Typically, the original source packet is sent as a data stream that is logically separated from other packets. Examples of logically separate data streams include, but are not limited to, streams sent on different multicast groups, or streams sent to different ports. The data stream transmitted to the channel 145 may be received by the receiving modules 150 and 155. The receiving module 150 is intended to work with an FEC decoder, while the receiving module 155 is a legacy device that may not include FEC decoding. It should be noted that communication system 100 may be designed so that legacy receive module 155 receives only source packets, or a combination of packets but filters out other packets that are not source packets.

Legacy receive module 155 may be given the ability to identify and handle source packets from channel 145. If any repair packets or source identification data arrive, they may be on logically separate streams and can be ignored (silently dropped) by legacy receive module 155. A received stream 185 is generated from legacy receive module 155.

The receiving module 150 is capable of logically distinguishing the source identification data, repair symbols, and source packets. The source packet and source identification data are provided to a source identification interpreter 160 from which interpreter 160 source symbol IDs can be identified that identify the source symbols carried in the source packet. Source identification data providing a relationship between the received source symbol values and their positions is sent to the FEC decoder 170.

In addition, the FEC decoder 170 also has as input FEC repair symbols and source packets, both of which are provided from the receiving module 150. The FEC decoder 170 attempts to recover the missing source packets, if any. The received source packets are reassembled, as necessary, along with any recovered missing source packets from the FEC decoder 170 to produce a recovered stream or file 180. The source packets sent as the recovered input stream or file 180 may be sent exactly or approximately in the original order in which they were sent from the input stream 110.

There are many variations of the communication system 100. For example, FEC encoder 130 may not be used, or FEC encoder 130 may be replaced with a repeater sender module that receives retransmission requests for lost or missing packets and retransmits those packets based on the source identification data. As another example, the FEC decoder 170 may not be used, or the FEC decoder 170 may be replaced with a re-transmitter receiver module that determines which packets or symbols are missing or missing based on the source identification data and sends a request to retrieve those missing packets or symbols and, upon receiving the retransmitted packets or symbols, places them in the proper order and delivers them as the recovered stream 180. As another example, the FEC encoder 130 and FEC decoder 170 may not be used, and a packet reorderer may be used in place of the FEC decoder 170 for reordering the received misordered source packets based on the source identification information. As another example, FEC encoder 130 and FEC decoder 170 may be enhanced to include logic and methods to handle both FEC repair and retransmission. As another example, some portions of the source identification data may be transmitted directly to the receiver, while other portions of the source identification data may be recovered as part of the FEC recovery process.

The communication system 100 can be used to deliver streams, files, or other data types.

Seamless upgrade path

A complete upgrade of a communication system may take a long time for various reasons. For example, older devices may not have the capability to support an upgraded system, possibly due to limitations in computing resources or memory. As another example, some devices are designed with non-upgradeable characteristics, whereby an upgrade can only be performed if the device is replaced with a new device that has already installed the upgrade. In other cases, even where upgrades are theoretically possible, it may be too expensive or risky to upgrade a large number of devices in practice, because of the high cost of remotely installing upgrade software on millions of devices and ensuring that the upgrade is working properly, troubleshooting problems during installation, and the like. Furthermore, if a large number of devices are installed and they all need to be upgraded before the system is put into operation again, it is practically almost impossible to shut down the system during the upgrade, as is the case with large scale operational deployment of IPTV services with thousands, tens of thousands or even millions of receivers, which may take days or months to fully upgrade.

In such a scenario, seamless upgrade paths are highly desirable, but often difficult to implement.

There is a known potential seamless upgrade path for the case where FEC protection for streaming data is deployed within systems that previously did not use FEC. The way to do this is to design the application of FEC in such a way: the original source stream is sent and then additional FEC repair data that can be used by already upgraded receivers to support much higher quality streaming playout is sent in a logically separate stream, while older receivers that have not yet been upgraded only need to ignore the FEC repair data and still operate as before using the original source data stream. This possibility of seamlessly upgrading the path is one of the main reasons that streaming applications prefer systematic FEC codes. The systematic FEC code has the property of including the original source data and transmitting it as part of the FEC encoded data, and in this context the additional FEC data generated and transmitted is called "FEC repair data".

An obstacle to such a seamless upgrade path is that the association between the FEC repair data and the original source data stream used to generate the FEC repair data needs to be determined at the receiver. In particular, for many FEC schemes, the source symbols or packets comprising the original source data stream can be identified and/or sequenced by a source symbol identifier or packet identifier, which can be used by the FEC decoder to recover missing or lost source data from received FEC repair data. However, in many streaming systems, such source symbol or packet identifiers are not included, are only partially included, are not fully reliable, or are difficult to access in the original source stream. Although there are transport protocols, such as RTP, that natively include sequence information in each source packet that can serve as a source packet identifier and/or sequencer, many deployed communication systems do not utilize these protocols and, as a result, no timing or sequence information is included in the original source packet or is otherwise inaccessible at the transport layer. By way of example, IPTV deployments often use UDP packets carrying MPEG2TS units, and in these deployments, identifiers are often not readily accessible or available at the transport level. Furthermore, adding identification data to symbols or packets in the original source stream is also not possible for a variety of reasons. These reasons are that existing deployed equipment will not work if the original source stream is modified in any way, or that existing communication protocols are not desirable or possible to alter for various other reasons. In these cases, the simple seamless upgrade path described above is not possible and other methods and processes are required.

To maintain compatibility with legacy devices in these systems and to allow the use of communication systems that do not natively include packet or symbol or data sequencing or identification information, the methods presented herein disclose transmitting source packet sequence or identifier information in logically separate streams so that capable receivers can gather this information along with FEC repair data and recover erased or lost or missing or corrupted source packets or symbols. As outlined in one embodiment, the source identification data is capable of capturing and representing this information. In some embodiments, this identifier or sequence information is sent in one of the channels used to send the FEC repair data, although this is not absolutely necessary.

Fig. 2 shows an example of a seamless upgrade path for a communication system 200 using a mix of devices that support and do not support the FEC method. The headend device 210 generates and transmits the original program including the original source data and the program repair including the FEC repair data and the source identification data to the core network 220. The core network 220 sends the original program to the access networks 230(1), 230(2), and sends the repaired program to the access network 230(1) where there are legacy Set Top Boxes (STBs) 240(1), 240(2) that do not support FEC, and to the access network 230(2) where there are STBs 250(1), 250(2) that support FEC methods. There are many variations of communication system 200, including variants of legacy STB 240(1) that receive the repaired program but use only the original program, including variants that do not distinguish between the core network and the access network, including variants in which the original program and the repaired program are transmitted as one logical stream, including variants that support a retransmission method as an alternative to or in combination with an FEC method, including variants in which the set-top box is another type of receiving device, including variants in which some receiving devices do not support a source identification data method, some receiving devices support a source identification data method in combination with an FEC method, other receiving devices support a source identification method in combination with a retransmission method, other receiving devices support a source identification method in combination with both a retransmission method and an FEC method, and other receiving devices support a variant in which a source identification method in combination with a data rearrangement method, including variants with multiple transmitting devices, a transmitting and/or generating device comprising program repair is either logically or physically different from a variant of the transmitting and/or generating device of the original program, including at least sometimes the transmitting device is the receiving device or at least sometimes the receiving device is a variant of the transmitting device, including a variant that supports the source identification data method for other reasons than the FEC method or the retransmission method, such as for example to support the data reordering method.

It is desirable that a communication system supporting the source identification method and the associated recovery method, such as the FEC method and the retransmission method, have as few differences as possible in terms of signalling and data delivery compared to a communication system not supporting such a method. For example, it may be desirable to indicate whether a particular stream supports the source identification method and associated method without requiring special pre-signaling, which may include, for example, an electronic program guide or its equivalent. For example, for a variety of reasons, some data flows may support the source identification method and associated recovery method, while other flows may not support, whether at the same time or at different points in time. For example, it may be that initially no flows support the source identification method, then over time more and more flows support the source identification method, and eventually all flows may support the source identification method. It is highly desirable that it is not necessary to signal whether a stream supports or does not support the source identification method, for example, it is not necessary to include information in the electronic program guide whether each stream listed in the guide supports the source identification method.

Fig. 3 shows an example of a possible state diagram 300 that may be used as logic in a receiving device that supports a source identification method and an FEC method, where no pre-signaling is needed to indicate whether a particular stream uses the source identification method and the FEC method. In state diagram 300, the receiver starts in a legacy state 310 when it first joins a stream, i.e., the receiver initially assumes that the stream being transmitted does not use the source identification method and FEC method. Nevertheless, in the legacy state 310, the receiver constantly monitors the reception of packets of the current stream and moves to the SI non-synchronized state 320 in the state diagram if source identification data (or FEC repair data) is received. For streams using both the source identification method and the FEC method, there may be some time, e.g., 100ms, before the first source identification data (or FEC repair data) arrives at the receiver, which may introduce some latency when moving into the SI non-synchronized state 320. Once the receiver is in the SI non-synchronized state 320, the receiver constantly receives FEC repair data (if available) along with source identification data, which the receiver attempts to use to interpret the identity and/or determine the relative ordering of received and missing source packets or symbols. If the receiver is able to interpret the current source identification data, the receiver moves to the SI synchronization state 330. While in the SI synchronization state 330, the receiver may use the interpretation of the source identification data to identify and/or determine the relative ordering of the received and missing source packets or symbols and, in turn, use this relative ordering and use the received FEC repair data to recover portions of the original source data stream. If the receiver is in the SI synchronized state 310 and can no longer interpret the current source identification data, the receiver moves to the SI unsynchronized state 320 until the current source identification data can again be interpreted. The amount of latency moving between the SI wave energy synchronized state 310 and the SI non-synchronized state 320 may be on the order of a guard period, i.e., the length of time it takes to receive source data blocks from a stream if the source data is organized into source blocks and if the source identification data is organized source block by source block.

There are many variations of state diagram 300. For example, a retransmission method may be used instead of or in combination with the FEC method. As another example, a reordering method may be used instead of the FEC method. As another example, other data may be carried in addition to the source identification data, e.g. to indicate whether FEC repair or retransmission methods are in use, or what type of FEC code is in use, or what type of retransmission method is in use for the stream.

Fig. 4 provides an example of a process flow diagram that may be used within a receiving device that supports a source identification method and an FEC method. In step 420, the process monitors for receipt of a packet, and when a packet is received, the process moves to step 430. In step 430, the process checks to see if the SI synchronization state is the current state (corresponding to being in SI synchronization 330 in fig. 3). If the SI synchronization is the current state in step 430, the process moves to step 440. In step 440, the process associates the received packet with the appropriate source block based on the source identification data, and the process proceeds to step 450. In step 450, the process checks to see if it is time to FEC decode the current source block, and if not, the process moves back to step 420 to receive more packets, but if so, the process moves to step 460 and FEC decodes the current source block using the received source and repair packets and associated source identification data relating the repair packets to the source packets, then outputs the current source block (e.g., to a media player on the receiving device) in step 470, the next source block becomes the current source block, and processing continues to step 420. If the SI synchronization is not the current state in step 430, the process moves to step 480. In step 480, if the received packet is a source packet, the packet is output (e.g., to a media player device), otherwise it is discarded, and processing returns to step 420.

Many details are not shown in fig. 4. For example, fig. 4 does not show the receipt of source identification data for determining the association between source and repair packets and the order of the packets, and for determining the status of the SI synchronization state in step 430. There are many variations of the process shown in fig. 4. For example, in step 420, a plurality of packets may be received before proceeding to step 430.

Relationships between packets and symbols

Forward erasure correction codes often operate on symbols selected from a fixed alphabet. For example, the size may be from 2^MIn which case there is a distinct mapping between the symbol and the M binary digit string. Such symbols are hereinafter referred to as having a size of M bits. Data is often transmitted in packets over a communication channel. In some cases, there is a one-to-one mapping between symbols and packets. In other cases, including those described in Luby II for example, the packet may be larger than a symbol, where the packet includes several symbols. In addition, the symbols and packets may comprise the same data, but the bit order and placement may be different, or supplemental information not explicitly contained within the packet, such as a binary representation of the packet length, may be placed into the FEC symbols. Also, data that does not appear in the FEC symbols may appear in the packets, e.g., data that does not need to be protected by the FEC code, because it includes unused fields or fields with the same value in each packet, or because it may be recovered in other ways.

The retransmission method may also be based on symbols, where a symbol may be a byte or some large data unit. For example, the receiver may request that a certain byte range from a file or from a data stream be retransmitted.

Relationships between source blocks and symbols

Many forward error correction codes operate on discrete sets of source symbols called source blocks. For example, a source block may include all source symbols of a file, or in the case of a streaming service, may include all source symbols of a particular segment.

Each source symbol is given a unique identification within the source block to which it belongs. This identification is often not part of the source packet data, but it is often a piece of information used in the FEC decoding process.

The retransmission method may also be based on the source block and the reordering method may also be based on the source block. For example, the source blocks may be of a size that can be conveniently embedded in the receiver memory, and for example, blocks of data that are taken into memory or written from memory to disk may be conveniently identified. As another example, the source blocks may be of a size corresponding to the amount of data transmitted in a particular time period, and it may be convenient to distinguish data transmitted in one time period from data transmitted in another time period, for example, based on which source block the data belongs to.

Source blocks are used in many embodiments of the methods described herein. The source block is not a necessary component of the method for those skilled in the art, and there are variations of the method that do not use a source block.

Receiver with a plurality of receivers

In order for a receiver to be able to exploit the FEC repair data, the receiver often needs to be able to obtain source identification data, which may or may not be delivered to the same channel as the FEC repair data. Generally, the sender and receiver need to use the same protocol when representing the source identification data. Examples of such protocols are given in particular embodiments. The receiver uses the source identification data to identify the location of received source packets within their source blocks, as well as the location of missing packets. The FEC repair data is then used to recover these missing packets, if possible.

During the upgrade process, legacy receivers may ignore both source identification information and FEC repair data if such data is received. It receives the source packet as before the upgrade, and it will continue to execute at the pre-upgrade level until upgraded.

Source identification method

There are many ways to represent and transport the source identification data. In general, there is a trade-off between bandwidth overhead, CPU load, and quality of service (QoS).

The purpose of the source identification method includes the ability to distinguish, identify or order among all source symbols or source packets or other source data units to be transmitted, for example, among all source data transmitted for a file or stream. There are other situations where the source identification method can be applied so that source data units that are in close proximity to other source data units (e.g., sent in order of sending, or within a small time window) can be distinguished, identified, or ordered, for example, for a data stream where the amount of rearrangement, differentiation, or identification only needs to be higher than the maximum amount of source data in the transmission between the sender and receiver. In these cases, a source identification method of dividing a source data unit into a plurality of source blocks or an equivalent method thereof may be suitable, and a method for achieving this is described later within the present disclosure. The source block structure used by the source identification method may or may not be identical to the source block structure used by the FEC or retransmission method (if used).

Many of the source identification methods described below use hash functions in their description. One of the functions of the hash function is to map long strings to much shorter strings in such a way that distinct long strings are mapped to distinct shorter strings. Equivalently, the hash function may map values to values. There are many possible hash functions that can be used for the methods described herein, for example, the hash function (hereinafter referred to simply as "hash function") described in the publication entitled "pair Independence and Derandomization" by Mike Luby and AviWigderson (questions and Trends in the Theoretical computer science, Volume 1, Issue 4, 2005, Print ISSN 1551-. There are many ways to represent a list of hash functions, some of which are described in the context of "hash functions," e.g., by representing how to determine how to meter a particular inputThe bit sequence of the hash function is computed or the hash function is identified by selecting a small list of hash functions from a family of hash functions and indexing the list accordingly. One example of a family of possible hash functions may be described by positive integers (a, b), where a and b range between 0 and a positive integer value P, where the hash function determined by the (a, b) pair maps the input X to (a X + b) modulo P. In this example, a value pair (a, b) may be used to indicate selection of a hash function governed by (a, b) if P is fixed or governed by (a, b), or alternatively to indicate selection of a hash function governed by a triplet (a, b, P) if P is not governed by (a, b). The number of values in the range of the hash function is not limited to prime numbers, and there are known methods for performing hashing similar to that described herein that allow the number of values to be any positive number. Another way to indicate a hash function selected from this same family of hash functions (or any family of hash functions) is with respect to a predetermined set of ternary values (a)₁，b₁，P₁)，(a₂，b₂，P₂)，…，(a_i，b_i，P_i) Wherein the index j may be used to indicate that the selection is by (a)_j，b_j，P_j) The determined hash function.

The following is a description of the first source identification method. Assume that there are K different source data units (e.g., source packets or symbols) that include source data to be transmitted. It is assumed that all K source data units have distinct values. Let H be a hash function that maps K source data elements to K L-bit vectors v (0), v (1), …, v (K-1). If H is a random hash function, then the probability that two of the K L-bit vectors map to the same value is approximately K ^2/(2^ (L +1)), where ^ is the exponentiation function. Thus, to ensure that the K L bit vectors differ with a probability of at least 1- ε, the value of L should be approximately 2 × log (K) + log (1/ε), where log is the base 2 logarithmic operator. As a first example, if K1000 1e3 and e 1e-6, then L should be roughly 40 bits or 5 bytes, where "1 e 3" is a scientific exponential representation of 10^3 ^ 1000. As a second example, if K is 1e6 and e is 1e-12, then L should be approximately 80 bits or 10 bytes. The source identification data may simply be a concatenation of K L-bit vectors concatenated in the order of the source data units, assuming that the sender and receiver are able to compute the same hash function H. The sender is able to calculate source identification data from K source data units by applying a hash function H to each source data unit to generate K L-bit vectors. Upon receiving a source data unit, the receiver can apply H to the source data unit to calculate a corresponding L-bit vector value, and then based on the source identification data, the receiver can distinguish, identify, and order the received source data units of the K source data units, i.e., this is an example of a method of determining order. In the first example above, the source identification data size is 5,000 bytes, while in the second example above, the source identification data size is 10,000,000 bytes. For some applications, both sizes may be prohibitive. For some uses of the source identification data, e.g. some retransmission methods, at the receiver, a much shorter source identification data is sufficient to identify and request the source data units. For example, in a NACK-based retransmission method, the source identification data communicated to the receiver may be a description of a hash function H, and the receiver may compute the hash function H for each received source data unit and send the L-bit vector to which the received source data unit is mapped to the sender. In this case, the sender may determine which source data units were not received by the receiver based on which L-bit vectors were received, i.e. this is an example of a method of determining the identification tag of a known source data unit. As such, the amount of source identification data explicitly communicated to the receiver may even be zero, assuming that the sender and receiver have pre-established which hash function H to use. However, in this case a significant amount of data in the form of a source data unit label may be communicated from the receiver to the sender in order to request retransmission of the missing source data units.

A variation of the above source identification method is that the sender iteratively chooses a new hash function H from the hash function family and applies H to K source data units until the hash function H maps the K source data units to distinct vectors. Note that L2 log (K) is sufficient for H to map K source data units to distinct vectors with a probability of at least 1/2. Thus, from the list of log (1/ε) random hash functions, there is at least one hash function that maps K source data units to distinct vectors with a probability of at least 1- ε. In this case, the source identification data may be a concatenation of an index of the hash function H selected by the sender, a K value, and K L-bit vectors. The receiver can use the source identification data to determine the index of the hash function H and the value of K to use, and then, upon receiving a source data unit, can apply H to the source data unit to determine an L-bit vector that distinguishes, identifies, or orders the source data unit among the K source data units, i.e., this is an example of a method of determining the sequential order. If K is 1,000, then the source identification data is 20 bits to identify the hash function, the K value, and 1,000 20-bit vectors, or approximately 2500 bytes, and if K is 1,000,000, then the source identification data is 40 bits to identify the hash function, the K value, and 1,000,000 40-bit vectors, or approximately 5,000,000 bytes.

The following is a third variation of the source identification data method. The sender can calculate the source identification data as follows. The first hash function H is chosen to map the K source data units into K distinct L-bit vectors u (1), u (2), …, u (K), for example using the first or second source identification data method described above. Order set S₁0, 1, …, K-1. Note that in a different variation, S₁May be greater than or less than K, but in some embodiments K may be S₁Is selected with good accuracy. Selecting L bit vector to be mapped to S₁Hash function H of₁. Using H₁Mapping u (1), u (2), …, u (K) to S₁. If there is no j so that for H₁For u (j) and u (i) map to the same value, so the vector u (i) is said to be relative to H₁Uniquely mapped, otherwise said u (i) is related to H₁A conflict is encountered. If u (1), u (2), …, u (K) is less than a threshold number with respect to H₁Uniquely mapped, then a new hash is chosenColumn function H₁And repeating the above process until H is reached₁The number of uniquely mapped u (1), u (2), …, u (k) is at least a threshold number. Note that for any i, u (i) with respect to a uniformly randomly chosen hash function H₁The probability of uniquely mapping is approximately 1/e, where e is 2.718281828459045, so K/e is a reasonable choice for the threshold number, although other threshold numbers may be appropriate depending on the value of K, e.g., as K approaches zero, the threshold number may approach or reach K. Once the appropriate H is determined₁Then K bit vector V is formed₁Initialized to K zero bits. For each occurrence i such that u (i) is related to H₁Uniquely map to position j of position j, and map V to position j₁Is set to 1.

Let u₁(1)，…，u₁(K₁) Is related to H in K vectors₁K encountering a conflict₁And (4) a vector. Order set S₂＝{0，…，K₁-1} and choosing to map the L-bit vector to S₂Hash function H of₂. Using H₂Will u₁(1)，…，u₁(K₁) Mapping to S₂. If u is less than the threshold number₁(1)，…，u₁(K₁) With respect to H₂Uniquely mapped, then a new hash function H is chosen₂And repeating the above process until H is reached₂Uniquely mapped u₁(1)，…，u₁(K₁) Is at least a threshold number, wherein the threshold may depend on K₁. Vector V of K bits₂Initialized to K zero bits. For each occurrence i such that u (i) is related to H₂Uniquely map to position j of position j, and map V to position j₂Is set to 1.

More hash functions H may be used₃，…，H_hThe above process is repeated until u (1), u (2), …, u (K) are all uniquely mapped with respect to the hash function. Specifically, H_hUniquely mapping u_h-1(1)，u_h-1(2)，…，u_h-1(Kh) in which u_h-1(1)，u_h-1(2)，…，u_h-1(K_h) Is related to H in K vectors₁，…，H_h-1K encountering a conflict_hAnd (4) a vector.

The steps of a third variation of the source identification data method are outlined in fig. 5A. A hash function H is chosen (502) that maps K source packets (501) to distinct vectors u (1), u (2), …, u (K). Next, a hash function is chosen (503) to act on a subset of the K vectors that has not been uniquely mapped by the previous hash function (for the first hash function, the subset is the full set of K vectors). A subset of the K vectors is then hashed (504) using the selected hash function. During this hashing, the index of the packet on which the annotation operates, the hash function and resulting values, and potentially conflicting information are recorded (506). In one embodiment, this data is recorded in a bit vector. Next, the output of the hash function may be analyzed to determine if any collisions have occurred (505). If there is a collision, the process is repeated with a different hash function. If there is no conflict, the recorded information (506) can be used to form a source symbol map (507).

Source identification data or equivalently, the source symbol map may be an index of a hash function H, a value of K, the number of hash functions added, H, a hash function H₁，…，H_hIndex of (1), K₁，…，K_h-1Value of (d), and bit vector V₁，…，V_hIs cascaded. The location of the source data unit may be determined as follows. A hash function H is applied to the source data unit to generate an L-bit label u. H is to be₁，…，H_hSuccessively applied to u until there is the first i to make u about H_iIs uniquely mapped and let j be H_iThe location to which u is mapped. The location of the source data unit is V₁，…，V_i-1The total number of bits set to 1 in plus V_iThe number of bits set to 1 within the top j bits. This method for determining the ordering of the source data units and their positions is an example of a source identification method implementing the method of determining the positioning, i.e. based on a source labelThe identification data determines the location of the source data unit, but these locations are not necessarily the transmission location of the source data unit. If K is 1000 and a threshold number K/e is used, then the source identification data is the sum of the bits required to identify the hash function and K, K₁，K₂，…，K_h-1Followed by 2700 bits, or approximately 350 bytes. If K is 1,000,000, the source identification data is approximately 340,000 bytes. As will be appreciated by those skilled in the relevant art, this variant may have many alternatives and optimizations, for example, the sequence of values K may be chosen in different ways₁，…，K_h-1The threshold number, bit vector V, can be chosen in different ways₁，…，V_hMay be a number of bits per entry or a variable number of bits per entry if the last vector V_hIs always set to 1, the vector V_hCan be omitted, etc.

Fig. 5B is an illustration of a third variation of the source identification data method just described. In the example of fig. 5B, there are eight data units (510(1), …, 510(8)) for which source identification data is to be generated. A first hash function H that hashes the contents of the eight data units (510) into a set (520) of eight L-bit hash values is chosen, wherein H is chosen such that if the eight data unit contents (510) differ, then the eight L-bit hash values (520) also differ, wherein in this example the eight L-bit hash values (520) are denoted A, B, C, D, E, F, G, and H, respectively. Note that more than one candidate hash function may be chosen before choosing the appropriate hash function H that maps all source data units to unique hash values. It is also noted that in some embodiments, the application of this initial hash function may be skipped, and the process outlined below may be performed by operating directly on the source data unit (510) rather than operating on the hash value (520).

Referring to FIG. 5B, a second hash function H is chosen that maps each of these L-bit hash values (520) to hash values ranging between 0 and 7₁Wherein is formed by H₁The defined mapping is composed of the slave hash value (520)An arrow to the hash value (530). Note that the appropriate hash function H is chosen₁More than one candidate hash function may have been previously selected so that there are enough hash values (520) to map to unique hash values (530), where in this example the threshold number may be such that at least four of the hash values (520) map to unique hash values (530). In bit vector V₁(540) A value of 1 for bit j indicates that there is a unique hash value (520) mapped to hash value j (530), and a value of 0 for bit j indicates that either zero or two or more hash values (520) are mapped to hash value j (530). In this example, V₁Is set to 0 because of the presence of a bit in H₁Next, no hash value (520) maps to the value 0(530), V₁Is set to 0 because of the presence of a bit in H₁Next, hash values B and C (520) both map to the values 1(530), V₁Is set to 0 because of the presence of a bit in H₁Next, no hash value (520) maps to the value 2(530), V₁Is set to 1 because at H₁Next, hash value A (520) is uniquely mapped to the value 3(530), V₁Is set to 1 because at H₁Next, hash value G (520) is uniquely mapped to the value 4(530), V₁Is set to 1 because of the presence of a bit in H₁Next, hash value D (520) is uniquely mapped to the value 5(530), V₁Is set to 0 because at H₁Next, hash values E and H (520) both map to values 6(530), and V₁Is set to 1 because of the presence of a bit in H₁Next, hash value F (520) is uniquely mapped to value 7 (530). In this example, hash values B, C, E and H (520) do not map to unique values, and thus, the four hash values (520) are to be hashed by another hash function H₂To four values 0 to 3(550), resulting in a collision bit vector V₂(560). In this example, hash values E and H (520) are represented by H₂Uniquely mapped, with hash values B and C (520) at H₂Conflict under mapping, so that the two hash values B and C (520) are passed through another hash function H₃Mapping to two values 0 to 1(570), resulting in a collision bit vector V₃(580) And is andin this example, all remaining values are represented by H₃Uniquely mapped, such that H₃Is the last hash function required to complete the source identification data. The source identification data in this example includes the value h-3, the value K-8, K₁＝4，K₂Hash function H, H ═ 2₁，H₂，H₃And a bit vector V₁，V₂，V₃。

In the example shown in FIG. 5B, the collision bit vector V₁，V₂And V₃The index on the left indicates the position of the source data unit as determined by this source identification data, i.e. position 0 corresponds to a packet with sequence number 0, position 1 corresponds to a packet with sequence number 6, position 2 corresponds to a packet with sequence number 3, position 3 corresponds to a packet with sequence number 5, position 4 corresponds to a packet with sequence number 7, position 5 corresponds to a packet with sequence number 4, position 6 corresponds to a packet with sequence number 2 and position 7 corresponds to a packet with sequence number 1. From this source identification data, the client can easily calculate the location of any received or missing packets of the eight source packets. Thus, for example, the positions may be used to order source packets within a source block for FEC encoding and decoding purposes, such that the source identification data is sufficient for FEC encoding and decoding of the source data units. As another example, the recipient may calculate the location of all received source data units and determine therefrom the location of the missing source data unit. The receiver may send a request for missing source data units, identifying which source data units are being requested via the location of the source data units, and the sender may determine which source data units to send in response based on the requested location. This third variant is an example of a source identification method implementing the method of determining a position fix.

As a fourth variant, the source identification data of the third variant may be augmented to include a sequential order of K source data units (where, for example, their sequential order corresponds to the original source data order, or as another example, to the source data transmission order) as described above for the third variantTo their respective positions. For example, the additional sequential data may comprise an ordered list of K sequence numbers SEQ ═ SEQ₀，seq₁，…，seq_K-1>Wherein seq_jIs the sequence number of the data unit having position j. As such, this additional sequential data may include a mapping of positions to sequences.

An example of source identification data for this fourth variation is shown in fig. 6A, which is an augmentation of the example shown in fig. 5B. For the example of fig. 5B, K-8 and h-3. As shown in fig. 6A, h is 3(610), K is 8, K₁＝4，K₂2(620), the four hash functions are H, H₁，H₂，H₃(630) At (640) a bit vector V of size 8, 4 and 2 respectively is shown₁，V₂，V₃Concatenated, additional sequential data including K-8 sequence numbers is displayed at (650). In this case, the source data unit mapped to position 0 according to the hash function is a source data unit with sequence number 0, the source data unit mapped to position 1 has sequence number 6, the source data unit mapped to position 2 has sequence number 3, the source data unit mapped to position 3 has sequence number 5, the source data unit mapped to position 4 has sequence number 7, the source data unit mapped to position 5 has sequence number 4, the source data unit mapped to position 6 has sequence number 2, and the source data unit mapped to position 7 has sequence number 1.

For this fourth variant, the source identification data is sufficient to distinguish, identify and determine the order of the K source data units, i.e. this is an example of a source identification method that implements the method of determining the order. The additional size of the source identification data of this fourth variant exceeds the size of the source identification data in the third variant by K × log (K). For example, if K is 1,000, then the additional source identification data size beyond the third variant size is 1250 bytes, and if K is 1,000,000, then the additional size is 2,500,000 bytes. In some implementations of this fourth variant, the portion of the source identification data corresponding to the third variant may be sent to a receiver that allows for determining the location of the source data units, which may be used to FEC decode the source block (where the source data units within the source block are ordered according to their location), and a complementary portion of the source identification data that allows for determining the sequential order of the source data units is included within the source block that can be recovered using the FEC decoding method. For example, referring to fig. 6A and 6B, source identification data, which may include an h value (610), a K value (620), a hash function description (630), and a bit vector (640), is communicated to a receiver to determine the location of a source data unit in a source block. This source identification data may be in the form of a source symbol map (601). The information contained in the source symbol map may be parsed (602) to extract the data outlined in fig. 6A, except that the position-to-sequence map (650) is not included. (this is also illustrated at 603.) source symbol mapping may then be used in a method of determining position location, involving several hash calculations and collision resolution (605) of the received source data unit. In one embodiment, the received source data unit may be in the form of a source packet (604). The received symbols may be verified and associated (606) with positioning information before being fed to the FEC decoder (607). The position-to-sequence mapping data (650) may include sequential source identification data (608) included in the source block, available after FEC decoding, and used to determine the recovered source data units. Fig. 7 shows an example of position-to-sequence mapping data (650) included in a source block comprising a number of source data units (510 (1-8)). In fig. 7, the position-to-sequence mapping data (650) is appended to the end of the source data units (510(1-8)) included in the source block, where the order of the source data units in the source block is determined by the position information.

As another variation, the source identification data may be further augmented to include the size of each source data unit. This variant is generally not used if all active data units are the same or approximately the same size. This variant allows the position of source data units of variable size to be determined more accurately.

FIG. 8A is an example of another embodiment of source identification data similar to the example in FIG. 5B. One difference between the example shown in fig. 5B and the example shown in fig. 8A is that the bit vector in fig. 8A includes two bits of information instead of just one bit of information. This additional data may be used to convey more detail at a given level about any conflicts or non-conflicts with hash operations. For example, in FIG. 8A, "11" is used to indicate that no hash value is mapped to a particular location in the bit vector; "00" is used to indicate that a particular location in the bit vector maps to a unique source packet, while "10" is used to indicate that more than one source packet maps to a particular location. Those skilled in the art will recognize that other similar schemes can be used to convey information such as this.

Fig. 8B illustrates an example of source identification information formatted in a data structure (810). The data structure may then be used in various ways. Those skilled in the art will recognize that there are many possible formats for the source identification information. Any data structure for organizing the source identification information can be inserted into each packet, or spread across multiple packets, which can optionally be FEC protected, and then transmitted to the receiver.

In the data structure (810) embodiment illustrated in fig. 8B, the approximate size of each illustrated field is listed below the field in bytes. The size of the fields may also vary for different embodiments of the data structure. In fig. 8B, G denotes the number of symbols per packet. If G is equal to 0, this indicates a variable length packet. H represents the number of modulo hash attempts to create the source identification information. F is the initial hash function index. Check is the checksum of the source block. B is the hash location map bit allocation. L is the hash collision vector length for each iteration of the hash. In fig. 8B, for the first 2 levels, L is 2 bytes each, followed by 1 byte each. V stores the actual hash collision vector. M stores a hash location map. In fig. 8B, each value takes B bits. B1 is an optional field applicable only to G ═ 0. B1 stores bit allocations for variable numbers of block symbols. M1 is another optional field applicable only for G ═ 0. M1 stores the number of symbols in the packet.

Fig. 8B also illustrates an example of this general format using data from fig. 8A (820).

How a receiver uses source identification data

The protocol communicates the number of hashed packets K, K between the sender and the receiver₁，…，K_n}, collision vector { V, V₁，…，V_nH, and a hash function H, H₁，…，H_nIn which H is_nResulting in no collision, i.e., Vn is all 0's. In addition, the positions of all K source packets are conveyed along with their location in the vector { V, V₁，…，V_nThe mapping between the corresponding hash positions within the packet, and the number of symbols in each source packet.

As an example of how the receiver can determine the sequence number of the received source data, it is assumed that the receiver receives the fifth source data unit (510(5)) according to the example shown in fig. 5B. The receiver applies H to this source data unit to obtain a hash value E (520). The receiver will H₁Applied to E (520) to obtain 6 (530). Due to V₁Equal to 0(540) at position 6, so the receiver determines that E (520) is about H₁Conflict with some other hash value (520), thereby converting H₂Applied to E (520), thereby obtaining 3 (550). Due to V₂Equal to 1(560) at position 3, so the receiver determines E (520) as to H₂Uniquely mapped. The receiver can pass through the pair V₁The number of 1 s in (1) is counted and compared with V₂Middle to V₂Is determined by means of an addition of the number of 1's up to but not including a 1's in that position 3, wherein the result of the addition equals 5, whereby the position of the fifth source data unit is 5. Referring to fig. 6A, the receiver can determine the sequence number of the fifth source data unit by looking at position 5 of the position-to-sequence mapping (650), which position 5 equals 4, whereby the sequence number of the fifth data unit is 4. By using a similar procedure for each received source data unit, the receiver can determine the sequence of each received source data unitAnd since the sequence numbers run consecutively from 0 to K-1, this allows the receiver to determine the sequence numbers of all missing source data units.

One potential difficulty is the identification of a particular source block when source identification data is used to identify a portion of the source data rather than the entire source data, because the source packet itself may not contain this information and there may be no existing synchronization protocol for determining and communicating source block boundaries. One possible solution is to use the above-described source identification communication protocol over a wider range of source packets than the source block of interest, i.e. the generated source identification data may overlap with preceding and succeeding source blocks. An example of this solution is illustrated in fig. 9. The source block boundaries may be signaled and conveyed relative to their relative positions over the entire range of the source packets involved. The wider the range of overlap, the more robust the protocol becomes, but at the cost of additional bandwidth overhead and CPU load.

FIG. 10 illustrates an example of an embodiment that is similar to the example shown in FIG. 8B, but the structure of the source identification data has been modified to include a new field "O" that represents an offset from the reference. Offset from baseline as illustrated in fig. 9, the amount of overlap in the generated source identification data between source blocks can be determined. Such an offset may be used to determine an overlap with source data units of other contiguous source blocks at the beginning and end of a source block, respectively. Any packet received within this offset range (below the contiguous source block) can be discarded for determining the sequential order and location of the source data units within the source block covered by this source identification data. (other source identification data covering the contiguous source blocks to which the dropped packets belong can handle the determination as to how sequentially or positioned the packets within these other contiguous source blocks.)

Another potential problem is that two or more source packets within a single source block may be identical. These same packets will always have hash collisions regardless of the hash function applied to the packet. In the hash location mapping, both packet sequences will map to the same hash location. Fig. 11A illustrates this case.

A method of indicating which source packets are the same and communicating this piece of information to the receiving device may be devised. In this way, this approach reduces the problem of the receiver to that of just handling source packets that are all different from each other, in which case the protocol as described before can be applied. Assuming that there are two packets that are identical and that they do not collide with any other different packet in the hash function, the collision due to the same packet is not marked in the collision bit vector for communication. The same output value means that the same source packet exists. The following are examples of such methods. Instead of using a position-to-sequence mapping, such as that shown in fig. 6A (650), a sequence-to-position mapping is used instead. Each identical packet can then map to the same location regardless of its sequence number, and for each identical packet, the sequence-to-location map can list the same location to which it maps at its sequence number. For example, assume nine source data elements (1110) as shown in FIG. 11B, where the corresponding hash values from H are displayed (1120), which are the same hash values (520) as shown in FIG. 5B. Assume that the location determined by these hash values (1120) is the same as the location shown in FIG. 5B. The sequence-to-location map (1130) then indicates that the same source data units with sequence numbers 4 and 8 are both mapped to the source data unit at position 5. When using this method, it may be useful to indicate the number of source data units and the number of unique source data units in the sequence as part of the source identification data, e.g., these numbers are 9 and 8, respectively, in the example of fig. 11B.

Another potential problem is the rare situation that may be as follows: after each available hash function has been applied to a packet, different packets may still collide. This may occur due to a limited pool of hash function selections, or due to an insufficient amount of time allocated to resolve the conflict. In these cases, the last collision bit vector will have one or more bits marked as "final collision". The receiver simply discards packets whose hash collision cannot be resolved. This potential solution is illustrated in fig. 12.

Using source system mappings to identify source data provides a number of advantages. For example, the efficiency of the source system mapping size is reasonable because for some embodiments described for some K values ranging up to several hundred, they only require about an additional 4 to 12 bits of data for each source packet. In addition, multiple hash functions also help to reduce CPU load. Firstly, operating a low-complexity hash function; the higher complexity hash function then operates on much fewer packets. Another advantage is that no additional methods are required to resolve hash collisions. Furthermore, the source system mapping is flexible to hash function design or replacement.

Reliable transmission of source identification data

Depending on the network configuration, the source identification data may either be combined with the FEC repair data in a single channel or transmitted independently. Since this piece of information may be critical in the packet recovery and/or reordering process, it is preferable to apply a forward erasure correction technique to the source identification data. Note that the FEC applied to the source identification data does not necessarily have any relation to the FEC code on the original source stream. It may or may not be the same FEC code, and it may or may not be systematic. Furthermore, it may also be beneficial to spread out the transmission of these source identification data packets (and their FEC repair portions), since then the transmission will be more resilient to burst erasures. Another technique is to combine all or part of the source identification data with FEC repair data generated from the original source stream, if possible, to further reduce bandwidth overhead. FEC protection of the source identification data is still preferred in case the source identification data is not used together with the FEC repair data of the original data.

In one embodiment, the source symbol map can be transmitted with FEC data (such as data generated using chain reaction codes). The result is that the receiver can receive two streams-one with unmodified source data and a second with the FEC data (chain reaction repair data) and source symbol mapping data of the original stream. The source symbol mapped data can then have its own FEC data.

Fig. 13 shows an example of a method of generating FEC repair data from a source symbol map. In fig. 13, the source symbols are first mapped into a number of fragments (1310), which are then used as input to an appropriate FEC protection scheme (1320). The FEC protection scheme generates a set of repair data (1330), and these repair data (1330) can then be transmitted along with the original source symbol map. In one embodiment, sequence numbers are applied to the source symbol mapped fragments (1310) and repair data (1330), such that if there are J source symbol mapped fragments, the source symbol mapped fragments are assigned sequence numbers ranging from 0 to J-1, and repair data is assigned to sequence numbers beginning with J and proceeding from J.

Fig. 14 is an illustration of generating repair symbols from a source packet. In one embodiment, the repair symbols are generated using chain reaction codes. As referenced in fig. 14, a source packet may contain multiple source symbols, and the source packets each contain a different number of symbols.

Fig. 15 illustrates the structure of repair packet information in one embodiment, where a repair packet contains both FEC data and source identification data. In the embodiment shown in fig. 15, the source identification data is in the form of a source symbol map. In fig. 15, R denotes the number of FEC repair symbols in a packet. In rare cases, R may be 0 if there are no repair symbols in the packet. SBN denotes the source block number protected by this packet. In the embodiment shown in fig. 15, the SBN may range from 0 to 4095. PP is the protection period provided by the FEC scheme. In the embodiment illustrated in fig. 15, the unit of PP is 10 ms. Both SBN and PP can be used to enhance the synchronization of the streams. The ESI is the encoding symbol ID of the first FEC repair symbol in the packet. K is the number of source symbols in the source block. J is the SSM fragment sequence number. X is the number of original SSM fragments. Consecutive FEC repair symbols within the same packet have consecutive coded symbol IDs. The SSM segments may include either the original source identification data or FEC repair data for the original source identification data segments.

FIG. 16 illustrates one embodiment showing how fields from the data structure illustrated in FIG. 15 can be used to reconstruct a source system map. In fig. 16, field J, X, as well as the SSM fragment from repair packet (1610), are extended to show what SSM data may be carried in the FEC repair packet protecting the original source data in one embodiment.

Fig. 17 is a flow chart illustrating how a receiver can apply source system mapping to source symbols received by the receiver. SSM information is applied 1720 to the received source packet 1710. If the receiver cannot identify the unique location of the source packet or if the process outputs an invalid result (1730) for any reason, for example, because the source packet is not within a valid range of the location or sequence number of this source block, the source packet may be discarded (1740) from this source block. If the mapping is successful, the source symbol positions and/or sequential order can be extracted (1750) from the source symbol map and associated with the source symbols. The source symbols may then be passed to an FEC decoder (1760), such as a chain reaction decoder, for further processing, such as recovering the source blocks.

Transmission error handling

In many embodiments, the data stream transmitted from the encoder to the decoder is transmitted over the network in a series of packets. In many such networks, there are many potential errors that may occur in the transmission of packets. Packets may be lost, delivery of packets may be delayed, and packets may be duplicated during transmission, and the same packets may be received by the decoder at different times. In some embodiments, the decoder is time constrained and may time out the decoding of a given source block after a period of time. However, the nature of the network may result in packets not being delivered to the receiver until the source block of interest has timed out. The following are some examples of how the receiver can handle various packet errors. None of the examples below are intended to give an exhaustive list of how the receiver can handle various errors.

Fig. 18 illustrates a packet reception timeout where a lost packet spans two blocks. If the source packet may be associated with the wrong block, the two blocks may be flushed to the application. As used herein, flushing may mean not attempting to decode those blocks; the source packet is returned as is. Alternatively, flushing may mean simply dropping the source packet. The operation on block N +2 will return to normal reception and decoding. Sometimes, a timeout may be associated with a lost packet spanning more than two blocks. In such a case, the source packets in between may be stored in a temporary block and then eventually flushed to the application.

Fig. 19 illustrates early burst loss in a source block. Packets belonging to block N are flushed to the application. Block N +1 may be operated on with normal reception and decoding.

Fig. 20 and 21 illustrate delayed packets received either within the same block or within an extension block. In both cases, the delayed packet can be correctly identified within block N without special processing. The receiving and decoding operations proceed as usual for source block N and for source blocks after N.

Fig. 22 illustrates a delay packet spanning out of the spreading block. Block N will be decoded as usual. For block N +1, there are three possible situations that may occur on a delayed packet. A first possibility is that delayed packets are considered invalid. For example, the packet may not be properly hashed to any packet ID contained in the SSM. In this case, the packet is discarded and block N +1 is decoded as usual. A second possibility is that, according to SSM on block N +1, this delayed packet is assigned a packet ID, but this assigned packet ID coincides with the ID of another packet received for block N + 1. In this case, no attempt is made to decode block N +1, but block N +1 is flushed to the application. A third possibility is that, according to the SSM on block N +1, this delayed packet is assigned a packet ID and no other packet in block N +1 adopts this packet ID. In this scenario, decoding of block N +1 is attempted and the checksum of the recovered block is relied upon to invalidate this decoding attempt. This invalidation results in only the received source packet of block N +1 being returned to the application.

Fig. 23A/23B illustrate duplicate packets received within an extension block. In both cases, duplicate packets can be correctly identified within block N. The receiving and decoding operations can be easily performed as usual.

Fig. 24 illustrates duplicate packets received beyond the outside of the extension block. Block N will be decoded as usual. For block N +1, there are three possible scenarios that may occur on a duplicate packet. A first possibility is that duplicate packets are considered invalid. For example, a packet may not be able to properly hash to any packet ID contained in the SSM. In this case, the packet is discarded and block N +1 is decoded as usual. A second possibility is that, according to SSM on block N +1, this duplicate packet is assigned a packet ID, but this assigned packet ID coincides with the ID of another packet received for block N + 1. In this scenario, block N +1 is flushed to the application without attempting to decode block N + 1. A third possibility is that, according to the SSM on block N +1, this duplicate packet is assigned a packet ID and no other packet in block N +1 adopts this packet ID. In this scenario, decoding of block N +1 is attempted and the checksum of the recovered block is relied upon to invalidate this decoding attempt. This invalidation results in only the received source packet of block N +1 being returned to the application.

Case study

The following is a number of case studies showing the features of the described method on various systems. These are case studies for streaming applications, where FEC repair packets are added using FEC codes to protect the original stream, and where each SSM covers a small portion of the entire data stream. In all of these case studies, it was shown that,

each source data packet carries 7 MPEG-2TS units, where each MPEG-2TS unit is 188 bytes.

The symbol size is chosen to be 1320 bytes so that each source packet is a symbol.

Each repair packet carries one FEC repair symbol and other data such as all or part of the SSM.

SSM is either carried completely within each FEC repair packet or partially within the FEC repair packet.

Each SSM covers one source data block to which FEC protection is to be applied, and each SSM carries some overlap information about the contiguous source blocks in order to filter out packets from those source blocks.

In each case, a maximum packet size of 1500 bytes is desired in order to avoid packet fragmentation when sending packets over the network.

In each case, two values are derived: p_SSMAnd P_FEC。P_SSMIs the probability that not enough packets arrive for the source block to recover the SSM data, and P_FECIs the probability that not enough packets arrive for the source block to recover the source block, assuming an ideal FEC code. These probabilities are calculated assuming random and independent packet losses under the loss conditions declared for each case.

Case study 1

The > source data stream rate is 4 Mbps.

> SSM coverage period (═ protection period): 125ms

Loss condition: 0.1 percent of

Parameter derived:

number of packets in source block 48

The number of repair packets is 3

SSM consideration:

number of packets for SSM (5 packets overlap per end) 58

The number of hash functions is 5

Hash collision vector assignment ≈ 20 bytes

Hash location map allocation ≦ 58 bytes

The SSM length is less than 100 bytes, so the entire SSM can be carried in each FEC repair packet.

Failure rate:

·P_SSM＝(0.1％)³＝10^-9

·P_FEC＝2.4*10^-7

case study 2

The > source data stream rate is 4 Mbps.

> SSM coverage period (═ protection period): 250ms

Loss condition: 0.7 percent

Parameter derived:

the number of packets in the source block is 95

The number of repair packets is 9

SSM consideration:

number of packets for SSM (10 packets overlap per end) 115

The number of hash functions is 5

Hash collision vector assignment ≈ 40 bytes

Hash position map assignment ≈ 114 bytes

The entire SSM length ≈ 168 bytes

SSM divided into 2 fragments

Use FEC protection to generate another 9-2 ═ 7 repair SSM fragments

Each FEC repair packet carries an SSM fragment

Failure rate:

·P_SSM＝5.156*10^-17

·P_FEC＝4.052*10^-9

case study 3

The > source data stream rate is 6 Mbps.

> SSM coverage period (═ protection period): 500ms

Loss condition: 0.7 percent

Parameter derived:

the number of packets in the source block 285

The number of repair packets is 25

SSM consideration:

the number of packets for SSM (30 packets overlap per end) ═ 345

The number of hash functions is 7

Hash collision vector assignment ≈ 120 bytes

Hash position map assignment ≈ 400 bytes

The entire SSM length ≈ 538 bytes

SSM divided into 5 fragments

Use FEC protection to generate another 25-5-20 repair SSM fragments

Each FEC repair packet carries an SSM fragment

Failure rate:

·P_SSM＝6.879*10^-42

·P_FEC＝1.887*10^-15

FIG. 25 is a graph of failure rates for a 4Mbps stream for varying protection periods and amounts of protection under 0.7% loss conditions. It can be concluded from this data that the probability of decoding failure due to incomplete SSM is much less than the probability of decoding failure due to insufficient number of symbols received by FEC.

Multiple stream/retransmission embodiments

Fig. 26 shows an example of a communication system 2600 that is transmitting data to a set-top box using a mix of streams. The headend device 2610 sends the original program, which includes the original source data in the source stream, and two separate repair streams for the program, which may include FEC repair data and/or source identification data, to the network 2620. The network 2620 sends the original program to the intermediate server 2630. The intermediate server 2630 sends the source stream and one of the repair streams to the network 2650. Network 2650 may be the same network as network 2620, but it may also be a different network. In any case, network 2650 then transmits the source stream and the one repair stream to one or more STBs 2640. In case the data received by any STB is not sufficient to create the source data, the STB may request a retransmission stream. This request is sent to an intermediate server 2630, which intermediate server 2630 may respond to the request requiring retransmission by sending a second repair flow different from the first repair flow. In some embodiments, packets from the second repair flow are cached on the intermediate server. In certain embodiments, the intermediate server 2630 caches repair data that has not yet been sent to the STB. In certain embodiments, the intermediate server 2630 caches the repair data based on the protection period of the source stream. In some embodiments, the caching of the source or repair stream may be performed elsewhere in the system/network. In some embodiments, the STB may request a certain amount of repair data, such as a certain number of repair packets, in its retransmission request. In some embodiments, the STB may request repair data (source data or FEC repair data) based on source identification data that the STB has received. In some embodiments, the STB that initially receives the source data and/or repair data first receives the data cached at the intermediate server in a unicast stream, and then the intermediate server transfers the stream sent to the STB to a multicast stream. There are many variations of communication system 2600, including variations where the source stream and repair stream are transmitted as one logical stream, including variations where the set-top box is another type of receiving device, including variations where some receiving devices do not support the source identification data method, some receiving devices support the use of the source identification data method in conjunction with the FEC method, other receiving devices support the use of the source identification method in conjunction with the retransmission method, other receiving devices support the use of the source identification method in conjunction with both the retransmission method and the FEC method, and other receiving devices support the use of the source identification method in conjunction with the data reordering method, including variations where there are multiple transmitting devices, where the transmitting and/or generating device that includes the repair of the program is different from the transmitting and/or generating device of the original program, including variations where the transmitting device is at least sometimes the receiving device or where the receiving device is at least sometimes the transmitting device, including variations in the method of supporting source identification data for other reasons than the FEC method or the retransmission method, such as for supporting a data reordering method.

DVB-T/DVB-H embodiments

Source identification data transmitted in a data stream separate from the data stream containing the source data may be applied to various systems. For example, appendix a discloses such a system: wherein the FEC protection and the source identification data may be transmitted on a first data stream carried over a network such as a DVB-T broadcast network and the original source data may be transmitted on a second data stream transmitted over the same network. Appendix a is incorporated into the present disclosure for all purposes. In other embodiments, the FEC protection and feed identification data may be transmitted on a first data stream carried on a network such as a DVB-H broadcast network, while the original feed data may be transmitted on a second data stream transmitted across another network such as a DVB-T network.

Note that DVB-T networks and settings have been designed to transmit to fixed location receivers, and it would be of great benefit to extend the use of DVB-T networks to other types of receivers, such as mobile receivers or fixed location receivers in poor locations. For these types of new receivers, the error conditions present on existing DVB-T networks are too challenging to support relevant applications such as high quality streaming video. By using the methods described in this disclosure, the use of these networks can be extended to these new receivers to support applications that are too challenging to support by other means. These methods can be applied without the need for network upgrades, nor do existing receivers need to be upgraded to support these new receivers. By using the method described in the present disclosure, new receivers will be able to receive the original unmodified data streams and, in addition, also the source identification data and FEC repair data associated with these original data streams in order to recover and play out the video stream with the same quality as existing receivers, although the network conditions are much worse for these new receivers than for existing receivers.

It will be apparent to those skilled in the art upon reading this disclosure that the methods described herein may be extended naturally to support different FEC codes to maintain this property: at the receiver, data from more than one transmitted code can be combined in such a way as to provide greater error correction than if the data were processed independently according to the procedure associated with the respective code.

The foregoing description has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.

Claims

1. A method for generating and transmitting source identification data for source data, wherein the source data has been divided into K source data units, where K is greater than 1, wherein the source identification data can be used to identify the location of source data units relative to other source data units and to identify lost or corrupted source data units, the method comprising:

generating source identification data from the K source data units;

transmitting a first data stream to a receiver, wherein the first data stream comprises a set of the source data units; and

transmitting a second data stream to the receiver, wherein the second data stream includes the source identification data;

wherein the second data stream is different from the first data stream, wherein the source identification data is different from the set of source data units communicated on the first data stream, and wherein the source identification data is usable to identify a source data unit of the set of source data units communicated on the first data stream.

2. The method of claim 1, wherein the source identification data enables a recipient of the first and second data streams to determine identification tags of the K source data units used to generate the source identification data.

3. The method of claim 1, wherein the source identification data enables a recipient of the first and second data streams to determine an identification tag of any source data unit.

4. The method of claim 1, wherein the source identification data enables a recipient of the first and second data streams to determine an ordered position of a given source data unit relative to all of the source data units.

5. The method of claim 1, wherein the source identification data enables a recipient of the first and second data streams to determine a position of a given source data unit relative to a source-dependent ordered sequence of all of the source data units.

6. The method of claim 1, wherein the second data stream further includes repair data for the source data unit.

7. The method of claim 1, wherein the second data stream further comprises repair data for the source identification data.

8. The method of claim 1, wherein the first data stream is transmitted via DVB-T, and wherein the second data stream is transmitted via DVB-H.

9. The method of claim 1, further comprising:

receiving a request for additional data from the receiver; and

transmitting a third data stream to the receiver.

10. The method of claim 9, wherein the third data stream includes a second set of source data units.

11. The method of claim 9, wherein the third data stream includes repair data for the source data unit.

12. The method of claim 1, wherein generating source identification data from the K source data units comprises:

creating an ordered set S from each of the K source data units;

i) hash function H_iApplied to each member of S, where H_iMapping the members of S to one of the possible L hash outputs, and wherein H_iMapping at least X members of S to a unique hash output;

ii) creating a vector corresponding to the L hash outputs, wherein the vector indicates which of the L hash outputs encountered a collision and which of the L hash outputs uniquely mapped to a member of S;

iii) creating a set S_iIncluding unmapping of SMembership to a unique hash output;

iv) iterating steps i-iii until either S_iIs an empty set, or has been performed a maximum number of iterations, wherein for each iteration of steps i-iii, S_iIs used as S, where for each iteration H_iIs a hash function different from the hash function used during the previous iteration of steps i-iii, wherein the values of L and X can change for each iteration; and

source identification data is generated from the vectors created during each iteration of steps i-iii.

13. The method of claim 12, wherein the vector further indicates a location in S of each uniquely mapped member of S, the method further comprising:

a position-to-sequence mapping is generated from the positions of the uniquely mapped members in S indicated by the vectors created during each iteration of steps i-iii.

14. An apparatus for generating and transmitting source identification data for source data, wherein the source data has been divided into K source data units, where K is greater than 1, wherein the source identification data can be used to identify the location of source data units relative to other source data units and to identify lost or corrupted source data units, the apparatus comprising:

means for generating source identification data from the K source data units;

means for transmitting a first data stream to a receiver, wherein the first data stream comprises a set of the source data units; and

means for transmitting a second data stream to the receiver, wherein the second data stream includes the source identification data;

15. A method for receiving source identification data and associating the source identification data with source data, wherein the source data has been divided into K source data units, where K is greater than 1, wherein the source identification data can be used to identify the location of source data units relative to other source data units and to identify lost or damaged source data units, the method comprising:

receiving a plurality of source data units from a first data stream;

receiving source identification data from a second data stream, wherein the second data stream is different from the first data stream; and

associating the source identification data obtained from the set of packets in the second data stream with the plurality of source data units obtained from the set of packets in the first data stream, wherein the source identification data is usable to identify a source data unit of the plurality of source data units.

16. The method of claim 15, further comprising:

an identification tag of the received plurality of source data units is determined.

17. The method of claim 15, further comprising:

an identification tag of any source data unit that is not a member of the received plurality of source data units is determined.

18. The method of claim 15, further comprising:

determining an ordered position of a source data unit of the plurality of source data units relative to all of the source data units.

19. The method of claim 15, further comprising:

determining a position of a source data unit of the plurality of source data units relative to a source-determined ordered sequence of all of the source data units.

20. The method of claim 15, further comprising:

repair data for the plurality of source data units is received from the second data stream.

21. The method of claim 15, further comprising:

repair data for the source identification data is received from the second data stream.

22. The method of claim 15, wherein the first data stream is received via DVB-T, and wherein the second data stream is received via DVB-H.

23. The method of claim 15, further comprising:

analyzing the source identification data and the plurality of source data units to determine if any of the K source data units have not been received; and

additional data is requested if any of the K source data units are not received.

24. The method of claim 23, further comprising:

the additional data is received from a third data stream.

25. The method of claim 15, wherein associating the source identification data with the plurality of source data units comprises:

creating a set S from each of the plurality of source data units;

i) hash function H_iApplied to each member of S to obtain a hash output associated with each member of S, where H_iMapping each member of S to one of the possible L hash outputs;

ii) determining which of the hash outputs uniquely identifies the source data unit associated with the hash output;

iii) if it is determined that the hash output does uniquely identify a source data unit, removing the uniquely identified source data unit from S and associating a proper subset of identification data from the source identification data with the uniquely identified source data unit; and

iv) if it is determined that one of the hash outputs is not unique, iterating steps i-iii until S is an empty set, or until a maximum number of iterations have been performed, wherein for each iteration H_iIs a hash function different from the hash function used during the previous iteration of steps i-iii, where the value of L is potentially different.

26. The method of claim 25, wherein H is applied for each iteration of steps i-iv_iIs determined from the source identification data.

27. The method of claim 25, wherein determining which of the hash outputs uniquely identifies a source data unit associated with the hash output comprises:

comparing each hash output to a vector from the source identification data, wherein the vector corresponds to the possible L hash outputs, and wherein the vector indicates which of the L hash outputs encountered a collision and which of the L hash outputs uniquely mapped to a member of S.

28. The method of claim 27, wherein the source identification data comprises a location-to-sequence mapping, and wherein associating a proper subset of identification data from the source identification data with the uniquely identified source data unit comprises associating location information from the location-to-sequence mapping with the uniquely identified source data unit.

29. The method of claim 25, further comprising:

passing a source data unit to a decoder if the source data unit is associated with a proper subset of identification data from the source identification data.

30. The method of claim 25, further comprising:

discarding a source data unit if the source data unit is not associated with a proper subset of identification data from the source identification data.

31. An apparatus for receiving source identification data and associating the source identification data with source data, wherein the source data has been divided into K source data units, where K is greater than 1, wherein the source identification data can be used to identify the location of source data units relative to other source data units and to identify lost or damaged source data units, the apparatus comprising:

means for receiving a plurality of source data units from a first data stream;

means for receiving source identification data from a second data stream, wherein the second data stream is different from the first data stream; and

means for associating the source identification data obtained from the set of packets in the second data stream with the plurality of source data units obtained from the set of packets in the first data stream, wherein the source identification data is usable to identify a source data unit of the plurality of source data units.