HK1232691B - Apparatus for generating and interpreting a data stream with segments having specified entry points - Google Patents
Description
The present invention relates to the transmission of data over error prone channels with fixed length data packages. It is especially suitable for perceptual audio coding.
Modern audio coding methods such as MPEG Layer 3, MPEG AAC or MPEG HE-AAC (MPEG = moving picture experts group, HE-AAC = high efficiency advanced audio coding) are capable of reducing the data rate of digital audio signals by exploiting some psycho-acoustical properties of the human ear. A block of a fixed number of audio samples, called a frame, is encoded to a compressed bit stream representation of this fixed time interval. The compressed audio frame is transformed back to an audio sample representation in the decoder. Since the difficulty of encoding an audio signal may vary between audio frames, the well-known bit reservoir technique allows exchanging bits between the frames. As a consequence, although the overall bit rate is constant, the length of the frames in the bit stream is variable. The encoded frame has a part with side information containing essential information for the decoder to interpret the compressed data, followed by the compressed spectral data.
For transmission, the compressed audio frame has to be embedded into a transport format such as the ADTS (audio data transport stream) or LOAS (low overhead audio stream) transport format for MPEG AAC. If there are errors in the transmission, the decoder can re-synchronize on the bit stream after the loss of one or more frames by means of sync-words. Since in modern audio codecs spectral data and parts of the side information are often entropy coded with code words of variable length, e.g. Huffman coding in MPEG AAC, a single bit error is often sufficient to force the decoder to discard the whole frame and to mute the output signal or use some error concealment technique, e.g. noise insertion, interpolation between intact frames, or a combination thereof. If longer regions of errors occur during the transmission, the decoder is still able to re-synchronize on the bit stream, but it does not have information about the number of frames that have been lost. In addition to the concealment of multiple frames, this can lead to an audible time shift in the audio played back by the decoder or to dropouts due to buffer over- or under-runs. Especially over error-prone channels, sophisticated error management is extremely important for keeping a high quality of the transmitted audio signal.
The invention is especially suited for the transmission over error prone channels with fixed length data segments. Because of the variable length of the frames, such as compressed audio frames, a new frame for a well-known transport format such as e.g. the already mentioned ADTS or LOAS formats usually starts at arbitrary positions of the fixed length data segment. Therefore, in case such a segment gets lost, which contains data of two consecutive frames, both frames will be corrupt and must be replaced by an error concealment strategy of the decoder. Document EP1021039 discloses a method of transmitting MPEG2 data stream from an encoder to a decoder.
In the following description, a data frame refers to a frame of data from e.g. an audio codec such as MPEG-4 High Efficiency AAC. This data frame can have varying length in bits, i.e. varying size. Furthermore, the data frame is divided into several data segments of constant size. There can be one or multiple constant size segments for every data frame. Within the data segments of constant size, data entities are present. These correspond to e.g. Huffman code-words representing e.g. spectral data of the encoded signal. The data segments contain several data entities. Some are complete data entities, referred to as interpretable data entities, and some are data entity fragments, which are in-complete data entities not interpretable on their own. Furthermore, in the following description, the transport protocol header or the information block, refers to elements that contain information to make a single data segment self-contained, i.e. the information describes the range of the e.g. audio spectrum a certain data segment covers, and where in the data segment the interpretable data entity begins, without depending on valid reception of another data segment.
The present invention provides a method for efficient transport of packaged data with variable length framing over error prone channels with fixed length data segments. In a preferred embodiment it is used for transmitting compressed audio data in form of audio frames of variable length, in which it comprises the following steps.
At an encoder: compressed audio data frames of arbitrary size are mapped into fixed size data segments for a transmission over an error prone channel; a transport protocol header or an information block is inserted at the beginning of each data segment; the transport protocol header or the information block contains information to be able to identify where in the data segment the interpretable data entity begins. In further embodiments, the information identifies the boundaries of a variable length audio data frame; the above transport protocol header information or information block can be coded in a very efficient manner down to a single byte. This is achieved by exploiting certain parameter inter-dependencies such that only cases with highest likelihood are coded.
At a decoder: a transport handler receives the segments and the information whether the transmission was successful or not; it strips off the transport protocol header or the information block and concatenates the data of each received frame, which is then passed to the decoder; for the case of data segment losses, the transport protocol header or the information block contains information to reconstruct the number of lost audio frames, which allows for a correct time synchronization; for the case of data segment losses, the transport protocol header or the information block contains information to make a single data segment self-contained, i.e. the information describes the range of the audio spectrum a certain data segment covers, without depending on valid reception of another data segment. If this information is passed to the decoder, it can apply partial concealment methods.
It is an object of the present invention to provide a concept for obtaining an improved audio quality even in situations of transmitting audio data over error prone channels.
In accordance with a first aspect of the invention, this object is achieved by an apparatus comprising a packetiser for packetising data from a data frame into a series of segments having a first segment and a second segment, where the second segment has interpretable data entities and has a data entity fragment, the data entity fragment including only a part of an interpretable data entity preceding an interpretable data entity. The apparatus comprises furthermore an information block adder for adding an information block associated with the second segment, the information block indicating an entry point into the second segment, the entry point indicating a start of the interpretable data entity following the data entity fragment.
In accordance with a second aspect of the invention, this object is achieved by an apparatus for interpreting a data stream having a series of segments with a first segment having an associated additional information block, the additional information block indicating a starting point of a data frame having interpretable data entities, and a second segment having an associated information block, the second segment following an erroneous segment and the information block indicating an entry point into the second segment, the entry point indicating a start of an interpretable data entity following a data entity fragment, the data entity fragment including only a part of an interpretable data entity preceding the interpretable data entity. The apparatus comprises an error detector for detecting the erroneous segment, an information block interpreter for interpreting the additional information block to extract information about the starting point of the data frame and for interpreting the information block to extract information about the entry point, and a frame re-constructor for reconstructing data of the data frame by collecting the data starting from the starting point of the data frame, by dropping the erroneous segment and the data entity fragment, by dropping the additional information block and the information block and by applying an error concealment operation for the dropped frame data.
In accordance with a third aspect of the invention, this object is achieved by a data stream comprising data organized in a series of segments. It comprises a first segment and a second segment having interpretable data entities and having a data entity fragment, the data entity fragment including only a part of an interpretable data entity preceding an interpretable data entity, an information block indicating an entry point into the second segment, the entry point indicating a start of the interpretable data entity following the data entity fragment.
In accordance with a fourth aspect of the invention, this object is achieved by a method for generating a data stream having a series of segments using data organized in subsequent data frames. It comprises the following steps: packetising data from a data frame into the series of segments having a first segment and a second segment, the second segment having interpretable data entities and having a data entity fragment, the data entity fragment including only a part of an interpretable data entity preceding an interpretable data entity and the step of adding an information block associated to the second segment, the information block indicating an entry point into the second segment, the entry point indicating a start of the interpretable data entity following the data entity fragment.
In accordance with a fifth aspect of the invention, this is achieved by a method for interpreting a data stream having a series of segments with a first segment having an associated additional information block, the additional information block indicating a starting point of a data frame having interpretable data entities, and a second segment having an associated information block, the second segment following an erroneous segment and the information block indicating an entry point into the second segment, the entry point indicating a start of an interpretable data entity following a data entity fragment, the data entity fragment including only a part of an interpretable data entity preceding the interpretable data entity. It comprises the following steps: detecting the erroneous segment, interpreting the additional information block to extract information about the starting point of the data frame and interpreting the information block to extract information about the entry point, reconstructing data of the data frame by collecting the data starting from the starting point of the data frame, dropping the erroneous segment and the data entity fragment, dropping the additional information block and the information block and applying an error concealment operation for the dropped frame data. The present invention also comprises a computer program for implementing the inventive methods.
In summary, the present invention defines a new, efficient transport format. It lowers the amount of lost data over an error prone channel significantly, and is especially suitable for transmitting compressed audio data. This is achieved by adding additional information to each segment that is transmitted over the error-prone channel and this information indicates especially entry points for resuming to interpret the data output. Preferably, these entry points are the first code words of a beginning scale factor band. The scale factor bands define scale values for a region in the spectral representation and contain spectral values of the frame encoded into code words, which are sorted in ascending order of their corresponding frequency values. The information about the entry point contains an offset into the data stream, where a new scale factor band starts. By choosing these entry points, the overhead is lowered, since less information has to be transmitted. Basically, other code words can also be taken, but then further information has to be transmitted about which code word in which scale factor band represents the entry point. In a very efficient coding the information blocks comprise only a single byte or very few bytes.
Preferred embodiments of the invention provide information about a data frame number by assigning different counter values to different data frames. By interpreting the counter values, the number of lost data frames can be identified. Thereby, the problem of wrong time-synchronisation is greatly reduced. In further embodiments of the invention a re-ordering of the data is done, which has the advantage that the most important information like the Side Info data, which is essential to re-construct the whole frame (see also below at Fig. 7 ), is located in a single segment and hence decreases the likelihood of losing a whole frame.
In the example of data frames representing compressed audio frames, well-known concealment procedures are to interpolate the data between intact audio frames, to replace the erroneous part by a noise signal, or simply to mute the output. The concrete choice depends on the situation, e.g. whether a noise replacement is tolerable or whether enough resources are available to perform a sophisticated interpolation algorithm. The most significant advantage of embodiments of the present invention is that, in the best case, an erroneous segment results only in a loss of the data transmitted in this segment and all remaining data of the frame can be decoded correctly. The invention is disclosed in the appended claims.
The present invention will now be described by way of illustrated examples. Features of the invention will be more readily appreciated and better understood by reference to the following detailed description, which should be considered with reference to the accompanying drawings, in which:
- Fig. 1 illustrates the data segments with fixed length and the compressed audio frames with variable length;
- Fig. 2a illustrates the block diagram of a data transmission over the transmission channel with fixed length data segments using the present invention;
- Fig. 2b illustrates the block diagram of a complete audio encoding and decoding chain, including the transmission over the transmission channel with fixed length data segments using the present invention;
- Figs. 3a-3c illustrate an example of how information blocks are distributed over multiple data segments, e.g. how one raw audio frame plus the transport information is distributed over multiple data segments, according to the present invention;
- Fig. 4 illustrates the advanced concealment mechanism;
- Fig. 5 illustrates two subsequent segments with information blocks and the data entities;
- Fig. 6 shows an information block comprising eight bits; and
- Fig. 7 gives a schematic view of an encoded audio frame.
The below-described embodiments are merely illustrative for the principles of the present invention for improvement of transmitting for example compressed audio over error prone channels with fixed length data segments. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, not to be limited by the specific details presented by way of the description and explanation of embodiments herein.
A purpose of the information blocks is to provide an indication of a next possible entry point together with an offset pointing to the position belonging to the signaled entry point. This allows data to be extracted, e.g. by a decoder that decodes spectral data of an audio frame, even if a previous segment has been corrupted by an erroneous transmission. In Fig. 3a, a pointer 305 gives an example. At the entry points a new interpretable data entity starts. In the example where the data stream comprises a stream of compressed audio frames and the spectral data is coded with code words of variable length, this would require signaling the offset from the start of the segment to the next possible entry point with a precision of one bit. This increases the number of positions to be signaled. However, the present invention teaches that it is not necessary to consider signaling all possible combinations of entry point identifications and entry point offsets. In order to keep a low overhead, it is also possible to signal only a subset comprising e.g. the most probable values, which still results in a reduction of the number of frames that need to be concealed completely, and hence the perceived audio quality is improved compared to prior art methods.
For the case of transmitting compressed audio data, possible entry points are basically any beginning of a new code word. But to keep the overhead as small as possible, in a preferred embodiment, the entry points will be as mentioned above a beginning of a scale factor band and the information blocks will provide the information about the scale factor band. If the main issue is to provide a maximum in data security, and the size of a bigger overhead is tolerable, the information blocks can also indicate multiple entry points, which not necessarily coincide with the beginning of a scale factor band.
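As a rough illustration of how an encoder might pick one entry point per segment, the following Python sketch takes the bit positions at which scale factor bands start in the spectral data and reports, for every segment, the first band that begins inside it. The function name, the 160-bit payload size and the flat bit-position model (no side info, no re-ordering of data) are assumptions made for this sketch only.

```python
from typing import Dict, List, Tuple


def first_entry_points(band_starts: List[int],
                       payload_bits: int = 160) -> Dict[int, Tuple[int, int]]:
    """Map segment index -> (scale factor band index, bit offset in segment).

    band_starts holds the bit position at which each scale factor band
    begins, in ascending order (hypothetical input format).
    Segments in which no band starts get no entry and would need an
    escape value or an empty information block.
    """
    entries: Dict[int, Tuple[int, int]] = {}
    for band, start in enumerate(band_starts):
        seg, offset = divmod(start, payload_bits)
        # Keep only the first band that begins inside each segment.
        entries.setdefault(seg, (band, offset))
    return entries
```

Signaling only the first band start per segment keeps the information block small; signaling several entry points per segment would need a larger block.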
The present invention teaches that when the essential data in a data frame, required to be able to decode the rest of the data in a data frame, is stored at a beginning of that frame, this data should be put, in a preferred embodiment, at the beginning of a new data segment. For compressed audio data, for example, this is the case, i.e. information necessary to reconstruct the audio frame is stored at a beginning of a frame (see Fig. 7 below). Storing the essential data at the beginning of a new data segment ensures that the decoder does not have to conceal two consecutive data frames in case of a single segment loss, as will be clear from the following example.
According to a preferred embodiment of the present invention, data of a data frame, sorted in an order X0,X1..Xm, starts with a new segment Seg#0 comprising data X0...Xi (being the more important data needed to be able to decode the rest of the data in the data frame), and subsequent data are stored in the following order: Seg#-1 comprises the data Xi+1...Xj, Seg#1 comprises the data Xj+1...Xk, Seg#2 comprises the data Xk+1...Xl and Seg#3 comprises the remaining data Xl+1...Xm (cf. Fig. 3a). This re-ordering avoids the risk of having to conceal two consecutive data frames in case of a single segment loss. Without it, the first data of the data frame, which comprises essential information about the data in the data frame, would be stored in Seg#-1, so that if Seg#-1 in Fig. 3a were damaged, the following segments Seg#0 to Seg#3 could not be decoded correctly.
To distinguish between data segments comprising the start of a new data frame (Seg#0) and succeeding segments comprising additional parts of the data frame (Seg#1-#3), the different segment types are signaled, e.g. in the information block 302 and 304, respectively. Since the beginning of the data frame (i.e. the essential information) was put in seg#0 in Fig. 3a , the seg# -1 needs to be filled with data following the essential data from the data frame. Hence, the rest of the incomplete previous segment Seg#-1 that has been left over by a previous data frame is filled up with parts of the bit stream data (Xi+1...Xj) of the current data frame. An offset pointer 303 in Fig. 3a , contained in the information block 302 of the first segment of the data frame, points to the start of this data in the previous segment Seg#-1. A concrete embodiment of the transmission of compressed audio frames of an aacPlus bitstream over data segments with a fixed length and information blocks comprising eight bits is given below.
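The re-ordering can be made concrete with a small receiver-side sketch. It assumes error-free reception, byte-aligned data, and a backward offset that counts how many bytes at the tail of the previous segment belong to the current frame; the `Segment` type and `reassemble` function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    is_start: bool    # True for Seg#0 of a frame
    back_offset: int  # start segments only: bytes of this frame at the tail of the previous segment
    payload: bytes


def reassemble(segs: List[Segment]) -> List[bytes]:
    """Restore frames in their original order X0..Xm.

    A frame is emitted only once the next start segment is known,
    because that segment's back_offset marks where the frame ends.
    """
    frames = []
    starts = [i for i, s in enumerate(segs) if s.is_start]
    for i, nxt in zip(starts, starts[1:]):
        body = [segs[k].payload for k in range(i, nxt)]
        cut = segs[nxt].back_offset
        if cut:
            # Tail of the last segment already belongs to the next frame.
            body[-1] = body[-1][:-cut]
        back = segs[i].back_offset
        # Data stored in the left-over space of Seg#-1 (Xi+1..Xj).
        tail = segs[i - 1].payload[-back:] if back and i > 0 else b""
        frames.append(body[0] + tail + b"".join(body[1:]))
    return frames
```

The essential data in Seg#0 comes first in the restored frame, followed by the back-filled data and then the remaining segments, mirroring the order X0...Xi, Xi+1...Xj, Xj+1...Xm.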
In another embodiment, the information block of Seg#0 has a frame counter value that is increased with every new data frame. This mechanism allows for a re-synchronization in case more segments get lost. The information blocks for the other segments not belonging to the start of the data frame, as e.g. 304, are different from the first segment information block 302.
Summarizing, in a preferred embodiment of the present invention the transmitted data is compressed audio data, and Figs. 3a-3c show one audio frame embedded together with transport information according to the present invention into the fixed segment length transmission channel. In each segment a small amount of transport information precedes the raw audio data stored in this segment. In the invention an audio frame always starts with a new segment Seg #0, avoiding the risk of having to conceal two consecutive audio frames in case of a single segment loss. With the transport information 302 and 304 it is possible to distinguish between data segments containing the start of a new audio frame (Seg #0) and succeeding segments containing additional parts of the exemplary compressed audio frame (Seg #1-#3). The distinction is made by signaling the segment type in the transport information 302 and 304, respectively (the "0" or "1" values in Figs. 3b, 3c). The rest of the incomplete previous segment Seg #-1 that has been left over by the previous audio frame is filled up with parts of the bitstream data of the current frame. An offset 303 in Fig. 3a, contained in the transport information 302 of the first segment of an audio frame, points to the start (Xa in Fig. 3c) of this data in the previous segment Seg #-1. In addition, in the transport information of the segment with the start of the audio frame, there is a small frame counter (f# in Fig. 3c) that is increased with every new audio frame. This mechanism allows for an immediate re-synchronization in case segments get lost. Because of the frame counter f#, the number of lost audio frames is always known and the problem of wrong time-synchronisation is greatly reduced. The transport information 304 for the other segments not belonging to the start of the audio frame is different from the first segment transport information 302. An indication ("I" in Fig. 3b) of the next possible entry point and an offset (Xb in Fig. 3b) pointing to the position belonging to the signaled entry point allow the decoder to continue decoding the spectral data even if the previous segment has been corrupted by the erroneous transmission. There might be cases where the spectral data is coded with code words of variable length. This would require signaling the offset from the start of the segment to the next possible entry point bit-exactly, which increases the number of positions to be signaled. It is not necessary to consider signaling all possible combinations of entry point identification and entry point offset. In order to keep a low overhead, it is also possible to signal only a subset comprising the most probable values, which still results in a reduction of the number of frames that need to be concealed completely and hence improves the perceived audio quality.
In Fig. 4 the advantage of partial concealment is illustrated. It shows a spectral representation of three consecutive data frames as for example audio frames: a data frame 401, a data frame 402 and a data frame 403. In this example, a data segment in the data frame 402 is lost because of an erroneous transmission, while the previous data frame 401 as well as the next data frame 403 are error-free. Usually, either the whole data frame 402 is lost or in the best case all spectral data after the position in the spectrum corresponding to the lost data segment is not available and has to be estimated. According to the present invention, the additional information about possible entry points for extracting of data as e.g. the decoding of spectral data allows to skip the corrupt segment e.g. during decoding, losing only a small part of the data (e.g. spectral data). With help of the known data (e.g. spectral data) of the previous data frame 401 and the following data frame 403, a replacement for the missing part of the spectral data has to be calculated by an error concealment algorithm. In the example of data frames representing compressed audio frames, well-known procedures are concealments by interpolating the data between intact audio frames or to replace the erroneous part by a noise signal or simply to mute the output. The concrete choice depends on the situation, e.g. whether a noise replacement is tolerable or whether enough resources are available to perform a sophisticated interpolation algorithm.
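A minimal sketch of such a partial concealment, assuming the simplest of the options named above (linear interpolation between the intact neighbouring frames): only the spectral lines covered by the lost segment are replaced. The function name and the list-of-floats representation of a spectrum are assumptions of this example, not a prescribed algorithm.

```python
from typing import List, Optional, Sequence


def conceal_partial(prev: Sequence[float],
                    cur: Sequence[Optional[float]],
                    nxt: Sequence[float]) -> List[float]:
    """Fill missing spectral lines (None) of the current frame.

    Each missing line is replaced by the mean of the same line in the
    previous and the following intact frame; received lines are kept.
    """
    out = []
    for k, value in enumerate(cur):
        if value is None:
            out.append(0.5 * (prev[k] + nxt[k]))  # interpolated replacement
        else:
            out.append(value)
    return out
```

Noise substitution or muting of the affected lines would fit the same interface; only the replacement expression changes.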
In summary, the present invention defines a new, efficient transport format. It significantly lowers the amount of lost data over an error prone channel and is especially suitable for transmitting compressed audio data. This is achieved by adding additional information to each segment that is transmitted over the error-prone channel; this information indicates, in particular, entry points for resuming the interpretation of the data. Preferably, these entry points are the first code words of a beginning scale factor band. The scale factor bands define scale values for a region in the spectral representation and contain spectral values of the frame encoded into code words, which are sorted, the order of the code words being given by the order of the spectral values, starting from a lowest value followed by subsequent higher values. The information about the entry point gives the bit of the data stream where a new scale factor band starts, and which scale factor band it is. By choosing these entry points, the overhead is lowered, since less information has to be transmitted. Basically, other code words can also be taken, but then further information has to be transmitted in order to identify the code word within the scale factor band. In a very efficient coding the information blocks comprise only a single byte or very few bytes. With the low overhead, it may not be possible to indicate all entry points, or only certain positions of entry points can be indicated. E.g. if the number of bits of the information block is small, only positions in a part of a segment can be indicated. In cases where no entry point can be given, the information block remains empty or an escape value is given.
Embodiments of the invention furthermore provide information about a data frame number by assigning different counter values to different data frames. By interpreting the counter values, the number of lost data frames can be identified. Thereby, the problem of wrong time-synchronisation is greatly reduced. In further embodiments of the invention a re-ordering of the data is done, which has the advantage that the most important information like the Side Info data, which is essential to re-construct the whole frame, is located in a single segment and hence decreases the likelihood of losing a whole frame.
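How the counter values yield the number of lost frames can be sketched as follows. The sketch assumes a counter that increases by one per frame and wraps around at a fixed modulus; the default of 6 matches the aacPlus example given later in the description, and the function name is hypothetical.

```python
def lost_frames(last_good: int, current: int, modulus: int = 6) -> int:
    """Number of frames missing between two received frame counters.

    The result is only unambiguous while fewer than `modulus` frames
    are lost in a row; longer gaps alias to a smaller count.
    """
    return (current - last_good - 1) % modulus
```

A larger counter range would resolve longer gaps at the cost of more bits in the information block.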
In further embodiments, the information blocks comprise additional redundancy information in order to identify erroneous segments after the transmission. This can be, e.g., a CRC, parity bits, etc. This error detection is in addition to the usual error detection mechanisms of the underlying transport protocol, e.g. ADTS or LOAS. In addition, in preferred embodiments the size of the information blocks, as measured in bits, is fixed for all information blocks. Since the segment size is also fixed in preferred embodiments, this means that the data stored in each segment also has a fixed size.
In the example of data frames representing compressed audio frames, well-known procedures are concealments by interpolating the data between intact audio frames or to replace the erroneous part by a noise signal or simply to mute the output. The concrete choice depends on the situation, e.g. whether a noise replacement is tolerable or whether enough resources are available to perform a sophisticated interpolation algorithm. By interpreting the counter values of intact frames, multiple erroneous frames can be identified and an error concealment for the multiple erroneous frames can be applied. The error concealment can be performed either for the compressed audio data, e.g. by replacing the corresponding code words, or after decoding by replacing the erroneous parts of the corresponding audio signals.
The most significant advantage of embodiments of the present invention is that, in the best case, an erroneous segment results only in a loss of the data transmitted in this segment and all remaining data of the frame can be reconstructed by employing an error concealment.
In other embodiments, the size of the segments can be a multiple of the segment size of the underlying transport protocol. This alternative embodiment has the advantage that the overhead due to the information blocks is less than for a segment size equal to the segment size of the underlying transport protocol. It has, however, the disadvantage of a possible loss of more data.
To further clarify the above-described invention, the transmission of compressed audio frames of an aacPlus bitstream over data segments with a fixed length is described in detail as a further embodiment. In the example the length of a data segment is 168 bits and a new segment arrives every 20 ms. Thus the overall data rate is 168 bits/20 ms = 8400 bit/s. Every 20 ms a segment starts with a one-byte information block. An aacPlus audio frame always starts right after the information block with the aacPlus Side Info data (including the side info data needed to decode the AAC spectral data). The aacPlus side info data is followed by the AAC spectral data. The spectral data is ordered from spectral line 0 up to the maximum spectral line.
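The numbers of this example can be checked with a few lines of Python. The 160-bit payload follows from subtracting the one-byte information block from the 168-bit segment; the constant and function names are chosen for this sketch only.

```python
SEGMENT_BITS = 168      # fixed segment length from the example
HEADER_BITS = 8         # one-byte information block
SEGMENT_PERIOD_MS = 20  # one segment every 20 ms


def bit_rate(bits_per_segment: int, period_ms: int = SEGMENT_PERIOD_MS) -> int:
    """Bit rate in bit/s for a given number of bits sent per segment period."""
    return bits_per_segment * 1000 // period_ms


gross_rate = bit_rate(SEGMENT_BITS)                # channel rate including headers
net_rate = bit_rate(SEGMENT_BITS - HEADER_BITS)    # rate left for audio payload
overhead = HEADER_BITS / SEGMENT_BITS              # share spent on information blocks
```

With these values the information block costs 8 of every 168 bits, i.e. 1/21 of the channel capacity.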
If a 20 ms segment comprising the aacPlus side info was lost, the entire audio frame would need to be concealed by the aacPlus decoder. If however one of the 20 ms segments not comprising the aacPlus Side Info data is lost, only parts of the spectrum would have to be concealed. This is possible because the information block includes information to specify the part of the spectrum that is covered by that 20 ms segment.
The structure of a data segment is shown in Table 1, and Table 2 shows the structure of an information block. The solution is described by means of pseudo code:
Table 1 - Structure of one 20 ms segment
| Syntax | Nbits | Notes |
| segment() { | | |
| transport_header() | 8 | |
| raw_payload() | 160 | |
| } | | |
Table 2 - transport_header()
| Syntax | Nbits | Notes |
| if (audio_frame_start) { | | |
| framecnt_offset_code | 7 | |
| } | | |
| else { | | |
| scfb_offset_code[seg] | 7 | The choice of the code table is dependent on the segment, counted from the first segment of the current frame |
| } | | |
The expressions in the Tables comprise the following information.
raw_payload() contains raw aacPlus audio payload data. The de-multiplexer shall concatenate the raw payload chunks belonging to one audio frame and pass on the complete raw audio frame to the aacPlus decoder.
transport_header() contains all information needed for the de-multiplexer to identify audio frame boundaries and in case of transmission errors the number of missing audio frames and the parts of the missing spectrum. Information on the missing data shall be passed on to the decoder in order to steer the advanced concealment algorithm.
audio_frame_start is a flag to indicate the start boundary of an aacPlus audio frame, i.e. if this value is, for example, true, it represents an information block for Seg.#0 (see Fig. 3a), and if this value is, for example, false, the information block belongs to one of the remaining segments of the data frame.
framecnt_offset_code is a code that combines an aacPlus frame counter value framecnt, ranging from 0 to 5, and an offset value ranging from 0 to 20. The code, which is added, for example, to the information block 302, is calculated by the following formula:
With the above-mentioned ranges for framecnt and for the offset, the code has 126 possible values (6 × 21), which can be encoded by the seven bits assigned to framecnt_offset_code in the information blocks. The aacPlus audio frame sequence counter value allows the number of missed audio frames to be specified. It is increased by one for each audio frame. The audio frame counter framecnt wraps around at a value of 6, i.e. its maximum value is 5. The offset value points to the spectral data content of the previous 20 ms segment. It points in the backward direction with a value given in bytes; an offset value of 0 indicates that the previous 20 ms segment did not contain any spectral data belonging to this audio frame.
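As an illustration only, the following Python sketch shows one mapping that is consistent with the stated ranges and the 126 possible values. The mapping `framecnt * 21 + offset` is an assumption and is not taken from the text above; the function names are hypothetical.

```python
FRAMECNT_VALUES = 6   # framecnt ranges over 0..5
OFFSET_VALUES = 21    # offset ranges over 0..20


def pack_framecnt_offset(framecnt: int, offset: int) -> int:
    """Combine frame counter and backward offset into one 7-bit code."""
    if not 0 <= framecnt < FRAMECNT_VALUES:
        raise ValueError("framecnt out of range")
    if not 0 <= offset < OFFSET_VALUES:
        raise ValueError("offset out of range")
    # Assumed mapping: any bijection onto 0..125 would fit in seven bits.
    return framecnt * OFFSET_VALUES + offset


def unpack_framecnt_offset(code: int) -> tuple[int, int]:
    """Recover (framecnt, offset) from a code built by the assumed mapping."""
    if not 0 <= code < FRAMECNT_VALUES * OFFSET_VALUES:
        raise ValueError("code out of range")
    return divmod(code, OFFSET_VALUES)


def missing_frames(prev_framecnt: int, framecnt: int) -> int:
    """Frames lost between two received frame starts, counted modulo 6."""
    return (framecnt - prev_framecnt - 1) % FRAMECNT_VALUES
```

Because the counter wraps around at 6, a count obtained this way is only unambiguous for up to five consecutively lost audio frames.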
scfb_offset_code[seg] values, which are added, for example, to the information block 304, are specified by code lookup tables that combine a scale factor band index, indicating the start scale factor band of the succeeding spectral data, with an offset pointer to the spectral data content of the current segment. The code lookup tables depend on the number of the segment following an audio frame start segment. The code refers to the spectral data contained in the same data segment. The offset points in the forward direction with a value given in bits; an offset of 0 indicates that no offset is present. If the combination of start scale factor band index and offset value cannot be coded because it is not contained in the lookup tables, an escape value is used to indicate that the current data segment cannot be decoded and the corresponding spectrum range needs to be concealed.
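The lookup with an escape value can be sketched as follows in Python. The table contents, the escape value 127 and the function names are purely hypothetical placeholders; the actual code tables are not given in the text.

```python
ESCAPE_CODE = 127  # hypothetical escape value

# Hypothetical tables, one per segment index after the frame start segment.
# Each maps (start scale factor band, forward offset in bits) to a 7-bit code.
SCFB_OFFSET_TABLES = {
    1: {(0, 0): 0, (1, 3): 1, (2, 5): 2},
    2: {(8, 0): 0, (9, 4): 1, (10, 2): 2},
}


def encode_scfb_offset(seg: int, scfb: int, offset_bits: int) -> int:
    """Return the table code, or the escape code if the pair is not codable."""
    table = SCFB_OFFSET_TABLES.get(seg, {})
    return table.get((scfb, offset_bits), ESCAPE_CODE)


def decode_scfb_offset(seg: int, code: int):
    """Return (scfb, offset_bits), or None if the segment must be concealed."""
    if code == ESCAPE_CODE:
        return None
    for pair, value in SCFB_OFFSET_TABLES.get(seg, {}).items():
        if value == code:
            return pair
    return None
```

A return value of `None` corresponds to the case in which the current data segment cannot be decoded on its own and its spectrum range is concealed.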
For the preferred embodiment of the transmission of compressed audio data, the invention can be summarized as follows.
The invention provides a method for storage or transmission of data with the following steps. Data frames of variable frame size coming from a continuously sending source are packaged into segments; all segments have the same size and are, on average or always, smaller than or equal in size to the data frames. Then, all segments carry information to signal the beginning of a frame and use additional information to signal that a previous segment contains a part of the current frame. The information about erroneous segments is either given by an underlying transport or storage mechanism or ensured by adding redundancy to the segments, e.g. CRC, parity bits, etc.
In addition, further information about the timing or replay order of the frames, e.g. a sequence number which wraps around, can be given.
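Detection of erroneous segments by added redundancy can be sketched in Python as follows. The use of CRC-32 from the standard `zlib` module and the function names are assumptions chosen for brevity; for a 168-bit segment a much shorter checksum would be more realistic.

```python
import zlib

CRC_BYTES = 4  # CRC-32, an assumed choice of redundancy


def protect(segment: bytes) -> bytes:
    """Append a CRC-32 of the segment so that errors can be detected."""
    return segment + zlib.crc32(segment).to_bytes(CRC_BYTES, "big")


def check(protected: bytes):
    """Return the segment if its CRC matches, otherwise None."""
    segment, received = protected[:-CRC_BYTES], protected[-CRC_BYTES:]
    expected = zlib.crc32(segment).to_bytes(CRC_BYTES, "big")
    return segment if expected == received else None
```

A segment for which `check` returns `None` would be reported to the decoder as erroneous, in the same way as a loss signalled by the underlying transport.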
The most important information is preferably concentrated in a single or only a few bytes.
Segments which do not contain the beginning of a frame carry additional information, which guides the drain of the data stream in decoding the data in the current segment even if a segment was lost during transmission or storage.
To reduce transport overhead, this additional information is added only for the cases with the highest likelihood.
The additional information embedded during the process is coded for redundancy reduction e.g. using adaptive code tables, combining multiple symbols into a single codeword, using Huffman coding or similar.
The data source can be a transform-based audio codec, which may or may not use bandwidth extension.
The decoder can use the information about erroneous segments to apply concealment to the missing parts of the signal only.
The whole packaging method does not need any knowledge of the data to be transported; the added information is taken from the encoder and passed to the decoder.
Therefore, the present invention comprises a transport mechanism which allows compressed data with variable frame lengths to be packaged into fixed length data segments. It provides signalling means to apply partial concealment of an audio spectrum in case of transmission errors while adding only a very low transport overhead. It allows for a quick resynchronisation at the decoder in case of transmission errors with an accurate time alignment. It also adds safeguards against error propagation. The present invention does not demand changes in the raw compressed data format, such that a low complexity and "simple design" solution can be achieved.
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
Claims (11)
- A method for embedding a stream (102) of compressed audio frames (d1, d2, d3) into data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) of a data stream (500), comprising: embedding, by an encoder, the stream (102) of compressed audio frames (d1, d2, d3) into a series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) of the data stream (500), wherein each data segment in the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) comprises compressed audio payload data (X0 ... Xm) and an information block (302, 304, 306, 600), wherein at least one data segment (Seg#-1, Seg#3) in the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) is embedded with portions of two compressed audio frames (d1, d2, d3) in the stream (102) of compressed audio frames (d1, d2, d3), wherein the information block (302, 304, 306, 600) comprises audio frame boundary information (transport_header(), audio_frame_start, 610), wherein the audio frame boundary information (transport_header(), audio_frame_start, 610) identifies frame boundaries of a plurality of compressed audio frames (d1, d2, d3), wherein a plurality of data segments in the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) contains the plurality of compressed audio frames (d1, d2, d3); and transmitting, by the encoder, the data stream (500) comprising the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) to a decoding device, wherein the audio frame boundary information (transport_header(), audio_frame_start, 610) is stored in the information blocks (302, 304, 306, 600) of the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520), and wherein the audio frame boundary information (transport_header(), audio_frame_start, 610) is used by the decoding device to perform re-synchronization with time alignment at the decoding device in case of transmission errors.
- The method of Claim 1, wherein the audio frame boundary information comprises a flag to indicate a start boundary of an audio frame, wherein if the flag has a first state, it represents an information block for a data frame start segment and if the flag has a second state, the information block belongs to one of the remaining segments of the data frame.
- The method of Claim 1, wherein information about a data frame number is provided in an information block of a corresponding data segment with the start of a compressed audio frame by assigning different frame counter values to different audio frames and wherein a number of lost data frames is identifiable by interpreting the counter values.
- The method of Claim 1, wherein the series of compressed audio frames have compressed audio frames of variable lengths but of a same fixed time interval.
- The method of Claim 1, wherein the data stream (500) is transmitted to the decoding device at a specific data transmission rate.
- The method of Claim 1, wherein each compressed audio frame in the series of compressed audio frames represents a fixed number of audio samples.
- The method of Claim 1, wherein each compressed audio frame in the series of compressed audio frames comprises scale factor data.
- The method of Claim 1, wherein the series of compressed audio frames represents Moving Picture Experts Group (MPEG) frames.
- One or more computer readable storage media comprising a sequence of instructions, which when executed by a computer, cause performing the method as recited in any one of Claims 1-8.
- An apparatus for embedding a data stream (102) of compressed audio frames (d1, d2, d3) into data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) of a data stream (500), comprising: means for embedding the data stream (102) of compressed audio frames (d1, d2, d3) into a series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) of the data stream (500), wherein each data segment in the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) comprises compressed audio payload data (X0 ... Xm) and an information block (302, 304, 306, 600), wherein at least one data segment (Seg#-1, Seg#3) in the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) is embedded with portions of two compressed audio frames in the data stream (102) of compressed audio frames (d1, d2, d3); wherein the information block (302, 304, 306, 600) comprises audio frame boundary information (transport_header(), audio_frame_start, 610) for the data stream (500), wherein the audio frame boundary information (transport_header(), audio_frame_start, 610) identifies frame boundaries of a plurality of compressed audio frames, wherein a plurality of data segments in the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) contains the plurality of compressed audio frames (d1, d2, d3); and means for transmitting the data stream (500) comprising the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520) to a decoding device, wherein the audio frame boundary information (transport_header(), audio_frame_start, 610) for the data stream (500) is stored in the information blocks (302, 304, 306, 600) of the series of fixed length data segments (Seg#-1, Seg#0, Seg#1, Seg#2, Seg#3, 510, 520), and wherein the audio frame boundary information (transport_header(), audio_frame_start, 610) is used by the decoding device to perform re-synchronization with time alignment at the decoding device in case of transmission errors.
- A hardware device being configured to perform the method as recited in any one of Claims 1-8.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US750897P | 2005-12-16 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1232691A1 (en) | 2018-01-12 |
| HK1232691B (en) | 2022-11-18 |