GB2620582A - Method, device, and computer program for improving indexing of portions of encapsulated media data - Google Patents
Method, device, and computer program for improving indexing of portions of encapsulated media data
- Publication number
- GB2620582A GB2210189.3A GB202210189A
- Authority
- GB
- United Kingdom
- Prior art keywords
- byte
- byte range
- level value
- data
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/2353—Processing of additional data specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
- H04N21/2389—Multiplex stream processing, e.g. multiplex stream encrypting
- H04N21/26258—Content or additional data distribution scheduling for generating a list of items to be played back in a given order, e.g. playlist
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
- H04N21/8455—Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream
- H04N21/8456—Structuring of content by decomposing the content in the time domain, e.g. in time segments
- H04N21/85406—Content authoring involving a specific file format, e.g. MP4 format
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Library & Information Science (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for processing encapsulated media data comprising metadata and data associated with the metadata, the encapsulated media data comprising a plurality of segments and sub-segments. For a plurality of ordered byte ranges of one sub-segment, the byte ranges being defined in metadata descriptive of partial sub-segments of the sub-segment, one level value associated with each byte range within the metadata descriptive of partial sub-segments of the sub-segment is obtained. In addition, a feature type value indicating that the level values are representative of dependency levels is also obtained from the metadata descriptive of partial sub-segments of the sub-segment. Based on the level value associated with a given byte range to be processed, only the byte ranges that are required for processing the given byte range are selected, the selected byte ranges being byte ranges preceding the given byte range that are associated with lower level values.
Description
METHOD, DEVICE, AND COMPUTER PROGRAM FOR IMPROVING INDEXING OF
PORTIONS OF ENCAPSULATED MEDIA DATA
FIELD OF THE INVENTION
The present invention relates to a method, a device, and a computer program for improving the encapsulation and parsing of media data, making it possible to improve the indexing and transmission of portions of encapsulated media data so as to allow the reconstruction of a valid media file from those portions.
BACKGROUND OF THE INVENTION
The invention relates to encapsulating, parsing, and streaming media data, e.g. according to the ISO Base Media File Format (ISOBMFF) as defined by the MPEG standardization organization, to provide a flexible and extensible format that facilitates the interchange, management, editing, and presentation of groups of media data or bit-streams, and to improve their delivery, for example over an IP (Internet Protocol) network such as the Internet, using an adaptive HTTP (HyperText Transfer Protocol) streaming protocol.
The ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12) is a well-known flexible and extensible format that describes the encapsulation of timed media data or bit-streams either for local storage or for transmission via a network or another bit-stream delivery mechanism. The timed media data may represent encoded media data. This file format has several extensions, e.g. Part 15 (ISO/IEC 14496-15), which describes encapsulation tools for various NAL (Network Abstraction Layer) unit-based video encoding formats. Examples of such encoding formats are AVC (Advanced Video Coding), SVC (Scalable Video Coding), HEVC (High Efficiency Video Coding), L-HEVC (Layered HEVC), and VVC (Versatile Video Coding). Other examples of file format extensions are ISO/IEC 23090-18 for the carriage of Geometry-based Point Cloud Compression (G-PCC) data and ISO/IEC 23090-10 for the carriage of Visual Volumetric Video-based Coding (V3C) data. ISOBMFF is object-oriented: it is composed of building blocks called boxes (also denoted objects, atoms, or data structures), each identified by a four-character code, that are sequentially or hierarchically organized and that define descriptive parameters of the timed media data or bit-stream, such as timing and structure parameters. In the file format, the overall presentation over time is called a movie. The movie is described by a movie box (with four-character code 'moov') at the top level of the media or presentation file. This movie box represents an initialization information container containing a set of various boxes describing the presentation. It may be logically divided into tracks represented by track boxes (with four-character code 'trak'). Each track (uniquely identified by a track identifier, track_ID) represents a timed sequence of media data pertaining to the presentation (frames of video, timed metadata, or audio samples, for example).
Within each track, each timed unit of media data is called a sample, which may be a video frame, an audio sample, or a set of timed metadata. Samples are implicitly numbered in sequence. The actual sample data are stored in boxes called Media Data boxes (with four-character code 'mdat') or Identified Media Data boxes (with four-character code 'imda') at the same level as the movie box. The movie may also be fragmented, i.e. organized temporally as a movie box containing information for the whole presentation followed by a list of movie fragment and Media Data box pairs or movie fragment and Identified Media Data box pairs. Within a movie fragment (box with four-character code 'moof') there is a set of track fragments (box with four-character code 'traf'), zero or more per movie fragment.
The track fragments in turn contain zero or more track run boxes ('trun'), each of which documents a contiguous run of samples for that track fragment.
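The box structure described above can be traversed with a simple parser. The following Python sketch (illustrative only, not part of the claimed method) reads the 32-bit size and four-character code of each top-level box, handling the 64-bit 'largesize' and size-zero cases defined by ISO/IEC 14496-12:

```python
import struct

def iter_top_level_boxes(data: bytes):
    """Yield (four_cc, offset, size) for each top-level ISOBMFF box.

    Handles the 32-bit size field, the size==1 case (a 64-bit
    'largesize' follows the four-character code), and the size==0
    case (the box extends to the end of the file).
    """
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        four_cc = data[offset + 4:offset + 8].decode("ascii")
        if size == 1:  # 64-bit largesize follows the type field
            size, = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:  # box extends to the end of the file
            size = len(data) - offset
        yield four_cc, offset, size
        offset += size

# A minimal two-box file: an empty 'ftyp'-like placeholder followed
# by an empty 'mdat' (header-only boxes, 8 bytes each).
sample = struct.pack(">I4s", 8, b"ftyp") + struct.pack(">I4s", 8, b"mdat")
print(list(iter_top_level_boxes(sample)))  # [('ftyp', 0, 8), ('mdat', 8, 8)]
```

Real files nest further boxes ('moov' containing 'trak', 'moof' containing 'traf', and so on); the same size/type walk applies recursively within a container box's payload.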
Media data encapsulated with ISOBMFF can be used for adaptive streaming with HTTP. For example, MPEG DASH (for "Dynamic Adaptive Streaming over HTTP") and Smooth Streaming are HTTP adaptive streaming protocols enabling segment- or fragment-based delivery of media files. The MPEG DASH standard (see "ISO/IEC 23009-1, Dynamic adaptive streaming over HTTP (DASH), Part 1: Media presentation description and segment formats") makes it possible to establish a link between a compact description of the content(s) of a media presentation and the HTTP addresses.
Usually, this association is described in a file called a manifest file or description file. In the context of DASH, this manifest file is also called the MPD file (for Media Presentation Description). When a client device gets the MPD file, the description of each encoded and deliverable version of a media content component (representing a single continuous encapsulated timed media data) can easily be determined by the client. By reading or parsing the manifest file, the client is aware of the kinds of media content components proposed in the media presentation and of the HTTP addresses for downloading the associated media content components. Therefore, it can decide which media content components to download (via HTTP requests) and to play (decoding and playing after reception of the segments). DASH defines several types of segments, mainly initialization segments, media segments, and index segments.
Initialization segments contain setup information and metadata describing the media content component, typically at least the 'ftyp' and 'moov' boxes of an ISOBMFF media file. A media segment contains the media data corresponding to a media content component. It can be, for example, one or more 'moof' plus 'mdat' or 'imda' boxes of an ISOBMFF file, or a byte range in the 'mdat' or 'imda' box of an ISOBMFF file. A media segment may be further subdivided into sub-segments (also corresponding to one or more complete 'moof' plus 'mdat' or 'imda' boxes). The DASH manifest may provide segment URLs, or a base URL to the file with byte ranges to segments, for a streaming client to address these segments through HTTP requests. The byte range information may be provided by index segments or by specific ISOBMFF boxes such as the Segment Index box 'sidx' or the Subsegment Index box 'ssix'.
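As an illustration of such byte-range addressing, the sketch below computes the absolute byte range of one sub-segment from 'sidx'-style index data (per ISO/IEC 14496-12, the first sub-segment starts first_offset bytes after the end of the 'sidx' box, and each index entry gives the size of one sub-segment) and builds the corresponding HTTP Range header. The function names are illustrative, not taken from any specification:

```python
def byte_range_for_subsegment(sidx_end_offset, first_offset, referenced_sizes, index):
    """Compute the absolute (start, end) byte range of sub-segment
    `index` (0-based), where `referenced_sizes` lists the size in
    bytes of each indexed sub-segment in order.
    """
    start = sidx_end_offset + first_offset + sum(referenced_sizes[:index])
    end = start + referenced_sizes[index] - 1  # HTTP byte ranges are inclusive
    return start, end

def range_header(start, end):
    """Build the HTTP Range request header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}

# The 'sidx' box ends at byte 100 and indexes three sub-segments.
start, end = byte_range_for_subsegment(100, 0, [5000, 7000, 6000], 1)
print(range_header(start, end))  # {'Range': 'bytes=5100-12099'}
```

A streaming client would issue a GET request carrying this header to fetch exactly the second sub-segment of the file.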
While these file formats and these methods for transmitting media data have proven to be efficient, there is a continuous need to improve selection of the data to be sent to a client and to improve the description of the indexation allowing a client, a reader, or a file parser to exploit portions of the data, e.g., to reconstruct a valid media file compliant with these file formats.
SUMMARY OF THE INVENTION
The present invention has been devised to address one or more of the foregoing concerns.
According to a first aspect of the invention, there is provided a method for processing encapsulated media data, the encapsulated media data comprising metadata and data associated with the metadata, the metadata being descriptive of the associated data, the encapsulated media data comprising a plurality of segments, at least one segment comprising a plurality of sub-segments, the method comprising: for a plurality of ordered byte ranges of at least one of the sub-segments, the byte ranges being defined in metadata descriptive of partial sub-segments of the at least one of the sub-segments, obtaining one level value associated with each byte range within the metadata descriptive of partial sub-segments of the at least one of the sub-segments, obtaining a feature type value indicating that the level values are representative of dependency levels, the feature type value being obtained from the metadata descriptive of partial sub-segments of the at least one of the sub-segments, based on the level value associated with a given byte range to be processed, selecting only byte ranges that are required for processing the given byte range, the selected byte ranges being byte ranges preceding the given byte range, that are associated with lower level values, obtaining the given byte range and the selected byte ranges.
Accordingly, the method of the invention makes it possible to select only required byte ranges when processing a given byte range and to generate ISOBMFF compliant media files when extracting and concatenating selected byte ranges.
According to some embodiments, the obtained level value associated with the given byte range indicates that all data from byte ranges preceding the given byte range, that are associated with level values lower than the obtained level value associated with the given byte range and higher than a predetermined level value, from the given byte range to a first preceding byte range associated with the predetermined level value, and the data of the first byte range associated with the predetermined level value preceding the given byte range, denoted the required byte ranges, are required to process the given byte range.
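One possible reading of this selection rule can be sketched as follows (a hypothetical helper, not the normative procedure of the claims): walking backwards from the given byte range, keep every preceding range whose level value lies strictly between the predetermined level and the given range's level, and stop after including the first preceding range at the predetermined level.

```python
def select_required_ranges(levels, given, base_level=0):
    """Select the byte ranges required to process range `given` (0-based).

    `levels[i]` is the level value of the i-th ordered byte range of
    the sub-segment. Assumed interpretation: all preceding ranges with
    base_level < level < levels[given], back to (and including) the
    first preceding range at `base_level`, are required.
    """
    required = [given]
    lvl = levels[given]
    for i in range(given - 1, -1, -1):
        if levels[i] == base_level:
            required.append(i)  # first preceding base-level range, then stop
            break
        if base_level < levels[i] < lvl:
            required.append(i)
    return sorted(required)

# Six consecutive byte ranges with levels 0,1,2,3,2,3: processing the
# last range skips the earlier level-3 range (index 3) only.
print(select_required_ranges([0, 1, 2, 3, 2, 3], 5))  # [0, 1, 2, 4, 5]
```

Note how the range at index 3 is excluded: it carries the same level as the given range and is therefore, under this reading, an independent peer rather than a dependency.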
According to some embodiments, the obtained level value associated with the given byte range further indicates that in addition to all data from the required byte ranges, denoted the first required byte ranges, the predetermined level value being denoted the first predetermined level value, data from each first byte range associated with a second predetermined level value preceding a byte range of the first required byte ranges, denoted the second required byte ranges, are required to process the given byte range.
According to some embodiments, the second predetermined level value is value zero, a byte range associated with level value zero comprising only metadata.
According to some embodiments, the obtained level value associated with the given byte range further indicates that in addition to all data from the first and second required byte ranges, data from each first byte range associated with a third predetermined level value preceding a byte range of the first required byte ranges and following a first byte range of the second required byte ranges are required to process the given byte range.
According to some embodiments, the third predetermined level value is value one, a byte range associated with level value one comprising only metadata depending on a first preceding byte range associated with level value zero.
According to some embodiments, the method further comprises identifying a byte range which is not required for processing the given byte range, between two consecutive required byte ranges, and padding the identified byte range with dummy values.
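A minimal sketch of this padding step (illustrative; `ranges` and `required` are hypothetical inputs derived from the index) keeps the required byte ranges at their original relative positions and substitutes dummy bytes for the others, so that byte offsets referenced by the metadata remain valid:

```python
def rebuild_with_padding(data, ranges, required, pad=b"\x00"):
    """Reassemble a portion of a sub-segment.

    `ranges` is an ordered list of (offset, size) pairs describing the
    byte ranges of the sub-segment within `data`; `required` is the set
    of indices of the ranges needed to process the given byte range.
    Ranges that are not required are replaced by dummy bytes of the
    same size, preserving every byte offset.
    """
    out = bytearray()
    for idx, (off, size) in enumerate(ranges):
        chunk = data[off:off + size] if idx in required else pad * size
        out += chunk
    return bytes(out)

data = b"AABBBCCC"
ranges = [(0, 2), (2, 3), (5, 3)]
print(rebuild_with_padding(data, ranges, {0, 2}))  # b'AA\x00\x00\x00CCC'
```

Because the output has the same length and layout as the original sub-segment portion, metadata that addresses samples by byte offset (e.g. track run entries) remains consistent without rewriting.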
According to some embodiments, the method further comprises generating a media file, comprising required byte ranges, that complies with the ISOBMFF standard.
According to some embodiments, the metadata descriptive of partial sub-segments of the at least one of the sub-segments belong to a box of the 'ssix' type, the media data being encapsulated according to ISOBMFF.
According to a second aspect of the invention, there is provided a method for encapsulating media data, the media data comprising metadata and data associated with the metadata, the metadata being descriptive of the associated data, the media data comprising a plurality of segments, at least one segment comprising a plurality of sub-segments, the method comprising: for a plurality of ordered byte ranges of at least one of the sub-segments, associating one level value with each byte range within metadata descriptive of partial sub-segments of the at least one of the sub-segments, wherein the metadata descriptive of partial sub-segments of the at least one of the sub-segments further comprise a feature type value indicating that the level values are representative of dependency levels and wherein the level value associated with a given byte range indicates which data from preceding byte ranges that are associated with lower level values are required to process the given byte range.
Accordingly, the method of the invention makes it possible to select only required byte ranges when processing a given byte range and to generate ISOBMFF compliant media files when extracting and concatenating selected byte ranges.
According to some embodiments, the level value associated with the given byte range indicates that all data from byte ranges preceding the given byte range, that are associated with level values lower than the obtained level value associated with the given byte range and greater than a predetermined level value, from the given byte range to a first preceding byte range associated with the predetermined level value, and the data of the first byte range associated with the predetermined level value preceding the given byte range, denoted the required byte ranges, are required to process the given byte range.
According to some embodiments, the level value associated with the given byte range further indicates that in addition to all data from the required byte ranges, denoted the first required byte ranges, the predetermined level value being denoted the first predetermined level value, data from each first byte range associated with a second predetermined level value preceding a byte range of the first required byte ranges, denoted the second required byte ranges, are required to process the given byte range.
According to some embodiments, the second predetermined level value is value zero, a byte range associated with level value zero comprising only metadata.
According to some embodiments, the level value associated with the given byte range further indicates that in addition to all data from the first and second required byte ranges, data from each first byte range associated with a third predetermined level value preceding a byte range of the first required byte ranges and following a first byte range of the second required byte ranges are required to process the given byte range.
According to some embodiments, the third predetermined level value is value one, a byte range associated with level value one comprising only metadata depending on a first preceding byte range associated with level value zero.
According to some embodiments, the metadata descriptive of partial sub-segments of the at least one of the sub-segments belong to a box of the 'ssix' type, the media data being encapsulated according to ISOBMFF.
According to some embodiments, a same level value is associated with at least two non-contiguous byte ranges of the at least one of the sub-segments.
According to other aspects of the invention, there is provided a processing device comprising a processing unit configured for carrying out each step of the methods described above. The other aspects of the present disclosure have optional features and advantages similar to the first and second above-mentioned aspects.
At least parts of the methods according to some embodiments of the invention may be computer implemented. Accordingly, some embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", a "module", or a "system". Furthermore, some embodiments of the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since some embodiments of the present invention can be implemented in software, some embodiments of the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device, and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates an example of streaming media data from a server to a client;
Figure 2 illustrates an example of data encapsulated in a media file;
Figure 3a illustrates an example of an initialization segment of data encapsulated in a media file;
Figure 3b illustrates an example of one or more media segments of data encapsulated in a media file;
Figure 4a illustrates an example of use of the segment index box 'sidx', such as those represented in Figures 2 and 3b, as defined by ISO/IEC 14496-12, in a simple mode wherein an index provides durations and sizes for each sub-segment encapsulated in the corresponding file or segment;
Figure 4b illustrates an example of use of the sub-segment index box 'ssix', such as those represented in Figures 2 and 3b, according to some embodiments of the invention, wherein an index provides sizes and an associated level for each partial sub-segment encapsulated in a sub-segment described by a segment index box 'sidx';
Figure 5 illustrates an example of requests and responses between a server and a client, as performed with DASH, to obtain media data;
Figure 6 is a block diagram illustrating an example of steps carried out by a server to transmit data to a client, according to some embodiments of the invention;
Figure 7 is a block diagram illustrating an example of steps carried out by a client to obtain data from a server, according to some embodiments of the invention;
Figures 8a and 8b illustrate two different examples of level assignments using an extended sub-segment index box 'ssix', according to some embodiments of the invention;
Figures 9a to 9d illustrate some examples of the organisation of byte ranges of data in levels and of the selection of byte ranges to process a given byte range, depending on these levels, according to some embodiments of the invention; and,
Figure 10 schematically illustrates an example of a processing device configured to implement at least one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
According to some embodiments, the invention makes it possible to improve the signalling of dependencies between portions of encapsulated media data. The invention also makes it possible to improve the indexing of portions of encapsulated media data in order to enable a client, a reader, or a file parser to reconstruct a valid media file from portions of encapsulated media data.
Figure 1 illustrates an example of streaming media data from a server to a client.
As illustrated, a server 100 comprises an encapsulation module 105 connected, via a network interface (not represented), to a communication network 110 to which is also connected, via a network interface (not represented), a de-encapsulation module 115 of a client 120.
Server 100 processes media data, e.g. video and/or audio data, for streaming or for storage. To that end, server 100 obtains or receives media data comprising, for example, an original sequence of images 125. Optionally, it can encode the sequence of images into encoded media data (or bit-streams) using a media encoder (e.g. a video encoder), not represented. It encapsulates the media data, possibly encoded, in one or more media files or media segments 130 using encapsulation module 105. The encapsulation process mainly consists in storing the media data in ISOBMFF boxes and generating and/or storing associated metadata describing the media data.
Encapsulation module 105 comprises at least one of a writer or a packager to encapsulate the media data. The media encoder may be implemented within encapsulation module 105 to encode received media data or may be separate from encapsulation module 105.
Client 120 is used for processing media file(s) received from communication network 110, or read from a storage device, for example for processing media file 130.
After the received media file has been de-encapsulated by de-encapsulation module 115 (also known as a parser), the de-encapsulated data (or parsed data), corresponding to media data or to a bit-stream, are optionally decoded, forming, for example, audio and/or video data that may be stored, rendered (e.g. played or displayed), or output. The media decoder may be implemented within de-encapsulation module 115 or it may be separate from de-encapsulation module 115. The media decoder may be configured to decode media data or one or more bit-streams in parallel.
It is noted that media file 130 may be communicated to de-encapsulation module 115 in different ways. In particular, encapsulation module 105 may generate media file 130 with a media description (e.g. a DASH MPD) and communicate (or stream) it directly to de-encapsulation module 115 upon receiving a request from client 120.
For the sake of illustration, media file 130 may encapsulate media data (e.g. encoded audio or video) into boxes according to the ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12 and ISO/IEC 14496-15 standards). In such a case, media file 130 may correspond to one or more media files (indicated by a FileTypeBox 'ftyp'), as illustrated in Figure 2, or to one or more segment files corresponding to one initialization segment (when indicated by a FileTypeBox 'ftyp') and/or one or more media segments (when indicated by a SegmentTypeBox 'styp'), as illustrated in Figures 3a and 3b.
Optionally, the segment files may also contain one or more Segment Index boxes 'sidx' and Subsegment Index boxes 'ssix' providing indexation information on media segments. According to ISOBMFF, media file 130 may include two kinds of boxes: "media data boxes" (e.g. 'mdat' or 'imda') containing the media data, and "metadata boxes" (e.g. 'moov', 'moof', 'sidx', 'ssix') containing metadata defining the placement and timing of the media data.
Figure 2 illustrates an example of media data encapsulated in a media file. As illustrated, media file 200 contains a 'moov' box 205 providing metadata to be used by a client during an initialization step. For the sake of illustration, the items of information contained in the 'moov' box may comprise the number of tracks present in the file as well as a description of the samples contained in the file. According to the illustrated example, the media file further comprises a segment index box 'sidx' 210, a sub-segment index box 'ssix' 215, and several fragments such as fragments 220 and 225, each composed of a metadata part and a media data part. For example, fragment 220 comprises a metadata part represented by 'moof' box 230 and a media data part represented by 'mdat' box 235. Segment index box 'sidx' 210 documents how the file is divided into one or more sub-segments (i.e. into one or more segment byte ranges), each sub-segment being composed of a complete set of fragments. It comprises an index making it possible to reach directly the data associated with a particular sub-segment. It comprises, in particular, the duration and size of each sub-segment. Sub-segment index box 'ssix' 215 documents how a sub-segment is divided into one or more partial sub-segments (i.e. into one or more sub-segment byte ranges). It comprises an index making it possible to reach the data of a sub-segment and a mapping of data byte ranges to level values. Level values can either be values with a predefined meaning or can be documented by an optional Level Assignment Box 'leva' located within the Movie box 'moov' 205 and providing the meaning associated with each level value. The media file 200 may include a chain of multiple segment index boxes 'sidx' and sub-segment index boxes 'ssix'.
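For reference, the payload of a 'ssix' box follows a fixed layout in ISO/IEC 14496-12: a 32-bit subsegment_count, then for each sub-segment a 32-bit range_count followed by (8-bit level, 24-bit range_size) pairs. A minimal Python parser of that payload (the box header and FullBox version/flags fields are assumed to have been consumed already):

```python
import struct

def parse_ssix_payload(payload: bytes):
    """Parse a SubsegmentIndexBox ('ssix') payload into a list of
    sub-segments, each a list of (level, range_size) pairs, following
    the syntax of ISO/IEC 14496-12.
    """
    subsegment_count, = struct.unpack_from(">I", payload, 0)
    pos = 4
    subsegments = []
    for _ in range(subsegment_count):
        range_count, = struct.unpack_from(">I", payload, pos)
        pos += 4
        ranges = []
        for _ in range(range_count):
            word, = struct.unpack_from(">I", payload, pos)
            pos += 4
            # top 8 bits: level; bottom 24 bits: range_size in bytes
            ranges.append((word >> 24, word & 0x00FFFFFF))
        subsegments.append(ranges)
    return subsegments

# One sub-segment split into two byte ranges: a level-0 range of
# 100 bytes followed by a level-1 range of 4000 bytes.
payload = struct.pack(">II", 1, 2) \
    + struct.pack(">I", (0 << 24) | 100) \
    + struct.pack(">I", (1 << 24) | 4000)
print(parse_ssix_payload(payload))  # [[(0, 100), (1, 4000)]]
```

The cumulative range_size values give the byte offset of each partial sub-segment within the sub-segment, which is how a reader maps level values back to concrete byte ranges.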
Figure 3a and Figure 3b illustrate an example of data encapsulated in one or more media segments, it being observed that media segments are suitable for live streaming.
Figure 3a illustrates the first segment of encapsulated media data. It is an initialization segment 300 that begins with an 'ftyp' box and with a 'moov' box 305 indicating the presence of movie fragments (e.g., with a box 'mvex', not represented). In addition, the initialization segment may comprise index information ('sidx' and 'ssix' boxes) and/or movie fragments. When a sub-segment index box 'ssix' is defined in one or several segments, an optional Level Assignment Box 'leva' may be declared within the Movie box 'moov' to document the level values.
Figure 3b illustrates an example of one or more media segments of data encapsulated in a media file. As illustrated, media segment 350 begins with a 'styp' box. It is noted that for using segments like segment 350, an initialization segment 300 must be available. According to the example illustrated in Figure 3b, media segment 350 contains one segment index box 'sidx' 355, one sub-segment index box 'ssix' 360, and several fragments such as fragments 365 and 370. Segment index box 'sidx' 355 documents how the segment is divided into one or more sub-segments, each sub-segment being composed of a complete set of fragments. For example, each of the fragments 365 and 370 may represent a sub-segment, or the combination of fragments 365 and 370 may represent one single sub-segment.
Segment index box 'sidx' 355 comprises an index making it possible to reach directly the data associated with a particular sub-segment. It comprises, in particular, the duration and size of the sub-segment. Sub-segment index box 'ssix' 360 documents how a sub-segment is divided into one or more partial sub-segments. It comprises an index making it possible to reach the data of a partial sub-segment and a mapping of data byte ranges to level values. Level values can either be values with a predefined meaning or be documented by a Level Assignment Box 'leva' located within the Movie box 'moov' 305 and providing the meaning associated with each level value. Multiple segment index boxes 'sidx' and sub-segment index boxes 'ssix' can be defined and organised as a daisy-chain of boxes. When a segment beginning with a 'styp' box only contains index boxes (e.g. 'sidx', 'ssix'), it is called an index segment. Again, each fragment is composed of a metadata part and a media data part. For example, fragment 365 comprises a metadata part represented by 'moof' box 375 and a media data part represented by 'mdat' box 380.
Figure 4a and Figure 4b illustrate the indexation of a media segment using a segment index box 'sidx' and a sub-segment index box 'ssix' authorizing either multiple byte ranges for a given level, with the meaning of the level provided by a 'leva' box, or a single or multiple byte ranges for a given level, with the meaning of the level provided through a predefined feature type (also denoted level assignment type) and corresponding predefined level values.
Figure 4a illustrates an example of use of the segment index box 'sidx', referenced 400, similar to those represented in Figures 2 and 3b, as defined by ISO/IEC 14496-12, in a simple mode wherein an index provides durations and sizes for two sub-segments. For the sake of illustration, the first sub-segment, referenced 430, is composed of one fragment and the second sub-segment, referenced 435, is composed of two fragments encapsulated in the corresponding file or segment. When the reference_type field referenced 405 is set to zero for each entry in the references loop, the simple index, described within the 'sidx' box 400, consists of a loop over the sub-segments contained in the segment. Each entry in the index (e.g. entries referenced 420 and 425) provides the size in bytes and the duration of a sub-segment, as well as information on whether or not the sub-segment begins with a random access point. For example, entry 420 in the index provides the size (Si) referenced 410 and the duration (Di) referenced 415 of sub-segment 430.
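The simple 'sidx' index described above can be turned into absolute byte ranges by accumulating the entry sizes from the anchor point. A minimal sketch, assuming `first_offset` is the byte offset of the first sub-segment after the index (the helper name is hypothetical):

```python
def subsegment_ranges(first_offset, entry_sizes):
    """Given the anchor byte offset following the 'sidx' box and the
    referenced size of each index entry, return (start, end) byte
    ranges (end exclusive) for each sub-segment."""
    ranges = []
    start = first_offset
    for size in entry_sizes:
        ranges.append((start, start + size))
        start += size
    return ranges

# Two sub-segments as in Figure 4a, with illustrative sizes in bytes.
print(subsegment_ranges(100, [500, 800]))
# → [(100, 600), (600, 1400)]
```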
Figure 4b illustrates an example of use of the sub-segment index box 'ssix', referenced 440, similar to those represented in Figures 2 and 3b. According to this example, the syntax version referenced 445 of sub-segment index box 'ssix' 440 makes it possible to use predefined feature types (also denoted level assignment types) and to assign byte ranges to levels without the need for a 'leva' box to document the meaning of level values.
More precisely, sub-segment index box 'ssix' provides a mapping of levels referenced 450 to byte ranges referenced 455 of the indexed sub-segment, the meaning of the level value associated with a byte range being either specified by a level assignment box 'leva' (located in the movie box 'moov') or indicated by predefined level values. The indexed sub-segments are described by a segment index box 'sidx'. A sub-segment index box 'ssix' is usually the next box after an associated segment index box 'sidx'. For each sub-segment described by the associated segment index box 'sidx' (e.g. entry 470), the 'ssix' box provides a compact index describing how the data in a sub-segment are ordered in partial sub-segments, according to levels (i.e., values of the parameter level 450). The 'ssix' box enables a client to easily access data for partial sub-segments by downloading ranges of data in the sub-segment as a function of the level value associated with a byte range.
According to the example illustrated in Figure 4b, the subsegment_count parameter is equal to the reference_count parameter in the associated segment index box (i.e. the loop entries 470 and 475 are related to the same sub-segment 480). The sub-segment 480 corresponding to loop entries 470 and 475 is divided into two partial sub-segments corresponding to the second loop entries referenced 485 and 490. Each entry in the second loop (e.g., entries 485 and 490) provides the size in bytes of the partial sub-segment (denoted RSj and RSj+1 and corresponding to parameter range_size 455) and an associated level (denoted Lj and Lj+1 and corresponding to parameter level 450). The data range corresponding to a partial sub-segment may include (part of) boxes (e.g., 'moof', 'mdat', or 'imda' boxes), media data, or both.
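The second loop of the 'ssix' box can be resolved into absolute byte ranges per level in the same way. The sketch below uses a hypothetical helper and assumes version 1 semantics, under which the same level value may be assigned to several, possibly discontinuous, ranges:

```python
from collections import defaultdict

def level_byte_ranges(subsegment_start, entries):
    """entries: (level, range_size) pairs from the 'ssix' second loop,
    in file order. Returns a mapping level -> list of (start, end)
    byte ranges, end exclusive."""
    ranges = defaultdict(list)
    offset = subsegment_start
    for level, range_size in entries:
        ranges[level].append((offset, offset + range_size))
        offset += range_size
    return dict(ranges)

# Illustrative entries: level 1 appears twice, with discontinuous ranges.
print(level_byte_ranges(0, [(0, 100), (1, 400), (3, 200), (1, 300)]))
# → {0: [(0, 100)], 1: [(100, 500), (700, 1000)], 3: [(500, 700)]}
```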
It is observed here that, in general, the media data constructed from the byte ranges extracted using byte ranges signalled in a 'ssix' box are incomplete, i.e., they do not conform to the media format of the entire sub-segment or of the media file. Some of the embodiments of the present invention have been devised to address this concern. According to the example illustrated in Figure 4b, for version 1 or higher of the 'ssix' box, multiple byte ranges, possibly discontinuous, associated with the same level, may be described. As a consequence, obtaining all the data corresponding to a given level may require multiple byte ranges to be retrieved. The presence of the level assignment box 'leva' is only required for a feature type (or level_assignment_type) equal to 0, in which case the level assignment box 'leva' may have a version set to 1 to signal that the data for each level need not be stored contiguously and that the data for levels may be stored in random order of level value. The presence of the level assignment box 'leva' is not required for the other feature type values.
According to the example illustrated in Figure 4b and to some of the embodiments of the invention, the semantics of the parameters in version 1 of the 'ssix' box are defined as follows:
- subsegment_count is a parameter having a positive integer value specifying the number of sub-segments for which partial sub-segment information is specified in this box. The subsegment_count parameter value is equal to the reference_count parameter value (i.e., the number of movie fragment references) in the immediately preceding segment index box 'sidx';
- lsc is a flag that indicates, when it is set (e.g. when its value is equal to one), that the number of indexed ranges within a partial sub-segment is coded on 32 bits; otherwise the number of indexed ranges within a partial sub-segment is coded on 16 bits;
- incomplete is a flag, referenced 465 in Figure 4b, that indicates, when it is set (e.g. when its value is equal to one), that the last range of a given sub-segment may not cover the entire sub-segment, i.e., the last range of a given sub-segment may end before the last byte of the sub-segment, in which case the assignment of the remaining bytes to a level is unknown, but the remaining bytes should not correspond to any level listed in the box. Although this is not recommended, the remaining bytes may correspond to a level already listed in the box, so it is possible to index only the first occurrence of a level, e.g., only the first I frame. This flag makes it possible to alert the client, reader, or file parser that one or more sub-segments are not completely indexed and to define a last byte range assigned to an unknown level value. In other words, the incomplete flag is an indication that the sum of the byte ranges in a sub-segment may not be equal to the corresponding sub-segment size indicated in the 'sidx' box;
- lbs is a parameter that gives the number of bytes, minus 1, that are used for coding the level field;
- rbs is a parameter that gives the number of bytes, minus 1, that are used for coding the range field;
- feature_type (also denoted level_assignment_type) is a parameter, referenced 460 in Figure 4b, that gives the associated predefined semantics of the indicated level value 450. For the sake of illustration, it may be defined as follows:
* 0: if the feature_type parameter is set to zero, the level value assigned to a partial sub-segment corresponds to the level indicated in the 'leva' box. As described above, the 'leva' box indicates the mechanism used to specify the assignment of a feature to a level value. If the partial sub-segment (byte range) is not associated with any information in the level assignment defined by the 'leva' box, then any level value that is not included in the level assignment may be used. This value of feature_type should only be used when the 'leva' box version is 1 or more;
* 1: if the feature_type parameter is set to one, the level value may correspond to a dependency level, for example as follows:
o 0: if a level value is equal to zero, this means that the associated byte range contains exactly one or more file-level boxes (e.g. a movie fragment). Media data boxes are not included in byte ranges with level 0,
o 1: if a level value is equal to one, this means that the associated data are independently decodable (SAP (Stream Access Points) 1, 2, or 3). Byte ranges assigned to level 1 may contain the initial part of the sub-segment (e.g. the movie fragment box). The beginning of a byte range assigned to level 1 coincides with the beginning of a top-level box in the sub-segment,
o 2: if the level value is equal to two, this means that the associated data are independently decodable (SAP 1, 2, or 3). The beginning of a byte range assigned to level 2 does not coincide with the beginning of a top-level box in the sub-segment,
o N: if the level value is equal to N, N being greater than two, this means that the associated data require data from the preceding byte ranges with lower levels (level N-1 and below) to be processed, stopping at the previous preceding byte range with level 0 if specified (i.e., if present), otherwise at the previous preceding byte range with level 1 or 2 if specified (i.e., if present), otherwise at the first byte range in the box. Byte ranges assigned to levels other than 2 may contain a movie fragment box,
* 2: if the feature_type parameter is set to two, the level value corresponds to a multi-track dependency level. In this mode, lbs is equal to one or more (i.e., at least 16 bits to code the level). The first 8 bits of the level field give the dependency level value, with the same values and semantics as the ones set for the level_assignment_type with value one.
The remaining less significant bits of the level field give a track_ID, which identifies a track of the movie present in the indexed sub-segment for level values other than zero. It is set to zero if the level value is equal to zero. In this mode, each range consists only of data from the identified track, possibly with some metadata boxes (e.g., movie fragments, etc.).
The level value only gives dependency information within the track. This allows cross-track indexation within a same level;
* other values are reserved;
- range_count is a parameter that specifies the number of partial sub-segment levels into which the media data are grouped. In the case where the version of the 'ssix' box is 0, this value is greater than or equal to two and each byte in the sub-segment is explicitly assigned to a level. In the case where the version of the 'ssix' box is 1 or more, this value may be 0 or more, and the described ranges may lead to a size smaller than that of the sub-segment if and only if the incomplete flag is set to one. It is noted that the value of the range_count parameter could be restricted to one or more instead of zero or more;
- range_size is a parameter that indicates the size in bytes of the partial sub-segment. This value cannot be 0, except for the last entry, for which the value 0 may be used to indicate the remaining bytes of the segment, up to the end of the segment; and
- level is a parameter that specifies the level to which the considered partial sub-segment is assigned.
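As an illustration of the lbs and rbs parameters, the sketch below serializes second-loop entries using (lbs + 1)-byte level fields and (rbs + 1)-byte range fields. This is an illustrative encoding only, not the normative box syntax, and the helper name is hypothetical:

```python
def encode_range_entries(entries, lbs, rbs):
    """Serialize (level, range_size) pairs big-endian, using lbs + 1
    bytes for each level field and rbs + 1 bytes for each range field,
    as suggested by the lbs and rbs parameters."""
    out = bytearray()
    for level, range_size in entries:
        out += level.to_bytes(lbs + 1, "big")
        out += range_size.to_bytes(rbs + 1, "big")
    return bytes(out)

# Two entries with 1-byte levels (lbs=0) and 3-byte range sizes (rbs=2).
blob = encode_range_entries([(1, 4096), (3, 70000)], lbs=0, rbs=2)
print(len(blob))
# → 8 (2 entries × (1 + 3) bytes)
```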
Alternatively, the flags lsc, lbs, and rbs can be removed from the box syntax and defined as part of the FullBox flags instead.
In a variant, the incomplete flag is optional or could be removed, since this information can be deduced by cross-checking the sum of the byte ranges of a sub-segment against the sub-segment size documented in the 'sidx' box.
In a variant, different values of the incomplete flag or of the feature type can be signalled for each sub-segment within a segment by declaring them within the subsegment_count loop in the new version of the 'ssix' box.
Still alternatively, it is possible to define more than one sub-segment index box 'ssix' with version 1 or higher per segment index box 'sidx' that indexes only leaf sub-segments. In such cases, the multiple sub-segment index boxes 'ssix' all document the sub-segments that are indicated in the immediately preceding segment index box 'sidx', and each sub-segment index box uses a different predefined feature type, referenced 460 in Figure 4b. This makes it possible to define byte ranges for different features per sub-segment. For example, one sub-segment index box 'ssix' can be used to document stream access points, and another sub-segment index box 'ssix' can be used to document corrupted byte ranges.
However, the inventors have noted that the semantics of level N defined above may lead to a wrong interpretation of dependencies between partial sub-segments (or byte ranges). Indeed, "stopping at the previous preceding byte range with level 0 if specified" can be understood as meaning that all byte ranges, from the byte range preceding the current byte range with level N up to the preceding byte range with level 0, are required to process the current byte range.
For instance, consider the case where partial sub-segments are organized as represented below, where Ri is the ith byte range and Lj is the level with value j associated with the byte range Ri:
R0: L0 (corresponding to a 'moof' box)
R1: L1 (corresponding to the header of the 'mdat' box + data for an IDR (i.e. an independently decodable access unit (SAP 1, 2, or 3)))
R2: L3
R3: L4
R4: L3
R5: L4
R6: L1 (header of 'mdat' box + data for IDR) or L2 (IDR)
R7: L3
R8: L4
R9: L3
R10: L4
The semantics of level N defined above can be interpreted as if the dependencies for R10 stopped at R0, leading to {R0, R1, R2, R4, R6, R7, R9}, while the correct dependencies should be {R0:L0, R6:L1, R7:L3, R9:L3}, in the order of level values.
Accordingly, in some particular embodiments, the semantics of level N are then specified as follows: Level N requires data from the preceding byte ranges with lower levels (level N-1 and below) to be processed, stopping at the previous preceding byte range with level 1 or 2 if specified (i.e., if present), otherwise at the first byte range in the box. The previous preceding byte range with level 0, if specified (i.e., if present), is required to process the data. Ranges assigned to levels other than 2 may contain a movie fragment box.
In cases where temporal sublayers are gathered into levels defined in a 'ssix' box, it is further specified that, since level N depends only on level N-1 and below, a direct mapping of temporal sublayers to levels is not always possible when frames from one temporal sublayer depend on preceding frames from the same temporal sublayer in another byte range (or partial sub-segment). In other words, it may be necessary to assign different level values to byte ranges corresponding to a same temporal sublayer in order to signal correct dependencies between partial sub-segments. For instance, in the previous example, if R10 depends on frames in R8 (having the same level L4 in the previous example and with the assumption that it belongs to the same temporal sublayer as R10), then the level values associated with these byte ranges must be changed to signal the correct dependency as follows:
R0: L0 (corresponding to a 'moof' box)
R1: L1 (corresponding to the header of the 'mdat' box + data for an IDR (i.e. an independently decodable access unit (SAP 1, 2, or 3)))
R2: L3
R3: L5
R4: L4
R5: L6
R6: L1 (header of 'mdat' box + data for IDR) or L2 (IDR)
R7: L3
R8: L5
R9: L4
R10: L6
This makes it clear that R10:L6 depends on the set of partial sub-segments {R0:L0, R6:L1, R7:L3, R9:L4, R8:L5}, in the order of level values.
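The dependency rules discussed above can be checked with a small helper. This is a sketch assuming the corrected feature_type 1 semantics; `required_ranges` is a hypothetical name, not part of the ISOBMFF specification:

```python
def required_ranges(levels, i):
    """Indices of the byte ranges required to process range i under the
    corrected level-N semantics: walk backwards collecting ranges with
    strictly lower non-zero levels, stop after the nearest preceding
    range with level 1 or 2, then add the nearest preceding level-0
    range (file-level boxes)."""
    needed = []
    for j in range(i - 1, -1, -1):
        if 0 < levels[j] < levels[i]:
            needed.append(j)
            if levels[j] in (1, 2):
                break
    for j in range(i - 1, -1, -1):
        if levels[j] == 0:
            needed.append(j)
            break
    return sorted(needed)

# First example: R0..R10 with levels L0, L1, L3, L4, L3, L4, L1, L3, L4, L3, L4.
print(required_ranges([0, 1, 3, 4, 3, 4, 1, 3, 4, 3, 4], 10))
# → [0, 6, 7, 9]
# Re-levelled example: R10:L6 now also depends on R8:L5.
print(required_ranges([0, 1, 3, 5, 4, 6, 1, 3, 5, 4, 6], 10))
# → [0, 6, 7, 8, 9]
```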
Figure 5 illustrates an example of requests and responses between a server and a client, as performed with DASH according to some embodiments of the invention, to obtain media data. For the sake of illustration, it is assumed that the media data are encapsulated in ISOBMFF and a description of the media content components (corresponding to encapsulated media data) is available in a DASH Media Presentation Description (MPD).
As illustrated, a first request and response (steps 500 and 505) aim at providing the streaming manifest to the client, that is to say the media presentation description. From the manifest, the client may determine the initialization segments that are required to set up and initialize its decoder(s). Next, the client requests one or more of the initialization segments identified according to the selected media content components through HTTP requests (step 510). The server replies with metadata (step 515), typically the ones available in the ISOBMFF 'moov' box and its sub-boxes. The client performs the set-up (step 520) and may request index information from the server (step 525). This is the case, for example, in DASH profiles where indexed media segments are in use, e.g. the live profile. To achieve this, the client may rely on an indication in the MPD (e.g., indexRange) providing the byte range for the index information. When the media content components are encapsulated according to ISOBMFF, the segment index information may correspond to the SegmentIndex box 'sidx' and optionally an associated new version of the sub-segment index box 'ssix' according to some embodiments of the invention, as described hereafter. In the case where the media data are encapsulated according to MPEG-2 TS, the indication in the MPD may be a specific URL referencing an Index Segment.
Next, the client receives the requested segment index from the server (step 530). From this index, the client may compute byte ranges (step 535) to request movie fragments or portions of a movie fragment at a given time (e.g. corresponding to a given time range) or corresponding to a given feature of the bit-stream (e.g. a point to which the client can seek (e.g. a random-access point or stream access point), a scalability layer, a temporal sub-layer, or a spatial sub-part such as an HEVC tile, a G-PCC tile, or a VVC subpicture, for example). The client may issue one or more requests to get one or more movie fragments or portions of movie fragments (typically portions of data within the Media data box) for the selected media content components in the MPD (step 540). The server replies to the requests by sending one or more sets of data byte ranges comprising 'moof' boxes, 'mdat' boxes, or portions of 'mdat' boxes, or a combination thereof (step 545). It is observed that the requests for the movie fragments may be made directly, without requesting the index, for example when media segments are described as segment templates and no index information is available.
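Step 535 amounts to turning the selected partial sub-segments into HTTP byte ranges. A minimal sketch with a hypothetical helper that coalesces contiguous ranges before building a Range header value (HTTP Range uses inclusive end offsets):

```python
def range_header(byte_ranges):
    """Coalesce contiguous or overlapping (start, end) ranges (end
    exclusive) and build an HTTP Range header value for the result."""
    merged = []
    for start, end in sorted(byte_ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return "bytes=" + ", ".join(f"{s}-{e - 1}" for s, e in merged)

# Two contiguous ranges collapse into one span; the third stays separate.
print(range_header([(0, 100), (100, 500), (800, 1000)]))
# → bytes=0-499, 800-999
```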
Upon reception of the requested data, the client de-encapsulates, optionally decodes, and renders the corresponding media data and prepares the request for the next time interval (step 550). This may consist in getting a new index, sometimes even in getting an MPD update, or simply in requesting the next media segments as indicated in the MPD (e.g. following a SegmentList or a SegmentTemplate description).
Figure 6 is a block diagram illustrating an example of steps carried out by a server or file writer to encapsulate and transmit media data to a client, according to some embodiments of the invention.
As illustrated, a first step (step 600) is directed to encoding media data as including one or more bit-stream features (e.g., points to which the client can seek (i.e., random-access points or stream access points), scalability layers, temporal sub-layers, and/or spatial sub-parts such as HEVC tiles, G-PCC tiles, or VVC sub-pictures).
According to some embodiments, multiple alternatives of the encoded media data may be generated, for example in terms of quality, resolution, etc. The encoding step results in bit-streams that are encapsulated (step 605). It is noted that the encoding step 600 is optional and media data can be obtained and encapsulated without being first encoded.
The encapsulation step comprises generating structured boxes containing metadata describing the placement and timing of the media data. The encapsulation step (605) may also comprise generating indexes to make it possible to access sub-parts of the media data (e.g., by using a 'sidx' box, a 'ssix' box according to an embodiment of the invention as described below, and optionally a 'leva' box).
According to some embodiments of the invention, in order to allow a client or reader to reconstruct a valid media file compliant with ISOBMFF from partial sub-segments, a server or file writer assigns 'ssix' level values corresponding to a feature_type (also denoted level_assignment_type) equal to 1 or above such that, when replacing with 0 (zero) every byte assigned to a level higher than level N, with N>0:
- the result is a valid ISOBMFF file, although the media samples may be invalid since some bytes may be missing, and
- the sequence, in decoding order, of all complete samples in the segment is valid according to the coding format (i.e., a reader actually removes samples overlapping any discarded range).
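The replacement rule above can be sketched as follows (hypothetical helper; entries are (level, size) pairs covering the sub-segment in file order, and the sample buffer is an arbitrary illustration):

```python
def keep_up_to_level(data, entries, max_level):
    """Zero out every byte assigned to a level greater than max_level,
    following the reconstruction rule above. entries: (level, size)
    pairs covering the buffer in order."""
    out = bytearray(data)
    offset = 0
    for level, size in entries:
        if level > max_level:
            out[offset:offset + size] = b"\x00" * size
        offset += size
    return bytes(out)

# Level-0 boxes ("AA"), level-1 data ("BBBB"), level-3 data ("CC").
print(keep_up_to_level(b"AABBBBCC", [(0, 2), (1, 4), (3, 2)], max_level=1))
# → b'AABBBB\x00\x00'
```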
To achieve this, it is necessary for the client or reader to be able to locate all top-level boxes in the encapsulated media data and to take into account some byte ranges that are not fetched in the reconstruction process.
Therefore, the server or file writer assigns levels to partial sub-segments in order to comply with the following modified semantics of level values for feature_type equal to 1 or 2:
- 0: if a level value is equal to zero, this indicates that the associated byte range contains:
* exactly one or more file-level boxes (e.g., a movie fragment) other than a media data container box (e.g., MediaDataBox or IdentifiedMediaDataBox), and/or
* zero or at most one box header (8 or 16 bytes) of a media data container box, which corresponds to the last 8 or 16 bytes of the byte range,
- 1: if a level value is equal to one, this indicates that the data in the associated byte range are independently decodable (SAP 1, 2, or 3). The byte range with level 0 preceding this byte range is required to process the data,
- N: if a level value is equal to any value between 2 and N, this indicates that the associated byte range requires data from the preceding byte ranges with lower levels (level N-1 and below) to be processed, stopping at the previous preceding byte range with level 1 if specified (i.e., if present), otherwise at the first byte range in the box. All byte ranges with level 0 immediately preceding any required dependent level (i.e., each first byte range associated with level 0 preceding a byte range associated with a required dependent level) are required to process the data. Therefore, ranges assigned to levels other than 0 do not contain any file-level box headers, and the header of a media data container box (e.g., MediaDataBox 'mdat' or IdentifiedMediaDataBox 'imda') is in level 0 while media data may be in levels 1 to N.
In a variant, it may also be necessary to identify dependencies between different file-level boxes, for instance when a byte range contains a dependent movie fragment that requires a preceding movie fragment to be processed, or when a byte range contains only the header of a media data box, in which case it is necessary to be able to identify the preceding byte range containing the corresponding movie fragment.
In such a case, in order to allow a client or reader to reconstruct a valid media file compliant with ISOBMFF from partial sub-segments, a server or file writer assigns 'ssix' level values corresponding to a feature_type (also denoted level_assignment_type) equal to 1 or above such that, when replacing with 0 (zero) every byte assigned to a level higher than level N, with N>1:
- the result is a valid ISOBMFF file, although the media samples may be invalid since some bytes may be missing; and
- the sequence, in decoding order, of all complete samples in the segment is valid according to the coding format (i.e., a reader actually removes samples overlapping any discarded range).
In a variant, the rule above may apply when replacing with 0 (zero) every byte assigned to a level higher than level N, with N>=1. In such a case the resulting ISOBMFF file only contains boxes and no media data (or samples).
And, an additional level value is defined between the level 0 and the level identifying independently decodable media data. The server or file writer assigns levels to partial sub-segments in order to comply with the following modified semantics of level values for feature_type equal to 1:
- 0: if a level value is equal to zero, this indicates that the associated byte range contains:
* exactly one or more file-level boxes (e.g., MovieFragmentBox) other than a media data container box (e.g., MediaDataBox or IdentifiedMediaDataBox), and/or
* zero or at most one box header (8 or 16 bytes) of a media data container box, which corresponds to the last 8 or 16 bytes of the byte range,
- 1: if a level value is equal to one, this indicates the same type of data as level 0 but having a dependency on a previous byte range with level 0 (e.g. one single box header (8 or 16 bytes) of a media data container box, the media data container box containing data described by the preceding MovieFragmentBox),
- 2: if a level value is equal to two, this indicates that the data in the associated byte range are independently decodable (SAP 1, 2, or 3). The byte range with level 1 immediately preceding, if specified (i.e., the first preceding byte range associated with level 1, following the first preceding byte range associated with level 0, if present), and the first preceding byte range with level 0 are required to process the data.
- N: N>2, if a level value is equal to any value between 3 and N, this indicates that the associated byte range requires data from the preceding byte ranges with lower levels (level N-1 and below) to be processed, stopping at the previous preceding byte range with level 2 if specified (i.e., if present), otherwise at the first byte range in the box. Each first byte range with level 0 or 1 preceding (or immediately preceding) any required byte range (with level 2 to N) is required to process the data.
Therefore, ranges assigned to levels other than 0 or 1 do not contain file-level box headers, and the header of a media data container box (e.g., MediaDataBox 'mdat' or IdentifiedMediaDataBox 'imda') is in level 0 or 1 while media data may be in levels 2 to N. In addition, in order to comply with the modified semantics of level values for feature_type equal to 1 above, the semantics of the feature_type equal to 2 may be modified as follows:
- 2: if the feature_type parameter is set to two, the level value corresponds to a multi-track dependency level. In this mode, lbs is equal to one or more (i.e., at least 16 bits to code the level). The first 8 bits of the level field give the dependency level value, with the same values and semantics as the ones set for the level_assignment_type with value one. The remaining less significant bits of the level field give a track_ID, which identifies a track of the movie present in the indexed sub-segment for level values other than zero and one. It is set to zero if the level value is equal to zero or one. In this mode, each range with a level greater than one consists only of data from the identified track, and the level value only gives dependency information within the track. This allows cross-track indexation within a same level.
In a variant, when a movie fragment contains only one track and the level value is equal to zero or one, the remaining less significant bits of the level field can be set to a track_ID since the movie fragment contains only one track and there is no possible confusion in track_IDs.
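The packing of the multi-track level field described above (first 8 bits for the dependency level, remaining less significant bits for the track_ID) can be sketched as follows; the helper names are illustrative:

```python
def pack_multitrack_level(dependency_level, track_id, lbs):
    """Pack a feature_type 2 level field: the 8 most significant bits
    carry the dependency level, the remaining (lbs + 1) * 8 - 8 less
    significant bits carry the track_ID."""
    track_bits = (lbs + 1) * 8 - 8
    return (dependency_level << track_bits) | track_id

def unpack_multitrack_level(value, lbs):
    """Inverse of pack_multitrack_level: return (dependency_level, track_ID)."""
    track_bits = (lbs + 1) * 8 - 8
    return value >> track_bits, value & ((1 << track_bits) - 1)

# A 16-bit level field (lbs=1): dependency level 3 for track_ID 2.
v = pack_multitrack_level(3, 2, lbs=1)
print(hex(v), unpack_multitrack_level(v, lbs=1))
# → 0x302 (3, 2)
```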
Once the indexing of the one or more media files or media segments resulting from the encapsulation step has been performed, the one or more media files or media segments are described in a streaming manifest (step 610), for example in a DASH MPD. Next, the media files or segments with their description are published on a streaming server for distribution to clients (step 615).
It is noted that a file writer may only carry out steps 600 and 605 to produce encapsulated media data and save them on a storage device.
Figure 7 is a block diagram illustrating an example of steps carried out by a client to obtain data from a server, according to some embodiments of the invention.
As illustrated, a first step is directed to requesting and obtaining a media presentation description or streaming manifest (step 700). Next, the client gets initialization information (e.g., the initialization segments) from the server and initializes its player(s) and/or decoder(s) (step 705) by using items of information of the obtained media description and initialization segments.
Next, the client selects one or more media content components (or encapsulated media data) to play from the media description (step 710) and requests information on these media content components, for example index information (step 715) including for instance a 'sidx' box, a 'ssix' box according to some embodiments of the invention, and optionally a 'leva' box. Next, after having parsed the received index information (step 720), the client may select byte ranges for the data to request (i.e. partial sub-segments), corresponding to portions of the selected media content components (step 725). In order to be able to reconstruct a valid media file from a selected partial sub-segment, the client also selects all the partial sub-segments on which the selected partial sub-segment depends, as signalled by the levels in the 'ssix' box. Next, the client issues requests for the data that are actually selected (step 730).
As described by reference to Figure 5, this may be done in one or more requests and responses between the client and a server, depending on the index used during the encapsulation and the level of description in the media presentation description.
It is noted that a reader or file parser may only conduct steps 705 to 725 to access portions of data from encapsulated media data located on a local storage device.
Next, the client, reader, or file parser may reconstruct a media file compliant with ISOBMFF from the requested data by concatenating contiguous requested data in the order of their byte ranges (step 740). If two requested data in byte range order are not contiguous in byte ranges, then the missing data between the non-contiguous requested data are replaced with 0 (zero). In addition, samples overlapping any missing byte range are removed.
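The concatenation and zero-padding of step 740 can be sketched as follows. This is an illustrative sketch only, not the claimed method itself; the removal of samples overlapping a missing byte range is omitted:

```python
def reconstruct(received):
    """received: list of (byte_offset, data) pairs for the byte ranges
    actually obtained, in any order.  Gaps between non-contiguous ranges
    are filled with zeros so that declared box sizes and byte offsets in
    the file remain valid."""
    out = bytearray()
    pos = None
    for offset, data in sorted(received, key=lambda r: r[0]):
        if pos is None:
            pos = offset
        if offset > pos:
            out.extend(b"\x00" * (offset - pos))  # zero-pad the missing gap
        out.extend(data)
        pos = offset + len(data)
    return bytes(out)
```

Zero-padding keeps the file structurally valid because box headers record absolute sizes; a parser can then skip or discard the padded regions.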
Figure 8a illustrates a first example of a level assignment using level values of the sub-segment index box 'ssix' as defined in the variant described in reference to Figure 6.
According to this example, the level assignment is used to identify the byte ranges corresponding to the stream access points referenced 805 and 810 (e.g. instantaneous decoding refresh (IDR) frames) in the sub-segment referenced 800. The feature type is set to the predefined value 1 (identifying dependency levels).
In order to allow a client to reconstruct a valid media file, the file-level boxes are explicitly signalled independently from media data. The first byte range begins with a file-level box, the movie fragment box 'moof'. It also includes the beginning of the media data box 'mdat' (i.e. its box header comprising its four-character code and the size).
The level value assigned to this first byte range is set to 0 (zero) since the byte range begins with a top-level box. The second byte range is set to level 2 (two) and identifies the first independently decodable media data (SAP 1, 2, or 3) (805). The third byte range, between the two IDR frames, is composed of predictively coded P-frames that depend on the decoding of the first IDR frame 805. Any level value N greater than two can be used to identify this byte range. The level value indicates that this byte range may depend on preceding byte ranges with level values smaller than N up to the previous independently decodable media data, if any, and on the preceding byte range with level 0 to be able to process the data and reconstruct a valid media file compliant with ISOBMFF. The fourth byte range corresponds to the second IDR frame (reference 810). It is assigned the level value two to indicate that this byte range contains independently decodable media data (SAP 1, 2, or 3). Accordingly, a client may use this indication to jump directly to this stream access point. The fifth byte range, corresponding to another set of P-frames depending on the IDR frame 810, is assigned a level N greater than two to signal their dependence on preceding byte ranges with level values smaller than N up to the previous independently decodable media data (i.e. the IDR frame 810) and on the preceding byte range with level 0 to be able to process the data and reconstruct a valid media file compliant with ISOBMFF.
Figure 8b illustrates a second example of a level assignment using level values of the sub-segment index box 'ssix' as defined in the variant described in reference to Figure 6.
This example illustrates a low latency DASH sub-segment 860 composed of two chunks referenced 865 and 870 (each chunk corresponding to a media fragment). In this example, the feature type in the 'ssix' box is set to the predefined value one (identifying dependency levels).
The first byte range begins with a file-level box, the movie fragment box 'moof'. It also includes the beginning of the media data box 'mdat' (i.e. its box header comprising its four-character code and the size). The level value assigned to this first byte range is set to 0 (zero) since the byte range begins with a top-level box. The second byte range contains an IDR frame and is assigned level two, indicating that the byte range contains independently decodable data (SAP 1, 2, or 3). The third byte range is assigned level three (i.e. a value greater than two) because it contains dependently decodable data. The fourth byte range, corresponding to the beginning of the second chunk 870, also begins with a file-level box, a movie fragment box 'moof'. However, it is a dependent movie fragment box that depends on the movie fragment box from the first chunk 865. It also includes the beginning of the media data box 'mdat' (i.e. its box header comprising its four-character code and the size). The level value assigned to this fourth byte range is set to 1 (one) since the byte range begins with a dependent top-level box. The fifth byte range only contains predictively coded P-frames and is assigned level value three because its data depend on data from the byte range assigned level two.
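Since 'ssix' entries carry sizes rather than offsets, a client derives the absolute byte ranges of layouts such as those of Figures 8a and 8b by accumulating the sizes. A minimal sketch, where the function name and the base_offset parameter are illustrative assumptions:

```python
def ranges_to_offsets(ranges, base_offset=0):
    """Turn the (level, range_size) pairs of one sub-segment into absolute
    (level, first_byte, last_byte) triples, e.g. to build HTTP Range
    requests for selected partial sub-segments.  base_offset is the
    position of the first indexed byte within the segment."""
    out = []
    pos = base_offset
    for level, size in ranges:
        out.append((level, pos, pos + size - 1))
        pos += size
    return out
```

The resulting (first_byte, last_byte) pairs map directly onto HTTP `Range: bytes=first-last` headers.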
Figures 9a to 9d illustrate some examples of organisation of byte ranges of data in levels and of selection of byte ranges to process a given byte range, depending on these levels, according to some embodiments of the invention.
According to the illustrated examples, level 0 (L0) indicates file-level boxes and/or a box header of a media data container box, level 1 (L1) indicates the same type of data as level 0 but having a dependency on a previous byte range associated with level 0, level 2 (L2) indicates independently decodable data (SAP 1, 2, 3), and level N (with N > 2, LN) indicates data requiring byte ranges associated with lower levels (level N-1 to level 2) to be processed.
For each example illustrated in Figures 9a to 9d, the top line indicates the organisation of byte ranges of data in levels, where Lx represents the level with value x associated with the corresponding byte range, and the bottom line indicates the byte ranges which are required to process a given byte range indicated by a surrounding square (the byte ranges which are not required are crossed out), according to some embodiments of the invention.
As illustrated in Figure 9a, the byte range referenced 900 which is associated with level 4 (L4) is selected as the given byte range to process. As a consequence, the preceding byte ranges with lower levels (level N-1 and below) are also selected (i.e. required), stopping at the first preceding byte range with level 2 (L2), as illustrated with reference 910, skipping the byte ranges associated with the same or higher levels in-between, in particular the byte range associated with level 5 (L5), as illustrated with reference 905. Preceding byte ranges with level 2 and 3 (referenced 915) are also skipped since they are located after the first byte range associated with level 2 preceding the given byte range, starting from the latter. Byte range 920 associated with level 0 (L0) is selected since it is the first byte range associated with level 0 preceding a selected (or required) byte range associated with a level from 2 to N, which is also required to process the given byte range.
According to the example of Figure 9b, the right-most byte range associated with level 3 (L3) is selected as the given byte range to process. Accordingly, the first preceding byte range with level 1 (L1) is selected. The preceding byte ranges associated with levels 1, 3, and 4 respectively, referenced 930, are skipped since only the first preceding byte range associated with level 2 is required to process data with level 3. It is noted that byte range 935 associated with level 1 is not selected since it is not the first byte range associated with level 1 preceding a selected byte range associated with a level from 2 to N. The preceding byte ranges associated with level 2 and level 0, referenced 940, are also selected since a given byte range requires, to be processed, byte ranges associated with lower levels stopping at the first preceding byte range with level 2, and the first byte ranges associated with level 0 preceding a selected byte range associated with a level from 2 to N.
According to the example of Figure 9c, the right-most byte range associated with level 4 (L4) is selected as the given byte range to process. Accordingly, the preceding byte range associated with level 1 (L1), referenced 950, is selected (for the same reasons as for byte range 925 in Figure 9b). In the preceding byte ranges associated with level 3 and 4, only the byte range with level 3, referenced 955, is also selected since its level is lower than that of the given byte range associated with level 4. The second preceding byte range associated with level 1 is also selected since it is a first byte range associated with level 1 preceding a required byte range associated with a level from 2 to N. The preceding byte ranges associated with level 2 and level 0 are also selected since a byte range associated with a level greater than 2 requires, to be processed, byte ranges associated with lower levels, stopping at the first preceding byte range associated with level 2, and the one or more first byte ranges associated with level 0 preceding a required byte range associated with a level equal to or greater than 2.

According to the example of Figure 9d, the right-most byte range associated with level 4 (L4), referenced 960, is selected as the given byte range to process.
Accordingly, the following byte range associated with level 3 is skipped since it is not a preceding byte range. Going from right to left from the given byte range, the first preceding byte range associated with level 5 is also skipped since its level is higher than the level of the initially selected byte range. The first preceding byte range associated with level 0 (L0), referenced 965, is selected since it is a first byte range associated with level 0 preceding a required byte range associated with a level from 2 to N. Byte range 970 associated with level 5 is also skipped since its level is higher than the level of the initially selected byte range. Byte range 975 associated with level 0 (L0) preceding the initially selected byte range with level 4 is skipped since it is not a first byte range associated with level 0 preceding a required byte range associated with a level from 2 to N (in this case, the first byte range associated with level 0 preceding required byte range 960 is byte range 965). The byte range associated with level 4 and its preceding byte range associated with level 1, referenced 980, are also skipped for the same reasons as those given by reference to Figure 9b. Finally, the remaining left-most byte ranges associated with levels 3, 1, 2, and 0 are also selected.
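The selection rules walked through in Figures 9a to 9d can be sketched as follows. This is one possible reading of the rules, given for illustration only; the function name is not part of the description, and the level indices are list positions of the ordered byte ranges:

```python
def required_ranges(levels, given):
    """Return the sorted indices of the byte ranges needed to process
    byte range `given`, under one reading of the level rules:
    - preceding media ranges with 2 <= level < levels[given], back to
      (and including) the nearest level-2 range (the SAP);
    - for every required media range, the nearest preceding metadata
      range (level 0 or 1); a level-1 range (dependent movie fragment)
      additionally pulls in its nearest preceding level-0 anchor."""
    req = {given}
    lg = levels[given]
    # 1. Media dependencies back to the nearest stream access point.
    if lg > 2:
        for i in range(given - 1, -1, -1):
            if 2 <= levels[i] < lg:
                req.add(i)
                if levels[i] == 2:
                    break
    # 2. Metadata dependencies for each required media range.
    for r in [i for i in sorted(req) if levels[i] >= 2]:
        for i in range(r - 1, -1, -1):
            if levels[i] <= 1:
                req.add(i)
                if levels[i] == 1:
                    # Dependent metadata: also need its level-0 anchor.
                    for j in range(i - 1, -1, -1):
                        if levels[j] == 0:
                            req.add(j)
                            break
                break
    return sorted(req)
```

For instance, with levels [0, 2, 3, 0, 2, 3, 5, 3, 4] and the last range as the given byte range, the intermediate level-5 range and everything before the second level-0/level-2 pair are skipped, mirroring the skip/select pattern of Figure 9a.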
Figure 10 is a schematic block diagram of a computing device 1000 for implementation of one or more embodiments of the invention. The computing device 1000 may be a device such as a micro-computer, a workstation, or a light portable device. The computing device 1000 comprises a communication bus 1002 connected to:
- a central processing unit (CPU) 1004, such as a microprocessor;
- a random access memory (RAM) 1008 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encapsulating, indexing, de-encapsulating, and/or accessing data, the memory capacity thereof being expandable by an optional RAM connected to an expansion port, for example;
- a read only memory (ROM) 1006 for storing computer programs for implementing embodiments of the invention;
- a network interface 1012 that is, in turn, typically connected to a communication network 1014 over which digital data to be processed are transmitted or received. The network interface 1012 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1004;
- a user interface (UI) 1016 for receiving inputs from a user or displaying information to a user;
- a hard disk (HD) 1010; and/or
- an I/O module 1018 for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 1006, on the hard disk 1010, or on a removable digital medium such as a disk, for example. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1012, in order to be stored in one of the storage means of the communication device 1000, such as the hard disk 1010, before being executed.
The central processing unit 1004 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1004 is capable of executing instructions from main RAM memory 1008 relating to a software application after those instructions have been loaded from the program ROM 1006 or the hard disk (HD) 1010, for example. Such a software application, when executed by the CPU 1004, causes the steps of the flowcharts shown in the previous figures to be performed.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications will be apparent to a person skilled in the art which lie within the scope of the present invention.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.
Claims (20)
- CLAIMS
- 1. A method for processing encapsulated media data, the encapsulated media data comprising metadata and data associated with the metadata, the metadata being descriptive of the associated data, the encapsulated media data comprising a plurality of segments, at least one segment comprising a plurality of sub-segments, the method comprising: for a plurality of ordered byte ranges of at least one of the sub-segments, the byte ranges being defined in metadata descriptive of partial sub-segments of the at least one of the sub-segments, obtaining one level value associated with each byte range within the metadata descriptive of partial sub-segments of the at least one of the sub-segments, obtaining a feature type value indicating that the level values are representative of dependency levels, the feature type value being obtained from the metadata descriptive of partial sub-segments of the at least one of the sub-segments, based on the level value associated with a given byte range to be processed, selecting only byte ranges that are required for processing the given byte range, the selected byte ranges being byte ranges preceding the given byte range, that are associated with lower level values, obtaining the given byte range and the selected byte ranges.
- 2. The method of claim 1, wherein the obtained level value associated with the given byte range indicates that all data from byte ranges preceding the given byte range, that are associated with level values lower than the obtained level value associated with the given byte range and higher than a predetermined level value, from the given byte range to a first preceding byte range associated with the predetermined level value, and the data of the first byte range associated with the predetermined level value preceding the given byte range, denoted the required byte ranges, are required to process the given byte range.
- 3. The method of claim 2, wherein the obtained level value associated with the given byte range further indicates that in addition to all data from the required byte ranges, denoted the first required byte ranges, the predetermined level value being denoted the first predetermined level value, data from each first byte range associated with a second predetermined level value preceding a byte range of the first required byte ranges, denoted the second required byte ranges, are required to process the given byte range.
- 4. The method of claim 3, wherein the second predetermined level value is value zero, a byte range associated with level value zero comprising only metadata.
- 5. The method of claim 3 or claim 4, wherein the obtained level value associated with the given byte range further indicates that in addition to all data from the first and second required byte ranges, data from each first byte range associated with a third predetermined level value preceding a byte range of the first required byte ranges and following a first byte range of the second required byte ranges are required to process the given byte range.
- 6. The method of claim 5, wherein the third predetermined level value is value one, a byte range associated with level value one comprising only metadata depending on a first preceding byte range associated with level value zero.
- 7. The method of any one of claims 1 to 6, further comprising identifying a byte range which is not required for processing the given byte range, between two consecutive required byte ranges, and padding the identified byte range with dummy values.
- 8. The method of claim 7, further comprising generating a media file comprising required byte ranges, that complies with the ISOBMFF standard.
- 9. The method of any one of claims 1 to 8, wherein the metadata descriptive of partial sub-segments of the at least one of the sub-segments belong to a box of the 'ssix' type, the media data being encapsulated according to ISOBMFF.
- 9. A method for encapsulating media data, the media data comprising metadata and data associated with the metadata, the metadata being descriptive of the associated data, the media data comprising a plurality of segments, at least one segment comprising a plurality of sub-segments, the method comprising: for a plurality of ordered byte ranges of at least one of the sub-segments, associating one level value with each byte range within metadata descriptive of partial sub-segments of the at least one of the sub-segments, wherein the metadata descriptive of partial sub-segments of the at least one of the sub-segments further comprise a feature type value indicating that the level values are representative of dependency levels and wherein the level value associated with a given byte range indicates which data from preceding byte ranges that are associated with lower level values are required to process the given byte range.
- 10. The method of claim 9, wherein the level value associated with the given byte range indicates that all data from byte ranges preceding the given byte range, that are associated with level values lower than the obtained level value associated with the given byte range and greater than a predetermined level value, from the given byte range to a first preceding byte range associated with the predetermined level value, and the data of the first byte range associated with the predetermined level value preceding the given byte range, denoted the required byte ranges, are required to process the given byte range.
- 11. The method of claim 10, wherein the level value associated with the given byte range further indicates that in addition to all data from the required byte ranges, denoted the first required byte ranges, the predetermined level value being denoted the first predetermined level value, data from each first byte range associated with a second predetermined level value preceding a byte range of the first required byte ranges, denoted the second required byte ranges, are required to process the given byte range.
- 12. The method of claim 11, wherein the second predetermined level value is value zero, a byte range associated with level value zero comprising only metadata.
- 13. The method of claim 11 or claim 12, wherein the level value associated with the given byte range further indicates that in addition to all data from the first and second required byte ranges, data from each first byte range associated with a third predetermined level value preceding a byte range of the first required byte ranges and following a first byte range of the second required byte ranges are required to process the given byte range.
- 14. The method of claim 13, wherein the third predetermined level value is value one, a byte range associated with level value one comprising only metadata depending on a first preceding byte range associated with level value zero.
- 15. The method of any one of claims 9 to 14, wherein the metadata descriptive of partial sub-segments of the at least one of the sub-segments belong to a box of the 'ssix' type, the media data being encapsulated according to ISOBMFF.
- 16. The method of any one of claims 9 to 15, wherein a same level value is associated with at least two non-contiguous byte ranges of the at least one of the sub-segments.
- 17. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing each of the steps of the method according to any one of claims 1 to 16 when loaded into and executed by the programmable apparatus.
- 18. A non-transitory computer-readable storage medium storing instructions of a computer program for implementing each of the steps of the method according to any one of claims 1 to 16.
- 19. A device for processing encapsulated media data, the device comprising a processing unit configured for carrying out each of the steps of the method according to any one of claims 1 to 8.
- 20. A device for encapsulating media data, the device comprising a processing unit configured for carrying out each of the steps of the method according to any one of claims 9 to 16.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2210189.3A GB2620582A (en) | 2022-07-11 | 2022-07-11 | Method, device, and computer program for improving indexing of portions of encapsulated media data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB202210189D0 GB202210189D0 (en) | 2022-08-24 |
| GB2620582A true GB2620582A (en) | 2024-01-17 |
Family
ID=84539846
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB2210189.3A Pending GB2620582A (en) | 2022-07-11 | 2022-07-11 | Method, device, and computer program for improving indexing of portions of encapsulated media data |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2620582A (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2599170A (en) * | 2020-09-29 | 2022-03-30 | Canon Kk | Method, device, and computer program for optimizing indexing of portions of encapsulated media content data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | COOA | Change in applicant's name or ownership of the application | Owner name: CANON KABUSHIKI KAISHA; Free format text: FORMER OWNERS: CANON KABUSHIKI KAISHA; TELECOM PARIS |