HK1262962A1 - Method for decoding a picture and electronic device for decoding the same
Description
This application is a divisional application of invention patent application No. 201380062970.8 (PCT International Application No. PCT/JP2013/005888), entitled "Method of signaling stepwise temporal sub-layer access samples", which entered the Chinese Patent Office on June 2, 2015 with an international filing date of October 2, 2013.
Technical Field
The present disclosure relates generally to electronic devices. More particularly, the present disclosure relates to a method of signaling a stepped temporal sublayer access sample.
Background
To meet consumer demand and to improve portability and convenience, electronic devices have become smaller and more powerful. Consumers have begun to rely on electronic devices and have begun to expect more and more functionality. Some examples of electronic devices include desktop computers, laptop computers, cellular phones, smart phones, media players, integrated circuits, and the like.
Some electronic devices are used to process and display digital media. For example, portable electronic devices now allow consumers to consume digital media in almost any location that they may be in. In addition, some electronic devices may provide for the downloading or streaming of digital media content for use and enjoyment by consumers.
The increasing popularity of digital media has raised several problems. For example, efficiently representing high quality digital media for storage, transmission, and playback presents several challenges. From this discussion, it can be seen that a system and method for representing digital media more efficiently may be beneficial.
Disclosure of Invention
One embodiment of the present invention discloses a method for decoding a picture, comprising: receiving an audiovisual bitstream; and obtaining a stepped temporal sub-layer access (STSA) sample packet, wherein the STSA sample packet is indicated by a grouping_type of "stsa" in a sample group description box.
Drawings
Fig. 1 is a block diagram illustrating an example of one or more electronic devices in which systems and methods for signaling stepped temporal sub-layer access (STSA) sample packets may be implemented.
Fig. 2 is a block diagram showing two examples of coding structures.
FIG. 3 is a block diagram illustrating one configuration of an encoder on an electronic device.
Figure 4 is a flow diagram illustrating one configuration of a method for signaling a stepped temporal sub-layer access (STSA) sample packet.
Fig. 5 is a flowchart showing a more specific configuration of a method for signaling a stepped temporal sub-layer access (STSA) sample packet.
Fig. 6 shows a block diagram of one configuration of a decoder on an electronic device.
Figure 7 is a flow diagram illustrating one configuration of a method for receiving a stepped temporal sub-layer access (STSA) sample packet.
Fig. 8 is a flowchart showing a more specific configuration of a method for receiving a stepped temporal sub-layer access (STSA) sample packet.
Figure 9 illustrates a block diagram of one configuration of an electronic device in which systems and methods for signaling stepped temporal sub-layer access (STSA) sample packets may be implemented.
Figure 10 illustrates a block diagram of one configuration of an electronic device in which systems and methods for receiving a stepped temporal sub-layer access (STSA) sample packet may be implemented.
FIG. 11 is a block diagram illustrating various components that may be utilized in a transmitting electronic device.
FIG. 12 is a block diagram illustrating various components that may be utilized in a receiving electronic device.
Detailed Description
An electronic device for encoding a picture is described. The electronic device includes a processor and instructions stored in a memory in electronic communication with the processor. The instructions are executable to encode a stepped temporal sub-layer access (STSA) sample packet. The instructions are also executable to send and/or store the STSA sample packet.
Sending the STSA sample packets may include storing the STSA sample packets in a recordable storage medium. The recordable storage medium may be a file. Encoding the STSA sample grouping may include encoding the STSA sample grouping based on an ISO base media file format. The ISO base media file format may be extended to support High Efficiency Video Coding (HEVC) video streams. Sending the STSA sample packets may include sending the STSA sample packets in the ISO base media file format.
The STSA sample packet may indicate STSA samples. The STSA sample packet may indicate a next temporal layer switching point at the same temporal layer. The next temporal layer switching point may indicate a number of samples in the same temporal layer to the next temporal layer switching point.
The STSA sample packet may indicate the next temporal layer switching point at a higher temporal layer. The next temporal layer switching point may indicate a number of samples at the higher temporal layer to the next temporal layer switching point. The higher temporal layer may be a temporal layer having a temporal identifier (ID) greater than the temporal ID of the current sample.
The STSA sample packets may be sent in a sample group description box (SGPD). The SGPD may include one of a next STSA up-switch distance parameter, a next STSA sample distance parameter, and a type temporal sub-layer access (TSA) flag. The next STSA up-switch distance parameter may be next_stsa_up_distance, the next STSA sample distance parameter may be next_stsa_sample_distance, and the type TSA flag may be typeTSAFlag. The value of the type TSA flag may indicate whether a sample in the STSA sample grouping is a TSA sample or an STSA sample. The STSA picture may provide a temporal layer switching function to the temporal layer to which the STSA picture belongs.
An electronic device for decoding a picture is also described. The electronic device includes a processor and instructions stored in a memory in electronic communication with the processor. The instructions are executable to receive one of a bitstream and a recordable storage medium. The instructions are also executable to obtain a stepped temporal sub-layer access (STSA) sample packet. The instructions are also executable to decode the STSA sample packet. The instructions are also executable to determine when to switch to a new temporal layer based on the STSA sample packet.
The recordable storage medium may be a file. Receiving the STSA sample packets may include receiving the STSA sample packets in an ISO base media file. Decoding the STSA sample packets may include decoding the STSA sample packets based on an ISO base media file format. The ISO base media file format may have been extended to support High Efficiency Video Coding (HEVC) video streams.
The STSA sample packet may indicate STSA samples. The STSA sample packet may indicate the next temporal layer switching point at the same temporal layer. The next temporal layer switching point may indicate a number of samples at the same temporal layer to the next temporal layer switching point.
The STSA sample packet may indicate the next temporal layer switching point at a higher temporal layer. The next temporal layer switching point may indicate a number of samples at the higher temporal layer to the next temporal layer switching point. The higher temporal layer may be a temporal layer having a temporal identifier (ID) greater than the temporal ID of the current sample.
The STSA sample packets may be sent in a sample group description box (SGPD). The SGPD may include one of a next STSA up-switch distance parameter, a next STSA sample distance parameter, and a type temporal sub-layer access (TSA) flag. The next STSA up-switch distance parameter may be next_stsa_up_distance, the next STSA sample distance parameter may be next_stsa_sample_distance, and the type TSA flag may be typeTSAFlag.
The value of the type TSA flag may indicate that a sample in the STSA sample grouping is a TSA sample or a STSA sample. The STSA picture may provide a temporal layer switching function to a temporal layer to which the STSA picture belongs.
A picture encoding method is also described. A stepped temporal sub-layer access (STSA) sample packet is encoded. The STSA sample packet is sent.
A picture decoding method is also described. A bitstream and/or a recordable storage medium is received. A stepped temporal sub-layer access (STSA) sample packet is obtained. The STSA sample packet is decoded. A determination is made when to switch to a new temporal layer based on the STSA sample packets.
Systems and methods are disclosed herein that describe signaling of stepped temporal sub-layer access (STSA) sample packets. For example, some configurations described herein include apparatus and methods that signal STSA sample packets using corresponding Network Abstraction Layer (NAL) units. The STSA sample packet may include one or more STSA samples.
In some known configurations, such as Benjamin Bross et al., "High Efficiency Video Coding (HEVC) text specification Draft 8," JCTVC-J1003_d7, Stockholm, July 2012 (hereinafter "HEVC Draft 8"), STSA pictures are described. HEVC Draft 8 also describes a Network Abstraction Layer (NAL) unit type corresponding to an STSA picture. In some cases, an STSA picture may be referred to as a gradual temporal layer access (GTLA) picture.
The High Efficiency Video Coding (HEVC) standard provides increased coding efficiency and enhanced robustness. Accordingly, ISO/IEC 14496-15, "Carriage of NAL unit structured video in the ISO Base Media File Format," Stockholm, July 2012 (hereinafter "ISO/IEC 14496-15"), defines the carriage of video composed of NAL units in the ISO base media file format. In addition, Information technology - Coding of audio-visual objects, "Part 15: Carriage of NAL unit structured video in the ISO Base Media File Format," AMENDMENT 2: "Carriage of High Efficiency Video Coding (HEVC)," Stockholm, July 2012, defines the carriage of High Efficiency Video Coding (HEVC) video streams. The storage of HEVC content uses the capabilities of the existing ISO base media file format, but also defines extensions to support the features of the HEVC codec. For example, ISO/IEC 14496-15 provides a method for carrying video composed of NAL units in the ISO base media file format. Methods for the carriage of HEVC are also described (see "Part 15: Carriage of NAL unit structured video in the ISO Base Media File Format, AMENDMENT 2: Carriage of High Efficiency Video Coding (HEVC)," Stockholm, July 2012).
One of the HEVC features supported by the ISO base media file format is parameter sets. For example, the video parameter set (VPS), sequence parameter set (SPS), and picture parameter set (PPS) mechanisms may decouple the transmission of rarely changing information from the transmission of coded block data. Each slice containing coded block data may reference a PPS containing its decoding parameters. In turn, the PPS may reference an SPS that includes sequence-level decoding parameter information. The SPS may reference a VPS that includes global decoding parameter information, such as information that applies across layers or views in potential scalable and 3DV extensions. Furthermore, HEVC may also support an adaptation parameter set (APS), which includes decoding parameters that are expected to change more frequently than the parameters in the PPS. The adaptation parameter set (APS) may also be referenced by a slice if needed.
To support HEVC units in the ISO base media file format, additional tools, such as sample groupings, may also be included. For example, a temporal scalability sample grouping may provide a construction and grouping mechanism to indicate the association of access units with different hierarchical levels of temporal scalability. As another example, a temporal sub-layer access sample grouping may provide a construction and grouping mechanism to indicate the identification of access units that are temporal sub-layer access (TSA) samples. In some cases, a temporal layer may be referred to as a temporal sub-layer or a sub-layer. Similarly, a temporal sub-layer access (TSA) sample may be referred to as a temporal layer access (TLA) sample.
In some configurations, a stepped temporal sub-layer access (STSA) sample grouping may also be added to provide a construction and grouping mechanism to indicate the identification of access units that are STSA samples. For example, an STSA sample packet may indicate an STSA sample.
In some configurations, a temporal sub-layer access (TSA) type sample grouping may be added to provide a construction and grouping mechanism to indicate the identification of access units that are TSA and STSA samples. For example, a TSA type sample packet may indicate TSA and STSA samples. Other information in the sample packets may distinguish between TSA sample packets and STSA sample packets. Additional details regarding TSA, STSA, and TSA type sample groupings are described below.
In some configurations, temporal sub-layer access (TSA) or temporal layer access (TLA) pictures may currently be signaled in the bitstream. TLA picture signaling unifies both the clean random access (CRA) picture and the temporal layer switching point. A CRA picture may indicate a random access point (RAP), or a point from which a decoder may start decoding without accessing pictures that precede the CRA picture in decoding order. In some cases, a CRA picture may include intra-prediction slices (I-slices), which are decoded using intra prediction.
As used herein, the term "temporal layer" refers to all pictures having the same temporal identifier (temporal_id, tId, or TemporalId), or all pictures on the same temporal level. Temporal layers are described in greater detail below in conjunction with FIG. 2.
A temporal sub-layer switching point is a picture that represents a point in the bitstream from which it is possible to start decoding a larger number of temporal layers than were decoded before the switching point. In other words, the temporal sub-layer switching point may indicate that decoding of pictures with a higher temporal ID than the current temporal ID may begin. In this case, the temporal sub-layer switching point is a temporal sub-layer up-switching point. Thus, no pictures following the switching point in decoding order and display order use any pictures preceding the switching point in decoding order and display order. The temporal sub-layer switching point may be signaled using the ISO base media file format.
In one configuration, STSA sample packets may be signaled using the ISO base media file format through NAL unit type transport. In other configurations, STSA sample packets may be signaled using the ISO base media file format by HEVC transport.
In other configurations, the NAL unit type may specify the type of Raw Byte Sequence Payload (RBSP) data structure included in the NAL unit. In one example, using NAL units of NAL unit types equal to 0 or in the range of 33-63 may not affect the decoding process specified in the various configurations. It should be noted that: in some configurations, NAL unit types 0 and 33-63 may be used as determined by various applications. NAL unit types 0 and 33-63 may be reserved for future use. In some configurations described herein, a decoder may ignore the content of NAL units that use reserved or unspecified values of NAL unit types.
Examples of NAL unit type codes and NAL unit type categories that may be implemented in accordance with the systems and methods described herein are included in table 1 below. It should be noted that: some configurations may include fields similar to or different from those described below.
In some configurations, some or all of the NAL fields in Table 1 may be examples of different NAL unit types. In some configurations, a particular NAL unit type may be associated with different fields and syntax structures associated with one or more pictures. Further explanation of one or more fields is included below. It should be noted that Table 1 below includes abbreviations for the following pictures: broken link access (BLA), random access point (RAP), tagged for discard (TFD), and instantaneous decoding refresh (IDR) pictures.
| NAL unit type | Content of NAL unit | RBSP syntax structure |
| 0 | Unspecified | N/A |
| 1, 2 | Coded slice of a non-TSA, non-STSA trailing picture | slice_layer_rbsp() |
| 3, 4 | Coded slice of a TSA picture | slice_layer_rbsp() |
| 5, 6 | Coded slice of an STSA picture | slice_layer_rbsp() |
| 7, 8, 9 | Coded slice of a BLA picture | slice_layer_rbsp() |
| 10, 11 | Coded slice of an IDR picture | slice_layer_rbsp() |
| 12 | Coded slice of a CRA picture | slice_layer_rbsp() |
| 13 | Coded slice of a DLP (decodable leading picture) | slice_layer_rbsp() |
| 14 | Coded slice of a TFD picture | slice_layer_rbsp() |
| 15…24 | Reserved | |
| 25 | Video parameter set | video_parameter_set_rbsp() |
| 26 | Sequence parameter set | seq_parameter_set_rbsp() |
| 27 | Picture parameter set | pic_parameter_set_rbsp() |
| 28 | Access unit delimiter | access_unit_delimiter_rbsp() |
| 29 | End of sequence | end_of_seq_rbsp() |
| 30 | End of bitstream | end_of_bitstream_rbsp() |
| 31 | Filler data | filler_data_rbsp() |
| 32 | Supplemental enhancement information (SEI) | sei_rbsp() |
| 33…47 | Reserved | N/A |
| 48…63 | Unspecified | N/A |
TABLE 1
Table 1 is organized with the following columns: NAL unit type (nal_unit_type), content of NAL unit, and RBSP syntax structure. NAL units may provide an indication of the type of data that follows. For example, a NAL unit type of 5 or 6 may indicate that data for a coded slice of an STSA picture follows.
In table 1, the syntax may include a slice Raw Byte Sequence Payload (RBSP) syntax. Additionally or alternatively, the syntax may also include Supplemental Enhancement Information (SEI) RBSP syntax. The SEI RBSP may include one or more SEI messages. Each SEI message may include variables that specify the type (e.g., payloadType) and size (e.g., payloadSize) of the SEI payload. The resulting SEI payload size may be specified in bytes and is equal to the number of RBSP bytes in the SEI payload.
In Table 1, when the value of the NAL unit type is equal to 5 or 6 for a particular picture, the particular picture may be referred to as a stepped temporal sub-layer access (STSA) picture. In some configurations, the temporal ID may not be equal to 0 when nal_unit_type is in the range of 3 to 6, inclusive (e.g., coded slices of TSA or STSA pictures).
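The following is a minimal illustrative sketch (in Python) of how a file reader might classify NAL unit types according to the draft numbering of Table 1 and check the temporal ID constraint described above. The function names, and the assumption that nal_unit_type and temporal_id are already parsed integers, are hypothetical and not part of the HEVC or ISO base media file format specifications.

STSA_NAL_UNIT_TYPES = {5, 6}   # coded slice of an STSA picture (Table 1)
TSA_NAL_UNIT_TYPES = {3, 4}    # coded slice of a TSA picture (Table 1)

def is_stsa_slice(nal_unit_type):
    return nal_unit_type in STSA_NAL_UNIT_TYPES

def check_temporal_id_constraint(nal_unit_type, temporal_id):
    # Per the constraint above, TSA/STSA coded slices (types 3..6) must not
    # have a temporal ID equal to 0.
    if nal_unit_type in TSA_NAL_UNIT_TYPES | STSA_NAL_UNIT_TYPES and temporal_id == 0:
        raise ValueError("TSA/STSA slice must have temporal_id greater than 0")

check_temporal_id_constraint(5, 2)   # an STSA slice at temporal_id 2 is valid
print(is_stsa_slice(5))              # True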
An STSA picture may be a coded picture in which each slice has nal_unit_type equal to STSA_R or STSA_N. STSA_R may indicate that the decoded STSA picture may serve as a reference for later decoded pictures. STSA_N may indicate that the decoded STSA picture cannot serve as a reference for later decoded pictures. An STSA picture does not use pictures with the same temporal ID as the STSA picture for inter prediction reference. Pictures with the same temporal ID as the STSA picture that follow the STSA picture in decoding order do not use pictures with the same temporal ID as the STSA picture that precede the STSA picture in decoding order as inter prediction references. An STSA picture enables up-switching, at the STSA picture, from the immediately lower sub-layer to the sub-layer containing the STSA picture. An STSA picture must have a temporal ID greater than 0.
It should be noted that RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr include all reference pictures that may be used for inter prediction of the current picture and may be used for inter prediction of one or more pictures following the current picture in decoding order.
When the current picture is an STSA picture, a picture having the same temporal_id as the current picture may not be included in RefPicSetStCurrBefore, RefPicSetStCurrAfter, or RefPicSetLtCurr. When the current picture is a picture that follows, in decoding order, an STSA picture having the same temporal_id as the current picture, a picture having the same temporal_id as the current picture that precedes that STSA picture in decoding order may not be included in RefPicSetStCurrBefore, RefPicSetStCurrAfter, or RefPicSetLtCurr.
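As a hedged illustration of the first constraint above, the following Python sketch checks whether a reference picture set, represented here simply as a list of temporal_id values (an assumption made for brevity), violates the STSA restriction.

def violates_stsa_reference_constraint(is_stsa, current_temporal_id, ref_temporal_ids):
    # When the current picture is an STSA picture, no reference picture in
    # RefPicSetStCurrBefore/After or RefPicSetLtCurr may share its temporal_id.
    return is_stsa and any(tid == current_temporal_id for tid in ref_temporal_ids)

assert not violates_stsa_reference_constraint(True, 2, [0, 1])
assert violates_stsa_reference_constraint(True, 2, [1, 2])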
In some configurations, the systems and methods disclosed herein describe STSA pictures. STSA pictures may be coded pictures with NAL unit type equal to 5 or 6 per slice.
Signaling stepped temporal sub-layer access (STSA) sample packets may provide advantages over signaling temporal layer sample packets. For example, an STSA sample grouping may provide a clear marking and/or flagging of STSA samples that belong to the STSA sample grouping. This in turn provides simple identification of temporal layer switching points among the samples. Additional benefits and advantages are described hereinafter.
In some configurations, STSA sample packets may be sent in the transport of NAL units. The STSA sample packet may be a set of pictures stored using the ISO base media file format. The ISO base media file format may be specified according to ISO/IEC 14496-15. The ISO base media file format may also be extended to support the transport of HEVC.
A benefit of signaling STSA sample packets in the ISO base media file format is that additional syntax elements may be defined for the STSA sample packets. For example, an additional syntax element may provide the ability to know when the next temporal layer switching point will occur in the same temporal layer. This may be useful in determining when to adaptively switch to a new temporal layer.
Furthermore, STSA pictures may provide the ability to increase the frame rate of video in a stepwise manner. For example, the electronic device 102 may begin by receiving video only for the lowest temporal layer. Then, based on its decoding capability and/or current CPU load and/or available bandwidth, after a period of time, the electronic device 102 may wait for STSA pictures at the next higher temporal layer. From this point, the electronic device 102 may then begin decoding the lowest temporal sub-layer and the next higher temporal sub-layer. When the electronic device 102 encounters another STSA picture (e.g., an STSA picture with a higher temporal layer), the electronic device 102 may decide to wait and not start decoding the higher temporal sublayer if the electronic device has only recently switched up to the current highest temporal sublayer being decoded. In some cases, the decision to immediately continue with the up-switch or wait may be based on when the next STSA picture is about to appear. This information can be known from additional syntax elements that provide the ability to know when the next temporal layer switching point will occur at the same temporal layer.
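A minimal sketch of the stepwise up-switching behavior described above is given below in Python. Samples are modeled as (temporal_id, is_stsa) pairs in decoding order; this representation and the function name are illustrative assumptions rather than part of the disclosed method.

def decode_with_stepwise_up_switch(samples, target_max_tid):
    """samples: list of (temporal_id, is_stsa) pairs in decoding order."""
    current_max_tid = 0        # begin by decoding only the lowest temporal layer
    decoded_indices = []
    for i, (tid, is_stsa) in enumerate(samples):
        # An STSA sample at the next higher layer is an up-switch opportunity.
        if is_stsa and tid == current_max_tid + 1 and tid <= target_max_tid:
            current_max_tid = tid
        if tid <= current_max_tid:
            decoded_indices.append(i)
        # Samples above current_max_tid are discarded until an up-switch point.
    return decoded_indices

# Example: decoding starts at layer 0 and steps up at each STSA sample.
stream = [(0, False), (1, True), (1, False), (2, True), (2, False)]
print(decode_with_stepwise_up_switch(stream, target_max_tid=2))  # [0, 1, 2, 3, 4]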
As another example, additional syntax elements may be defined for STSA sample groupings that provide the ability to know when a switching point at the next temporal layer (i.e., STSA samples for higher temporal IDs) will occur at higher temporal layers. This may be beneficial to allow for the desired frame rate to be selected in a stepwise manner and to allow for temporal switching.
In some systems and methods of signaling STSA sample packets described herein, one or more indicators may be implemented to indicate STSA sample packets and/or STSA pictures in a bitstream. For example, in one configuration, NAL units may be used to indicate STSA pictures in a bitstream.
Various configurations are now described with reference to the figures, where like reference numbers may indicate functionally similar elements. The systems and methods as generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the several configurations illustrated in the figures is not intended to limit the scope of the claims, but is merely representative of the systems and methods.
Figure 1 is a block diagram illustrating one or more electronic devices 102a-b in which systems and methods of signaling a stepped temporal sub-layer access (STSA) sample packet may be implemented. In this example, electronic device a102a and electronic device B102B are shown. It should be noted, however, that in some configurations, one or more features and functions described in relation to electronic device a102a and electronic device B102B may be incorporated into a single electronic device.
The electronic device a102a includes an encoder 104. Each element included in the electronic device a102a, such as the encoder 104 and the STSA sample grouping module 108, may be implemented in hardware, software, or a combination of both hardware and software.
The electronic device a102a may obtain the input picture 106. The input picture 106 may be captured on the electronic device a102a using an image sensor, retrieved from memory, and/or received from another electronic device.
The encoder 104 may include an STSA sample grouping module 108 and a file generator 153. As shown in fig. 1, the STSA sample grouping module 108 and the file generator 153 may be part of the encoder 104.
In other configurations, the STSA sample grouping module 108 and/or the file generator 153 may be separate from the encoder 104 and/or located at another electronic device 102. For example, the encoder 104 may be an HEVC encoder and the STSA sample grouping module 108 may reside in a file generator 153 located in the electronic device a102a, separate from the encoder 104. In this example, the STSA sample grouping module 108 may use information from the HEVC encoder to identify STSA samples. In another example, the STSA sample grouping module 108 may analyze the bitstream 110 produced by an HEVC encoder and identify STSA pictures and samples from the bitstream 110.
The encoder 104 may encode the input picture 106 to produce encoded data, e.g., samples. For example, the encoder 104 may encode a series of input pictures (e.g., video) to obtain a series of samples.
As used herein, the term "sample" may be used as provided in the ISO base media file format standard. A "sample" as defined by the ISO base media file format standard may refer to all information associated with a single timestamp. Thus, no two samples share the same timestamp within a track. The "samples" defined by the ISO base media file format standard may also correspond to "access units" defined by the HEVC standard. For example, a sample may refer to a set of NAL units that are consecutive in decoding order and that include exactly one coded picture. In addition to coded slice NAL units of a coded picture, a sample may also include NAL units that do not contain slices of the coded picture. The decoding of samples always takes place in decoded pictures. The sample packet may include one or more samples. The grouping of samples may be an assignment of each sample in a track based on a grouping criterion, referred to as a portion of a grouping of samples. The sample groups in the sample grouping are not limited to adjacent samples and may include non-adjacent samples.
The encoder 104 may be a High Efficiency Video Coding (HEVC) encoder. In some configurations, the HEVC standard may define a storage format for video streams compressed using HEVC. The standard may be an extension of the ISO base media file format. In other words, the encoder 104 may encode the input picture 106 based on the ISO base media file format that has been extended to support HEVC video streams.
The encoder 104 may perform HEVC encoding using known tools, such as parameter sets, temporal scalability sample groupings, and temporal sub-layer access (TSA) sample groupings. For example, a temporal scalability sample grouping may provide a construction and grouping mechanism to indicate the association of access units with different hierarchical levels of temporal scalability. A temporal layer access sample grouping may provide a construction and grouping mechanism to indicate the identification of access units that are temporal sub-layer access (TSA) samples. Further, in some cases, a temporal layer access (TLA) sample packet may be referred to as a temporal sub-layer access (TSA) sample packet.
Table 2 shows an example of a temporal layer sample group entry.
TABLE 2
In Table 2, a temporal layer sample group entry may define temporal layer information for all samples in a temporal layer. Temporal layers may be numbered with non-negative integers. Each temporal layer may be associated with a particular value called a temporal ID (e.g., temporal_id). A temporal layer associated with a temporal_id value greater than zero may reference all temporal layers with lower temporal_id values. In other words, the temporal layer representation associated with a particular temporal_id value may include all temporal layers associated with temporal_id values equal to or less than that particular temporal_id value.
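The following short Python sketch illustrates the notion of a temporal layer representation described above; the (temporal_id, sample) pair layout is an assumption made for illustration.

def temporal_layer_representation(samples, target_temporal_id):
    # samples: iterable of (temporal_id, sample) pairs (illustrative layout).
    # The representation for a given temporal_id includes every sample whose
    # temporal_id is less than or equal to that value, as described above.
    return [sample for tid, sample in samples if tid <= target_temporal_id]

print(temporal_layer_representation([(0, "s0"), (1, "s1"), (2, "s2")], 1))  # ['s0', 's1']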
List 1 below provides a syntax that may be used in conjunction with the temporal layer sample group entry shown in Table 2.
Syntax
List 1
In List 1, temporalLayerId may give the ID of the current temporal layer. Video Coding Layer (VCL) NAL units may have a temporal_id equal to temporalLayerId for all samples that are members of the sample group (as defined in ISO/IEC 23008-2). For the representation of the temporal layer identified by temporalLayerId, tlprofile_space, tlprofile_idc, tlconstraint_flags, tllevel_idc, and tlprofile_compatibility_indications may include encodings as defined in ISO/IEC 23008-2.
For the representation of the temporal layer identified by temporalLayerId, tlMaxBitrate may provide the maximum bit rate, in units of 1000 bits per second, over any window of one second. tlAvgBitRate may provide the average bit rate, in units of 1000 bits per second, for the representation of the temporal layer identified by temporalLayerId.
tlConstantFrameRate equal to 1 may indicate that the representation of the temporal layer identified by temporalLayerId has a constant frame rate. tlConstantFrameRate equal to 0 may indicate that the representation of the temporal layer identified by temporalLayerId may or may not have a constant frame rate. For the representation of the temporal layer identified by temporalLayerId, tlAvgFrameRate may provide the average frame rate in units of frames/(256 seconds).
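As a small worked example of the units above, the following Python sketch converts tlAvgFrameRate (frames per 256 seconds) and tlMaxBitrate (units of 1000 bits per second) into frames per second and bits per second; the function names are illustrative.

def tl_avg_frame_rate_hz(tl_avg_frame_rate):
    # tlAvgFrameRate is expressed in frames per 256 seconds, so dividing by
    # 256 converts it to frames per second.
    return tl_avg_frame_rate / 256.0

def tl_max_bitrate_bps(tl_max_bitrate):
    # tlMaxBitrate is expressed in units of 1000 bits per second.
    return tl_max_bitrate * 1000

print(tl_avg_frame_rate_hz(7680))   # 30.0 frames per second
print(tl_max_bitrate_bps(5000))     # 5000000 bits per second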
The encoder 104 may perform HEVC coding using additional tools, such as an STSA sample grouping. The STSA sample grouping may provide a construction and grouping mechanism to indicate the identification of access units that are STSA samples. For example, an STSA sample packet may indicate an STSA sample.
In one configuration, an HEVC video track (i.e., video stream) may include zero or one sample-to-group box instance with grouping_type 'tlaw'. The sample-to-group box instance may mark samples as STSA points. There may be additional instances of sample group description boxes having the same grouping type. Table 3 below shows one example of a stepped temporal sub-layer sample group entry.
TABLE 3
In Table 3, the sample group may be used to mark stepped temporal sub-layer access (STSA) samples. List 2 below provides one example of a syntax that may be used with the stepped temporal sub-layer sample group entry shown in Table 3.
Syntax
class TemporalLayerEntry() extends VisualSampleGroupEntry('tlaw')
{
}
List 2
List 3 below provides another example of a syntax that may be used with the stepped temporal sub-layer sample group entry shown in Table 3.
Syntax
List 3
In List 3, next_stsa_up_distance may indicate the number of samples, for the temporal layer with temporal_id equal to tId+1, after which an STSA sample will appear at the temporal layer with temporal_id equal to tId+1. In other words, next_stsa_up_distance may provide the ability to know when a switching point at the next higher temporal layer (i.e., an STSA sample for a higher temporal ID) will occur. This may be beneficial for selecting the desired frame rate in a stepwise manner and for up-switching in time.
In some configurations, the next_stsa_up_distance value may be similarly indicated for all higher temporal layers with temporal_id higher than tId (i.e., the temporal ID of the current temporal layer). Further, as used herein, in some cases, a temporal layer may also be referred to as a temporal sub-layer.
next_stsa_sample_distance may indicate the number of samples of the temporal layer with temporal_id equal to tId (i.e., the temporal_id of the sample) after which a stepped temporal sub-layer access (STSA) sample will again appear at the current temporal layer with temporal_id equal to tId. In other words, next_stsa_sample_distance may provide the ability to know when the next temporal sub-layer switching point will occur in the same temporal layer. This may be useful in determining when to adaptively switch to a new temporal layer. It should be noted that although the above description uses the grouping type 'tlaw', other names may be used for the same purpose; for example, 'tsaw', 'abcd', or 'zhgf' may be used.
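A hedged sketch of how a receiver might combine the two distances above with its recent switching history to decide whether to up-switch immediately or wait follows; the decision rule, the threshold, and the function name are illustrative assumptions, not part of the described syntax.

def should_up_switch_now(next_stsa_up_distance, samples_since_last_up_switch,
                         min_dwell_samples=60):
    # If the device only recently switched up, it may prefer to wait; how long
    # it is willing to wait can be weighed against how far away the next
    # up-switch opportunity is (next_stsa_up_distance).
    if samples_since_last_up_switch >= min_dwell_samples:
        return True
    # Not dwelled long enough: only switch now if the next opportunity is far away.
    return next_stsa_up_distance > min_dwell_samples - samples_since_last_up_switch

print(should_up_switch_now(next_stsa_up_distance=8, samples_since_last_up_switch=10))    # False
print(should_up_switch_now(next_stsa_up_distance=120, samples_since_last_up_switch=10))  # True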
In another configuration, an HEVC video track (i.e., video stream) may include zero or one instance of a sample-to-group box with grouping_type 'tlas'. The sample-to-group box instance may mark samples as temporal layer access points (or temporal sub-layer access points). Additional instances of sample group description boxes having the same grouping type may also occur. Table 4 below shows one example of a temporal layer sample group entry.
TABLE 4
In Table 4, the sample group is used to mark temporal sub-layer access (TSA) and stepped temporal sub-layer access (STSA) samples. List 4 below provides a syntax that may be used with the temporal layer sample group entry shown in Table 4.
Syntax
List 4
In List 4, typeTSAFlag equal to 1 may indicate that the samples in the sample group are temporal sub-layer access (TSA) samples. typeTSAFlag equal to 0 may indicate that the samples in the sample group are stepped temporal sub-layer access (STSA) samples. Alternatively, some other flag or indicator with predefined values to distinguish TSA and STSA samples may be signaled together with the temporal layer sample group entry.
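The following short Python sketch shows how a parser might interpret typeTSAFlag as described above; the function name is illustrative.

def sample_group_kind(type_tsa_flag):
    # typeTSAFlag equal to 1 marks TSA samples; 0 marks STSA samples (List 4).
    return "TSA" if type_tsa_flag == 1 else "STSA"

print(sample_group_kind(1))   # TSA
print(sample_group_kind(0))   # STSA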
Returning to fig. 1, encoded data may be included in the bitstream 110. The encoder 104 may generate overhead signaling based on the input picture 106. It should be noted that in some configurations, the STSA sample grouping module 108 may be included in the encoder 104.
In some configurations, the STSA sample grouping module 108 can send or share STSA sample groupings with one or more electronic devices. In one example, electronic device a102a can send one or more STSA sample packets to electronic device B102B. The STSA sample packets may be sent in an ISO base media file format. One benefit of generating STSA sample packets in the ISO base media file format may include explicitly marking and/or flagging STSA samples as belonging to a particular sample packet.
The encoder 104 (and, for example, the STSA sample packet module 108) may generate a bitstream 110. The bitstream 110 may include encoded data based on the input picture 106. In one example, the bitstream 110 may include encoded pictures based on the input picture 106.
In some configurations, encoder 104 may include a file generator 153. The encoded data may be stored and transmitted as a file 151. For example, the file generator 153 may store the bitstream 110 in a file format, such as the ISO base media file format. Thus, information included in the bitstream 110 as described herein may be stored and placed in the file 151.
In some configurations, the bitstream 110 may also include overhead data, such as slice header information, PPS information, SPS information, APS information, VPS information, and so forth. The bitstream 110 may also include other data, some examples of which are described herein. As a result of encoding the additional input pictures 106, the bitstream may include one or more STSA sample packets. Additionally or alternatively, the bitstream 110 may include other encoded data.
The bitstream 110 and/or the file 151 including the bitstream information may be provided to the decoder 112. In one example, the bitstream 110 may be transmitted to the electronic device B102B using a wired or wireless connection. In some cases, this may be accomplished over a network, such as the Internet, a Local Area Network (LAN), or other type of network for communicating between devices.
The file 151 may be similarly transferred to the electronic device 102 b. Further, the file 151 may be provided to the electronic device 102b in a variety of ways, for example, the file 151 may be copied from a server, mailed on a storage medium, electronically transmitted, sent in a message, and so forth.
As shown in fig. 1, the decoder 112 may be implemented on the electronic device B102B separately from the encoder 104 on the electronic device a102 a. It should be noted that in some configurations, the encoder 104 and the decoder 112 may be implemented on the same electronic device. For example, the decoder 112 may be a Hypothetical Reference Decoder (HRD). In implementations where the encoder 104 and decoder 112 are implemented on the same electronic device, for example, the bitstream 110 may be made available to the decoder in a variety of forms. For example, the bitstream 110 may be provided to the decoder 112 on a bus, or may be stored in a memory for retrieval by the decoder 112. The decoder 112 may be implemented in hardware, software, or a combination of both. In one configuration, decoder 112 may be an HEVC decoder.
In some configurations, decoder 112 may include a file parser 155. For example, the file parser 155 may be an ISO base media file format parser. File parser 155 may receive file 151 and obtain file data. Once the file data is obtained, the decoder 112 may process the data in the same manner as the received bitstream data described below.
As shown in fig. 1, the STSA sample packet reception module 120 and the file parser 155 may be part of the decoder 112. In some configurations, the STSA sample packet reception module 120 and the file parser 155 may be separate from the decoder 112. For example, the STSA sample packet reception module 120 and/or the file parser 155 may be located on the electronic device 102 separate from the decoder 112.
Decoder 112 may obtain (e.g., receive) bitstream 110 and/or file 151. Decoder 112 may generate one or more decoded pictures 114 based on bitstream 110 and/or file 151.
Decoded picture 114 may include one or more decoded pictures and may be displayed, played back, stored in memory, and/or transmitted to other devices, and so forth. Decoder 112 may include an STSA sample packet receiver module 120. The STSA sample packet receiver module 120 may enable electronic device B102B to obtain STSA sample packets from the bitstream 110. The STSA samples in the sample packet may assist the decoder 112 in producing the decoded pictures 114. An STSA sample may comprise a collection of NAL units, consecutive in decoding order, used to decode a coded picture.
Electronic device B102B may also perform one or more operations on the bitstream 110 and/or the file 151. In one example, operations or processing may be performed on the bitstream 110 and/or file 151 based on whether STSA sample packets are present. In some configurations, the decoder 112 or other unit on the electronic device B102B may perform operations on the bitstream 110 and/or the file 151. In addition, other operations may also be performed on the bitstream 110 and/or the file 151.
In some configurations, electronic device B102B may output decoded picture 114. In one example, the decoded picture 114 may be transmitted to other devices or transmitted back to electronic device a102 a. In one configuration, the decoded picture 114 may be stored or otherwise maintained on the electronic device B102B. In another configuration, electronic device B102B may display decoded picture 114. In another configuration, the decoded picture 114 may include elements of the input picture 106 having different characteristics based on operations performed on the bitstream 110. In some configurations, decoded picture 114 may be included in the stream of pictures and/or samples at a different resolution, format, characteristic, or other attribute than input picture 106.
It should be noted that one or more units or components included in the electronic device 102 may be implemented in hardware. For example, one or more of these units or components may be implemented as a chip, a circuit or a hardware component, and so on. It should also be noted that one or more of the functions or methods described herein may be implemented using hardware. For example, one or more of the methods described herein may be implemented and/or carried out using a chipset, an Application Specific Integrated Circuit (ASIC), a large scale integrated circuit (LSI), or an integrated circuit, among others.
Fig. 2 is a block diagram showing two examples of coding structures. Example a230a shows the coding structure when temporal sub-layer access (TSA) pictures 228 are used. In some cases, a temporal layer access picture may be referred to as a temporal sub-layer access (TSA) picture. Example B230B shows the coding structure when one or more stepped temporal sub-layer access (STSA) pictures 229a-B are used.
The horizontal axis in example a230a represents the output order 222a of the pictures in the coding structure. The output order may start at zero and count up (e.g., from left to right) and may identify the corresponding pictures in the coding structure. By way of example, example A230a has an output order 222a from 0-16 corresponding to pictures 0-16, respectively.
In example a230a, the vertical axis represents a temporal layer 218 (e.g., a temporal sub-layer or sub-layer). Each temporal layer 218a-n may include one or more pictures. Each picture on the same temporal layer 218 may have the same temporal identifier. For example, all pictures on temporal layer a218a may have a temporal_id equal to 0, all pictures on temporal layer B218B may have a temporal_id equal to 1, all pictures on temporal layer C218C may have a temporal_id equal to 2, all pictures on temporal layer N218N may have a temporal_id equal to N-1, etc.
As shown in example A230a, there are multiple temporal layers 218a-n. For example, there may be 2, 3, 4, 8, 16, etc. temporal layers 218. In the case of the HEVC base specification, there may be up to 8 temporal layers. Each temporal layer 218 may include a different number of pictures. In some configurations, the temporal layers 218 are organized in a hierarchical fashion. Each higher temporal layer 218 above the base layer (e.g., temporal layer a218a) may include more pictures than the previous lower temporal layer 218. For example, temporal layer N218N may include twice as many pictures as temporal layer C218C includes, and temporal layer C218C may include twice as many pictures as temporal layer B218B includes. Higher temporal layers 218 with a greater number of pictures may provide a higher frame rate to the decoded pictures. Thus, a greater number of pictures may be decoded at higher temporal layers 218.
Each temporal layer 218 may have various picture types, slice types, and sample packet types. For example, temporal layer a218a may have pictures with intra-prediction slices (I-slices) and pictures with prediction slices (P-slices). Temporal layer C218C may have pictures with bi-predicted slices (B-slices). Temporal layer B218B may have pictures with P-slices and pictures with B-slices.
In example a230a, a TSA picture 228 is shown. For example, the TSA picture 228 may be the 12th picture in the output order 222a. The TSA picture may be a clean random access (CRA) picture and/or a temporal layer switching point.
The electronic device 102 may use the temporal layer switching function indicated by the TSA picture 228 to switch between the temporal layers 218. For example, the electronic device 102 may use the temporal layer switching point to indicate a switch between temporal layer a218a and temporal layer B218B, or any temporal layer above temporal layer B218B (e.g., temporal layer C218C and above). Accordingly, the electronic device 102 may begin decoding any higher temporal layer at the TSA picture 228.
Example B230B shows the coding structure when one or more stepped temporal sub-layer access (STSA) pictures 229a-b are used. STSA pictures 229a-b may be interleaved between temporal layers 219. In other words, STSA pictures 229a-b may be located at different temporal layers. For example, STSA picture 229a may be on temporal layer B219B and STSA picture 229b may be on temporal layer C219C. In example B230B of fig. 2, STSA picture 229a and STSA picture 229b are shown as pictures 4 and 10, respectively, in output order 222b.
STSA picture 229 may be a picture associated with a STSA sample group. For example, STSA picture 229 may be a coded picture associated with a set of NAL units that are consecutive in decoding order.
Example B230B includes temporal layers 219a-n and an output order 222b similar to the corresponding temporal layers 218a-n and output order 222a described in connection with example A230a of FIG. 2. For example, example B230B may have pictures output in output order 222b of 0-16.
Each temporal layer 219a-n may include one or more pictures. Each picture on the same temporal layer 219 may have the same temporal identifier. For example, all pictures on temporal layer B219B may have the same temporal_id. Temporal layers 219a-n may be organized in a hierarchical manner in which each higher temporal layer 219 above the base layer (e.g., temporal layer a219a) has more pictures than lower temporal layers 219. For example, temporal layer N219N may have 8 pictures, while temporal layer B219B may have 2 pictures. Higher temporal layers 219 with a larger number of pictures may provide a higher frame rate to the decoded pictures.
In example B230B, STSA pictures 229a-B are shown. For example, STSA picture 229a may be the 4 th picture in output order 222 b. The STSA picture coding structure with STSA sample packets may provide a clear flag or marking of STSA samples belonging to a STSA sample packet. For example, STSA picture 229 may indicate when the next temporal layer switching point will occur at the same temporal layer (e.g., temporal layer B219B). This may be useful in determining when to adaptively switch to a new temporal layer.
During decoding of pictures from temporal layer a219a and temporal layer B219B, the electronic device 102 may receive an indication of STSA picture 229b. The STSA picture 229b may indicate a stepped temporal sub-layer switching point to the electronic device 102. At this point, the electronic device 102 may begin receiving temporal layer C219C pictures (or stop dropping temporal layer C219C pictures) and may begin decoding temporal layer a219a, temporal layer B219B, and temporal layer C219C pictures. In this manner, when the electronic device 102 receives the STSA picture 229b, the electronic device 102 may determine when to adaptively switch to the new temporal layer 219.
Additionally, STSA picture 229 may indicate when the next temporal layer upper switching point (i.e., STSA sample for a higher temporal layer ID, e.g., temporal layer C219C) will occur at a higher temporal layer. This may be beneficial to allow for a desired frame rate to be selected in a stepwise manner and to allow for temporal switching.
For example, using STSA pictures 229 in multiple temporal layers 219 allows for the selection of a desired frame rate in a stepwise manner. For example, "F" hertz (Hz) may represent the full frame rate. In this example, all temporal layers 219a-n are used, and each temporal layer 219a-n may represent a different frame rate. Temporal layer a219a, temporal layer B219B, temporal layer C219C, and temporal layer N219N may have temporal IDs of 0, 1, 2, and 3, respectively. The full frame rate uses all temporal layers 219. In other words, a full frame rate of F Hz uses all pictures with temporal_id equal to 0, 1, 2, 3. By way of example, this may include all 16 pictures shown in example B230B.
However, in some instances, sub-streams (i.e., a subset of frames, or less than the full frame rate) may be used. For example, a sub-stream using only temporal_id of 0, 1, 2 may use half of the full frame rate, represented by F/2 Hz. For example, this approach may include all pictures in temporal layer a219a through temporal layer C219C shown in example B230B, or 8 pictures.
Substreams using only temporal _ ids of 0 and 1 may use one quarter of the full frame rate represented by F/4 Hz. For example, this may include all pictures in temporal layer a219a through temporal layer B219B shown in example B230B, or 4 pictures.
A sub-stream using only a temporal_id of 0 may use one eighth of the full frame rate, represented by F/8 Hz. For example, this approach may include only the pictures in temporal layer a219a shown in example B230B, or 2 pictures.
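The dyadic frame-rate arithmetic above can be summarized with the following Python sketch, which assumes the four-layer dyadic structure of example B230B; the function and its parameters are illustrative.

def sub_stream_frame_rate(full_rate_hz, max_temporal_id, num_layers=4):
    # With a dyadic hierarchy of num_layers temporal layers, keeping layers up
    # to max_temporal_id yields full_rate / 2**(num_layers - 1 - max_temporal_id).
    return full_rate_hz / (2 ** (num_layers - 1 - max_temporal_id))

F = 60.0
print(sub_stream_frame_rate(F, 3))   # 60.0  -> F Hz, all layers
print(sub_stream_frame_rate(F, 2))   # 30.0  -> F/2 Hz
print(sub_stream_frame_rate(F, 1))   # 15.0  -> F/4 Hz
print(sub_stream_frame_rate(F, 0))   # 7.5   -> F/8 Hz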
In some configurations, the available bandwidth may determine that the electronic device 102 may transmit a full frame rate (e.g., F Hz) or a partial frame rate (e.g., F/2Hz, F/4Hz, F/8 Hz). Thus, each temporal layer 219a-n and corresponding temporal identifier may be transmitted separately as its own multicast group.
In some configurations, the lowest frame rate (e.g., F/8Hz) is transmitted first as the multicast group. In addition, higher frame rates (e.g., F/4Hz, F/2Hz, and F Hz) may be transmitted as additional multicast groups, respectively. For example, the electronic device 102 may initially receive the bitstream 110 including a multicast group sub-stream (F/8Hz) with only temporal layer a219a pictures (e.g., temporal_id = 0). Subsequently, the bitstream 110 may start to additionally include a multicast group sub-stream (F/4Hz) having temporal layer a219a and temporal layer B219B pictures (e.g., temporal_id = 0 and 1). However, the electronic device 102 cannot immediately start decoding the temporal layer B219B pictures. Instead, the electronic device 102 must discard the temporal layer B219B pictures.
During the subsequent reception of pictures from temporal layer a219a and temporal layer B219B, the electronic device 102 may receive an indication of STSA picture 229a. For example, the indication may be a NAL unit type of the STSA picture 229a, or an STSA sample packet indication. The STSA picture 229a may indicate a stepped temporal sub-layer switching point to the electronic device 102. At this point, the electronic device 102 may begin decoding pictures for both temporal layer a219a and temporal layer B219B.
The electronic device 102 may continue to receive pictures from additional temporal layers 219, such as pictures in temporal layer C219C and temporal layer N219N. With the additional temporal layer 219, the electronic device 102 can receive additional STSA pictures 229 (e.g., STSA pictures 229b) to indicate additional stepped temporal sub-layer switching points. Thus, the electronic device 102 may switch to the full frame rate of F Hz using the STSA pictures 229 as temporal sub-layer switching points. Thus, in this manner, STSA pictures 229 allow for a desired frame rate to be selected in a stepwise manner.
Figure 3 is a block diagram illustrating one configuration of an encoder 304 and a STSA sample grouping module 308 on an electronic device 302. The electronic device 302 may be one example of the electronic device 102 described above in connection with fig. 1. For example, the electronic device 302 and the encoder 304 may correspond to the electronic device a102a and the encoder 104 of fig. 1. As shown in fig. 3, the STSA sample grouping module 308 may be separate from the encoder 304. In other configurations, the STSA sample grouping module 308 can be part of the encoder 304.
One or more of the elements shown as included in electronic device 302 may be implemented in hardware, software, or a combination of both. The electronic device 302 may include an encoder 304, which encoder 304 may be implemented in hardware, software, or a combination of both. The encoder 304 may be implemented as a circuit, integrated circuit, Application Specific Integrated Circuit (ASIC), processor in electronic communication with a memory having executable instructions, firmware, Field Programmable Gate Array (FPGA), etc., or a combination thereof. In some configurations, encoder 304 may be an HEVC encoder. The encoder 304 may encode based on the ISO base media file format.
The electronic device 302 may include a source 334. Source 334 may provide picture or image data (e.g., video) as input picture 306 to encoder 304. Examples of sources 334 may include image sensors, memory, communication interfaces, network interfaces, wireless receivers, ports, and the like.
One or more input pictures 306 may be provided to an intra prediction module and reconstruction buffer 340. The input picture 306 may also be provided to a motion estimation and motion compensation module 366 and a subtraction module 346.
The intra prediction module and reconstruction buffer 340 may generate intra mode information 358 and intra signals 342 based on one or more input pictures 306 and reconstruction data 380. The motion estimation and motion compensation module 366 may generate the inter mode information 368 and the inter signal 344 based on the one or more input pictures 306 and a reference picture buffer output signal 398 of the reference picture buffer 396. In some configurations, the reference picture buffer 396 may include data from one or more reference pictures in the reference picture buffer 396.
The encoder 304 may select between the intra signal 342 and the inter signal 344 according to a mode. In the intra-coding mode, the intra signal 342 may be used in order to exploit spatial characteristics within a picture. In the inter-coding mode, the inter signal 344 may be used in order to exploit temporal characteristics between pictures. While in the intra-coding mode, the intra signal 342 may be provided to the subtraction module 346 and the intra mode information 358 may be provided to the entropy encoding module 360. While in the inter-coding mode, the inter signal 344 may be provided to the subtraction module 346 and the inter mode information 368 may be provided to the entropy encoding module 360.
At the subtraction module 346, either the intra signal 342 or the inter signal 344 (depending on the mode) is subtracted from the input picture 306 to generate a prediction residual 348. The prediction residual 348 may be provided to a transform module 350. The transform module 350 may compress the prediction residual 348 to generate a transformed signal 352, which may be provided to the quantization module 354. The quantization module 354 quantizes the transformed signal 352 to produce transformed and quantized coefficients (TQCs) 356.
The TQC 356 may be provided to entropy encoding module 360 and inverse quantization module 370. The inverse quantization module 370 performs inverse quantization on the TQC 356 to produce an inverse quantized signal 372, which inverse quantized signal 372 may be provided to an inverse transform module 374. The inverse transform module 374 decompresses the inverse quantized signal 372 to produce a decompressed signal 376, which may be provided to a reconstruction module 378.
The reconstruction module 378 may generate reconstruction data 380 based on the decompressed signal 376. For example, the reconstruction module 378 may reconstruct (modify) the picture. The reconstruction data 380 may be provided to a deblocking filter 382 and the reconstruction data 380 may be provided to an intra prediction module and reconstruction buffer 340. The deblocking filter 382 may generate a filtered signal 384 based on the reconstruction data 380.
The filtered signal 384 may be provided to a Sample Adaptive Offset (SAO) module 386. The SAO module 386 may generate SAO information 388 that is provided to the entropy encoding module 360 and an SAO signal 390 that is provided to an Adaptive Loop Filter (ALF) 392. The ALF 392 generates an ALF signal 394 that is provided to the reference picture buffer 396. The ALF signal 394 may include data from one or more pictures that may be used as reference pictures.
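As one illustration of the per-sample correction such an SAO module applies, the sketch below shows band-offset filtering; the 32-band split and the example offsets are illustrative assumptions and not values from this description.

```python
import numpy as np

def sao_band_offset(samples, start_band, offsets, bit_depth=8):
    """Add one of four signed offsets to samples falling in four consecutive bands."""
    band_width = (1 << bit_depth) // 32                # 32 equal bands over the sample range
    bands = samples // band_width
    out = samples.astype(np.int32)
    for i, offset in enumerate(offsets):               # offsets cover start_band..start_band+3
        out[bands == start_band + i] += offset
    return np.clip(out, 0, (1 << bit_depth) - 1)

filtered = sao_band_offset(np.random.randint(0, 256, (16, 16)),
                           start_band=10, offsets=[1, 2, -1, 0])
```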
Entropy encoding module 360 may encode the TQC 356 and provide an output to the NAL unit module 324 to produce bitstream A 310a or other signals. Further, entropy encoding module 360 may encode the TQC 356 using Context Adaptive Variable Length Coding (CAVLC) or Context Adaptive Binary Arithmetic Coding (CABAC). In particular, the entropy encoding module 360 may encode the TQC 356 based on one or more of the intra mode information 358, the inter mode information 368, and the SAO information 388.
In some configurations, NAL unit module 324 may generate a set of NAL units. For example, NAL units may be used to decode coded pictures (e.g., STSA pictures). For example, NAL unit module 324 may associate NAL units having type 5 and/or 6 values (as shown in table 1 above) with STSA picture 329.
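A minimal sketch of such an association follows: it scans a set of NAL units for the STSA-related type values mentioned above (5 and/or 6, per table 1). The two-byte header layout follows the HEVC NAL unit header; the helper names are illustrative.

```python
STSA_NAL_TYPES = {5, 6}   # STSA-related NAL unit type values per table 1 above

def parse_nal_header(nal_bytes):
    """Parse the two-byte HEVC NAL unit header into (type, temporal id)."""
    nal_unit_type = (nal_bytes[0] >> 1) & 0x3F
    temporal_id = (nal_bytes[1] & 0x07) - 1            # nuh_temporal_id_plus1 minus 1
    return nal_unit_type, temporal_id

def is_stsa_access_unit(nal_units):
    """Flag an access unit containing a NAL unit with an STSA type value."""
    return any(parse_nal_header(n)[0] in STSA_NAL_TYPES for n in nal_units)

print(is_stsa_access_unit([bytes([5 << 1, 0x02])]))    # type 5, temporal id 1 -> True
```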
In some configurations, bitstream A 310a may include encoded picture data. In one example, bitstream A 310a is passed to the STSA sample packet module 308 before it is sent from the electronic device 302, e.g., as bitstream B 110b, to other electronic devices 102.
Quantization, as used in video compression (e.g., HEVC), is a lossy compression technique achieved by compressing a range of values to a single quantized value. The Quantization Parameter (QP) is a predetermined scaling parameter used to perform quantization based on both the quality of the reconstructed video and the compression ratio. The block type is defined in HEVC to represent the characteristics of a given block based on the block size and its color information. The QP, resolution information, and block type may be determined prior to entropy encoding. For example, the electronic device 302 (e.g., encoder 304) may determine the QP, resolution information, and block type, which may be provided to the entropy encoding module 360.
Entropy encoding module 360 may determine the block size based on a block of TQCs 356. For example, the block size may be the number of TQCs 356 along one dimension of a two-dimensional block of TQCs; in other words, the number of TQCs 356 in the block is the square of the block size, so the block size may be determined as the square root of the number of TQCs 356 in the block. Resolution may be defined as the pixel width multiplied by the pixel height, and the resolution information may include the number of pixels for the picture width, for the picture height, or both.
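The block-size relation above reduces to simple arithmetic; a minimal sketch (with illustrative names) follows.

```python
import math

def block_size_from_tqc_count(num_tqcs):
    """Block size is the square root of the number of TQCs in a square TQC block."""
    size = math.isqrt(num_tqcs)
    if size * size != num_tqcs:
        raise ValueError("TQC block is not square")
    return size

def resolution(width_px, height_px):
    """Resolution is the pixel width multiplied by the pixel height."""
    return width_px * height_px

assert block_size_from_tqc_count(64) == 8     # an 8x8 block holds 64 TQCs
```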
In some configurations, an STSA sample grouping module 308 is included in the electronic device 302. STSA sample grouping module 308 can provide a construction and grouping mechanism to indicate the identity of access units as STSA samples.
NAL unit module 324 can send bitstream A 310a or other signal containing one or more pictures to STSA sample grouping module 308. STSA sample grouping module 308 can process STSA sample packets 330 and corresponding STSA pictures 329. In this case, the intra prediction module and reconstruction buffer 340, transform module 350, quantization module 354, entropy encoding module 360, and motion estimation and motion compensation module 366 have encoded the STSA picture 329 such that a set of consecutive NAL units in decoding order is associated with the STSA picture 329 in the STSA sample packet 330.
In some configurations, the STSA sample grouping module 308 can generate a set of NAL unit types related to the encoded input picture 306. The encoded input picture 306 may be an encoded STSA picture 329.
Additionally, the STSA sample grouping module 308 can modify or generate a set of NAL unit types that are transmitted with a bitstream B 310b or a file (not shown), which bitstream B 310b or file can be stored on the electronic device 302 or transmitted to other electronic devices 102. The STSA sample packet 330 may also include one or more samples. A sample may be a STSA sample. Each sample in the STSA sample packet 330 may include a corresponding STSA picture 329.
In this way, the other electronic device 102 can be provided with a clear marking and/or labeling of the STSA samples. Additionally, STSA sample grouping 330 may provide a mechanism that allows for simple identification of temporal layer switching points among samples.
The STSA sample grouping module 308 may also include various modules or sub-modules for generating one or more STSA sample groupings 330 associated with the input pictures 306. For example, STSA sample grouping module 308 may include an ISO base media file format module 326 or other module for generating STSA pictures 329 using STSA sample groupings 330 associated with input pictures 306.
ISO base media file format module 326 may assist STSA sample grouping module 308 in constructing STSA sample groupings 330. ISO base media file format module 326 may provide ISO base media file format information to other modules. For example, ISO base media file format module 326 may provide ISO base media file formatting to NAL unit module 324.
As another example, the ISO base media file format module 326 may provide ISO base media file formatting to various modules in the encoder 304 to allow the ISO base media file formatting to be extended to HEVC encoding. In this way, the encoder 304 may signal HEVC STSA samples using the ISO base media file format.
In some configurations, the bitstream B 310b or a file 351 (not shown) may be transmitted to other electronic devices 102. For example, the bitstream B 310b or the file 351 may be provided to a communication interface, a network interface, a wireless transmitter, a port, or the like, and may be transmitted to the other electronic device 102 through a LAN, the Internet, a cellular phone base station, or the like. The bitstream B 310b or the file 351 may additionally or alternatively be stored in memory or other components on the electronic device 302.
Figure 4 is a flow diagram illustrating one configuration of a method 400 for signaling a Stepped Time Sublayer Access (STSA) sample packet. The electronic device 302 can encode 402 the STSA sample packets 330. The STSA sample packets 330 may correspond to one of the input pictures 306 or the stream of input pictures 306 obtained by the electronic device 302.
Encoding 402 the STSA sample packets 330 may include representing the input pictures 306 as digital data. For example, encoding 402 the STSA sample packets 330 may include generating a string of bits representing characteristics (e.g., color, brightness, spatial location, etc.) of the input picture 306. In some cases, input picture 306 may be encoded as STSA picture 329. One or more encoded STSA pictures 329 and/or STSA sample packets 330 can be included in the bitstream 310 and can be transmitted to another electronic device 102 that includes the decoder 112.
The electronic device 302 can send 404 a STSA sample packet 330. Sending 404 STSA sample packets 330 may include transmitting data (e.g., bitstream 310 or file 351) between components of electronic device 102, or transmitting bitstream 310 or file 351 between one or more electronic devices 102. In the case of file 351, the STSA sample packets may be stored in file 351 and file 351 may be sent 404 to electronic device 102.
In one example, an encoder 304 on the electronic device 302 can send a bitstream 310 including one or more STSA pictures 329 and/or one or more sample packets 330 to the electronic device 102. In some configurations, the bitstream 310 may be transmitted to the decoder 112 on electronic device B 102b. For example, the STSA sample packets 330 may be constructed in the ISO base media file format and sent in NAL unit transport.
Figure 5 is a flow diagram illustrating a more specific configuration of a step-wise temporal sub-layer access (STSA) sample grouping signaling method 500. The electronic device 302 can obtain 502 a STSA picture 329. For example, the input picture 306 may be an STSA picture 329. The electronic device 302 can determine 504 a set of NAL units based on the STSA picture.
The electronic device 302 can generate 506 a sample packet 330 that includes a set of NAL units and a corresponding STSA picture 329. The electronic device 302 may encode 508 the STSA sample packets 330 based on (e.g., using) an ISO base media file format that has been extended to support HEVC video streams. For example, the electronic device 302 can encode the input pictures 306 into STSA pictures 329 corresponding to the STSA sample packets 330. As described above with respect to fig. 4, electronic device 302 can encode 508 STSA pictures 329.
In some configurations, the STSA sample packet 330 may also include a type TSA flag. The type TSA flag may indicate whether the samples in the sample grouping 330 are TSA samples or STSA samples. The type TSA flag may be typeTSAFlag. For example, typeTSAFlag equal to 1 may indicate that the grouped samples are TSA samples, while typeTSAFlag equal to 0 may indicate that they are STSA samples.
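A minimal sketch of an entry carrying such a flag is shown below; the class name and field layout are illustrative assumptions rather than the patent's or the file format's syntax, and only the 1 = TSA / 0 = STSA interpretation comes from the description above.

```python
from dataclasses import dataclass

@dataclass
class TemporalLayerAccessEntry:
    type_tsa_flag: int                       # typeTSAFlag: 1 = TSA samples, 0 = STSA samples

    def sample_kind(self):
        return "TSA" if self.type_tsa_flag == 1 else "STSA"

print(TemporalLayerAccessEntry(type_tsa_flag=0).sample_kind())   # -> STSA
```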
The encoded STSA pictures and/or corresponding STSA sample packets 330 may be constructed in an ISO base media file format. For example, tables 2 through 4 provide examples of ISO base media file format syntax that may be used to construct STSA sample packets 330 in ISO base media file format, e.g., using sample group description boxes (SGPDs). For example, table 3 shows an example of a stepped temporal sub-layer sample group entry. In table 3, the sample set can be used to label STSA samples.
In some configurations, STSA sample packets 330 may indicate a desired frame rate in a stepwise manner and may be used for temporal switching. For example, the STSA sample grouping 330 may provide additional syntax indicating when a switching point on the next temporal layer (i.e., STSA samples for higher temporal IDs) will occur at a higher temporal layer. In some configurations, the STSA sample packet 330 may provide this indication for all higher temporal layers whose temporal IDs are higher than the temporal ID of the temporal layer of the samples.
In some configurations, STSA sample packet 330 may indicate when to adaptively switch to a new temporal layer. For example, STSA sample packets 330 may provide additional syntax that provides the ability to know when the next temporal layer switch point will occur at the same temporal layer.
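The claims at the end of this description name a next STSA sample distance parameter and a next STSA up-switch distance parameter for these two cases. The sketch below assumes a simple "distance in samples" interpretation of those fields to show how a player might schedule a switch; the field and function names are illustrative.

```python
def next_switch_point(sample_index, entry, switch_up):
    """Index of the sample at which the next temporal-layer switch can start."""
    distance = (entry["next_stsa_up_switch_distance"] if switch_up
                else entry["next_stsa_sample_distance"])
    return sample_index + distance

entry = {"next_stsa_sample_distance": 4, "next_stsa_up_switch_distance": 12}
print(next_switch_point(sample_index=100, entry=entry, switch_up=True))   # -> 112
```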
The ISO base media file format may also be extended to support HEVC video streams. In this way, STSA sample packets 330 may be formatted in the ISO base media file format while still incorporating the advantages and functionality of HEVC.
The electronic device 302 can send 510 STSA sample packets 330. For example, STSA sample packets 330 may be sent in NAL unit transport. The NAL unit transport may include a set of NAL units in sequential decoding order and include one coded picture. For example, NAL unit transport may include encoded STSA pictures 329.
Sending STSA sample packets 330 may include transmitting data (i.e., bitstream 310 or file 351) between components of electronic device 102, or transmitting bitstream 310 and/or file 351 between one or more electronic devices 102. Further, sending 510 STSA sample packets 330 may include other similar methods of transferring data between one or more electronic devices 102. In the case of transmitting the file 351, STSA sample packets in NAL units may be stored in the file 351, and the file 351 may be transmitted 510 to the electronic device 102.
Figure 6 is a block diagram illustrating one configuration of a decoder 612 and an STSA sample packet receive module 620 on an electronic device 602. The electronic device 602 and the decoder 612 may be one example of the electronic device 102 and the decoder 112 described in connection with fig. 1. As shown in fig. 6, the STSA sample packet receiving module 620 may be separate from the decoder 612. In other configurations, the STSA sample packet reception module 620 may be part of the decoder 612.
The electronic device 602 can receive the bitstream 610. For example, the STSA sample packet reception module 620 may receive the bitstream A 610a and/or the file 651. It should be noted that while fig. 6 refers to the electronic device 602 receiving and processing data from the bitstream 610, the electronic device 602 may also receive and process data from the file 651. For example, the file 651 may include bitstream data stored in an ISO base media file format.
In one configuration, bitstream a 610a and/or file 651 may include or be accompanied by one or more STSA sample packets 630. STSA sample packets 630 may include corresponding STSA pictures 629. The STSA sample packet reception module 620 may provide a construction and grouping mechanism to indicate (e.g., to the decoder 612) the identity of the access units that are STSA samples.
In another configuration, the electronic device 602 receives the bitstream A 610a and/or the file 651 and sends the bitstream A 610a and/or the file 651 through the STSA sample packet receive module 620 to produce the bitstream B 610b. STSA sample packet receive module 620 can obtain STSA sample packets 630. Sample packet 630 may include an encoded picture and a set of NAL units used by decoder 612 to decode the encoded picture.
The STSA sample packet reception module 620 can identify marked and/or flagged STSA samples obtained at the electronic device 602. STSA sample grouping 630 may also allow for simple identification of temporal layer switching points in samples.
STSA sample packet receiving module 620 can include various modules or sub-modules for receiving STSA pictures 629 and/or sample packets 630 from the bitstream 610 and/or the file 651. For example, STSA sample packet receiving module 620 may include NAL unit module 624, ISO base media file format module 626, or other modules to receive sample packets 630 and/or STSA pictures 629 from the bitstream 610 and/or the file 651 before passing through certain units of the decoder 612. The STSA sample packet reception module 620 may also include STSA sample packets 630 and/or STSA pictures 629 to be decoded by the decoder 612.
In some configurations, NAL unit module 624 can help decoder 612 obtain the NAL unit type from bitstream a 610a and/or file 651. For example, a set of NAL units can be associated with STSA picture 629.
In one configuration, NAL unit module 624 can receive the set of NAL units and provide the NAL unit types to the decoder 612. In some examples, NAL unit module 624 may provide the NAL unit type to NAL unit module 624b located in the decoder 612. For example, NAL unit module 624 can obtain NAL units having type values of 5 and/or 6 (as shown in table 1 above) related to a received encoded STSA picture 629 and provide the NAL unit values to NAL unit module 624b.
NAL unit module 624 can also obtain the NAL unit transport that carries data for STSA sample packets 630 and/or STSA pictures 629. For example, the STSA sample packets 630 may be structured in the ISO base media file format and received in NAL unit transport in the bitstream 610 and/or the file 651.
ISO base media file format module 626 may assist STSA sample packet reception module 620 in obtaining STSA sample packets 630. One or more received sample packets may be constructed in an ISO base media file format.
ISO base media file format module 626 may also provide ISO base media file format information to other modules. For example, ISO base media file format module 626 may provide ISO base media file formatting to NAL unit module 624 to assist NAL unit module 624 in obtaining NAL unit types. As another example, the ISO base media file format module 626 may provide ISO base media file formatting to various modules in the decoder 612 to allow the ISO base media file format to be extended to HEVC decoding. In this way, decoder 612 may decode HEVC STSA samples by using ISO base media file formatting.
A decoder 612 may be included in the electronic device 602. For example, the decoder 612 may be an HEVC decoder and/or an ISO base media file format parser, and may decode HEVC files based on the ISO base media file format. The decoder 612 and/or one or more of the illustrated units included in the decoder 612 may be implemented in hardware, software, or a combination of both hardware and software.
The decoder 612 may receive the bitstream B 610b (e.g., one or more encoded pictures included in the bitstream B 610b) from the STSA sample packet reception module 620. It should be appreciated that bitstream B 610b from STSA sample packet receive module 620 includes picture data received by electronic device 602 as bitstream A 610a and/or file 651. In other words, the bitstream B 610b data may be based on data obtained from the bitstream A 610a and/or the file 651.
In some configurations, the received bitstream B 610b may include received overhead information, such as a received slice header, a received PPS, received buffer description information, and so forth. The coded pictures included in the bitstream B 610b may include one or more coded reference pictures and/or one or more other coded pictures.
The received symbols (in one or more coded pictures included in the bitstream B 610b) may be entropy decoded by an entropy decoding module 668, thereby producing a motion information signal 670 and quantized, scaled, and/or transformed coefficients 672.
At motion compensation module 674, the motion information signal 670 may be combined with a portion of the reference frame signal 698 from frame memory 678, which may produce an inter-prediction signal 682. The quantized, scaled, and/or transformed coefficients 672 may be de-quantized, de-scaled, and/or inverse transformed by an inverse module 662, thereby producing a decoded residual signal 684. The decoded residual signal 684 may be added to a prediction signal 692 to produce a combined signal 686. The prediction signal 692 may be a signal selected from either the inter-prediction signal 682 produced by the motion compensation module 674 or the intra-prediction signal 690 produced by the intra-prediction module 688. In some configurations, this signal selection may be based on (e.g., controlled by) the bitstream 610 and/or the file 651.
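A minimal sketch of this combination step is shown below: a prediction signal (inter or intra, as selected from the bitstream) is added to the decoded residual to form the combined signal. The function names are illustrative, and the clipping range assumes 8-bit samples.

```python
import numpy as np

def combine(decoded_residual, inter_pred, intra_pred, use_inter):
    prediction = inter_pred if use_inter else intra_pred    # prediction signal selection
    combined = decoded_residual + prediction                # residual + prediction
    return np.clip(combined, 0, 255)                        # assumes 8-bit samples

residual = np.zeros((8, 8))
out = combine(residual, inter_pred=np.full((8, 8), 100.0),
              intra_pred=np.full((8, 8), 50.0), use_inter=False)
print(out[0, 0])   # -> 50.0
```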
The intra-prediction signal 690 may be predicted from previously decoded information in the current frame (e.g., from the combined signal 686). The combined signal 686 may also be filtered by a deblocking filter 694. The resulting filtered signal 696 may be written to the frame memory 678. The resulting filtered signal 696 may include decoded pictures.
The frame memory 678 may include overhead information corresponding to decoded pictures. For example, the frame memory 678 may include slice headers, parameter information, loop parameters, buffer description information, and the like. One or more of these pieces of information may be signaled from an encoder (e.g., encoder 104). Frame memory 678 may provide decoded picture 618 or other output signals.
In some configurations, decoder 612 may include NAL unit module 624b. NAL unit module 624b may receive NAL unit information from NAL unit module 624 located in the STSA sample packet reception module 620. NAL unit module 624b may provide NAL unit information to entropy decoding module 668 or another component in the decoder 612. NAL unit information from NAL unit module 624b may assist the decoder 612 in decoding the encoded picture.
Figure 7 is a flow diagram illustrating one configuration of a method 700 for receiving a stepped temporal sub-layer access (STSA) sample packet. The electronic device 602 may receive 702 a bitstream 610 and/or a file 651 (e.g., from a recordable storage medium). Receiving 702 the bitstream 610 and/or the file 651 may include obtaining, reading, or otherwise accessing the bitstream 610 and/or the file 651. In some configurations, the bitstream 610 and/or the file 651 may be received from the encoder 104 on the same electronic device or a different electronic device 102. For example, electronic device B 102b may receive the bitstream 110 and/or the file 651 from the encoder 104 on electronic device A 102a.
In some configurations, the electronic device 602 may include a decoder 612 that receives the bitstream 610 and/or the file 651. The bitstream 610 and/or file 651 can include encoded data based on one or more input pictures 106.
The electronic device 602 can obtain 704 an STSA sample packet 630. STSA sample packet 630 may include one or more samples. The electronic device 602 can obtain the STSA sample packets 630 from the bitstream 610 and/or the file 651. In other words, the bitstream 610 and/or the file 651 may include STSA sample packets 630. STSA sample packets 630 may include NAL unit sets and corresponding encoded STSA pictures 629.
The electronic device 602 can decode 706 the STSA sample packet 630. For example, the decoder 612 may decode 706 a portion of the bitstream 610 and/or the file 651 to produce a sample packet 630. As described above, the STSA sample packet 630 may provide a clear marking and/or labeling of the STSA samples belonging to the STSA sample packet 630. In this way, the electronic device 602 can easily identify temporal layer switching points in the STSA samples.
The electronic device 602 can decode 708 the current picture based on the STSA sample packets 630. For example, the decoder 612 may decode 708 a portion of the bitstream 610 and/or the file 651 based on the STSA sample packets 630 to produce a current picture. In some cases, the current picture being decoded may be an STSA picture 629. The current picture may be decoded by the decoder 612 as described above.
Fig. 8 is a flow chart showing a more specific configuration of a method 800 of receiving a stepped temporal sub-layer access (STSA) sample packet. The electronic device 602 may receive 802 a bitstream 610 and/or a file 651. The bitstream 610 and/or the file 651 may be received as described above in connection with fig. 7. For example, electronic device B 102b can receive 802 a bitstream 610 and/or a file 651 from the encoder 104 on electronic device A 102a.
The electronic device 602 can obtain 804 a STSA sample packet 630. The electronic device 602 can obtain the STSA sample packets 630 from the bitstream 610 and/or the file 651. In other words, the bitstream 610 and/or the file 651 may include STSA sample packets 630. The STSA sample packet 630 may include one or more samples.
In some configurations, the sample packets 630 may be constructed in the ISO base media file format and received in NAL unit transport. The electronic device 602 may receive the NAL unit transport and obtain a sample packet 630. Tables 2 through 4 provide examples of ISO base media file format syntax that may be used to construct a received sample packet 630, e.g., using a sample group description box (SGPD). For example, table 3 shows an example of a stepped temporal sub-layer sample group entry. In table 3, the sample grouping can be used to mark STSA samples.
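A minimal sketch of identifying STSA samples from already-parsed sample grouping boxes follows: the sample-to-group box maps runs of samples to entry indices in the sample group description box. The dict-based box representation, the field names, the 'stsa' grouping type string, and the use of typeTSAFlag are illustrative assumptions.

```python
def stsa_sample_numbers(sbgp, sgpd):
    """Return 1-based sample numbers whose group entry marks them as STSA."""
    if sbgp["grouping_type"] != sgpd["grouping_type"]:
        return []
    samples, sample_number = [], 1
    for sample_count, description_index in sbgp["entries"]:
        if description_index > 0:                          # 0 means "not in any group"
            entry = sgpd["entries"][description_index - 1]
            if entry.get("type_tsa_flag", 0) == 0:         # 0 = STSA per the description above
                samples.extend(range(sample_number, sample_number + sample_count))
        sample_number += sample_count
    return samples

sbgp = {"grouping_type": "stsa", "entries": [(3, 0), (2, 1), (4, 0)]}
sgpd = {"grouping_type": "stsa", "entries": [{"type_tsa_flag": 0}]}
print(stsa_sample_numbers(sbgp, sgpd))   # -> [4, 5]
```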
The electronic device 602 can obtain 806 a set of NAL units and a corresponding encoded STSA picture 629 from the STSA sample packet 630. The set of NAL units and the corresponding encoded STSA picture 629 may be packed in the sample packet 630. The set of NAL units may be in sequential decoding order.
In some configurations, the electronic device 602 may obtain a type TSA flag. For example, the STSA sample packet 630 may also include a type TSA flag. The type TSA flag may indicate whether the samples in the STSA sample packet 630 are TSA samples or STSA samples. The type TSA flag may be typeTSAFlag. For example, typeTSAFlag equal to 1 may indicate that the grouped samples are TSA samples, while typeTSAFlag equal to 0 may indicate that they are STSA samples.
The electronic device 602 can decode 808 the corresponding encoded STSA picture 629 based on the set of NAL units in the STSA sample packet 630. The electronic device 602 can also receive, from the STSA sample packet 630, an indication corresponding to a temporal sub-layer switch. For example, STSA sample packets 630 may indicate a desired frame rate in a stepwise manner and may be used for temporal switching. As another example, the STSA sample grouping 630 may provide additional syntax indicating when a switching point on the next temporal layer (i.e., STSA samples for higher temporal IDs) will occur at a higher temporal layer. In some configurations, the STSA sample packet 630 may provide this indication for all higher temporal layers whose temporal IDs are higher than the temporal ID of the temporal layer of the samples. In each of these examples and configurations, the STSA sample packet 630 may indicate and include one or more STSA samples.
Electronic device 602 can decode 810 the current picture based on STSA picture 629. For example, decoder 612 decodes 810 bitstream 610 and/or file 651 based on STSA picture 629 to produce a current picture.
When the current picture is a STSA picture 629, a picture having a Temporal_id equal to the Temporal_id of the current picture may not be included in RefPicSetStCurrBefore, RefPicSetStCurrAfter, or RefPicSetLtCurr. When the current picture follows, in decoding order, a STSA picture 629 having a Temporal_id equal to the Temporal_id of the current picture, a picture that precedes that STSA picture in decoding order and has a Temporal_id equal to the Temporal_id of the current picture may not be included in RefPicSetStCurrBefore, RefPicSetStCurrAfter, or RefPicSetLtCurr.
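A minimal sketch of checking this constraint is shown below; the picture representation (dicts with a 'temporal_id' key) and the function name are illustrative assumptions.

```python
def stsa_rps_is_valid(current_temporal_id, st_curr_before, st_curr_after, lt_curr):
    """True if none of the reference picture sets holds a picture at the current Temporal_id."""
    for ref_set in (st_curr_before, st_curr_after, lt_curr):   # the three sets named above
        if any(pic["temporal_id"] == current_temporal_id for pic in ref_set):
            return False
    return True

print(stsa_rps_is_valid(2, [{"temporal_id": 1}], [], []))   # -> True
print(stsa_rps_is_valid(2, [{"temporal_id": 2}], [], []))   # -> False
```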
As described above, the STSA sample packets 630 and/or STSA pictures 629 may allow the decoder 612 to store and use additional reference pictures when decoding the current picture. The use of STSA sample packets 630 may provide a clear marking and/or labeling of the STSA samples belonging to the STSA sample packet. In this manner, the electronic device 602 may easily identify temporal layer switching points in the samples.
Figure 9 is a block diagram illustrating one configuration of an electronic device 902 in which systems and methods for signaling stepped temporal sub-layer access (STSA) sample packets may be implemented. The electronic device 902 may comprise a bitstream 910, a file 951, an encoding device 935, a transmitting device 937, and a storage device 957. The encoding device 935, the transmitting device 937, and the storage device 957 may be configured to perform one or more of the functions described in connection with one or more of fig. 4, fig. 5, and other figures described herein. Fig. 11 below shows one example of a specific device structure for fig. 9. Various other structures may be used to implement one or more of the functions of fig. 1 and fig. 3. For example, a DSP may be realized in software.
Figure 10 is a block diagram illustrating one configuration of an electronic device 1002 in which systems and methods for receiving a stepped temporal sub-layer access (STSA) sample packet may be implemented. The electronic device 1002 may comprise a bitstream 1010, a file 1051, a receiving device 1039, and a decoding device 1041. The receiving device 1039 and the decoding device 1041 may be configured to perform one or more of the functions described in connection with fig. 7, fig. 8, and other figures described herein. Fig. 12 below shows one example of a specific device structure for fig. 10. Various other structures may be used to implement one or more of the functions of fig. 1 and fig. 6. For example, a DSP may be realized in software.
FIG. 11 is a block diagram illustrating various components that may be used in the transmitting electronic device 1102. One or more of the electronic devices 102, 302, 602, 902, and 1002 described herein may be implemented in accordance with the transmitting electronic device 1102 shown in fig. 11.
The transmitting electronic device 1102 includes a processor 1117 that controls the operation of the transmitting electronic device 1102. The processor 1117 may be referred to as a Central Processing Unit (CPU). Memory 1111, which may comprise Read Only Memory (ROM), Random Access Memory (RAM), or any type of device that may store information, provides instructions 1113a (e.g., executable instructions) and data 1115a to the processor 1117. A portion of the memory 1111 may also include non-volatile random access memory (NVRAM). The memory 1111 may be in electronic communication with the processor 1117.
Instructions 1113b and data 1115b may also reside within the processor 1117. The instructions 1113b and/or data 1115b loaded into the processor 1117 may include instructions 1113a and/or data 1115a from the memory 1111 that were loaded for execution or processing by the processor 1117. The instructions 1113b may be executed by the processor 1117 to implement one or more of the methods 400 and 500 described herein.
The transmitting electronic device 1102 may include one or more communication interfaces 1109 for communicating with other electronic devices (e.g., receiving electronic devices). The communication interface 1109 may be based on wired communication techniques, wireless communication techniques, or both. Examples of communication interface 1109 include a serial port, a parallel port, a Universal Serial Bus (USB), an ethernet adapter, an IEEE 1394 bus interface, a Small Computer System Interface (SCSI) bus interface, an Infrared (IR) communication interface, a bluetooth wireless communication adapter, a wireless transceiver compliant with the third generation partnership project (3GPP) specifications, and so forth.
The sending electronic device 1102 may include one or more output devices 1103 and one or more input devices 1101. Examples of output devices 1103 include speakers, printers, and so forth. One type of output device that may be included in the transmitting electronic device 1102 is a display device 1105. Display device 1105 for use with configurations disclosed herein may utilize any suitable image projection technology, such as Cathode Ray Tube (CRT), Liquid Crystal Display (LCD), Light Emitting Diode (LED), gas plasma, electroluminescence, and so forth. A display controller 1107 may be provided for converting data stored in the memory 1111 into text, graphics, and/or moving images (as needed) shown on the display device 1105. Examples of input device 1101 include a keyboard, mouse, microphone, remote control device, buttons, joystick, trackball, touch pad, touch screen, light pen, and the like.
The various components of the transmitting electronic device 1102 are coupled together by a bus system 1133, wherein the bus system 1133 may include a power bus, a control signal bus, and a status signal bus in addition to a data bus. However, for clarity, the various buses are illustrated in FIG. 11 as the bus system 1133. The transmitting electronic device 1102 shown in FIG. 11 is a functional block diagram rather than a listing of specific components.
Fig. 12 is a block diagram illustrating various components that may be used in the receiving electronic device 1202. One or more of the electronic devices 102, 302, 602, 902, and 1002 described herein may be implemented in accordance with the receiving electronic device 1202 shown in fig. 12.
The receiving electronic device 1202 includes a processor 1217 that controls the operation of the receiving electronic device 1202. The processor 1217 may also be referred to as a CPU. Memory 1211 provides instructions 1213a (e.g., executable instructions) and data 1215a to processor 1217, and memory 1211 may include both read-only memory (ROM) and Random Access Memory (RAM), or may include any type of device that may store information. A portion of memory 1211 may also include non-volatile random access memory (NVRAM). The memory 1211 may be in electronic communication with the processor 1217.
Instructions 1213b and data 1215b may also reside in the processor 1217. The instructions 1213b and/or data 1215b loaded into the processor 1217 may also include instructions 1213a and/or data 1215a from the memory 1211 that were loaded for execution or processing by the processor 1217. The instructions 1213b may be executed by the processor 1217 to implement one or more of the methods 700 and 800 described herein.
The receiving electronic device 1202 may include one or more communication interfaces 1209 for communicating with other electronic devices (e.g., a transmitting electronic device). The communication interface 1209 may be based on wired communication technology, wireless communication technology, or both. Examples of communication interface 1209 include a serial interface, a parallel interface, a Universal Serial Bus (USB), an ethernet adapter, an IEEE 1394 bus interface, a Small Computer System Interface (SCSI) bus interface, an Infrared (IR) communication port, a bluetooth wireless communication adapter, a wireless transceiver compliant with the third generation partnership project (3GPP) specifications, and so forth.
The receiving electronic device 1202 may include one or more output devices 1203 and one or more input devices 1201. Examples of output devices 1203 include speakers, printers, and so forth. One type of output device that may be included in the receiving electronic device 1202 is a display device 1205. The display device 1205 used with the configurations disclosed herein may use any suitable image projection technology, such as a Cathode Ray Tube (CRT), Liquid Crystal Display (LCD), Light Emitting Diode (LED), gas plasma, electroluminescence, and so forth. A display controller 1207 may be provided to convert (as needed) data stored in the memory 1211 into text, graphics, and/or moving images shown on the display device 1205. Examples of input devices 1201 include a keyboard, mouse, microphone, remote control device, buttons, joystick, trackball, touch screen, light pen, and the like.
The various components of the receiving electronic device 1202 are coupled together by a bus system 1233, which bus system 1233 may include a power bus, a control signal bus, and a status signal bus in addition to a data bus. However, for the sake of clarity, the various buses are illustrated in FIG. 12 as bus system 1233. The receiving electronic device 1202 shown in FIG. 12 is a functional block diagram rather than a listing of specific components.
The term "computer-readable medium" refers to any available medium that can be accessed by a computer or processor. As used herein, the term "computer-readable medium" may represent non-transitory and tangible computer and/or processor-readable media. By way of example, and not limitation, computer-readable media or processor-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or processor. As used herein, a magnetic disk or an optical disk includes a Compact Disk (CD), a laser disk, an optical disk, a Digital Versatile Disk (DVD), a floppy disk, and a blu-ray (registered trademark) disk, where magnetic disks usually reproduce data magnetically, while optical disks reproduce data optically with lasers.
It should be noted that one or more of the methods described herein may be implemented in hardware and/or performed using hardware. For example, one or more of the methods described herein may be implemented in and/or using a chipset, an ASIC, a large scale integrated circuit (LSI), an integrated circuit, or the like.
Each method described herein includes one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another and/or combined into a single step without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the method being described, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
It is to be understood that the claims are not limited to the precise configuration and components shown above. Various modifications, changes, and variations may be made in the arrangement, operation, and details of the systems, methods, and apparatuses described herein without departing from the scope of the claims.
Claims (9)
1. A method for decoding a picture, comprising:
receiving an audiovisual bitstream;
obtaining a step-wise temporal sub-layer access (STSA) sample packet;
decoding the STSA sample packet; and
determining when to switch to a new temporal layer based on the STSA sample packet,
wherein:
the STSA sample packet is sent in a sample group description box SGDP, where the SGDP includes a next STSA up-switch distance parameter and a next STSA sample distance parameter.
2. The method for decoding a picture according to claim 1, wherein the SGPD includes a type Time Switching Access (TSA) flag.
3. The method for decoding a picture according to claim 2, wherein the type TSA flag value indicates whether a sample in the STSA sample packet is a TSA sample or a STSA sample.
4. The method for decoding pictures according to claim 1, wherein an STSA picture provides a temporal layer switching function to a temporal layer to which the STSA picture belongs.
5. The method for decoding pictures of claim 1, wherein the STSA sample packets are included in an ISO base media file.
6. The method according to any of the preceding claims, wherein the next STSA sample distance parameter indicates when a next temporal layer switching point will occur at the same temporal layer.
7. The method of claim 1, wherein the next STSA up-switch distance parameter indicates when a switching point on the next temporal layer (i.e., at a higher temporal layer) will occur.
8. The method of claim 1, wherein the step-wise temporal sub-layer access (STSA) sample packet is accessed from a bitstream, from a recordable storage medium, or from a file.
9. An electronic device for decoding a picture, comprising:
a processor and a memory in electronic communication with the processor, the memory storing instructions executable by the processor to perform the steps of the method of any one of claims 1 to 8.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/633,784 | 2012-10-02 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1262962A1 true HK1262962A1 (en) | 2020-01-24 |
| HK1262962B HK1262962B (en) | 2021-08-06 |