HK1189110B - Decoding method
Description
This application is a divisional of an invention patent application with a filing date of 2008-04-07, application number 200880012349.X, and the invention title "Coding system".
Cross Reference to Related Applications
This application claims priority to each of the following, each of which is hereby incorporated by reference herein in its entirety for all purposes: (1) U.S. provisional application Ser. No. 60/923,993, entitled "Supplemental Sequence Parameter Set for Scalable Video Coding or Multi-view Video Coding", filed April 18, 2007 (attorney docket No. PU070101), and (2) U.S. patent application Ser. No. 11/824,006, entitled "Supplemental Sequence Parameter Set for Scalable Video Coding or Multi-view Video Coding", filed June 28, 2007 (attorney docket No. PA070032).
Technical Field
At least one implementation relates to encoding and decoding video data in a scalable manner.
Background
Encoding video data according to multiple layers may be useful when the terminals for which the data is intended have different capabilities and therefore do not decode the entire data stream but only a portion of the entire data stream. When video data is encoded according to a plurality of layers in a scalable manner, a receiving terminal may extract a portion of the data from a received bitstream according to a profile of the terminal. The complete data stream may also transport overhead information for each supported layer to facilitate decoding of each layer at the terminal.
Disclosure of Invention
According to one general aspect, information from a sequence parameter set ("SPS") network abstraction layer ("NAL") unit is accessed. The information describes parameters used in decoding a first layer encoding of a sequence of images. Information from a supplemental SPS NAL unit having a different structure than the SPS NAL unit is also accessed. The information from the supplemental SPS NAL unit describes parameters used in decoding a second layer encoding of the sequence of images. A decoding of the sequence of images is generated based on the first layer encoding, the second layer encoding, the accessed information from the SPS NAL unit, and the accessed information from the supplemental SPS NAL unit.
According to another general aspect, a syntax structure is used that provides for multi-layer decoding of a sequence of pictures. The syntax structure includes syntax for an SPS NAL unit that includes information describing parameters used in decoding a first layer encoding of a sequence of pictures. The syntax structure also includes syntax for a supplemental SPS NAL unit having a different structure than the SPS NAL unit. The supplemental SPS NAL unit includes information describing parameters used in decoding a second layer encoding of a sequence of images. Decoding of the sequence of images may be generated based on the first layer encoding, the second layer encoding, the information from the SPS NAL unit, and the information from the supplemental SPS NAL unit.
According to another general aspect, a signal is formatted to include information from an SPS NAL unit. The information describes parameters used in decoding a first layer encoding of a sequence of images. The signal is also formatted to include information from a supplemental SPS NAL unit having a different structure than the SPS NAL unit. The information from the supplemental SPS NAL unit describes parameters used in decoding a second layer encoding of a sequence of images.
According to another general aspect, an SPS NAL unit is generated that includes information describing a parameter for use in decoding a first layer encoding of a sequence of images. A supplemental SPS NAL unit is generated, the supplemental SPS NAL unit having a different structure than the SPS NAL unit. The supplemental SPS NAL unit includes information describing parameters used in decoding a second layer encoding of a sequence of images. A data set is provided that includes a first layer encoding of a sequence of pictures, a second layer encoding of the sequence of pictures, an SPS NAL unit, and a supplemental SPS NAL unit.
According to another general aspect, a syntax structure is used that provides for multi-layer coding of a sequence of pictures. The syntax structure includes syntax for an SPS NAL unit. The SPS NAL unit includes information describing parameters used in decoding a first layer encoding of a sequence of images. The syntax structure includes syntax for a supplemental SPS NAL unit. The supplemental SPS NAL unit has a different structure than the SPS NAL unit. The supplemental SPS NAL unit includes information describing parameters used in decoding a second layer encoding of a sequence of images. A data set is provided that includes a first layer encoding of a sequence of pictures, a second layer encoding of the sequence of pictures, an SPS NAL unit, and a supplemental SPS NAL unit.
According to another general aspect, first layer dependent information in a first normative parameter set is accessed. The accessed first layer dependent information is used in decoding a first layer encoding of a sequence of images. Second layer dependent information in a second normative parameter set is accessed. The second normative parameter set has a different structure than the first normative parameter set. The accessed second layer dependent information is used in decoding a second layer encoding of the sequence of images. The sequence of images is decoded based on one or more of the accessed first layer dependent information or the accessed second layer dependent information.
According to another general aspect, a first normative parameter set is generated that includes first layer dependent information. The first layer dependent information is used in decoding a first layer encoding of a sequence of images. A second normative parameter set is generated having a different structure than the first normative parameter set. The second normative parameter set includes second layer dependent information for use in decoding a second layer encoding of the sequence of images. A data set is provided that includes the first normative parameter set and the second normative parameter set.
According to another general aspect, a decoding method includes: accessing information from a sequence parameter set "SPS" network abstraction layer "NAL" unit, the information describing parameters used in decoding a first layer encoding of a picture in a sequence of pictures; accessing supplemental information from a supplemental SPS NAL unit, the supplemental SPS NAL unit having a different NAL unit type code and having a different syntax structure than the SPS NAL unit, and the supplemental information from the supplemental SPS NAL unit describing parameters for use in decoding a second layer encoding of a picture in a sequence of pictures; and decoding the first layer code and the second layer code based on the accessed information from the SPS NAL unit and the accessed supplemental information from the supplemental SPS NAL unit, respectively.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular way, it should be clear that implementations may be configured or implemented in various ways. For example, an implementation may be performed as a method or embodied as a device, e.g., a device configured to perform a set of operations or a device storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
Drawings
Fig. 1 shows a block diagram of an implementation of an encoder.
Fig. 1a shows a block diagram of another implementation of an encoder.
Fig. 2 shows a block diagram of an implementation of a decoder.
Fig. 2a shows a block diagram of another implementation of a decoder.
Fig. 3 illustrates the structure of an implementation of a single layer sequence parameter set ("SPS") network abstraction layer ("NAL") unit.
Fig. 4 shows a block diagram view of an example of a partial data stream, illustrating the use of SPS NAL units.
Fig. 5 shows the structure of an implementation of a supplemental SPS ("SUP SPS") NAL unit.
Fig. 6 illustrates an implementation of an organizational hierarchy between an SPS unit and a plurality of SUP SPS units.
Fig. 7 shows the structure of another implementation of a SUP SPS NAL unit.
Fig. 8 shows a functional view of an implementation of a scalable video encoder that generates SUP SPS units.
Fig. 9 shows a hierarchical view of an implementation of generating a data stream containing SUP SPS units.
Fig. 10 shows a block diagram view of an example of a data stream generated by the implementation of fig. 9.
Fig. 11 shows a block diagram of an implementation of an encoder.
Fig. 12 shows a block diagram of another implementation of an encoder.
Fig. 13 shows a flow chart of an implementation of the encoding process used by the encoder of fig. 11 or 12.
FIG. 14 shows a block diagram view of an example of a data stream generated by the process of FIG. 13.
Fig. 15 shows a block diagram of an implementation of a decoder.
Fig. 16 shows a block diagram of another implementation of a decoder.
Fig. 17 shows a flow diagram of an implementation of a decoding process used by the decoder of fig. 15 or 16.
Detailed Description
Today there are multiple video coding standards that can encode video data according to different layers and/or profiles. Among them, reference may be made to H.264/MPEG-4 AVC (the "AVC standard"), also known as the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) H.264 recommendation. Furthermore, there are extensions to the AVC standard. A first such extension is the scalable video coding ("SVC") extension (Annex G), referred to as the H.264/MPEG-4 AVC scalable video coding extension (the "SVC extension"). A second such extension is the multi-view video coding ("MVC") extension (Annex H), referred to as the H.264/MPEG-4 AVC MVC extension (the "MVC extension").
At least one implementation described in this disclosure may be used with the AVC standard as well as the SVC and MVC extensions. The implementation provides a supplemental ("SUP") sequence parameter set ("SPS") network abstraction layer ("NAL") unit having a different NAL unit type than SPS NAL units. An SPS unit typically (but not necessarily) includes information for at least a single layer. The SUP SPS NAL unit includes layer-dependent information for at least one additional layer. Thus, by accessing the SPS and SUP SPS units, a decoder has available certain (typically all) of the layer-dependent information needed to decode the bitstream.
With this implementation in the AVC system, SUP SPS NAL units need not be transmitted, and a single layer SPS NAL unit (described below) may be transmitted. Using this implementation in an SVC (or MVC) system, SUP SPS NAL units for the required additional layers (or views) may be transmitted in addition to the SPS NAL units. Using this implementation in a system that includes both AVC-compatible decoders and SVC-compatible (or MVC-compatible) decoders, the AVC-compatible decoders can ignore the SUP SPS NAL units by detecting the NAL unit type. In each case, high efficiency and compatibility may be achieved.
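The following C fragment is a minimal sketch of that compatibility mechanism: a single-layer (AVC-only) decoder skips SUP SPS units simply by inspecting the NAL unit type. The type value 24 follows the Table 3 implementation described later; the function and constant names are illustrative, not taken from any standard or reference software.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NAL_TYPE_SPS     7   /* sequence parameter set (AVC) */
    #define NAL_TYPE_SUP_SPS 24  /* SUP SPS, per the Table 3 implementation below */

    /* Dispatch on nal_unit_type (the low 5 bits of the first NAL header byte). */
    static void handle_nal_unit(const uint8_t *nal, size_t len, int avc_only)
    {
        int nal_unit_type = nal[0] & 0x1F;

        switch (nal_unit_type) {
        case NAL_TYPE_SPS:
            printf("parse SPS (%zu bytes)\n", len);
            break;
        case NAL_TYPE_SUP_SPS:
            if (avc_only)
                printf("AVC-only decoder: ignoring SUP SPS\n"); /* skip by type */
            else
                printf("SVC/MVC decoder: parse SUP SPS\n");
            break;
        default:
            printf("NAL unit type %d: handled elsewhere\n", nal_unit_type);
            break;
        }
    }

    int main(void)
    {
        uint8_t sup_sps_header = 0x18;           /* nal_ref_idc = 0, type = 24 */
        handle_nal_unit(&sup_sps_header, 1, 1);  /* AVC-only: unit is skipped  */
        handle_nal_unit(&sup_sps_header, 1, 0);  /* multi-layer: unit is parsed */
        return 0;
    }

An SVC- or MVC-compatible decoder would take the second branch and parse the unit to obtain the layer-dependent parameters.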
The above implementation also provides benefits to systems (standards-based or otherwise) that impose the requirement that certain layers share header information (e.g., an SPS, or specific information typically carried in an SPS). For example, if the base layer and its associated temporal layers need to share an SPS, the layer-dependent information cannot be transmitted with the shared SPS. The SUP SPS, however, provides a mechanism for transmitting that layer-dependent information.
The SUP SPS of various implementations also provides an efficiency advantage: the SUP SPS need not include, and therefore need not repeat, all of the parameters in the SPS. A SUP SPS will typically focus on the layer-dependent parameters. However, various implementations include SUP SPS structures that include layer-independent parameters, or even repeat the entire SPS structure.
Various implementations relate to the SVC extension. The SVC extension provides for the transmission of video data according to multiple spatial levels, temporal levels, and quality levels. For a given spatial level, coding may be performed according to multiple temporal levels, and for each temporal level, according to multiple quality levels. Thus, when m spatial levels, n temporal levels, and O quality levels are defined, video data may be encoded according to m × n × O different combinations. These combinations are referred to as layers, or interoperability points ("IOPs"). Depending on the capabilities of the decoder (also called the receiver or client), different layers may be transmitted, up to a specific layer corresponding to the maximum client capability.
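For concreteness, the short sketch below enumerates the m × n × O layer combinations for illustrative counts m = 2, n = 3, O = 2 (twelve layers); the loop structure and output format are ours, not prescribed by the SVC extension.

    #include <stdio.h>

    /* Enumerate the m * n * O layer combinations (interoperability points).
       The counts m, n, O are illustrative. */
    int main(void)
    {
        const int m = 2, n = 3, O = 2;
        int count = 0;

        for (int d = 1; d <= m; d++) {          /* spatial level  */
            for (int t = 1; t <= n; t++) {      /* temporal level */
                for (int q = 1; q <= O; q++) {  /* quality level  */
                    printf("layer (D=%d, T=%d, Q=%d)\n", d, t, q);
                    count++;
                }
            }
        }
        printf("%d layers = m * n * O\n", count);
        return 0;
    }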
As used herein, "layer-dependent" information refers to information that is specifically related to a single layer. That is, as the name implies, this information depends on the particular layer. Such information does not necessarily differ from layer to layer, but is typically provided separately for each layer.
As used herein, "high level syntax" refers to syntax present in the bitstream that is hierarchically above the macroblock layer. For example, as used herein, a high level grammar can refer to (but is not limited to): a slice header level, a Supplemental Enhancement Information (SEI) level, a Picture Parameter Set (PPS) level, a Sequence Parameter Set (SPS) level, and a syntax of a Network Abstraction Layer (NAL) unit header level.
Referring to fig. 1, an exemplary SVC encoder is indicated generally by the reference numeral 100. The SVC encoder 100 may also be used for AVC encoding, i.e. for a single layer (e.g. base layer). Furthermore, SVC encoder 100 may be used for MVC encoding, as will be understood by one of ordinary skill in the art. For example, various components of the SVC encoder 100, or variations of these components, may be used in encoding multiple views.
A first output of the temporal decomposition module 142 is connected in signal communication with a first input of an intra prediction module 146 for intra blocks. A second output of the temporal decomposition module 142 is connected in signal communication with a first input of a motion coding module 144. An output of the intra prediction module 146 for intra blocks is connected in signal communication with an input of a transform/entropy encoder (signal-to-noise ratio (SNR) scalable) 149. A first output of the transform/entropy coder 149 is connected in signal communication with a first input of a multiplexer 170.
A first output of the temporal decomposition module 132 is connected in signal communication with a first input of an intra prediction module for intra blocks 136. A second output of the temporal decomposition module 132 is connected in signal communication with a first input of a motion coding module 134. An output of the intra prediction module for intra blocks 136 is connected in signal communication with an input of a transform/entropy encoder (signal-to-noise ratio (SNR) scalable) 139. A first output of the transform/entropy coder 139 is connected in signal communication with a first input of a multiplexer 170.
A second output of the transform/entropy coder 149 is connected in signal communication with an input of a 2D spatial interpolation module 138. An output of the 2D spatial interpolation module 138 is connected in signal communication with a second input of the intra prediction module for intra blocks 136. A second output of the motion coding module 144 is connected in signal communication with an input of the motion coding module 134.
A first output of the temporal decomposition module 122 is connected in signal communication with a first input of an intra-frame predictor 126. A second output of the temporal decomposition module 122 is connected in signal communication with a first input of a motion coding module 124. An output of the intra predictor 126 is connected in signal communication with an input of a transform/entropy encoder (signal-to-noise ratio (SNR) scalable) 129. A first output of the transform/entropy coder 129 is connected in signal communication with a first input of a multiplexer 170.
A second output of the transform/entropy coder 139 is connected in signal communication with an input of a 2D spatial interpolation module 128. An output of the 2D spatial interpolation module 128 is connected in signal communication with a second input of the intra predictor 126. A second output of the motion coding module 134 is connected in signal communication with an input of the motion coding module 124.
A first output of the motion coding module 124, a first output of the motion coding module 134, and a first output of the motion coding module 144 are each connected in signal communication with a second input of the multiplexer 170.
A first output of the 2D spatial decimation module 104 is connected in signal communication with an input of a temporal decomposition module 132. A second output of the 2D spatial decimation module 104 is connected in signal communication with an input of a temporal decomposition module 142.
An input of the temporal decomposition module 122 and an input of the 2D spatial decimation module 104 are available as inputs of the encoder 100, for receiving the input video 102.
The output of the multiplexer 170 is available as an output of the encoder 100 for providing a bitstream 180.
The core encoder portion 187 of the encoder 100 includes: temporal decomposition module 122, temporal decomposition module 132, temporal decomposition module 142, motion coding module 124, motion coding module 134, motion coding module 144, intra predictor 126, intra predictor 136, intra predictor 146, transform/entropy encoder 129, transform/entropy encoder 139, transform/entropy encoder 149, 2D spatial interpolation module 128, and 2D spatial interpolation module 138.
Fig. 1 includes three core encoders 187. In the implementation shown in the figure, the bottom-most core encoder 187 may encode the base layer, and the middle and upper core encoders 187 encode the higher layers.
Turning to fig. 2, an exemplary SVC decoder is indicated generally by the reference numeral 200. The SVC decoder 200 may also be used for AVC decoding, i.e. for a single view. Furthermore, one of ordinary skill in the art will appreciate that the SVC decoder 200 may be used for MVC decoding. For example, various components of SVC decoder 200, or different variations of these components, may be used in decoding of multiple views.
Note that the encoder 100 and decoder 200, as well as other encoders and decoders discussed in this disclosure, may be configured to perform the various methods shown throughout this disclosure. In addition to performing encoding operations, the encoder described in this disclosure may perform various decoding operations during the reconstruction process in order to mirror the expected actions of the decoder. For example, to generate a reconstruction of the encoded video data to predict the additional video data, the encoder may decode the SUP SPS unit to decode the encoded video data. Thus, the encoder can perform substantially all of the operations performed by the decoder.
An input of the demultiplexer 202 is available as an input to the scalable video decoder 200, for receiving a scalable bit stream. A first output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 204. A first output of the spatial inverse transform SNR scalable entropy decoder 204 is connected in signal communication with a first input of a prediction module 206. An output of the prediction module 206 is connected in signal communication with a first input of a combiner 230.
A second output of the spatial inverse transform SNR scalable entropy decoder 204 is connected in signal communication with a first input of a Motion Vector (MV) decoder 210. An output of the MV decoder 210 is connected in signal communication with an input of a motion compensator 232. An output of the motion compensator 232 is connected in signal communication with a second input of the combiner 230.
A second output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 212. A first output of the spatial inverse transform SNR scalable entropy decoder 212 is connected in signal communication with a first input of a prediction module 214. A first output of the prediction module 214 is connected in signal communication with an input of an interpolation module 216. An output of the interpolation module 216 is connected in signal communication with a second input of the prediction module 206. A second output of the prediction module 214 is connected in signal communication with a first input of a combiner 240.
A second output of the spatial inverse transform SNR scalable entropy decoder 212 is connected in signal communication with a first input of an MV decoder 220. A first output of the MV decoder 220 is connected in signal communication with a second input of the MV decoder 210. A second output of the MV decoder 220 is connected in signal communication with an input of a motion compensator 242. An output of the motion compensator 242 is connected in signal communication with a second input of the combiner 240.
A third output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 222. A first output of the spatial inverse transform SNR scalable entropy decoder 222 is connected in signal communication with an input of a prediction module 224. A first output of the prediction module 224 is connected in signal communication with an input of an interpolation module 226. An output of the interpolation module 226 is connected in signal communication with a second input of the prediction module 214.
A second output of the prediction module 224 is connected in signal communication with a first input of a combiner 250. A second output of the spatial inverse transform SNR scalable entropy decoder 222 is connected in signal communication with an input of an MV decoder 230. A first output of the MV decoder 230 is connected in signal communication with a second input of the MV decoder 220. A second output of the MV decoder 230 is connected in signal communication with an input of a motion compensator 252. An output of the motion compensator 252 is connected in signal communication with a second input of the combiner 250.
The output of the combiner 250 is available as an output of the decoder 200 for outputting the layer 0 signal. The output of the combiner 240 is available as an output of the decoder 200 for outputting the layer 1 signal. The output of the combiner 230 is available as an output of the decoder 200, for outputting a layer 2 signal.
Referring to fig. 1a, an exemplary AVC encoder is indicated generally by the reference numeral 2100. AVC encoder 2100 may be used, for example, to encode a single layer (e.g., a base layer).
The video encoder 2100 includes a frame ordering buffer 2110, the buffer 2110 having an output in signal communication with a non-inverting input of a combiner 2185. An output of the combiner 2185 is connected in signal communication with a first input of a transformer and quantizer 2125. An output of the transformer and quantizer 2125 is connected in signal communication with a first input of an entropy coder 2145 and a first input of an inverse transformer and inverse quantizer 2150. An output of the entropy coder 2145 is connected in signal communication with a first non-inverting input of a combiner 2190. An output of the combiner 2190 is connected in signal communication with a first input of an output buffer 2135.
A first output of the encoder controller 2105 is connected in signal communication with a second input of the frame ordering buffer 2110, a second input of the inverse transformer and inverse quantizer 2150, an input of a picture type decision module 2115, an input of a macroblock type (MB type) decision module 2120, a second input of an intra prediction module 2160, a second input of a deblocking filter 2165, a first input of a motion compensator 2170, a first input of a motion estimator 2175, and a second input of the reference picture buffer 2180.
A second output of the encoder controller 2105 is connected in signal communication with a first input of a supplemental enhancement information ("SEI") inserter 2130, a second input of the transformer and quantizer 2125, a second input of the entropy coder 2145, a second input of the output buffer 2135, and an input of the sequence parameter set (SPS) and picture parameter set (PPS) inserter 2140.
A first output of the picture-type decision module 2115 is connected in signal communication with a third input of the frame ordering buffer 2110. A second output of the picture-type decision module 2115 is connected in signal communication with a second input of the macroblock-type decision module 2120.
An output of the sequence parameter set ("SPS") and picture parameter set ("PPS") inserter 2140 is connected in signal communication with a third non-inverting input of the combiner 2190. An output of the SEI inserter 2130 is connected in signal communication with a second non-inverting input of the combiner 2190.
An output of the inverse transformer and inverse quantizer 2150 is connected in signal communication with a first non-inverting input of a combiner 2127. An output of the combiner 2127 is connected in signal communication with a first input of an intra prediction module 2160 and a first input of a deblocking filter 2165. An output of the deblocking filter 2165 is connected in signal communication with a first input of a reference picture buffer 2180. An output of the reference picture buffer 2180 is connected in signal communication with a second input of the motion estimator 2175 and a first input of the motion compensator 2170. A first output of the motion estimator 2175 is connected in signal communication with a second input of the motion compensator 2170. A second output of the motion estimator 2175 is connected in signal communication with a third input of the entropy encoder 2145.
An output of the motion compensator 2170 is connected in signal communication with a first input of a switch 2197. An output of the intra prediction module 2160 is connected in signal communication with a second input of the switch 2197. An output of the macroblock-type decision module 2120 is connected in signal communication with a third input of the switch 2197 to provide a control input to the switch 2197. An output of the switch 2197 is connected in signal communication with a second non-inverting input of the combiner 2127 and an inverting input of the combiner 2185.
Inputs to the frame ordering buffer 2110 and to the encoder controller 2105 are available as inputs to the encoder 2100, for receiving the input pictures 2101. Further, an input of the SEI inserter 2130 is available as an input of the encoder 2100, for receiving metadata. An output of the output buffer 2135 is available as an output of the encoder 2100, for outputting a bitstream.
Referring to fig. 2a, a video decoder capable of performing video decoding in accordance with the MPEG-4 AVC standard is indicated generally by the reference numeral 2200.
The video decoder 2200 includes an input buffer 2210, the buffer 2210 having an output connected in signal communication with a first input of an entropy decoder 2245. A first output of the entropy decoder 2245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 2250. An output of the inverse transformer and inverse quantizer 2250 is connected in signal communication with a second non-inverting input of the combiner 2225. An output of the combiner 2225 is connected in signal communication with a second input of a deblocking filter 2265 and a first input of an intra prediction module 2260. A second output of the deblocking filter 2265 is connected in signal communication with a first input of a reference picture buffer 2280. An output of the reference picture buffer 2280 is connected in signal communication with a second input of the motion compensator 2270.
A second output of the entropy decoder 2245 is connected in signal communication with a third input of the motion compensator 2270 and a first input of the deblocking filter 2265. A third output of the entropy decoder 2245 is connected in signal communication with an input of the decoder controller 2205. A first output of the decoder controller 2205 is connected in signal communication with a second input of the entropy decoder 2245. A second output of the decoder controller 2205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 2250. A third output of the decoder controller 2205 is connected in signal communication with a third input of a deblocking filter 2265. A fourth output of the decoder controller 2205 is connected in signal communication with a second input of the intra prediction module 2260, a first input of the motion compensator 2270, and a second input of the reference picture buffer 2280.
An output of the motion compensator 2270 is connected in signal communication with a first input of a switch 2297. An output of the intra prediction module 2260 is connected in signal communication with a second input of the switch 2297. An output of the switch 2297 is connected in signal communication with a first non-inverting input of a combiner 2225.
An input of the input buffer 2210 is available as an input of the decoder 2200, for receiving an input bitstream. A first output of the deblocking filter 2265 is available as an output of the decoder 2200, for outputting an output picture.
Referring to FIG. 3, the structure of a single-layer SPS 300 is shown. An SPS is a syntax structure that, in general, includes syntax elements that apply to zero or more entire coded video sequences. In the SVC extension, the values of some syntax elements conveyed in the SPS are layer-dependent. These layer-dependent syntax elements include, but are not limited to: timing information, HRD (standing for "hypothetical reference decoder") parameters, and bitstream restriction information. HRD parameters may include, for example: buffer size, maximum bit rate, and an indicator of initial delay. The HRD parameters may, for example, allow a receiving system to verify the integrity of the received bitstream and/or to determine whether the receiving system (e.g., a decoder) is capable of decoding the bitstream. Thus, a system may provide for the transmission of the aforementioned syntax elements for each layer.
The single-layer SPS 300 includes an SPS-ID 310 that provides an identifier of the SPS. The single-layer SPS 300 also includes VUI (standing for video usability information) parameters 320 for a single layer. The VUI parameters include HRD parameters 330 for a single layer (e.g., the base layer). The single-layer SPS 300 may also include additional parameters 340, although implementations need not include any additional parameters 340.
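The layout of fig. 3 might be modeled with structures along the following lines; the field names and widths are our assumptions for illustration, not the normative AVC syntax element names.

    #include <stdint.h>

    /* Illustrative model of the single-layer SPS 300 of fig. 3. */
    struct hrd_params {            /* HRD parameters 330 */
        uint32_t buffer_size;      /* coded picture buffer size  */
        uint32_t max_bit_rate;     /* maximum bit rate           */
        uint32_t initial_delay;    /* indicator of initial delay */
    };

    struct vui_params {            /* VUI parameters 320 (single layer) */
        struct hrd_params hrd;
        /* timing information, bitstream restriction info, ... */
    };

    struct single_layer_sps {      /* SPS 300 */
        uint8_t sps_id;            /* SPS-ID 310 */
        struct vui_params vui;
        /* optional additional parameters 340 */
    };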
Referring to FIG. 4, a block diagram view of a data stream 400 illustrates a typical use of the single-layer SPS 300. In the AVC standard, for example, a typical data stream may include, among other components, SPS units, PPS (picture parameter set) units that provide parameters for particular pictures, and units containing coded picture data. Such an overall framework is shown in fig. 4, which includes SPS 300, PPS-1 410, one or more units 420 that include coded picture 1 data, PPS-2 430, and one or more units 440 that include coded picture 2 data. PPS-1 410 includes parameters for the coded picture 1 data 420, and PPS-2 430 includes parameters for the coded picture 2 data 440.
The coded picture 1 data 420 and the coded picture 2 data 440 are each associated with a particular SPS (SPS 300 in the implementation of fig. 4). This association is achieved through the use of pointers, as now explained. The coded picture 1 data 420 includes a PPS-ID (not shown) that identifies PPS-1 410, as indicated by arrow 450. The PPS-ID may be stored, for example, in a slice header. The coded picture 2 data 440 includes a PPS-ID (not shown) that identifies PPS-2 430, as indicated by arrow 460. PPS-1 410 and PPS-2 430 each include an SPS-ID (not shown) that identifies SPS 300, as indicated by arrows 470 and 480, respectively.
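That pointer chain amounts to two table lookups, as the following sketch suggests; the structures are illustrative, although the table sizes reflect the 32 SPS and 256 PPS identifiers that AVC allows.

    #include <stddef.h>
    #include <stdint.h>

    /* The pointer chain of fig. 4 expressed as ID lookups: a slice header
       carries a PPS-ID, and the PPS carries an SPS-ID. */
    struct sps { uint8_t sps_id; /* ... */ };
    struct pps { uint8_t pps_id; uint8_t sps_id; /* ... */ };

    static struct sps *sps_table[32];
    static struct pps *pps_table[256];

    /* Resolve the SPS governing a coded picture from the PPS-ID in its
       slice header (arrows 450/460, then 470/480 in fig. 4). */
    static struct sps *sps_for_picture(uint8_t pps_id_from_slice_header)
    {
        struct pps *p = pps_table[pps_id_from_slice_header];
        return p ? sps_table[p->sps_id] : NULL;
    }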
Referring to fig. 5, the structure of a SUP SPS 500 is shown. The SUP SPS 500 includes an SPS ID 510, a VUI 520 that includes HRD parameters 530 for a single additional layer referred to as "(D2, T2, Q2)", and optional additional parameters 540. "D2, T2, Q2" refers to a second layer having spatial (D) level 2, temporal (T) level 2, and quality (Q) level 2.
Note that various numbering schemes may be used to refer to layers. In one numbering scheme, the base layer has D, T, Q values of 0, x, 0, meaning a spatial level of zero, any temporal level, and a quality level of zero. In that numbering scheme, an enhancement layer has D, T, Q values in which D or Q is greater than zero.
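Under that scheme, the base-layer test reduces to a simple predicate, as in this small sketch (names ours):

    #include <stdio.h>

    /* Base layer under the scheme above: D == 0 and Q == 0,
       at any temporal level T. */
    static int is_base_layer(int d, int t, int q)
    {
        (void)t;  /* the temporal level does not affect the test */
        return d == 0 && q == 0;
    }

    int main(void)
    {
        printf("%d\n", is_base_layer(0, 2, 0));  /* 1: base layer          */
        printf("%d\n", is_base_layer(1, 0, 0));  /* 0: spatial enhancement */
        printf("%d\n", is_base_layer(0, 0, 1));  /* 0: quality enhancement */
        return 0;
    }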
The use of the SUP SPS 500 allows, for example, a system to use an SPS structure that includes parameters for only a single layer, or an SPS structure that includes no layer-dependent information at all. Such a system may create a separate SUP SPS for each additional layer above the base layer. Each additional layer can identify the SPS with which it is associated by using the SPS ID 510. Clearly, multiple layers may share a single SPS by using a common SPS ID in their respective SUP SPS units.
Referring to fig. 6, an organizational hierarchy 600 between an SPS unit 605 and a plurality of SUP SPS units 610 and 620 is shown. SUP SPS units 610 and 620 are shown as single layer SUP SPS units, but other implementations may use one or more multiple layer SUP SPS units in addition to or instead of using single layer SUP SPS units. In a typical scenario, the hierarchy 600 shows that multiple SUP SPS units may be associated with a single SPS unit. Of course, implementations may include multiple SPS units, and each SPS unit may have an associated SUP SPS unit.
Referring to fig. 7, the structure of another SUP SPS 700 is shown. The SUP SPS 700 includes parameters for multiple layers, whereas the SUP SPS 500 includes parameters for a single layer. The SUP SPS 700 includes an SPS ID 710, a VUI 720, and optional additional parameters 740. The VUI 720 includes HRD parameters 730 for a first additional layer (D2, T2, Q2), and HRD parameters for additional layers up to layer (Dn, Tn, Qn).
Referring again to fig. 6, the hierarchy 600 may be modified to use a multi-layer SUP SPS. For example, if SUP SPS 610 and 620 both include the same SPS ID, the combination of SUP SPS 610 and 620 may be replaced with the SUP SPS 700.
Furthermore, the SUP SPS 700 may be used with, for example, an SPS that includes parameters for a single layer, an SPS that includes parameters for multiple layers, or an SPS that includes layer-dependent parameters for no layer at all. The SUP SPS 700 allows a system to provide parameters for multiple layers with less overhead.
Other implementations may be based on, for example, an SPS that includes all required parameters for all possible layers. That is, the SPS of such an implementation includes all of the spatial (Di), temporal (Ti), and quality (Qi) levels available for transmission, regardless of whether all layers are transmitted. However, even for such systems, the SUP SPS may be used to provide the ability to change parameters for one or more layers without transmitting the entire SPS again.
With reference to Table 1, syntax is provided for a particular implementation of a single-layer SUP SPS. The syntax includes sequence_parameter_set_id, which identifies the associated SPS, and the identifiers temporal_level, dependency_id, and quality_level, which identify the scalability layer. The VUI parameters are included through the use of svc_vui_parameters() (see Table 2), and the HRD parameters are included through the use of hrd_parameters(). The following syntax allows each layer to specify its own layer-dependent parameters, such as HRD parameters.
| sup_seq_parameter_set_svc() { | C | Descriptor |
| sequence_parameter_set_id | 0 | ue(v) |
| temporal_level | 0 | u(3) |
| dependency_id | 0 | u(3) |
| quality_level | 0 | u(2) |
| vui_parameters_present_svc_flag | 0 | u(1) |
| if(vui_parameters_present_svc_flag) | | |
| svc_vui_parameters() | | |
| } | | |
TABLE 1
The semantics of the sup_seq_parameter_set_svc() syntax are as follows.
- sequence_parameter_set_id identifies, for the current layer, the sequence parameter set to which the current SUP SPS maps;
- temporal_level, dependency_id, and quality_level specify the temporal level, dependency identifier, and quality level of the current layer. dependency_id generally indicates the spatial level. However, dependency_id is also used to indicate the coarse-grain scalability ("CGS") hierarchy, which includes both spatial and SNR scalability, where SNR scalability is traditional quality scalability. Accordingly, quality_level and dependency_id may both be used to distinguish quality levels.
- vui_parameters_present_svc_flag equal to 1 indicates that the svc_vui_parameters() syntax structure, defined below, is present. vui_parameters_present_svc_flag equal to 0 indicates that the svc_vui_parameters() syntax structure is not present. A parsing sketch based on this syntax layout is given below.
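A possible parsing routine for this syntax is sketched here in C. The bit reader implements the u(n) and ue(v) descriptors of Table 1; emulation-prevention removal, error handling, and svc_vui_parameters() itself are omitted, and none of this code is taken from reference software.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Minimal MSB-first bit reader. */
    struct bitreader { const uint8_t *buf; size_t bitpos; };

    static unsigned u(struct bitreader *br, int n)  /* u(n): n-bit unsigned */
    {
        unsigned v = 0;
        while (n--) {
            v = (v << 1) | ((br->buf[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1u);
            br->bitpos++;
        }
        return v;
    }

    static unsigned ue(struct bitreader *br)        /* ue(v): Exp-Golomb */
    {
        int zeros = 0;
        while (u(br, 1) == 0)
            zeros++;
        return (1u << zeros) - 1 + u(br, zeros);
    }

    /* Parse the RBSP of sup_seq_parameter_set_svc() as laid out in Table 1.
       svc_vui_parameters() (Table 2 below) is left as a stub. */
    static void parse_sup_sps(struct bitreader *br)
    {
        unsigned sps_id      = ue(br);    /* sequence_parameter_set_id       */
        unsigned temporal    = u(br, 3);  /* temporal_level                  */
        unsigned dependency  = u(br, 3);  /* dependency_id                   */
        unsigned quality     = u(br, 2);  /* quality_level                   */
        unsigned vui_present = u(br, 1);  /* vui_parameters_present_svc_flag */

        printf("SUP SPS -> SPS %u, layer (D=%u, T=%u, Q=%u)\n",
               sps_id, dependency, temporal, quality);
        if (vui_present) {
            /* svc_vui_parameters() would be parsed here */
        }
    }

    int main(void)
    {
        /* ue(0)=1, then T=2 (010), D=1 (001), Q=1 (01), flag=0:
           bits 1 010 001 01 0 -> bytes 0xA2 0x80 */
        const uint8_t rbsp[] = { 0xA2, 0x80 };
        struct bitreader br = { rbsp, 0 };
        parse_sup_sps(&br);  /* prints: SUP SPS -> SPS 0, layer (D=1, T=2, Q=1) */
        return 0;
    }

The example bytes 0xA2 0x80 encode sequence_parameter_set_id = 0 for the layer (D=1, T=2, Q=1).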
Table 2 gives the syntax of svc_vui_parameters(). The VUI parameters are thus separated out for each layer and placed in separate SUP SPS units. However, other implementations combine the VUI parameters for multiple layers into a single SUP SPS.
| svc_vui_parameters() { | C | Descriptor |
| timing_info_present_flag | 0 | u(1) |
| if(timing_info_present_flag) { | | |
| num_units_in_tick | 0 | u(32) |
| time_scale | 0 | u(32) |
| fixed_frame_rate_flag | 0 | u(1) |
| } | | |
| nal_hrd_parameters_present_flag | 0 | u(1) |
| if(nal_hrd_parameters_present_flag) | | |
| hrd_parameters() | | |
| vcl_hrd_parameters_present_flag | 0 | u(1) |
| if(vcl_hrd_parameters_present_flag) | | |
| hrd_parameters() | | |
| if(nal_hrd_parameters_present_flag || vcl_hrd_parameters_present_flag) | | |
| low_delay_hrd_flag | 0 | u(1) |
| pic_struct_present_flag | 0 | u(1) |
| bitstream_restriction_flag | 0 | u(1) |
| if(bitstream_restriction_flag) { | | |
| motion_vectors_over_pic_boundaries_flag | 0 | u(1) |
| max_bytes_per_pic_denom | 0 | ue(v) |
| max_bits_per_mb_denom | 0 | ue(v) |
| log2_max_mv_length_horizontal | 0 | ue(v) |
| log2_max_mv_length_vertical | 0 | ue(v) |
| num_reorder_frames | 0 | ue(v) |
| max_dec_frame_buffering | 0 | ue(v) |
| } | | |
| } | | |
TABLE 2
The fields of the svc_vui_parameters() syntax of Table 2 are defined in the version of the SVC extension in Annex E.1 of JVT-U201, April 2007. In particular, hrd_parameters() is as defined for the AVC standard. Note also that svc_vui_parameters() includes various layer-dependent information, including HRD-related parameters. The HRD-related parameters include num_units_in_tick, time_scale, fixed_frame_rate_flag, nal_hrd_parameters_present_flag, vcl_hrd_parameters_present_flag, hrd_parameters(), low_delay_hrd_flag, and pic_struct_present_flag. Furthermore, the syntax elements in the if-clause of bitstream_restriction_flag are layer-dependent even though they are not HRD-related.
As mentioned above, the SUP SPS is defined as a new type of NAL unit. Table 3 lists some of the NAL unit codes defined by the standard JVT-U201, modified here to assign type 24 to the SUP SPS. The ellipses between NAL unit types 1 and 16, and between 18 and 24, indicate that those types are unchanged. The ellipsis between NAL unit types 25 and 31 indicates that those types are unspecified. The implementation of Table 3 below changes type 24 of the standard from "unspecified" to "sup_seq_parameter_set_svc()". "Unspecified" is typically reserved for user applications, whereas "reserved" is generally reserved for future standard modifications. Accordingly, another implementation changes one of the "reserved" types (e.g., type 16, 17, or 18) to "sup_seq_parameter_set_svc()". Changing an "unspecified" type yields an implementation for a given user, while changing a "reserved" type yields an implementation that changes the standard for all users.
TABLE 3
Fig. 8 shows a functional view of an implementation of a scalable video encoder 800 that generates SUP SPS units. Video is received at the input of the scalable video encoder 1. The video is encoded according to different spatial levels. Spatial levels mainly refer to different resolution levels of the same video. For example, the input of a scalable video encoder may be a CIF sequence (352 × 288) or a QCIF sequence (176 × 144), each representing a spatial level.
Each spatial level is sent to an encoder. Spatial level 1 is sent to encoder 2", spatial level 2 is sent to encoder 2', and spatial level m is sent to encoder 2.
The spatial level is coded with 3 bits, using dependency_id. Thus, the maximum number of spatial levels in this implementation is 8.
Encoders 2, 2', and 2" encode one or more layers having the indicated spatial levels. Encoders 2, 2', and 2" may be designed for particular quality and temporal levels, or the quality and temporal levels may be configurable. As can be seen in fig. 8, the encoders 2, 2', and 2" are arranged hierarchically. That is, encoder 2" feeds encoder 2', which in turn feeds encoder 2. The hierarchical arrangement reflects the typical scenario in which a higher layer uses a lower layer as a reference.
After encoding, a header is prepared for each layer. In the illustrated implementation, for each spatial level, an SPS message, a PPS message, and a plurality of SUP_SPS messages are created. SUP_SPS messages (or units) may be created, for example, for layers corresponding to the various quality and temporal levels.
For spatial level 1, SPS and PPS 5" are created, along with a set of SUP_SPS units, one for each combination of temporal and quality levels at that spatial level.
For spatial level 2, SPS and PPS 5' are created, along with a corresponding set of SUP_SPS units.
For spatial level m, SPS and PPS 5 are created, along with a corresponding set of SUP_SPS units. (A sketch of this header generation is given below.)
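The per-level header generation can be sketched as two nested loops; the emit_* functions below are placeholders for real serializers, and the names are ours.

    #include <stdio.h>

    /* Sketch of the header generation of fig. 8: one SPS/PPS pair per
       spatial level, plus one SUP_SPS per (temporal, quality) combination
       at that level. */
    static void emit_sps_pps(int d)
    {
        printf("SPS/PPS for spatial level %d\n", d);
    }

    static void emit_sup_sps(int d, int t, int q)
    {
        printf("  SUP_SPS(%d,%d,%d)\n", d, t, q);
    }

    static void build_headers(int m, int n, int O)
    {
        for (int d = 1; d <= m; d++) {
            emit_sps_pps(d);
            for (int t = 1; t <= n; t++)
                for (int q = 1; q <= O; q++)
                    emit_sup_sps(d, t, q);
        }
    }

    int main(void)
    {
        build_headers(3, 2, 2);  /* illustrative m, n, O */
        return 0;
    }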
the bitstreams 7, 7 'and 7 "encoded by the encoders 2, 2' and 2" typically follow a plurality of SPS, PPS, and SUP _ SPS (also referred to as headers, units or messages) in a global bitstream.
The bitstream 8 "comprises SPS and PPS5And a coded video bitstream 7 "which constitutes all the coded data associated with spatial level 1.
The bitstream 8' comprises SPS and PPS5And a coded video bitstream 7' which constitutes all the coded data associated with spatial level 2.
The bitstream 8 comprises the SPS and PPS5,And a coded video bitstream 7, which constitutes all the coded data associated with spatial level m.
The various SUP_SPS headers conform to the headers described in Tables 1-3.
The encoder 800 shown in fig. 8 generates one SPS for each spatial level. However, other implementations may generate multiple SPSs for each spatial level or may generate SPSs that serve multiple spatial levels.
As shown in fig. 8, the bitstreams 8, 8', and 8" are combined in a multiplexer 9, which produces the SVC bitstream.
Referring to fig. 9, a hierarchical view 900 illustrates the generation of a data stream containing SUP SPS units. View 900 may be used to illustrate a possible bitstream generated by the scalable video encoder 800 of fig. 8. In view 900, the SVC bitstream is provided to a transport interface 17.
An SVC bitstream including one SPS for each spatial level may be generated according to, for example, the implementation of fig. 8. When m spatial levels are encoded, the SVC bitstream includes SPS1, SPS2, and SPSm, represented by 10, 10', and 10" in fig. 9.
In the SVC bitstream, each SPS encodes the general information related to its spatial level. Each SPS is followed by headers of SUP_SPS type 11, 11', 11", 13', 13", 15', and 15". The SUP_SPS headers are followed by the corresponding encoded video data 12, 12', 12", 14', 14", 16', and 16", which correspond to a temporal level (n) and a quality level (O), respectively.
Thus, when a layer is not transmitted, the corresponding SUP_SPS is also not transmitted, since typically one SUP_SPS corresponds to each layer.
Typical implementations use a numbering scheme for layers in which the base layer has D and Q values of zero. If that numbering scheme is used for view 900, view 900 does not explicitly show the base layer. This does not preclude the use of a base layer. However, view 900 may also be extended to explicitly show a bitstream for the base layer and, for example, a separate SPS for the base layer. Further, view 900 may use an alternative numbering scheme for the base layer, in which one or more of the bitstreams (1, 1, 1) through (m, n, O) refer to the base layer.
Referring to fig. 10, a block diagram view of a data stream 1000 generated by the implementations of fig. 8 and 9 is provided. Fig. 10 shows the transmission of the following layers:
layer (1, 1, 1): spatial level 1, temporal level 1, quality level 1; a transmission including blocks 10, 11, and 12;
layer (1, 2, 1): spatial level 1, temporal level 2, quality level 1; an additional transmission including blocks 11' and 12';
layer (2, 1, 1): spatial level 2, temporal level 1, quality level 1; an additional transmission including blocks 10', 13, and 14;
layer (3, 1, 1): spatial level 3, temporal level 1, quality level 1; an additional transmission including blocks 10", 15, and 16;
layer (3, 2, 1): spatial level 3, temporal level 2, quality level 1; an additional transmission including blocks 15' and 16';
layer (3, 3, 1): spatial level 3, temporal level 3, quality level 1; an additional transmission including blocks 15" and 16".
the block view of data flow 1000 shows that SPS10 is sent only once and used by layer (1, 1, 1) and layer (1, 2, 1); SPS10 "is sent only once and is used by layer (3, 1, 1), layer (3, 2, 1), and layer (3, 3, 1). Further, data flow 1000 illustrates that no transmissions are intended for the stationThere are parameters of a layer, and only parameters corresponding to the transmitted layer are transmitted. For example, the parameters for layer (2, 2, 1) are not transmitted (andcorrespondingly), since the layer is not transmitted. This provides a high efficiency of the present implementation.
Referring to fig. 11, an encoder 1100 includes an SPS generation unit 1110, a video encoder 1120, and a formatter 1130. The video encoder 1120 receives input video, encodes the input video, and provides the encoded input video to the formatter 1130. The encoded input video may include, for example, multiple layers, such as an encoded base layer and an encoded enhancement layer. The SPS generation unit 1110 generates header information, such as SPS units and SUP SPS units, and provides the header information to the formatter 1130. The SPS generation unit 1110 also communicates with the video encoder 1120 to provide parameters used by the video encoder 1120 in encoding the input video.
The SPS generation unit 1110 may be configured to, for example, generate SPS NAL units. An SPS NAL unit may include information describing parameters used in decoding a first layer encoding of a sequence of images. The SPS generation unit 1110 may also be configured to, for example, generate SUP SPS NAL units having a different structure than the SPS NAL units. A SUP SPS NAL unit may include information describing parameters used in decoding a second layer encoding of the sequence of images. The first layer encoding and the second layer encoding may be generated by the video encoder 1120.
The formatter 1130 multiplexes the encoded video from the video encoder 1120 and the header information from the SPS generation unit 1110 to generate an output encoded bitstream. The coded bitstream may be a data set including a first layer code of a sequence of pictures, a second layer code of the sequence of pictures, an SPS NAL unit, and a SUP SPS NAL unit.
The components 1110, 1120, and 1130 of the encoder 1100 may take a variety of forms. One or more of the components 1110, 1120, and 1130 may comprise hardware, software, firmware, or a combination, and may operate from a variety of platforms (e.g., a dedicated encoder or a general purpose processor configured by software to operate as an encoder).
Fig. 8 and 11 can be compared. The SPS generation unit 1110 may generate the SPS and the various SUP_SPS headers shown in fig. 8. The video encoder 1120 may generate the bitstreams 7, 7', and 7" shown in fig. 8 (which are encodings of the input video). Video encoder 1120 may correspond to, for example, one or more of encoders 2, 2', and 2". The formatter 1130 may generate the hierarchically arranged data shown by reference numerals 8, 8', and 8", and may perform the operation of the multiplexer 9 to generate the SVC bitstream of fig. 8.
Fig. 1 and 11 can also be compared. Video encoder 1120 may correspond to, for example, modules 104 and 187 of fig. 1. The formatter 1130 may correspond to, for example, the multiplexer 170. SPS generation unit 1110 is not explicitly shown in fig. 1, although, for example, multiplexer 170 may perform the functions of SPS generation unit 1110.
Other implementations of encoder 1100 do not include video encoder 1120 because, for example, the data is pre-encoded. The encoder 1100 may also provide additional outputs and provide additional communications between components. The encoder 1100 may also be modified to provide additional components, for example, located between existing components.
Referring to fig. 12, there is shown an encoder 1200 that operates in the same manner as encoder 1100. The encoder 1200 includes a memory 1210 in communication with a processor 1220. The memory 1210 may be used, for example, to store input video, to store encoding or decoding parameters, to store intermediate or final results during an encoding process, or to store instructions for performing an encoding method. Such storage may be temporary or permanent.
The processor 1220 receives input video and encodes the input video. The processor 1220 also generates header information and formats an encoded bitstream including the header information and the encoded input video. As in encoder 1100, the header information provided by processor 1220 may include separate structures for conveying header information for multiple layers. The processor 1220 may operate in accordance with instructions stored or resident on, for example, the processor 1220 or the memory 1210 or portions thereof.
Referring to fig. 13, a process 1300 for encoding an input video is shown. Process 1300 may be performed by, for example, encoder 1100 or 1200.
Process 1300 includes generating an SPS NAL unit (1310). The SPS NAL unit includes information that describes parameters for use in decoding a first layer encoding of a sequence of images. The SPS NAL unit may or may not be defined by a coding standard. If the SPS NAL unit is defined by a coding standard, the coding standard may require a decoder to operate according to received SPS NAL units. Typically, such a requirement is imposed by declaring the SPS NAL unit to be "normative". For example, the SPS is normative in the AVC standard, while supplemental enhancement information ("SEI") messages, for example, are non-normative. Accordingly, an AVC-compatible decoder may ignore received SEI messages, but must operate in accordance with received SPS messages.
The SPS NAL unit includes information describing one or more parameters for decoding the first layer. The parameter may be, for example, layer-dependent or layer-independent information. Examples of typical layer-dependent parameters include VUI parameters or HRD parameters.
Operation 1310 may be performed by, for example, the SPS generation unit 1110, the processor 1220, or the SPS and PPS inserter 2140. Operation 1310 may also correspond to the generation of the SPS in any of blocks 5, 5', 5" in fig. 8.
Accordingly, the means for performing operation 1310 (i.e., generating the SPS NAL unit) may include various components. For example, such means may include a module for generating the SPS 5, 5', or 5", the entire encoder system of fig. 1, 8, 11, or 12, the SPS generation unit 1110, the processor 1220, or the SPS and PPS inserter 2140, or equivalents thereof, including known and future-developed encoders.
Process 1300 includes generating a supplemental ("SUP") SPS NAL unit (1320), the SUP SPS NAL unit having a different structure than the SPS NAL unit. The SUP SPS NAL unit includes information describing parameters used in decoding a second layer encoding of the sequence of images. The SUP SPS NAL unit may or may not be defined by a coding standard. If the SUP SPS NAL unit is defined by a coding standard, the coding standard may require a decoder to operate according to received SUP SPS NAL units. As discussed above with respect to operation 1310, such a requirement is generally imposed by declaring the SUP SPS NAL unit to be "normative".
Various implementations include normative SUP SPS messages. For example, the SUP SPS messages may be normative for decoders that decode more than one layer (e.g., SVC-compatible decoders). Such multi-layer decoders (e.g., SVC-compatible decoders) would then be required to operate according to the information conveyed in the SUP SPS messages, whereas single-layer decoders (e.g., AVC-compatible decoders) could ignore the SUP SPS messages. As another example, the SUP SPS messages may be normative for all decoders, including single-layer and multi-layer decoders. Because the SUP SPS messages are largely based on SPS messages, and SPS messages are normative in the AVC standard as well as in the SVC and MVC extensions, it is not surprising that many implementations include normative SUP SPS messages. That is, the SUP SPS messages carry data similar to SPS messages, function similarly to SPS messages, and may be considered a kind of SPS message. It should be clear that implementations with normative SUP SPS messages may provide compatibility advantages, for example, allowing AVC and SVC decoders to receive a common data stream.
The SUP SPS NAL unit (also referred to as a SUP SPS message) includes one or more parameters for decoding the second layer. The parameter may be, for example, layer-dependent or layer-independent information. Specific examples include VUI parameters or HRD parameters. In addition to being used to decode the second layer, the SUP SPS may also be used to decode the first layer.
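As a sketch of operation 1320, the following writer serializes the Table 1 fields of a SUP SPS. It is the counterpart of the parsing sketch above; the NAL header, RBSP trailing bits, and emulation prevention are again omitted, and all names are ours.

    #include <stddef.h>
    #include <stdint.h>

    /* Bit-writer counterpart to the Table 1 syntax.  The output buffer must
       be zero-initialized, since bits are OR-ed in. */
    struct bitwriter { uint8_t *buf; size_t bitpos; };

    static void put_u(struct bitwriter *bw, unsigned v, int n)  /* u(n) */
    {
        while (n--) {
            bw->buf[bw->bitpos >> 3] |= ((v >> n) & 1u) << (7 - (bw->bitpos & 7));
            bw->bitpos++;
        }
    }

    static void put_ue(struct bitwriter *bw, unsigned v)        /* ue(v) */
    {
        unsigned x = v + 1;
        int bits = 0;
        while ((x >> bits) > 1)   /* bits = floor(log2(x)) */
            bits++;
        put_u(bw, 0, bits);       /* 'bits' leading zeros     */
        put_u(bw, x, bits + 1);   /* then x itself, MSB first */
    }

    static void write_sup_sps(struct bitwriter *bw, unsigned sps_id,
                              unsigned temporal, unsigned dependency,
                              unsigned quality)
    {
        put_ue(bw, sps_id);        /* sequence_parameter_set_id           */
        put_u(bw, temporal, 3);    /* temporal_level                      */
        put_u(bw, dependency, 3);  /* dependency_id                       */
        put_u(bw, quality, 2);     /* quality_level                       */
        put_u(bw, 0, 1);           /* vui_parameters_present_svc_flag = 0 */
    }

    int main(void)
    {
        uint8_t buf[8] = {0};
        struct bitwriter bw = { buf, 0 };
        write_sup_sps(&bw, 0, 2, 1, 1);  /* matches the parsing example */
        return 0;
    }

Writing sps_id = 0 for layer (D=1, T=2, Q=1) produces the same two bytes, 0xA2 0x80, used in the parsing example above.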
Operation 1320 may be performed by, for example, the SPS generation unit 1110, the processor 1220, or a module similar to the SPS and PPS inserter 2140. Operation 1320 may also correspond to the generation of a SUP_SPS in any of blocks 6, 6', 6" in fig. 8.
Accordingly, the means for performing operation 1320 (i.e., generating the SUP SPS NAL unit) may include various components. For example, such means may include a module for generating the SUP_SPS 6, 6', or 6", the entire encoder system of fig. 1, 8, 11, or 12, the SPS generation unit 1110, the processor 1220, or a module similar to the SPS and PPS inserter 2140, or equivalents thereof, including known and future-developed encoders.
Process 1300 includes encoding a first layer (e.g., base layer) encoding of a sequence of images and encoding a second layer encoding of the sequence of images (1330). These encodings of the image sequence yield a first layer encoding and a second layer encoding. The first layer encoding may be formatted into a series of units referred to as first layer coding units and the second layer encoding may be formatted into a series of units referred to as second layer coding units. Operation 1330 may be performed, for example, by video encoder 1120, processor 1220, encoder 2, 2', or 2 "of fig. 8, or an implementation of fig. 1.
Accordingly, the means for performing operation 1330 may include various components. For example, such means may include encoder 2, 2', or 2", the entire encoder system of fig. 1, 8, 11, or 12, video encoder 1120, processor 1220, or one or more core encoders 187 (possibly including decimation module 104), or equivalents thereof, including known and future developed encoders.
Process 1300 includes providing a data set (1340). The data set includes a first layer encoding of a sequence of pictures, a second layer encoding of the sequence of pictures, an SPS NAL unit, and a SUP SPS NAL unit. The data set may be, for example, a bitstream encoded according to a known standard, and may be stored in a memory or transmitted to one or more decoders. Operation 1340 may be performed by formatter 1130, processor 1220, or multiplexer 170 of fig. 1. Operation 1340 may also be performed, in fig. 8, by the generation of any of bitstreams 8, 8', and 8" and the generation of the multiplexed SVC bitstream.
Accordingly, the means for performing operation 1340 (i.e., providing the data set) may include various components. For example, such means may include a module for generating a bitstream 8, 8', or 8", a multiplexer 9, the entire encoder system of fig. 1, 8, 11, or 12, a formatter 1130, a processor 1220, or a multiplexer 170, or equivalents thereof, including known and future developed encoders.
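For illustration only, the following sketch assembles a data set in the manner of operations 1310 through 1340. The byte strings stand in for actual NAL units and coded layer data; a real bitstream would additionally use start codes or length prefixes, and the interleaving order may differ:

```python
def provide_data_set(sps_nal: bytes,
                     sup_sps_nals: list[bytes],
                     layer_payloads: list[bytes]) -> bytes:
    """Operations 1310-1340 in miniature: concatenate an SPS NAL unit
    (portion 1410), SUP SPS NAL units (portion 1420), and the coded
    layers (portions 1430, 1440, ...) into a single data set."""
    out = bytearray(sps_nal)
    for sup in sup_sps_nals:        # typically one per additional layer
        out += sup
    for payload in layer_payloads:  # first layer, second layer, ...
        out += payload
    return bytes(out)

# Hypothetical usage:
# data_set = provide_data_set(sps, [sup_sps_layer2], [base_bits, enh_bits])
```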
The process 1300 may be modified in various ways. For example, in implementations where data is precoded, operation 1330 may be removed from process 1300. Further, in addition to removing operation 1330, operation 1340 may be removed to provide a process for generating description units for multiple layers.
Referring to fig. 14, a data stream 1400 is shown; data stream 1400 may be generated by, for example, process 1300. Data stream 1400 includes a portion 1410 for an SPS NAL unit, a portion 1420 for a SUP SPS NAL unit, a portion 1430 for first layer encoded data, and a portion 1440 for second layer encoded data. First layer encoded data 1430 is the first layer encoding and may be formatted as first layer coding units. Second layer encoded data 1440 is the second layer encoding and may be formatted as second layer coding units. Data stream 1400 may include additional portions, which may be appended after portion 1440 or interspersed between portions 1410 through 1440. In addition, other implementations may modify one or more of portions 1410 through 1440.
Data stream 1400 may be compared to fig. 9 and 10. SPS NAL unit 1410 may be, for example, any of SPS1 10, SPS2 10', or SPSm 10". The SUP SPS NAL unit 1420 may be, for example, any one of the SUP_SPS headers 11, 11', 11", 13', 13", 15', or 15". The first layer encoded data 1430 and the second layer encoded data 1440 may each be a bitstream for a respective layer, as shown by the bitstreams for layers (1, 1, 1) 12 through (m, n, o) 16", and include bitstreams 12, 12', 12", 14', 14", 16', and 16". First layer encoded data 1430 may be a bitstream for a higher layer than second layer encoded data 1440. For example, first layer encoded data 1430 may be the bitstream for layer (2, 2, 1) 14' and second layer encoded data 1440 may be the bitstream for layer (1, 1, 1) 12.
An implementation of data stream 1400 may also correspond to data stream 1000. SPS NAL unit 1410 may correspond to SPS module 10 of data stream 1000. SUP SPS NAL unit 1420 may correspond to SUP_SPS module 11 of data stream 1000. First layer encoded data 1430 may correspond to the bitstream for layer (1, 1, 1) 12 of data stream 1000, and second layer encoded data 1440 may correspond to the bitstream for layer (1, 2, 1) 12' of data stream 1000. The SUP_SPS module 11' of data stream 1000 may be interspersed between first layer encoded data 1430 and second layer encoded data 1440. The remaining blocks (10'-16") shown in data stream 1000 may be appended to data stream 1400 in the same order as shown in data stream 1000.
Fig. 9 and 10 may suggest that the SPS module does not include any layer-specific parameters. Various implementations operate in this manner and typically require a SUP_SPS for each layer. However, other implementations allow the SPS to include layer-specific parameters for one or more layers, allowing one or more layers to be transmitted without a SUP_SPS.
Fig. 9 and 10 suggest that each spatial level has its own SPS. Other implementations vary this feature. For example, other implementations provide separate SPS for each temporal level or each quality level. Further implementations provide a separate SPS for each layer, and other implementations provide a single SPS serving all layers.
Referring to fig. 15, a decoder 1500 includes a parsing unit 1510 that receives an encoded bitstream, for example, a bitstream provided by encoder 1100, encoder 1200, or process 1300, or a bitstream such as data stream 1400. Parsing unit 1510 is coupled with decoder 1520.
Parsing unit 1510 is configured to access information from SPS NAL units. The information from the SPS NAL unit describes parameters used in decoding a first layer encoding of a sequence of images. Parsing unit 1510 is also configured to access information from a SUP SPS NAL unit having a different structure than the SPS NAL unit. The information from the SUP SPS NAL unit describes parameters used in decoding a second layer encoding of a sequence of images. As described in connection with fig. 13, these parameters may be layer-dependent or layer-independent.
The parsing unit 1510 provides parsed header data as output. The header data includes information accessed from SPS NAL units and also includes information accessed from SUP SPS NAL units. Parsing unit 1510 also provides parsed encoded video data as output. The encoded video data includes a first layer encoding and a second layer encoding. Both the header data and the encoded video data are provided to a decoder 1520.
The decoder 1520 decodes the first layer coding using the information accessed from the SPS NAL unit. The decoder 1520 also decodes the second layer coding using the information accessed from the SUP SPS NAL unit. The decoder 1520 also generates a reconstruction of the image sequence based on the decoded first layer and/or the decoded second layer. The decoder 1520 provides the reconstructed video as output. The reconstructed video may be, for example, a first layer encoded reconstruction or a second layer encoded reconstruction.
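For illustration only, a sketch of this parse/decode split follows. It assumes the header data has already been parsed into parameter structures, and it uses a hypothetical decode_layer function; neither is an actual interface of the decoders described above:

```python
def decode_sequence(header_data, coded_video, decode_layer,
                    want_second_layer=True):
    """Mirrors decoder 1500: decode the first layer using the SPS
    information, decode the second layer using the SUP SPS information,
    and output a reconstruction of the image sequence."""
    sps_info, sup_sps_info = header_data      # parsed by a unit such as 1510
    first_layer, second_layer = coded_video
    base = decode_layer(first_layer, params=sps_info)
    if not want_second_layer:
        return base                           # first-layer reconstruction
    # The second layer typically predicts from the decoded first layer.
    return decode_layer(second_layer, params=sup_sps_info, base=base)
```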
Comparing fig. 15 with fig. 2 and 2a, parsing unit 1510 may correspond, in some implementations, to one or more of demultiplexer 202 and/or entropy decoders 204, 212, 222, or 2245, for example. Decoder 1520 may correspond to, for example, the remaining blocks in fig. 2.
Decoder 1500 may also provide additional outputs and additional communications between components. Decoder 1500 may also be modified to include additional components, for example, components located between existing components.
The components 1510 and 1520 of decoder 1500 may take many forms. One or more of components 1510 and 1520 may include hardware, software, firmware, or a combination thereof, and may be operated on a variety of platforms (e.g., a dedicated decoder, or a general purpose processor configured by software to operate as a decoder).
Referring to fig. 16, a decoder 1600 is shown that operates in the same manner as decoder 1500. The decoder 1600 includes a memory 1610 in communication with a processor 1620. The memory 1610 may be used, for example, to store the input encoded bitstream, to store decoding or encoding parameters, to store intermediate or final results during the decoding process, or to store instructions for performing the decoding method. Such storage may be temporary or permanent.
Processor 1620 receives the encoded bitstream and decodes the encoded bitstream into reconstructed video. The encoded bitstream includes, for example, (1) a first layer encoding of a sequence of images, (2) a second layer encoding of the sequence of images, (3) an SPS NAL unit having information describing parameters used in decoding the first layer encoding, and (4) a SUP SPS NAL unit, having a different structure than the SPS NAL unit, with information describing parameters used in decoding the second layer encoding.
Processor 1620 generates reconstructed video based on at least the first layer coding, the second layer coding, information from SPS NAL units, and information from SUP SPS NAL units. The reconstructed video may be, for example, a first layer encoded reconstruction or a second layer encoded reconstruction. Processor 1620 may operate in accordance with instructions stored or resident on, for example, processor 1620 or memory 1610 or a portion thereof.
Referring to fig. 17, a process 1700 for decoding an encoded bitstream is shown. The process 1700 may be performed by, for example, the decoder 1500 or 1600.
Process 1700 includes accessing information from an SPS NAL unit (1710). The accessed information describes parameters used in decoding a first layer encoding of the sequence of images.
The SPS NAL unit may be as described previously with respect to fig. 13. Further, the accessed information may be, for example, HRD parameters. Operation 1710 may be performed by, for example, the parsing unit 1510, the processor 1620, the entropy decoder 204, 212, 222, or 2245, or the decoder control 2205. Operation 1710 may also be performed by one or more components of the encoder in a reconstruction process at the encoder.
Accordingly, the means for performing operation 1710 (i.e., accessing information from an SPS NAL unit) may include various components. Such means may include, for example, parsing unit 1510, processor 1620, a single layer decoder, the entire decoder system of fig. 2, 15, or 16, or one or more components of a decoder, or one or more components of an encoder 800, 1100, or 1200, or equivalents thereof, including known and future developed decoders and encoders.
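Because the accessed information may be HRD parameters, the following sketch reads the hrd_parameters() syntax structure of h.264. The syntax element names and bit widths follow the published specification (Annex E); the bit reader is a simplified illustration that, among other simplifications, ignores emulation-prevention bytes:

```python
class BitReader:
    """Simplified MSB-first bit reader over an RBSP payload."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:
        # Fixed-length unsigned field of n bits.
        v = 0
        for _ in range(n):
            v = (v << 1) | ((self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return v

    def ue(self) -> int:
        # Unsigned Exp-Golomb code: count leading zeros, then read suffix.
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)

def hrd_parameters(r: BitReader) -> dict:
    """Read the hrd_parameters() structure defined in h.264 Annex E."""
    h = {"cpb_cnt_minus1": r.ue(),
         "bit_rate_scale": r.u(4),
         "cpb_size_scale": r.u(4)}
    # One (bit_rate_value_minus1, cpb_size_value_minus1, cbr_flag) triple
    # per coded picture buffer.
    h["cpb"] = [(r.ue(), r.ue(), r.u(1)) for _ in range(h["cpb_cnt_minus1"] + 1)]
    h["initial_cpb_removal_delay_length_minus1"] = r.u(5)
    h["cpb_removal_delay_length_minus1"] = r.u(5)
    h["dpb_output_delay_length_minus1"] = r.u(5)
    h["time_offset_length"] = r.u(5)
    return h
```

For example, hrd_parameters(BitReader(payload)) returns, per coded picture buffer, the bit-rate and buffer-size values that a SUP SPS can carry for one layer.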
Process 1700 includes accessing information from a SUP SPS NAL unit having a different structure than the SPS NAL unit (1720). The information accessed from the SUP SPS NAL unit describes parameters used in decoding a second layer encoding of a sequence of images.
The SUP SPS NAL unit may be as previously described with respect to fig. 13. Further, the accessed information may be, for example, HRD parameters. Operation 1720 may be performed by, for example, parsing unit 1510, processor 1620, entropy decoder 204, 212, 222, or 2245, or decoder control 2205. Operation 1720 may also be performed by one or more components of the encoder in a reconstruction process at the encoder.
Accordingly, means for performing operation 1720 (i.e., accessing information from a SUP SPS NAL unit) may include various components. For example, such means may include a parsing unit 1510, a processor 1620, a demultiplexer 202, an entropy decoder 204, 212 or 222, a single layer decoder, or an entire decoder system 200, 1500 or 1600, or one or more components of a decoder, or one or more components of an encoder 800, 1100 or 1200, or equivalents thereof including known and future developed decoders and encoders.
The process 1700 includes accessing a first layer encoding and a second layer encoding of the image sequence (1730). The first layer encoding may have been formatted as first layer coding units and the second layer encoding may have been formatted as second layer coding units. Operation 1730 may be performed by, for example, parsing unit 1510, decoder 1520, processor 1620, entropy decoder 204, 212, 222, or 2245, or various other modules downstream of the entropy decoder. Operation 1730 may also be performed by one or more components of the encoder in a reconstruction process at the encoder.
Accordingly, the means for performing operation 1730 may include various components. For example, such means may include a parsing unit 1510, a decoder 1520, a processor 1620, a demultiplexer 202, an entropy decoder 204, 212 or 222, a single layer decoder, a bitstream receiver, a receiving device, or the entire decoder system 200, 1500 or 1600, or one or more components of a decoder, or one or more components of an encoder 800, 1100 or 1200, or equivalents thereof, including known and future developed decoders and encoders.
The process 1700 includes generating a decode of the image sequence (1740). The decoding of the sequence of images may be based on the first layer encoding, the second layer encoding, information accessed from SPS NAL units, and information accessed from SUP SPS NAL units. Operation 1740 may be performed by, for example, the decoder 1520, the processor 1620, or various modules downstream of the demultiplexer 202 and the input buffer 2210. Operation 1740 may also be performed by one or more components of the encoder in a reconstruction process at the encoder.
Accordingly, the means for performing operation 1740 may include various components. For example, such means may include the decoder 1520, the processor 1620, a single layer decoder, the entire decoder system 200, 1500, or 1600, or one or more components of a decoder, an encoder performing reconstruction, or one or more components of the encoder 800, 1100, or 1200, or equivalents thereof, including known and future developed decoders or encoders.
Another implementation performs a decoding method that includes accessing first layer-dependent information in a first normative parameter set. The accessed first layer-dependent information is used to decode a first layer encoding of a sequence of images. The first normative parameter set may be, for example, an SPS that includes HRD-related parameters or other layer-dependent information. However, the first normative parameter set need not be an SPS and need not be related to the h.264 standard.
The first parameter set is normative, which requires a decoder to operate according to the first parameter set if such a parameter set is received. In addition, an implementation may require that the first parameter set actually be received. That is, an implementation may further require that the first parameter set be provided to the decoder.
The decoding method of this implementation also includes accessing second layer-dependent information in a second normative parameter set. The second normative parameter set has a different structure than the first normative parameter set, and the accessed second layer-dependent information is used to decode a second layer encoding of the sequence of images. The second normative parameter set may be, for example, a supplemental SPS. The supplemental SPS has a different structure than, for example, the SPS. The supplemental SPS also includes HRD parameters or other layer-dependent information for the second layer (different from the first layer).
The decoding method of this implementation further includes decoding the sequence of images based on one or more of the accessed first layer-dependent information or the accessed second layer-dependent information. This may include, for example, decoding a base layer or an enhancement layer.
In other implementations, corresponding apparatuses are also provided for implementing the decoding method of this implementation. Such an apparatus includes, for example, a programmed decoder, a programmed processor, a hardware implementation, or a processor-readable medium having instructions for performing the decoding method. For example, systems 1500 and 1600 may implement the decoding method of this implementation.
Corresponding signals, and media storing such a signal or the data of such a signal, are also provided. Such a signal is generated by, for example, an encoder that performs the encoding method described below.
Another implementation performs an encoding method analogous to the decoding method described above. The encoding method includes generating a first normative parameter set that includes first layer-dependent information, the first layer-dependent information being for use in decoding a first layer encoding of a sequence of images. The encoding method further includes generating a second normative parameter set, having a different structure than the first normative parameter set, that includes second layer-dependent information for use in decoding a second layer encoding of the sequence of images. The encoding method further includes providing a data set that includes the first normative parameter set and the second normative parameter set.
In other implementations, corresponding apparatuses are also provided for implementing the encoding method of this implementation. Such an apparatus includes, for example, a programmed encoder, a programmed processor, a hardware implementation, or a processor-readable medium having instructions for performing the encoding method. For example, systems 1100 and 1200 may implement the encoding method of this implementation.
Note that the term "supplemental" as used above, for example, in "supplemental SPS," is a descriptive term. Thus, "supplemental SPS" does not exclude units that do not include the term "supplemental" in their name. Accordingly, as an example, the current draft of the SVC extension defines a "subset SPS" syntax structure, and the "subset SPS" syntax structure is fully encompassed by the descriptive term "supplemental". The "subset SPS" of the current SVC extension is one implementation of the SUP SPS described in this disclosure.
Implementations may use other types of messages in addition to or instead of SPS NAL units and/or SUP SPS NAL units. For example, at least one implementation generates, sends, receives, accesses, and parses other parameter sets with layer-dependent information.
Furthermore, although SPS and supplemental SPS are discussed primarily in the context of the h.264 standard, other standards may also include SPS, supplemental SPS, or variants of SPS or supplemental SPS. Accordingly, other standards (existing or future developed) may include structures referred to as SPS or supplemental SPS, and such structures may be the same as or variations of SPS and supplemental SPS described herein. Such other standards may, for example, be related to the current h.264 standard (e.g., a revision of an existing h.264 standard), or entirely new standards. Alternatively, other standards (existing or future developed) may include structures that are not referred to as SPS or supplemental SPS, but such structures may be the same as, similar to, or variations of SPS or supplemental SPS described herein.
Note that a parameter set is a set of data that includes parameters, for example, an SPS, a PPS, or a supplemental SPS.
In various implementations, the data is referred to as being "accessed". "accessing" data may include, for example, receiving, storing, transmitting, or processing the data.
Various implementations are provided and described. These implementations may be used to solve a variety of problems. One such problem arises when multiple interoperability points (IOPs), also referred to as layers, require different values for parameters that are typically carried in the SPS. There is no suitable mechanism for transmitting, in the SPS, layer-dependent syntax elements for different layers that share the same SPS identifier, and sending separate SPS data for each such layer is problematic. For example, in many existing systems the base layer and its composite temporal layers share the same SPS identifier.
Various implementations provide a distinct NAL unit type for supplemental SPS data. Thus, multiple NAL units may be transmitted, each including supplemental SPS information for a different SVC layer, yet each identified by the same NAL unit type. In one implementation, the supplemental SPS information may be provided in the "subset SPS" NAL unit type of the current SVC extension.
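For illustration, the sketch below reads the one-byte h.264 NAL unit header and routes SPS NAL units (nal_unit_type 7) separately from subset SPS NAL units (nal_unit_type 15, the SVC structure mentioned above). The routing function itself is a simplified stand-in for a demultiplexer:

```python
SPS_NUT = 7          # nal_unit_type of a sequence parameter set (h.264)
SUBSET_SPS_NUT = 15  # nal_unit_type of a subset SPS (SVC extension)

def nal_unit_type(nal: bytes) -> int:
    """Read the one-byte NAL unit header: forbidden_zero_bit (1 bit),
    nal_ref_idc (2 bits), nal_unit_type (5 bits)."""
    if nal[0] & 0x80:
        raise ValueError("forbidden_zero_bit must be 0")
    return nal[0] & 0x1F

def split_parameter_sets(nal_units):
    """Route NAL units by type: all SUP SPS units share one type code,
    even when they carry information for different SVC layers."""
    sps, sup_sps, other = [], [], []
    for nal in nal_units:
        t = nal_unit_type(nal)
        if t == SPS_NUT:
            sps.append(nal)
        elif t == SUBSET_SPS_NUT:
            sup_sps.append(nal)   # e.g., one unit per layer, same type code
        else:
            other.append(nal)
    return sps, sup_sps, other
```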
It should be clear that the implementations described in this disclosure are not limited to SVC extensions or any other standard. The concepts and features of the disclosed implementations may be used with other standards, either now existing or developed in the future, or in systems that do not comply with any of the standards. As one example, the concepts and features disclosed herein may be used in implementations that operate in the context of an MVC extension. For example, MVC views may require different SPS information, or SVC layers supported in MVC extensions may require different SPS information. Furthermore, features and aspects of the described implementations may also be applicable to other implementations. Accordingly, although implementations described herein are described in the context of SPS for the SVC layer, such description should not be taken as limiting the features and concepts to such implementations or contexts.
Implementations described herein may be implemented in, for example, a method or process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (e.g., discussed only as a method), the implementation of features discussed may also be implemented in other forms (e.g., a device or program). The apparatus may be implemented in, for example, suitable hardware, software, and firmware. The methods may be implemented, for example, in a device such as a processor, which refers generally to processing devices including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors may also include communication devices such as computers, cellular telephones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be implemented in a variety of different devices or applications, particularly, for example, devices and applications associated with data encoding and decoding. Examples of devices include video encoders, video decoders, video codecs, web servers, set-top boxes, laptop computers, personal computers, cellular telephones, PDAs, and other communication devices. It should be clear that the device may be mobile and may even be mounted on a moving vehicle.
Further, the methods may be implemented by instructions being executed by a processor, and such instructions may be stored on a processor-readable medium, such as an integrated circuit, a software carrier, or other storage device, e.g., hard disk, optical disk, random access memory ("RAM"), read only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. The instructions may be in, for example, hardware, firmware, software, or a combination. The instructions may be found in, for example, an operating system, a stand-alone application, or a combination of the two. A processor may thus be characterized, for example, as both a device configured to perform a process and a device that includes a computer-readable medium having instructions for performing a process.
As will be apparent to those of skill in the art, implementations may produce various signals formatted to carry information that may be stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described implementation, or to carry as data the actual syntax values written by a described implementation. Such a signal may be formatted, for example, as an electromagnetic wave (e.g., using the radio frequency portion of the spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information carried by the signal may be, for example, analog or digital information. As is well known, the signal may be transmitted over a variety of different wired or wireless links.
Various implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Further, one of ordinary skill in the art will appreciate that other structures and processes may be substituted for those disclosed and that the resulting implementations will perform at least substantially the same function in at least substantially the same way as the implementations disclosed herein to achieve at least substantially the same result. Accordingly, this application contemplates these and other implementations as falling within the scope of the appended claims.
Claims (2)
1. A decoding method, comprising:
accessing information from a first parameter set network abstraction layer ("NAL") unit, the first parameter set being a syntax structure including syntax elements that apply to zero or more entire coded video sequences, the information describing parameters used in decoding a first layer encoding of a picture in a sequence of pictures, the parameters being common to multiple layers of the picture;
accessing supplemental information from a second NAL unit having a different NAL unit type code than the first NAL unit, the supplemental information from the second NAL unit describing (i) a video usability information ("VUI") parameter with layer-dependent information for use in decoding a second layer encoding of a picture in the sequence of pictures, wherein the second NAL unit does not include a VUI parameter for each layer of the picture, and (ii) an identifier of the first parameter set, indicating that the second NAL unit supplements the first NAL unit;
identifying an association between the first NAL unit and the second NAL unit based on the identifier of the first parameter set;
decoding the first layer encoding based on the accessed information from the first NAL unit; and
decoding the second layer encoding, after identifying the association between the first NAL unit and the second NAL unit, based on the accessed information from the first NAL unit and the accessed supplemental information from the second NAL unit.
2. The decoding method of claim 1, wherein the first NAL unit further includes an identifier for identifying the first parameter set.
Applications Claiming Priority (2)
| Application Number | Priority Date |
|---|---|
| US60/923,993 | 2007-04-18 |
| US11/824,006 | 2007-06-28 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1189110A HK1189110A (en) | 2014-05-23 |
| HK1189110B true HK1189110B (en) | 2018-03-23 |
Similar Documents
| Publication | Title |
|---|---|
| US8619871B2 (en) | Coding systems |
| CN101663893B (en) | encoding method |
| US11412265B2 (en) | Decoding multi-layer images |
| AU2012238297B2 (en) | Coding systems |
| AU2008241568B2 (en) | Coding systems |
| HK1189110B (en) | Decoding method |
| HK1189110A (en) | Decoding method |
| HK1174463B (en) | Coding systems |
| HK1174462B (en) | Coding systems |
| HK1174464B (en) | Coding systems |
| HK1190021A (en) | Encoding and decoding method |
| HK1190021B (en) | Encoding and decoding method |
| HK1243253A1 (en) | Coding systems using supplemental sequence parameter set for scalable video coding or multi-view coding |
| HK1243253B (en) | Coding systems using supplemental sequence parameter set for scalable video coding or multi-view coding |
| AU2015203559A1 (en) | Coding systems |