
HK1209550B - Hypothetical reference decoder parameter syntax structure

Info

Publication number
HK1209550B
Application number
HK15110228.6A
Authority
HK
Hong Kong
Other languages
Chinese (zh)
Other versions
HK1209550A1 (en)
Inventor
Ye-Kui Wang (王益魁)
Original Assignee
Qualcomm Incorporated (高通股份有限公司)
Priority claimed from US 13/954,712 (US 9319703 B2)
Application filed by Qualcomm Incorporated
Publication of HK1209550A1
Publication of HK1209550B


Description

Syntax structure of hypothetical reference decoder parameters
This application claims the benefit of U.S. Provisional Patent Application No. 61/711,098, filed October 8, 2012, the entire content of which is incorporated herein by reference.
Technical Field
The present invention relates to video encoding and video decoding.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of these standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in inter-coded (P or B) slices of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial or temporal prediction results in a predictive block for a block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicates the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, producing residual transform coefficients that may then be quantized. The quantized coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate a one-dimensional vector of coefficients, and entropy coding may be applied to achieve even more compression.
A multiview coding bitstream may be generated by encoding views, e.g., from multiple viewpoints. Some three-dimensional (3D) video standards have been developed that exploit aspects of multiview coding. For example, different views may carry left-eye and right-eye views to support 3D video. Alternatively, some 3D video coding processes may apply so-called multiview plus depth coding. In multiview plus depth coding, a 3D video bitstream may contain not only texture view components, but also depth view components. For example, each view may include one texture view component and one depth view component.
Disclosure of Invention
In general, this disclosure describes signaling of Hypothetical Reference Decoder (HRD) parameters. For example, a video encoder may signal, in a bitstream, a Video Parameter Set (VPS) including a plurality of HRD parameter syntax structures that each include a set of one or more HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the bitstream. The common set of HRD parameters is common to all sub-layers of the bitstream. A video decoder or other device may decode the VPS from the bitstream and may perform operations using the HRD parameters of at least one of the HRD parameter syntax structures.
In one example, this disclosure describes a method of decoding video data. The method includes decoding, from an encoded video bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. The method also includes performing an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
In another example, this disclosure describes a video decoding device comprising one or more processors configured to decode, from an encoded video bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. The one or more processors are also configured to perform an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
In another example, this disclosure describes a video decoding device comprising means for decoding, from an encoded video bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of the encoded video bitstream. The video decoding device comprises means for performing an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed by a video decoding device, configure the video decoding device to decode, from an encoded video bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. When executed, the instructions further configure the video decoding device to perform operations using the HRD parameters of at least one of the HRD parameter syntax structures.
In another example, this disclosure describes a method of encoding video data. The method includes generating a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of an encoded video bitstream. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. The method also includes signaling the VPS in the encoded video bitstream.
In another example, this disclosure describes a video encoding device comprising one or more processors configured to generate a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of an encoded video bitstream. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. The one or more processors are also configured to signal the VPS in the encoded video bitstream.
In another example, this disclosure describes a video encoding device comprising means for generating a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of an encoded video bitstream. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. The video encoding device also comprises means for signaling the VPS in the encoded video bitstream.
In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed by a video encoding device, configure the video encoding device to generate a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of an encoded video bitstream. The common set of HRD parameters is common to all sub-layers of the encoded video bitstream. When executed, the instructions further configure the video encoding device to signal the VPS in the encoded video bitstream.
The details of one or more examples of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating an example video coding system that may utilize the techniques described in this disclosure.
FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 4 is a flow diagram illustrating example operation of a video encoder in accordance with one or more techniques of this disclosure.
FIG. 5 is a flow diagram illustrating example operation of a device in accordance with one or more techniques of this disclosure.
FIG. 6 is a flow diagram illustrating example operation of a video encoder in accordance with one or more techniques of this disclosure.
FIG. 7 is a flow diagram illustrating example operation of a device in accordance with one or more techniques of this disclosure.
FIG. 8 is a flow diagram illustrating example operation of a video encoder in accordance with one or more techniques of this disclosure.
FIG. 9 is a flow diagram illustrating example operation of a device in accordance with one or more techniques of this disclosure.
Detailed Description
A video encoder may generate a bitstream that includes encoded video data. Because the bitstream includes encoded video data, the bitstream may be referred to herein as an encoded video bitstream. A bitstream may comprise a series of Network Abstraction Layer (NAL) units. NAL units may include Video Coding Layer (VCL) NAL units and non-VCL NAL units. VCL NAL units may include coded slices of a picture. Non-VCL NAL units may include Video Parameter Sets (VPSs), Sequence Parameter Sets (SPSs), Picture Parameter Sets (PPSs), Supplemental Enhancement Information (SEI), or other types of data. A VPS is a syntax structure that may contain syntax elements applicable to zero or more entire coded video sequences. An SPS is a syntax structure that may contain syntax elements applicable to zero or more entire coded video sequences. A single VPS may be applicable to multiple SPSs. A PPS is a syntax structure that may contain syntax elements applicable to zero or more entire coded pictures. A single SPS may be applicable to multiple PPSs.
A device, such as a Content Delivery Network (CDN) device, a Media Aware Network Element (MANE), a video encoder, or a video decoder, may extract the sub-bitstream from the bitstream. A device may perform a sub-bitstream extraction process by removing certain NAL units from the bitstream. The resulting sub-bitstream includes the remaining unremoved NAL units of the bitstream. In some examples, video data decoded from the sub-bitstream may have a lower frame rate, and/or may represent fewer views than the original bitstream.
Video coding standards may include various features to support the sub-bitstream extraction process. For example, the video data of the bitstream may be divided into sets of layers. For each of the layers, data in a lower layer may be decoded without reference to data in any higher layer. NAL units encapsulate only a single layer of data. Thus, NAL units that encapsulate data of the highest remaining layer of the bitstream may be removed from the bitstream without affecting the decodability of the data in the remaining layers of the bitstream. In Scalable Video Coding (SVC), higher layers may include enhancement data that improves the quality of pictures in lower layers (quality scalability), expands the spatial format of pictures in lower layers (spatial scalability), or increases the temporal rate of pictures in lower layers (temporal scalability). In multiview coding (MVC) and three-dimensional video (3DV) coding, higher layers may include additional views.
Each NAL unit may include a header and a payload. The header of the NAL unit may include a nuh_reserved_zero_6bits syntax element. If the NAL unit is related to the base layer in MVC, 3DV coding, or SVC, the nuh_reserved_zero_6bits syntax element of the NAL unit is equal to 0. Data in the base layer of the bitstream may be decoded without reference to data in any other layer of the bitstream. The nuh_reserved_zero_6bits syntax element may have other non-zero values if the NAL unit is not related to the base layer in MVC, 3DV, or SVC. In particular, if the NAL unit does not relate to the base layer in MVC, 3DV, or SVC, the nuh_reserved_zero_6bits syntax element of the NAL unit specifies a layer identifier that identifies the layer associated with the NAL unit.
Furthermore, some pictures within a layer may be decoded without reference to other pictures within the same layer. Thus, NAL units that encapsulate data for certain pictures of a layer may be removed from the bitstream without affecting the decodability of other pictures in the layer. For example, pictures with even Picture Order Count (POC) values may be decoded without reference to pictures with odd POC values. Removing NAL units that encapsulate data of these pictures may reduce the frame rate of the bitstream. A subset of pictures within a layer that may be decoded without reference to other pictures within the layer may be referred to herein as a "sub-layer" or a "temporal sub-layer."
NAL units may include a nuh_temporal_id_plus1 syntax element. The nuh_temporal_id_plus1 syntax element of a NAL unit may specify the temporal identifier of the NAL unit. If the temporal identifier of the first NAL unit is less than the temporal identifier of the second NAL unit, the data encapsulated by the first NAL unit may be decoded without reference to the data encapsulated by the second NAL unit.
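To make the header fields above concrete, the following sketch parses the two-byte HEVC NAL unit header and recovers the layer identifier and temporal identifier. It is a minimal, illustrative parser written against the HEVC header layout (forbidden_zero_bit, nal_unit_type, the 6-bit layer field called nuh_reserved_zero_6bits in the working drafts, and nuh_temporal_id_plus1); it is not code from this disclosure.

```python
def parse_nal_unit_header(header_bytes: bytes) -> dict:
    """Parse the two-byte HEVC NAL unit header.

    Bit layout (16 bits): forbidden_zero_bit (1), nal_unit_type (6),
    nuh_reserved_zero_6bits (6), nuh_temporal_id_plus1 (3).
    """
    if len(header_bytes) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    b0, b1 = header_bytes[0], header_bytes[1]
    return {
        "forbidden_zero_bit": (b0 >> 7) & 0x01,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        # Layer identifier: 0 for the base layer, non-zero otherwise.
        "layer_id": ((b0 & 0x01) << 5) | ((b1 >> 3) & 0x1F),
        # The TemporalId of the NAL unit is nuh_temporal_id_plus1 minus 1.
        "temporal_id": (b1 & 0x07) - 1,
    }

# Header bytes 0x40 0x01 decode to nal_unit_type 32, layer_id 0, temporal_id 0.
print(parse_nal_unit_header(bytes([0x40, 0x01])))
```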
The operation points of a bitstream are each associated with a set of layer identifiers (i.e., a set of nuh_reserved_zero_6bits values) and a temporal identifier. The set of layer identifiers may be denoted OpLayerIdSet and the temporal identifier may be denoted TemporalId. A NAL unit is associated with an operation point if the layer identifier of the NAL unit is in the operation point's set of layer identifiers and the temporal identifier of the NAL unit is less than or equal to the temporal identifier of the operation point. An operation point representation is a bitstream subset (i.e., a sub-bitstream) that is associated with an operation point. The operation point representation of an operation point may include each NAL unit associated with the operation point. The operation point representation does not include VCL NAL units that are not associated with the operation point.
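A sub-bitstream extraction following this definition can be sketched as below: a NAL unit is kept when its layer identifier belongs to the operation point's OpLayerIdSet and its temporal identifier does not exceed the operation point's temporal identifier. The NalUnit container and the simplification of dropping every non-associated NAL unit (the text above only requires that non-associated VCL NAL units be absent) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class NalUnit:
    layer_id: int        # value of nuh_reserved_zero_6bits
    temporal_id: int     # value of nuh_temporal_id_plus1 minus 1
    is_vcl: bool
    payload: bytes = b""

def in_operation_point(nal: NalUnit, op_layer_id_set: Set[int], op_temporal_id: int) -> bool:
    """A NAL unit is associated with an operation point if its layer identifier is in
    OpLayerIdSet and its temporal identifier is <= the operation point's temporal id."""
    return nal.layer_id in op_layer_id_set and nal.temporal_id <= op_temporal_id

def extract_operation_point(bitstream: List[NalUnit],
                            op_layer_id_set: Set[int],
                            op_temporal_id: int) -> List[NalUnit]:
    """Form a (simplified) operation point representation by removing NAL units
    that are not associated with the operation point."""
    return [nal for nal in bitstream
            if in_operation_point(nal, op_layer_id_set, op_temporal_id)]

# Example: keep only base-layer NAL units with temporal identifier 0.
full = [NalUnit(0, 0, True), NalUnit(0, 1, True), NalUnit(1, 0, True)]
print(len(extract_operation_point(full, {0}, 0)))   # -> 1
```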
An external source may specify a set of target layer identifiers for an operation point. For example, a Content Delivery Network (CDN) device may specify the set of target layer identifiers. In this example, the CDN device may use the set of target layer identifiers to identify an operation point. The CDN device may then extract the operation point representation of the operation point and forward the operation point representation, instead of the original bitstream, to a client device. Extracting the operation point representation and forwarding it to the client device may reduce the bit rate of the bitstream.
Furthermore, video coding standards specify a video buffering model. The video buffering model may also be referred to as a "hypothetical reference decoder" or "HRD". The HRD describes how to buffer data for decoding and how to buffer decoded data for output. For example, HRD describes the operation of a coded picture buffer ("CPB") and a decoded picture buffer ("DPB") in a video decoder. The CPB is a first-in-first-out buffer containing access units in the decoding order specified by the HRD. A DPB is a buffer that holds decoded pictures for reference, output reordering, or output delay specified by the HRD.
The video encoder may signal a set of HRD parameters. The HRD parameters control various aspects of the HRD. The HRD parameters may include initial CPB removal delay, CPB size, bit rate, initial DPB output delay, and DPB size. These HRD parameters may be coded in the hrd_parameters() syntax structure specified in the VPS and/or SPS. HRD parameters may also be specified in a buffering period SEI message or a picture timing SEI message.
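As a rough illustration of what one such set of HRD parameters carries, the container below groups the quantities just listed. The field names, units, and types are assumptions made for this sketch; they do not reproduce the exact hrd_parameters() syntax elements.

```python
from dataclasses import dataclass

@dataclass
class HrdParameterSet:
    """The quantities a set of HRD parameters controls (simplified)."""
    initial_cpb_removal_delay: int   # delay before the first removal from the CPB, in clock ticks
    cpb_size: int                    # coded picture buffer size, in bits
    bit_rate: int                    # rate at which bits enter the CPB, in bits per second
    initial_dpb_output_delay: int    # delay before the first decoded picture is output, in clock ticks
    dpb_size: int                    # decoded picture buffer size, in pictures

# One hypothetical parameter set, e.g. for one operation point.
params = HrdParameterSet(initial_cpb_removal_delay=90_000, cpb_size=1_000_000,
                         bit_rate=8_000_000, initial_dpb_output_delay=9_000, dpb_size=6)
print(params.bit_rate)
```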
As explained above, the operation point representation may have a different frame rate and/or bit rate than the original bitstream. This is because the operation point representation may not include some pictures and/or some data of the original bitstream. Thus, if the video decoder were to remove data from the CPB and/or DPB at a particular rate when processing the original bitstream, and if the video decoder were to remove data from the CPB and/or DPB at the same rate when processing the operation point representation, the video decoder may remove too much or too little data from the CPB and/or DPB. Thus, the video encoder may signal different sets of HRD parameters for different operation points. In the emerging High Efficiency Video Coding (HEVC) standard, a video encoder may signal a set of HRD parameters in a VPS or in an SPS.
Optionally, a set of HRD parameters includes a set of information that is common to all temporal sub-layers. A temporal sub-layer is a temporal scalable layer of a temporally scalable bitstream, consisting of VCL NAL units with a particular temporal identifier and the associated non-VCL NAL units. In addition to the set of common information, a set of HRD parameters may include a set of syntax elements specific to individual temporal sub-layers. Because the set of common information is common to multiple sets of HRD parameters, it may be unnecessary to signal the set of common information in multiple sets of HRD parameters. In some proposals for HEVC, the common information may be present in a set of HRD parameters only when that set of HRD parameters is the first set of HRD parameters in the VPS or when that set of HRD parameters is associated with the first operation point.
However, when there are multiple sets of HRD parameters in a VPS, it may be desirable to have multiple different sets of common information for the sets of HRD parameters. This may be especially true when there is a large number of HRD parameter syntax structures in the VPS. Thus, it may be desirable to be able to include sets of common information in HRD parameter syntax structures other than the first HRD parameter syntax structure.
The techniques of this disclosure provide a design that allows common information of HRD parameter syntax structures to be explicitly signaled for any HRD parameter syntax structure. In other words, the techniques of this disclosure may allow information common to all sub-layers to be explicitly signaled for any hrd_parameters() syntax structure. This may improve coding efficiency.
Thus, in accordance with one or more techniques of this disclosure, a device, such as a video decoder or other device, may determine, based at least in part on a syntax element in a VPS that includes a plurality of HRD parameter syntax structures, whether a particular HRD parameter syntax structure in the VPS includes a set of HRD parameters that is common to each sub-layer of a bitstream. The device may decode syntax elements from the VPS. The one or more HRD parameter syntax structures may occur in coding order in the VPS prior to the particular HRD parameter syntax structure. In response to determining that the particular HRD parameter syntax structure includes a set of HRD parameters that is common to each sub-layer of the bitstream, the device may perform an operation using the particular HRD parameter syntax structure (including a set of HRD parameters that is common to each sub-layer of the bitstream).
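A minimal decoding sketch of this behavior follows. It works on an already-tokenized, simplified representation of the VPS rather than on a real bitstream, and the name common_params_present is an illustrative stand-in for the per-structure syntax element the disclosure describes; the rule that a structure without explicit common information reuses the most recently signaled common set is likewise an assumption consistent with the description above, not a normative definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class HrdParamSyntaxStructure:
    common: Optional[Dict[str, int]]       # common info, shared by all sub-layers (may be inherited)
    per_sublayer: List[Dict[str, int]]     # one entry per temporal sub-layer

def decode_vps_hrd(raw_structures: List[dict]) -> List[HrdParamSyntaxStructure]:
    """Decode a list of hrd-parameters-like structures from a simplified VPS.

    Each raw structure carries a flag saying whether it explicitly signals the
    common (sub-layer-independent) information.  When the flag is 0, the common
    information is taken from the most recently signaled common set, so any
    structure in the VPS, not only the first one, can introduce new common info.
    """
    decoded: List[HrdParamSyntaxStructure] = []
    last_common: Optional[Dict[str, int]] = None
    for raw in raw_structures:
        if raw["common_params_present"]:          # per-structure syntax element, as in the disclosure
            last_common = raw["common"]
        decoded.append(HrdParamSyntaxStructure(common=last_common,
                                               per_sublayer=raw["per_sublayer"]))
    return decoded

# Example: the second structure reuses the first structure's common info,
# the third explicitly signals a new common set.
vps = [
    {"common_params_present": 1, "common": {"bit_rate_scale": 0}, "per_sublayer": [{"cpb_cnt_minus1": 0}]},
    {"common_params_present": 0, "common": None, "per_sublayer": [{"cpb_cnt_minus1": 1}]},
    {"common_params_present": 1, "common": {"bit_rate_scale": 2}, "per_sublayer": [{"cpb_cnt_minus1": 0}]},
]
print([s.common for s in decode_vps_hrd(vps)])
```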
Further, the video encoder may generate a scalable nesting SEI message. The scalable nesting SEI message contains one or more SEI messages. SEI messages nested in scalable nesting SEI messages may include HRD parameters or other information associated with an operation point. Some proposals for HEVC do not allow one SEI message to be applicable to multiple operation points. This may reduce bitrate efficiency because it may cause the video encoder to signal multiple SEI messages with the same information. Thus, the techniques of this disclosure may allow one SEI message to be applicable to multiple operation points. For example, a scalable nesting SEI message may include syntax elements that specify a plurality of operation points applicable to SEI messages nested within the scalable nesting SEI message.
In addition, similar to other types of NAL units, an SEI NAL unit includes a NAL unit header and a NAL unit body. The NAL unit body of an SEI NAL unit may include an SEI message, such as a scalable nesting SEI message or another type of SEI message. Like other NAL units, the NAL unit header of an SEI NAL unit may include a nuh_reserved_zero_6bits syntax element and a nuh_temporal_id_plus1 syntax element. However, in some proposals for HEVC, the nuh_reserved_zero_6bits syntax element and/or the nuh_temporal_id_plus1 syntax element of the NAL unit header of an SEI NAL unit are not used to determine an operation point applicable to the SEI message(s) encapsulated by the SEI NAL unit. These syntax elements of the SEI NAL unit header could, however, be reused in order to reduce the number of signaled bits. Thus, in accordance with the techniques of this disclosure, a syntax element may be signaled in a scalable nesting SEI message to indicate whether the operation point applicable to the nested SEI messages in an SEI NAL unit is the operation point indicated by the layer identification information in the NAL unit header of the SEI NAL unit. The layer identification information in the NAL unit header of the SEI NAL unit may include the nuh_reserved_zero_6bits value and the nuh_temporal_id_plus1 value of the NAL unit header.
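The following sketch illustrates the decision this paragraph describes: a flag in the scalable nesting SEI message selects between (a) deriving the applicable operation point from the layer identification information already present in the SEI NAL unit header and (b) a list of operation points signaled explicitly in the nesting message. All names and the simplified (layer id, temporal id) operation point model are illustrative assumptions, not the actual HEVC syntax.

```python
from typing import List, Tuple

OperationPoint = Tuple[int, int]   # (max layer id, max temporal id), simplified

def applicable_operation_points(default_op_from_nal_header: bool,
                                nal_header_layer_id: int,
                                nal_header_temporal_id: int,
                                explicitly_signaled: List[OperationPoint]) -> List[OperationPoint]:
    """Return the operation points to which the nested SEI messages apply."""
    if default_op_from_nal_header:
        # Reuse the layer identification information of the SEI NAL unit header
        # instead of signaling it again, saving bits.
        return [(nal_header_layer_id, nal_header_temporal_id)]
    # Otherwise the scalable nesting SEI message lists the operation points itself,
    # so a single nested SEI message may apply to several of them.
    return explicitly_signaled

print(applicable_operation_points(True, 0, 2, []))                  # from the NAL unit header
print(applicable_operation_points(False, 0, 2, [(0, 0), (0, 1)]))   # explicitly signaled list
```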
FIG. 1 is a block diagram illustrating an example video coding system 10 that may utilize techniques of this disclosure. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In this disclosure, the terms "video coding" or "coding" may generally refer to video encoding or video decoding.
As shown in fig. 1, video coding system 10 includes a source device 12 and a destination device 14. Source device 12 generates encoded video data. Accordingly, source device 12 may be referred to as a video encoding device or a video encoding apparatus. Destination device 14 may decode the encoded video data generated by source device 12. Destination device 14 may, therefore, be referred to as a video decoding device or a video decoding apparatus. Source device 12 and destination device 14 may be examples of video coding devices or video coding apparatuses.
Source device 12 and destination device 14 may comprise a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
Destination device 14 may receive encoded video data from source device 12 over channel 16. Channel 16 may comprise one or more media or devices capable of moving encoded video data from source device 12 to destination device 14. In one example, channel 16 may comprise one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In this example, source device 12 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol), and may transmit the modulated video data to destination device 14. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). One or more communication media may include routers, switches, base stations, or other equipment that facilitates communication from source device 12 to destination device 14.
In another example, channel 16 may include a storage medium that stores encoded video data generated by source device 12. In this example, destination device 14 may access the storage medium, such as via disk access or card access. The storage medium may comprise a variety of locally-accessible data storage media such as Blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data.
In yet another example, channel 16 may comprise a file server or another intermediate storage device that stores encoded video data generated by source device 12. In this example, destination device 14 may access encoded video data stored at a file server or other intermediate storage device via streaming or download. The file server may be a server of the type capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, and local disk drives.
Destination device 14 may access the encoded video data over a standard data connection, such as an internet connection. Example types of data connections may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the file server may be a streaming transmission, a download transmission, or a combination of both.
The techniques of this disclosure are not limited to wireless applications or settings. The techniques may be applied to video coding in support of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video coding system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
Fig. 1 is merely an example and the techniques of this disclosure may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, data may be retrieved from local memory, streamed over a network, or the like. A video encoding device may encode and store data to memory, and/or a video decoding device may retrieve and decode data from memory. In many examples, the encoding and decoding are performed by devices that do not communicate with one another, but simply encode data to memory and/or retrieve and decode data from memory.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some examples, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. Video source 18 may include a video capture device such as a video camera, a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
Video encoder 20 may encode video data from video source 18. In some examples, source device 12 transmits the encoded video data directly to destination device 14 via output interface 22. In other examples, the encoded video data may also be stored on a storage medium or on a file server for later access by destination device 14 for decoding and/or playback.
In the example of fig. 1, destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some examples, input interface 28 includes a receiver and/or a modem. Input interface 28 may receive encoded video data over channel 16. The display device 32 may be integrated with the destination device 14 or may be external to the destination device 14. In general, display device 32 displays the decoded video data. The display device 32 may include a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, hardware, or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the above, including hardware, software, a combination of hardware and software, etc., can be considered one or more processors. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
This disclosure may generally refer to video encoder 20 "signaling" certain information to another device, such as video decoder 30. The term "signaling" may generally refer to the communication of syntax elements and/or other data used to decode compressed video data. This communication may occur in real time or near real time. Alternatively, such communication may occur over a span of time, such as may occur when syntax elements in an encoded bitstream are stored to a computer-readable storage medium at an encoding time, which may then be retrieved by a decoding device at any time after storage in such medium.
In some examples, video encoder 20 and video decoder 30 operate in accordance with video compression standards such as ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) extension, Multiview Video Coding (MVC) extension, and MVC-based 3DV extension. In some cases, any bitstream conforming to MVC-based 3DV always contains a sub-bitstream compliant with an MVC profile (e.g., the stereo high profile). Furthermore, efforts are ongoing to generate a three-dimensional video (3DV) coding extension to H.264/AVC, namely AVC-based 3DV. In other examples, video encoder 20 and video decoder 30 may operate according to ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, and ITU-T H.264, ISO/IEC Visual.
In other examples, video encoder 20 and video decoder 30 may operate according to the High Efficiency Video Coding (HEVC) standard currently developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). A draft of the upcoming HEVC standard, referred to as "HEVC Working Draft 8," is described in Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 8," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 10th Meeting, Stockholm, Sweden, July 2012, available from http://phenix.int-evry.fr/jct/doc_end_user/documents/10_Stockholm/wg11/JCTVC-J1003-v8.zip. Another draft of the upcoming HEVC standard, referred to as "HEVC Working Draft 9," is described in Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 9," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 11th Meeting, Shanghai, China, October 2012, available from http://phenix.int-evry.fr/jct/doc_end_user/documents/11_Shanghai/wg11/JCTVC-K1003-v13.zip. Furthermore, efforts are continuing to generate SVC, MVC, and 3DV extensions for HEVC. The 3DV extension of HEVC may be referred to as HEVC-based 3DV or HEVC-3DV.
In HEVC and other video coding standards, a video sequence typically includes a series of pictures. Pictures may also be referred to as "frames." A picture may include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array (i.e., a block) of luma samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. Chrominance samples may also be referred to herein as "chroma" samples. In other cases, a picture may be monochrome and may include only an array of luma samples.
To generate an encoded representation of a picture, video encoder 20 may generate a set of Coding Tree Units (CTUs). Each of the CTUs may be a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. A coding tree block may be an N×N block of samples. A CTU may also be referred to as a "tree block" or a "largest coding unit" (LCU). The CTUs of HEVC may be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size and may include one or more Coding Units (CUs). A slice may include an integer number of CTUs ordered consecutively in a raster scan.
To generate a coded CTU, video encoder 20 may recursively perform quadtree partitioning on the coding tree blocks of the CTU to divide the coding tree blocks into coding blocks, hence the name "coding tree units." A coding block is an N×N block of samples. A CU may be a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to code the samples of the coding blocks. Video encoder 20 may partition a coding block of a CU into one or more prediction blocks. A prediction block may be a rectangular (i.e., square or non-square) block of samples to which the same prediction is applied. A Prediction Unit (PU) of a CU may be a prediction block of luma samples, two corresponding prediction blocks of chroma samples of a picture, and syntax structures used to predict the prediction block samples. Video encoder 20 may generate predictive luma, Cb, and Cr blocks for the luma prediction block, the Cb prediction block, and the Cr prediction block of each PU of the CU.
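As a toy illustration of the recursive quadtree partitioning just described, the sketch below splits a CTU-sized square into coding blocks using an arbitrary split decision. The block sizes and the decision rule are assumptions for the example only; they do not reflect the actual HEVC split-flag signaling.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]   # (x, y, size) of a square coding block

def quadtree_partition(x: int, y: int, size: int, min_size: int,
                       should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a coding-tree-block-sized square into coding blocks."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += quadtree_partition(x + dx, y + dy, half, min_size, should_split)
        return blocks
    return [(x, y, size)]

# Example: split a 64x64 CTU into 32x32 blocks everywhere, then stop splitting.
print(quadtree_partition(0, 0, 64, 8, lambda x, y, s: s > 32))
```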
Video encoder 20 may use intra prediction or inter prediction to generate the predictive blocks for the PU. If video encoder 20 uses intra prediction to generate the predictive blocks for the PU, video encoder 20 may generate the predictive blocks for the PU based on decoded samples of a picture associated with the PU.
If video encoder 20 uses inter prediction to generate the predictive blocks for the PU, video encoder 20 may generate the predictive blocks for the PU based on decoded samples of one or more pictures other than the picture associated with the PU. Video encoder 20 may use uni-prediction or bi-prediction to generate the predictive blocks for the PU. When video encoder 20 uses uni-prediction to generate the predictive blocks for the PU, the PU may have a single motion vector. When video encoder 20 uses bi-prediction to generate the predictive blocks for the PU, the PU may have two motion vectors.
After video encoder 20 generates the predictive luma, Cb, and Cr blocks for one or more PUs of the CU, video encoder 20 may generate luma residual blocks for the CU. Each sample in the luma residual block of the CU indicates a difference between luma samples in one of the predictive luma blocks of the CU and corresponding samples in the original luma coding block of the CU. In addition, video encoder 20 may generate Cb residual blocks for the CU. Each sample in the Cb residual block of the CU may indicate a difference between the Cb sample in one of the predictive Cb blocks of the CU and the corresponding sample in the original Cb coding block of the CU. Video encoder 20 may also generate a Cr residual block for the CU. Each sample in the Cr residual block of the CU may indicate a difference between a Cr sample in one of the predictive Cr blocks of the CU and a corresponding sample in the original Cr coding block of the CU.
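A minimal sketch of the residual computation just described, for a single sample block, is shown below (plain Python lists, no video-specific libraries; the block contents are made-up values).

```python
from typing import List

def residual_block(original: List[List[int]], predictive: List[List[int]]) -> List[List[int]]:
    """Each residual sample is the difference between a sample of the original
    block and the corresponding sample of the predictive block."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, predictive)]

orig = [[100, 102], [99, 101]]
pred = [[98, 103], [100, 100]]
print(residual_block(orig, pred))   # [[2, -1], [-1, 1]]
```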
Moreover, video encoder 20 may use quadtree partitioning to decompose the luma, Cb, and Cr residual blocks of the CU into one or more luma, Cb, and Cr transform blocks. The transform block may be a rectangular block of samples to which the same transform is applied. A Transform Unit (TU) of a CU may be a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. A luma transform block associated with a TU may be a sub-block of a luma residual block of a CU. The Cb transform block may be a sub-block of a Cb residual block of the CU. The Cr transform block may be a sub-block of a Cr residual block of the CU.
Video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block may be a two-dimensional array of transform coefficients. A transform coefficient may be a scalar quantity. Video encoder 20 may apply one or more transforms to the Cb transform block of the TU to generate a Cb coefficient block of the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block of the TU.
After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. After video encoder 20 quantizes a coefficient block, video encoder 20 may entropy encode syntax elements indicating the quantized transform coefficients. For example, video encoder 20 may perform Context Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements that indicate the quantized transform coefficients. Video encoder 20 may output the entropy-encoded syntax elements in a bitstream.
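The quantization step described here can be illustrated with a simple uniform scalar quantizer. The rounding rule and step size below are assumptions for the sketch and are not the HEVC quantizer.

```python
from typing import List

def quantize(coefficients: List[int], step: int) -> List[int]:
    """Uniform scalar quantization: map each transform coefficient onto a
    coarser grid so that fewer bits are needed to represent it."""
    return [int(round(c / step)) for c in coefficients]

def dequantize(levels: List[int], step: int) -> List[int]:
    """Approximate inverse: reconstruct coefficients from quantized levels."""
    return [lvl * step for lvl in levels]

coeffs = [103, -47, 6, 0]
levels = quantize(coeffs, step=10)
print(levels)                  # [10, -5, 1, 0]
print(dequantize(levels, 10))  # [100, -50, 10, 0] -- lossy, as expected
```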
Video encoder 20 may output a bitstream that includes the entropy-encoded syntax elements. The bitstream may include a sequence of bits that forms a representation of coded pictures and associated data. The bitstream may include a sequence of Network Abstraction Layer (NAL) units. Each of the NAL units includes a NAL unit header and encapsulates a Raw Byte Sequence Payload (RBSP). The NAL unit header may include a syntax element indicating a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP may be a syntax structure containing an integer number of bytes encapsulated within a NAL unit. In some cases, an RBSP includes zero bits.
Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a Picture Parameter Set (PPS); a second type of NAL unit may encapsulate an RBSP for a coded slice; a third type of NAL unit may encapsulate an RBSP for SEI, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as Video Coding Layer (VCL) NAL units.
Video decoder 30 may receive a bitstream generated by video encoder 20. In addition, video decoder 30 may parse the bitstream to decode syntax elements from the bitstream. Video decoder 30 may reconstruct pictures of the video data based at least in part on syntax elements decoded from the bitstream. The process of reconstructing the video data may be substantially reciprocal to the process performed by video encoder 20. For example, video decoder 30 may use the motion vectors of the PUs to determine predictive blocks for the PUs of the current CU. In addition, video decoder 30 may inverse quantize transform coefficient blocks associated with TUs of the current CU. Video decoder 30 may perform an inverse transform on the transform coefficient blocks to reconstruct the transform blocks associated with the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding samples of predictive blocks of PUs of the current CU to corresponding samples of transform blocks of TUs of the current CU. By reconstructing the coding blocks of each CU of a picture, video decoder 30 may reconstruct the picture.
In multiview coding, there may be multiple views of the same scene from different viewpoints. The term "access unit" is used to refer to a set of pictures corresponding to the same instance in time. Thus, video data may be conceptualized as a series of access units over time. A "view component" may be a coded representation of a view in a single access unit. In this disclosure, "view" may refer to a sequence of view components associated with the same view identifier.
Multiview coding supports inter-view prediction. Inter-view prediction is similar to inter prediction used in HEVC and may use the same syntax elements. However, when the video coder performs inter-view prediction on a current video unit (e.g., a PU), video encoder 20 may use as a reference picture a picture that is in the same access unit as the current video unit but in a different view. In contrast, conventional inter prediction uses only pictures in different access units as reference pictures.
In multiview coding, a view may be referred to as a "base view" if a video decoder (e.g., video decoder 30) can decode a picture in the view without reference to a picture in any other view. When coding a picture in one of the non-base views, a video coder (e.g., video encoder 20 or video decoder 30) may add the picture to a reference picture list if the picture is within the same time instance (i.e., access unit) but in a different view than the picture that the video coder is currently coding. Like other inter-prediction reference pictures, a video coder may insert an inter-view prediction reference picture at any position of a reference picture list.
Video coding standards specify a video buffering model. In H.264/AVC and HEVC, the buffer model is referred to as a "hypothetical reference decoder" or "HRD". In HEVC working draft 8, HRD is described in annex C.
The HRD describes how data should be buffered for decoding, and how decoded data is buffered for output. For example, the HRD describes the operation of the CPB, the decoded picture buffer ("DPB"), and the video decoding process. The CPB is a first-in-first-out buffer containing access units in the decoding order specified by the HRD. A DPB is a buffer that holds decoded pictures for reference, output reordering, or output delay specified by the HRD. The behavior of the CPB and DPB can be specified mathematically. HRD may directly impose constraints on timing, buffer size, and bit rate. Furthermore, HRD may indirectly impose constraints on various bitstream characteristics and statistics.
In H.264/AVC and HEVC, bitstream conformance and decoder conformance are specified as parts of the HRD specification. In other words, the HRD model specifies tests to determine whether a bitstream conforms to the standard and tests to determine whether a decoder conforms to the standard. Although the HRD is named a kind of decoder, video encoders typically use the HRD to guarantee bitstream conformance, while video decoders typically do not need the HRD.
Both H.264/AVC and HEVC specify two types of bitstream or HRD conformance, namely Type I and Type II. A Type I bitstream is a NAL unit stream that contains only VCL NAL units and filler data NAL units for all access units in the bitstream. A Type II bitstream is a NAL unit stream that contains, in addition to the VCL NAL units and filler data NAL units for all access units in the bitstream, at least one of the following: additional non-VCL NAL units other than filler data NAL units; and all leading_zero_8bits, zero_byte, start_code_prefix_one_3bytes, and trailing_zero_8bits syntax elements that form a byte stream from the NAL unit stream.
When the device performs a bitstream conformance test that determines whether the bitstream conforms to a video coding standard, the device may select an operation point for the bitstream. Then, the device may determine a set of HRD parameters applicable to the selected operating point. The device may use a set of HRD parameters applicable to the selected operating point to configure behavior of the HRD. More particularly, the device may use an applicable set of HRD parameters to configure behavior of particular components of the HRD, such as a hypothetical flow scheduler (HSS), CPB, decoding process, DPB, and so on. Then, according to a particular schedule, the HSS may inject the coded video data of the bitstream into the CPBs of the HRD. Furthermore, the device may invoke a decoding process that decodes coded video data in the CPB. The decoding process may output the decoded pictures to the DPB. As the device moves data through the HRD, the device may determine whether a particular set of constraints is still satisfied. For example, when the HRD decodes the operation point representation of the selected operation point, the device may determine whether an overflow or underflow condition occurs in the CPB or the DPB. The device may select and process each operation point of the bitstream in this manner. If there are no operation points that result in a bitstream that violates a constraint, the device may determine that the bitstream conforms to a video coding standard.
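The CPB portion of such a conformance check can be caricatured as a leaky-bucket simulation: bits arrive at the signaled bit rate, whole access units are removed at their removal times, and the buffer must neither exceed the signaled CPB size nor lack bits at a removal time. The sketch below is a simplified constant-bit-rate model with made-up input numbers; it is not the normative HRD timing model.

```python
from typing import List, Tuple

def cpb_conforms(access_units: List[Tuple[float, int]],
                 bit_rate: float, cpb_size: int) -> bool:
    """Very simplified CPB check for one operation point.

    access_units: (removal_time_in_seconds, size_in_bits) pairs in decoding order.
    Bits are assumed to enter the CPB at a constant `bit_rate` starting at time 0,
    and a whole access unit is removed instantaneously at its removal time.
    Returns False if the CPB would underflow or overflow at any removal time.
    """
    removed = 0  # bits already taken out of the CPB before the current removal time
    for removal_time, size in access_units:
        delivered = bit_rate * removal_time   # bits that have entered the CPB so far
        fullness = delivered - removed        # CPB fullness just before removal
        if fullness > cpb_size:
            return False                      # overflow: buffer would need to hold too much
        if fullness < size:
            return False                      # underflow: the access unit has not fully arrived
        removed += size
    return True

# Example with an assumed 8 Mbit/s stream and a 1 Mbit CPB.
aus = [(0.04, 300_000), (0.08, 120_000), (0.12, 120_000)]
print(cpb_conforms(aus, bit_rate=8_000_000, cpb_size=1_000_000))   # True
```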
Both H.264/AVC and HEVC specify two types of decoder conformance, namely output timing decoder conformance and output order decoder conformance. A decoder claiming conformance to a specific profile, tier, and level is able to successfully decode all bitstreams that conform to the bitstream conformance requirements of the video coding standard (e.g., HEVC). In this disclosure, a "profile" may refer to a subset of the bitstream syntax. "Tiers" and "levels" may be specified within each profile. A level of a tier may be a specified set of constraints imposed on the values of syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, the constraints may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second). Typically, a level specified for a lower tier is more constrained than a level specified for a higher tier.
When a device performs a decoder conformance test to determine whether a Decoder Under Test (DUT) complies with a video coding standard, the device may provide a bitstream compliant with the video coding standard to both the HRD and the DUT. The HRD may process the bitstream in the manner described above with respect to bitstream conformance testing. The device may determine that the DUT conforms to the video coding standard if the order of the decoded pictures output by the DUT matches the order of the decoded pictures output by the HRD. Furthermore, if the timing at which the DUT outputs the decoded picture matches the timing at which the HRD outputs the decoded picture, the device may determine that the DUT conforms to the video coding standard.
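In the same spirit, the two decoder conformance checks described above reduce to comparing the device under test's output against the HRD's output: identical picture output order for output order conformance, and identical order plus identical output times for output timing conformance. The sketch below assumes each output is given as a (picture identifier, output time) pair; the representation is an assumption of the example.

```python
from typing import List, Tuple

Output = Tuple[int, float]   # (picture identifier, output time in seconds)

def output_order_conforms(dut: List[Output], hrd: List[Output]) -> bool:
    """Output order decoder conformance: the DUT must output the same decoded
    pictures in the same order as the HRD."""
    return [pic for pic, _ in dut] == [pic for pic, _ in hrd]

def output_timing_conforms(dut: List[Output], hrd: List[Output]) -> bool:
    """Output timing decoder conformance: order and output times must both match."""
    return dut == hrd

hrd_out = [(0, 0.00), (1, 0.04), (2, 0.08)]
dut_out = [(0, 0.00), (1, 0.05), (2, 0.08)]
print(output_order_conforms(dut_out, hrd_out))    # True  (same pictures, same order)
print(output_timing_conforms(dut_out, hrd_out))   # False (one output time differs)
```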
In the H.264/AVC and HEVC HRD models, decoding or CPB removal may be access-unit based. That is, the HRD is assumed to decode complete access units at one time and to remove complete access units from the CPB. Furthermore, in the H.264/AVC and HEVC HRD models, picture decoding is assumed to be instantaneous. Video encoder 20 may signal, in picture timing SEI messages, decoding times to start decoding of access units. In practical applications, if a conforming video decoder strictly follows the decoding times signaled to start decoding of access units, the earliest possible time to output a particular decoded picture is equal to the decoding time of that particular picture plus the time needed for decoding that particular picture. However, in the real world, the time needed for decoding a picture cannot be equal to zero.
The HRD parameters may control various aspects of HRD. In other words, the HRD may rely on HRD parameters. The HRD parameters may include initial CPB removal delay, CPB size, bit rate, initial DPB output delay, and DPB size. Video encoder 20 may signal these HRD parameters in the hrd_parameters() syntax structure specified in the Video Parameter Set (VPS) and/or the Sequence Parameter Set (SPS). Individual VPSs and/or SPSs may include multiple hrd_parameters() syntax structures for different sets of HRD parameters. In some examples, video encoder 20 may signal the HRD parameters in a buffering period SEI message or a picture timing SEI message.
As explained above, an operation point of a bitstream is associated with a set of layer identifiers (i.e., a set of nuh_reserved_zero_6bits values) and a temporal identifier. The operation point representation may include each NAL unit associated with the operation point. The operation point representation may have a different frame rate and/or bit rate than the original bitstream. This is so because the operation point representation may not include some pictures of the original bitstream and/or some data of the original bitstream. Thus, when processing the original bitstream, if video decoder 30 were to remove data from the CPB and/or DPB at a particular rate, and if video decoder 30 were to remove data from the CPB and/or DPB at the same rate when processing the operation point representation, video decoder 30 may remove too much or too little data from the CPB and/or DPB. Thus, video encoder 20 may signal different sets of HRD parameters for different operation points. For example, in the VPS, video encoder 20 may include multiple hrd_parameters() syntax structures that include the HRD parameters for different operation points.
In HEVC working draft 8, optionally, the set of HRD parameters includes a set of information that is common for all temporal sub-layers. In other words, the set of HRD parameters may optionally include a set of common syntax elements applicable to operation points including any temporal sub-layer. The temporal sub-layer may be a temporal scalable layer of a temporally scalable bitstream consisting of VCL NAL units having a particular value of TemporalId and associated non-VCL NAL units. In addition to the set of common information, the set of HRD parameters may include a set of syntax elements specific to individual temporal sub-layers. For example, the hrd_parameters() syntax structure can optionally include a set of information that is common to all sub-layers and always includes information for a particular sub-layer. Because the set of common information is common to multiple sets of HRD parameters, it may not be necessary to signal the set of common information in multiple sets of HRD parameters. Instead, in HEVC working draft 8, the common information may be present in the set of HRD parameters when the set of HRD parameters is the first set of HRD parameters in the VPS, or in the set of HRD parameters when the set of HRD parameters is associated with the first operation point index. For example, HEVC working draft 8 supports the presence of common information when the hrd_parameters() syntax structure is the first hrd_parameters() syntax structure in the VPS, or when the hrd_parameters() syntax structure is associated with the first operation point index.
Table 1, below, is an example syntax for the hrd_parameters() syntax structure in HEVC.
TABLE 1 HRD parameters
In the example of Table 1 above, and in the other syntax tables of this disclosure, syntax elements with the type descriptor ue(v) may be variable-length unsigned integers encoded using 0-th order exponential Golomb (Exp-Golomb) coding with the left bit first. In the example of Table 1 and the following tables, a syntax element having a descriptor of the form u(n), where n is a non-negative integer, is an unsigned value of length n.
In the example syntax of Table 1, the syntax elements in the "if( commonInfPresentFlag ) { ... }" block are the common information of the HRD parameter syntax structure. In other words, the common information of the set of HRD parameters may include the syntax elements timing_info_present_flag, num_units_in_tick, time_scale, nal_hrd_parameters_present_flag, vcl_hrd_parameters_present_flag, sub_pic_cpb_params_present_flag, tick_divisor_minus2, du_cpb_removal_delay_length_minus1, bit_rate_scale, cpb_size_scale, initial_cpb_removal_delay_length_minus1, cpb_removal_delay_length_minus1, and dpb_output_delay_length_minus1.
Furthermore, in the example of Table 1, the syntax elements fixed_pic_rate_flag[i], pic_duration_in_tc_minus1[i], low_delay_hrd_flag[i], and cpb_cnt_minus1[i] may be a set of sub-layer-specific HRD parameters. In other words, these syntax elements of the hrd_parameters() syntax structure may be applicable only to operation points that include a specific sub-layer. Thus, in addition to the optionally included common information, the HRD parameters of an hrd_parameters() syntax structure may include a set of sub-layer-specific HRD parameters that is specific to a particular sub-layer of the bitstream.
When HighestTid is equal to i, the fixed_pic_rate_flag[i] syntax element may indicate that the temporal distance between the HRD output times of any two consecutive pictures in output order is constrained in a specific way. HighestTid may be a variable that identifies the highest temporal sub-layer (e.g., of an operation point). When HighestTid is equal to i, the pic_duration_in_tc_minus1[i] syntax element may specify, in clock ticks, the temporal distance between the HRD output times of any consecutive pictures in output order in the coded video sequence. When HighestTid is equal to i, the low_delay_hrd_flag[i] syntax element may specify the HRD operational mode, as specified in Annex C of HEVC Working Draft 8. When HighestTid is equal to i, the cpb_cnt_minus1[i] syntax element may specify the number of alternative CPB specifications in the bitstream of the coded video sequence, where one alternative CPB specification refers to one particular CPB operation with a particular set of CPB parameters.
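The timing relationship described above can be illustrated with a short sketch. Assuming fixed_pic_rate_flag[i] is equal to 1, the distance between the HRD output times of two consecutive pictures in output order is (pic_duration_in_tc_minus1[i] + 1) clock ticks, where one clock tick lasts num_units_in_tick / time_scale seconds. The function names below are illustrative assumptions; the sketch is not the normative HRD timing derivation.

#include <stdint.h>

/* Illustrative only: the duration of one clock tick, in seconds. */
static double clock_tick_seconds(uint32_t num_units_in_tick, uint32_t time_scale)
{
    return (double)num_units_in_tick / (double)time_scale;
}

/* Illustrative only: assuming fixed_pic_rate_flag[i] == 1, the constrained
 * distance between the HRD output times of two consecutive pictures in
 * output order, expressed in seconds. */
static double picture_output_interval(uint32_t num_units_in_tick, uint32_t time_scale,
                                      uint32_t pic_duration_in_tc_minus1)
{
    return (pic_duration_in_tc_minus1 + 1) *
           clock_tick_seconds(num_units_in_tick, time_scale);
}

For example, with num_units_in_tick equal to 1001, time_scale equal to 60000, and pic_duration_in_tc_minus1[i] equal to 1, the computed interval is 2 x 1001/60000, approximately 33.4 milliseconds, which corresponds to roughly 29.97 pictures per second.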
Video encoder 20 may use SEI messages to include, in the bitstream, metadata that is not required for correct decoding of the sample values of pictures. However, video decoder 30 or other devices may use the metadata included in SEI messages for various other purposes. For example, video decoder 30 or another device may use the metadata in an SEI message for picture output timing, picture display, loss detection, and error concealment.
Video encoder 20 may include one or more SEI NAL units in an access unit. In other words, any number of SEI NAL units may be associated with an access unit. Furthermore, each SEI NAL unit may contain one or more SEI messages. The HEVC standard describes the syntax and semantics of various types of SEI messages. However, the HEVC standard does not describe the handling of SEI messages, because SEI messages do not affect the normative decoding process. One reason for having SEI messages in the HEVC standard is to enable supplemental data to be interpreted identically in different systems using HEVC. Specifications and systems using HEVC may require video encoders to generate certain SEI messages, or may define specific handling of particular types of received SEI messages. Table 2, below, lists the SEI messages specified in HEVC and briefly describes their purposes.
TABLE 2 summary of SEI messages
U.S. Provisional Patent Application 61/705,102, filed September 24, 2012, describes various methods for signaling and selecting HRD parameters, including signaling and selecting delay information and timing information in SEI messages. Hannuksela et al., "AHG9: Operation points in VPS and nesting SEI," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 11th Meeting, Shanghai, CN, 10-19 October 2012, document no. JCTVC-K0180v1, which as of June 13, 2013 is available from http://phenix.int-evry.fr/JCT/doc_end_user/documents/11_Shanghai/WG11/JCTVC-K0180-v1.zip, provides another method for signaling HRD parameters and a mechanism for nesting SEI messages.
There are several problems or disadvantages with existing techniques for signaling HRD parameters. For example, existing techniques may not allow a set of HRD parameters to be shared by multiple operation points. However, when the number of operation points is high, it may be burdensome for video encoder 20, or another unit that attempts to ensure the conformance of the bitstream, to generate a different set of HRD parameters for each operation point. Rather, conformance of the bitstream may be ensured by ensuring that each operation point is associated with a set of HRD parameters, while allowing a particular set of HRD parameters to be shared by multiple operation points. One or more techniques of this disclosure may provide a design that allows one set of HRD parameters to be shared by multiple operation points. In other words, a single set of HRD parameters may be applicable to multiple operation points. This design may allow video encoder 20, or another unit that attempts to ensure the conformance of the bitstream, to trade off between complexity and performance.
In another example of the problems or disadvantages of existing techniques for signaling HRD parameters, when there are multiple sets of HRD parameters in the VPS, it may be desirable to have multiple different sets of common information for the sets of HRD parameters. This may be especially true when there is a large number of HRD parameter syntax structures in the VPS. Thus, it may be desirable to have sets of common information present in HRD parameter syntax structures other than the first HRD parameter syntax structure. For example, when there are multiple hrd_parameters() syntax structures in the VPS, and especially when the total number of hrd_parameters() syntax structures is relatively high, it may be desirable, for improved performance, to have the common information present in an hrd_parameters() syntax structure other than the first hrd_parameters() syntax structure, or other than the one associated with the first operation point index.
One or more techniques of this disclosure provide a design that allows the common information of HRD parameter syntax structures to be explicitly signaled for any set of HRD parameters. For example, the techniques of this disclosure may allow the information that is common to all sub-layers to be explicitly signaled for any hrd_parameters() syntax structure.
In this way, video encoder 20 may signal, in the bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information specific to a particular sub-layer of the bitstream. The common set of HRD parameters is common to all sub-layers of the bitstream.
Similarly, video decoder 30 or another device may decode, from the bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters. For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS may further include a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters. Video decoder 30 or other device may perform operations using HRD parameters of at least one of the HRD parameter syntax structures.
Furthermore, existing methods for nesting SEI messages can have several problems or disadvantages. For example, existing techniques for signaling HRD parameters may not allow one SEI message to be applicable to multiple operation points. The techniques of this disclosure may provide a design that allows one SEI message to be applicable to multiple operation points.
In particular, the scalable nesting SEI message may include syntax elements that specify a plurality of operation points applicable to SEI messages nested within the scalable nesting SEI message. In other words, the scalable nesting SEI message may provide a mechanism for associating the SEI message with a bitstream subset (e.g., operation point representation), or with specific layers and sub-layers.
In this way, video encoder 20 may generate a scalable nesting SEI message that includes a plurality of syntax elements that identify a plurality of operation points to which a nested SEI message encapsulated by the scalable nesting SEI message applies. Further, video encoder 20 may signal a scalable nesting SEI message in the bitstream.
In this way, in a video coding process, video decoder 30 or another device may decode, from a scalable nesting SEI message, a plurality of syntax elements that identify operation points to which a nested SEI message encapsulated by the scalable nesting SEI message applies. Moreover, video decoder 30 or other device may perform operations based at least in part on one or more of the syntax elements of the nested SEI message.
Another example of a problem or disadvantage of existing techniques for nesting SEI messages relates to the fact that existing techniques for nesting SEI messages do not use the value of the layer identifier syntax element (e.g., nuh_reserved_zero_6bits) in the current SEI NAL unit to determine the operation point applicable to the scalable nesting SEI message encapsulated by the current SEI NAL unit.
The techniques of this disclosure provide a design for signaling whether an operation point applicable to nested SEI messages in an SEI NAL unit is the operation point indicated by the layer identification information in the NAL unit header of the SEI NAL unit. The layer identification information in the NAL unit header of the SEI NAL unit may include the value of nuh_reserved_zero_6bits and the value of nuh_temporal_id_plus1 of the NAL unit header. In other words, the techniques of this disclosure may provide a design for using the layer identification information in the NAL unit header of the current SEI NAL unit (e.g., the value of nuh_reserved_zero_6bits and the value of nuh_temporal_id_plus1) by signaling whether the nested SEI messages apply to the default operation point identified by the layer identification information included in the NAL unit header of the current SEI NAL unit (i.e., the SEI NAL unit containing the scalable nesting SEI message).
In this way, in a scalable nesting SEI message encapsulated by an SEI NAL unit, video encoder 20 may include a syntax element that indicates whether a nested SEI message encapsulated by the scalable nesting SEI message applies to a default sub-bitstream. The default sub-bitstream may be an operation point representation of an operation point defined by a layer identifier specified in a NAL unit header of the SEI NAL unit and a temporal identifier specified in the NAL unit header. Furthermore, video encoder 20 may output a bitstream that includes the scalable nesting SEI message.
Similarly, a device such as video decoder 30 or another device may determine whether a nested SEI message encapsulated by a scalable nesting SEI message is applicable to a default sub-bitstream based at least in part on syntax elements in the scalable nesting SEI message encapsulated by an SEI NAL unit. As described above, the default sub-bitstream may be an operation point representation of an operation point defined by a layer identifier specified in a NAL unit header of the SEI NAL unit and a temporal identifier specified in the NAL unit header. When the nested SEI message is applicable to the default sub-bitstream, the device may use the nested SEI message in the operation on the default sub-bitstream. For example, the nested SEI message may include one or more HRD parameters. In this example, the device may use the one or more HRD parameters to perform a bitstream conformance test that determines whether the default sub-bitstream conforms to the video coding standard (e.g., HEVC). Alternatively, in this example, the device may use the one or more HRD parameters to determine whether video decoder 30 satisfies the decoder conformance test.
In another example of the problems or disadvantages of existing approaches for nesting SEI messages, explicit coding of layer identifiers is inefficient. The techniques of this disclosure may increase the efficiency of explicit coding of layer identifiers by differential coding or coding using flags.
FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques of this disclosure. Fig. 2 is provided for purposes of explanation and should not be taken as a limitation on the techniques as broadly illustrated and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 20 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.
In the example of fig. 2, video encoder 20 includes a prediction processing unit 100, a residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 114, a decoded picture buffer 116, and an entropy encoding unit 118. The prediction processing unit 100 includes an inter prediction processing unit 120 and an intra prediction processing unit 126. Inter prediction processing unit 120 includes a motion estimation unit 122 and a motion compensation unit 124. In other examples, video encoder 20 may include more, fewer, or different functional components.
Video encoder 20 may receive video data. Video encoder 20 may encode each CTU in a slice of a picture of the video data. Each of the CTUs may be associated with an equally-sized luma coding tree block (CTB) and corresponding chroma CTBs of the picture. As part of encoding a CTU, prediction processing unit 100 may perform quadtree partitioning to divide the CTBs of the CTU into progressively smaller blocks. The smaller blocks may be coding blocks of CUs. For example, prediction processing unit 100 may partition a CTB associated with a CTU into four equally sized sub-blocks, partition one or more of the sub-blocks into four equally sized sub-blocks, and so on.
Video encoder 20 may encode a CU of a CTU to generate an encoded representation of the CU (i.e., a coded CU). As part of encoding a CU, prediction processing unit 100 may partition the coding blocks associated with the CU among one or more PUs of the CU. Thus, each PU may be associated with a luma prediction block and corresponding chroma prediction blocks. Video encoder 20 and video decoder 30 may support PUs having various sizes. As indicated above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of the luma prediction block of the PU. Assuming that the size of a particular CU is 2N×2N, video encoder 20 and video decoder 30 may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar for inter prediction. Video encoder 20 and video decoder 30 may also support asymmetric partitioning with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
By performing inter prediction for each PU of the CU, inter prediction processing unit 120 may generate predictive data for the PU. The predictive data for the PU may include predictive blocks for the PU and motion information for the PU. Inter prediction processing unit 120 may perform different operations for PUs of a CU depending on whether the PU is in an I-slice, a P-slice, or a B-slice. In an I slice, all PUs are intra predicted. Thus, if the PU is in an I slice, inter prediction processing unit 120 does not perform inter prediction on the PU. Thus, for a block encoded in I-mode, a predictive block is formed using spatial prediction from previously encoded neighboring blocks within the same frame.
If the PU is in a P slice, motion estimation unit 122 may search the reference pictures in a list of reference pictures (e.g., "RefPicList0") for a reference region for the PU. The reference region for the PU may be a region, within a reference picture, that contains sample blocks that most closely correspond to the sample blocks of the PU. Motion estimation unit 122 may generate a reference index that indicates the position in RefPicList0 of the reference picture containing the reference region for the PU. In addition, motion estimation unit 122 may generate a motion vector that indicates a spatial displacement between a coding block of the PU and a reference location associated with the reference region. For example, the motion vector may be a two-dimensional vector that provides an offset from coordinates in the current picture to coordinates in the reference picture. Motion estimation unit 122 may output the reference index and the motion vector as the motion information of the PU. Motion compensation unit 124 may generate the predictive blocks of the PU based on actual samples or interpolated samples at the reference location indicated by the motion vector of the PU.
If the PU is in a B slice, motion estimation unit 122 may perform uni-directional prediction or bi-directional prediction for the PU. To perform uni-directional prediction for the PU, motion estimation unit 122 may search the reference pictures of RefPicList0 or a second reference picture list ("RefPicList1") for a reference region for the PU. Motion estimation unit 122 may output, as motion information for the PU, a reference index indicating a position in RefPicList0 or RefPicList1 of the reference picture containing the reference region, a motion vector indicating a spatial displacement between a prediction block of the PU and a reference location associated with the reference region, and one or more prediction direction indicators indicating whether the reference picture is in RefPicList0 or RefPicList1. Motion compensation unit 124 may generate the predictive blocks of the PU based at least in part on actual samples or interpolated samples at the reference region indicated by the motion vector of the PU.
To perform bi-directional inter prediction for a PU, motion estimation unit 122 may search reference pictures in RefPicList0 to obtain a reference region for the PU, and may also search reference pictures in RefPicList1 to obtain another reference region for the PU. Motion estimation unit 122 may generate reference indices that indicate positions in RefPicList0 and RefPicList1 of the reference picture containing the reference region. In addition, motion estimation unit 122 may generate motion vectors that indicate spatial displacements between reference locations associated with the reference region and prediction blocks of the PU. The motion information for the PU may include a reference index and a motion vector for the PU. Motion compensation unit 124 may generate the predictive blocks of the PU based at least in part on actual samples or interpolated samples at the reference region indicated by the motion vector of the PU.
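For purposes of illustration, the following hypothetical C sketch shows a deliberately simplified, exhaustive search of the kind motion estimation unit 122 might perform to locate a reference region: it evaluates every integer offset within a small window and keeps the offset with the lowest sum of absolute differences (SAD). Practical encoders typically use fast search strategies, sub-pel interpolation, and rate-distortion-aware cost functions, none of which are shown here; the type and function names are assumptions of this description.

#include <stdint.h>
#include <stdlib.h>

/* Result of the hypothetical search: best integer offset and its SAD cost. */
typedef struct { int dx, dy; uint64_t sad; } MotionCandidate;

/* Exhaustive integer-pel search over a +/- 'range' window. 'cur' points to the
 * block to be predicted in the current picture and 'ref' to the co-located
 * position in the reference picture; both use the same stride. The caller is
 * assumed to guarantee that the whole search window lies inside the reference
 * picture buffer. Illustrative sketch only. */
static MotionCandidate full_search(const uint8_t *cur, const uint8_t *ref,
                                   int stride, int width, int height, int range)
{
    MotionCandidate best = { 0, 0, UINT64_MAX };
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            uint64_t sad = 0;
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    sad += (uint64_t)abs((int)cur[y * stride + x] -
                                         (int)ref[(y + dy) * stride + (x + dx)]);
            if (sad < best.sad) {
                best.sad = sad;
                best.dx = dx;
                best.dy = dy;
            }
        }
    }
    return best;
}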
Intra-prediction processing unit 126 may generate predictive data for the PU by performing intra-prediction on the PU. The predictive data for the PU may include predictive blocks for the PU and various syntax elements. Intra prediction processing unit 126 may perform intra prediction on PUs in I slices, P slices, and B slices.
To perform intra prediction for a PU, intra-prediction processing unit 126 may use multiple intra-prediction modes to generate multiple sets of predictive data for the PU. Intra-prediction processing unit 126 may generate predictive blocks for the PU based on samples of neighboring PUs. Assuming a left-to-right, top-to-bottom coding order for PUs, CUs, and CTUs, the neighboring PUs may be above, above and to the right, above and to the left, or to the left of the PU. Intra-prediction processing unit 126 may use various numbers of intra-prediction modes, e.g., 33 directional intra-prediction modes. In some examples, the number of intra-prediction modes may depend on the size of the prediction blocks of the PU.
Prediction processing unit 100 may select the predictive data for the PU of the CU from the predictive data for the PU generated by inter prediction processing unit 120 or from the predictive data for the PU generated by intra prediction processing unit 126. In some examples, prediction processing unit 100 selects predictive data for PUs of the CU based on a bitrate/distortion metric for the set of predictive data. The predictive block of the selected predictive data may be referred to herein as the selected predictive block.
Based on the luma, Cb, and Cr coding blocks of the CU, and the selected predictive luma, predictive Cb, and predictive Cr blocks of the PUs of the CU, residual generation unit 102 may generate luma, Cb, and Cr residual blocks of the CU. For example, residual generation unit 102 may generate the residual block of the CU such that each sample in the residual block has a value equal to a difference between a sample in the coding block of the CU and a corresponding sample in a corresponding selected predictive block of the PU of the CU.
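A minimal sketch of the per-sample subtraction performed by residual generation unit 102 is shown below; the function name, signature, and sample layout are illustrative assumptions of this description.

#include <stdint.h>

/* Residual = original coding block minus selected predictive block, computed
 * sample by sample (illustrative sketch). */
static void generate_residual(const uint8_t *orig, const uint8_t *pred,
                              int16_t *residual, int stride, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            residual[y * width + x] =
                (int16_t)(orig[y * stride + x] - pred[y * stride + x]);
}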
Transform processing unit 104 may perform quadtree partitioning to partition a residual block of the CU into transform blocks associated with TUs of the CU. Thus, a TU may be associated with a luma transform block and two corresponding chroma transform blocks. The size and position of luma and chroma transform blocks of a TU of a CU may or may not be based on the size and position of a prediction block of a PU of the CU. A quadtree structure, referred to as a "residual quadtree" (RQT), may include nodes associated with each of the regions. The TUs of a CU may correspond to leaf nodes of a RQT.
By applying one or more transforms to transform blocks of a TU, transform processing unit 104 may generate transform coefficient blocks for each TU of a CU. Transform processing unit 104 may apply various transforms to transform blocks associated with TUs. For example, transform processing unit 104 may apply a Discrete Cosine Transform (DCT), a directional transform, or a conceptually similar transform to the transform blocks. In some examples, transform processing unit 104 does not apply the transform to the transform block. In these examples, the transform block may be treated as a block of transform coefficients.
Quantization unit 106 may quantize the transform coefficients in a coefficient block. The quantization process may reduce the bit depth associated with some or all of the transform coefficients. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. Based on the quantization parameter (QP) value associated with the CU, quantization unit 106 may quantize coefficient blocks associated with TUs of the CU. Video encoder 20 may adjust the degree of quantization applied to the coefficient blocks associated with a CU by adjusting the QP value associated with the CU. Quantization may cause information to be lost, so the quantized transform coefficients may have lower precision than the original transform coefficients.
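The effect of QP on quantization can be illustrated with a simplified scalar quantizer in which the step size roughly doubles for every increase of 6 in QP, mirroring the general behavior described above. The sketch below is illustrative only; it is not the normative HEVC quantization, which uses integer scaling tables, shifts, and per-block rounding offsets.

#include <math.h>
#include <stdint.h>

/* Illustrative scalar quantizer: the step size roughly doubles for every
 * increase of 6 in QP. Not the normative HEVC quantization. */
static int32_t quantize_coefficient(int32_t coeff, int qp)
{
    double step = pow(2.0, (qp - 4) / 6.0);
    double q = (double)coeff / step;
    return (int32_t)(q >= 0.0 ? q + 0.5 : q - 0.5); /* round to nearest */
}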
Inverse quantization unit 108 and inverse transform processing unit 110 may apply inverse quantization and inverse transform, respectively, to the coefficient block to reconstruct the residual block from the coefficient block. Reconstruction unit 112 may add the reconstructed residual block to corresponding samples from one or more predictive blocks generated by prediction processing unit 100 to generate a reconstructed transform block associated with the TU. By reconstructing the transform blocks for each TU of a CU in this manner, video encoder 20 may reconstruct the coding blocks of the CU.
Filter unit 114 may perform one or more deblocking operations to reduce block artifacts in coding blocks associated with CUs. Decoded picture buffer 116 may store the reconstructed coded block after filter unit 114 performs one or more deblocking operations on the reconstructed coded block. Inter-prediction processing unit 120 may use the reference picture containing the reconstructed coding block to perform inter-prediction on PUs of other pictures. In addition, intra-prediction processing unit 126 may use the reconstructed coding blocks in decoded picture buffer 116 to perform intra-prediction on other PUs in the same picture as the CU.
Entropy encoding unit 118 may receive data from other functional components of video encoder 20. For example, entropy encoding unit 118 may receive coefficient blocks from quantization unit 106 and may receive syntax elements from prediction processing unit 100. Entropy encoding unit 118 may perform one or more entropy encoding operations on the data to generate entropy encoded data. For example, entropy encoding unit 118 may perform a Context Adaptive Variable Length Coding (CAVLC) operation, a CABAC operation, a variable to variable (V2V) length coding operation, a syntax-based context adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, an exponential golomb encoding operation, or another type of entropy encoding operation on the data. Video encoder 20 may output a bitstream that includes the entropy-encoded data generated by entropy encoding unit 118. For example, the bitstream may include data representing the RQT of a CU.
As indicated above, the techniques of this disclosure may provide a design that allows the common information of HRD parameter syntax structures to be explicitly signaled for any HRD parameter syntax structure in a VPS. To enable the common information to be explicitly signaled for any HRD parameter syntax structure in the VPS, video encoder 20 may generate VPSs that conform to the example syntax shown in Table 3, below.
TABLE 3 VPS syntax Structure
The italicized portions of Table 3 indicate the differences between the syntax of Table 3 and the corresponding table of HEVC Working Draft 8. Moreover, in the example syntax of Table 3, the num_ops_minus1 syntax element specifies the number of operation_point() syntax structures present in the VPS. The hrd_applicable_ops_minus1[i] syntax element, plus 1, specifies the number of operation points to which the i-th hrd_parameters() syntax structure applies. The hrd_op_idx[i][j] syntax element specifies the j-th operation point in the VPS to which the i-th hrd_parameters() syntax structure applies. As briefly mentioned above, the techniques of this disclosure may allow one set of HRD parameters to be shared by multiple operation points. The hrd_applicable_ops_minus1[i] and hrd_op_idx[i][j] syntax elements may be used to indicate the operation points to which a set of HRD parameters applies. In some examples where multiple operation points are not allowed to apply to a single set of HRD parameters, the hrd_applicable_ops_minus1[i] and hrd_op_idx[i][j] syntax elements are omitted from Table 3.
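For purposes of illustration, the following hypothetical sketch shows how a device might record, from already-decoded hrd_applicable_ops_minus1[i] and hrd_op_idx[i][j] values, which hrd_parameters() syntax structure applies to each operation point, so that a single structure can serve several operation points. The array bound and function name are assumptions of this description, not values taken from the VPS syntax.

#include <stdint.h>
#include <string.h>

#define MAX_OPERATION_POINTS 64 /* illustrative bound */

/* Record, for every operation point, the index of the hrd_parameters() syntax
 * structure that applies to it; -1 marks operation points with no applicable
 * structure. Inputs are assumed to have been decoded from the VPS already. */
static void map_operation_points_to_hrd(int num_hrd_structures,
                                        const int hrd_applicable_ops_minus1[],
                                        const int hrd_op_idx[][MAX_OPERATION_POINTS],
                                        int op_to_hrd[MAX_OPERATION_POINTS])
{
    memset(op_to_hrd, -1, MAX_OPERATION_POINTS * sizeof(int));
    for (int i = 0; i < num_hrd_structures; i++)
        for (int j = 0; j <= hrd_applicable_ops_minus1[i]; j++)
            op_to_hrd[hrd_op_idx[i][j]] = i; /* one structure may serve many operation points */
}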
In the example syntax of Table 3, the VPS may include a set of common parameter presence flags (i.e., syntax elements), denoted cprms_present_flag[i] in Table 3. A cprms_present_flag[i] syntax element equal to 1 specifies that the HRD parameters that are common to all sub-layers are present in the i-th hrd_parameters() syntax structure in the VPS. A cprms_present_flag[i] syntax element equal to 0 specifies that the HRD parameters that are common to all sub-layers are not present in the i-th hrd_parameters() syntax structure in the VPS and are instead derived to be the same as those of the (i-1)-th hrd_parameters() syntax structure in the VPS.
cprms_present_flag[0] may be inferred to be equal to 1. That is, a device may automatically determine (i.e., infer) that the first hrd_parameters() syntax structure in the VPS (in coding order) includes HRD parameters that are common to all sub-layers. Thus, the first HRD parameter syntax structure signaled in the VPS includes a common set of HRD parameters. One or more subsequent HRD parameter syntax structures in the VPS may include different common sets of HRD parameters.
As briefly mentioned above, the techniques of this disclosure may allow the common information of HRD parameter syntax structures (i.e., the HRD parameters that are common to all of the sub-layers) to be explicitly signaled for any HRD parameter syntax structure. The cprms_present_flag[i] syntax elements of Table 3 may enable video decoder 30 or another device to determine which of the HRD parameter syntax structures include a set of HRD parameters that is common to all of the sub-layers. Thus, while the first HRD parameter syntax structure may always include the common set of HRD parameters, one or more HRD parameter syntax structures subsequently signaled in the VPS may not include a common set of HRD parameters. A device may use the cprms_present_flag[i] syntax elements to determine which of the HRD parameter syntax structures of the VPS include a common set of HRD parameters.
An HRD parameter syntax structure (e.g., an hrd_parameters() syntax structure) may include a set of sub-layer-specific HRD parameters regardless of whether the HRD parameter syntax structure includes HRD parameters that are common to all sub-layers. When video decoder 30 or another device determines that a particular HRD parameter syntax structure does not include the common set of HRD parameters, video decoder 30 or the other device may perform operations using the common set of HRD parameters associated with a previous HRD parameter syntax structure and the set of sub-layer-specific HRD parameters of the particular HRD parameter syntax structure. The previous HRD parameter syntax structure may be a set of HRD parameters signaled in the VPS before the particular HRD parameter syntax structure in coding order. The common set of HRD parameters associated with the previous HRD parameter syntax structure is the common set of HRD parameters included in the previous HRD parameter syntax structure if the previous HRD parameter syntax structure includes a common set of HRD parameters. If the previous HRD parameter syntax structure does not include a common set of HRD parameters, the device may determine that the common set of HRD parameters associated with the previous HRD parameter syntax structure is the common set of HRD parameters associated with the HRD parameter syntax structure that precedes the previous HRD parameter syntax structure in coding order.
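The fallback behavior described above can be captured in a few lines: for the i-th hrd_parameters() syntax structure, the applicable common information is either its own (when cprms_present_flag[i] is equal to 1) or that of the nearest preceding structure that carries it, and because cprms_present_flag[0] is inferred to be equal to 1, such a structure always exists. The sketch below is illustrative only and reuses the hypothetical HrdParameters and CommonHrdInfo types introduced in the earlier sketch.

/* Resolve the common HRD information that applies to the i-th hrd_parameters()
 * syntax structure of the VPS. Because cprms_present_flag[0] is inferred to be
 * equal to 1, walking backwards always finds a structure that carries the
 * common information. Illustrative sketch. */
static const CommonHrdInfo *resolve_common_info(const HrdParameters *hrd_structures, int i)
{
    while (i > 0 && !hrd_structures[i].common_info_present)
        i--;
    return &hrd_structures[i].common;
}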
As mentioned above, the device may perform operations using the common set of HRD parameters and the sub-layer specific HRD parameters. During this operation, the device may manage operation of the CPB in accordance with one or more of the HRD parameters, decode the video data, and manage decoded pictures in the DPB in accordance with one or more of the HRD parameters. In another example, the common set of HRD parameters and the sub-layer specific HRD parameters may be used to perform a bitstream conformance test or a decoder conformance test.
Furthermore, in some examples, scalable nesting SEI messages provide a mechanism for associating SEI messages with bitstream subsets (e.g., operation point representations) or with specific layers and sub-layers. In some such examples, the scalable nesting SEI message may contain one or more SEI messages. The SEI message contained in the scalable nesting SEI message may be referred to as a nested SEI message. SEI messages that are not contained in scalable nesting SEI messages may be referred to as non-nested SEI messages. In some examples, nested SEI messages in a scalable nesting SEI message may include a set of HRD parameters.
In some examples, there are several limitations on which types of SEI messages can be nested together. For example, a buffering period SEI message and any other type of SEI message may not be nested in the same scalable nesting SEI message. A buffering period SEI message may indicate initial delays for HRD operation. In another example, a picture timing SEI message and any other type of SEI message may not be nested in the same scalable nesting SEI message. A picture timing SEI message may indicate a picture output time and a picture/sub-picture removal time for HRD operation. In other examples, a nested picture timing SEI message and a sub-picture timing SEI message may be in the same scalable nesting SEI message. A sub-picture timing SEI message may provide CPB removal delay information for the decoding unit associated with the SEI message.
As indicated above, one or more techniques of this disclosure may allow one SEI message to be applicable to multiple operation points. Furthermore, one or more techniques of this disclosure may enable video encoder 20 to signal whether an operation point applicable to a nested SEI message in an SEI NAL unit is an operation point indicated by layer identification information in a NAL unit header of the SEI NAL unit. In addition, one or more techniques of this disclosure may increase the efficiency of explicit coding of layer identifiers by differential coding. The example syntax and accompanying semantics shown in table 4 below may implement these techniques.
TABLE 4 scalable nesting SEI message
In the example of Table 4, the italicized portions may indicate differences from HEVC Working Draft 8. Specifically, in the example syntax of Table 4, a bitstream_subset_flag syntax element equal to 0 specifies that the SEI messages nested in the scalable nesting SEI message apply to specific layers and sub-layers. A bitstream_subset_flag syntax element equal to 1 specifies that the SEI messages nested in the scalable nesting SEI message apply to the sub-bitstream resulting from the sub-bitstream extraction process of sub-clause 10.1 of HEVC Working Draft 8 with inputs specified by the syntax elements of the scalable nesting SEI message, as described below. Sub-clause 10.1 of HEVC Working Draft 8 describes an operation for extracting a sub-bitstream (i.e., an operation point representation) from a bitstream. In particular, sub-clause 10.1 of HEVC Working Draft 8 provides that the sub-bitstream is derived by removing from the bitstream all NAL units with temporal identifiers (e.g., TemporalId) greater than tIdTarget or with layer identifiers (e.g., nuh_reserved_zero_6bits) not among the values in targetDecLayerIdSet. tIdTarget and targetDecLayerIdSet are parameters of the bitstream extraction process. In some examples, the bitstream_subset_flag syntax element is equal to 1 if the nested SEI messages are buffering period SEI messages, picture timing SEI messages, or sub-picture timing SEI messages. Otherwise, in these examples, the bitstream_subset_flag syntax element is equal to 0.
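A simplified, hypothetical sketch of the extraction behavior described above is given below: a NAL unit is kept only if its temporal identifier does not exceed tIdTarget and its layer identifier is among the values in targetDecLayerIdSet. The in-memory NAL unit representation is an assumption of this description and ignores all other aspects of sub-clause 10.1.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a NAL unit; only the two header fields that
 * matter for extraction are modeled. */
typedef struct {
    uint8_t nuh_reserved_zero_6bits; /* layer identifier */
    uint8_t temporal_id;             /* nuh_temporal_id_plus1 - 1 */
} NalUnit;

static bool layer_in_set(uint8_t layer_id, const uint8_t *set, size_t set_size)
{
    for (size_t i = 0; i < set_size; i++)
        if (set[i] == layer_id)
            return true;
    return false;
}

/* Keep only the NAL units that belong to the requested operation point;
 * returns the number of NAL units written to 'out'. Illustrative sketch. */
static size_t extract_sub_bitstream(const NalUnit *in, size_t n, NalUnit *out,
                                    uint8_t tIdTarget,
                                    const uint8_t *targetDecLayerIdSet, size_t setSize)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].temporal_id > tIdTarget)
            continue; /* temporal identifier too high */
        if (!layer_in_set(in[i].nuh_reserved_zero_6bits, targetDecLayerIdSet, setSize))
            continue; /* layer identifier not in the target set */
        out[kept++] = in[i];
    }
    return kept;
}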
Furthermore, in the example syntax of Table 4, if the bitstream_subset_flag syntax element is equal to 1, the scalable nesting SEI message includes a default_op_applicable_flag syntax element. A default_op_applicable_flag syntax element equal to 1 specifies that the nested SEI messages (i.e., the SEI messages nested within the scalable nesting SEI message) apply to a default sub-bitstream, which is the output of the sub-bitstream extraction process of sub-clause 10.1 of HEVC Working Draft 8 with the input tIdTarget equal to the temporal identifier (TemporalId) of the current SEI NAL unit and the input targetDecLayerIdSet consisting of all values of nuh_reserved_zero_6bits in the range of 0 to the nuh_reserved_zero_6bits of the current SEI NAL unit, inclusive. Thus, the default sub-bitstream may be the bitstream derived by removing from the bitstream all NAL units with temporal identifiers greater than the temporal identifier of the current SEI NAL unit or with layer identifiers (e.g., nuh_reserved_zero_6bits) greater than the layer identifier of the current SEI NAL unit. For example, the default sub-bitstream may be a subset of the bitstream, and the default sub-bitstream may not include VCL NAL units of the bitstream that have layer identifiers greater than the layer identifier indicated by the layer identifier syntax element of the NAL unit header of the SEI NAL unit, or that have temporal identifiers greater than the temporal identifier indicated by the temporal layer identifier syntax element (e.g., nuh_temporal_id_plus1) of the NAL unit header. A default_op_applicable_flag syntax element equal to 0 specifies that the nested SEI messages do not apply to the default sub-bitstream.
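Under that definition, the inputs to the extraction process for the default sub-bitstream follow directly from the NAL unit header of the current SEI NAL unit, as the following sketch illustrates; it reuses the hypothetical NalUnit type and extract_sub_bitstream() routine from the preceding sketch and is illustrative only.

/* Derive the extraction inputs for the default sub-bitstream from the NAL unit
 * header of the current SEI NAL unit and run the sketched extraction process. */
static size_t extract_default_sub_bitstream(const NalUnit *in, size_t n, NalUnit *out,
                                            uint8_t sei_nuh_reserved_zero_6bits,
                                            uint8_t sei_nuh_temporal_id_plus1)
{
    /* tIdTarget is the temporal identifier of the current SEI NAL unit. */
    uint8_t tIdTarget = sei_nuh_temporal_id_plus1 - 1;

    /* targetDecLayerIdSet consists of all values from 0 to the layer identifier
     * of the current SEI NAL unit, inclusive. */
    uint8_t targetDecLayerIdSet[64]; /* nuh_reserved_zero_6bits has 6 bits, so at most 64 values */
    size_t setSize = (size_t)sei_nuh_reserved_zero_6bits + 1;
    for (size_t i = 0; i < setSize; i++)
        targetDecLayerIdSet[i] = (uint8_t)i;

    return extract_sub_bitstream(in, n, out, tIdTarget, targetDecLayerIdSet, setSize);
}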
In the example syntax of Table 4, if the bitstream_subset_flag syntax element is equal to 1, the scalable nesting SEI message includes the nesting_num_ops_minus1 syntax element. The nesting_num_ops_minus1 syntax element, plus 1, specifies the number of nesting_op_idx[i] syntax elements in the scalable nesting SEI message. Thus, if the nesting_num_ops_minus1 syntax element is greater than 0, the scalable nesting SEI message includes multiple syntax elements that identify multiple operation points to which the nested SEI messages apply. In this way, the device may decode, from the scalable nesting SEI message, a syntax element (nesting_num_ops_minus1) indicating the number of operation points to which the nested SEI messages apply. When the nesting_num_ops_minus1 syntax element is not present, its value may be inferred to be equal to 0. Furthermore, if the bitstream_subset_flag syntax element is equal to 0, the scalable nesting SEI message does not contain any nesting_op_idx[i] syntax elements.
The nesting_op_flag syntax element equal to 0 specifies that the set nestingLayerIdSet[0] is specified by the all_layers_flag syntax element and, when present, the nesting_layer_id_delta[i] syntax elements for all values of i in the range of 0 to nesting_num_layers_minus1, inclusive. The variable nestingLayerIdSet[ ] is an array of layer identifiers. The nesting_op_flag syntax element equal to 1 specifies that the set nestingLayerIdSet[i] is specified by the nesting_op_idx[i] syntax element. When not present, the value of nesting_op_flag is inferred to be equal to 1.
The nesting_max_temporal_id_plus1[i] syntax element specifies the variable maxTemporalId[i]. In the example syntax of Table 4, the value of the nesting_max_temporal_id_plus1[i] syntax element is greater than the value of the nuh_temporal_id_plus1 syntax element of the current SEI NAL unit (i.e., the NAL unit containing the scalable nesting SEI message). The variable maxTemporalId[i] is set equal to nesting_max_temporal_id_plus1[i] - 1.
The nesting_op_idx[i] syntax element is used to specify the set nestingLayerIdSet[i]. The set nestingLayerIdSet[i] may consist of op_layer_id[ nesting_op_idx[i] ][ j ], with all values of j in the range of 0 to op_num_layer_id_values_minus1[ nesting_op_idx[i] ], inclusive. The active VPS may specify the op_layer_id[ ][ ] and op_num_layer_id_values_minus1[ ] values.
Further, in the example syntax of Table 4, the all_layers_flag syntax element equal to 0 specifies that the set nestingLayerIdSet[0] consists of nestingLayerId[i] for all values of i in the range of 0 to nesting_num_layers_minus1, inclusive. The variable nestingLayerId[i] is described below. The all_layers_flag syntax element equal to 1 specifies that the set nestingLayerIdSet consists of all values of nuh_reserved_zero_6bits, present in the current access unit, that are equal to or greater than the nuh_reserved_zero_6bits of the current SEI NAL unit.
The nesting_num_layers_minus1 syntax element, plus 1, specifies the number of nesting_layer_id_delta[i] syntax elements in the scalable nesting SEI message. When i is equal to 0, the nesting_layer_id_delta[i] syntax element specifies the difference between the first (i.e., the 0-th) nuh_reserved_zero_6bits value included in the set nestingLayerIdSet[0] and the nuh_reserved_zero_6bits syntax element of the current SEI NAL unit. When i is greater than 0, the nesting_layer_id_delta[i] syntax element specifies the difference between the (i-1)-th and the i-th nuh_reserved_zero_6bits values included in the set nestingLayerIdSet[0].
The variable nestingLayerId[i] may be derived as follows, where nuh_reserved_zero_6bits is from the NAL unit header of the current SEI NAL unit:
nestingLayerId[ 0 ] = nuh_reserved_zero_6bits + nesting_layer_id_delta[ 0 ]
for( i = 1; i <= nesting_num_layers_minus1; i++ )
    nestingLayerId[ i ] = nestingLayerId[ i - 1 ] + nesting_layer_id_delta[ i ]
The set nestingLayerIdSet[0] is set to consist of nestingLayerId[i] for all values of i in the range of 0 to nesting_num_layers_minus1, inclusive. When the bitstream_subset_flag syntax element is equal to 0, the nested SEI messages apply to NAL units having nuh_reserved_zero_6bits included in the set nestingLayerIdSet[0], or having nuh_reserved_zero_6bits equal to that of the current SEI NAL unit, and having nuh_temporal_id_plus1 in the range of the nuh_temporal_id_plus1 of the current SEI NAL unit to maxTemporalId[0] + 1, inclusive. When the bitstream_subset_flag syntax element is equal to 1, the nested SEI messages apply to the outputs of the sub-bitstream extraction process of sub-clause 10.1 of HEVC Working Draft 8 with the input tIdTarget equal to maxTemporalId[i] and the input targetDecLayerIdSet equal to nestingLayerIdSet[i], for each value of i in the range of 0 to nesting_num_ops_minus1, inclusive, and, when the default_op_applicable_flag syntax element is equal to 1, the nested SEI messages also apply to the default sub-bitstream. An extracted sub-bitstream may result from removing all NAL units with temporal identifiers greater than maxTemporalId[i] or with layer identifiers not included in nestingLayerIdSet[i].
In this manner, for at least one respective operation point of a plurality of operation points to which a nested SEI message applies, a device (e.g., video encoder 20, video decoder 30, or another device such as a content delivery network device) may decode a first syntax element (e.g., nesting_max_temporal_id_plus1[i]) and a second syntax element (e.g., nesting_op_idx[i]) from the scalable nesting SEI message. Further, the device may determine, based at least in part on the first syntax element, a maximum temporal identifier of the respective operation point. The device may determine, based at least in part on the second syntax element, a set of layer identifiers of the respective operation point.
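A hypothetical parsing sketch for the operation-point loop of the scalable nesting SEI message of Table 4 is shown below. The bit-reader interface, the fixed bound on the number of operation points, and the assumption that nesting_max_temporal_id_plus1[i] is coded with u(3) and nesting_op_idx[i] with ue(v) are assumptions of this description, since the body of Table 4 is not reproduced above; the sketch also assumes that bitstream_subset_flag and default_op_applicable_flag have already been read and that bitstream_subset_flag is equal to 1.

#include <stdint.h>

#define MAX_NESTING_OPS 1024 /* illustrative bound */

/* Hypothetical bit-reader interface standing in for a real bitstream parser. */
typedef struct BitReader BitReader;
extern uint32_t read_ue(BitReader *br);           /* ue(v): 0-th order Exp-Golomb */
extern uint32_t read_u(BitReader *br, int nbits); /* u(n): fixed-length unsigned */

typedef struct {
    uint32_t num_ops;                         /* nesting_num_ops_minus1 + 1 */
    uint8_t maxTemporalId[MAX_NESTING_OPS];   /* per applicable operation point */
    uint32_t nesting_op_idx[MAX_NESTING_OPS]; /* index of the operation point in the VPS */
} NestingOperationPoints;

/* Parse the operation-point loop of the scalable nesting SEI message.
 * Descriptor widths are assumptions; illustrative sketch only. */
static void parse_nesting_operation_points(BitReader *br, NestingOperationPoints *ops)
{
    uint32_t nesting_num_ops_minus1 = read_ue(br);
    ops->num_ops = nesting_num_ops_minus1 + 1;
    for (uint32_t i = 0; i < ops->num_ops && i < MAX_NESTING_OPS; i++) {
        uint32_t nesting_max_temporal_id_plus1 = read_u(br, 3); /* assumed u(3) */
        ops->maxTemporalId[i] = (uint8_t)(nesting_max_temporal_id_plus1 - 1);
        ops->nesting_op_idx[i] = read_ue(br);                   /* assumed ue(v) */
    }
}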
In the example of Table 4, the nesting_zero_bit syntax element is equal to 0. The nesting_zero_bit syntax element may serve to ensure that the scalable nesting SEI message is byte aligned. The scalable nesting SEI message may be byte aligned when the number of bits in the scalable nesting SEI message is divisible by 8.
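Byte alignment of the kind provided by the nesting_zero_bit syntax element can be sketched as follows: zero-valued bits are appended until the number of bits written so far is divisible by 8. The bit-writer interface below is a hypothetical stand-in and is not part of any standardized API.

#include <stddef.h>

/* Hypothetical bit-writer interface. */
typedef struct BitWriter BitWriter;
extern void write_bit(BitWriter *bw, int bit);
extern size_t bits_written(const BitWriter *bw);

/* Append nesting_zero_bit (value 0) until the number of bits written so far is
 * divisible by 8, i.e., until the message is byte aligned. Illustrative sketch. */
static void write_nesting_zero_bits(BitWriter *bw)
{
    while (bits_written(bw) % 8 != 0)
        write_bit(bw, 0);
}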
Further, in the example of Table 4, the sei_message() syntax structures contain SEI messages. Thus, a device may decode, from the scalable nesting SEI message, a plurality of nested SEI messages encapsulated by the scalable nesting SEI message. Each of the nested SEI messages may be applicable to all of the operation points identified by the plurality of syntax elements (e.g., nesting_max_temporal_id_plus1[i], nesting_op_idx[i], etc.).
In an alternative example, the scalable nesting SEI message may follow the example syntax of table 5 below. In the example syntax of table 5, the scalable nesting SEI message may increase the efficiency of explicit coding of layer identifiers by using coding flags, according to one or more techniques of this disclosure.
TABLE 5 scalable nesting SEI message
In the example of Table 5, the italicized portions show the differences from HEVC Working Draft 8. As shown in Table 5, the bitstream_subset_flag, default_op_applicable_flag, nesting_num_ops_minus1, nesting_max_temporal_id_plus1, nesting_op_idx[i], and nesting_zero_bit syntax elements may have the same semantics as described above with respect to Table 4.
Further, in the example of Table 5, the variable minLayerId is set equal to nuh_reserved_zero_6bits + 1, where nuh_reserved_zero_6bits is from the NAL unit header of the current SEI NAL unit. The nesting_op_flag syntax element equal to 0 specifies that the set nestingLayerIdSet[0] is specified by the all_layers_flag syntax element and, when present, the nesting_layer_id_included_flag[i] syntax elements for all values of i in the range of 0 to nesting_max_layer_id - minLayerId - 1, inclusive. The nesting_op_flag syntax element equal to 1 specifies that the set nestingLayerIdSet[i] is specified by the nesting_op_idx[i] syntax element. When the nesting_op_flag syntax element is not present, the value of nesting_op_flag is inferred to be equal to 1.
In the example of Table 5, the all_layers_flag syntax element equal to 0 specifies that the set nestingLayerIdSet[0] consists of nestingLayerId[i] for all values of i in the range of 0 to nesting_max_layer_id - minLayerId, inclusive. The variable nestingLayerId[i] is described below. In the example of Table 5, all_layers_flag equal to 1 specifies that the set nestingLayerIdSet consists of all values of nuh_reserved_zero_6bits, present in the current access unit, that are greater than or equal to the nuh_reserved_zero_6bits syntax element of the current SEI NAL unit.
Furthermore, in the example of Table 5, the nesting_max_layer_id syntax element specifies the maximum value of nuh_reserved_zero_6bits in the set nestingLayerIdSet[0]. The nesting_layer_id_included_flag[i] syntax element equal to 1 specifies that the value of nuh_reserved_zero_6bits equal to i + minLayerId is included in the set nestingLayerIdSet[0]. The nesting_layer_id_included_flag[i] syntax element equal to 0 specifies that the value of nuh_reserved_zero_6bits equal to i + minLayerId is not included in the set nestingLayerIdSet[0].
The variables nestingNumLayersMinus1 and nestingLayerId[i], for i in the range of 0 to nestingNumLayersMinus1, inclusive, may be derived as follows:
for( i = 0, j = 0; i < nesting_max_layer_id; i++ )
    if( nesting_layer_id_included_flag[ i ] )
        nestingLayerId[ j++ ] = i + minLayerId
nestingLayerId[ j ] = nesting_max_layer_id
nestingNumLayersMinus1 = j
The set nestingLayerIdSet[0] may be set to consist of nestingLayerId[i] for all values of i in the range of 0 to nestingNumLayersMinus1, inclusive.
When the bitstream_subset_flag syntax element is equal to 0, the nested SEI messages may apply to NAL units having nuh_reserved_zero_6bits included in the set nestingLayerIdSet[0], or having nuh_reserved_zero_6bits equal to the nuh_reserved_zero_6bits syntax element of the current SEI NAL unit, and having nuh_temporal_id_plus1 in the range of the nuh_temporal_id_plus1 syntax element of the current SEI NAL unit to maxTemporalId[0] + 1, inclusive.
When the bitstream_subset_flag syntax element of the scalable nesting SEI message is equal to 1, the nested SEI messages may apply to the outputs of the sub-bitstream extraction process of sub-clause 10.1 with the input tIdTarget equal to maxTemporalId[i] and the input targetDecLayerIdSet equal to nestingLayerIdSet[i], for each value of i in the range of 0 to nesting_num_ops_minus1, inclusive, and, when default_op_applicable_flag is equal to 1, the nested SEI messages also apply to the default sub-bitstream.
FIG. 3 is a block diagram illustrating an example video decoder 30 configured to implement the techniques of this disclosure. Fig. 3 is provided for purposes of explanation and is not limiting of the techniques as broadly illustrated and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 30 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.
In the example of fig. 3, video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 160, and a decoded picture buffer 162. Prediction processing unit 152 includes a motion compensation unit 164 and an intra prediction processing unit 166. In other examples, video decoder 30 may include more, fewer, or different functional components.
Coded Picture Buffer (CPB) 151 may receive and store encoded video data (e.g., NAL units) of a bitstream. Entropy decoding unit 150 may receive NAL units from CPB 151 and parse the NAL units to decode syntax elements. Entropy decoding unit 150 may entropy decode entropy-encoded syntax elements in the NAL units. Prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, and filter unit 160 may generate decoded video data based on the syntax elements extracted from the bitstream.
NAL units of a bitstream may include coded slice NAL units. As part of decoding the bitstream, entropy decoding unit 150 may extract and entropy decode syntax elements from the coded slice NAL units. Each of the coded slices may include a slice header and slice data. The slice header may contain syntax elements for the slice. The syntax elements in the slice header may include syntax elements that identify a PPS associated with the picture containing the slice.
In addition to decoding syntax elements from the bitstream, video decoder 30 may perform a reconstruction operation on the non-partitioned CU. To perform a reconstruction operation on an undivided CU, video decoder 30 may perform a reconstruction operation on each TU of the CU. By performing a reconstruction operation for each TU of the CU, video decoder 30 may reconstruct the residual blocks of the CU.
As part of performing the reconstruction operation on the TUs of the CU, inverse quantization unit 154 may inverse quantize (i.e., dequantize) coefficient blocks associated with the TUs. Inverse quantization unit 154 may use the QP value associated with the CU of the TU to determine the degree of quantization and, likewise, the degree of inverse quantization applied by inverse quantization unit 154. That is, the compression ratio, i.e., the ratio of the number of bits used to represent the original sequence to the compressed sequence, can be controlled by adjusting the value of QP used when quantizing the transform coefficients. The compression ratio may also depend on the entropy coding method utilized.
After inverse quantization unit 154 inverse quantizes the coefficient block, inverse transform processing unit 156 may apply one or more inverse transforms to the coefficient block in order to generate a residual block associated with the TU. For example, the inverse transform processing unit 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotation transform, an inverse directional transform, or another inverse transform to the coefficient block.
If the PU is encoded using intra prediction, intra prediction processing unit 166 may perform intra prediction to generate a predictive block for the PU. Intra-prediction processing unit 166 may use the intra-prediction mode to generate predictive luma, Cb, and Cr blocks for the PU based on the prediction blocks of the spatially neighboring PUs. Intra-prediction processing unit 166 may determine the intra-prediction mode for the PU based on one or more syntax elements decoded from the bitstream.
The prediction processing unit 152 may construct a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) based on syntax elements extracted from the bitstream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 150 may extract motion information for the PU. Motion compensation unit 164 may determine one or more reference regions for the PU based on the motion information of the PU. Motion compensation unit 164 may generate, based on the sample blocks at the one or more reference blocks for the PU, predictive luma, Cb, and predictive Cr blocks for the PU.
Reconstruction unit 158 may use, as applicable, the luma, Cb, and Cr transform blocks associated with the TUs of the CU, and the predictive luma, Cb, and Cr blocks of the PUs of the CU (i.e., either intra-prediction data or inter-prediction data) to reconstruct the luma, Cb, and Cr coding blocks of the CU. For example, reconstruction unit 158 may add samples of the luma, Cb, and Cr transform blocks to corresponding samples of the predictive luma, Cb, and Cr blocks to reconstruct luma, Cb, and Cr coding blocks of the CU.
Filter unit 160 may perform deblocking operations to reduce block artifacts associated with the luma, Cb, and Cr coding blocks of a CU. Video decoder 30 may store the luma, Cb, and Cr coding blocks of the CU in decoded picture buffer 162. Decoded picture buffer 162 may provide reference pictures for subsequent motion compensation, intra prediction, and presentation on a display device, such as display device 32 of fig. 1. For example, video decoder 30 may perform intra-prediction or inter-prediction operations on PUs of other CUs based on the luma, Cb, and Cr blocks in decoded picture buffer 162. In this way, video decoder 30 may extract, from the bitstream, transform coefficient levels of a luma coefficient block, inverse quantize the transform coefficient levels, apply a transform to the transform coefficient levels to produce a transform block, generate a coding block based at least in part on the transform block, and output the coding block for display.
Fig. 4 is a flow diagram illustrating example operations 200 of video encoder 20 in accordance with one or more techniques of this disclosure. In the example of fig. 4, video encoder 20 may generate a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters (202). For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters in addition to a set of sub-layer-specific HRD parameter information specific to a particular sub-layer of the bitstream, wherein the common set of HRD parameters is common to all sub-layers of the bitstream. Further, video encoder 20 may signal the VPS in the bitstream (204).
Fig. 5 is a flow diagram illustrating example operations 250 of a device in accordance with one or more techniques of this disclosure. Operation 250 may be performed by video encoder 20, video decoder 30, or another device. As illustrated in the example of fig. 5, a device may decode, from a bitstream, a VPS that includes a plurality of HRD parameter syntax structures that each include HRD parameters (252). For each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures, the VPS further includes a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters.
Further, the device may perform an operation using the HRD parameters of at least one of the HRD parameter syntax structures (254). In some examples, the bitstream may include an operation point representation of a particular operation point to which a particular HRD parameter syntax structure may be applicable, and the device may perform the operation using the HRD parameters of the particular HRD parameter syntax structure. For example, a device may use the HRD parameters to perform a bitstream conformance test that determines whether the operation point to which the HRD parameter syntax structure applies conforms to a video coding standard (e.g., HEVC). In another example, the device may use the HRD parameters to perform a decoder conformance test.
The common set of HRD parameters may be common to all sub-layers of the bitstream. In some examples, the HRD parameters of each HRD parameter syntax structure include a set of sub-layer-specific HRD parameters that are specific to a particular sub-layer of the bitstream. In some examples, each of the sets of sub-layer-specific HRD parameters includes syntax elements such as a syntax element indicating a temporal distance between the HRD output times of any two consecutive pictures in output order and a syntax element indicating a number of alternative coded picture buffer specifications in the bitstream of the coded video sequence. In some examples, when the device determines that the particular HRD parameter syntax structure does not include the common set of HRD parameters, the device may perform operations using the common set of HRD parameters associated with a previous HRD parameter syntax structure and the set of sub-layer-specific HRD parameters of the particular HRD parameter syntax structure.
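Continuing the illustrative types of the previous sketch, the following function shows one way a device might resolve the common HRD parameters in effect for a given HRD parameter syntax structure when that structure omits them, by falling back to the closest preceding structure in the VPS that carries a common set. The function is an assumption made for illustration and does not describe a normative decoding process.

#include <cstddef>

// Illustrative only; reuses the VideoParameterSet and HrdParamSyntaxStructure
// types from the previous sketch. Returns the common HRD parameters in effect
// for the structure at index idx, or nullptr if none are available.
const CommonHrdParams* resolveCommonParams(const VideoParameterSet& vps,
                                           std::size_t idx) {
    // Walk backwards from the requested structure toward the start of the VPS.
    for (std::size_t i = idx + 1; i-- > 0; ) {
        const auto& common = vps.hrdParamStructures[i].commonParams;
        if (common.has_value()) {
            return &*common;
        }
    }
    // Not expected for a well-formed VPS, since the first HRD parameter
    // syntax structure always carries the common set.
    return nullptr;
}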
Fig. 6 is a flow diagram illustrating example operations 300 of video encoder 20 in accordance with one or more techniques of this disclosure. As illustrated in the example of fig. 6, video encoder 20 may generate a scalable nesting SEI message that includes a plurality of syntax elements that identify a plurality of operation points to which a nested SEI message encapsulated by the scalable nesting SEI message applies (302). Further, video encoder 20 may signal a scalable nesting SEI message in the bitstream (304).
Fig. 7 is a flow diagram illustrating example operations 350 of a device in accordance with one or more techniques of this disclosure. Video encoder 20, video decoder 30, or another device may perform operation 350. As illustrated in the example of fig. 7, a device may decode, from a scalable nesting SEI message, a plurality of syntax elements that identify a plurality of operation points to which a nested SEI message encapsulated by the scalable nesting SEI message applies (352). In some examples, the device may decode, from the scalable nesting SEI message, a syntax element (e.g., nesting_num_ops_minus1) that indicates whether the scalable nesting SEI message includes a plurality of syntax elements that identify operation points.
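The following C++ sketch illustrates how the operation point identifiers described above might be read from a scalable nesting SEI message. The bit-reader interface and its method names are assumptions of this sketch, the 3-bit width assumed for the temporal identifier field is likewise an assumption, and only the few syntax elements mentioned in this description are handled.

#include <cstdint>
#include <vector>

// Hypothetical bit-reader interface; readBits is assumed to parse a
// fixed-length u(n) value and readUE an Exp-Golomb ue(v) value.
class BitReader {
public:
    virtual ~BitReader() = default;
    virtual uint32_t readBits(unsigned numBits) = 0;  // u(n)
    virtual uint32_t readUE() = 0;                    // ue(v)
};

struct NestedOperationPoint {
    uint32_t maxTemporalIdPlus1;  // nesting_max_temporal_id_plus1[i]
    uint32_t opIdx;               // nesting_op_idx[i]
};

// Sketch of reading the list of operation points to which the nested SEI
// messages apply; not a complete scalable nesting SEI message parser.
std::vector<NestedOperationPoint> readNestedOperationPoints(BitReader& reader) {
    std::vector<NestedOperationPoint> ops;
    const uint32_t numOpsMinus1 = reader.readUE();    // nesting_num_ops_minus1
    for (uint32_t i = 0; i <= numOpsMinus1; ++i) {
        NestedOperationPoint op;
        op.maxTemporalIdPlus1 = reader.readBits(3);   // 3-bit width assumed
        op.opIdx = reader.readUE();
        ops.push_back(op);
    }
    return ops;
}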
Further, the device may use one or more syntax elements of the nested SEI message to perform operations with respect to any of the operation points to which the nested SEI message applies (354). For example, a device may use syntax elements of the nested SEI message in a bitstream conformance test that determines whether any of the operation points to which the nested SEI message applies conform to a video coding standard (e.g., HEVC). In another example, a device may use syntax elements of the nested SEI message to perform a decoder conformance test.
Fig. 8 is a flow diagram illustrating example operations 400 of video encoder 20 in accordance with one or more techniques of this disclosure. As illustrated in the example of fig. 8, in the scalable nesting SEI message encapsulated by the SEI NAL unit, video encoder 20 may include a syntax element (e.g., default_op_applicable_flag) that indicates whether the nested SEI message encapsulated by the scalable nesting SEI message applies to the default sub-bitstream (402). The default sub-bitstream is an operation point representation of an operation point defined by a layer identifier specified in a NAL unit header of the SEI NAL unit, and a temporal identifier specified in the NAL unit header. A first syntax element in the NAL unit header (e.g., nuh_reserved_zero_6bits) may indicate a layer identifier and a second syntax element in the NAL unit header (e.g., nuh_reserved_temporal_id_plus1) may indicate a temporal identifier.
In the example of fig. 8, in the scalable nesting SEI message, video encoder 20 may include one or more additional syntax elements that identify temporal identifiers of the additional operation points and maximum layer identifiers of the additional operation points (404). Further, video encoder 20 may signal the scalable nesting SEI message in the bitstream (406). In some examples, the syntax element that indicates whether the nested SEI message encapsulated by the scalable nesting SEI message applies to the default sub-bitstream may be referred to as a first syntax element, and video encoder 20 may include a second syntax element (e.g., bitstream_subset_flag) in the scalable nesting SEI message. The second syntax element may indicate whether a nested SEI message encapsulated by the scalable nesting SEI message applies to sub-bitstreams extracted from the bitstream, or whether the nested SEI message applies to specific layers and sub-layers of the bitstream. Video encoder 20 may include the first syntax element only when the second syntax element indicates that the nested SEI message applies to sub-bitstreams extracted from the bitstream.
Fig. 9 is a flow diagram illustrating example operations 450 of a device in accordance with one or more techniques of this disclosure. Video encoder 20, video decoder 30, or another device may perform operation 450. As illustrated in the example of fig. 9, a device may determine whether a nested SEI message encapsulated by the scalable nesting SEI message is applicable to a sub-bitstream extracted from the bitstream based at least in part on a first syntax element (e.g., bitstream_subset_flag) of the scalable nesting SEI message (452). In response to determining that the nested SEI message encapsulated by the scalable nesting SEI message applies to the sub-bitstream extracted from the bitstream ("yes" of 452), the device may decode a default operation point syntax element (e.g., default_op_applicable_flag) in the scalable nesting SEI message (454). The default operation point syntax element may indicate whether the nested SEI message encapsulated by the scalable nesting SEI message applies to the default sub-bitstream.
The default sub-bitstream may be an operation point representation of an operation point defined by a layer identifier specified in a NAL unit header of the SEI NAL unit, and a temporal identifier specified in the NAL unit header. In some examples, a first syntax element (e.g., nuh_reserved_zero_6bits) in the NAL unit header indicates the layer identifier and a second syntax element (e.g., nuh_reserved_temporal_id_plus1) in the NAL unit header indicates the temporal identifier. The default sub-bitstream may be a subset of the bitstream, and the default sub-bitstream does not include VCL NAL units of the bitstream that have a layer identifier greater than the layer identifier indicated by the first syntax element of the NAL unit header or a temporal identifier greater than the temporal identifier indicated by the second syntax element of the NAL unit header.
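The exclusion rule just described can be pictured with the following C++ sketch, which keeps every NAL unit except VCL NAL units whose layer identifier or temporal identifier exceeds the corresponding identifier from the SEI NAL unit header. The NalUnit structure and its field names are assumptions of this sketch, and actual sub-bitstream extraction involves additional rules not shown here.

#include <cstdint>
#include <vector>

// Minimal stand-in for a parsed NAL unit; the field names are assumptions.
struct NalUnit {
    bool isVcl = false;       // true for VCL NAL units
    uint32_t layerId = 0;     // layer identifier from the NAL unit header
    uint32_t temporalId = 0;  // temporal identifier from the NAL unit header
    std::vector<uint8_t> payload;
};

// Sketch of forming the default sub-bitstream relative to an SEI NAL unit:
// VCL NAL units with a layer identifier greater than seiLayerId or a temporal
// identifier greater than seiTemporalId are excluded; all other NAL units are
// kept. Illustrative only.
std::vector<NalUnit> extractDefaultSubBitstream(const std::vector<NalUnit>& bitstream,
                                                uint32_t seiLayerId,
                                                uint32_t seiTemporalId) {
    std::vector<NalUnit> subBitstream;
    for (const NalUnit& nal : bitstream) {
        const bool excluded =
            nal.isVcl && (nal.layerId > seiLayerId || nal.temporalId > seiTemporalId);
        if (!excluded) {
            subBitstream.push_back(nal);
        }
    }
    return subBitstream;
}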
Further, the device may determine whether the nested SEI message encapsulated by the scalable nesting SEI message applies to a default sub-bitstream of the bitstream based at least in part on a syntax element (e.g., default_op_applicable_flag) in the scalable nesting SEI message encapsulated by the SEI NAL unit (456). In some examples, the scalable nesting SEI message encapsulates multiple nested SEI messages. In these examples, the device may determine, based on a syntax element (e.g., default_op_applicable_flag), whether each of the nested SEI messages in the scalable nesting SEI message is applicable to a default sub-bitstream.
When the nested SEI message applies to the default sub-bitstream ("yes" of 456), the device may use the nested SEI message in an operation on the default sub-bitstream (458). For example, the nested SEI message may include a set of HRD parameters. In this example, the device may use the HRD parameters in the nested SEI message in an operation that tests whether the default sub-bitstream conforms to a video coding standard (e.g., HEVC). In another example, the device may use the HRD parameters in the nested SEI message in a decoder conformance test. In another example, the device may use nested SEI messages in decoding operations on the default sub-bitstream. In another example, the initial CPB removal delay may be used to guide a system in establishing an appropriate initial end-to-end delay, and when video is transported over RTP, DPB output times may be used to derive RTP timestamps.
Otherwise, when the nested SEI message does not apply to the default sub-bitstream ("no" of 456), or when the nested SEI message does not apply to sub-bitstreams extracted from the bitstream ("no" of 452), the device does not use the nested SEI message in an operation on the default sub-bitstream (460). For example, the device may determine the temporal identifier of a second operation point and the maximum layer identifier of the second operation point based on one or more additional syntax elements (e.g., nesting_max_temporal_id_plus1[i], nesting_op_idx[i], etc.) in the scalable nesting SEI message. In this example, the device may use the nested SEI message in an operation on an additional sub-bitstream, which is the operation point representation of the second operation point.
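The decision flow of fig. 9 can be summarized, again only as an illustrative sketch, by the following C++ function. The structure and flag names loosely mirror the bitstream_subset_flag and default_op_applicable_flag syntax elements discussed above, but the types themselves are assumptions of this sketch rather than a normative decoding process.

// Illustrative summary of the decision at steps 452 and 456 of fig. 9.
struct ScalableNestingInfo {
    bool bitstreamSubsetFlag;      // nested SEI messages target extracted sub-bitstreams
    bool defaultOpApplicableFlag;  // nested SEI messages apply to the default sub-bitstream
};

// Returns true when the nested SEI messages should be used in operations on
// the default sub-bitstream; otherwise they are not used for that sub-bitstream.
bool nestedSeiAppliesToDefaultSubBitstream(const ScalableNestingInfo& nesting) {
    // The default operation point flag is only examined when the scalable
    // nesting SEI message targets sub-bitstreams extracted from the bitstream.
    return nesting.bitstreamSubsetFlag && nesting.defaultOpApplicableFlag;
}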
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium generally may correspond to (1) a tangible computer-readable storage medium that is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Further, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but such components, modules, or units do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (32)

1. A method of decoding video data, the method comprising:
decoding, from an encoded video bitstream, a Video Parameter Set (VPS) that includes a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of the encoded video bitstream, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
performing an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
2. The method of claim 1, further comprising:
determining that a particular HRD parameter syntax structure of the plurality of HRD parameter syntax structures does not include the common set of HRD parameters, and
wherein performing the operation using the HRD parameters of at least one of the HRD parameter syntax structures comprises performing the operation using the common set of HRD parameters included in a previous HRD parameter syntax structure of the plurality of HRD parameter syntax structures and additional HRD parameters of the particular HRD parameter syntax structure.
3. The method of claim 1, wherein each of the HRD parameter syntax structures always includes a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the encoded video bitstream.
4. The method of claim 1, wherein a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS.
5. The method of claim 1, wherein at least one HRD parameter syntax structure included in the HRD parameter syntax structures in the VPS does not include the common set of HRD parameters.
6. The method of claim 1, wherein each of the HRD parameter syntax structures applies to an operation point of the encoded video bitstream.
7. The method of claim 1, wherein at least one of the HRD parameter syntax structures applies to a plurality of operation points of the encoded video bitstream.
8. A video decoding device, comprising:
a memory configured to store data, the data comprising an encoded video bitstream; and
one or more processors configured to:
decode, from the encoded video bitstream, a Video Parameter Set (VPS) that includes a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of the encoded video bitstream, wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
perform an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
9. The video decoding device of claim 8, wherein the one or more processors are further configured to:
determine that a particular HRD parameter syntax structure of the plurality of HRD parameter syntax structures does not include the common set of HRD parameters, and
perform the operation using the common set of HRD parameters included in a previous HRD parameter syntax structure of the plurality of HRD parameter syntax structures and additional HRD parameters of the particular HRD parameter syntax structure.
10. The video decoding device of claim 8, wherein each of the HRD parameter syntax structures always includes a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the encoded video bitstream.
11. The video decoding device of claim 8, wherein a subsequent HRD parameter syntax structure included in the VPS includes a different common set of HRD parameters than the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS.
12. The video decoding device of claim 8, wherein at least one of the HRD parameter syntax structures included in the VPS does not include the common set of HRD parameters.
13. The video decoding device of claim 8, wherein each of the HRD parameter syntax structures is applicable to an operation point of the encoded video bitstream.
14. The video decoding device of claim 8, wherein at least one of the HRD parameter syntax structures applies to a plurality of operation points of the encoded video bitstream.
15. A video decoding device, comprising:
means for decoding, from an encoded video bitstream, a Video Parameter Set (VPS) that includes a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of the encoded video bitstream, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
means for performing an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
16. The video decoding device of claim 15, wherein:
a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS, and
each of the HRD parameter syntax structures is applicable to an operation point of the encoded video bitstream.
17. A non-transitory computer-readable storage medium storing instructions that, when executed by a video decoding device, configure the video decoding device to:
decode, from an encoded video bitstream, a Video Parameter Set (VPS) that includes a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of the encoded video bitstream, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
perform an operation using the HRD parameters of at least one of the HRD parameter syntax structures.
18. The non-transitory computer-readable storage medium of claim 17, wherein:
a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS, and
each of the HRD parameter syntax structures is applicable to an operation point of the encoded video bitstream.
19. A method of encoding video data, the method comprising:
generating a Video Parameter Set (VPS) including a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of an encoded video bitstream including encoded pictures of the video data, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
signaling the VPS in the encoded video bitstream.
20. The method of claim 19, wherein each of the HRD parameter syntax structures always includes a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the encoded video bitstream.
21. The method of claim 19, wherein a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS.
22. The method of claim 19, wherein at least one HRD parameter syntax structure in the HRD parameter syntax structures included in the VPS does not include the common set of HRD parameters.
23. A video encoding device, comprising:
a data storage medium configured to store video data; and
one or more processors configured to:
generate a Video Parameter Set (VPS) including a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of an encoded video bitstream that includes encoded pictures of the video data, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
signal the VPS in the encoded video bitstream.
24. The video encoding device of claim 23, wherein each of the HRD parameter syntax structures always includes a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the encoded video bitstream.
25. The video encoding device of claim 23, wherein a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS.
26. The video encoding device of claim 23, wherein at least one HRD parameter syntax structure in the HRD parameter syntax structures included in the VPS does not include the common set of HRD parameters.
27. A video encoding device, comprising:
means for generating a Video Parameter Set (VPS) including a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element indicating whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of an encoded video bitstream that includes encoded pictures of video data, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
means for signaling the VPS in the encoded video bitstream.
28. The video encoding device of claim 27, wherein:
a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS, and
each of the HRD parameter syntax structures is applicable to an operation point of the encoded video bitstream.
29. The video encoding device of claim 27, wherein each of the HRD parameter syntax structures always includes a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the encoded video bitstream.
30. A non-transitory computer-readable storage medium storing instructions that, when executed by a video encoding device, configure the video encoding device to:
generate a Video Parameter Set (VPS) including a plurality of Hypothetical Reference Decoder (HRD) parameter syntax structures that each include HRD parameters, wherein for each respective HRD parameter syntax structure of the plurality of HRD parameter syntax structures other than a first HRD parameter syntax structure included in the VPS, the VPS further includes a syntax element that indicates whether the HRD parameters of the respective HRD parameter syntax structure include a common set of HRD parameters, wherein the common set of HRD parameters is common to all sub-layers of an encoded video bitstream including encoded pictures of video data, and wherein the first HRD parameter syntax structure included in the VPS includes the common set of HRD parameters; and
signal the VPS in the encoded video bitstream.
31. The non-transitory computer-readable storage medium of claim 30, wherein:
a subsequent HRD parameter syntax structure included in the VPS includes a common set of HRD parameters that is different from the common set of HRD parameters included in the first HRD parameter syntax structure included in the VPS, and
each of the HRD parameter syntax structures is applicable to an operation point of the encoded video bitstream.
32. The non-transitory computer-readable storage medium of claim 30, wherein each of the HRD parameter syntax structures always includes a set of sub-layer-specific HRD parameter information that is specific to a particular sub-layer of the encoded video bitstream.