
HK1198236B - Indication of use of wavefront parallel processing in video coding - Google Patents


Info

Publication number
HK1198236B
HK1198236B (application HK14111634.3A)
Authority
HK
Hong Kong
Prior art keywords
picture
syntax element
tile
ctbs
ctb
Prior art date
Application number
HK14111634.3A
Other languages
Chinese (zh)
Other versions
HK1198236A1 (en)
Inventor
王益魁 (Ye-Kui Wang)
穆哈默德.蔡德.科班 (Muhammed Zeyd Coban)
Original Assignee
高通股份有限公司 (Qualcomm Incorporated)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from U.S. Application No. 13/718,883 (US 9,332,259 B2)
Application filed by 高通股份有限公司 (Qualcomm Incorporated)
Publication of HK1198236A1
Publication of HK1198236B


Description

Indication of the use of wavefront parallel processing in video coding
This application claims the benefit of U.S. Provisional Patent Application No. 61/588,096, filed January 18, 2012, the entire content of which is incorporated herein by reference.
Technical Field
This disclosure relates to video coding (i.e., encoding and/or decoding of video data).
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 part 10 (advanced video coding (AVC)), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions of these standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in inter-coded (P or B) slices of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial or temporal prediction generates a predictive block for the block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. The intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, producing residual coefficients, which may then be quantized. Quantized coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate one-dimensional vectors of coefficients, and entropy coding may be applied to achieve even more compression.
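The scanning of a quantized two-dimensional coefficient block into a one-dimensional vector can be illustrated with a small sketch. This is only an illustrative anti-diagonal scan on a 4x4 block; the scan order, block size, and function name are assumptions for explanation, not the scan defined by any particular standard.

```python
def diagonal_scan(block):
    """Scan a square 2-D coefficient block into a 1-D list, one
    anti-diagonal at a time, starting at the top-left (DC) position."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):        # s = row + col identifies an anti-diagonal
        for row in range(n):
            col = s - row
            if 0 <= col < n:
                out.append(block[row][col])
    return out

# Typical quantized block: large low-frequency values at the top-left.
coeffs = [
    [9, 3, 1, 0],
    [4, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
vector = diagonal_scan(coeffs)
```

Because the significant coefficients cluster at the front of the vector and the trailing entries are zeros, the scanned vector is well suited to subsequent entropy coding.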
Disclosure of Invention
In general, techniques are described for video coding in which combinations of tiles within a single picture and Wavefront Parallel Processing (WPP) are not allowed. More specifically, a video encoder generates a bitstream that includes a syntax element that indicates whether a picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. In the second coding mode, each tile of the picture is encoded without using WPP. A video decoder parses the syntax element from the bitstream and determines whether the syntax element has a particular value. In response to determining that the syntax element has the particular value, the video decoder completely decodes the picture using WPP. In response to determining that the syntax element does not have the particular value, the video decoder decodes each tile of the picture without using WPP.
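The decoder-side behavior described above can be sketched as a single branch on the parsed value. The names `decode_picture` and `WPP_MODE` and the stub functions are hypothetical, for illustration only; they are not syntax or API from the HEVC draft.

```python
WPP_MODE = 1  # hypothetical stand-in for the "particular value" of the syntax element

def decode_picture(coding_mode, picture):
    """Branch on the parsed syntax element value as described above."""
    if coding_mode == WPP_MODE:
        # First coding mode: the picture is decoded in its entirety using WPP.
        return decode_entire_picture_with_wpp(picture)
    # Second coding mode: decode tile by tile, without WPP.
    # A picture always has one or more tiles.
    return [decode_tile_without_wpp(tile) for tile in picture["tiles"]]

# Stubs so the control flow is runnable:
def decode_entire_picture_with_wpp(picture):
    return "wpp"

def decode_tile_without_wpp(tile):
    return "tile:" + tile
```

The point of the single syntax element is that exactly one of the two paths is ever taken for a given picture, so a decoder never has to support tiles and WPP simultaneously within one picture.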
In one aspect, this disclosure describes a method for decoding video data. The method includes parsing a syntax element from a bitstream that includes a coded representation of a picture in the video data. Additionally, the method includes, in response to determining that the syntax element has a particular value, decoding the picture in its entirety using WPP. The method also includes, in response to determining that the syntax element does not have the particular value, decoding each tile of the picture without using WPP, wherein the picture has one or more tiles.
In another aspect, this disclosure describes a method for encoding video data. The method includes generating a bitstream that includes a syntax element that indicates whether a picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. In the second coding mode, each tile of the picture is encoded without using WPP, wherein the picture has one or more tiles.
In another aspect, this disclosure describes a video decoding device comprising one or more processors configured to parse a syntax element from a bitstream that includes a coded representation of a picture in video data. The one or more processors are configured to, in response to determining that the syntax element has a particular value, decode the picture in its entirety using WPP. Additionally, the one or more processors are configured to, in response to determining that the syntax element does not have the particular value, decode each tile of the picture without using WPP, wherein the picture has one or more tiles.
In another aspect, this disclosure describes a video encoding device comprising one or more processors configured to generate a bitstream that includes a syntax element indicating whether a picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. In the second coding mode, each tile of the picture is encoded without using WPP, wherein the picture has one or more tiles.
In another aspect, this disclosure describes a video decoding device comprising means for parsing a syntax element from a bitstream that includes a coded representation of a picture in video data. The video decoding device also comprises means for decoding the picture in its entirety using WPP in response to determining that the syntax element has a particular value. Additionally, the video decoding device comprises means for decoding each tile of the picture without using WPP in response to determining that the syntax element does not have the particular value, wherein the picture has one or more tiles.
In another aspect, this disclosure describes a video encoding device comprising means for generating a bitstream that includes a syntax element that indicates whether a picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. In the second coding mode, each tile of the picture is encoded without using WPP, wherein the picture has one or more tiles.
In another aspect, this disclosure describes a computer-readable storage medium storing instructions that, when executed by one or more processors of a video decoding device, configure the video decoding device to parse a syntax element from a bitstream that includes a coded representation of a picture in video data. The instructions also cause the video decoding device to, in response to determining that the syntax element has a particular value, decode the picture in its entirety using WPP. Additionally, the instructions cause the video decoding device to, in response to determining that the syntax element does not have the particular value, decode each tile of the picture without using WPP, wherein the picture has one or more tiles.
In another aspect, a computer-readable storage medium stores instructions that, when executed by one or more processors of a video encoding device, configure the video encoding device to generate a bitstream that includes a syntax element indicating whether a picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. In the second coding mode, each tile of the picture is encoded without using WPP, wherein the picture has one or more tiles.
The details of one or more examples of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating an example video coding system that may utilize the techniques described in this disclosure.
FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 4 is a flow diagram illustrating an example operation of a video encoder for encoding video data in which combinations of tiles and Wavefront Parallel Processing (WPP) within a single picture are not allowed, in accordance with one or more aspects of this disclosure.
FIG. 5 is a flow diagram illustrating an example operation of a video decoder for decoding video data in which combinations of tiles and WPP within a single picture are not allowed, in accordance with one or more aspects of this disclosure.
FIG. 6 is a flow diagram illustrating another example operation of a video decoder for decoding video data in which combinations of tiles and WPP within a single picture are not allowed, in accordance with one or more aspects of this disclosure.
FIG. 7 is a flow diagram illustrating an example operation of a video encoder for encoding video data in which each row of coding tree blocks (CTBs) of a picture is in a separate sub-stream, in accordance with one or more aspects of this disclosure.
FIG. 8 is a flow diagram illustrating an example operation of a video decoder for decoding video data in which each row of CTBs of a picture is in a separate sub-stream, in accordance with one or more aspects of this disclosure.
FIG. 9A is a flow diagram illustrating a first portion of an example Context Adaptive Binary Arithmetic Coding (CABAC) parsing process for parsing slice data, in accordance with one or more aspects of this disclosure.
FIG. 9B is a flow diagram illustrating a continuation of the example CABAC parsing process of FIG. 9A.
FIG. 10 is a conceptual diagram illustrating an example of WPP.
FIG. 11 is a conceptual diagram illustrating an example coding order when a picture is partitioned into multiple tiles.
Detailed Description
During video coding, a picture may be partitioned into multiple tiles, Wavefront Parallel Processing (WPP) waves, and/or entropy slices. The tiles of a picture are defined by horizontal and/or vertical tile boundaries that cross the picture. The tiles of a picture are coded according to a raster scan order, and the Coding Tree Blocks (CTBs) within each tile are also coded according to the raster scan order. In WPP, each row of CTBs in a picture is a "WPP wave." When a video coder uses WPP to code a picture, the video coder may begin coding the CTBs of a WPP wave, from left to right, after the video coder has coded two or more CTBs of the WPP wave immediately above. An entropy slice may comprise a series of CTBs that are consecutive in raster scan order. Use of information from across entropy slice boundaries is prohibited when selecting entropy coding contexts, but may be allowed for other purposes.
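The WPP start condition described above (a wave may proceed only while the wave immediately above stays at least two CTBs ahead) can be sketched as a readiness check. The function and its bookkeeping are illustrative assumptions, not part of any standard text.

```python
def ctb_ready(r, c, done_cols, width):
    """Return True if CTB at wave (row) r, column c may be coded next.

    done_cols[k] is the number of CTBs already coded in wave k;
    width is the picture width in CTBs.
    """
    if done_cols[r] != c:              # CTBs within a wave go left to right
        return False
    if r == 0:
        return True                    # the top wave has no dependency
    # The wave above must have finished the CTB two columns ahead
    # (clamped at the right picture edge).
    return done_cols[r - 1] >= min(c + 2, width)
```

With this rule, every wave can run on its own thread: wave r simply waits until wave r-1 has banked its two-CTB lead, which is what allows several CTB rows of a picture to be coded in parallel.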
In existing video coding systems, a picture may have any combination of tiles, WPP waves, and entropy slices. For example, a picture may be partitioned into multiple tiles. In this example, the CTBs in some of the tiles may be coded according to a raster scan order, while the CTBs in others of the tiles may be coded using WPP. Allowing pictures to include combinations of tiles, WPP waves, and entropy slices may unnecessarily increase the implementation complexity and cost of such video coding systems.
The techniques of this disclosure may address this problem. That is, in accordance with the techniques of this disclosure, no combination of any two or more of tiles, WPP waves, and entropy slices is allowed within a picture. For example, a video encoder may generate a bitstream that includes a syntax element that indicates whether a picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. In the second coding mode, the picture has one or more tiles, and each tile of the picture is encoded without using WPP.
Moreover, in this example, a video decoder may parse the syntax element from the bitstream that includes the coded representation of the picture. In response to determining that the syntax element has a particular value, the video decoder may decode the picture in its entirety using WPP. In response to determining that the syntax element does not have the particular value, the video decoder may decode each tile of the picture without using WPP. The picture may have one or more tiles.
The figures illustrate examples. Elements indicated by reference numerals in the drawings correspond to elements indicated by like reference numerals in the following description. In the present disclosure, elements having names beginning with ordinal words (e.g., "first," "second," "third," etc.) do not necessarily imply a particular order to the elements. Rather, these ordinal words are used only to refer to different elements of the same or similar type.
FIG. 1 is a block diagram illustrating an example video coding system 10 that may utilize techniques of this disclosure. As used herein, the term "video coder" refers generally to both video encoders and video decoders. In this disclosure, the term "video coding" or "coding" may generally refer to video encoding or video decoding.
As shown in fig. 1, video coding system 10 includes a source device 12 and a destination device 14. Source device 12 generates encoded video data. Accordingly, source device 12 may be referred to as a video encoding device or a video encoding apparatus. Destination device 14 may decode the encoded video data generated by source device 12. Destination device 14 may, therefore, be referred to as a video decoding device or a video decoding apparatus. Source device 12 and destination device 14 may be examples of video coding devices or video coding apparatuses. Source device 12 and destination device 14 may comprise a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
Destination device 14 may receive encoded video data from source device 12 over channel 16. Channel 16 may comprise one or more media and/or devices capable of moving encoded video data from source device 12 to destination device 14. In one example, channel 16 may comprise one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In this example, source device 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 14. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may include routers, switches, base stations, or other equipment that facilitates communication from source device 12 to destination device 14.
In another example, channel 16 may include a storage medium that stores encoded video data generated by source device 12. In this example, destination device 14 may access the storage medium via disk access or card access. The storage medium may comprise a variety of locally accessed data storage media, such as Blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data.
In another example, channel 16 may include a file server or another intermediate storage device that stores the encoded video data generated by source device 12. In this example, destination device 14 may access encoded video data stored at the file server or other intermediate storage device via streaming or download. The file server may be a type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, and local disk drives.
Destination device 14 may access the encoded video data over a standard data connection, such as an internet connection. Example types of data connections include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both wireless and wired connections suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the file server may be a streaming transmission, a download transmission, or a combination of both streaming and download transmissions.
The techniques of this disclosure are not limited to wireless applications or settings. The techniques may be applied to video coding to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding video data for storage on a data storage medium, decoding video data stored on a data storage medium, or other applications. In some examples, video coding system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some examples, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. Video source 18 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface that receives video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these video data sources.
Video encoder 20 may encode video data from video source 18. In some examples, source device 12 transmits the encoded video data directly to destination device 14 via output interface 22. The encoded video data may also be stored onto a storage medium or file server for later access by destination device 14 for decoding and/or playback.
In the example of fig. 1, destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some examples, input interface 28 includes a receiver and/or a modem. Input interface 28 may receive encoded video data over channel 16. The display device 32 may be integrated with the destination device 14 or may be external to the destination device 14. In general, display device 32 displays the decoded video data. The display device 32 may include a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard currently under development, and may conform to the HEVC Test Model (HM). A draft of the upcoming HEVC standard, referred to as "HEVC Working Draft 5" or "WD5," is described in Bross et al., "WD5: Working Draft 5 of High-Efficiency Video Coding," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting, Geneva, Switzerland, November 2011, which as of 10 October 2012 was downloadable from http://phenix.int-evry.fr/jct/doc_end_user/documents/7_Geneva/wg11/JCTVC-G1103-v3.zip, the entire content of which is incorporated herein by reference. Another draft of the upcoming HEVC standard, referred to as "HEVC Working Draft 9," is described in Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 9," JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 11th Meeting, Shanghai, China, October 2012, downloadable from http://phenix.int-evry.fr/jct/doc_end_user/documents/11_Shanghai/wg11/JCTVC-K1003-v8.zip, the entire content of which is incorporated herein by reference.
Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. However, the techniques of this disclosure are not limited to any particular coding standard or technique.
Moreover, fig. 1 is merely an example and the techniques of this disclosure may be applied to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, the data is retrieved from local memory, streamed over a network, or the like. The encoding device may encode and store data to memory, and/or the decoding device may retrieve data from memory and decode data. In many examples, the encoding and decoding are performed by devices that do not communicate with each other, but simply encode data to and/or retrieve data from memory and decode the data.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable circuits, such as: one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented in part in software, the device may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be considered as one or more processors. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated in a respective device as part of a combined encoder/decoder (CODEC).
This disclosure may generally refer to video encoder 20 "signaling" certain information to another device, such as video decoder 30. The term "signaling" may generally refer to the communication of syntax elements and/or other data representing encoded video data. Such communication may occur in real time or near real time. Alternatively, such communication may occur over a span of time, such as when syntax elements are stored to a computer-readable storage medium in an encoded bitstream at the time of encoding; the syntax elements may then be retrieved by a decoding device at any time after being stored to this medium.
As briefly mentioned above, video encoder 20 encodes video data. The video data may include one or more pictures. Each of the pictures may be a still image. In some examples, a picture may be referred to as a video "frame." Video encoder 20 may generate a bitstream that includes a sequence of bits that forms a coded representation of the video data. The bitstream may include coded pictures and associated data. A coded picture is a coded representation of a picture. The associated data may include Sequence Parameter Sets (SPSs), Picture Parameter Sets (PPSs), and other syntax structures. An SPS may contain parameters applicable to zero or more sequences of pictures. A PPS may contain parameters applicable to zero or more pictures.
To generate an encoded representation of a picture, video encoder 20 may partition the picture into a grid of Coding Tree Blocks (CTBs). In some examples, a CTB may be referred to as a "treeblock," a "largest coding unit" (LCU), or a "coding tree unit." The CTBs of HEVC may be broadly similar to the macroblocks of previous standards, such as H.264/AVC. However, a CTB is not necessarily limited to a particular size and may include one or more Coding Units (CUs).
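The CTB grid arithmetic implied above can be made concrete with a small sketch: the picture is covered by equal-size CTBs, with partial blocks at the right and bottom edges rounded up to whole CTBs. The 64-sample CTB size used here is just an example value, not a requirement.

```python
import math

def ctb_grid(pic_width, pic_height, ctb_size=64):
    """Return (columns, rows, total) of CTBs covering the picture."""
    cols = math.ceil(pic_width / ctb_size)
    rows = math.ceil(pic_height / ctb_size)
    return cols, rows, cols * rows

cols, rows, total = ctb_grid(1920, 1080)   # 1080p with 64x64 CTBs
```

Note that 1080 is not a multiple of 64, so the bottom row of CTBs extends past the picture boundary; the rounding up accounts for such partially covered CTBs.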
Each of the CTBs may be associated with different blocks of pixels of equal size within the picture. Each pixel may include one luma sample and two chroma samples. Thus, each CTB may be associated with one block of luma samples and two blocks of chroma samples. For ease of explanation, this disclosure may refer to the two-dimensional array of pixels as a block of pixels and may refer to the two-dimensional array of samples as a block of samples. Video encoder 20 may use quadtree partitioning to partition a block of pixels associated with a CTB into blocks of pixels associated with a CU (hence the name "coding tree block").
CTBs of a picture may be grouped into one or more slices. In some examples, each of the slices includes an integer number of CTBs. As part of encoding the picture, video encoder 20 may generate an encoded representation (i.e., a coded slice) of each slice of the picture. To generate a coded slice, video encoder 20 may encode each CTB of the slice to generate an encoded representation (i.e., a coded CTB) of each of the CTBs of the slice.
To generate a coded CTB, video encoder 20 may recursively perform a quadtree partitioning of a block of pixels associated with the CTB to divide the block of pixels into progressively smaller blocks of pixels. Each of the smaller blocks of pixels may be associated with a CU. A partitioned CU may be a CU whose block of pixels is partitioned into blocks of pixels associated with other CUs. An undivided CU may be a CU whose block of pixels is undivided into blocks of pixels associated with other CUs.
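The recursive quadtree split described above can be sketched as follows. The split-decision callback and the sizes are purely illustrative assumptions; a real encoder decides splits by rate-distortion cost, and only block geometry (not pixel data) is tracked here.

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Return a list of (x, y, size) leaf blocks (the undivided CUs)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dx in (0, half):          # visit the four quadrants
            for dy in (0, half):
                leaves += quadtree_partition(x + dx, y + dy, half,
                                             min_size, should_split)
        return leaves
    return [(x, y, size)]

# Example policy: keep splitting only the top-left quadrant at each level,
# from a 64x64 CTB down to a hypothetical 8x8 minimum CU size.
cus = quadtree_partition(0, 0, 64, 8, lambda x, y, s: x == 0 and y == 0)
```

The leaves exactly tile the original block: the sum of the leaf areas equals the CTB area, which is the invariant that makes the partitioning lossless with respect to coverage.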
Video encoder 20 may generate one or more Prediction Units (PUs) for each undivided CU. Each of the PUs of the CU may be associated with a different block of pixels within the block of pixels of the CU. Video encoder 20 may generate a predictive pixel block for each PU of the CU. The predictive pixel block for a PU is a block of predicted pixel values for that PU.
Video encoder 20 may use intra prediction or inter prediction to generate the predictive pixel blocks for the PU. If video encoder 20 uses intra prediction to generate the predictive pixel block for the PU, video encoder 20 may generate the predictive pixel block for the PU based on decoded pixels of the picture associated with the PU. If video encoder 20 uses inter prediction to generate the predictive pixel block for the PU, video encoder 20 may generate the predictive pixel block for the PU based on decoded pixels of one or more pictures other than the picture associated with the PU.
Video encoder 20 may generate residual blocks of pixels for the CU based on the predictive blocks of pixels for the PUs of the CU. The residual pixel block of the CU may indicate a difference between samples in the predictive pixel blocks of the PUs of the CU and corresponding samples in the original pixel block of the CU.
Moreover, as part of encoding an undivided CU, video encoder 20 may perform recursive quadtree partitioning on the residual pixel block of the CU to partition it into one or more smaller residual pixel blocks associated with Transform Units (TUs) of the CU. Because the pixels in a pixel block associated with a TU each include one luma sample and two chroma samples, each of the TUs may be associated with one residual sample block of luma samples and two residual sample blocks of chroma samples.
Video encoder 20 may apply one or more transforms to a block of residual samples associated with a TU to generate a coefficient block (i.e., a block of coefficients). Video encoder 20 may perform a quantization process on each of the coefficient blocks. Quantization generally refers to a process in which coefficients are quantized to potentially reduce the amount of data used to represent them, providing further compression.
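The quantization step can be illustrated with a hedged sketch of a plain scalar quantizer. This is not the actual HEVC quantizer, which derives its step size from a quantization parameter and scaling lists; the fixed step of 10 here is an assumption for illustration.

```python
def quantize(coeffs, step):
    # Round-to-nearest division shrinks coefficient magnitudes;
    # the discarded precision is where information is lost.
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    # The decoder can only recover multiples of the step size.
    return [lvl * step for lvl in levels]

levels = quantize([37, -12, 4, 0], 10)
approx = dequantize(levels, 10)
```

Comparing `approx` with the original coefficients shows the lossy nature of quantization: small coefficients collapse to zero, which is precisely what makes the subsequent entropy coding effective.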
Video encoder 20 may generate a set of syntax elements that represent the coefficients in a quantized coefficient block. Video encoder 20 may apply an entropy encoding operation, such as a Context Adaptive Binary Arithmetic Coding (CABAC) operation, to at least some of these syntax elements. As part of performing the entropy encoding operation, video encoder 20 may select a coding context. In the case of CABAC, the coding context may indicate the probabilities of a bin having a value of 0 or a value of 1.
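A toy context model in the spirit of the description above: a coding context tracks an estimate of the probability that the next bin is 1 and adapts after each observed bin. Real CABAC uses finite-state probability tables rather than this simple exponential update, so treat the class below purely as an illustration of context adaptivity.

```python
class BinContext:
    """Illustrative adaptive context: estimates P(bin == 1)."""

    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one      # current probability estimate
        self.rate = rate        # adaptation speed (assumed value)

    def observe(self, bin_value):
        # Move the estimate a small step toward the observed bin.
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)

ctx = BinContext()
for b in [1, 1, 1, 1, 0]:
    ctx.observe(b)
# After mostly-1 bins, the estimate has drifted above 0.5.
```

The better a context's estimate matches the true bin statistics, the fewer bits the arithmetic coder spends per bin, which is why selecting the right context (and the restrictions on doing so across entropy slices, discussed later) matters.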
The bitstream generated by video encoder 20 may include a series of Network Abstraction Layer (NAL) units. Each of the NAL units may be a syntax structure containing an indication of the type of data in the NAL unit and bytes containing the data. For example, a NAL unit may contain data representing an SPS, a PPS, a coded slice, Supplemental Enhancement Information (SEI), an access unit delimiter, padding data, or another type of data. A coded slice NAL unit is a NAL unit that includes a coded slice.
Video decoder 30 may receive a bitstream. The bitstream may include a coded representation of the video data encoded by video encoder 20. Video decoder 30 may parse the bitstream to extract syntax elements from the bitstream. As part of extracting some syntax elements from the bitstream, video decoder 30 may entropy decode (e.g., CABAC decode, exponential golomb decode, etc.) the data in the bitstream. Video decoder 30 may reconstruct pictures of the video data based on syntax elements extracted from the bitstream.
The process of reconstructing video data based on the syntax elements may be substantially reciprocal to the process performed by video encoder 20 to generate the syntax elements. For example, video decoder 30 may generate predictive pixel blocks for PUs of the CU based on syntax elements associated with the CU. In addition, video decoder 30 may inverse quantize coefficient blocks associated with TUs of the CU. Video decoder 30 may perform an inverse transform on the coefficient blocks to reconstruct residual pixel blocks associated with the TUs of the CU. Video decoder 30 may reconstruct the pixel blocks of the CU based on the predictive pixel blocks and the residual pixel blocks.
In some examples, video encoder 20 may divide the picture into a plurality of entropy slices. This disclosure may use the term "regular slice" to distinguish slices from entropy slices. An entropy slice may include a subset of the CUs of a regular slice. In some examples, video encoder 20 may partition CUs among entropy slices such that none of the entropy slices includes more bins (e.g., entropy coded bits) than an upper limit. Each entropy slice may be included in a separate NAL unit.
In this disclosure, in-picture prediction may refer to the use of information associated with a first unit (e.g., a CTB, a CU, a PU, etc.) of a picture for coding a second unit of the same picture. In-picture prediction across entropy slice boundaries is allowed, except for purposes of entropy coding. For example, if a video coder (e.g., video encoder 20 or video decoder 30) is performing intra prediction for a particular PU, the video coder may use samples from a neighboring PU even if the neighboring PU is in a different entropy slice than the particular PU. In this example, the video coder may not be able to use samples from the neighboring PU if the neighboring PU is in a different slice than the particular PU.
However, when the video coder is performing entropy coding on data associated with a particular PU, the video coder is only allowed to select a coding context based on information associated with a neighboring PU if the particular PU is in the same entropy slice as the neighboring PU. Because of this limitation, a video coder may be able to perform entropy coding (i.e., entropy encoding or decoding) operations on multiple entropy slices of a slice in parallel. Thus, video decoder 30 may be able to parse syntax elements of multiple entropy slices in parallel. However, video decoder 30 is not able to reconstruct pixel blocks of multiple entropy slices of a slice in parallel.
As indicated above, a coded slice NAL unit may contain a coded slice. This slice may be an entropy slice or a regular slice. A slice header in the coded slice NAL unit may include a syntax element (e.g., entropy_slice_flag) that indicates whether the slice is an entropy slice or a regular slice. For example, if this syntax element is equal to 1, the slice in the coded slice NAL unit may be an entropy slice.
Each coded slice may include a slice header and slice data. The slice header of an entropy slice may be different from the slice header of a regular slice. For example, the syntax elements in the slice header of an entropy slice may be a subset of the syntax elements in the slice header of a regular slice. Because the slice header of an entropy slice includes fewer syntax elements than the slice header of a regular slice, an entropy slice may also be referred to as a lightweight slice, a slice with a short slice header, or a short slice. An entropy slice may inherit the syntax elements omitted from its slice header from the slice header of a regular slice that precedes the entropy slice in decoding order.
Conventionally, a video encoder generates a separate NAL unit for each entropy slice. Individual NAL units are often transported over a network in separate packets. In other words, during transmission of the NAL units over a network, there may be one NAL unit per packet. This can be problematic for NAL units that contain entropy slices. If a packet containing a NAL unit that includes a regular slice is lost during transmission, video decoder 30 may be unable to decode an entropy slice that inherits syntax elements from the slice header of the regular slice. Moreover, if one or more CTBs of a first entropy slice depend, for in-picture prediction, on one or more CTBs of a second entropy slice, and a packet containing a NAL unit that includes the second entropy slice is lost during transmission, video decoder 30 may be unable to decode the CTBs of the first entropy slice.
In some examples, a video coder may code at least a portion of a picture using Wavefront Parallel Processing (WPP). Fig. 9, described in detail below, is a conceptual diagram illustrating an example of WPP. If a video coder uses WPP to code a picture, the video coder may divide the CTBs of the picture into a plurality of "WPP waves." Each of the WPP waves may correspond to a different row of CTBs in the picture. If the video coder uses WPP to code the picture, the video coder may begin coding the top row of CTBs. After the video coder has coded two or more CTBs of the top row, the video coder may begin coding the second row of CTBs in parallel with coding the top row of CTBs. After the video coder has coded two or more CTBs of the second row, the video coder may begin coding the third row of CTBs in parallel with coding the higher rows of CTBs. This pattern may continue down the rows of CTBs in the picture.
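The row-to-row lag just described induces a wavefront schedule: a row cannot start until two CTBs of the row above are done. The following Python sketch (an illustration of this general pattern, not text from the patent) groups CTB coordinates by the earliest parallel step at which each CTB could be coded:

```python
# Illustrative sketch of the WPP wavefront schedule: coding of row r may
# begin only after two CTBs of row r-1 are finished, so CTB (r, c) can
# start no earlier than parallel step 2*r + c.

def wpp_schedule(ctb_rows, ctb_cols):
    """Group CTB coordinates by the earliest parallel step at which each
    CTB can be coded under WPP's two-CTB lag between neighboring rows."""
    steps = {}
    for r in range(ctb_rows):
        for c in range(ctb_cols):
            steps.setdefault(2 * r + c, []).append((r, c))
    return [steps[s] for s in sorted(steps)]

# For a picture of 3x4 CTBs, step 2 codes CTBs (0, 2) and (1, 0) in parallel.
for step, ctbs in enumerate(wpp_schedule(3, 4)):
    print(step, ctbs)
```

The step index 2*r + c makes the two-CTB lag explicit: each WPP wave trails the wave above it by exactly two steps.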
If the video coder is using WPP, the video coder may use information associated with spatially neighboring CUs that are outside of the current CTB to perform in-picture prediction for a particular CU in the current CTB as long as the spatially neighboring CU is to the left, above-left, above, or above-right of the current CTB. If the current CTB is the leftmost CTB in a row other than the topmost row, the video coder may use information associated with a second CTB of the immediately higher row to select a context for CABAC coding one or more syntax elements of the current CTB. Otherwise, if the current CTB is not the leftmost CTB in the row, the video coder may use information associated with the CTB to the left of the current CTB to select a context for CABAC coding one or more syntax elements of the current CTB. In this way, the video coder may initialize the CABAC state of a row based on the CABAC state of an immediately higher row after encoding two or more CTBs of the immediately higher row.
Thus, in response to determining that the first CTB is separated from the left boundary of the picture by a single CTB, the video coder may store a context variable associated with the first CTB. The video coder may entropy code (e.g., entropy encode or entropy decode), based at least in part on a context variable associated with a first CTB, one or more syntax elements of a second CTB, the second CTB being adjacent to a left boundary of the picture and one row of CTBs lower than the first CTB.
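The storage and reuse of that context variable can be pictured with the following Python sketch. The shape of the state and the function names are hypothetical stand-ins, not HEVC APIs, and the rows are walked serially here purely to show which saved state seeds which row:

```python
# Hypothetical sketch: after the CTB one position from the left boundary
# (the second CTB of a row) is coded, its entropy-coding state is saved;
# the leftmost CTB of the next row is then initialized from that saved
# state, as described above. Real WPP codes the rows in parallel.

def code_ctb_rows(ctb_rows, ctb_cols, initial_state, code_ctb):
    """code_ctb(state, r, c) returns the state after coding CTB (r, c)."""
    saved_state = None
    state = initial_state
    for r in range(ctb_rows):
        if r > 0 and saved_state is not None:
            state = saved_state  # seed this row from the row above
        for c in range(ctb_cols):
            state = code_ctb(state, r, c)
            if c == 1:  # second CTB from the left boundary of the picture
                saved_state = state
    return state

log = []

def record(state, r, c):
    log.append((r, c, state))
    return (r, c)  # the "state" here is just the last CTB coded

code_ctb_rows(3, 3, "init", record)
print(log[3])  # (1, 0, (0, 1)): row 1 starts from the state saved after CTB (0, 1)
```

Saving the state after only two CTBs, rather than after the whole row, is what lets the next wave start while the current wave is still being coded.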
Even when WPP is used, the coded CTBs of a slice are typically arranged in a coded slice NAL unit according to a raster scan order. This may complicate the design of video coders that implement WPP. When the number of WPP waves is greater than one and less than the number of CTB rows of the picture, the bitstream order of the coded bits of the CTBs (i.e., the decoding order if the coded picture is processed by one decoder core rather than decoded in parallel) changes as follows, compared to when WPP is not applied. A coded CTB that is later in bitstream/decoding order may be needed for in-picture prediction of another coded CTB that is earlier in bitstream/decoding order. This may break bitstream causality, the principle that earlier data never depends on data that comes later in bitstream/decoding order. Bitstream causality has been a generally followed principle in video coding designs, including video coding standards. While the decoding process still works, the decoding process may be more complex because a bitstream pointer that indicates the current position in the bitstream may need to move back and forth within the portion of the bitstream associated with the coded slice NAL unit.
In some examples, video encoder 20 may divide a picture into one or more tiles. A tile may comprise a non-overlapping set of CTBs of the picture. Video encoder 20 may divide the picture into tiles by defining two or more vertical tile boundaries and two or more horizontal tile boundaries. Each vertical side of the picture may be a vertical tile boundary. Each horizontal side of the picture may be a horizontal tile boundary. For example, if video encoder 20 defines four vertical tile boundaries and three horizontal tile boundaries for the picture, the picture is divided into six tiles.
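Because each side of the picture counts as a tile boundary, the tile count in the example above follows from simple boundary arithmetic, sketched here for illustration:

```python
# Minimal sketch of the boundary arithmetic described above: the picture
# sides count among the tile boundaries, so the boundaries delimit
# (vertical - 1) tile columns and (horizontal - 1) tile rows.

def num_tiles(vertical_boundaries, horizontal_boundaries):
    columns = vertical_boundaries - 1
    rows = horizontal_boundaries - 1
    return columns * rows

print(num_tiles(4, 3))  # 6: four vertical and three horizontal boundaries
print(num_tiles(2, 2))  # 1: only the picture sides, i.e., a single tile
```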
A video coder, such as video encoder 20 or video decoder 30, may code the CTBs of the tiles of a picture according to a tile scan order. To code the CTBs according to the tile scan order, the video coder may code the tiles of the picture according to a raster scan order. That is, the video coder may code each tile in a row of tiles in left-to-right order, starting from the top row of tiles and then proceeding down the picture. Furthermore, the video coder may code each CTB within a tile according to a raster scan order. In this way, the video coder may code each CTB of a given tile of the picture before coding any CTB of another tile of the picture. In other words, the tile scan order traverses CTBs in CTB raster scan order within a tile and traverses tiles in tile raster scan order within the picture. Consequently, the order in which the video coder codes the CTBs of a picture may be different if the picture is partitioned into multiple tiles than if the picture is not partitioned into multiple tiles. Fig. 10, described below, is a conceptual diagram illustrating an example tile scan order when a picture is partitioned into multiple tiles.
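For the simple case of a picture split into uniformly sized tiles, the tile scan order just described can be sketched as follows (an illustrative reconstruction, not decoder source code):

```python
# Illustrative sketch: tiles are visited in raster order within the
# picture, and the CTBs of each tile are visited in raster order within
# that tile, so every CTB of a tile is coded before any CTB of the next.

def tile_scan_order(pic_rows, pic_cols, tile_rows, tile_cols):
    """Return (row, col) CTB coordinates in tile scan order for a
    pic_rows x pic_cols picture split into tile_rows x tile_cols tiles."""
    order = []
    for tr in range(0, pic_rows, tile_rows):        # tile rows, top to bottom
        for tc in range(0, pic_cols, tile_cols):    # tiles, left to right
            for r in range(tr, min(tr + tile_rows, pic_rows)):
                for c in range(tc, min(tc + tile_cols, pic_cols)):
                    order.append((r, c))
    return order

# A 2x4 picture of CTBs with 2x2 tiles: the left tile is coded first.
print(tile_scan_order(2, 4, 2, 2))
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
```

With a single tile covering the picture, the same function degenerates to the plain raster scan order, matching the "equivalently, only one tile" case discussed later.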
In some examples, a video coder may perform in-picture prediction across tile boundaries but not across slice boundaries. In other examples, in-picture prediction is prohibited across both tile boundaries and slice boundaries. In examples where in-picture prediction across tile boundaries and slice boundaries is prohibited, a video coder may be able to code multiple tiles in parallel.
In some examples, in-picture prediction across tile boundaries is controlled by a flag (e.g., "tile_boundary_independence_idc"). If the flag is equal to 1, in-picture prediction across tile boundaries within a picture is not allowed. Otherwise, in-picture prediction across tile boundaries is allowed, except for tile boundaries that are also picture boundaries or slice boundaries. If in-picture prediction across tile boundaries is allowed, the function of tiles may be to change the scan order of the CTBs, compared to the case where the picture has no tiles or, equivalently, only one tile. If in-picture prediction across tile boundaries is not allowed, tiles may, in addition to changing the scan order of the CTBs, also provide independent partitions that can be used for parallel coding (encoding and/or decoding) of the tiles. Thus, if a picture is partitioned into at least a first tile and a second tile, and video decoder 30 decodes the tiles without using WPP, video decoder 30 may decode the CTBs of the first tile and the CTBs of the second tile in parallel.
In some examples, a picture may be partitioned into a combination of tiles, WPP waves, and entropy slices. For example, a picture may be partitioned into a set of tiles and WPP waves. In another example, a picture may be partitioned into two tiles and one entropy slice. Allowing combinations of tiles, WPP waves, and entropy slices within a picture can be problematic, because allowing these combinations can increase the complexity and cost of video encoders and/or video decoders.
The techniques of this disclosure may address the problems described above. According to the techniques of this disclosure, a picture may not be partitioned into any combination of tiles, WPP waves, and entropy slices. In other words, a picture may be partitioned into one or more tiles, a picture may be partitioned into WPP waves, or a picture may be partitioned into one or more entropy slices. However, a picture may not be partitioned into any of the following combinations: (a) tiles, WPP waves, and entropy slices; (b) tiles and WPP waves; (c) tiles and entropy slices; or (d) WPP waves and entropy slices.
To achieve this, video encoder 20 may include a syntax element in the bitstream that indicates whether the picture is encoded according to a first coding mode or a second coding mode. In the first coding mode, the picture is encoded in its entirety using WPP. That is, each row of CTBs in the picture may be encoded as a WPP wave. In the second coding mode, the picture may have one or more tiles, and each tile of the picture is encoded without using WPP. For example, in the second coding mode, video encoder 20 may, for each tile of the picture, encode the CTBs within the tile sequentially, in order from left to right across each row of CTBs and proceeding down the rows of CTBs of the tile. For ease of explanation, this syntax element may be referred to herein as a coding mode syntax element.
Video decoder 30 may parse syntax elements from a bitstream that includes coded representations of pictures in the video data. In response to determining that the syntax element has a particular value, video decoder 30 may use WPP to decode the picture in its entirety. In response to determining that the syntax element does not have the particular value, video decoder 30 may decode each tile of the picture without using WPP, wherein the picture has one or more tiles.
Various portions of the bitstream may include the coding mode syntax element. For example, video encoder 20 may generate an SPS that includes the coding mode syntax element. In this example, video decoder 30 may parse, from the bitstream, an SPS that includes the coding mode syntax element. In another example, video encoder 20 may generate a PPS that includes the coding mode syntax element. In this example, video decoder 30 may parse, from the bitstream, a PPS that includes the coding mode syntax element. Further, if the picture is encoded according to the second coding mode, the bitstream may include one or more syntax elements that indicate whether entropy slices are enabled for the picture. Various portions of the bitstream may include the one or more syntax elements that indicate whether entropy slices are enabled for the picture. For example, an SPS may include one or more syntax elements that indicate that entropy slices are enabled for pictures associated with the SPS. In another example, a PPS may include one or more syntax elements that indicate that entropy slices are enabled for pictures associated with the PPS. In this example, the PPS may include an entropy_slice_enabled_flag syntax element that indicates whether coded slices that refer to the PPS may consist of entropy slices.
If a picture includes one or more entropy slices, each entropy slice associated with a slice of the picture may be included in a single coded slice NAL unit, rather than in separate NAL units. Thus, an entropy slice may be defined as a subset of a slice, where the entropy decoding process of the entropy slice is independent of the other entropy slices in the same slice.
As mentioned briefly above, the bitstream may include coded slice NAL units that include coded slices. A coded slice may include a slice header and slice data. The slice data may include one or more sub-streams. According to the techniques of this disclosure, if a picture is encoded in the first coding mode (i.e., the picture is encoded in its entirety using WPP), each row of CTBs of the slice is represented by a single one of the sub-streams. If the picture is encoded in the second coding mode (i.e., each tile of the picture is encoded without using WPP), each tile of the picture that has one or more CTBs in the slice is represented by a single one of the sub-streams.
Furthermore, in accordance with the techniques of this disclosure, a slice header of a coded slice may include a set of syntax elements that indicate entry points of the tiles, WPP waves, or entropy slices within the slice data of the coded slice NAL unit. The entry point of a sub-stream may be the first bit of the sub-stream. Furthermore, the tiles, WPP waves, or entropy slices within the slice data of a coded slice NAL unit may include padding bits that ensure that the tiles, WPP waves, or entropy slices are byte aligned.
FIG. 2 is a block diagram illustrating an example video encoder 20 configured to implement the techniques of this disclosure. Fig. 2 is provided for purposes of explanation and should not be taken as limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 20 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.
In the example of fig. 2, video encoder 20 includes prediction processing unit 100, residual generation unit 102, transform processing unit 104, quantization unit 106, inverse quantization unit 108, inverse transform processing unit 110, reconstruction unit 112, filter unit 113, decoded picture buffer 114, and entropy encoding unit 116. The prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126. The inter prediction processing unit 121 includes a motion estimation unit 122 and a motion compensation unit 124. In other examples, video encoder 20 may include more, fewer, or different functional components.
Video encoder 20 may receive video data. To encode the video data, video encoder 20 may encode each slice of each picture of the video data. As part of encoding the slice, video encoder 20 may encode each CTB in the slice. As part of encoding the CTB, prediction processing unit 100 may perform a quadtree partition on a block of pixels associated with the CTB to divide the block of pixels into progressively smaller blocks of pixels. Smaller blocks of pixels may be associated with CUs. For example, prediction processing unit 100 may partition a block of pixels of a CTB into four equally sized sub-blocks, partition one or more of the sub-blocks into four equally sized sub-blocks, and so on.
Video encoder 20 may encode a CU of a CTB to generate an encoded representation of the CU (i.e., a coded CU). Video encoder 20 may encode CUs of the CTB according to the z-scan order. In other words, video encoder 20 may encode the upper-left CU, the upper-right CU, the lower-left CU, and then the lower-right CU in that order. When video encoder 20 encodes a partitioned CU, video encoder 20 may encode CUs associated with sub-blocks of a block of pixels of the partitioned CU according to a z-scan order.
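The z-scan order described above applies recursively: at every level of the quadtree, the four sub-blocks are visited upper-left, upper-right, lower-left, lower-right. A small recursive Python sketch (illustrative only, assuming power-of-two block sizes):

```python
# Illustrative sketch of z-scan (Morton) order over a quadtree of
# equally sized leaves: upper-left, upper-right, lower-left, lower-right
# at every recursion level.

def z_scan(x, y, size, leaf_size, out):
    """Collect (x, y) origins of leaf_size blocks inside a size x size
    block in z-scan order (both sizes assumed powers of two)."""
    if size == leaf_size:
        out.append((x, y))
        return
    half = size // 2
    for dy in (0, half):        # top pair before bottom pair
        for dx in (0, half):    # left before right
            z_scan(x + dx, y + dy, half, leaf_size, out)

order = []
z_scan(0, 0, 4, 1, order)
print(order[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]: the upper-left quadrant
```

Note how all leaves of the upper-left quadrant are emitted before any leaf of the upper-right quadrant, mirroring how a partitioned CU's sub-CUs are encoded.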
As part of encoding the CU, prediction processing unit 100 may partition the block of pixels of the CU among one or more PUs of the CU. Video encoder 20 and video decoder 30 may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, video encoder 20 and video decoder 30 may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar for inter prediction. Video encoder 20 and video decoder 30 may also support asymmetric partitioning with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
Inter prediction processing unit 121 may generate predictive data for the PUs by performing inter prediction on each PU of the CU. The predictive data for the PU may include a predictive block of pixels corresponding to the PU and motion information for the PU. The slice may be an I slice, a P slice, or a B slice. Inter prediction unit 121 may perform different operations on PUs of a CU depending on whether the PU is in an I slice, in a P slice, or in a B slice. In I slices, all PUs are intra predicted. Therefore, if the PU is in an I slice, inter prediction unit 121 does not perform inter prediction on the PU.
If the PU is in a P slice, motion estimation unit 122 may search reference pictures in a reference picture list (e.g., "list 0") for the reference block of the PU. The reference block of the PU may be a block of pixels that most closely corresponds to the block of pixels of the PU. Motion estimation unit 122 may generate a reference picture index that indicates a reference picture in list 0 of the reference block containing the PU and a motion vector that indicates a spatial displacement between the pixel block of the PU and the reference block. Motion estimation unit 122 may output the reference picture index and the motion vector as the motion information for the PU. Motion compensation unit 124 may generate the predictive pixel block for the PU based on the reference block indicated by the motion information of the PU.
If the PU is in a B slice, motion estimation unit 122 may perform uni-directional inter prediction or bi-directional inter prediction on the PU. To perform uni-directional inter prediction for a PU, motion estimation unit 122 may search the reference pictures of a first reference picture list ("list 0") or a second reference picture list ("list 1") for a reference block of the PU. Motion estimation unit 122 may output, as the motion information for the PU, each of: a reference picture index indicating a location in list 0 or list 1 of the reference picture containing the reference block, a motion vector indicating a spatial displacement between the pixel block of the PU and the reference block, and a prediction direction indicator indicating whether the reference picture is in list 0 or list 1.
To perform bi-directional inter prediction for the PU, motion estimation unit 122 may search the reference picture in list 0 for the reference block of the PU and may also search the reference picture in list 1 for another reference block of the PU. Motion estimation unit 122 may generate reference picture indices that indicate the locations in list 0 and list 1 of reference pictures that contain the reference block. In addition, motion estimation unit 122 may generate motion vectors that indicate spatial displacements between the reference block and the block of pixels of the PU. The motion information of the PU may include a reference picture index and a motion vector of the PU. Motion compensation unit 124 may generate the predictive pixel block for the PU based on the reference block indicated by the motion information of the PU.
Intra-prediction processing unit 126 may generate predictive data for the PU by performing intra-prediction on the PU. The predictive data for the PU may include predictive pixel blocks for the PU and various syntax elements. Intra prediction processing unit 126 may perform intra prediction on PUs in I slices, P slices, and B slices.
To perform intra-prediction for a PU, intra-prediction processing unit 126 may use multiple intra-prediction modes to generate multiple sets of predictive data for the PU. To generate the set of predictive data for the PU using the intra-prediction mode, intra-prediction processing unit 126 may extend samples from sample blocks of neighboring PUs across sample blocks of the PU in a direction associated with the intra-prediction mode. Assuming left-to-right, top-to-bottom coding order of PU, CU, and CTB, the neighboring PU may be above, above-right, above-left, or to the left of the PU. Intra-prediction processing unit 126 may use various numbers of intra-prediction modes, e.g., 33 directional intra-prediction modes. In some examples, the number of intra prediction modes may depend on the size of the block of pixels of the PU.
Prediction processing unit 100 may select the predictive data for the PUs of the CU from among the predictive data generated for the PUs by inter prediction processing unit 121 or the predictive data generated for the PUs by intra prediction processing unit 126. In some examples, prediction processing unit 100 selects the predictive data for the PUs of the CU based on rate/distortion metrics of the sets of predictive data. The predictive pixel blocks of the selected predictive data may be referred to herein as the selected predictive pixel blocks.
Residual generation unit 102 may generate residual blocks of pixels for the CU based on the blocks of pixels of the CU and the selected predictive blocks of pixels of the PUs of the CU. For example, residual generation unit 102 may generate the residual block of pixels for the CU such that each sample in the residual block of pixels has a value equal to a difference between a sample in the block of pixels of the CU and a corresponding sample in the selected predictive block of pixels of the PU of the CU.
Prediction processing unit 100 may perform a quadtree partition to partition the residual pixel block of the CU into sub-blocks. Each undivided residual pixel block may be associated with a different TU of the CU. The size and location of residual pixel blocks associated with TUs of a CU may or may not be based on the size and location of pixel blocks of PUs of the CU.
Because pixels of a residual pixel block of a TU may include one luma sample and two chroma samples, each of the TUs may be associated with one block of luma samples and two blocks of chroma samples. Transform processing unit 104 may generate coefficient blocks for each TU of the CU by applying one or more transforms to residual sample blocks associated with the TU. Transform processing unit 104 may apply various transforms to a block of residual samples associated with a TU. For example, transform processing unit 104 may apply a Discrete Cosine Transform (DCT), a directional transform, or a conceptually similar transform to the residual sample block.
Quantization unit 106 may quantize coefficients in coefficient blocks associated with TUs. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, n-bit coefficients may be reduced to m-bit coefficients during quantization, where n is greater than m. Quantization unit 106 may quantize coefficient blocks associated with TUs of the CU based on Quantization Parameter (QP) values associated with the CU. Video encoder 20 may adjust the degree of quantization applied to coefficient blocks associated with a CU by adjusting the QP value associated with the CU.
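The bit-depth reduction described above can be illustrated with a toy uniform scalar quantizer. This sketch is not the HEVC quantizer, which derives an integer step size from the QP and works in fixed-point arithmetic; it only demonstrates that a larger step discards more coefficient precision:

```python
# Toy uniform scalar quantizer (illustration only, not HEVC): dividing
# by a larger step maps coefficients onto fewer distinct levels, which
# is where quantization loses information.

def quantize(coefficients, step):
    return [round(c / step) for c in coefficients]

def dequantize(levels, step):
    return [level * step for level in levels]

coeffs = [100, -37, 12, 3]
levels = quantize(coeffs, step=8)
print(levels)                  # [12, -5, 2, 0]
print(dequantize(levels, 8))   # [96, -40, 16, 0]: lossy reconstruction
```

Raising the step (as a larger QP does in HEVC) drives more small coefficients to zero, trading reconstruction fidelity for fewer bits.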
Inverse quantization unit 108 and inverse transform processing unit 110 may apply inverse quantization and inverse transform, respectively, to the coefficient block to reconstruct a residual sample block from the coefficient block. Reconstruction unit 112 may add the reconstructed residual sample block to corresponding samples from one or more predictive sample blocks generated by prediction processing unit 100 to generate a reconstructed sample block associated with the TU. By reconstructing sample blocks for each TU of a CU in this manner, video encoder 20 may reconstruct blocks of pixels of the CU.
Filter unit 113 may perform deblocking operations to reduce block artifacts in blocks of pixels associated with CUs. Decoded picture buffer 114 may store the reconstructed block of pixels after filter unit 113 performs one or more deblocking operations on the reconstructed block of pixels. Inter prediction unit 121 may perform inter prediction on PUs of other pictures using the reference picture containing the reconstructed block of pixels. In addition, intra-prediction processing unit 126 may use reconstructed pixel blocks in decoded picture buffer 114 to perform intra-prediction on other PUs in the same picture as the CU.
Entropy encoding unit 116 may receive data from other functional components of video encoder 20. For example, entropy encoding unit 116 may receive coefficient blocks from quantization unit 106 and may receive syntax elements from prediction processing unit 100. Entropy encoding unit 116 may perform one or more entropy encoding operations on the data to generate entropy encoded data. For example, entropy encoding unit 116 may perform the following operations on the data: a Context Adaptive Variable Length Coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, an exponential golomb encoding operation, or another type of entropy encoding operation.
Video encoder 20 may output a bitstream that includes the entropy-encoded data generated by entropy encoding unit 116. A bitstream may include a series of NAL units. NAL units may include coded slice NAL units, SPS NAL units, PPS NAL units, and so on. To ensure that a picture does not include a combination of tiles, WPP waves, and entropy slices, the bitstream may include syntax elements that indicate whether the picture is encoded entirely using WPP or whether each tile of the picture is encoded without using WPP.
FIG. 3 is a block diagram illustrating an example video decoder 30 configured to implement the techniques of this disclosure. Fig. 3 is provided for purposes of explanation and should not be taken as limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 30 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.
In the example of fig. 3, video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 159, and a decoded picture buffer 160. Prediction processing unit 152 includes a motion compensation unit 162 and an intra prediction processing unit 164. In other examples, video decoder 30 may include more, fewer, or different functional components.
Video decoder 30 may receive a bitstream. Entropy decoding unit 150 may parse the bitstream to extract syntax elements from the bitstream. As part of parsing the bitstream, entropy decoding unit 150 may entropy decode entropy-encoded syntax elements in the bitstream. Prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, and filter unit 159 may generate decoded video data based on syntax elements extracted from the bitstream.
A bitstream may comprise a series of NAL units. NAL units of a bitstream may include coded slice NAL units. As part of parsing the bitstream, entropy decoding unit 150 may extract and entropy decode syntax elements from the coded slice NAL units. Each of the coded slices may include a slice header and slice data. The slice header may contain syntax elements for the slice. The syntax elements in the slice header may include syntax elements that identify a PPS associated with the picture containing the slice.
In addition, video decoder 30 may perform a reconstruction operation on the undivided CU. To perform a reconstruction operation on an undivided CU, video decoder 30 may perform a reconstruction operation on each TU of the CU. By performing a reconstruction operation for each TU of the CU, video decoder 30 may reconstruct a residual block of pixels associated with the CU.
As part of performing the reconstruction operation on the TUs of the CU, inverse quantization unit 154 may inverse quantize (i.e., dequantize) coefficient blocks associated with the TUs. Inverse quantization unit 154 may use the QP value associated with the CU of the TU to determine the degree of quantization and, likewise, the degree of inverse quantization that inverse quantization unit 154 will apply.
After inverse quantization unit 154 inverse quantizes the coefficient block, inverse transform processing unit 156 may apply one or more inverse transforms to the coefficient block in order to generate a residual sample block associated with the TU. For example, inverse transform processing unit 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loève transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient block.
If the PU is encoded using intra prediction, intra prediction processing unit 164 may perform intra prediction to generate a predictive block of samples for the PU. Intra-prediction processing unit 164 may use the intra-prediction mode to generate predictive pixel blocks for PUs based on pixel blocks of spatially neighboring PUs. Intra-prediction processing unit 164 may determine the intra-prediction mode of the PU based on one or more syntax elements parsed from the bitstream.
Motion compensation unit 162 may construct a first reference picture list (list 0) and a second reference picture list (list 1) based on syntax elements extracted from the bitstream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 150 may extract motion information of the PU. Motion compensation unit 162 may determine one or more reference blocks for the PU based on the motion information of the PU. Motion compensation unit 162 may generate the predictive block of pixels for the PU based on one or more reference blocks of the PU.
Reconstruction unit 158 may reconstruct the pixel blocks of the CU using residual pixel blocks associated with the TUs of the CU and predictive pixel blocks (i.e., intra-prediction data or inter-prediction data) of the PUs of the CU, as applicable. In particular, reconstruction unit 158 may add the samples of the residual pixel block and the corresponding samples of the predictive pixel block to reconstruct the pixel block of the CU.
Filter unit 159 may perform deblocking operations to reduce block artifacts associated with blocks of pixels of a CU. Video decoder 30 may store blocks of pixels of the CU in decoded picture buffer 160. Decoded picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, and presentation on a display device (e.g., display device 32 of fig. 1). For example, video decoder 30 may perform intra-prediction or inter-prediction operations on PUs of other CUs based on blocks of pixels in decoded picture buffer 160.
As mentioned above, video decoder 30 may receive a bitstream that includes a coding mode syntax element. If the coding mode syntax element has a particular value, the coding mode syntax element indicates that the picture is completely encoded using WPP. In various examples, the coding mode syntax element may occur in various portions of the bitstream. For example, an SPS may include the coding mode syntax element. Table 1 below provides an example syntax for an SPS that includes a coding mode syntax element ("tile_mode").
Table 1-sequence parameter set RBSP syntax
A syntax element with type descriptor ue(v) is an unsigned variable-length value encoded using Exponential-Golomb coding, with the left bit first. Syntax elements with type descriptors u(1) and u(2) are unsigned values of length 1 bit or 2 bits, respectively. In the example syntax of table 1, the inter_4x4_enabled_flag syntax element specifies whether inter prediction can be applied to blocks having a size of 4x4 luma samples.
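To make the ue(v) descriptor concrete, the following is a minimal sketch of decoding one Exponential-Golomb-coded unsigned value, left bit first. The function name and the bit-list input format are illustrative assumptions, not part of the syntax tables above.

```python
def decode_ue(bits, pos=0):
    """Decode one ue(v) Exp-Golomb value from a list of bits (MSB first).

    Returns (value, next_pos). Hypothetical helper for illustration.
    """
    # Count leading zero bits.
    leading_zeros = 0
    while bits[pos] == 0:
        leading_zeros += 1
        pos += 1
    pos += 1  # skip the terminating 1 bit
    # Read leading_zeros suffix bits as an integer.
    suffix = 0
    for _ in range(leading_zeros):
        suffix = (suffix << 1) | bits[pos]
        pos += 1
    # codeNum = 2^leading_zeros - 1 + suffix
    return (1 << leading_zeros) - 1 + suffix, pos
```

For instance, the codeword "010" decodes to the value 1, and "00101" decodes to 4.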
Further, in the example syntax of table 1, the tile_mode syntax element specifies the tile mode of the pictures associated with the SPS. If the tile_mode syntax element is equal to 0, there is only one tile in each of the pictures associated with the SPS, and the CTBs in the single tile of each picture are coded according to raster scan order without using WPP. If the tile_mode syntax element is equal to 1, the pictures associated with the SPS are in uniformly-spaced tile mode. When a picture is in uniformly-spaced tile mode, tile column boundaries and tile row boundaries are uniformly distributed in each picture associated with the SPS; as a result, the tiles of the picture have the same size. The CTBs within each of the uniformly-distributed tiles may be coded according to a raster scan order without using WPP. If the tile_mode syntax element is equal to 2, the pictures associated with the SPS are in non-uniformly-spaced tile mode. When a picture is in non-uniformly-spaced tile mode, tile column boundaries and tile row boundaries are non-uniformly distributed across the picture, but can be explicitly signaled using the column_width[i] and row_height[i] syntax elements of the SPS. The CTBs within each of the non-uniformly-spaced tiles may be coded according to a raster scan order without using WPP.
If the tile_mode syntax element is equal to 3, the pictures associated with the SPS are coded using WPP mode. In other words, if the tile_mode syntax element has a particular value (e.g., 3), the pictures associated with the SPS are completely encoded using WPP. If the tile_mode syntax element has any value other than 3, no tile in any picture associated with the SPS is encoded using WPP. Furthermore, when a picture is coded using WPP, a particular memorization process is invoked after decoding two CTBs of a row of CTBs of the picture. In addition, a particular synchronization process is invoked before decoding the first CTB in a row of CTBs of the picture. Additionally, a particular CABAC state re-initialization process of internal variables is invoked when the rightmost CTB in a row has been coded.
In the particular memorization process mentioned above, the video coder may store particular context variables associated with a first CTB in response to determining that the first CTB is separated from the left boundary of the picture by a single CTB. In the particular synchronization process, the video coder may entropy code (i.e., entropy encode or entropy decode) one or more syntax elements of a second CTB based at least in part on the context variables associated with the first CTB, where the second CTB is positioned adjacent to the left boundary of the picture and one row of CTBs lower than the first CTB.
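The store-and-synchronize pattern above gives each CTB row a two-CTB horizontal lag relative to the row above it. The following sketch models only the resulting wavefront schedule (which parallel time step each CTB can be coded at); it is an illustrative model, not the normative process.

```python
def wpp_schedule(ctb_rows, ctb_cols):
    """Earliest time step at which each CTB can be entropy coded under WPP.

    A row can start only after the second CTB (column index 1) of the row
    above has been coded, since that is when its context variables are
    stored; this yields step(r, c) = c + 2 * r. Illustrative model only.
    """
    return [[c + 2 * r for c in range(ctb_cols)] for r in range(ctb_rows)]
```

For a picture that is 3 CTB rows by 4 CTB columns, row 1 starts at step 2, immediately after CTB (0, 1) completes at step 1.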
Furthermore, in the example syntax of table 1, the num_tile_columns_minus1 syntax element specifies the number of tile columns that partition each of the pictures associated with the SPS. When the tile_mode syntax element is equal to 0 or 3, the value of the num_tile_columns_minus1 syntax element can be inferred to be equal to 0. This is because there is only a single tile in a picture when the tile_mode syntax element is equal to 0, and each CTB row of the picture is a single tile when the tile_mode syntax element is equal to 3. The num_tile_rows_minus1 syntax element specifies the number of tile rows that partition each of the pictures associated with the SPS. When the tile_mode syntax element is equal to 0, the value of the num_tile_rows_minus1 syntax element can be inferred to be equal to 0. When the tile_mode syntax element is equal to 3, video decoder 30 may automatically determine (i.e., infer) that the value of the num_tile_rows_minus1 syntax element is equal to the height of the picture, in CTBs, minus 1. Further, when the tile_mode syntax element is equal to 1 or 2, at least one of the num_tile_columns_minus1 syntax element and the num_tile_rows_minus1 syntax element is greater than 0.
Video decoder 30 may determine the widths and heights of the tiles of a picture associated with the SPS based on the column_width[i] syntax elements and the row_height[i] syntax elements. The column_width[i] syntax elements indicate the widths of tile columns of the pictures associated with the SPS. Video decoder 30 may generate, based at least in part on the column_width[i] syntax elements, a columnWidth vector that indicates the widths of the tile columns in the pictures associated with the SPS. Video decoder 30 may generate the columnWidth vector from the column_width[i] syntax elements of the SPS using the following pseudo-code.
Video decoder 30 may generate a rowHeight vector that indicates the height of the tile in the picture associated with the SPS. In some examples, video decoder 30 may use the following pseudo-code to generate the rowHeight vector.
Further, video decoder 30 may generate a colBd vector indicating the location within the picture associated with the SPS of the leftmost column boundary of each column of the image block. In some examples, video decoder 30 may determine the colBd vector using the following pseudo-code.
colBd[0]=0
for(i=0;i<=num_tile_columns_minus1;i++)
colBd[i+1]=colBd[i]+columnWidth[i]
Video decoder 30 may generate a rowBd vector that indicates the location within the picture associated with the SPS of the top row boundary of each row of the image block. In some examples, video decoder 30 may determine the rowBd vector using the following pseudo-code.
rowBd[0]=0
for(i=0;i<=num_tile_rows_minus1;i++)
rowBd[i+1]=rowBd[i]+rowHeight[i]
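Both pseudo-code fragments above are cumulative sums over a vector of tile sizes. A brief sketch of the shared pattern, assuming the columnWidth or rowHeight vector (in CTBs) has already been derived:

```python
def tile_boundaries(sizes):
    """Mirror of the colBd/rowBd pseudo-code above: bd[0] = 0 and
    bd[i + 1] = bd[i] + sizes[i], so bd[i] is the position of the
    left (or top) boundary of tile column (or row) i."""
    bd = [0]
    for size in sizes:
        bd.append(bd[-1] + size)
    return bd
```

For example, tile column widths [3, 3, 4] yield column boundaries [0, 3, 6, 10].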
In the example syntax of table 1, the tile_boundary_independence_flag syntax element indicates whether tiles can be independently decoded. For example, if tile_boundary_independence_flag is equal to 1, tiles can be independently decoded. For instance, if tile_boundary_independence_flag is equal to 1 and video decoder 30 is decoding a particular CTB, all CTBs that neighbor the particular CTB but are not within the same tile as the particular CTB are determined to be unavailable for in-picture prediction. Furthermore, if tile_boundary_independence_flag is equal to 1, video decoder 30 re-initializes the entropy coding context before entropy decoding the first CTB in a tile.
If the tile_boundary_independence_flag syntax element is equal to 0, the availability of CTBs for in-picture prediction is not affected by tile boundaries. In other words, if the tile_boundary_independence_flag syntax element is equal to 0, video decoder 30 may perform in-picture prediction across tile boundaries. Furthermore, if the tile_boundary_independence_flag syntax element is equal to 0, entropy decoding unit 150 may invoke a synchronization process when decoding the first CTB in a tile, except for the first tree block in the picture. In this synchronization process, entropy decoding unit 150 may use information associated with the last CTB of the previous tile to select a coding context for entropy decoding one or more syntax elements of the first CTB in the tile. In addition, entropy decoding unit 150 may perform a memorization process when decoding the first CTB of the second row of CTBs in a tile. The memorization process can store context variables for use in selecting a context for CABAC coding one or more syntax elements of the leftmost CTB in the next lower row of CTBs.
If the tile_mode syntax element is equal to 0 (i.e., there is only one tile per picture), then in the example syntax of table 1 the SPS does not include the tile_boundary_independence_flag syntax element. However, if the tile_mode syntax element is equal to 0, video decoder 30 may automatically determine that the value of the tile_boundary_independence_flag syntax element is equal to 1. Similarly, if the tile_mode syntax element is equal to 3 (i.e., the pictures are completely encoded using WPP), then in the example syntax of table 1 the SPS does not include the tile_boundary_independence_flag syntax element. However, if the tile_mode syntax element is equal to 3, video decoder 30 may automatically determine that the value of the tile_boundary_independence_flag syntax element is equal to 0.
In the example syntax of table 1, the loop_filter_across_tile_flag syntax element specifies whether video decoder 30 is to perform in-loop filtering operations across tile boundaries. For example, if the loop_filter_across_tile_flag syntax element is equal to 1, video decoder 30 may perform in-loop filtering operations across tile boundaries. Otherwise, if the loop_filter_across_tile_flag syntax element is equal to 0, video decoder 30 may not perform in-loop filtering operations across tile boundaries. Example in-loop filtering operations include deblocking filtering, sample adaptive offset, and adaptive loop filtering.
If the tile_mode syntax element is equal to 0 (i.e., there is only one tile per picture) or equal to 3 (i.e., each picture associated with the SPS is completely encoded using WPP), then in the example syntax of table 1 the SPS does not include the loop_filter_across_tile_flag syntax element. However, if the tile_mode syntax element is equal to 0, video decoder 30 may automatically determine that the value of the loop_filter_across_tile_flag syntax element is equal to 0. If the tile_mode syntax element is equal to 3, video decoder 30 may automatically determine that the value of the loop_filter_across_tile_flag syntax element is equal to 1.
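The inference rules for the two omitted flags can be summarized as a small table keyed on tile_mode. The helper below is a hypothetical illustration of the behavior described above, not a function from the syntax tables; when tile_mode is 1 or 2 the flags are signaled explicitly rather than inferred.

```python
def inferred_tile_flags(tile_mode):
    """Return (tile_boundary_independence_flag, loop_filter_across_tile_flag)
    as inferred by the decoder when the SPS omits them, per the text above:
    tile_mode 0 (single tile) -> (1, 0); tile_mode 3 (WPP) -> (0, 1).
    For tile_mode 1 or 2 the flags are present in the bitstream instead."""
    if tile_mode == 0:
        return (1, 0)
    if tile_mode == 3:
        return (0, 1)
    return None  # signaled explicitly, nothing to infer
```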
Alternatively, or in addition to receiving an SPS that includes a coding mode syntax element, video decoder 30 may receive a PPS that includes a coding mode syntax element. In some examples in which video decoder 30 receives an SPS and a PPS that apply to the same picture, and both the SPS and the PPS include coding mode syntax elements, video decoder 30 may give priority to the coding mode syntax element specified by the PPS. Table 2 below presents an example syntax for a PPS that includes a coding mode syntax element ("tile_mode").
Table 2-picture parameter set RBSP syntax
In the example syntax of table 2, if the tile_partition_info_present_flag syntax element is equal to 1, the tile_mode syntax element is present. In addition, if the tile_partition_info_present_flag syntax element is equal to 1, the num_tile_columns_minus1, num_tile_rows_minus1, column_width[i], and row_height[i] syntax elements may be present in the PPS. The semantics of the tile_mode, num_tile_columns_minus1, num_tile_rows_minus1, column_width[i], and row_height[i] syntax elements may be the same as those described above with respect to the example SPS syntax of table 1. If the tile_partition_info_present_flag syntax element is equal to 0, the tile_mode, num_tile_columns_minus1, num_tile_rows_minus1, column_width[i], and row_height[i] syntax elements are not present in the PPS.
In this way, video decoder 30 may determine, based at least in part on a coding mode syntax element (e.g., tile_mode) having a value indicating that no CTBs of a picture are encoded using WPP, that a parameter set (e.g., an SPS or a PPS) includes a tile column number syntax element and a tile row number syntax element. Video decoder 30 may also determine the number of tile columns based on the tile column number syntax element; the number of tile columns of each picture associated with the parameter set may be equal to this number of tile columns. Video decoder 30 may also determine the number of tile rows based on the tile row number syntax element; the number of tile rows of each picture associated with the parameter set may be equal to this number of tile rows. Moreover, video decoder 30 may determine that the parameter set includes a series of one or more column width syntax elements and a series of one or more row height syntax elements. In addition, video decoder 30 may determine the widths of the tile columns of each picture associated with the parameter set based at least in part on the column width syntax elements. Moreover, video decoder 30 may determine the heights of the tile rows of each picture associated with the parameter set based at least in part on the row height syntax elements.
Similarly, video encoder 20 may generate a parameter set that includes a picture block column number syntax element and a picture block row number syntax element. The parameter set may be a Picture Parameter Set (PPS) or a Sequence Parameter Set (SPS). The number of columns of image blocks may be determined based on the image block column number syntax element, and the number of columns of image blocks for each picture associated with the parameter set is equal to the number of columns of image blocks. The number of rows of image blocks may be determined based on the image block row number syntax element, and the number of rows of image blocks for each picture associated with the parameter set is equal to the number of rows of image blocks. When video encoder 20 generates a parameter set, video encoder 20 may generate a series of one or more column width syntax elements and a series of one or more row height syntax elements. A width of a column of an image block of each picture associated with a parameter set may be determined based at least in part on a column width syntax element. A height of a row for an image block of each picture associated with a parameter set may be determined based at least in part on a row height syntax element.
Also, in the example syntax of table 2, if the tile_control_info_present_flag syntax element is equal to 1, the tile_boundary_independence_flag and loop_filter_across_tile_flag syntax elements may be present in the PPS. If the tile_control_info_present_flag syntax element is equal to 0, the tile_boundary_independence_flag and loop_filter_across_tile_flag syntax elements are not present in the PPS.
In the example syntax of table 2, if entropy_slice_enabled_flag is equal to 1, coded slices that refer to the PPS may include (and may consist of) one or more entropy slices. If the entropy_slice_enabled_flag syntax element is equal to 0, coded slices that refer to the PPS contain no entropy slices. When the entropy_slice_enabled_flag syntax element is not present, video decoder 30 may automatically determine (i.e., infer) that the entropy_slice_enabled_flag syntax element is equal to 0. The semantics of the other syntax elements of the PPS may be the same as those defined in HEVC WD5.
In the example syntax of table 2, the PPS includes the entropy_slice_enabled_flag syntax element only if the tile_mode syntax element is equal to 0. As discussed above, video decoder 30 may determine, based on the tile_mode syntax element, whether to decode the CTBs of each tile of a picture using WPP. Accordingly, video decoder 30 may determine, based on a coding mode syntax element (e.g., tile_mode) having a particular value, that the bitstream includes an additional syntax element (e.g., entropy_slice_enabled_flag) that indicates whether entropy slices are enabled for encoded representations of pictures that refer to a parameter set (e.g., an SPS or a PPS) that includes the coding mode syntax element and the additional syntax element.
As described above, a coded slice NAL unit may include a coded representation of a slice. The coded representation of a slice may include a slice header followed by slice data. In some examples, video decoder 30 may determine, based at least in part on a coding mode syntax element (e.g., tile_mode), whether the slice header includes a plurality of entry offset syntax elements from which entry points of sub-streams in the slice data can be determined. In response to determining that the slice header includes the entry offset syntax elements, video decoder 30 may use the plurality of entry offset syntax elements to determine the entry points of the sub-streams in the slice data. In other words, video decoder 30 may determine the positions of the sub-streams in memory based at least in part on the offset syntax elements. If the coding mode syntax element has one value (e.g., 3), each row of CTBs of the picture is represented by a single one of the sub-streams. If the coding mode syntax element has a different value (e.g., 0, 1, or 2), each tile of the picture that has one or more CTBs in the slice is represented by a single one of the sub-streams. The slice header may follow the example syntax of table 3 below.
TABLE 3 slice header syntax
In the example syntax of table 3, the values of the slice header syntax elements pic_parameter_set_id, frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt[0], and delta_pic_order_cnt[1] are the same in all slice headers of a coded picture. Furthermore, in the example syntax of table 3, the first_slice_in_pic_flag syntax element indicates whether the slice includes the CU that covers the top-left luma sample of the picture. If the first_slice_in_pic_flag syntax element is equal to 1, video decoder 30 may set both the SliceAddress and LCUAddress variables to 0 and may begin decoding with the first CTB in the picture.
Also, in the example syntax of table 3, the slice_address syntax element specifies, at the slice granularity resolution, the address at which the slice starts. The slice granularity resolution is the granularity with which slices are defined. The number of bits of the slice_address syntax element may be equal to Ceil( Log2( NumLCUsInPicture ) ) + SliceGranularity, where NumLCUsInPicture is the number of CTBs in the picture.
In the example syntax of table 3, video decoder 30 sets the LCUAddress variable to ( slice_address >> SliceGranularity ). The LCUAddress variable indicates the LCU part of the slice address of the slice in raster scan order. Video decoder 30 sets the GranularityAddress variable to ( slice_address - ( LCUAddress << SliceGranularity ) ). The GranularityAddress variable represents the sub-LCU part of the slice address. The GranularityAddress variable is expressed in z-scan order.
Video decoder 30 sets the SliceAddress variable to ( LCUAddress << ( log2_diff_max_min_coding_block_size << 1 ) ) + ( GranularityAddress << ( ( log2_diff_max_min_coding_block_size << 1 ) - SliceGranularity ) ). The value log2_diff_max_min_coding_block_size specifies the difference between the maximum CU size and the minimum CU size. Video decoder 30 may begin decoding the slice with the largest possible CU at the slice starting coordinate. The slice starting coordinate may be the coordinate of the top-left pixel of the first CU of the slice.
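The slice-address derivations above can be sketched as follows, with the shift operators written out explicitly; the function name and argument order are illustrative assumptions.

```python
import math

def derive_slice_address_vars(slice_address, slice_granularity,
                              log2_diff_max_min_coding_block_size,
                              num_lcus_in_picture):
    """Sketch of the derivations described above:
    - bit length of slice_address = Ceil(Log2(NumLCUsInPicture)) + SliceGranularity
    - LCUAddress         = slice_address >> SliceGranularity
    - GranularityAddress = slice_address - (LCUAddress << SliceGranularity)
    - SliceAddress = (LCUAddress << (d << 1))
                     + (GranularityAddress << ((d << 1) - SliceGranularity))
      where d = log2_diff_max_min_coding_block_size."""
    num_bits = math.ceil(math.log2(num_lcus_in_picture)) + slice_granularity
    lcu_address = slice_address >> slice_granularity
    granularity_address = slice_address - (lcu_address << slice_granularity)
    shift = log2_diff_max_min_coding_block_size << 1
    full_address = (lcu_address << shift) + (
        granularity_address << (shift - slice_granularity))
    return num_bits, lcu_address, granularity_address, full_address
```

For example, with slice_address = 13, SliceGranularity = 2, log2_diff_max_min_coding_block_size = 2, and 64 LCUs in the picture, LCUAddress is 3 and GranularityAddress is 1.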
Further, in the example syntax of table 3, the cabac_init_idc syntax element specifies an index for determining the initialization table used in the initialization process for context variables. The value of the cabac_init_idc syntax element may be in the range of 0 to 2, inclusive.
In the example syntax of table 3, the num_entry_offsets syntax element specifies the number of entry_offset[i] syntax elements in the slice header. In other words, the number of entry offset syntax elements in the plurality of entry offset syntax elements may be determined based on the num_entry_offsets syntax element. When the num_entry_offsets syntax element is not present, video decoder 30 may determine that the value of the num_entry_offsets syntax element is equal to 0. In this way, video decoder 30 may determine, based on the num_entry_offsets syntax element, how many offset syntax elements are in the plurality of entry offset syntax elements. The offset_len_minus8 syntax element, plus 8, specifies the length, in bits, of the entry_offset[i] syntax elements. In other words, the length, in bits, of each of the entry offset syntax elements may be determined based on the offset_len_minus8 syntax element. In this way, video decoder 30 may determine the length, in bits, of the offset syntax elements based on the offset_len_minus8 syntax element. The entry_offset[i] syntax element specifies the i-th entry offset, in bytes.
Video decoder 30 may parse the offset syntax elements from the bitstream based at least in part on how many offset syntax elements are in the plurality of offset syntax elements and the length, in bits, of the offset syntax elements. The number of sub-streams in a coded slice NAL unit may be equal to num_entry_offsets + 1. Index values of the sub-streams may be in the range of 0 to num_entry_offsets, inclusive. Sub-stream 0 of a coded slice NAL unit may consist of bytes 0 through entry_offset[0] - 1, inclusive, of the slice data of the coded slice NAL unit. Sub-stream k of a coded slice NAL unit, where k is in the range of 1 to num_entry_offsets - 1, inclusive, may consist of bytes entry_offset[k-1] through entry_offset[k] - 1, inclusive, of the slice data of the coded slice NAL unit. The last sub-stream of the coded slice NAL unit, with sub-stream index equal to num_entry_offsets, may consist of the remaining bytes of the slice data of the coded slice NAL unit.
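The byte ranges above amount to splitting the slice data at the entry offsets. A minimal sketch, treating the slice data as a bytes object (names are illustrative):

```python
def split_substreams(slice_data, entry_offsets):
    """Partition slice data into num_entry_offsets + 1 sub-streams:
    sub-stream 0 is bytes [0, entry_offset[0]); sub-stream k is
    [entry_offset[k - 1], entry_offset[k]); and the last sub-stream is
    the remainder of the slice data."""
    bounds = [0] + list(entry_offsets) + [len(slice_data)]
    return [slice_data[bounds[i]:bounds[i + 1]]
            for i in range(len(bounds) - 1)]
```

For example, eight bytes of slice data with entry offsets [3, 5] split into three sub-streams of 3, 2, and 3 bytes.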
In the example syntax of table 3, if the tile_mode syntax element is greater than 0, each sub-stream with a sub-stream index in the range of 1 to num_entry_offsets - 1 contains all coded bits of one tile, and sub-stream 0 contains either all coded bits of a tile or a number of the end coded bits of a tile. The end coded bits of a tile are the coded bits coded at the end of the tile. Furthermore, if the tile_mode syntax element is greater than 0, the last sub-stream (i.e., the sub-stream with sub-stream index equal to num_entry_offsets) contains either all coded bits of a tile or a number of the start coded bits of a tile. The start coded bits of a tile are the coded bits coded at the start of the tile. A sub-stream does not contain coded bits of more than one tile. In the example syntax of table 3, the NAL unit header and the slice header of a coded slice NAL unit are always included in sub-stream 0. If the tile_mode syntax element is equal to 0 and the entropy_slice_enabled_flag syntax element is equal to 1, each sub-stream contains all coded bits of one entropy slice and does not contain any coded bits of another entropy slice.
In the example syntax of table 3, the entropy_slice_address[i] syntax element specifies the start address, at the slice granularity resolution, of the (i+1)-th entropy slice in the coded slice NAL unit. The size, in bits, of each of the entropy_slice_address[i] syntax elements may be equal to Ceil( Log2( NumLCUsInPicture ) ) + SliceGranularity.
Furthermore, in the example syntax of table 3, the entropy_slice_cabac_init_idc[i] syntax element specifies, for the (i+1)-th entropy slice in the coded slice NAL unit, the index for determining the initialization table used in the initialization process for context variables. The value of entropy_slice_cabac_init_idc[i] is in the range of 0 to 2, inclusive. The semantics of the other syntax elements of the slice header may be the same as those defined in HEVC WD5.
In some examples, the entry_offset[i] syntax element indicates the offset of the sub-stream in bits. Furthermore, in some examples, the slice header may include a flag indicating whether entry_offset[i] is expressed in bytes (when the flag is equal to 1) or in bits (when the flag is equal to 0). This flag may be located in the slice header after the offset_len_minus8 syntax element.
Further, in some examples, the slice header may include a syntax element for each sub-stream (including sub-stream 0) to indicate the sub-stream type for the respective sub-stream. In this example, a sub-stream is an image block if the syntax elements of the sub-stream have a first value. If the syntax element of the sub-stream has the second value, then the sub-stream is an entropy slice.
As mentioned above, a coded representation of a slice may include a slice header and slice data. The slice data may include one or more sub-streams. If the coding mode syntax element has a first value (e.g., 3), each row of CTBs of the picture is represented by a single one of the sub-streams. If the coding mode syntax element has a second value (e.g., 0, 1, or 2), each tile of the picture that has one or more CTBs in the slice is represented by a single one of the sub-streams. To facilitate WPP or decoding the tiles of a slice in parallel, the sub-streams in the slice data may include padding bits that ensure byte alignment of the sub-streams. However, in examples where there is only one tile in a picture and entropy slices are not enabled, there may be no need to include such padding bits. Accordingly, video decoder 30 may determine, based at least in part on a coding mode syntax element (e.g., tile_mode), whether the sub-streams in the slice data include padding bits that ensure byte alignment of the sub-streams.
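The byte-alignment requirement corresponds to the bit_equal_to_one padding loop in the slice data syntax of table 4: before a new sub-stream begins, one-valued bits are appended until the position is byte aligned. A minimal sketch over a list of bits (an illustrative representation of the bit buffer):

```python
def pad_to_byte_alignment(bits):
    """Append bit_equal_to_one padding (bits equal to 1) until the bit
    count is a multiple of 8, so the next sub-stream starts byte aligned."""
    while len(bits) % 8 != 0:
        bits.append(1)
    return bits
```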
The slice data may follow the example syntax of table 4 below.
TABLE 4 slicing data syntax
slice_data( ) {                                                              Descriptor
    CurrTbAddr = LCUAddress
    moreDataFlag = 1
    if( adaptive_loop_filter_flag && alf_cu_control_flag )
        AlfCuFlagIdx = -1
    subStreamIdx = 0
    do {
        xCU = HorLumaLocation( CurrTbAddr )
        yCU = VerLumaLocation( CurrTbAddr )
        moreDataFlag = coding_tree( xCU, yCU, Log2TbSize, 0 )
        CurrTbAddr = NextTbAddress( CurrTbAddr )
        if( tile_mode != 0 || entropy_slice_enabled_flag ) {
            byteIdx = byte_index( )
            if( byte_aligned( ) && byteIdx == entry_offset[ subStreamIdx ] )
                subStreamIdx++
            else if( !byte_aligned( ) && byteIdx == entry_offset[ subStreamIdx ] - 1 ) {
                while( !byte_aligned( ) )
                    bit_equal_to_one                                         f(1)
                subStreamIdx++
            }
            moreDataFlag = moreDataFlag && ( subStreamIdx == num_entry_offsets )
        }
    } while( moreDataFlag )
}
In the example syntax of table 4, the slice data includes a coding_tree( ) function. Video decoder 30 may perform a loop as it parses the slice data. During each iteration of the loop, video decoder 30 invokes the coding_tree( ) function to parse a coded CTB in the slice data. When video decoder 30 invokes the coding_tree( ) function to parse a particular coded CTB, video decoder 30 may parse an end_of_slice_flag syntax element from the slice data. If the end_of_slice_flag syntax element is equal to 0, there is another CTB after the particular coded CTB in the slice or entropy slice. If the end_of_slice_flag syntax element is equal to 1, the particular coded CTB is the last coded CTB of the slice or entropy slice.
Furthermore, the example syntax of table 4 includes a byte_index( ) function. The byte_index( ) function may return the byte index of the current position within the bits of the NAL unit. The current position within the bits of the NAL unit may be the first un-parsed bit of the NAL unit. If the next bit in the bitstream is any bit of the first byte of the NAL unit header, the byte_index( ) function returns a value equal to 0.
The slice data syntax of table 4 is an example. In other examples of the slice data syntax, the condition "if( tile_mode != 0 || entropy_slice_enabled_flag )" of table 4 is replaced with the condition "if( tile_mode == 1 || tile_mode == 2 || entropy_slice_enabled_flag )".
Fig. 4 is a flow diagram illustrating example operations 200 of video encoder 20 for encoding video data in which no combination of tiles within a single picture and WPP waves is allowed, in accordance with one or more aspects of the present disclosure. Fig. 4 is provided as an example. In other examples, more, fewer, or different steps than those shown in the example of fig. 4 may be used to implement the techniques of this disclosure.
In the example of fig. 4, video encoder 20 generates a first coded picture by encoding a picture according to a first coding mode (202). When video encoder 20 encodes a picture according to the first coding mode, the picture is completely encoded using WPP. In addition, video encoder 20 may generate a second coded picture by encoding the picture according to a second coding mode (204). When video encoder 20 encodes the picture according to the second coding mode, video encoder 20 may partition the picture into one or more tiles. Video encoder 20 may encode each tile of a picture (i.e., encode each CTB in each of the tiles) without using WPP. For example, video encoder 20 may encode the CTBs of each of the tiles according to a raster scan order without using WPP. Video encoder 20 may then select either the first coded picture or the second coded picture (206). In some examples, video encoder 20 may select the first coded picture or the second coded picture based on a bitrate/distortion analysis of the first coded picture and the second coded picture. Video encoder 20 may generate a bitstream that includes the selected coded picture and a syntax element indicating whether the picture is encoded according to the first coding mode or the second coding mode (208).
Fig. 5 is a flowchart illustrating example operations 220 of video decoder 30 for decoding video data in which no combination of tiles and WPPs within a single picture is allowed, in accordance with one or more aspects of the present disclosure. Fig. 5 is provided as an example.
In the example of fig. 5, video decoder 30 may parse syntax elements from a bitstream that includes a coded representation of a picture in the video data (222). Video decoder 30 may determine whether the syntax element has a particular value (224). In response to determining that the syntax element has a particular value ("yes" of 224), video decoder 30 may decode the picture in its entirety using WPP (226). In response to determining that the syntax element does not have the particular value ("no" of 224), video decoder 30 may decode each tile of the picture without using WPP, wherein the picture has one or more tiles (228).
Fig. 6 is a flow diagram illustrating example operations 230 of video decoder 30 for decoding video data in which no combination of tiles and WPPs within a single picture is allowed, in accordance with one or more aspects of the present disclosure. Fig. 6 is provided as an example. In other examples, more, fewer, or different steps than those shown in the example of fig. 6 may be used to implement the techniques of this disclosure. Fig. 6 may be a more specific example of operation 220 of fig. 5.
In the example of fig. 6, video decoder 30 receives a bitstream (231). Video decoder 30 may parse syntax elements from the bitstream (232). In some examples, the bitstream includes an SPS that includes the syntax element. In other examples, the bitstream includes a PPS including the syntax element.
Subsequently, video decoder 30 may determine whether the syntax element has a first value (e.g., 0) (234). In the example of fig. 6, if the syntax element has the first value ("yes" of 234), the picture has a single tile, and video decoder 30 may decode the single tile of the picture without using WPP (236).
However, if the syntax element does not have the first value ("no" of 234), video decoder 30 may determine whether the syntax element has a second value (e.g., 1) (238). In response to determining that the syntax element has the second value ("yes" of 238), video decoder 30 may determine that the picture has a plurality of uniformly spaced tiles, and video decoder 30 may decode each of the uniformly spaced tiles without using WPP (240).
On the other hand, if the syntax element does not have the second value ("no" of 238), video decoder 30 may determine whether the syntax element has a third value (e.g., 2) (242). In response to determining that the syntax element has the third value ("yes" of 242), video decoder 30 may determine that the picture has a plurality of unevenly-spaced tiles, and video decoder 30 may decode the unevenly-spaced tiles of the picture without using WPP (244). However, in response to determining that the syntax element does not have the third value ("no" of 242), video decoder 30 may decode the picture in its entirety using WPP (246). In this way, if the syntax element has a particular value (e.g., 3), the picture is encoded in its entirety using WPP, and if the syntax element has a value different from that particular value (e.g., 0, 1, or 2), the picture is partitioned into one or more tiles and the picture is encoded without using WPP.
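The four-way branch of fig. 6 can be summarized as a small dispatch on the syntax element's value. The sketch below simply restates the example values (0, 1, 2, 3) used in the text; the function name and the returned strings are hypothetical.

```python
def decoding_mode(tile_mode):
    """Map the example tile_mode values from fig. 6 to a decoding
    strategy (value assignments follow the examples in the text)."""
    if tile_mode == 0:
        return "single tile, no WPP"
    if tile_mode == 1:
        return "uniformly spaced tiles, no WPP"
    if tile_mode == 2:
        return "non-uniformly spaced tiles, no WPP"
    if tile_mode == 3:
        return "entire picture decoded with WPP"
    raise ValueError("reserved tile_mode value")
```

Note how the dispatch makes the mutual exclusion explicit: only value 3 enables WPP, and every tile configuration (0, 1, 2) decodes without it.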
Fig. 7 is a flow diagram illustrating example operations 270 of video encoder 20 for encoding video data in which each row of CTBs of a picture is in a separate sub-stream, in accordance with one or more aspects of this disclosure. In some video coding systems, there are different ways of signaling the entry points of tiles and of WPP waves. This may increase the complexity of such video coding systems. The techniques of this disclosure (as explained with respect to fig. 7 and 8) may address these issues by providing a unified syntax for indicating the entry points of tiles, WPP waves, and (in some examples) entropy slices.
In the example of fig. 7, video encoder 20 signals that a picture in a sequence of video pictures is encoded using WPP (272). Video encoder 20 may signal the use of WPP to encode the picture in various ways. For example, video encoder 20 may generate an SPS that includes a syntax element (e.g., "tile_mode") that indicates whether the picture is decoded entirely using WPP. In another example, video encoder 20 may generate a PPS that includes a syntax element (e.g., "tile_mode") that indicates whether WPP is used to decode the picture.
Further, video encoder 20 may perform WPP to generate a plurality of sub-streams (274). Each of the sub-streams may include a consecutive series of bits representing one encoded row of CTBs in a slice of a picture. Thus, each row of CTBs is encoded as one substream. Video encoder 20 may generate a coded slice NAL unit that includes a plurality of sub-streams (276). A coded slice NAL unit may include a slice header and slice data in accordance with the example syntax of table 3 and table 4 above.
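The grouping of a slice's CTBs into per-row sub-streams can be sketched as follows. The sketch assumes CTBs are addressed in raster-scan order across the picture; the function name is illustrative, and the returned lists stand in for the actual coded bit sequences.

```python
def rows_to_substreams(ctb_addrs, pic_width_in_ctbs):
    """Group a slice's CTB addresses into sub-streams, one sub-stream
    per CTB row, as when the picture is coded entirely with WPP."""
    substreams = {}
    for addr in ctb_addrs:
        row = addr // pic_width_in_ctbs   # raster-scan address -> CTB row
        substreams.setdefault(row, []).append(addr)
    # Emit the sub-streams in top-to-bottom row order.
    return [substreams[r] for r in sorted(substreams)]
```

For a slice covering three full rows of a four-CTB-wide picture, this yields three sub-streams of four CTBs each, matching "each row of CTBs is encoded as one substream."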
Fig. 8 is a flow diagram illustrating example operations 280 of video decoder 30 for decoding video data in which each row of CTBs of a picture is in a separate sub-stream, in accordance with one or more aspects of this disclosure. In the example of fig. 8, video decoder 30 receives a bitstream that includes a coded slice NAL unit (282). The coded slice NAL unit includes multiple sub-streams. Each of the sub-streams may include a consecutive series of bits representing one row of CTBs in a slice of a picture. Moreover, in the example of fig. 8, video decoder 30 determines, based on one or more syntax elements in the bitstream, that the slice is encoded using WPP (284). For example, video decoder 30 may determine that the slice is encoded using WPP based on a tile_mode syntax element being equal to 3. In this example, if the tile_mode syntax element is not equal to 3, video decoder 30 may decode each of one or more tiles of the picture without using WPP.
Next, video decoder 30 may decode the slice using WPP (286). When video decoder 30 decodes the slice, video decoder 30 may parse syntax elements associated with the CTBs of the slice. Video decoder 30 may perform a CABAC parsing process on some of the syntax elements as part of parsing the syntax elements associated with the CTB.
Fig. 9A is a flow diagram illustrating a first portion of an example CABAC parsing process 300 for parsing slice data, in accordance with one or more aspects of this disclosure. Video decoder 30 may perform the process of fig. 9A when parsing syntax elements having the descriptor ae(v) in the slice data and coding tree syntax. The process of fig. 9A may output a value of a syntax element.
In the example of fig. 9A, entropy decoding unit 150 of video decoder 30 performs initialization of the CABAC parsing process (302). In some examples, the initialization of the CABAC parsing process is the same as the initialization described in subclause 9.2.1 of HEVC WD 5.
In addition, entropy decoding unit 150 may determine the address of the neighboring CTB (304). A neighboring CTB may be a CTB that contains a block that neighbors the current CTB (i.e., the CTB currently being decoded by video decoder 30) on the left, above-left, above, or above-right. In some examples, entropy decoding unit 150 may determine the address of the neighboring CTB as:
tbAddrT=cuAddress(x0+2*(1<<Log2MaxCUSize)-1,y0-1)
in the above equation, tbAddrT represents the address of the neighboring CTB, x0 represents the x coordinate of the top-left luma sample of the current CTB, y0 represents the y coordinate of the top-left luma sample of the current CTB, and Log2MaxCUSize represents the base-2 logarithm of the maximum CU size. The function cuAddress returns the address of the CU that contains the x coordinate specified by the first argument and the y coordinate specified by the second argument.
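The neighbor-address computation can be sketched as below. It assumes that cuAddress maps a luma sample position to the raster-scan address of the CTB containing it; returning -1 for positions above the picture is an assumption made here for illustration, and the function name is hypothetical.

```python
def neighbor_ctb_addr(x0, y0, log2_max_cu_size, pic_width_in_ctbs):
    """Compute tbAddrT per the equation above: the CTB containing the
    sample one row up and just inside two CTB widths to the right."""
    ctb_size = 1 << log2_max_cu_size
    x = x0 + 2 * ctb_size - 1   # rightmost sample two CTBs to the right
    y = y0 - 1                  # one sample above the current CTB row
    if y < 0:
        return -1               # no row above: neighbor unavailable
    return (y // ctb_size) * pic_width_in_ctbs + (x // ctb_size)
```

With 64x64 CTBs and a picture ten CTBs wide, the leftmost CTB of the second row (x0=0, y0=64) gets neighbor address 1, i.e., the second CTB of the row above — the above-right position from which WPP inherits CABAC state.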
Next, entropy decoding unit 150 may determine the availability of the neighboring CTBs for prediction in the picture using the addresses of the neighboring CTBs (306). In other words, entropy decoding unit 150 may determine whether information associated with neighboring CTBs is available for use in selecting a CABAC context.
Entropy decoding unit 150 may determine the availability of neighboring CTBs for prediction in a picture in various ways. For example, entropy decoding unit 150 may perform the process described in sub-clause 6.4.3 of HEVC WD 5 (with tbAddrT as input) to determine the availability of neighboring CTBs for prediction in a picture. In another example, entropy decoding unit 150 may determine that a CTB is usable for in-picture prediction unless one of the following conditions is true, in which case entropy decoding unit 150 may determine that the CTB is not usable for in-picture prediction. First, if the address of the CTB is less than 0, entropy decoding unit 150 may determine that the CTB is not usable for in-picture prediction. Second, if the address of the CTB is greater than the address of the CTB that entropy decoding unit 150 is currently parsing, entropy decoding unit 150 may determine that the CTB is not usable for in-picture prediction. Third, if a particular CTB belongs to a different slice than the CTB currently being parsed by entropy decoding unit 150, entropy decoding unit 150 may determine that the particular CTB is not usable for in-picture prediction. For example, if the address of the particular CTB is denoted tbAddr and the address of the CTB currently being parsed by entropy decoding unit 150 is denoted CurrTbAddr, entropy decoding unit 150 may determine whether the CTB having address tbAddr belongs to a different slice than the CTB having address CurrTbAddr. Fourth, entropy decoding unit 150 may determine that the CTB is not usable for in-picture prediction if one or more syntax elements in the bitstream indicate that the tiles of the picture currently being decoded by video decoder 30 are independently decodable and the CTB is in a different tile than the CTB currently being parsed by entropy decoding unit 150.
For example, if the tile_boundary_independence_flag syntax element of the example syntax of table 1 is equal to 1 and the CTB having address tbAddr is contained in a different tile than the CTB having address CurrTbAddr, entropy decoding unit 150 may determine that the CTB is not usable for in-picture prediction.
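The four availability conditions can be collected into one predicate. The sketch below is illustrative, not the HEVC process itself: `slice_id` and `tile_id` are hypothetical lookup tables mapping a CTB address to its slice and tile membership.

```python
def ctb_available(tb_addr, curr_tb_addr, slice_id, tile_id,
                  tiles_independent):
    """Apply the four availability conditions described in the text."""
    if tb_addr < 0:                                    # condition 1
        return False
    if tb_addr > curr_tb_addr:                         # condition 2
        return False
    if slice_id[tb_addr] != slice_id[curr_tb_addr]:    # condition 3
        return False
    if tiles_independent and tile_id[tb_addr] != tile_id[curr_tb_addr]:
        return False                                   # condition 4
    return True
```

Condition 4 is the only one gated on signaling: the same neighbor can be available or unavailable depending on whether the bitstream declares tiles independently decodable.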
Moreover, entropy decoding unit 150 may determine whether the syntax element that entropy decoding unit 150 is currently parsing (i.e., the current syntax element) is in the coding tree syntax structure (308). If the current syntax element is not in the coding tree syntax structure ("no" of 308), entropy decoding unit 150 may perform the portion of CABAC parsing process 300 shown in fig. 9B. On the other hand, if the current syntax element is in the coding tree syntax structure ("yes" of 308), entropy decoding unit 150 may determine whether the tiles of the current picture (i.e., the picture that includes the current CTB) can be independently decoded (310). For example, in the example SPS syntax of table 1, if the SPS associated with the current picture includes a tile_boundary_independence_flag syntax element equal to 1, entropy decoding unit 150 may determine that the tiles of the current picture can be independently decoded. In response to determining that the tiles of the current picture can be independently decoded ("yes" of 310), entropy decoding unit 150 may perform the portion of CABAC parsing process 300 shown in fig. 9B.
However, in response to determining that the tiles of the current picture cannot be independently decoded ("no" of 310), entropy decoding unit 150 may determine whether tbAddr % picWidthInLCUs is equal to 0, where tbAddr is the address of the neighboring CTB, % represents the modulo operator, and picWidthInLCUs indicates the width of the current picture in CTBs (i.e., LCUs) (312).
In response to determining that tbAddr % picWidthInLCUs is equal to 0 ("yes" of 312), entropy decoding unit 150 may determine whether the neighboring CTB can be used for in-picture prediction (314). In some examples, entropy decoding unit 150 may perform the process of act 306 to determine the value of a variable availableFlagT that indicates whether the neighboring CTB can be used for in-picture prediction. If the variable availableFlagT is equal to 1, the neighboring CTB can be used for in-picture prediction. In act 314, entropy decoding unit 150 may determine whether the variable availableFlagT is equal to 1.
In response to determining that the neighboring CTB is available for in-picture prediction ("yes" of 314), entropy decoding unit 150 may perform a synchronization process of the CABAC parsing process (316). In some examples, entropy decoding unit 150 may perform the synchronization process described in sub-clause 9.2.1.3 of HEVC WD 5. After performing the synchronization process, or in response to determining that the neighboring CTB is not available for in-picture prediction ("no" of 314), entropy decoding unit 150 may perform a decoding process for binary decisions before termination (318). In general, the decoding process for binary decisions before termination is a special decoding process for entropy decoding the end_of_slice_flag and pcm_flag syntax elements. Video decoder 30 may use end_of_slice_flag and pcm_flag to make binary decisions before terminating the process of parsing the slice data. In some examples, entropy decoding unit 150 may perform the decoding process for binary decisions before termination as specified in sub-clause 9.2.3.2.4 of HEVC WD 5.
After performing the decoding process for binary decisions before termination (318), entropy decoding unit 150 may perform an initialization process of the arithmetic decoding engine (320). In some examples, entropy decoding unit 150 may perform the initialization process defined in sub-clause 9.2.1.4 of HEVC WD 5. After performing the initialization process of the arithmetic decoding engine, entropy decoding unit 150 may perform the portion of CABAC parsing process 300 shown in fig. 9B.
If tbAddr % picWidthInLCUs is not equal to 0 ("no" of 312), entropy decoding unit 150 may determine whether tbAddr % picWidthInLCUs is equal to 2 (322). In other words, entropy decoding unit 150 may determine whether the CTB address of the neighboring CTB, modulo the width of the current picture in CTBs, is equal to 2. In response to determining that tbAddr % picWidthInLCUs is not equal to 2, entropy decoding unit 150 may perform the portion of CABAC parsing process 300 shown in fig. 9B. However, in response to determining that tbAddr % picWidthInLCUs is equal to 2 ("yes" of 322), entropy decoding unit 150 may perform a memorization process (324). In general, the memorization process outputs variables used in the initialization process for context variables that are assigned to syntax elements other than the end_of_slice_flag syntax element. In some examples, entropy decoding unit 150 may perform the memorization process defined in sub-clause 9.2.1.2 of HEVC WD 5. After performing the memorization process, entropy decoding unit 150 may perform the portion of CABAC parsing process 300 shown in fig. 9B.
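The branching of fig. 9A on tbAddr % picWidthInLCUs can be summarized schematically. The function below only mirrors the flowchart's dispatch (synchronize at a row start, memorize two CTBs in, otherwise continue); the returned labels are illustrative and are not HEVC procedure names.

```python
def wpp_context_action(tb_addr_t, pic_width_in_lcus, neighbor_available):
    """Schematic of the fig. 9A dispatch for a coding-tree syntax
    element when tiles are not independently decodable."""
    if tb_addr_t % pic_width_in_lcus == 0:
        # Start of a CTB row: synchronize from the stored context if the
        # neighbor is available, then (either way) re-initialize the
        # arithmetic decoding engine.
        return "synchronize" if neighbor_available else "initialize-only"
    if tb_addr_t % pic_width_in_lcus == 2:
        # Two CTBs into the row: memorize the context variables so the
        # next CTB row can inherit them.
        return "memorize"
    return "continue"
```

The pairing of these two branches is what makes WPP work: the state memorized at offset 2 in one row is exactly the state a later synchronization restores at offset 0 of the row below.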
Fig. 9B is a flow diagram illustrating a continuation of the example CABAC parsing process 300 of fig. 9A. As shown in fig. 9B, entropy decoding unit 150 may binarize the current syntax element (330). In other words, entropy decoding unit 150 may derive a binarization of the current syntax element. The binarization of a syntax element may be a set of bin strings for all possible values of the syntax element. A bin string is a string of bins that is an intermediate representation of a value of the syntax element. In some examples, entropy decoding unit 150 may perform the process defined in sub-clause 9.2.2 of HEVC WD 5 to derive the binarization of the current syntax element.
In addition, entropy decoding unit 150 may determine a coding process flow (332). Entropy decoding unit 150 may determine a coding process flow based on the binarization of the current syntax element and the sequence of parsed bins. In some examples, entropy decoding unit 150 may determine a coding process flow as described in sub-clause 9.2.2.9 of HEVC WD 5.
Furthermore, entropy decoding unit 150 may determine a context index for each bin of the binarization of the current syntax element (334). Each of the binarized bins of the current syntax element is indexed by a variable binIdx, and the context index of the binarized bin of the current syntax element may be represented as ctxIdx. In some examples, entropy decoding unit 150 may determine a context index for a binarized bin of a current syntax element as specified in sub-clause 9.2.3.1 of HEVC WD 5.
Entropy decoding unit 150 may perform an arithmetic decoding process for each context index (336). In some examples, entropy decoding unit 150 may perform an arithmetic decoding process for each context index as specified in sub-clause 9.2.3.2 of HEVC WD 5. By performing the arithmetic decoding process for each context index, entropy decoding unit 150 may generate a sequence of parsed bins.
Entropy decoding unit 150 may determine whether the sequence of parsed bins matches a bin string in the set of bin strings generated by binarization of the current syntax element (340). If the sequence of parsed bins matches a bin string in the set of bin strings resulting from binarization of the current syntax element ("yes" of 340), entropy decoding unit 150 may assign a corresponding value to the current syntax element (342). After assigning a corresponding value to the current syntax element or in response to determining that the sequence of parsed bins does not match any bin string in the set of bin strings generated by binarization of the current syntax element ("no" of 340), entropy decoding unit 150 has completed parsing the current syntax element.
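The matching of a parsed bin sequence against the binarization's set of bin strings can be illustrated with a simple unary binarization (a value N maps to N ones followed by a zero). Real CABAC binarizations are more varied (truncated unary, Exp-Golomb, fixed-length), so this is a sketch only, and both function names are hypothetical.

```python
def unary_bin_string(value):
    """Unary binarization: value N maps to N ones followed by a zero."""
    return "1" * value + "0"

def match_parsed_bins(parsed_bins, max_value):
    """Return the syntax element value whose bin string equals the
    parsed bin sequence, or None if no bin string matches."""
    for value in range(max_value + 1):
        if unary_bin_string(value) == parsed_bins:
            return value
    return None
```

A match assigns the corresponding value to the syntax element; a sequence like "11" that matches no complete bin string leaves the element unassigned, mirroring the "no" branch of step (340).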
In some examples, if the current syntax element is an mb_type syntax element and the decoded value of the mb_type syntax element is equal to I_PCM, entropy decoding unit 150 may be initialized after decoding any pcm_alignment_zero_bit syntax elements and all pcm_sample_luma and pcm_sample_chroma data, as specified in sub-clause 9.2.1.2 of HEVC WD 5.
Fig. 10 is a conceptual diagram illustrating an example of WPP. As described above, a picture may be partitioned into blocks of pixels, each of which is associated with a CTB. Fig. 10 illustrates the pixel blocks associated with the CTBs as a grid of white squares. The picture includes CTB rows 350A-350E (collectively, "CTB rows 350").
A first parallel processing thread (e.g., executed by one of a plurality of parallel processing cores) may be coding CTBs in CTB row 350A. At the same time, other threads (e.g., executed by other parallel processing cores) may be coding CTBs in CTB rows 350B, 350C, and 350D. In the example of fig. 10, the first thread is currently coding CTB 352A, the second thread is currently coding CTB 352B, the third thread is currently coding CTB 352C, and the fourth thread is currently coding CTB 352D. This disclosure may refer to CTBs 352A, 352B, 352C, and 352D collectively as "current CTBs 352." Because a video coder may begin coding a row of CTBs after two or more CTBs of the immediately higher row have been coded, the current CTBs 352 are horizontally displaced from each other by the width of two CTBs.
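The two-CTB offset between rows yields a wavefront schedule. The sketch below, under the simplifying assumption that coding each CTB takes one uniform time step, lists which CTB coordinates can be coded in parallel at each step; the function name is illustrative.

```python
def wavefront_schedule(rows, cols):
    """List, per time step, the CTB (row, col) coordinates that can be
    coded in parallel when each row trails the row above by two CTBs."""
    steps = []
    t = 0
    while True:
        # Row r starts at time 2*r, so at time t it is at column t - 2*r.
        wave = [(r, t - 2 * r) for r in range(rows) if 0 <= t - 2 * r < cols]
        if not wave:
            break
        steps.append(wave)
        t += 1
    return steps
```

For a 3-row, 4-column grid the schedule takes cols + 2*(rows - 1) = 8 steps, and at step 2 the first and second rows are active simultaneously — the diagonal "wave" that gives WPP its name.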
In the example of fig. 10, a thread may use data from the CTBs indicated by the thick gray arrows to perform intra prediction or inter prediction on CUs in the current CTBs 352. (A thread may also use data from one or more reference frames to perform inter prediction on a CU.) To code a given CTB, a thread may select one or more CABAC contexts based on information associated with previously coded CTBs. The thread may use the one or more CABAC contexts to perform CABAC coding on syntax elements associated with the first CU of the given CTB. If the given CTB is not the leftmost CTB of a row, the thread may select the one or more CABAC contexts based on information associated with the last CU of the CTB to the left of the given CTB. If the given CTB is the leftmost CTB of a row, the thread may select the one or more CABAC contexts based on information associated with the last CU of the CTB that is above and to the right of the given CTB. The threads may use data from the last CUs of the CTBs indicated by the thin black arrows to select the CABAC contexts for the first CUs of the current CTBs 352.
Fig. 11 is a conceptual diagram illustrating an example CTB coding order for a picture 400 that is partitioned into multiple tiles 402A, 402B, 402C, 402D, 402E, and 402F (collectively, "tiles 402"). Each square block in picture 400 represents a block of pixels associated with a CTB. The bold dashed lines indicate example tile boundaries. Different types of cross-hatching correspond to different slices.
The numbers in the pixel blocks indicate the positions of the corresponding CTBs (LCUs) in the tile coding order of picture 400. As illustrated in the example of fig. 11, the CTBs in tile 402A are coded first, followed by the CTBs in tile 402B, followed by the CTBs in tile 402C, followed by the CTBs in tile 402D, followed by the CTBs in tile 402E, followed by the CTBs in tile 402F. Within each of tiles 402, the CTBs are coded according to a raster scan order.
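The tile coding order described above — tiles visited in raster order across the picture, CTBs visited in raster order within each tile — can be sketched for a generic tiled picture. The 4x6 CTB grid and the split positions used in the test are illustrative and do not reproduce the exact geometry of fig. 11; numbering here starts at 0, whereas the figure numbers CTBs from 1.

```python
def tile_coding_order(pic_rows, pic_cols, row_splits, col_splits):
    """Number CTBs in tile coding order: tiles in raster order across
    the picture, CTBs in raster order within each tile."""
    order = {}
    count = 0
    row_bounds = [0] + row_splits + [pic_rows]
    col_bounds = [0] + col_splits + [pic_cols]
    for tr in range(len(row_bounds) - 1):          # tile rows, top down
        for tc in range(len(col_bounds) - 1):      # tile cols, left to right
            for r in range(row_bounds[tr], row_bounds[tr + 1]):
                for c in range(col_bounds[tc], col_bounds[tc + 1]):
                    order[(r, c)] = count          # raster scan inside tile
                    count += 1
    return order
```

Note how the CTB at picture position (0, 2) receives number 4 rather than 2: the whole first tile is exhausted before the scan jumps to the tile on its right.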
A video encoder may generate four coded slice NAL units for picture 400. The first coded slice NAL unit may include an encoded representation of CTBs 1-18. The slice data of the first coded slice NAL unit may include two sub-streams. The first sub-stream may include the encoded representations of CTBs 1-9. The second sub-stream may include the encoded representations of CTBs 10-18. Thus, the first coded slice NAL unit may include an encoded representation of a slice that contains multiple tiles.

The second coded slice NAL unit may include an encoded representation of CTBs 19-22. The slice data of the second coded slice NAL unit may include a single sub-stream. The third coded slice NAL unit may include an encoded representation of CTBs 23-27. The slice data of the third coded slice NAL unit may also include only a single sub-stream. Thus, tile 402C may contain multiple slices.

The fourth coded slice NAL unit may include an encoded representation of CTBs 28-45. The slice data of the fourth coded slice NAL unit may include three sub-streams, one each for tiles 402D, 402E, and 402F. Thus, the fourth coded slice NAL unit may include an encoded representation of a slice that contains multiple tiles.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, corresponding to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, a computer-readable medium may generally correspond to: (1) a non-transitory, tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, as used herein, the term "processor" may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including wireless handsets, Integrated Circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Conversely, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperability hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (16)

1. A method for decoding video data, the method comprising:
parsing a syntax element from a bitstream, the bitstream comprising coded slice Network Abstraction Layer (NAL) units of slices of pictures in the video data, the pictures being partitioned into a grid of Coding Tree Blocks (CTBs), each CTB being associated with a different block of pixels of equal size within the picture, the coded slice NAL units comprising a plurality of sub-streams, wherein, when the syntax element has a particular value, it indicates that each row of CTBs is represented by a single one of the sub-streams, and when the syntax element does not have the particular value, it indicates that each tile of the picture having one or more CTBs in the slice is represented by a single one of the sub-streams;
determining, based on the syntax element, whether the sub-stream includes padding bits that ensure byte alignment of the sub-stream;
in response to determining that the syntax element has the particular value, completely decoding the picture using Wavefront Parallel Processing (WPP), wherein decoding using WPP includes decoding CTBs of a row of the picture from left to right after two or more CTBs of an immediately higher row of the picture have been decoded; and
wherein, when the syntax element does not have the particular value, the picture has one or more tiles, and each tile of the picture is decoded without using WPP.
2. The method of claim 1, further comprising:
parse a second syntax element from the bitstream, the bitstream comprising a coded representation of a second picture of the video data, the second picture partitioned into a grid of CTBs, wherein the second picture is partitioned into at least a first tile and a second tile; and is
In response to determining that the second syntax element does not have the particular value, decoding, in parallel, a CTB of the first tile and a CTB of the second tile.
3. The method of claim 1, further comprising:
determining that the parameter set comprises an image block column number syntax element and an image block row number syntax element;
determining a specified number of tile columns based on the tile column number syntax element, wherein a number of tile columns for each picture associated with the parameter set is equal to the specified number of tile columns; and
determining a specified number of image block lines based on the image block line number syntax element, wherein a number of image block lines per picture associated with the parameter set is equal to the specified number of image block lines.
4. The method of claim 1, wherein the syntax element is a first syntax element, the particular value is a first value, and the method further comprises determining, based on the first syntax element having a second value, that the picture includes only one tile and that the bitstream includes a second syntax element indicating whether entropy slicing is enabled for an encoded representation of the picture referencing a parameter set that includes the first syntax element and the second syntax element.
5. The method of claim 1, wherein decoding the picture using WPP comprises:
in response to determining that a first CTB separates a single CTB from a left boundary of the picture, storing a context variable associated with the first CTB; and
entropy decode, based at least in part on the context variable associated with the first CTB, one or more syntax elements of a second CTB, the second CTB being adjacent to the left boundary of the picture and one row of CTBs lower than the first CTB.
6. A method for encoding video data, the method comprising:
Generating, in a bitstream, a coded slice Network Abstraction Layer (NAL) unit for a slice of a picture of video data, the picture partitioned into a grid of Coding Tree Blocks (CTBs), each CTB associated with a different pixel block of equal size within the picture, the coded slice NAL unit comprising a plurality of sub-streams;
generating a syntax element in the bitstream;
wherein the syntax element having a particular value indicates that each row of CTBs of the picture is represented by a single one of the sub-streams and the picture is encoded in its entirety using Wavefront Parallel Processing (WPP), and wherein encoding using WPP comprises encoding CTBs of a row of the picture from left to right after two or more CTBs of an immediately higher row of the picture have been encoded;
wherein the syntax elements without the particular value indicate that the picture has one or more tiles and each tile of a picture having one or more CTBs in the slice is represented by a single one of the substreams, and each tile of the picture is encoded without using WPP.
7. The method of claim 6, further comprising:
including, in the bitstream, a second syntax element, the bitstream comprising a coded representation of a second picture of the video data, the second picture partitioned into a grid of CTBs, the second syntax element not having the particular value, wherein the second picture is partitioned into at least a first tile and a second tile; and
Encoding the CTB of the first tile and the CTB of the second tile in parallel.
8. The method of claim 6, wherein:
generating the bitstream comprises generating a parameter set that includes a tile column number syntax element and a tile row number syntax element,
a number of tile columns may be determined based on the tile column number syntax element, and the number of tile columns of each picture associated with the parameter set is equal to the number of tile columns, and
a number of tile rows may be determined based on the tile row number syntax element, and the number of tile rows of each picture associated with the parameter set is equal to the number of tile rows.
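Claim 8's tile column number and tile row number syntax elements correspond, in the published HEVC specification, to the picture parameter set elements num_tile_columns_minus1 and num_tile_rows_minus1 (the "minus1" coding saves a bit for the common single-tile case). A minimal sketch of deriving the tile count from them — the helper name is ours, not from the specification:

```python
def tile_count(num_tile_columns_minus1, num_tile_rows_minus1):
    """Number of tiles in the picture grid signaled by the parameter set.

    Each syntax element is coded as (count - 1), so add 1 to each
    before multiplying.
    """
    return (num_tile_columns_minus1 + 1) * (num_tile_rows_minus1 + 1)

print(tile_count(1, 1))  # 4: a 2x2 tile grid
print(tile_count(0, 0))  # 1: the whole picture is a single tile
```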
9. The method of claim 6, wherein the syntax element is a first syntax element, the particular value is a first value, the first syntax element has a second value that indicates the picture is partitioned into a single tile, and the bitstream includes a second syntax element that indicates whether entropy slicing is enabled for an encoded representation of the picture that references a parameter set that includes the first syntax element and the second syntax element.
10. The method of claim 6, further comprising encoding the picture using WPP, wherein encoding the picture using WPP comprises:
in response to determining that a first CTB is separated from a left boundary of the picture by a single CTB, storing a context variable associated with the first CTB; and
entropy encoding, based at least in part on the context variable associated with the first CTB, one or more syntax elements of a second CTB, the second CTB being adjacent to the left boundary of the picture and positioned one row of CTBs lower than the first CTB.
11. A video decoding device, comprising:
means for parsing a syntax element from a bitstream, the bitstream comprising a coded slice Network Abstraction Layer (NAL) unit for a slice of a picture of video data, the picture partitioned into a grid of Coding Tree Blocks (CTBs), each CTB associated with a different block of pixels of equal size within the picture, the coded slice NAL unit comprising a plurality of sub-streams, wherein, when the syntax element has a particular value, it indicates that each row of CTBs is represented by a single one of the sub-streams, and when the syntax element does not have the particular value, it indicates that each tile of the picture having one or more CTBs in the slice is represented by a single one of the sub-streams;
means for determining, based on the syntax element, whether the sub-streams include padding bits that ensure byte alignment of the sub-streams;
means for decoding the picture in its entirety using Wavefront Parallel Processing (WPP) in response to determining that the syntax element has the particular value, wherein decoding using WPP comprises decoding CTBs of a row of the picture from left to right after two or more CTBs of an immediately higher row of the picture have been decoded; and
wherein, when the syntax element does not have the particular value, the picture has one or more tiles, and each tile of the picture is decoded without using WPP.
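The substream mapping that claim 11's syntax element selects — one substream per CTB row when WPP is in use, one substream per tile otherwise — can be illustrated with a hypothetical helper. In the final HEVC text the controlling flag became entropy_coding_sync_enabled_flag; the function below and its string labels are purely illustrative, not the claimed apparatus.

```python
def substream_targets(wpp_enabled, num_ctb_rows, num_tiles):
    """Label what each substream of a coded slice NAL unit carries.

    When WPP is enabled, substream i holds CTB row i of the picture;
    otherwise substream i holds tile i, so the substream count tracks
    the tile count instead of the row count.
    """
    if wpp_enabled:
        return [f"CTB row {r}" for r in range(num_ctb_rows)]
    return [f"tile {t}" for t in range(num_tiles)]

print(substream_targets(True, 3, 4))   # one substream per CTB row
print(substream_targets(False, 3, 4))  # one substream per tile
```

Because the two mappings imply different substream counts and boundaries, a decoder must resolve the syntax element before it can locate the entry points of the individual substreams, which is why the claims tie the parsing step to the decoding mode.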
12. The video decoding device of claim 11, further comprising means for performing the method of any of claims 2 to 5.
13. A video encoding device, comprising:
means for generating, in a bitstream, a coded slice Network Abstraction Layer (NAL) unit for a slice of a picture of the video data, the picture partitioned into a grid of Coding Tree Blocks (CTBs), each CTB associated with a different pixel block of equal size within the picture, the coded slice NAL unit comprising a plurality of sub-streams;
means for generating a syntax element in the bitstream;
wherein the syntax element having a particular value indicates that each row of CTBs is represented by a single one of the sub-streams, that the sub-streams include padding bits that ensure byte alignment of the sub-streams, and that the picture is encoded in its entirety using Wavefront Parallel Processing (WPP), and wherein encoding using WPP comprises encoding CTBs of a row of the picture from left to right after two or more CTBs of an immediately higher row of the picture have been encoded;
wherein the syntax element not having the particular value indicates that the picture has one or more tiles, that each tile of the picture having one or more CTBs in the slice is represented by a single one of the sub-streams, and that each tile of the picture is encoded without using WPP.
14. The video encoding device of claim 13, comprising means for performing the method of any of claims 7 to 10.
15. A computer-readable storage medium storing instructions that, when executed by one or more processors of a video decoding device, configure the video decoding device to:
parse a syntax element from a bitstream, the bitstream comprising a coded slice Network Abstraction Layer (NAL) unit for a slice of a picture in video data, the picture partitioned into a grid of Coding Tree Blocks (CTBs), each CTB associated with a different block of pixels of equal size within the picture, the coded slice NAL unit comprising a plurality of sub-streams, wherein, when the syntax element has a particular value, it indicates that each row of CTBs is represented by a single one of the sub-streams, and when the syntax element does not have the particular value, it indicates that each tile of the picture having one or more CTBs in the slice is represented by a single one of the sub-streams;
determine, based on the syntax element, whether the sub-streams include padding bits that ensure byte alignment of the sub-streams; and
in response to determining that the syntax element has the particular value, decode the picture in its entirety using Wavefront Parallel Processing (WPP), wherein decoding using WPP includes decoding CTBs of a row of the picture from left to right after two or more CTBs of an immediately higher row of the picture have been decoded; and
wherein, when the syntax element does not have the particular value, the picture has one or more tiles, and each tile of the picture is decoded without using WPP.
16. A computer-readable storage medium storing instructions that, when executed by one or more processors of a video encoding device, configure the video encoding device to:
generate, in a bitstream, a coded slice Network Abstraction Layer (NAL) unit for a slice of a picture in video data, the picture partitioned into a grid of Coding Tree Blocks (CTBs), each CTB associated with a different pixel block of equal size within the picture, the coded slice NAL unit comprising a plurality of sub-streams; and
generate a syntax element in the bitstream,
wherein the syntax element having a particular value indicates that each row of CTBs of the picture is represented by a single one of the sub-streams and the picture is encoded in its entirety using Wavefront Parallel Processing (WPP), and wherein encoding using WPP comprises encoding CTBs of a row of the picture from left to right after two or more CTBs of an immediately higher row of the picture have been encoded;
wherein the syntax element not having the particular value indicates that the picture has one or more tiles, that each tile of the picture having one or more CTBs in the slice is represented by a single one of the sub-streams, and that each tile of the picture is encoded without using WPP.
HK14111634.3A 2012-01-18 2012-12-19 Indication of use of wavefront parallel processing in video coding HK1198236B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201261588096P 2012-01-18 2012-01-18
US61/588,096 2012-01-18
US13/718,883 US9332259B2 (en) 2012-01-18 2012-12-18 Indication of use of wavefront parallel processing in video coding
US13/718,883 2012-12-18
PCT/US2012/070680 WO2013109382A2 (en) 2012-01-18 2012-12-19 Indication of use of wavefront parallel processing in video coding

Publications (2)

Publication Number Publication Date
HK1198236A1 HK1198236A1 (en) 2015-03-13
HK1198236B true HK1198236B (en) 2019-07-19
