
HK1198401B - A method and apparatus for encoding and decoding video data - Google Patents


Info

Publication number
HK1198401B
HK1198401B (application number HK14111884.0A)
Authority
HK
Hong Kong
Prior art keywords
partition
bin
block
video data
size
Prior art date
Application number
HK14111884.0A
Other languages
Chinese (zh)
Other versions
HK1198401A1 (en)
Inventor
Wei-Jung Chien
Joel Sole Rojas
Marta Karczewicz
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 13/645,308 (published as US 9,451,287 B2)
Application filed by Qualcomm Incorporated
Publication of HK1198401A1
Publication of HK1198401B

Description

Method and apparatus for encoding and decoding video data
This application claims the benefit of U.S. Provisional Application No. 61/557,325, filed 8 November 2011, and U.S. Provisional Application No. 61/561,911, filed 20 November 2011, both of which are incorporated herein by reference in their entirety.
Technical Field
This disclosure relates to video coding, and in particular, to Context Adaptive Binary Arithmetic Coding (CABAC) for use in video coding.
Background
Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques such as those defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standards currently being developed, and extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in inter-coded (P or B) slices of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial or temporal prediction results in a prediction block for a block to be coded. The residual data represents pixel differences between the original block to be coded and the prediction block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that form a prediction block and residual data that indicates the difference between the coded block and the prediction block. The intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, resulting in residual transform coefficients, which may then be quantized. Quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to generate a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
Disclosure of Invention
In general, techniques are described for Context Adaptive Binary Arithmetic Coding (CABAC) in a video coding process. In particular, this disclosure proposes a reduction in the number of CABAC contexts used for one or more syntax elements, non-limiting examples of which include pred_type, merge_idx, inter_pred_flag, ref_idx_lx, cbf_cb, cbf_cr, coeff_abs_level_greater1_flag, and coeff_abs_level_greater2_flag. The modifications may reduce the number of contexts by up to 56 with negligible coding efficiency changes. The proposed context reductions for these syntax elements may be used alone or in any combination.
In one example of this disclosure, a method of encoding video may comprise: determining a first prediction type for a block of video data in a P slice, representing the first prediction type as a P-slice prediction type syntax element, determining a second prediction type for the block of video data in a B slice, representing the second prediction type as a B-slice prediction type syntax element, determining a P-slice binarization for the P-slice prediction type syntax element, determining a B-slice binarization for the B-slice prediction type syntax element, wherein the P-slice prediction type syntax element and the B-slice prediction type syntax element are determined using the same binarization logic, and encoding the video data based on the binarizations of the P-slice prediction type syntax element and the B-slice prediction type syntax element.
In another example of this disclosure, a method of decoding video may comprise: mapping a binarized P-slice prediction type syntax element to a prediction type using a binarization mapping for a block of video data in a P slice, mapping a binarized B-slice prediction type syntax element to a prediction type using the same binarization mapping for a block of video data in a B slice, and decoding the video data based on the mapped prediction types.
In another example of this disclosure, a method of encoding video data comprises: determining a partition type for a prediction mode of a block of video data, encoding a partition type bin of a prediction type syntax element for the block of video data using CABAC with a single context, wherein the single context is the same for any partition type, and encoding a partition size bin of the prediction type syntax element for the block of video data using CABAC in a bypass mode.
In another example of this disclosure, a method of decoding video data comprises: receiving a prediction type syntax element for a block of video data that has been coded using CABAC, the prediction type syntax element including a partition type bin representing a partition type and a partition size bin representing a partition size, decoding the partition type bin of the prediction type syntax element using context adaptive binary arithmetic coding with a single context, wherein the single context is the same for any partition type, and decoding the partition size bin of the prediction type syntax element using CABAC in a bypass mode.
In another example of this disclosure, a method of coding video data comprises: coding a Cb chroma coded block flag for a block of video data using CABAC, wherein coding the Cb chroma coded block flag comprises using a context set that includes one or more contexts as part of CABAC, and coding a Cr chroma coded block flag using CABAC, wherein coding the Cr chroma coded block flag comprises using the same context set as the Cb chroma coded block flag as part of CABAC.
The techniques described above are also described in terms of an apparatus configured to perform the techniques and in terms of a computer-readable storage medium storing instructions that, when executed, cause one or more processors to perform the techniques.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.
FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 4 is a conceptual diagram showing square and non-square partition types.
FIG. 5 is a conceptual diagram showing asymmetric partition types.
FIG. 6 is a flow diagram illustrating an example video encoding method of this disclosure.
FIG. 7 is a flow diagram illustrating an example video decoding method of this disclosure.
FIG. 8 is a flow diagram illustrating an example video encoding method of this disclosure.
FIG. 9 is a flow diagram illustrating an example video decoding method of this disclosure.
FIG. 10 is a flow diagram illustrating an example video coding method of this disclosure.
Detailed Description
This disclosure describes techniques for coding data, such as video data. In particular, this disclosure describes techniques that may facilitate efficient coding of video data using a context adaptive entropy coding process. More specifically, this disclosure proposes a reduction in the number of CABAC contexts used to code syntax elements such as pred_type, merge_idx, inter_pred_flag, ref_idx_lx, cbf_cb, cbf_cr, coeff_abs_level_greater1_flag, and coeff_abs_level_greater2_flag. The modifications reduce the number of contexts by up to 56 with negligible coding efficiency changes. This disclosure describes video coding for purposes of illustration. However, the techniques described in this disclosure may also be applicable to coding other types of data.
Fig. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may be configured to utilize techniques for Context Adaptive Binary Arithmetic Coding (CABAC), according to an example of this disclosure. As shown in fig. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. The encoded video data may also be stored on storage medium 34 or file server 36, and may be accessed by destination device 14 as desired. When stored to a storage medium or file server, video encoder 20 may provide the coded video data to another device, such as a network interface, a compact disc (CD), Blu-ray, or digital video disc (DVD) burner or stamping facility device, or other device, for storage of the coded video data to the storage medium. Likewise, a device separate from video decoder 30, such as a network interface, CD or DVD reader, or the like, may retrieve coded video data from a storage medium and provide the retrieved data to video decoder 30.
Source device 12 and destination device 14 may comprise any of a wide variety of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smart phones, televisions, cameras, display devices, digital media players, video game consoles, or the like. In many cases, such devices may be equipped for wireless communication. Thus, communication channel 16 may comprise a wireless channel, a wired channel, or a combination of wireless and wired channels suitable for transmitting encoded video data. Similarly, the file server 36 may be accessed by the destination device 14 over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
Techniques for CABAC according to examples of this disclosure may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasts, closed circuit television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, a modulator/demodulator 22, and a transmitter 24. In source device 12, video source 18 may include sources such as: a video capture device such as a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics as source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applicable to wireless and/or wired applications, or applications in which encoded video data is stored on a local disk.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may be modulated by modem 22 according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas.
Captured, pre-captured, or computer-generated video encoded by video encoder 20 may also be stored on storage medium 34 or file server 36 for later use. Storage medium 34 may comprise a blu-ray disc, DVD, CD-ROM, flash memory, or any other suitable digital storage medium for storing encoded video. The encoded video stored on storage medium 34 may then be accessed by destination device 14 for decoding and playback. Although not shown in fig. 1, in some examples, storage medium 34 and/or file server 36 may store the output of transmitter 24.
File server 36 may be any type of server capable of storing encoded video and transmitting the encoded video to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, local disk drives, or any other type of device capable of storing encoded video data and transmitting it to a destination device. The transmission of the encoded video data from the file server 36 may be a streaming transmission, a download transmission, or a combination of both. The file server 36 may be accessed by the destination device 14 over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, ethernet, USB, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
In the example of fig. 1, destination device 14 includes a receiver 26, a modem 28, a video decoder 30, and a display device 32. Receiver 26 of destination device 14 receives the information over channel 16, and modem 28 demodulates the information to generate a demodulated bitstream for video decoder 30. The information communicated over channel 16 may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding the video data. This syntax may also be included with the encoded video data stored on the storage medium 34 or the file server 36. Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (CODEC) capable of encoding or decoding video data.
The display device 32 may be integrated with the destination device 14 or external to the destination device 14. In some examples, destination device 14 may include an integrated display device, and also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
In the example of fig. 1, communication channel 16 may comprise any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. Communication channel 16 generally represents any suitable communication medium or collection of different communication media, including any suitable combination of wired or wireless media, for transmitting video data from source device 12 to destination device 14. Communication channel 16 may include a router, switch, base station, or any other apparatus that may be used to facilitate communication from source device 12 to destination device 14.
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard presently under development by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). The latest draft of the HEVC standard, referred to as "HEVC Working Draft 6" or "WD6," is described in document JCTVC-H1003, "High Efficiency Video Coding (HEVC) text specification draft 6," by Bross et al. (Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 8th Meeting: San Jose, California, USA, February 2012), which, as of 1 June 2012, is downloadable from http://phenix.int-evry.fr/jct/doc_end_user/documents/8_San%20Jose/wg11/JCTVC-H1003-v22.zip.
Alternatively, video encoder 20 and video decoder 30 may operate in accordance with other specialized or industry standards, such as the ITU-T h.264 standard, alternatively referred to as MPEG4 part 10 Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263.
Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Video encoder 20 may implement any or all of the techniques of this disclosure for CABAC in a video coding process. Likewise, video decoder 30 may implement any or all of the techniques of this disclosure for CABAC in a video coding process. A video coder, as described in this disclosure, may refer to a video encoder or a video decoder. Similarly, a video coding unit may refer to a video encoder or a video decoder. Likewise, video coding may refer to video encoding or video decoding.
In one example of this disclosure, video encoder 20 may be configured to determine a first prediction type for a block of video data in a P slice, represent the first prediction type as a P-slice prediction type syntax element, determine a second prediction type for the block of video data in a B slice, represent the second prediction type as a B-slice prediction type syntax element, determine a P-slice binarization for the P-slice prediction type syntax element, determine a B-slice binarization for the B-slice prediction type syntax element, wherein the P-slice prediction type syntax element and the B-slice prediction type syntax element are determined using the same binarization logic, and encode the video data based on the binarizations of the P-slice prediction type syntax element and the B-slice prediction type syntax element.
In another example of this disclosure, video decoder 30 may be configured to map a binarized P-slice prediction type syntax element to a prediction type using a binarization mapping for a block of video data in a P-slice, map a binarized B-slice prediction type syntax element to a prediction type using the same binarization mapping for a block of video data in a B-slice, and decode the video data based on the mapped prediction type.
In another example of this disclosure, video encoder 20 may be configured to determine a partition type for a prediction mode of a block of video data, encode a partition type bin of a prediction type syntax element of the block of video data using CABAC having a single context, wherein the single context is the same for any partition type, and encode a partition size bin of the prediction type syntax element of the block of video data using CABAC in bypass mode.
In another example of this disclosure, video decoder 30 may be configured to receive a prediction type syntax element for a block of video data that has been coded using CABAC, the prediction type syntax element including a partition type bin representing a partition type and a partition size bin representing a partition size, decode the partition type bin of the prediction type syntax element using CABAC having a single context, wherein the single context is the same for any partition type, and decode the partition size bin of the prediction type syntax element using CABAC in bypass mode.
In another example of this disclosure, video encoder 20 and video decoder 30 may be configured to code Cb chroma coded block flags for a block of video data using CABAC, wherein coding the Cb chroma coded block flags comprises using a context set that includes one or more contexts as part of CABAC, and code Cr chroma coded block flags using CABAC, wherein coding the Cr chroma coded block flags comprises using the same context set as the Cb chroma coded block flags as part of CABAC.
JCT-VC is working on the development of the HEVC standard. HEVC standardization efforts are based on an evolution model of the video coding device known as the HEVC test model (HM). The HM assumes several additional capabilities of video coding devices relative to existing devices that conform to, for example, ITU-T H.264/AVC. For example, h.264 provides 9 intra-prediction encoding modes, while the HM may provide up to 33 intra-prediction encoding modes. The following sections will discuss certain aspects of the HM in more detail.
In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCUs) that include both luma and chroma samples. A treeblock serves a purpose similar to that of a macroblock of the H.264 standard. A slice includes a number of treeblocks that are consecutive in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as the root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. The final, unsplit child nodes, as leaf nodes of the quadtree, comprise coding nodes, i.e., coded video blocks. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
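As an illustration of the quadtree decomposition just described, the following C++ sketch recursively splits a treeblock into leaf CUs. The Block type, the split-decision callback, and the size limits are assumptions made for illustration only; they are not HEVC reference code.

#include <cstdio>
#include <functional>

struct Block { int x, y, size; };  // top-left position and width/height in pixels

// Recursively split 'b' into four quadrants until the split decision says
// stop or the minimum CU size is reached; each leaf is a coding node.
void splitCU(const Block& b, int minCuSize,
             const std::function<bool(const Block&)>& shouldSplit,
             const std::function<void(const Block&)>& onLeaf) {
    if (b.size > minCuSize && shouldSplit(b)) {
        int h = b.size / 2;
        splitCU({b.x,     b.y,     h}, minCuSize, shouldSplit, onLeaf);
        splitCU({b.x + h, b.y,     h}, minCuSize, shouldSplit, onLeaf);
        splitCU({b.x,     b.y + h, h}, minCuSize, shouldSplit, onLeaf);
        splitCU({b.x + h, b.y + h, h}, minCuSize, shouldSplit, onLeaf);
    } else {
        onLeaf(b);  // leaf of the quadtree: a coding node (coded video block)
    }
}

int main() {
    // Split a 64x64 treeblock down to 16x16 leaves for illustration.
    splitCU({0, 0, 64}, 8,
            [](const Block& b) { return b.size > 16; },
            [](const Block& b) { std::printf("CU %dx%d at (%d,%d)\n", b.size, b.size, b.x, b.y); });
    return 0;
}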
A CU includes a coding node and a Prediction Unit (PU) and a Transform Unit (TU) associated with the coding node. The size of a CU generally corresponds to the size of the coding node and typically must be square in shape. The size of a CU may range from 8x8 pixels up to a treeblock with a maximum of 64x64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, a partitioning of the CU into one or more PUs. The partition mode may be different between whether the CU is skipped or direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. A PU may be partitioned into non-square shapes. Syntax data associated with a CU may also describe, for example, the partitioning of the CU into one or more TUs according to a quadtree. The TU may be square or non-square in shape.
The emerging HEVC standard allows for a transform according to TUs, which may be different for different CUs. A TU is typically sized based on the size of a PU within a given CU defined for a partitioned LCU, although this may not always be the case. TUs are typically the same size or smaller than a PU. In some examples, using a quadtree structure referred to as a "residual quadtree" (RQT), residual samples corresponding to a CU may be subdivided into smaller units. The leaf nodes of the RQT may be referred to as Transform Units (TUs). Pixel difference values associated with TUs may be transformed to produce transform coefficients that may be quantized.
In general, a PU refers to data related to a prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode of the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., list 0, list 1, or list C) of the motion vector.
In general, TUs are used for transform and quantization processes. A given CU with one or more PUs may also include one or more Transform Units (TUs). After prediction, video encoder 20 may calculate residual values from the video blocks identified by the coding node according to the PUs. The coding node is then updated to reference the residual values instead of the original video block. The residual values comprise pixel difference values that may be transformed into transform coefficients using transforms and other transform information specified in the TUs, quantized, and scanned to generate serialized transform coefficients for entropy coding. The coding node may again be updated to reference these serialized transform coefficients. This disclosure generally uses the term "video block" to refer to a coding node of a CU. In some particular cases, this disclosure may also use the term "video block" to refer to treeblocks, i.e., LCUs or CUs, that include coding nodes as well as PUs and TUs.
A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) typically includes a series of one or more video pictures. A GOP may include syntax data, in a header of the GOP, in a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
As an example, the HM supports prediction at various PU sizes. Assuming that the size of a particular CU is 2Nx2N, the HM supports intra prediction at PU sizes of 2Nx2N or NxN, and inter prediction at symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, or NxN. The HM also supports asymmetric partitioning for inter prediction at PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2NxnU" refers to a 2Nx2N CU that is partitioned horizontally with a 2Nx0.5N PU on top and a 2Nx1.5N PU on bottom.
Fig. 4 is a conceptual diagram showing square and non-square partition types for intra-prediction and inter-prediction. Partition 102 is a 2Nx2N partition and may be used for both intra-prediction and inter-prediction. Partition 104 is an NxN partition and may be used for both intra-prediction and inter-prediction. Partition 106 is a 2NxN partition and is currently used for inter prediction in HEVC. Partition 108 is an Nx2N partition and is currently used in HEVC for inter prediction.
Fig. 5 is a conceptual diagram showing asymmetric partition types. Partition 110 is a 2NxnU partition and is currently used for inter prediction in HEVC. Partition 112 is a 2NxnD partition and is currently used for inter prediction in HEVC. Partition 114 is an nLx2N partition and is currently used in HEVC for inter prediction. Partition 116 is an nRx2N partition and is currently used in HEVC for inter prediction.
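To make the partition geometries of FIGS. 4 and 5 concrete, the following C++ sketch maps each partition mode to the width and height of the first PU of a 2Nx2N CU. The enum mirrors the PART_* names used later in this description; the helper function itself is an illustrative assumption, not HEVC reference code.

#include <utility>

enum PartMode { PART_2Nx2N, PART_NxN, PART_2NxN, PART_Nx2N,
                PART_2NxnU, PART_2NxnD, PART_nLx2N, PART_nRx2N };

// Returns {width, height} of the first partition of a CU of size 2N x 2N.
std::pair<int, int> firstPuSize(PartMode mode, int N) {
    switch (mode) {
        case PART_2Nx2N: return {2 * N, 2 * N};
        case PART_NxN:   return {N, N};
        case PART_2NxN:  return {2 * N, N};
        case PART_Nx2N:  return {N, 2 * N};
        case PART_2NxnU: return {2 * N, N / 2};      // top PU is 2N x 0.5N
        case PART_2NxnD: return {2 * N, 3 * N / 2};  // top PU is 2N x 1.5N
        case PART_nLx2N: return {N / 2, 2 * N};      // left PU is 0.5N x 2N
        case PART_nRx2N: return {3 * N / 2, 2 * N};  // left PU is 1.5N x 2N
    }
    return {0, 0};
}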
In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel size of a video block in terms of vertical and horizontal dimensions, such as 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block will have 16 pixels in the vertical direction (y-16) and 16 pixels in the horizontal direction (x-16). Likewise, an NxN block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in several rows and columns. Also, the block does not necessarily have to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise NxM pixels, where M is not necessarily equal to N.
After intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data to which the transforms specified by the TUs of the CU are applied. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the CU. Video encoder 20 may form the residual data for the CU, and then transform the residual data to produce transform coefficients.
After any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
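As a simple illustration of the quantization step described above, the following C++ sketch applies uniform scalar quantization with rounding to the nearest level. The rounding offset and the use of a plain right shift are assumptions made for illustration; they do not reproduce the exact HEVC quantizer.

#include <cstdlib>

// Quantize one transform coefficient: larger 'shift' values give coarser
// quantization and therefore fewer bits to represent the resulting level.
int quantize(int coeff, int shift) {
    int sign = coeff < 0 ? -1 : 1;
    int level = (std::abs(coeff) + (1 << (shift - 1))) >> shift;  // round to nearest
    return sign * level;
}
// Example: with shift = 4, the coefficient 37 maps to level 2, reducing the
// bit depth of the value in the manner described above.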
In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
This disclosure relates to techniques for Context Adaptive Binary Arithmetic Coding (CABAC) entropy coders, or other entropy coders such as probability interval partitioning entropy (PIPE) coders or related coders. Arithmetic coding is a form of entropy coding used in many compression algorithms that have high coding efficiency, because it is capable of mapping symbols to non-integer length codewords. An example of an arithmetic coding algorithm is the context-based binary arithmetic coding (CABAC) used in H.264/AVC.
In general, coding a data symbol using CABAC involves one or more of the following steps:
(1) Binarization: if the symbol to be coded is non-binary, it is mapped to a sequence of so-called "bins". Each bin can have a value of "0" or "1".
(2) Context assignment: each bin (in the regular mode) is assigned to a context. A context model determines how the context for a given bin is computed based on information available for that bin, such as the values of previously encoded symbols or bins.
(3) Bin coding: bins are encoded with an arithmetic encoder. To encode a bin, the arithmetic encoder requires as input the probability of the bin's value, i.e., the probability that the bin's value equals "0" and the probability that the bin's value equals "1". The (estimated) probability of each context is represented by an integer value called the "context state". Each context has a state, and thus the state (i.e., the estimated probability) is the same for bins assigned to one context and differs between contexts.
(4) State update: the probability (state) of the selected context is updated based on the actual coded value of the bin (e.g., if the bin value was "1", the probability of "1" is increased).
It should be noted that probability interval partitioning entropy coding (PIPE) uses principles similar to arithmetic coding principles, and may thus also utilize the techniques of this disclosure.
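The following C++ skeleton mirrors the four steps above, using a direct probability value as the context state, as in the CABAC variants discussed below. The interval subdivision performed by the actual arithmetic coder is omitted, and all names and the adaptation rate are assumptions made for illustration.

#include <cstdint>
#include <vector>

struct Context {
    uint16_t probOne = 32768;  // estimated P(bin == 1) in 1/65536 units (the "context state")
};

// Step (1): binarization; e.g., a unary code maps value v to v ones and a zero.
std::vector<int> binarizeUnary(int v) {
    std::vector<int> bins(v, 1);
    bins.push_back(0);
    return bins;
}

// Steps (2)-(4) for one bin: the caller performs the context assignment, the
// (omitted) arithmetic coder would code 'bin' using ctx.probOne, and the
// state update below adapts the probability toward the coded value.
void codeBin(Context& ctx, int bin) {
    // ... arithmetic-coder interval update using ctx.probOne would go here ...
    const int shift = 5;  // adaptation rate (assumed value)
    if (bin == 1)
        ctx.probOne += (65536 - ctx.probOne) >> shift;  // raise P(1)
    else
        ctx.probOne -= ctx.probOne >> shift;            // raise P(0)
}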
CABAC in H.264/AVC and HEVC uses states, and each state implicitly relates to a probability. There are variants of CABAC in which the probability of a symbol ("0" or "1") is used directly, i.e., the probability (or an integer version of it) is the state. For example, such variants of CABAC are described in "Description of video coding technology proposal by France Telecom, NTT DOCOMO, Panasonic and Technicolor" (JCT-VC Meeting, Dresden, Germany, April 2010, hereinafter "JCTVC-A114"), and in "Multi-parameter probability update for CABAC" by A. Alshin and E. Alshina.
In this disclosure, it is proposed to reduce the number of binarizations and/or contexts used in CABAC. In particular, this disclosure presents techniques that may reduce the number of contexts used in CABAC by up to 56. With 56 fewer contexts, experimental results showed 0.00%, 0.01%, and -0.13% bit distortion (BD) rate changes in high efficiency intra-only, random access, and low delay test conditions, respectively. As such, the reduction in the number of required contexts reduces storage requirements at both the encoder and the decoder without materially affecting coding efficiency.
In this disclosure, it is proposed to reduce the number of CABAC contexts for the syntax elements pred_type, merge_idx, inter_pred_flag, ref_idx_lx, cbf_cb, cbf_cr, coeff_abs_level_greater1_flag, and coeff_abs_level_greater2_flag. The modifications reduce the number of contexts by up to 56 with negligible coding efficiency changes. The context reductions for the syntax elements presented above can be used alone or in any combination.
The syntax element pred_type includes a prediction mode (pred_mode_flag) and a partition type (part_mode) for each coding unit. The syntax element pred_mode_flag equal to 0 specifies that the current coding unit is coded in inter prediction mode. The syntax element pred_mode_flag equal to 1 specifies that the current coding unit is coded in intra prediction mode. The syntax element part_mode specifies the partition mode of the current coding unit.
The syntax element merge_idx[x0][y0] specifies the merge candidate index in the merge candidate list, where x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture. When merge_idx[x0][y0] is not present, it is inferred to be equal to 0. The merge candidate list is a list of coding units neighboring the current unit from which motion information may be copied.
The syntax element inter_pred_flag[x0][y0] specifies whether uni-prediction or bi-prediction is used for the current prediction unit. The array indices x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture.
The syntax element ref_idx_lx refers to a particular reference picture within a reference picture list.
The syntax elements cbf_cb and cbf_cr indicate whether the chroma (Cb and Cr, respectively) transform blocks contain non-zero transform coefficients. The syntax element cbf_cb[x0][y0][trafoDepth] equal to 1 specifies that the Cb transform block contains one or more transform coefficient levels not equal to 0. The array indices x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered transform block relative to the top-left luma sample of the picture. The array index trafoDepth specifies the current subdivision level of the coding unit into blocks for transform coding. The array index trafoDepth is equal to 0 for blocks that correspond to coding units. When cbf_cb[x0][y0][trafoDepth] is not present and the prediction mode is not intra prediction, the value of cbf_cb[x0][y0][trafoDepth] is inferred to be equal to 0.
The syntax element cbf_cr[x0][y0][trafoDepth] equal to 1 specifies that the Cr transform block contains one or more transform coefficient levels not equal to 0. The array indices x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered transform block relative to the top-left luma sample of the picture. The array index trafoDepth specifies the current subdivision level of the coding unit into blocks for transform coding. The array index trafoDepth is equal to 0 for blocks that correspond to coding units. When cbf_cr[x0][y0][trafoDepth] is not present and the prediction mode is not intra prediction, the value of cbf_cr[x0][y0][trafoDepth] is inferred to be equal to 0.
The syntax element coeff_abs_level_greater1_flag[n] specifies, for scan position n, whether there are transform coefficient levels greater than 1. When coeff_abs_level_greater1_flag[n] is not present, it is inferred to be equal to 0.
The syntax element coeff_abs_level_greater2_flag[n] specifies, for scan position n, whether there are transform coefficient levels greater than 2. When coeff_abs_level_greater2_flag[n] is not present, it is inferred to be equal to 0.
In one proposal for HEVC, different binarizations of the syntax element pred_type are used in P and B slices, as shown in Table 1. This disclosure proposes using the same binarization for P and B slices. Examples are shown in Tables 2-4. Table 5 shows the coding performance impact on P slices under common test conditions (see, e.g., F. Bossen, "Common test conditions and software reference configurations," JCTVC-F900).
Table 1. Binarization of pred_type in one proposal for HEVC
As can be seen in Table 1, an I slice (e.g., a slice that includes only intra-predicted blocks) includes two different prediction types (pred_type). One bin string (binarization) is used for intra-prediction blocks having a 2Nx2N partition type, and the other bin string is used for intra-prediction blocks having an NxN partition type. As shown in Table 1, the bin string for an I slice does not depend on CU size.
For P and B slices, in Table 1, a different bin string is used for each value of pred_type. Moreover, the value of pred_type depends on both the prediction mode (inter prediction or intra prediction) and the partition type used. For P and B slices, the actual bin string used further depends on the size of the CU being coded and on whether inter prediction is enabled for the 4x4 block size.
The first column of bin strings in Table 1 applies when the log of the CU size of the CU being coded is greater than the log of the minimum allowable CU size. According to one example in HEVC, the first column of bin strings is used if cLog2CUSize > Log2MinCUSize. The logarithmic function is used to produce smaller numbers so that smaller consecutive indices can be used.
If the log of the CU size of the CU being coded is equal to the log of the minimum allowable CU size (i.e., cLog2CUSize == Log2MinCUSize), then one of columns 2 and 3 of the bin strings in Table 1 is used to select the binarization. Column 2 is used when the log of the CU size of the CU being coded is equal to 3 and inter prediction of 4x4 CUs is not enabled (i.e., cLog2CUSize == 3 && !inter_4x4_enabled_flag). Column 3 is used when the log of the CU size of the CU being coded is greater than 3 or inter prediction of 4x4 CUs is enabled (i.e., cLog2CUSize > 3 || inter_4x4_enabled_flag).
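The column selection just described reduces to a small piece of logic. The following C++ sketch returns which bin-string column of Table 1 applies; the variable names follow the text above, while the wrapper function itself is an illustrative assumption.

// Select the bin-string column (1, 2, or 3) of Table 1 for the CU being coded.
int selectBinStringColumn(int cLog2CUSize, int Log2MinCUSize,
                          bool inter_4x4_enabled_flag) {
    if (cLog2CUSize > Log2MinCUSize)
        return 1;  // CU is larger than the minimum allowable CU size
    // Here cLog2CUSize == Log2MinCUSize:
    if (cLog2CUSize == 3 && !inter_4x4_enabled_flag)
        return 2;  // 8x8 CU with 4x4 inter prediction disabled
    return 3;      // cLog2CUSize > 3 || inter_4x4_enabled_flag
}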
Table 2 below shows example binarization, where P and B slices use the same binary string, in accordance with one or more examples described in this disclosure. As shown in table 2, P slices use the same binarization as used for B slices in table 1. In this way, separate context sets do not have to be stored and used for both P and B slices. As such, the total number of contexts needed to code the pred _ type syntax element is reduced. Furthermore, only one mapping (rather than two) between the binary string logic (shown in columns (1) through (3)) and the actual binary string needs to be stored.
Table 2. Binarization of pred_type in one example of this disclosure
Table 3 below shows another example of binarization for pred_type. In this example, B slices use the same binarization as P slices from Table 1. Table 4 below shows additional examples where P slices and B slices use the same binarization. Tables 2-4 are only intended to show examples of shared binarizations between P and B slices. Any binarization or binarization rule may be used such that the pred_type syntax elements for both P and B slices share the same binarization.
Video encoder 20 and video decoder 30 may store the same mapping rules and mapping tables (e.g., as shown in Tables 2-4) for use with both P and B slices. CABAC encoding and decoding may be applied to pred_type syntax elements using these mappings.
In this way, video encoder 20 may be configured to determine a first prediction type for a block of video data in a P slice, represent the first prediction type as a P-slice prediction type syntax element, determine a second prediction type for the block of video data in a B slice, represent the second prediction type as a B-slice prediction type syntax element, determine a P-slice binarization for the P-slice prediction type syntax element, determine a B-slice binarization for the B-slice prediction type syntax element, wherein the P-slice prediction type syntax element and the B-slice prediction type syntax element are determined using the same binarization logic, and encode the video data based on the binarizations of the P-slice prediction type syntax element and the B-slice prediction type syntax element.
Video encoder 20 may be further configured to binarize the P-slice prediction type syntax element with the determined P-slice binarization, binarize the B-slice prediction type syntax element with the determined B-slice binarization, apply Context Adaptive Binary Arithmetic Coding (CABAC) to the binarized P-slice prediction type syntax element, and apply Context Adaptive Binary Arithmetic Coding (CABAC) to the binarized B-slice prediction type syntax element.
Similarly, video decoder 30 may be configured to map a binarized P-slice prediction type syntax element to a prediction type using a binarization mapping for a block of video data in a P slice, map a binarized B-slice prediction type syntax element to a prediction type using the same binarization mapping for a block of video data in a B slice, and decode the video data based on the mapped prediction type.
Video decoder 30 may be further configured to receive a context-adaptive binary arithmetic coding P-slice prediction type syntax element indicating a prediction type for a block of video data in a P-slice, receive a context-adaptive binary arithmetic coding B-slice prediction type syntax element indicating a prediction type for a block of video data in a B-slice, decode the P-slice prediction type syntax element to generate a binarized P-slice prediction type syntax element, and decode the B-slice prediction type syntax element to generate a binarized B-slice prediction type syntax element.
Table 3. Binarization of pred_type in another example of this disclosure
Table 4. Binarization of pred_type in another example of this disclosure
Table 5 below shows the coding performance using the shared binarization of Table 2 for P and B slices. As can be seen in Table 5, using the shared binarization loses little to no coding efficiency. Low delay P HE (high efficiency) is a common test condition for unidirectional-prediction (P) slice binarization. Classes A to E represent different frame resolutions. Class A is 2k x 4k resolution. Class B is 1920x1080 resolution. Class C is WVGA resolution. Class D is WQVGA resolution. Class E is 720p resolution. A 0.1 to 0.2 percent change in the low delay P HE test condition is generally considered insignificant.
Table 5. Coding performance of the unified binarization of pred_type
Optionally, the same binarization (not limited to tables 2-4) for prediction types (including prediction size and/or prediction mode) may be shared among two or more different types of inter-predicted slices. Inter-predicted slices may include (but are not limited to):
a. P slice: a slice supporting only unidirectional motion prediction
b. B slice: a slice supporting both unidirectional and bidirectional motion prediction
c. In scalable video coding: the enhancement layer may share the same binarization as the base layer.
d. In multi-view coding: different views may share the same binarization.
When asymmetric partitioning is enabled, four contexts, equally divided into two context sets, are used in CABAC on the last two bins of the pred_type syntax element for signaling asymmetric partitions (i.e., PART_2NxnU, PART_2NxnD, PART_nLx2N, PART_nRx2N). One context set applies depending on whether the partition is divided along the horizontal direction or the vertical direction. The second-to-last bin (i.e., the partition type bin of part_mode) specifies whether the current CU has a symmetric partition or an asymmetric partition. The last bin (i.e., the partition size bin of part_mode) specifies whether the size of the first partition is one-quarter or three-quarters of the CU size. Table 6 shows an example of the contexts for the second-to-last (partition type) and last (partition size) bins of the pred_type syntax element.
Table 6. Contexts of the last two bins of the pred_type syntax element
This disclosure proposes using one context for the second-to-last bin (i.e., the partition type bin) and bypass mode for the last bin (i.e., the partition size bin). As a result, the number of contexts is reduced from 4 to 1. Table 7 shows an example of the contexts used according to this example of the disclosure. Table 8 shows the coding performance associated with the proposed modifications. Random access high efficiency (HE) is a test condition for random access frames. Low delay B HE is a test condition allowing bidirectional prediction.
Bin: Context
Partition type (symmetric or asymmetric): Context set 1 (1 context)
Partition size (first partition is 1/4 CU or 3/4 CU): Bypass mode (no context)
Table 7. Contexts of the last two bins of the pred_type syntax element according to an example of this disclosure
Table 8. Coding performance of the proposed method for pred_type
In this way, according to this example, video encoder 20 may be configured to determine a partition type for a prediction mode for a block of video data, encode a partition type bin of a prediction type syntax element for the block of video data using context adaptive binary arithmetic coding having a single context, wherein the single context is the same for any partition type, and encode a partition size bin of the prediction type syntax element for the block of video data using context adaptive binary arithmetic coding in bypass mode.
Similarly, according to this example, video decoder 30 may be configured to receive a prediction type syntax element for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the prediction type syntax element including a partition type bin representing a partition type and a partition size bin representing a partition size, decode the partition type bin of the prediction type syntax element using context adaptive binary arithmetic coding with a single context, wherein the single context is the same for any partition type, and decode the partition size bin of the prediction type syntax element for the block of video data using context adaptive binary arithmetic coding in bypass mode.
In another example, when coding a rectangular partition type, bypass mode or a single context may be used for the bin indicating whether the partition mode is PART_nLx2N or PART_nRx2N, or whether the mode is PART_2NxnU or PART_2NxnD. Using bypass mode or a single context is appropriate because the chance of either partition mode being used is close to 50%. Also optionally, bypass mode or a single context may be used for the bin indicating whether the mode is a symmetric partition or an asymmetric partition.
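A minimal encoder-side sketch of this proposal follows: the partition type bin is coded with one shared context, and the partition size bin is passed through bypass mode. The BinCoder stub and its interface are assumptions made for illustration; a real CABAC engine would perform interval arithmetic rather than printing.

#include <cstdio>

// Stub bin coder: records how each bin would be coded instead of performing
// actual arithmetic coding.
struct BinCoder {
    void encodeBin(int bin, int ctxIdx) { std::printf("regular bin %d, ctx %d\n", bin, ctxIdx); }
    void encodeBinBypass(int bin)       { std::printf("bypass bin %d\n", bin); }
};

void codePartitionBins(BinCoder& coder, bool asymmetric, bool firstPartIsQuarter) {
    const int kSinglePartTypeCtx = 0;  // one context regardless of partition type (down from four)
    coder.encodeBin(asymmetric ? 1 : 0, kSinglePartTypeCtx);  // partition type bin
    if (asymmetric)
        coder.encodeBinBypass(firstPartIsQuarter ? 1 : 0);    // partition size bin, no context
}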
The next example of this disclosure relates to signaling in the "merge" mode of inter prediction. In merge mode, the encoder instructs the decoder, through bitstream signaling of prediction syntax, to copy a motion vector, a reference index (identifying the reference picture, in a given reference picture list, to which the motion vector points), and a motion prediction direction (identifying the reference picture list (list 0 or list 1), i.e., whether the reference frame temporally precedes or follows the current frame) from a selected candidate motion vector for the current portion of the picture to be coded. This is accomplished by signaling in the bitstream an index into a candidate motion vector list that identifies the selected candidate motion vector (i.e., a particular spatial motion vector predictor (MVP) candidate or temporal MVP candidate).
Thus, for merge mode, the prediction syntax may include a flag identifying the mode (in this case "merge" mode) and an index (merge_idx) identifying the selected candidate motion vector. In some examples, the candidate motion vector will be in a portion that is causal with reference to the current portion. That is, the candidate motion vector will already have been decoded by the decoder. As such, the decoder has already received and/or determined the motion vector, the reference index, and the motion prediction direction for the causal portion. Accordingly, the decoder may simply retrieve the motion vector, reference index, and motion prediction direction associated with the causal portion from memory and copy these values as the motion information for the current portion. To reconstruct the block in merge mode, the decoder uses the derived motion information of the current portion to obtain a prediction block, and adds the residual data to the prediction block to reconstruct the coded block.
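The decoder-side merge derivation described above amounts to an index lookup followed by a wholesale copy of motion information, as in the following C++ sketch. The MotionInfo type and the construction of the candidate list are assumptions made for illustration.

#include <vector>

struct MotionInfo {
    int mvX, mvY;  // motion vector components
    int refIdx;    // reference index into the reference picture list
    int predDir;   // motion prediction direction (list 0 or list 1)
};

// merge_idx selects a candidate; no motion vector difference, reference
// index, or direction is parsed, since everything is inherited from the
// selected spatial or temporal candidate.
MotionInfo deriveMergeMotion(const std::vector<MotionInfo>& mergeCandList,
                             int merge_idx) {
    return mergeCandList[merge_idx];
}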
In HM4.0, one of five merge candidates is signaled when the current PU is in merge mode. The syntax element merge_idx is represented by a truncated unary code. In one proposal for HEVC, one context is used for each bin in CABAC. This disclosure proposes reusing one context in all four bins, as shown in Table 9.
Table 9. Contexts of the bins of the merge_idx syntax element
Table 10 shows the coding performance associated with this example.
Table 10. Coding performance of the proposed method for merge_idx
Optionally, more than one context may be used in merge index coding, with some bins sharing the same context and some bins using other contexts. As one example, only consecutive bins share the same context. For example, bin 2 and bin 3 may share one context, but bin 2 and bin 4 cannot share the same context unless bin 3 also shares it.
As another example, assume that the total number of bins of the merge index is N (the first bin is bin 0 and the last bin is bin N-1). Y thresholds, thres_i for i = 1, ..., Y, are used in merge index coding to determine context sharing. In this example, the following rules indicate how contexts are shared between bins:
1. 0 < Y < N (there are fewer thresholds than bins)
2. thres_i < thres_{i+1}
3. 0 < thres_1
4. thres_Y = N
5. bin_j will share one context, where j = {thres_i, ..., thres_{i+1} - 1}
Based on these rules, the previous method, in which one context is reused in all four bins, can be viewed as the case N = 4, Y = 1, thres_1 = 4. Accordingly, bin 0 through bin 3 share the same context.
Another example includes setting N = 4, Y = 2, thres_1 = 2, thres_2 = 4. In this example, bin 0 and bin 1 share the same context, and bin 2 and bin 3 share the same context.
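The threshold rules above translate directly into a context lookup, as in the following C++ sketch; the function name is an illustrative assumption.

#include <vector>

// Return the shared context index for bin j of merge_idx, given thresholds
// thres_1..thres_Y (with thres_Y == N and an implied thres_0 = 0): bins in
// [thres_i, thres_{i+1} - 1] share context i.
int mergeIdxContext(int j, const std::vector<int>& thres) {
    int ctx = 0;
    for (int t : thres) {
        if (j < t) break;
        ++ctx;
    }
    return ctx;
}
// Example: thres = {2, 4} (N = 4, Y = 2) gives bins 0-1 context 0 and bins
// 2-3 context 1; thres = {4} (Y = 1) gives all four bins context 0.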
An inter prediction flag (inter_pred_flag) specifies whether uni-directional prediction or bi-directional prediction is used for the current PU. In some examples, the context index for the inter prediction flag is equal to the current CU depth. Since there are four possible CU depths (0 to 3), there are four possible contexts for coding inter_pred_flag.
This disclosure proposes that the context index used to select the context for coding inter_pred_flag be equal to the current CU depth (e.g., the level of the quadtree decomposition of CUs), but capped at a chosen threshold (i.e., the context index is the lesser of the current CU depth and the threshold). The threshold may, in one example, be chosen as 2. Alternatively, the context index may be equal to the maximum CU depth minus the current CU depth, again capped at the chosen threshold. Alternatively, a predefined mapping table may be designed to select a context index for a given CU depth. The mapping table may be implemented as a set of logic. As a result, the syntax element inter_pred_flag is coded using 3 contexts.
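The capped selection described above is a one-line computation, sketched here in C++; the function name and the default threshold value of 2 are illustrative assumptions.

#include <algorithm>

// Context index for inter_pred_flag: the CU depth clipped to a threshold,
// so depths 2 and 3 share context 2 and only 3 contexts are needed.
int interPredFlagContext(int cuDepth, int threshold = 2) {
    return std::min(cuDepth, threshold);
}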
Table 11 shows coding performance when the initialization table changes but the number of contexts does not change. Table 12 shows the coding performance of the proposed technique that reduces the number of contexts from 4 to 3.
Table 11. Coding performance of HM4.0 with modified CABAC initialization of inter _ pred _ flag.
Table 12. Coding performance of the proposed context reduction technique of inter _ pred _ flag.
The reference frame index (ref _ idx _ lx) is signaled by using a truncated unary code with respect to the active reference frames in the associated list (e.g., list 0 or list 1). Three contexts are used to code the reference frame index. One context for bin 0, one context for bin 1, and one context for the remaining bins. Table 13 shows an example of context assignment of bins for unary codes of ref _ idx _ lx.
Bin of the ref_idx_lx unary code             Context
Bin 0                                        Context 1
Bin 1                                        Context 2
Bins 2-N (N is the total number of bins)     Context 3
Table 13. Context assignment for the bins of ref_idx_lx
This disclosure proposes the use of two contexts to code a unary code for ref _ idx _ lx: one context is for bin 0 and the other context is for the remaining bins. Table 14 shows an example of context assignment of bins for a unary code of ref _ idx _ lx according to this example of the present disclosure. Table 15 shows the coding performance associated with the proposed modifications.
Bin of the ref_idx_lx unary code             Context
Bin 0                                        Context 1
Bins 1-N (N is the total number of bins)     Context 2
Table 14. Context assignment for the bins of ref_idx_lx
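As a minimal sketch (the function name is assumed, not taken from HM), the two-context assignment of Table 14 could be implemented as:

    // Proposed two-context assignment for the ref_idx_lx unary code:
    // bin 0 uses one context; all remaining bins share a second context.
    int refIdxContext(int binIdx) {
        return (binIdx == 0) ? 0 : 1;
    }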
Table 15. Coding performance of the proposed method for ref_idx_lx
For the chroma coded block flag syntax elements (cbf_cb and cbf_cr), two different context sets (with 5 contexts in each context set) are used for CABAC. The index of the actual context used within each set is equal to the current transform depth associated with the chroma coded block flag being coded. Table 16 shows the context sets for the cbf_cb and cbf_cr chroma coded block flags.
Chroma coded block flag     Context set
cbf_cb                      Context set 1 (5 contexts)
cbf_cr                      Context set 2 (5 contexts)
Table 16. Context sets for cbf_cb and cbf_cr
The present invention proposes that cbf_cb and cbf_cr share one context set. The index of the actual context used within the set may still be equal to the current transform depth associated with the chroma coded block flag being coded. Table 17 shows the context set for the cbf_cb and cbf_cr chroma coded block flags according to an example of this disclosure. Table 18 shows the coding performance associated with the proposed modification.
Chroma coded block flag     Context set
cbf_cb                      Context set 1 (5 contexts)
cbf_cr                      Context set 1 (5 contexts)
Table 17. Context sets for cbf_cb and cbf_cr according to embodiments of the present invention
Table 18. Coding performance of the proposed method for cbf_cb and cbf_cr
In this way, according to this example, both video encoder 20 and video decoder 30 may be configured to code Cb chroma coded block flags for a block of video data using Context Adaptive Binary Arithmetic Coding (CABAC), where CABAC uses a context set that includes one or more contexts, and code Cr chroma coded block flags using CABAC, where CABAC uses the same context set as Cb chroma coded block flags. Video encoder 20 and video decoder 30 may be further configured to select a context from the one or more contexts based on a transform depth of a transform unit associated with the block of video data.
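A minimal sketch of this shared-context-set selection follows, assuming a 5-context set indexed by transform depth; the clamp at the end of the set is a defensive assumption introduced here, not something stated above:

    // Both cbf_cb and cbf_cr index the same context set; the context within the
    // set is selected by the transform depth of the associated transform unit.
    int chromaCbfContext(int transformDepth) {
        const int kContextsPerSet = 5;
        return (transformDepth < kContextsPerSet) ? transformDepth : kContextsPerSet - 1;
    }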
In one proposal for HEVC, there are twelve context sets for coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag. coeff_abs_level_greater1_flag indicates whether a transform coefficient has an absolute value greater than 1. coeff_abs_level_greater2_flag indicates whether a transform coefficient has an absolute value greater than 2. The context sets are assigned equally to the luma and chroma components, i.e., 6 context sets for luma and 6 context sets for chroma. Each context set consists of 5 contexts. The index ctxSet of the context set is selected based on the previous coeff_abs_level_greater1_flags. For coeff_abs_level_greater1_flag, the index greater1Ctx of the context within a context set is determined based on the trailing ones, up to a maximum of 4. The context index may be represented as:
ctxIdx_level_greater1=(ctxSet*5)+Min(4,greater1Ctx) (1)
For coeff_abs_level_greater2_flag, the index greater2Ctx of the context within a context set is based on the number of coeff_abs_level_greater1_flags, from 1 up to a maximum of 4. The context index may be represented as:
ctxIdx_level_greater2=(ctxSet*5)+Min(4,greater2Ctx) (2)
greater1Ctx is based on the number of significant coefficients and the number of coefficients that are greater than 1. greater2Ctx, on the other hand, is based on the number of coefficients that are greater than 1.
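Equations (1) and (2) translate directly into code; the following C++ sketch is a transcription (the function names are assumptions introduced here):

    #include <algorithm>

    // Each context set holds 5 contexts; the index within the set is capped at 4.
    int ctxIdxLevelGreater1(int ctxSet, int greater1Ctx) {
        return ctxSet * 5 + std::min(4, greater1Ctx);  // equation (1)
    }

    int ctxIdxLevelGreater2(int ctxSet, int greater2Ctx) {
        return ctxSet * 5 + std::min(4, greater2Ctx);  // equation (2)
    }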
In some examples, different numbers of contexts may be used in different sets of contexts, including, for example:
1. Context sets for levels greater than 1 or for levels greater than 2 may have different numbers of contexts. For example, context sets 0 and 3 may have 5 contexts, and the remaining context sets may have 2 contexts.
2. The set of contexts for luma coefficients may have a different number of contexts than the set of contexts for chroma components. For example, context set 0 for luma may have 5 contexts and context set 0 for chroma may have 4 contexts.
3. A context set for a level greater than 1 may have a different number of contexts than a context set for a level greater than 2. For example, context set 0 for a level greater than 1 may have 5 contexts, and context set 0 for a level greater than 2 may have only 3 contexts.
In other examples, different numbers of context sets may be used for coding greater than 1 or greater than 2, including, for example:
1. Luma coefficients may use a different number of context sets than chroma components. For example, luma may use 6 context sets and chroma may use 4 context sets.
2. Coding of greater than 1 may use a different number of context sets than coding of greater than 2. For example, greater than 1 may use 6 context sets, and greater than 2 may use 4 context sets.
Optionally, a metric may be used to determine which context within a context set is used, where the range of values of the metric is larger than the number of contexts in the context set. In one such aspect, one context may be associated with one or more values of the metric. Context sharing is preferably limited to consecutive values. For example, let y be the value of the metric. y = 2 is associated with context 3, and y = 1 and y = 4 may also be associated with context 3. However, if y = 3 is associated with context 4, then y = 4 cannot be associated with context 3.
For example, for coeff_abs_level_greater1_flag, context sets 0 and 3 have 5 contexts, and context sets 1, 2, 4, and 5 have 2 contexts. For coeff_abs_level_greater2_flag, context sets 0, 1, and 2 have 5 contexts, and context sets 3, 4, and 5 have 2 contexts. This can be expressed as:
ctxIdx_level_greater1 = (ctxSet * 5) + Min(Thres_greater1, greater1Ctx)   (3)
where Thres_greater1 = 4 if ctxSet = 0 or ctxSet = 3;
otherwise, Thres_greater1 = 1.
ctxIdx_level_greater2 = (ctxSet * 5) + Min(Thres_greater2, greater2Ctx)   (4)
where Thres_greater2 = 4 if ctxSet < 3;
otherwise, Thres_greater2 = 1.
Thres_greater1 and Thres_greater2 may be selected differently (see the sketch following this list) based on, for example:
1. The luma or chroma component
2. The context set
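As one hedged C++ sketch of equations (3) and (4) with these per-set thresholds (the function names are assumptions introduced here):

    #include <algorithm>

    // Per-set thresholds: for greater1, sets 0 and 3 keep 5 contexts (threshold 4)
    // while the remaining sets collapse to 2 contexts (threshold 1); for greater2,
    // sets 0-2 keep 5 contexts and sets 3-5 collapse to 2.
    int thresGreater1(int ctxSet) { return (ctxSet == 0 || ctxSet == 3) ? 4 : 1; }
    int thresGreater2(int ctxSet) { return (ctxSet < 3) ? 4 : 1; }

    int ctxIdxGreater1WithThres(int ctxSet, int greater1Ctx) {
        return ctxSet * 5 + std::min(thresGreater1(ctxSet), greater1Ctx);  // equation (3)
    }

    int ctxIdxGreater2WithThres(int ctxSet, int greater2Ctx) {
        return ctxSet * 5 + std::min(thresGreater2(ctxSet), greater2Ctx);  // equation (4)
    }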
As another example, for coeff_abs_level_greater1_flag, context sets 0 and 3 have 5 contexts, and context sets 1, 2, 4, and 5 have 3 contexts. For coeff_abs_level_greater2_flag, context sets 0, 1, and 2 have 5 contexts, and context sets 3, 4, and 5 have 2 contexts. This can be expressed as:
ctxIdx_level_greater1 = (ctxSet * 5) + greater1Ctx_mapped   (5)
ctxIdx_level_greater2 = (ctxSet * 5) + greater2Ctx_mapped   (6)
In such examples, the mappings may be as shown in Tables 19 and 20:
greater1Ctx    0   1   2   3   >3
ctxSet 0       0   1   2   3   4
ctxSet 1       0   1   1   2   2
ctxSet 2       0   1   1   1   2
ctxSet 3       0   1   2   3   4
ctxSet 4       0   1   2   2   2
ctxSet 5       0   1   1   2   2
Table 19
greater2Ctx    0   1   2   3   >3
ctxSet 0       0   1   2   3   4
ctxSet 1       0   1   1   1   1
ctxSet 2       0   1   1   1   1
ctxSet 3       0   1   2   3   4
ctxSet 4       0   1   1   1   1
ctxSet 5       0   1   1   1   1
Table 20
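By way of illustration, the table-driven mapping of Table 19 could be realized with a lookup such as the following sketch; the array and function names are assumptions introduced here, and Table 20 would be handled the same way for greater2Ctx:

    #include <algorithm>

    // Table 19: per-context-set mapping from greater1Ctx to greater1Ctx_mapped;
    // greater1Ctx values above 3 share the last (">3") column.
    static const int kGreater1Map[6][5] = {
        {0, 1, 2, 3, 4},  // ctxSet 0
        {0, 1, 1, 2, 2},  // ctxSet 1
        {0, 1, 1, 1, 2},  // ctxSet 2
        {0, 1, 2, 3, 4},  // ctxSet 3
        {0, 1, 2, 2, 2},  // ctxSet 4
        {0, 1, 1, 2, 2},  // ctxSet 5
    };

    int ctxIdxLevelGreater1Mapped(int ctxSet, int greater1Ctx) {
        int col = std::min(greater1Ctx, 4);            // clip into the ">3" column
        return ctxSet * 5 + kGreater1Map[ctxSet][col];
    }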
The CABAC initialization tables for coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag are also modified for the context sets whose Thres_greater1 or Thres_greater2 equals 1. The modification moves the initialization of the fifth context forward into the position of the initialization of the second context. This proposed method reduces the number of contexts from 120 to 78.
Table 21. Coding performance of the proposed method for coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag
Table 22 lists the number of contexts for all of the syntax elements mentioned in the previous sections. The total reduction is 56 contexts.
Syntax element                        Number of contexts in HM4.0   Proposed method
pred_type                             10                            6
merge_idx                             4                             1
inter_pred_flag                       4                             3
ref_idx_lc, ref_idx_l0, ref_idx_l1    3                             2
cbf_cb, cbf_cr                        10                            5
coeff_abs_level_greater1_flag         60                            36
coeff_abs_level_greater2_flag         60                            42
Total                                 151                           95
Table 22. Comparison of the number of contexts in the proposed method and in HM4.0
Fig. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra and inter coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of a number of space-based compression modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of a number of time-based compression modes.
In the example of fig. 2, video encoder 20 includes partition unit 35, prediction unit 41, reference picture memory 64, summer 50, transform unit 52, quantization unit 54, and entropy encoding unit 56. Prediction unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra prediction unit 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62 if desired. Additional loop filters (in-loop or post-loop) may be used in addition to the deblocking filter.
As shown in fig. 2, video encoder 20 receives video data and partition unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, such as a quadtree structure according to LCUs and CUs. Video encoder 20 generally illustrates components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks called tiles). Prediction unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on the error results (e.g., coding rate and distortion level). Prediction unit 41 may provide the resulting intra-or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.
Intra-prediction unit 46 within prediction unit 41 may perform intra-prediction coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction unit 41 perform inter-prediction coding of the current video block relative to one or more prediction blocks in one or more reference pictures to provide temporal compression.
Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB (generalized P/B) slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a prediction block within a reference picture.
A prediction block is a block that is found to closely match a PU of a video block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values for a quarter-pixel position, an eighth-pixel position, or other fractional-pixel positions of a reference picture. Thus, motion estimation unit 42 may perform a motion search with respect to full pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
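For illustration, a block-matching metric of the kind described above (SAD) can be computed as in the following hedged sketch; 8-bit samples and row strides are assumptions introduced here:

    #include <cstdlib>

    // Sum of absolute differences between the current block and a candidate
    // prediction block; smaller values indicate a closer match.
    int sad(const unsigned char* cur, int curStride,
            const unsigned char* ref, int refStride,
            int width, int height) {
        int sum = 0;
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                sum += std::abs(cur[y * curStride + x] - ref[y * refStride + x]);
        return sum;
    }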
Motion estimation unit 42 computes motion vectors for PUs of video blocks in inter-coded slices by comparing locations of the PUs to locations of prediction blocks of reference pictures. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.
The motion compensation performed by motion compensation unit 44 may involve obtaining or generating a prediction block based on a motion vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the prediction block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting pixel values of the prediction block from pixel values of the current video block being coded to form pixel difference values. The pixel difference values form residual data for the block and may include both luminance and chrominance difference components. Summer 50 represents the component that performs this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
Intra-prediction unit 46 may intra-predict the current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 may determine the intra-prediction mode to use to encode the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction unit 46 may calculate rate-distortion values using rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
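A minimal sketch of this rate-distortion selection follows, assuming a Lagrangian cost J = D + lambda * R; the candidate structure and names are illustrative, not from HM:

    #include <limits>
    #include <vector>

    struct ModeResult { int mode; double distortion; double bits; };

    // Returns the tested mode with the smallest Lagrangian cost J = D + lambda * R.
    int selectBestIntraMode(const std::vector<ModeResult>& tested, double lambda) {
        int best = -1;
        double bestCost = std::numeric_limits<double>::max();
        for (const ModeResult& r : tested) {
            double cost = r.distortion + lambda * r.bits;
            if (cost < bestCost) { bestCost = cost; best = r.mode; }
        }
        return best;
    }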
In any case, after selecting the intra-prediction mode for the block, intra-prediction unit 46 may provide information to entropy coding unit 56 indicating the selected intra-prediction mode for the block. Entropy coding unit 56 may encode information indicative of the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for the various blocks, and indications of the most probable intra-prediction mode, intra-prediction mode index tables, and modified intra-prediction mode index tables to be used for each of the contexts.
After prediction unit 41 generates a prediction block for the current video block via inter prediction or intra prediction, video encoder 20 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform unit 52. Transform unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform. Transform unit 52 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.
Transform unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform scanning. As one example, the coding techniques described in this disclosure may be performed in whole or in part by entropy encoding unit 56. However, aspects of the present invention are not limited thereto. For example, the coding techniques described in this disclosure may be performed by a component (e.g., a processor or any other component) of video encoder 20 that is not shown in fig. 2. In some examples, the coding techniques of this disclosure may be performed by one of the other units or modules illustrated in fig. 2. In still other examples, the coding techniques of this disclosure may be performed by a combination of units and modules of video encoder 20. In this manner, video encoder 20 may be configured to perform the example techniques described in this disclosure.
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. Following entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video slice being coded.
In one example of this disclosure, entropy encoding unit 56 may be configured to determine a first prediction type for a block of video data in a P slice, represent the first prediction type as a P-slice prediction type syntax element, determine a second prediction type for a block of video data in a B slice, represent the second prediction type as a B-slice prediction type syntax element, determine a P-slice binarization for the P-slice prediction type syntax element, determine a B-slice binarization for the B-slice prediction type syntax element, wherein the P-slice prediction type syntax element and the B-slice prediction type syntax element are determined using the same binarization logic, and encode the video data based on the binarization for the P-slice prediction type syntax element and the B-slice prediction syntax element.
In another example of this disclosure, entropy encoding unit 56 may be configured to determine a partition type for a prediction mode of a block of video data, encode a partition type bin for a prediction type syntax element of the block of video data using context adaptive binary arithmetic coding with a single context, wherein the single context is the same for any partition type, and encode the partition size bin for the prediction type syntax for the block of video data using context adaptive binary arithmetic coding in bypass mode.
In another example of this disclosure, entropy encoding unit 56 may be configured to code Cb chroma coded block flags for a block of video data using Context Adaptive Binary Arithmetic Coding (CABAC), wherein CABAC uses a context set that includes one or more contexts, and code Cr chroma coded block flags using CABAC, wherein CABAC uses the same context set as Cb chroma coded block flags. Video encoder 20 and video decoder 30 may be further configured to select a context from the one or more contexts based on a transform depth of a transform unit associated with the block of video data.
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block for a reference picture. Motion compensation unit 44 may calculate the reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in reference picture memory 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.
Fig. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of fig. 3, video decoder 30 includes entropy decoding unit 80, prediction unit 81, inverse quantization unit 86, inverse transform unit 88, summer 90, and reference picture memory 92. Prediction unit 81 includes motion compensation unit 82 and intra prediction unit 84. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 of fig. 2.
During the decoding process, video decoder 30 receives an encoded video bitstream and associated syntax elements representing video blocks of an encoded video slice from video encoder 20. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction unit 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
As one example, the coding techniques described in this disclosure may be performed in whole or in part by entropy decoding unit 80. However, aspects of the present invention are not limited thereto. For example, the coding techniques described in this disclosure may be performed by a component (e.g., a processor or any other component) of video decoder 30 that is not shown in fig. 3. In some examples, the coding techniques of this disclosure may be performed by one of the other units or modules illustrated in fig. 3. In still other examples, the coding techniques of this disclosure may be performed by a combination of units and modules of video decoder 30. In this manner, video decoder 30 may be configured to perform the example techniques described in this disclosure.
In one example of this disclosure, entropy decoding unit 80 may be configured to map a binarized P-slice prediction type syntax element to a prediction type using a binarization mapping for a block of video data in a P-slice, map a binarized B-slice prediction type syntax element to a prediction type using the same binarization mapping for a block of video data in a B-slice, and decode the video data based on the mapped prediction type.
In one example of this disclosure, entropy decoding unit 80 may be configured to receive a prediction type syntax element for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the prediction type syntax element including a partition type bin representing a partition type and a partition size bin representing a partition size, decode the partition type bin of the prediction type syntax element using context adaptive binary arithmetic coding with a single context, wherein the single context is the same for any partition type, and decode the partition size bin of the prediction type syntax for the block of video data using context adaptive binary arithmetic coding in bypass mode.
In another example of this disclosure, entropy decoding unit 80 may be configured to code Cb chroma coded block flags for a block of video data using Context Adaptive Binary Arithmetic Coding (CABAC), wherein CABAC uses a context set that includes one or more contexts, and code Cr chroma coded block flags using CABAC, wherein CABAC uses the same context set as the Cb chroma coded block flags. Video encoder 20 and video decoder 30 may be further configured to select a context from the one or more contexts based on a transform depth of a transform unit associated with the block of video data.
When a video slice is coded as an intra-coded (I) slice, intra-prediction unit 84 of prediction unit 81 may generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is coded as an inter-coded (i.e., B, P or GPB) slice, motion compensation unit 82 of prediction unit 81 generates prediction blocks for video blocks of the current video slice based on motion vectors and other syntax elements received from entropy decoding unit 80. The prediction block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in reference picture memory 92.
Motion compensation unit 82 determines prediction information for video blocks of the current video slice by parsing the motion vectors and other syntax elements and uses the prediction information to generate a prediction block for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra or inter prediction) used to code video blocks of a video slice, an inter-prediction slice type (e.g., a B-slice, a P-slice, or a GPB slice), construction information for one or more reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter-prediction state for each inter-coded video block of the slice, and other information used to decode video blocks in the current video slice.
Motion compensation unit 82 may also perform interpolation based on the interpolation filter. Motion compensation unit 82 may use interpolation filters used by video encoder 20 during video block encoding to calculate interpolated values for sub-integer pixels of the reference block. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 according to the received syntax elements and use the interpolation filters to generate the prediction blocks.
Inverse quantization unit 86 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include using quantization parameters calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization, and likewise, a degree of inverse quantization that should be applied. Inverse transform unit 88 applies an inverse transform, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to generate residual blocks in the pixel domain.
After motion compensation unit 82 generates the prediction block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 88 with the corresponding prediction block generated by motion compensation unit 82. Summer 90 represents the component that performs this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blocking artifacts. Other loop filters (in or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 92, which stores reference pictures used for subsequent motion compensation. Reference picture memory 92 also stores decoded video for later presentation on a display device, such as display device 32 of fig. 1.
Fig. 6 is a flow diagram illustrating an example video encoding method of this disclosure. The method of fig. 6 may be implemented by video encoder 20. Video encoder 20 may be configured to determine a first prediction type for a block of video data in a P slice (602), and represent the first prediction type as a P slice prediction type syntax element (604). Video encoder 20 may be further configured to determine a second prediction type for the block of video data in the B slice (606), and represent the second prediction type as a B slice prediction type syntax element (608). The P-slice prediction type syntax element and the B-slice prediction type syntax element specify a prediction mode and a partition type. The prediction mode may include one of inter-prediction and intra-prediction. The partition type may include one of a symmetric partition and an asymmetric partition.
Video encoder 20 may be further configured to determine P-slice binarization for a P-slice prediction type syntax element (610), and determine B-slice binarization for a B-slice prediction type syntax element, wherein the P-slice prediction type syntax element and the B-slice prediction type syntax element are determined using the same binarization logic (612). Video encoder 20 may then encode the video data based on the binarization of the P-slice prediction type syntax element and the B-slice prediction type syntax element (614).
Encoding the video data may include binarizing a P-slice prediction type syntax element with the determined P-slice binarization, binarizing a B-slice prediction type syntax element with the determined B-slice binarization, applying Context Adaptive Binary Arithmetic Coding (CABAC) to the binarized P-slice prediction type syntax element, and applying Context Adaptive Binary Arithmetic Coding (CABAC) to the binarized B-slice prediction type syntax element.
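Purely as a hedged illustration of this shared binarization (steps 610 through 614 of FIG. 6), a single binarization routine serving both slice types might look as follows; the bin strings produced here are placeholders, not the actual HEVC binarization:

    #include <string>

    enum class PredMode { kInter, kIntra };

    // The same binarization logic is invoked for both P-slice and B-slice
    // prediction type syntax elements; only its inputs differ.
    std::string binarizePredType(PredMode mode, int partIdx) {
        std::string bins = (mode == PredMode::kInter) ? "0" : "1";
        for (int i = 0; i < partIdx; ++i) bins += "1";  // truncated-unary-style suffix
        bins += "0";
        return bins;
    }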
Fig. 7 is a flow diagram illustrating an example video decoding method of this disclosure. The method of fig. 7 may be implemented by video decoder 30. Video decoder 30 may be configured to receive a context adaptive binary arithmetic coded P-slice prediction type syntax element indicating a prediction type for a block of video data in a P-slice (702), and receive a context adaptive binary arithmetic coded B-slice prediction type syntax element indicating a prediction type for a block of video data in a B-slice (704). The P-slice prediction type syntax element and the B-slice prediction type syntax element specify a prediction mode and a partition type. The prediction mode may include one of inter-prediction and intra-prediction. The partition type may include one of a symmetric partition and an asymmetric partition.
Video decoder 30 may be further configured to decode the P-slice prediction type syntax element to generate a binarized P-slice prediction type syntax element (706), and decode the B-slice prediction type syntax element to generate a binarized B-slice prediction type syntax element (708). Video decoder 30 may be further configured to map the binarized P-slice prediction type syntax element to a prediction type using a binarization mapping for a block of video data in a P slice (710), and map the binarized B-slice prediction type syntax element to a prediction type using the same binarization mapping for a block of video data in a B slice (712). Video decoder 30 may then decode the video data based on the mapped prediction type (714).
Fig. 8 is a flow diagram illustrating an example video encoding method of this disclosure. The method of fig. 8 may be implemented by video encoder 20. Video encoder 20 may be configured to determine a partition type for a prediction mode for a block of video data (802), and encode partition type bins for prediction type syntax elements for the block of video data using Context Adaptive Binary Arithmetic Coding (CABAC) with a single context (804). The single context is the same for any partition type. In one example, the partition type is an asymmetric partition, and the partition type bin indicates whether the asymmetric partition is vertically partitioned or horizontally partitioned. For example, the partition size bin indicates whether the first partition is one-quarter of the size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
Video encoder 20 may be further configured to encode partition size bins for prediction type syntax elements of the block of video data using CABAC in bypass mode (806).
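A hedged sketch of steps 804 and 806 follows; the CabacEncoder interface is a stand-in introduced here, not an actual HM or HEVC API, and the bin semantics follow the example above in which the partition type bin signals the direction of an asymmetric partition:

    #include <cstdio>

    // Stand-in for a CABAC arithmetic coder: one context-coded path and one
    // bypass (equiprobable) path.
    struct CabacEncoder {
        void encodeBin(int bin, int contextIndex) {
            std::printf("ctx-coded bin %d (context %d)\n", bin, contextIndex);
        }
        void encodeBinBypass(int bin) {
            std::printf("bypass bin %d\n", bin);
        }
    };

    // The partition type bin always uses the same single context; the partition
    // size bin is coded in bypass mode.
    void encodeAmpBins(CabacEncoder& cabac, bool horizontalSplit, bool quarterSizedFirstPartition) {
        const int kSinglePartitionTypeContext = 0;
        cabac.encodeBin(horizontalSplit ? 1 : 0, kSinglePartitionTypeContext);
        cabac.encodeBinBypass(quarterSizedFirstPartition ? 1 : 0);
    }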
Fig. 9 is a flow diagram illustrating an example video decoding method of this disclosure. The method of fig. 9 may be implemented by video decoder 30. Video decoder 30 may be configured to receive prediction type syntax elements for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the prediction type syntax elements including a partition type bin representing a partition type and a partition size bin representing a partition size (902). In one example, the partition type is an asymmetric partition, and the partition type bin indicates whether the asymmetric partition is vertically partitioned or horizontally partitioned. For example, the partition size bin indicates whether the first partition is one-quarter of the size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
Video decoder 30 may be further configured to decode partition type bins of the prediction type syntax element using CABAC with a single context, wherein the single context is the same for either partition type (904), and decode partition size bins of the prediction type syntax element using CABAC in bypass mode (906).
Fig. 10 is a flow diagram illustrating an example video coding method of this disclosure. The method of fig. 10 may be implemented by video encoder 20 or a video decoder. For the purposes of fig. 10, video encoder 20 and video decoder 30 will be generally collectively referred to as a video coder. According to the techniques of fig. 10, a video coder may be configured to code Cb chroma coded block flags for a block of video data using Context Adaptive Binary Arithmetic Coding (CABAC), wherein coding the Cb chroma coded block flags comprises using a set of contexts including one or more contexts as part of CABAC (1002), and code Cr chroma coded block flags using CABAC, wherein coding the Cr chroma coded block flags comprises using the same set of contexts as the Cb chroma coded block flags as part of CABAC (1004). In one example, a context set includes 5 contexts.
In one optional example of this disclosure, the video coder may be further configured to select a context from the one or more contexts based on a transform depth of a transform unit associated with the block of video data (1006).
When operating as a video encoder, the video coder may be further configured to signal a coded Cb chroma coded block flag in the encoded video bitstream and signal a coded Cr chroma coded block flag in the encoded video bitstream. When operating as a video decoder, the video coder may be further configured to receive a coded Cb chroma coded block flag in the encoded video bitstream and receive a coded Cr chroma coded block flag in the encoded video bitstream.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as noted above, the various units may be combined in a codec hardware unit or provided by a collection of interoperating hardware units (including one or more processors as noted above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (24)

1. A method of encoding video data, comprising:
determining an asymmetric partition of a prediction mode for a block of video data;
encoding a partition type bin of a syntax element that indicates how the block of video data is partitioned using Context Adaptive Binary Arithmetic Coding (CABAC) with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model; and
encoding partition size bins of the syntax element using CABAC in bypass mode.
2. The method of claim 1, wherein the partition type bin is a second-to-last bin in the binary string.
3. The method of claim 1, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
4. A method of decoding video data, comprising:
receiving syntax elements for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the syntax elements indicating how the block of video data is partitioned, the syntax elements including a partition type bin representing a partition type and a partition size bin representing a partition size;
decoding the partition type bin of the syntax element using CABAC with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model;
decoding the partition size bin of the syntax element using CABAC in bypass mode; and
determining an asymmetric partition of a prediction mode for the block of video data based on the decoded partition type bin and the decoded partition size bin.
5. The method of claim 4, wherein the partition type bin is a second-to-last bin in the binary string.
6. The method of claim 4, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
7. An apparatus configured to encode video data, the apparatus comprising:
a memory configured to store a block of video data; and
a video encoder configured to:
determining an asymmetric partition of a prediction mode for a block of video data;
encoding a partition type bin of a syntax element that indicates how the block of video data is partitioned using Context Adaptive Binary Arithmetic Coding (CABAC) with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model; and
encoding partition size bins of the syntax element using CABAC in bypass mode.
8. The apparatus of claim 7, wherein the partition type bin is a second-to-last bin in the binary string.
9. The apparatus of claim 7, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
10. An apparatus configured to decode video data, the apparatus comprising:
a memory configured to store a block of video data; and
a video decoder configured to:
receiving syntax elements for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the syntax elements indicating how the block of video data is partitioned, the syntax elements including a partition type bin representing a partition type and a partition size bin representing a partition size;
decoding the partition type bin of the syntax element using CABAC with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model;
decoding the partition size bin of the syntax element using CABAC in bypass mode; and
determining an asymmetric partition of a prediction mode for the block of video data based on the decoded partition type bin and the decoded partition size bin.
11. The apparatus of claim 10, wherein the partition type bin is a second-to-last bin in the binary string.
12. The apparatus of claim 10, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
13. An apparatus configured to encode video data, comprising:
means for determining an asymmetric partition of a prediction mode for a block of video data;
means for encoding a partition type bin of a syntax element using Context Adaptive Binary Arithmetic Coding (CABAC) with a single context, the syntax element indicating how the block of video data is partitioned, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model; and
means for encoding partition size bins for the syntax element using CABAC in bypass mode.
14. The apparatus of claim 13, wherein the partition type bin is a second-to-last bin in the binary string.
15. The apparatus of claim 13, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
16. An apparatus configured to decode video data, comprising:
means for receiving syntax elements for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the syntax elements indicating how the block of video data is partitioned, the syntax elements including a partition type bin representing a partition type and a partition size bin representing a partition size;
means for decoding the partition type bin of the syntax element using CABAC with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model;
means for decoding the partition size bin of the syntax element using CABAC in bypass mode; and
means for determining an asymmetric partition of a prediction mode for the block of video data based on the decoded partition type bin and the decoded partition size bin.
17. The apparatus of claim 16, wherein the partition type bin is a second-to-last bin in the binary string.
18. The apparatus of claim 16, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
19. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors configured to encode video data to:
determining an asymmetric partition of a prediction mode for a block of video data;
encoding a partition type bin of a syntax element that indicates how the block of video data is partitioned using Context Adaptive Binary Arithmetic Coding (CABAC) with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model; and
encoding partition size bins of the syntax element using CABAC in bypass mode.
20. The non-transitory computer-readable storage medium of claim 19, wherein the partition type bin is a second-to-last bin in the binary string.
21. The non-transitory computer-readable storage medium of claim 19, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
22. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors configured to decode video data to:
receiving syntax elements for a block of video data that has been coded using Context Adaptive Binary Arithmetic Coding (CABAC), the syntax elements indicating how the block of video data is partitioned, the syntax elements including a partition type bin representing a partition type and a partition size bin representing a partition size;
decoding the partition type bin of the syntax element using CABAC with a single context, wherein the syntax element comprises a binary string, wherein the partition type bin indicates whether the block of video data is a symmetric partition or an asymmetric partition, wherein the single context is the same for any asymmetric partition, and wherein the single context is a probability model;
decoding the partition size bin of the syntax element using CABAC in bypass mode; and
determining an asymmetric partition of a prediction mode for the block of video data based on the decoded partition type bin and the decoded partition size bin.
23. The non-transitory computer-readable storage medium of claim 22, wherein the partition type bin is a second-to-last bin in the binary string.
24. The non-transitory computer-readable storage medium of claim 22, wherein the partition size bin is a last bin in the binary string, and wherein the partition size bin indicates whether a first partition is one-quarter of a size of the block of video data or whether the first partition is three-quarters of the size of the block of video data.
HK14111884.0A 2011-11-08 2012-10-05 A method and apparatus for encoding and decoding video data HK1198401B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201161557325P 2011-11-08 2011-11-08
US61/557,325 2011-11-08
US201161561911P 2011-11-20 2011-11-20
US61/561,911 2011-11-20
US13/645,308 2012-10-04
US13/645,308 US9451287B2 (en) 2011-11-08 2012-10-04 Context reduction for context adaptive binary arithmetic coding
PCT/US2012/059095 WO2013070354A1 (en) 2011-11-08 2012-10-05 Number of contexts reduction for context adaptive binary arithmetic coding

Publications (2)

Publication Number Publication Date
HK1198401A1 HK1198401A1 (en) 2015-04-17
HK1198401B true HK1198401B (en) 2018-06-22
