
HK40000797A - Intra video coding using a decoupled tree structure


Info

Publication number
HK40000797A
Authority
HK
Hong Kong
Prior art keywords
mode
candidate list
video
modes
chroma
Application number
HK19124169.4A
Other languages
Chinese (zh)
Other versions
HK40000797B (en)
Inventor
Li Zhang
Wei-Jung Chien
Jianle Chen
Xin Zhao
Marta Karczewicz
Original Assignee
Qualcomm Incorporated
Application filed by Qualcomm Incorporated
Publication of HK40000797A
Publication of HK40000797B


Description

Intra video coding using a decoupled tree structure
The present application claims the benefit of U.S. provisional application No. 62/375,383, filed August 15, 2016, and U.S. provisional application No. 62/404,572, filed October 5, 2016, each of which is hereby incorporated by reference herein in its entirety.
Technical Field
The present disclosure relates to video coding.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in various video coding standards. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.
In addition, a new video coding standard, High Efficiency Video Coding (HEVC), has recently been developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The latest HEVC draft specification, hereinafter referred to as "HEVC WD," is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip. The specification of HEVC and its extensions (the format range (RExt), scalability (SHVC), and multi-view (MV-HEVC) extensions and the screen content extensions) is available from http://phenix.int-evry.fr/jct/doc_end_user/current_document.php?id=10481. ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard, including its current extensions and near-term extensions for screen content coding and high-dynamic-range coding.
The two groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area. The JVET first met during 19-21 October 2015. The latest version of the reference software, i.e., Joint Exploration Model 3 (JEM 3), can be downloaded from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-3.0/. The algorithm description of JEM 3 is further described in J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, and J. Boyce, "Algorithm description of Joint Exploration Test Model 3," JVET-C1001, 2016.
Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques. Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which for some techniques may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Spatial or temporal prediction generates a predictive block for a block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. The intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, producing residual transform coefficients, which may then be quantized. Quantized transform coefficients, initially arranged as a two-dimensional array, may be scanned in order to generate a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
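As a rough numerical illustration of this pipeline (residual formation, transform, quantization, and the decoder-side inverse), consider the following Python sketch. It uses a floating-point DCT and a single made-up quantization step, not any codec's actual integer transforms or quantizer:

```python
# Minimal sketch of the residual/transform/quantization pipeline described
# above; the block contents and quantization step are illustrative only.
import numpy as np
from scipy.fft import dctn, idctn

original = np.random.randint(0, 256, (8, 8)).astype(np.float64)
predicted = np.full((8, 8), original.mean())   # stand-in predictive block

residual = original - predicted                # pixel-domain differences
coeffs = dctn(residual, norm='ortho')          # transform to frequency domain
qstep = 10.0                                   # hypothetical quantization step
levels = np.round(coeffs / qstep)              # quantized transform coefficients

# Decoder side: dequantize, inverse transform, and add the prediction back.
reconstructed = idctn(levels * qstep, norm='ortho') + predicted
print('max reconstruction error:', np.abs(reconstructed - original).max())
```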
Disclosure of Invention
In general, this disclosure describes techniques related to coding (e.g., decoding or encoding) video data using intra-prediction (in some cases, according to a tree structure that provides different splitting information for luma and chroma components). That is, the luma partition tree structure may be decoupled from the corresponding chroma partition tree structure according to various partitioning schemes compatible with the described techniques. The described techniques may be used in the context of advanced video codecs, such as extensions to HEVC or the next generation video coding standard.
In one example, a device for coding video data includes a memory and a processing circuit in communication with the memory. The memory of the device is configured to store video data. The processing circuit is configured to determine that a plurality of Derived Modes (DMs) usable to predict a luma block of the video data stored to the memory are also usable to predict a chroma block of the video data stored to the memory, the chroma block corresponding to the luma block. The processing circuit is further configured to form a candidate list for a prediction mode for the chroma block, the candidate list including one or more DMs of the plurality of DMs that are usable to predict the chroma block. The processing circuit is further configured to determine to code the chroma block using any DM of the one or more DMs of the candidate list, and to code, based on the determination to code the chroma block using any DM of the one or more DMs of the candidate list, an indication identifying a selected DM of the candidate list to be used for coding the chroma block. The processing circuit is further configured to code the chroma block according to the selected DM of the candidate list.
In another example, a method of coding video data includes determining that a plurality of Derived Modes (DMs) usable to predict a luma block of the video data are also usable to predict a chroma block of the video data, the chroma block corresponding to the luma block. The method further comprises: forming a candidate list for a prediction mode for the chroma block, the candidate list including one or more DMs of the plurality of DMs usable to predict the chroma block; and determining to code the chroma block using any DM of the one or more DMs of the candidate list. The method further comprises: coding, based on the determination to code the chroma block using any DM of the one or more DMs of the candidate list, an indication that identifies a selected DM of the candidate list that will be used to code the chroma block; and coding the chroma block according to the selected DM of the candidate list.
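As a rough sketch of the candidate-list formation recited above, the Python fragment below collects the luma intra modes found at co-located positions as DM candidates, prunes duplicates, and caps the list. The helper name, sampling positions, and list size are illustrative assumptions, not details taken from this disclosure:

```python
# Hypothetical multiple-DM candidate list construction: the luma partition
# tree is decoupled from chroma, so several luma modes may cover one chroma
# block; each distinct mode becomes a DM candidate.
def form_chroma_dm_candidate_list(colocated_luma_modes, max_dms=3):
    """colocated_luma_modes: luma intra modes sampled at co-located positions
    (e.g., center and corners of the luma region), in priority order."""
    candidates = []
    for mode in colocated_luma_modes:
        if mode not in candidates:      # pruning: skip duplicate modes
            candidates.append(mode)
        if len(candidates) == max_dms:
            break
    return candidates

# Example: the decoupled luma tree yielded different modes at five positions.
print(form_chroma_dm_candidate_list([26, 10, 26, 1, 34]))  # -> [26, 10, 1]
```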
In another example, an apparatus includes means for determining that a plurality of Derived Modes (DMs) usable to predict a luma block of video data are also usable to predict a chroma block of the video data, the chroma block corresponding to the luma block. The apparatus further comprises: means for forming a candidate list for a prediction mode for the chroma block, the candidate list including one or more DMs of the plurality of DMs that are usable to predict the chroma block; and means for determining to code the chroma block using any DM of the one or more DMs of the candidate list. The apparatus further comprises: means for coding, based on the determination to code the chroma block using any DM of the one or more DMs of the candidate list, an indication that identifies a selected DM of the candidate list that will be used to code the chroma block; and means for coding the chroma block according to the selected DM of the candidate list.
In another example, a non-transitory computer-readable storage medium is encoded with instructions that, when executed, cause a processor of a computing device to determine that a plurality of Derived Modes (DMs) usable to predict a luma block of video data are also usable to predict a chroma block of the video data, the chroma block corresponding to the luma block. The instructions, when executed, further cause the processor to: form a candidate list for a prediction mode for the chroma block, the candidate list including one or more DMs of the plurality of DMs that are usable to predict the chroma block; and determine to code the chroma block using any DM of the one or more DMs of the candidate list. The instructions, when executed, further cause the processor to: code, based on the determination to code the chroma block using any DM of the one or more DMs of the candidate list, an indication that identifies a selected DM of the candidate list that will be used to code the chroma block; and code the chroma block according to the selected DM of the candidate list.
In another example, a device for coding video data includes a memory and a processing circuit in communication with the memory. The memory of the device is configured to store video data. The processing circuit is configured to form a Most Probable Mode (MPM) candidate list for a chroma block of the video data stored to the memory, such that the MPM candidate list includes one or more Derived Modes (DMs) associated with a luma block of the video data associated with the chroma block, and a plurality of luma prediction modes usable to code a luma component of the video data. The processing circuit is further configured to select a mode from the MPM candidate list, and to code the chroma block according to the mode selected from the MPM candidate list.
In another example, a method of coding video data includes forming a Most Probable Mode (MPM) candidate list for a chroma block of the video data such that the MPM candidate list includes one or more Derived Modes (DMs) associated with a luma block of the video data associated with the chroma block, and a plurality of luma prediction modes usable to code a luma component of the video data. The method further comprises: selecting a mode from the MPM candidate list; and coding the chroma block according to the mode selected from the MPM candidate list.
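The sketch below illustrates one plausible way such a chroma MPM list could be assembled: DMs from the co-located luma region first, then modes of chroma neighbors, then defaults, with pruning and truncation. The list size, default modes, and ordering are assumptions for illustration only:

```python
# Illustrative chroma MPM list: derived modes first, then neighbour and
# default modes; mode numbers follow HEVC-style indices (assumed).
PLANAR, DC, HOR, VER = 0, 1, 10, 26

def form_chroma_mpm_list(derived_modes, neighbor_modes, list_size=6):
    mpm = []
    for mode in derived_modes + neighbor_modes + [PLANAR, DC, HOR, VER]:
        if mode not in mpm:             # prune duplicates
            mpm.append(mode)
        if len(mpm) == list_size:
            break
    return mpm

print(form_chroma_mpm_list(derived_modes=[VER, HOR],
                           neighbor_modes=[PLANAR, 18]))
# -> [26, 10, 0, 18, 1]
```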
In another example, an apparatus includes means for forming a Most Probable Mode (MPM) candidate list for a chroma block of video data such that the MPM candidate list includes one or more Derived Modes (DMs) associated with a luma block of the video data associated with the chroma block, and a plurality of luma prediction modes usable to code a luma component of the video data. The apparatus further comprises: means for selecting a mode from the MPM candidate list; and means for coding the chroma block according to the mode selected from the MPM candidate list.
In another example, a non-transitory computer-readable storage medium is encoded with instructions that, when executed, cause a processor of a computing device to form a Most Probable Mode (MPM) candidate list for a chroma block of video data, such that the MPM candidate list includes one or more Derived Modes (DMs) associated with a luma block of the video data associated with the chroma block, and a plurality of luma prediction modes usable to code a luma component of the video data. The instructions, when executed, further cause the processor of the computing device to select a mode from the MPM candidate list, and to code the chroma block according to the mode selected from the MPM candidate list.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram depicting an example video encoding and decoding system that may be configured to perform the techniques of this disclosure.
Fig. 2 is a block diagram showing an example of a video encoder that may be configured to perform the techniques of this disclosure.
Fig. 3 is a block diagram showing an example of a video decoder that may be configured to perform the techniques of this disclosure.
Fig. 4 is a conceptual diagram depicting aspects of intra prediction.
Fig. 5 is a conceptual diagram showing an intra prediction mode for a luminance block.
Fig. 6 is a conceptual diagram showing an aspect of a planar mode.
Fig. 7 is a conceptual diagram depicting aspects of angular modes according to HEVC.
Fig. 8 is a conceptual diagram showing an example of the nominal vertical and horizontal positions of luma and chroma samples in a picture.
Fig. 9 is a conceptual diagram showing the locations of samples used to derive parameters used in prediction according to the Linear Model (LM) mode.
FIG. 10 is a conceptual diagram showing a quadtree plus binary tree (QTBT) structure.
Fig. 11A and 11B show examples of independent partitioning structures for corresponding luma and chroma blocks according to a QTBT partitioning scheme.
Fig. 12A and 12B illustrate adaptively ordered neighboring block selections for chroma prediction modes in accordance with one or more aspects of the present disclosure.
Fig. 13A and 13B are conceptual diagrams showing examples of block positions that a video encoding device and a decoding device may use to select chroma intra prediction modes according to the techniques described above based on multiple DM mode selections.
Fig. 14 is a flow diagram depicting an example process executable by processing circuitry of a video decoding device according to aspects of this disclosure.
Fig. 15 is a flow diagram depicting an example process executable by processing circuitry of a video encoding device according to aspects of this disclosure.
Fig. 16 is a flow diagram showing an example process executable by the processing circuitry of a video decoding device according to an aspect of this disclosure.
Fig. 17 is a flow diagram depicting an example process executable by processing circuitry of a video encoding device according to aspects of this disclosure.
Detailed Description
Fig. 1 is a block diagram depicting an example video encoding and decoding system 10 that may be configured to perform the techniques of this disclosure with respect to motion vector prediction. As shown in fig. 1, system 10 includes a source device 12 that provides encoded video data that is to be later decoded by a destination device 14. In particular, source device 12 provides video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (e.g., so-called "smart" phones), so-called "smart" tablets, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and so forth. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network, such as the internet. The communication medium may include routers, switches, base stations, or any other apparatus that may be suitable for facilitating communication from source device 12 to destination device 14.
In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In yet another example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from storage via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 14 may access the encoded video data over any standard data connection, including an internet connection. Such a connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the techniques of this disclosure with respect to motion vector prediction. In other examples, the source device and the destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
The depicted system 10 of FIG. 1 is merely one example. The techniques of this disclosure relating to motion vector prediction may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are typically performed by a video encoding device, the techniques may also be performed by a video encoder/decoder (commonly referred to as a "CODEC"). Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetric manner such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission, for example, between video devices 12, 14 for video streaming, video playback, video broadcasting, or video telephony.
Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, the techniques described in this disclosure may be applicable to video coding in general, and may be applicable to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.
Computer-readable medium 16 may include transitory media such as a wireless broadcast or a wired network transmission, or storage media (i.e., non-transitory storage media) such as a hard disk, flash drive, compact disc, digital video disc, blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, such as via a network transmission. Similarly, a computing device of a media generation facility (e.g., a disc stamping facility) may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Thus, in various examples, computer-readable medium 16 may be understood to include one or more computer-readable media in various forms.
Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20 that is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of blocks and other coded units (e.g., GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard, an extension of the HEVC standard, or a subsequent standard, such as ITU-T H.266. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of these standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. Where applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. A joint draft of MVC is described in "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264, March 2010.
In addition, there is a newly developed video coding standard, namely High Efficiency Video Coding (HEVC), developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). A recent draft of HEVC is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip. The HEVC standard is also presented jointly in Recommendation ITU-T H.265 and International Standard ISO/IEC 23008-2, both titled "High efficiency video coding" and both published in October 2014.
The JCT-VC developed the HEVC standard. The HEVC standardization efforts were based on an evolving model of a video coding device, referred to as the HEVC Test Model (HM). The HM assumes several additional capabilities of video coding devices relative to existing devices according to, for example, ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HEVC HM may provide as many as thirty-three intra-prediction encoding modes.
In general, the working model for HM describes that a video frame or picture may be divided into a sequence of tree blocks or Largest Coding Units (LCUs) that include both luma samples and chroma samples. Syntax data within the bitstream may define a size of the LCU, which is the largest coding unit in terms of the number of pixels. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into Coding Units (CUs) according to a quadtree. In general, a quadtree data structure contains one node per CU, where the root node corresponds to a tree block. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of the four leaf nodes corresponding to one of the sub-CUs.
Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in a quadtree may include a split flag that indicates whether a CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be recursively defined and may depend on whether the CU is split into sub-CUs. If a CU is not split further, then that CU is referred to as a leaf CU. In the present invention, four sub-CUs of a leaf CU will be referred to as leaf CUs even if there is no significant splitting of the original leaf CU. For example, if a CU of size 16 × 16 is not further split, then the four 8 × 8 sub-CUs will also be referred to as leaf CUs, even though the 16 × 16CU is never split.
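A small sketch can make the recursive structure concrete. Split decisions are hard-coded here instead of being parsed from split flags in a bitstream, and the class and method names are illustrative:

```python
# Minimal CU quadtree: each node carries a split decision; a split node has
# four children (the sub-CUs), and unsplit nodes are the leaf CUs.
class CUNode:
    def __init__(self, x, y, size, split=False):
        self.x, self.y, self.size, self.split = x, y, size, split
        self.children = []
        if split:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    self.children.append(CUNode(x + dx, y + dy, half))

    def leaf_cus(self):
        if not self.split:
            return [self]
        return [leaf for child in self.children for leaf in child.leaf_cus()]

root = CUNode(0, 0, 16, split=True)  # 16x16 CU split into four 8x8 leaf CUs
print([(cu.x, cu.y, cu.size) for cu in root.leaf_cus()])
```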
A CU has a purpose similar to that of a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a treeblock may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. The final, unsplit child nodes, referred to as leaf nodes of the quadtree, comprise coding nodes, also referred to as leaf-CUs. In this disclosure, the four sub-CUs of a leaf CU will also be referred to as leaf CUs even if there is no explicit splitting of the original leaf CU. For example, if a CU of size 16 × 16 is not further split, then the four 8 × 8 sub-CUs will also be referred to as leaf CUs, even though the 16 × 16 CU is never split. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split (referred to as a maximum CU depth), and may also define a minimum size of the coding nodes. Accordingly, the bitstream may also define a smallest coding unit (SCU). This disclosure uses the term "block" to refer to any of a CU, PU, or TU in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
A CU includes a coding node and Prediction Units (PUs) and Transform Units (TUs) associated with the coding node. The size of a CU corresponds to the size of the coding node and must be square in shape. The size of a CU may range from 8 × 8 pixels up to the size of the treeblock, with a maximum of 64 × 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe partitioning of the CU into one or more TUs, e.g., according to a quadtree. A TU may be square or non-square (e.g., rectangular) in shape.
The HEVC standard allows for a transform according to a TU, which may be different for different CUs. TUs are typically sized based on the size of a PU within a given CU defined for a partitioned LCU, although this may not always be the case. The size of a TU is typically the same or smaller than a PU. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure referred to as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as Transform Units (TUs). Pixel difference values associated with TUs may be transformed to produce transform coefficients that may be quantized.
A leaf CU may include one or more Prediction Units (PUs). In general, a PU represents a spatial region corresponding to all or part of a corresponding CU, and may include data for retrieving a reference sample for the PU. In addition, the PU contains data related to prediction. For example, when a PU is intra-mode encoded, data of the PU may be included in a Residual Quadtree (RQT), which may include data describing an intra-prediction mode for a TU corresponding to the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining the motion vector for the PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., list 0, list 1, or list C) of the motion vector.
A leaf-CU having one or more PUs may also include one or more Transform Units (TUs). As discussed above, the transform units may be specified using an RQT (also referred to as a TU quadtree structure). For example, a split flag may indicate whether a leaf-CU is split into four transform units. Each transform unit may then be further split into further sub-TUs. When a TU is not split further, it may be referred to as a leaf-TU. In general, for intra coding, all leaf-TUs belonging to a leaf-CU share the same intra prediction mode. That is, the same intra prediction mode is generally applied to calculate predicted values for all TUs of a leaf-CU. For intra coding, a video encoder may calculate a residual value for each leaf-TU using the intra-prediction mode, as the difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, TUs may be larger or smaller than a PU. For intra coding, a PU may be co-located with a corresponding leaf-TU of the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.
Furthermore, the TUs of a leaf-CU may also be associated with respective quadtree data structures, referred to as Residual Quadtrees (RQTs). That is, a leaf-CU may include a quadtree that indicates how the leaf-CU is partitioned into TUs. The root node of a TU quadtree typically corresponds to a leaf-CU, while the root node of a CU quadtree typically corresponds to a treeblock (or LCU). The un-split TUs of the RQT are referred to as leaf-TUs. In general, this disclosure uses the terms CU and TU to refer to leaf-CU and leaf-TU, respectively, unless otherwise indicated.
A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) generally includes one or more of a series of video pictures. The GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. The video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes, and may be different sizes according to a specified coding standard.
As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N × 2N, the HM supports intra prediction in PU sizes of 2N × 2N or N × N (in the case of an 8 × 8 CU), and inter prediction in symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. The HM also supports asymmetric partitioning for inter prediction in PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a 2N × 2N CU that is partitioned horizontally with a 2N × 0.5N PU on top and a 2N × 1.5N PU on the bottom.
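The PU dimensions implied by each of these partition modes can be tabulated directly; the sketch below (the mode labels and helper name are illustrative) returns the width × height of each PU for a 2N × 2N CU:

```python
# PU sizes (width, height) for a 2Nx2N CU under each HM partition mode;
# the asymmetric modes split one direction 25%/75% as described above.
def pu_sizes(mode, n):
    two_n = 2 * n
    return {
        '2Nx2N': [(two_n, two_n)],
        'NxN':   [(n, n)] * 4,
        '2NxN':  [(two_n, n)] * 2,
        'Nx2N':  [(n, two_n)] * 2,
        '2NxnU': [(two_n, n // 2), (two_n, 3 * n // 2)],  # top 25%, bottom 75%
        '2NxnD': [(two_n, 3 * n // 2), (two_n, n // 2)],  # top 75%, bottom 25%
        'nLx2N': [(n // 2, two_n), (3 * n // 2, two_n)],  # left 25%, right 75%
        'nRx2N': [(3 * n // 2, two_n), (n // 2, two_n)],  # left 75%, right 25%
    }[mode]

print(pu_sizes('2NxnU', 16))  # 32x32 CU -> [(32, 8), (32, 24)]
```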
In this disclosure, "N × N" and "N by N" are used interchangeably to refer to the pixel size of a video block in the vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Further, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After using intra-predictive or inter-predictive coding of PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. The PU may comprise syntax data that describes a method or mode of generating predictive pixel data in the spatial domain, also referred to as the pixel domain, and the TU may comprise coefficients in the transform domain after applying a transform (e.g., a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form TUs that include the residual data of the CU and then transform the TUs to generate transform coefficients for the CU.
Video encoder 20 may perform quantization on the transform coefficients after any transform used to generate the transform coefficients. Quantization generally refers to a process of quantizing transform coefficients to potentially reduce the amount of data used to represent the transform coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be downscaled to an m-bit value during quantization, where n is greater than m.
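As a toy illustration of this bit-depth reduction (the bit widths are made up), a rounded right shift maps an n-bit magnitude onto an m-bit scale:

```python
# Round-to-nearest right shift: reduces an n-bit value to an m-bit scale.
def quantize_shift(value, n_bits=9, m_bits=6):
    shift = n_bits - m_bits
    return (value + (1 << (shift - 1))) >> shift

print(quantize_shift(317))  # 317 on a 9-bit scale -> 40 on a 6-bit scale
```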
After quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array and lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
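The sketch below serializes a 2-D coefficient matrix with a simple up-right diagonal scan, one of several possible scan orders, so that the higher-energy, low-frequency coefficients land at the front of the 1-D vector; the matrix values are made up:

```python
# Up-right diagonal scan of an n x n coefficient block into a 1-D list.
def diagonal_scan(block):
    n = len(block)
    out = []
    for s in range(2 * n - 1):               # anti-diagonals, DC first
        for y in range(min(s, n - 1), max(-1, s - n), -1):
            out.append(block[y][s - y])
    return out

coeffs = [[9, 5, 1, 0],
          [4, 2, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
print(diagonal_scan(coeffs))  # large values first, trailing zeros last
```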
To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted.
Codewords in a VLC may be constructed such that relatively shorter codes correspond to more probable symbols and longer codes correspond to less probable symbols. In this way, using VLC may achieve bit savings relative to, for example, using an equal length codeword for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
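A toy VLC makes the bit-saving argument concrete; the symbols, probabilities, and codewords below are invented for illustration:

```python
# Prefix-free VLC: the most probable symbol gets the shortest codeword.
vlc_table = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}  # p(a) > p(b) > ...

def vlc_encode(symbols):
    return ''.join(vlc_table[s] for s in symbols)

message = 'aaabacad'
bits = vlc_encode(message)
print(bits, '->', len(bits), 'bits vs', 2 * len(message), 'bits fixed-length')
```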
In accordance with one or more techniques of this disclosure, video encoder 20 and/or video decoder 30 may implement one or more of the techniques of this disclosure. For example, video encoder 20 and/or video decoder 30 may use affine models in motion estimation and compensation.
Fig. 2 is a block diagram depicting an example of a video encoder 20 that may be configured to perform the techniques of this disclosure with respect to motion vector prediction. Video encoder 20 may perform intra-coding and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of a number of spatial-based coding modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of a number of temporally based coding modes.
As shown in fig. 2, video encoder 20 receives a current video block within a video slice to be encoded. In the example of fig. 2, video encoder 20 includes mode select unit 40, reference picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and a summer 62. A deblocking filter (not shown in fig. 2) may also be included in order to filter block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62, if desired. In addition to deblocking filters, additional filters (in-loop or post-loop) may be used. Such filters are not shown for simplicity, but may filter the output of summer 50 (as an in-loop filter) if desired.
During the encoding process, video encoder 20 receives a video frame or slice to be coded. A frame or slice may be divided into a plurality of video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
Furthermore, partition unit 48 may partition a block of video data into sub-blocks based on evaluation of previous partition schemes in previous coding passes. For example, partition unit 48 may first partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on a bitrate-distortion analysis (e.g., bitrate-distortion optimization). Mode select unit 40 may further generate a quadtree data structure indicating a partitioning of the LCU into sub-CUs. Leaf-node CUs of a quadtree may include one or more PUs and one or more TUs.
Mode select unit 40 may select one of the coding modes (intra or inter), e.g., based on the error results, and provide the resulting intra-coded block or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are shown separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture, relative to a predictive block within a reference picture (or other coded unit), with respect to the current block being coded within the current picture (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional-pixel positions of a reference picture. Thus, motion estimation unit 42 may perform a motion search with respect to full pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
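A brute-force integer-pel motion search using SAD as the difference metric can be sketched as follows (fractional-pel interpolation is omitted; the names and search range are illustrative):

```python
# Full search over integer displacements, minimizing the sum of absolute
# differences (SAD) between the current block and reference-picture blocks.
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def full_search(cur_block, ref, bx, by, search_range=4):
    h, w = cur_block.shape
    best = (0, 0, sad(cur_block, ref[by:by + h, bx:bx + w]))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                cost = sad(cur_block, ref[y:y + h, x:x + w])
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best  # (dx, dy, sad) -- the motion vector and its cost

ref = np.random.randint(0, 256, (32, 32), dtype=np.uint8)
cur = ref[10:18, 12:20].copy()            # block actually located at (12, 10)
print(full_search(cur, ref, bx=8, by=8))  # recovers (dx=4, dy=2, sad=0)
```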
Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the locations of the PUs to locations of predictive blocks of reference pictures. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.
The motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vectors determined by motion estimation unit 42. Again, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate, in one of the reference picture lists, the predictive block to which the motion vector points. Summer 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to the luma component, and motion compensation unit 44 uses motion vectors calculated based on the luma component for both the chroma and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
Video encoder 20 may be configured to perform any of the various techniques of this disclosure discussed above with respect to fig. 1, and as will be described in more detail below. For example, motion compensation unit 44 may be configured to code motion information for a block of video data using AMVP or merge mode in accordance with the techniques of this disclosure.
Assuming motion compensation unit 44 elects to perform the merge mode, motion compensation unit 44 may form a candidate list that includes a merge candidate set. Motion compensation unit 44 may add candidates to the candidate list based on a particular, predetermined order. As discussed above, motion compensation unit 44 may also add additional candidates and perform pruning of the candidate list. Ultimately, mode select unit 40 may determine which candidate will be used to encode the motion information for the current block, and encode a merge index representing the selected candidate.
As described above, as an alternative to inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction unit 46 may intra-predict the current block. In particular, intra-prediction unit 46 may determine the intra-prediction mode to be used to encode the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode selection unit 40 in some examples) may select an appropriate intra-prediction mode from the tested modes for use.
For example, intra-prediction unit 46 may calculate rate-distortion values using rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
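The selection step amounts to minimizing a Lagrangian cost J = D + λ·R over the tested modes, as in the sketch below; the distortion and bit figures are made-up numbers, not measurements:

```python
# Pick the mode with the lowest rate-distortion cost J = D + lambda * R.
def best_mode(tested, lam):
    # tested: {mode_name: (distortion, bits)}
    return min(tested, key=lambda m: tested[m][0] + lam * tested[m][1])

tested = {'planar': (1500, 40), 'dc': (1700, 35), 'angular26': (1200, 55)}
print(best_mode(tested, lam=10.0))  # angular26: J = 1200 + 550 = 1750
```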
After selecting the intra-prediction mode for the block, intra-prediction unit 46 may provide information to entropy encoding unit 56 indicating the selected intra-prediction mode for the block. Entropy encoding unit 56 may encode information indicating the selected intra-prediction mode. Video encoder 20 may include the following in the transmitted bitstream: configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables); definition of coding context of various blocks; and an indication of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to be used for each of the contexts.
Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents one or more components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, producing a video block that includes residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms may also be used.
In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as the frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform scanning.
After quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the contexts may be based on neighboring blocks. After entropy coding by entropy coding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate the reference block by adding the residual block to a predictive block of one of the frames of reference picture memory 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference picture memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
Fig. 3 is a block diagram depicting an example of a video decoder 30 that may be configured to perform the motion vector prediction techniques of this disclosure. In the example of fig. 3, video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference picture memory 82, and a summer 80. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 (fig. 2). Motion compensation unit 72 may generate prediction data based on the motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 may generate prediction data based on the intra-prediction mode indicator received from entropy decoding unit 70.
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, or intra-prediction mode indicators, among other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
When a video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on reference pictures stored in reference picture memory 82.
Motion compensation unit 72 determines prediction information for video blocks of the current video slice by parsing the motion vectors and other syntax elements and uses the prediction information to generate predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra or inter prediction) for coding video blocks of a video slice, an inter-prediction slice type (e.g., a B-slice or a P-slice), construction information for one or more of the reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter-prediction state for each inter-coded video block of the slice, and other information used to decode video blocks in the current video slice.
Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may calculate interpolated values for sub-integer pixels of the reference block using interpolation filters as used by video encoder 20 during encoding of the video block. In this case, motion compensation unit 72 may determine the interpolation filter used by video encoder 20 from the received syntax element and generate the predictive block using the interpolation filter.
Video decoder 30 may be configured to perform any of the various techniques of this disclosure discussed above with respect to fig. 1, as will be discussed in more detail below. For example, motion compensation unit 72 may be configured to determine to perform motion vector prediction using AMVP or merge mode in accordance with the techniques of this disclosure. Entropy decoding unit 70 may decode one or more syntax elements that represent how motion information is used for coding of the current block.
Assuming that the syntax element indicates that merge mode is performed, motion compensation unit 72 may form a candidate list that includes a merge candidate set. Motion compensation unit 72 may add the candidates to the candidate list based on a particular predetermined order. As discussed above, motion compensation unit 72 may also add additional candidates and perform pruning of the candidate list. Finally, motion compensation unit 72 may decode a merge index that represents which candidate is used to code the motion information for the current block.
Inverse quantization unit 76 inverse quantizes (i.e., de-quantizes) the quantized transform coefficients provided in the bitstream and entropy decoded by entropy decoding unit 70. The inverse quantization process may include using a quantization parameter QPY, calculated by video decoder 30 for each video block in the video slice, to determine the degree of quantization that should be applied and, similarly, the degree of dequantization that should be applied.
The inverse transform unit 78 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain.
After motion compensation unit 72 generates the predictive block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 78 with the corresponding predictive block generated by motion compensation unit 72. Summer 80 represents one or more components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters may also be used (within or after the coding loop) to smooth pixel transitions, or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 82, which stores reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores decoded video for later presentation on a display device, such as display device 32 of fig. 1.
Fig. 4 is a conceptual diagram depicting aspects of intra prediction. Video encoder 20 and/or video decoder 30 may implement intra prediction to perform image block prediction using spatially neighboring reconstructed image samples of the block. A typical example of intra prediction for a 16×16 image block is shown in fig. 4. As depicted in fig. 4, with intra prediction, the 16×16 image block (in the solid-line square) is predicted from the above and left neighboring reconstructed samples (reference samples) located in the nearest above row and left column, along a selected prediction direction (as indicated by the arrow). In HEVC, 35 modes are included for intra prediction of a luma block.
Fig. 5 is a conceptual diagram showing an intra prediction mode for a luminance block. The modes include a planar mode, a DC mode, and 33 angular modes, as indicated in fig. 5. The 35 modes of intra prediction defined in HEVC are indexed as shown in table 1 below:
Intra prediction mode    Associated name
0                        INTRA_PLANAR
1                        INTRA_DC
2..34                    INTRA_ANGULAR2..INTRA_ANGULAR34

Table 1 - Specification of intra prediction modes and associated names
Fig. 6 is a conceptual diagram showing aspects of the planar mode. For the planar mode, which is typically the most frequently used intra-prediction mode, the prediction samples are generated as shown in fig. 6. To perform planar prediction for an N×N block, for each sample located at (x, y), video encoder 20 and/or video decoder 30 may utilize a bilinear filter to calculate a prediction value using four particular neighboring reconstructed samples (i.e., reference samples). The four reference samples include the top-right reconstructed sample TR, the bottom-left reconstructed sample BL, the reconstructed sample located in the same column as the current sample (r(x, -1), denoted T), and the reconstructed sample located in the same row as the current sample (r(-1, y), denoted L). The planar mode is formulated as shown in the following equation: p(x, y) = (N - x - 1)·L + (N - y - 1)·T + x·TR + y·BL.
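For purposes of illustration only, the planar computation above may be sketched as follows. Python is used here merely as illustrative pseudocode; the function and variable names are not taken from any standard or reference software, and the normalization step (e.g., division by 2N, as performed in HEVC) is intentionally omitted to match the equation as written above.

```python
def planar_predict(top, left, N):
    """Sketch of planar intra prediction for an N x N block.

    top[x]  : reconstructed samples of the nearest above row, x in [0, N],
              where top[N] is the top-right sample TR
    left[y] : reconstructed samples of the nearest left column, y in [0, N],
              where left[N] is the bottom-left sample BL
    """
    TR = top[N]                      # top-right reference sample
    BL = left[N]                     # bottom-left reference sample
    pred = [[0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            T = top[x]               # reference sample in the same column
            L = left[y]              # reference sample in the same row
            pred[y][x] = (N - x - 1) * L + (N - y - 1) * T + x * TR + y * BL
    return pred
```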
For DC mode, the prediction block is simply padded with the average of the neighboring reconstructed samples. In general, both planar mode and DC mode are applied for modeling smoothly varying and constant picture areas.
Fig. 7 is a conceptual diagram depicting aspects of angular modes according to HEVC. For the angular intra prediction modes in HEVC, which comprise 33 different prediction directions in total, the intra prediction process is described as follows. For each given angular intra prediction mode, the intra prediction direction may be identified accordingly. For example, according to fig. 5, intra mode 18 corresponds to the pure horizontal prediction direction, and intra mode 26 corresponds to the pure vertical prediction direction. Given a particular intra prediction direction, for each sample of the prediction block, the coordinates (x, y) of the sample are first projected along the prediction direction onto the row/column of neighboring reconstructed samples, as shown in the example in fig. 7. Assuming that (x, y) is projected to a fractional position α between two neighboring reconstructed samples L and R, a two-tap bilinear interpolation filter is then used to calculate the prediction value for (x, y), formulated as shown in the following equation: p(x, y) = (1 - α)·L + α·R. To avoid floating point operations, in HEVC this calculation is approximated using integer arithmetic as p(x, y) = ((32 - a)·L + a·R + 16) >> 5, where a is an integer equal to 32·α.
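A minimal sketch of the two-tap interpolation above, using the integer approximation, might look like the following (illustrative names only):

```python
def angular_predict_sample(L, R, a):
    """Two-tap bilinear interpolation for angular intra prediction.

    L, R : the two neighboring reconstructed reference samples that the
           sample position projects between along the prediction direction
    a    : integer equal to 32 * alpha, i.e., the fractional projected
           position in 1/32-sample units (0..32)
    """
    return ((32 - a) * L + a * R + 16) >> 5
```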
Aspects of chroma encoding and decoding are generally described below. The structure of the chrominance signal often follows the structure of the corresponding luminance signal. As described, according to HEVC, each luma block corresponds to one chroma block, while each chroma prediction block may correspond to one or four luma prediction blocks, based on a partition size of the luma prediction block equal to 2N×2N or N×N. With these characteristics and general trends of the chroma signal structure, HEVC provides a mechanism by which video encoder 20 may indicate to video decoder 30 that a chroma PU is predicted using the same prediction mode as a corresponding selected luma PU. Table 2 below specifies a mode arrangement that video encoder 20 may use to signal the chroma modes for the chroma PU. For example, one intra-coded chroma PU may be predicted using a mode selected from one of five (5) modes, including planar mode (INTRA_PLANAR), vertical mode (INTRA_ANGULAR26), horizontal mode (INTRA_ANGULAR10), DC mode (INTRA_DC), and Derived Mode (DM). DM is set to the intra prediction mode used for predicting the corresponding selected luma PU. For example, if the corresponding selected luma PU is coded with an intra mode with an index equal to 11, DM is set to the intra mode with an index equal to 11.
Table 2 - Specification of chroma intra prediction modes and associated names
If the derived mode is indicated in the encoded video bitstream to be used for a PU, video decoder 30 may perform prediction for the chroma PU using the prediction mode of the corresponding luma PU. To alleviate redundancy issues that may arise when the derived mode refers to one of the always-present prediction modes, video encoder 20 and video decoder 30 may use a designated alternative mode as a replacement for the repeated mode. As shown in table 2 above, video encoder 20 and video decoder 30 may use the INTRA_ANGULAR34 chroma replacement mode, also referred to as the "angular (34) mode," as a replacement to remove redundancy. For example, where the relationship between a chroma PU and a luma PU is one-to-one or many-to-one, video encoder 20 and video decoder 30 may determine the prediction mode for the chroma PU by selecting a prediction mode applicable to a single corresponding luma PU.
However, in some cases, one chroma PU may correspond to multiple luma PUs. The scenario in which a single chroma PU corresponds to multiple luma PUs is considered an exception or "special case" with respect to chroma encoding and decoding. For example, in some of these special cases, one chroma PU may correspond to four luma PUs. In the special case where the chroma-luma relationship is one-to-many, video encoder 20 and video decoder 30 may determine the prediction mode for the chroma PU by selecting the prediction mode for the corresponding upper-left luma PU.
Video encoder 20 and video decoder 30 may entropy code (entropy encode and entropy decode, respectively) data that indicates a chroma prediction mode for a block of video data. According to chroma mode coding, video encoder 20 may assign the 1-bit syntax element (0) to the single most frequently occurring derived mode, while assigning 3-bit syntax elements (100, 101, 110, and 111, respectively) to each of the remaining four modes. Video encoder 20 and video decoder 30 may code only the first bin with one context model and may bypass-code the remaining two bins, if needed.
Video encoder 20 and video decoder 30 may entropy code (entropy encode and entropy decode, respectively) the video data according to Context Adaptive Binary Arithmetic Coding (CABAC). CABAC is an entropy coding method first introduced in H.264/AVC and described in "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard" by D. Marpe, H. Schwarz, and T. Wiegand (IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620-636, Jul. 2003). CABAC is now used in the High Efficiency Video Coding (HEVC) video coding standard. Video encoder 20 and video decoder 30 may use CABAC for entropy coding in a manner similar to the CABAC performed for HEVC.
CABAC involves three main functions: binarization, context modeling, and arithmetic coding. The binarization function maps syntax elements to binary symbols (bins), referred to as binary bit strings. The context modeling function estimates the probability of a bin. An arithmetic coding function (also referred to as binary arithmetic coding) compresses the bins into bits based on the estimated probabilities.
Video encoder 20 and video decoder 30 may perform binarization for CABAC using one or more of several different binarization processes provided in HEVC. The binarization processes provided in HEVC include unary (U), truncated unary (TU), k-th order exponential Golomb (EGk), and fixed length (FL) techniques. Details of these binarization processes are described in "High throughput CABAC entropy coding in HEVC" by V. Sze and M. Budagavi (IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 22, no. 12, pp. 1778-1791, Dec. 2012).
According to unary-based encoding, video encoder 20 may signal a string of binary bits of length N +1, where "N" represents an integer value, where the first N bins (values) are 1, and where the last bin (value) is 0. According to unary based decoding, video decoder 30 may search for a 0 value for the bin. Upon reaching the 0-value bin, video decoder 30 may determine that the syntax element is complete.
According to truncated unary coding, video encoder 20 may encode one fewer bin than in the case of unary coding. For example, video encoder 20 may set a maximum value, indicated herein by "cMax", for the largest possible value of the syntax element. When (N + 1) < cMax, video encoder 20 may implement the same signaling as unary coding. However, when (N + 1) = cMax, video encoder 20 may set all bins to values of 1. Video decoder 30 may search for a 0-valued bin until cMax bins have been examined, in order to determine when the syntax element is complete. Aspects of, and comparisons between, the binary strings used in unary and truncated unary coding are shown in table 3 below.
Table 3 - Example binary strings of unary and truncated unary coding
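As an illustration, the two binarizations may be sketched as follows (an informal sketch following the truncated unary convention in which the value equal to cMax is coded as cMax ones with no terminating zero; names are illustrative):

```python
def unary_binarize(n):
    """Unary: n ones followed by a terminating zero."""
    return [1] * n + [0]

def truncated_unary_binarize(n, c_max):
    """Truncated unary: identical to unary, except the terminating zero
    is dropped when the value reaches the maximum value cMax."""
    if n == c_max:
        return [1] * c_max       # no terminating zero at the maximum
    return [1] * n + [0]
```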
Video encoder 20 and video decoder 30 may also perform context modeling aspects of CABAC. Context modeling provides a relatively accurate probability estimate, which is an aspect of enabling efficient coding. Thus, context modeling is an adaptive process, and is sometimes described as "highly adaptive". Different context models may be used for different bins, where the probabilities of the context models may be updated based on the values of previously coded bins. Bins with similar distributions often share the same context model. Video encoder 20 and/or video decoder 30 may select the context model for each bin based on one or more factors including: type of syntax element, binary position in syntax element (binIdx), luma/chroma, neighbor information, etc.
Video encoder 20 and video decoder 30 may perform a context switch after each instance of binary coding (binary encoding or binary decoding, as the case may be). Video encoder 20 and video decoder 30 may store each probability model as a 7-bit entry (6 bits for the probability state and 1 bit for the Most Probable Symbol (MPS)) in context memory, and may address the probability model using a context index computed by context selection logic. HEVC provides the same probability update method as H.264/AVC. However, the HEVC context selection logic is modified with respect to the H.264/AVC context selection logic to improve throughput. Video encoder 20 and video decoder 30 may also use probability representations for CABAC entropy encoding and decoding, respectively. For CABAC, 64 representative probability values pσ ∈ [0.01875, 0.5] are derived for the Least Probable Symbol (LPS) by the following recursive equation:

pσ = α · pσ-1, for all σ = 1, …, 63,

wherein α = (0.01875 / 0.5)^(1/63).

The parameters used in the above equation have demonstrated a relatively good compromise between the accuracy of the probability representation and the need for fast adaptation. The probability of the MPS equals 1 minus the probability of the LPS (i.e., 1 - pLPS). Thus, the probability range that can be represented by CABAC is [0.01875, 0.98125]. The upper limit of the range (the MPS probability) equals one minus the lower limit (i.e., 1 - 0.01875 = 0.98125).
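For illustration, the recursion above can be evaluated directly; the short sketch below (non-normative) derives the 64 representative LPS probabilities and checks the end points:

```python
# Derive the 64 representative LPS probabilities from the recursion above.
alpha = (0.01875 / 0.5) ** (1.0 / 63)    # as defined above

p = [0.5]                                # p_0 = 0.5
for sigma in range(1, 64):
    p.append(alpha * p[sigma - 1])       # p_sigma = alpha * p_(sigma-1)

assert abs(p[63] - 0.01875) < 1e-12      # smallest representable LPS value
```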
Prior to encoding or decoding a particular slice, video encoder 20 and video decoder 30 may initialize the probability model based on some predefined values. For example, given an input quantization parameter indicated by "qp" and a predefined value indicated by "initVal", video encoder 20 and/or video decoder 30 may derive a 7-bit entry for a probability model (indicated by "state" and "MPS") as follows:
qp=Clip3(0,51,qp);
slope=(initVal>>4)*5-45;
offset=((initVal&15)<<3)-16;
initState=min(max(1,(((slope*qp)>>4)+offset)),126);
MPS=(initState>=64);
state index=((MPS?(initState-64):(63-initState))<<1)+MPS;
the derived state index implicitly contains MPS information. That is, when the state index is an even value, the MPS value is equal to 0. Conversely, when the state index is an odd value, the MPS value is equal to 1. The value of "initVal" is in the range [0,255] with 8-bit precision.
The predefined initVal is slice dependent. That is, video encoder 20 may use three sets of context initialization parameters for probability models that are specifically used for the coding of I-slices, P-slices, and B-slices, respectively. In this way, video encoder 20 is enabled to select between three initialization tables for these three slice types, such that a better fit to different coding scenarios and/or different types of video content may potentially be achieved.
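The initialization formulas above translate directly into code. The following sketch merely transcribes them (the helper name and return convention are illustrative, not taken from any reference software):

```python
def init_context(init_val, qp):
    """Transcription of the probability-model initialization above.

    init_val : predefined 8-bit value initVal in the range [0, 255]
    qp       : input quantization parameter
    Returns (state_index, mps), the 7-bit entry described above.
    """
    qp = min(max(qp, 0), 51)                      # Clip3(0, 51, qp)
    slope = (init_val >> 4) * 5 - 45
    offset = ((init_val & 15) << 3) - 16
    init_state = min(max(1, ((slope * qp) >> 4) + offset), 126)
    mps = 1 if init_state >= 64 else 0
    state_index = (((init_state - 64) if mps else (63 - init_state)) << 1) + mps
    return state_index, mps
```

Consistent with the parity rule above, the returned state index is even when the MPS value is 0 and odd when the MPS value is 1.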
Recent advances in JEM3.0 include developments relating to intra-mode coding. According to these recent developments of JEM3.0, video encoder 20 and video decoder 30 may perform intra-mode coding with 6 Most Probable Modes (MPMs). As described in "Neighbour based intra most probable modes list derivation" by V. Seregin, X. Zhao, A. Said, and M. Karczewicz (JVET-C0055, Geneva, May 2016), the 33 angular modes of HEVC have been extended to 65 angular modes which, together with the DC and planar modes, are signaled using 6 Most Probable Modes (MPMs). Video encoder 20 may encode a one-bit flag (e.g., an "MPM flag") to indicate whether the intra luma mode is included in an MPM candidate list that includes 6 modes (as described in JVET-C0055, referenced above). If the intra luma mode is included in the MPM candidate list (thereby causing video encoder 20 to set the MPM flag to a positive value), video encoder 20 may further encode and signal an index of the MPM candidate to indicate which MPM candidate in the list is the intra luma mode. Otherwise (i.e., if video encoder 20 sets the MPM flag to a negative value), video encoder 20 may further signal the index of the remaining intra luma mode.
According to these aspects of JEM3.0 advancement, video decoder 30 may decode the MPM flag upon receiving the signaled encoded video bitstream to determine whether the intra luma mode is included in the MPM candidate list. If video decoder 30 determines that the MPM flag is set to a positive value, video decoder 30 may decode the received index to identify the intra luma mode from the MPM candidate list. Conversely, if video decoder 30 determines that the MPM flag is set to a negative value, video decoder 30 may receive and decode the indices of the remaining intra luma modes.
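At a high level, the decoder-side logic just described reduces to a two-level decision, as in this sketch (the read functions are illustrative placeholders for bitstream parsing):

```python
def decode_intra_luma_mode(mpm_flag, read_mpm_index, read_remaining_mode,
                           mpm_list):
    """Sketch of the two-level MPM signaling described above."""
    if mpm_flag:
        # The mode is one of the 6 MPM candidates; the MPM index selects it.
        return mpm_list[read_mpm_index()]
    # Otherwise the remaining-mode index identifies one of the non-MPM modes.
    return read_remaining_mode()
```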
Recent JEM3.0 advances have also been implemented with respect to adaptive multi-core transforms. In addition to DCT-II and 4×4 DST-VII, which are employed in HEVC, an adaptive multiple transform (AMT) scheme is used for residual coding of both inter-coded blocks and intra-coded blocks. AMT utilizes a number of transforms selected from the DCT/DST families, in addition to the transforms currently defined in HEVC. The transform matrices newly introduced in JEM3.0 are DST-VII, DCT-VIII, DST-I, and DCT-V.
For intra-residual coding, video encoder 20 and video decoder 30 may use a mode-dependent transform candidate selection process due to different residual statistics for different intra-prediction modes. Three transform subsets have been defined as shown in table 4 below, and video encoder 20 and/or video decoder 30 may select a transform subset based on an intra-prediction mode, as specified in table 5 below.
Table 4 - Three predefined transform candidate sets

Table 5 - Selected sets of horizontal (H) and vertical (V) transforms for each intra-prediction mode
According to the subset concept, video decoder 30 may first identify a transform subset based on table 5 above. For example, to identify a transform subset, video decoder 30 may use the intra-prediction mode of a CU, which is signaled with the CU-level AMT flag set to a value of 1. Subsequently, for each of the horizontal and vertical transforms, video decoder 30 may select one of the two transform candidates in the identified transform subset (table 4 above). The transform candidate selected for each of the horizontal and vertical transforms is selected based on explicitly signaled flag data. However, for inter-prediction residuals, video decoder 30 may use only one transform set, consisting of DST-VII and DCT-VIII, for all inter modes and for both horizontal and vertical transforms.
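For illustration, the selection flow just described might be sketched as follows. The subset contents below follow the JEM description of the three predefined sets, but since tables 4 and 5 are not reproduced here, the mode-to-subset mappings are passed in as parameters; all names are illustrative:

```python
# The three predefined transform candidate subsets (per the JEM design).
TRANSFORM_SUBSETS = {
    0: ("DST-VII", "DCT-VIII"),
    1: ("DST-VII", "DST-I"),
    2: ("DST-VII", "DCT-V"),
}

def select_intra_transforms(intra_mode, h_subset, v_subset, h_flag, v_flag):
    """h_subset / v_subset: dicts mapping an intra mode to a subset index
    (the role of table 5); h_flag / v_flag: the explicitly signaled
    one-bit flags choosing a candidate within each identified subset."""
    horizontal = TRANSFORM_SUBSETS[h_subset[intra_mode]][h_flag]
    vertical = TRANSFORM_SUBSETS[v_subset[intra_mode]][v_flag]
    return horizontal, vertical
```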
Table 6 - Specification of chroma intra prediction modes and associated names

Table 7 - Binary string for each chroma mode
Recent JEM3.0 progress has been achieved with respect to the LM (linear model) prediction mode for video coding. The video coding devices of this disclosure, such as video encoder 20 and video decoder 30, may handle aspects of color space and color format when video encoding and video decoding. Color video plays a major role in multimedia systems, where various color spaces are used to efficiently represent color. A color space specifies a color using multiple components with numerical values. A common color space is the "RGB" color space, in which a color is represented as a combination of three primary color component values (i.e., red, green, and blue). For color video compression, the YCbCr color space has been widely used, as described in "Colour Space Conversions" by A. Ford and A. Roberts (Tech. Rep., University of Westminster, London, Aug. 1998). YCbCr can be converted from the RGB color space relatively easily via a linear transformation. In RGB-to-YCbCr conversion, the redundancy between the different components (i.e., cross-component redundancy) is significantly reduced in the resulting YCbCr color space.
One advantage of YCbCr is backward compatibility with black and white TV, since the Y signal conveys luminance information. In addition, chroma bandwidth may be reduced by subsampling the Cb and Cr components in a 4:2:0 chroma sampling format, with significantly less subjective impact than subsampling in RGB. Because of these advantages, YCbCr is already the dominant color space in video compression. There are also other color spaces available for video compression, such as YCoCg. For purposes of illustration, the Y, Cb, Cr signals are used throughout this disclosure to represent the three color components in a video compression scheme, regardless of the actual color space used. In 4:2:0 sampling, the height and width of each of the two chroma arrays (Cb and Cr) are half that of the luma array (Y).
Fig. 8 is a conceptual diagram showing an example of the nominal vertical and horizontal positions of luma and chroma samples in a picture. The nominal vertical and horizontal relative positions of the luma and chroma samples shown in fig. 8 substantially correspond to the positions provided by the 4:2:0 sampling format.
Aspects of the LM prediction mode for video coding are discussed in the following paragraphs. Although cross-component redundancy is significantly reduced in the YCbCr color space, correlation between the three color components still exists in the YCbCr color space. Various techniques have been investigated to improve video coding performance by further reducing the correlation between the color components. With respect to 4:2:0 chroma video coding, the Linear Model (LM) prediction mode was studied during HEVC standard development. Aspects of the LM prediction mode are described in "CE6.a.4: Chroma intra prediction by reconstructed luma samples" by J. Chen, V. Seregin, W.-J. Han, J.-S. Kim, and B.-M. Jeon (Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-E266, 5th Meeting: Geneva, 16-23 March 2011).
When performing prediction according to the LM prediction mode, video encoder 20 and video decoder 30 may predict chroma samples based on the downsampled reconstructed luma samples of the same block by using the linear model shown in equation (1) below.
predC(i, j) = α · recL(i, j) + β    (1)

where predC(i, j) denotes the prediction of the chroma samples in a block, recL(i, j) denotes the downsampled reconstructed luma samples of the same block, and the parameters α and β are derived from causal reconstructed samples surrounding the current block.
Fig. 9 is a conceptual diagram showing the locations of samples used to derive the parameters used in prediction according to the Linear Model (LM) mode. The example of selected reference samples depicted in fig. 9 is used for the derivation of α and β in equation (1) above. If the chroma block size is represented by N (where N is an integer), then both i and j are within the range [0, N).
Video encoder 20 and video decoder 30 may derive the parameters α and β in equation (1) by reducing or potentially minimizing the regression error between the neighboring reconstructed luma samples and chroma samples around the current block, according to equation (2) below:

E(α, β) = Σi (yi - (α · xi + β))²    (2)

The parameters α and β are solved as follows:

α = (I · Σ(xi · yi) - Σxi · Σyi) / (I · Σ(xi · xi) - Σxi · Σxi)    (3)

β = (Σyi - α · Σxi) / I    (4)

where xi represents a downsampled reconstructed luma reference sample, yi represents a reconstructed chroma reference sample, and I represents the number (e.g., count) of reference samples. For a target N×N chroma block, when both the left and above causal samples are available, the total number of involved samples I is equal to 2N. When only the left or only the above causal samples are available, the total number of involved samples I is equal to N.
In general, when applying LM prediction mode, video encoder 20 and/or video decoder 30 may invoke the following steps in the order listed below:
a) downsample the neighboring luma samples;

b) derive the linear parameters (i.e., α and β); and

c) downsample the current luma block and derive the prediction from the downsampled luma block and the linear parameters.
To further improve coding efficiency, video encoder 20 and/or video decoder 30 may utilize the downsampling filters (1, 2, 1) and (1, 1) to derive the neighboring samples xi and the downsampled luma samples recL(i, j) within the corresponding luma block.
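For illustration, steps (b) and (c) above may be sketched as follows. This is a floating-point sketch of equations (1), (3), and (4); an actual codec would use integer arithmetic, and the names are illustrative:

```python
def derive_lm_params(x, y):
    """Least-squares solution of equations (3) and (4) above.

    x : downsampled reconstructed luma reference samples x_i
    y : reconstructed chroma reference samples y_i
    """
    I = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    denom = I * sum_xx - sum_x * sum_x
    alpha = (I * sum_xy - sum_x * sum_y) / denom if denom else 0.0
    beta = (sum_y - alpha * sum_x) / I
    return alpha, beta

def lm_predict(rec_l_down, alpha, beta):
    """Apply pred_C(i, j) = alpha * rec_L(i, j) + beta from equation (1)."""
    return [[alpha * s + beta for s in row] for row in rec_l_down]
```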
Recent JEM3.0 progress has also been achieved with respect to prediction between chroma components. In JEM, the LM prediction mode is extended to prediction between the two chroma components. For example, the Cr component may be predicted from the Cb component. Instead of using the reconstructed sample signal, video encoder 20 and/or video decoder 30 may apply the cross-component prediction in the residual domain. For example, video encoder 20 and/or video decoder 30 may implement the residual-domain application of cross-component prediction by adding the weighted reconstructed Cb residual to the original Cr intra prediction to form the final Cr prediction, as shown in the following equation:

pred*Cr(i, j) = predCr(i, j) + α · resiCb(i, j)

where resiCb(i, j) denotes the reconstructed Cb residual.
Video encoder 20 and/or video decoder 30 may derive the scale factor α as in the LM mode. However, one difference is that a regression cost relative to a default α value is added in the error function, such that the derived scale factor is biased toward the default value of -0.5. The LM prediction mode is added as one additional chroma intra prediction mode.
Aspects of the quadtree plus binary tree (QTBT) structure are described in the following paragraphs. In VCEG proposal COM16-C966 ("Block partitioning structure for next generation video coding", International Telecommunication Union, COM16-C966, Sep. 2015) by J. An, Y.-W. Chen, K. Zhang, H. Huang, Y.-W. Huang, and S. Lei, a QTBT partitioning scheme was proposed for future video coding standards beyond HEVC. Simulations showed that the proposed QTBT structure is more efficient than the quadtree structure used in HEVC. In the QTBT structure proposed in COM16-C966, a coding tree block (CTB) is first partitioned according to a quadtree structure, where the quadtree splitting of one node may be iterated until the node reaches the minimum allowed quadtree leaf node size (MinQTSize).
According to the QTBT structure, the quadtree leaf nodes may be further partitioned according to the binary tree structure if the quadtree leaf node size is not greater than a maximum allowed binary tree root node size (MaxBTSize). Binary tree splitting for a given node may be iterated until the node reaches a minimum allowed binary tree leaf node size (MinBTSize), or until iterative splitting reaches a maximum allowed binary tree depth (MaxBTDepth). The binary tree leaf nodes are CUs that can be used for prediction (e.g., intra-picture or inter-picture prediction) and transform without any further partitioning.
According to the binary tree split, video encoder 20 and/or video decoder 30 may implement two split types, namely, a symmetric horizontal split and a symmetric vertical split. In one example of a QTBT partitioning structure, the CTU size is set to 128 × 128 (i.e., 128 × 128 luma samples and two corresponding 64 × 64 chroma samples), MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. Video encoder 20 and/or video decoder 30 may first apply the quadtree partitioning portion of the QTBT scheme to the CTUs to generate quadtree leaf nodes. The quad tree leaf nodes may have sizes from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., CTU size).
If the leaf quadtree node is 128×128, video encoder 20 and/or video decoder 30 may not further split the leaf quadtree node using the binary tree portion of the QTBT scheme, because the node size exceeds MaxBTSize (in this case, 64×64). Otherwise (i.e., if the node size does not exceed the 64×64 MaxBTSize), video encoder 20 and/or video decoder 30 may further partition the leaf quadtree node using the binary tree partitioning portion of the QTBT structure. Thus, the quadtree leaf node is also the root node of the binary tree portion of the QTBT scheme, and thus has a binary tree depth of 0. When iterated binary tree partitioning reaches a binary tree depth of MaxBTDepth (i.e., 4), video encoder 20 and/or video decoder 30 do not perform any further splitting of the node. When the binary tree portion of the QTBT scheme produces a binary tree node having a width equal to MinBTSize (i.e., 4), video encoder 20 and/or video decoder 30 may not perform further horizontal splitting of the node. Similarly, when the binary tree portion of the QTBT scheme produces a binary tree node having a height equal to MinBTSize (i.e., 4), video encoder 20 and/or video decoder 30 may not perform further vertical splitting of the node. The leaf nodes of the binary tree portion of the QTBT scheme are the CUs, which are further processed by prediction and transform without any further partitioning.
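A non-normative sketch of these constraints, using the example parameter values above, might read as follows (parameter and split-type names are illustrative, not from the JEM software):

```python
def allowed_splits(width, height, bt_depth, p):
    """Return the set of further splits permitted for a node.

    bt_depth is the current binary tree depth (0 while still in the
    quadtree phase); p holds MinQTSize, MaxBTSize, MaxBTDepth, MinBTSize.
    """
    splits = set()
    # Quadtree splitting: square nodes only, only before any binary split,
    # and only while the node is still larger than MinQTSize.
    if bt_depth == 0 and width == height and width > p["MinQTSize"]:
        splits.add("QT")
    # Binary splitting: the node must not exceed MaxBTSize, the depth must
    # be below MaxBTDepth, and (per the convention above) a node with
    # width equal to MinBTSize is not split horizontally, while a node
    # with height equal to MinBTSize is not split vertically.
    if max(width, height) <= p["MaxBTSize"] and bt_depth < p["MaxBTDepth"]:
        if width > p["MinBTSize"]:
            splits.add("BT_HORIZONTAL")
        if height > p["MinBTSize"]:
            splits.add("BT_VERTICAL")
    return splits

# Example with the parameters above: a 128x128 quadtree leaf cannot be
# binary split, because 128 exceeds MaxBTSize (64).
params = {"MinQTSize": 16, "MaxBTSize": 64, "MaxBTDepth": 4, "MinBTSize": 4}
assert allowed_splits(128, 128, 0, params) == {"QT"}
```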
Fig. 10 is a conceptual diagram depicting aspects of the QTBT partitioning scheme. The block diagram on the left side of fig. 10 shows an example of partitioning a block 162 according to the QTBT partitioning structure. The quadtree partitioning aspect of the QTBT partitioning scheme is depicted using solid lines in block 162, while the binary tree partitioning aspect of the QTBT partitioning scheme is depicted using dashed lines in block 162. Block 162 is split into square leaf nodes where only the quadtree portion of the QTBT scheme is invoked, and into non-square rectangular leaf nodes wherever the binary tree portion of the QTBT scheme is invoked (whether or not in combination with quadtree splitting). In contrast to the partitioning technique of HEVC, in which multiple partition types are possible, the QTBT partitioning scheme provides a structure in which the PU size is always equal to the CU size.
The schematic diagram on the right side of fig. 10 depicts tree structure 164. Tree structure 164 is the tree structure corresponding to the partitioning depicted with respect to block 162 in fig. 10. As with block 162, in tree structure 164 the solid lines indicate quadtree splitting and the dashed lines indicate binary tree splitting. For each splitting (i.e., non-leaf) node of the binary tree portion, depicted by dashed lines in tree structure 164, video encoder 20 may signal a respective one-bit flag to indicate which splitting type (i.e., horizontal or vertical) is used. According to some implementations of QTBT partitioning, video encoder 20 may set the flag to a value of zero (0) to indicate horizontal splitting, and to a value of one (1) to indicate vertical splitting. It should be appreciated that, for the quadtree splitting portion of the QTBT partitioning structure, there is no need to indicate the splitting type, because quadtree splitting always splits a block both horizontally and vertically into 4 sub-blocks of equal size.
Fig. 11A and 11B show examples of independent partitioning structures for corresponding luma and chroma blocks according to the QTBT partitioning scheme. The QTBT block partitioning technique permits and supports corresponding luma and chroma blocks having independent QTBT-based partitioning structures. According to the QTBT partitioning scheme, for P-slices and B-slices, corresponding luma CTUs and chroma CTUs in one CTU share the same QTBT-based partitioning structure. However, for I-slices, a luma CTU may be partitioned into CUs according to a first QTBT-based partitioning structure, and a chroma CTU is partitioned into chroma CUs according to a second QTBT-based partitioning structure, which may or may not be different from the first QTBT-based partitioning structure. Thus, a CU in an I-slice may consist of a coding block of the luma component or coding blocks of the two chroma components, while a CU in a P-slice or B-slice may consist of coding blocks of all three color components.
The independent tree structure supported by QTBT for I-slices includes aspects related to chroma coding. For example, JEM allows six (6) chroma modes per PU. Use of the DM mode indicates that video encoder 20 and/or video decoder 30 use the same prediction mode for a chroma PU as for the corresponding luma PU. As described above, for an I-slice, the QTBT-based partitioning structures for a luma block and the corresponding chroma block may be different. Thus, when the DM mode is used in an I-slice, video encoder 20 and/or video decoder 30 may inherit the luma prediction mode of the PU covering the top-left position to perform prediction for the chroma PU. In contrast to the partitioning technique of HEVC, in which a luma block and its corresponding chroma blocks always share the same tree structure, the JEM3.0-based QTBT partitioning permits possible differences between the luma tree structure and the chroma tree structure, as shown in fig. 11A and 11B.
Fig. 11A and 11B show an example of a QTBT partitioning structure for one CTU in an I-slice. Fig. 11A shows a luma block 172, in which the left partition 174 is called out. Fig. 11B shows the corresponding chroma block 176, in which the left partition 178 is called out. The respective left partitions 174 and 178 include more finely divided partitions, as shown in fig. 11A and 11B. L(i) (where "i" represents the respective integer value depicted within the respective partition) indicates that the luma intra prediction mode for the corresponding partition has an index equal to i. In the example shown in fig. 11A and 11B, video encoder 20 and/or video decoder 30 may encode/decode the left partition 178 of chroma block 176 according to the DM mode. Accordingly, video encoder 20 and/or video decoder 30 may inherit the prediction mode of the top-left corresponding luma block partition to predict left partition 178 of chroma block 176. In the use case depicted in fig. 11A and 11B, video encoder 20 and/or video decoder 30 may select the intra-prediction mode with index equal to 1 to encode/decode left partition 178 of chroma block 176, because "i" has a value of 1 in the top-left partition of luma block 172.
Table 7 above specifies a mode arrangement that video encoder 20 may use to signal the chroma modes. To remove the possible redundancy in chroma mode signaling that may occur when the Derived Mode (DM) refers to one of the always-present modes, video encoder 20 may use an angular mode (mode 66, when there are a total of 67 intra modes) instead of the repeated mode, as shown in table 7.1 below. In the use case scenario depicted in table 7.1 below, this angular mode (denoted INTRA_ANGULAR66) is referred to as the "alternative mode".
Table 7.1 - Specification of chroma intra prediction modes and associated names
As discussed above, video encoder 20 and video decoder 30 may perform entropy coding of chroma prediction modes. In chroma-mode coding, the 1-b syntax element (0) is assigned to the most frequently occurring derived mode, two bins (10) are assigned to the LM mode, and the 4-b syntax elements (1100, 1101, 1110, 1111) are assigned to the remaining four modes. The first two bins are coded with one context model and the remaining two bins (if needed) may be bypass coded.
Table 7.2 - Binary string for each chroma mode
The techniques of this disclosure are directed to improving the performance of the various techniques discussed above. As described above, JEM3.0 supports separate tree structures for chroma block partitioning and luma block partitioning for the same CTU. However, one chrominance PU may correspond to a plurality of luminance PUs. Inheriting only one of the luma intra prediction modes from multiple luma PUs used for chroma coding according to the QTBT partitioning aspect of JEM3.0 may provide sub-optimal results, which may be improved or possibly optimized by the various techniques of this disclosure. Additionally, for a given PU in JEM, the total number of possible chroma modes is six (6). However, for luma coding, the total number of possible modes is sixty-seven (67). The various techniques of this disclosure may improve coding efficiency by increasing the total number of chroma modes.
Various techniques of the present invention are set forth below in a detailed list. It should be understood that video encoder 20 and/or video decoder 30 may apply the various techniques discussed below, individually or in various combinations of two or more of the described techniques. Although described as being performed by video encoder 20 and/or video decoder 30, it should be understood that one or more components of video encoder 20 shown in fig. 2 and/or one or more components of video decoder 30 shown in fig. 3 may perform the various techniques of this disclosure.
The following description refers to the size of one chroma block as W × H (where "W" is the width of the chroma block and "H" is the height of the chroma block). The position of the top left pixel in the chroma block relative to the entire slice is represented by the tuple (x, y), where "x" and "y" are the horizontal and vertical offsets, respectively. The luminance block corresponding to a given chroma block has a size equal to 2W x 2H (for a 4:2:0 color format) or W x H (for a 4:4:4 color format). The position of the top left pixel in the corresponding luminance block relative to the entire slice is represented by the tuple (2x,2y) (for 4:2:0) or (x, y) (for 4:4: 4). The examples given below are described with respect to a 4:2:0 color format. It should be appreciated that the techniques described herein may also be extended to other color formats.
According to certain aspects of this disclosure, multiple DM modes may be added with respect to chroma coding, thereby increasing the number of available chroma encoding and decoding modes (from luma blocks) that are available for use by video encoder 20 and video decoder 30. That is, according to these aspects of this disclosure, video encoder 20 and video decoder 30 may have more DM options than a single option to inherit the coding mode for the corresponding luma block. For example, in accordance with the techniques of this disclosure, video encoder 20 and/or video decoder 30 may generate a candidate list containing DM intra-prediction modes for chroma blocks based on the intra-prediction mode used in the corresponding luma block. Although coding and bandwidth efficiency is preserved by maintaining the same total number of possible chroma modes in the DM candidate list, the techniques of this disclosure for applying multiple DMs provide potential precision enhancements because the DMs provide better accuracy than the default modes used in the prior art.
In this example, video encoder 20 may signal chroma modes as currently set forth in JEM 3.0. However, if video encoder 20 selects the DM mode for chroma coding of a chroma block, video encoder 20 may implement additional signaling. More specifically, according to this example, video encoder 20 may encode and signal a flag indicating that the DM mode is selected for encoding of the chroma block. Based on the chroma block having been encoded in DM mode, video encoder 20 may encode and signal an index value to indicate which mode of the candidate list is used as the DM mode. Based on the size of the candidate list, video encoder 20 may encode and signal an index value between zero (0) and five (5). That is, video encoder 20 may generate a candidate list for chroma prediction mode that includes a total of six (6) candidates, i.e., resulting in a candidate list size of six (6).
Based on receiving the flag set to the value indicating that the encoded chroma block is encoded using the DM mode, video decoder 30 may determine that the decoding mode for the chroma block is included in the candidate list. Subsequently, video decoder 30 may receive and decode an index that identifies an entry in the chroma mode candidate list. Based on the flag indicating that the encoded chroma block is encoded using the DM mode, and using the received index value for the encoded chroma block, video decoder 30 may select a particular mode from a chroma mode candidate list for decoding the chroma block. In this way, in examples where the DM mode is selected for coding chroma blocks, video encoder 20 and video decoder 30 may increase the number of candidate modes available for encoding and decoding chroma blocks. Based on the size of the candidate list, video decoder 30 may decode index values between zero (0) and five (5). That is, video decoder 30 may generate a candidate list for chroma prediction mode that includes a total of six (6) candidates, i.e., resulting in a candidate list size of six (6).
In some examples, video encoder 20 may first encode and signal a flag indicating whether the chroma block is encoded in Linear Model (LM) mode. In these examples, video encoder 20 may follow the data indicating all DM candidates in the candidate list with a signaling flag (to indicate whether the chroma block is LM encoded). According to this implementation, video decoder 30 may receive an encoded flag in the encoded video bitstream that indicates whether a chroma block is encoded in LM mode. Video decoder 30 may parse data indicating all DM candidates in the candidate list from a location beginning after the LM flag in the encoded video bitstream. It will thus be appreciated that, according to various examples of this disclosure, video decoder 30 may construct the DM candidate list, or alternatively, the entire DM candidate list may be received in the encoded video bitstream. In either scenario, video decoder 30 may select an appropriate DM mode from the candidate list using the signaling index.
Video encoder 20 may also perform pruning of the DM candidate list. That is, video encoder 20 may determine whether two of the DMs included in the list are the same. If video encoder 20 determines that multiple instances of a single DM (i.e., multiple identical DMs) are included in the candidate list, video encoder 20 may remove the redundancy by removing all but one instance of the same DM. That is, video encoder 20 may prune the list such that only one instance of this same DM remains in the candidate list.

In some examples of the DM candidate list-based techniques of this disclosure, video encoder 20 may prune the DM candidates in the candidate list against one or more default modes. According to the pruning techniques of this disclosure, if video encoder 20 determines that one of the default modes (e.g., the K-th mode in the default mode list) is the same as one of the DM modes in the DM candidate list, video encoder 20 may replace this DM mode in the candidate list with an alternative mode. When replacing the pruned DM mode in the candidate list, video encoder 20 may set the alternative mode to the mode having an index equal to ((maximum intra mode index) - 1 - K). In some implementations in which video encoder 20 signals data indicating all DM modes included in the candidate list, video encoder 20 may signal data reflecting the pruned DM candidate list.

In some examples in which video decoder 30 also performs DM candidate list construction, video decoder 30 may likewise perform pruning to complete the DM candidate list. For example, if video decoder 30 determines that one of the default modes (e.g., the K-th mode in the default mode list) is the same as one of the DM modes in the DM candidate list, video decoder 30 may replace this DM mode in the candidate list with an alternative mode. When replacing the pruned DM mode in the candidate list, video decoder 30 may set the alternative mode to the mode having an index equal to ((maximum intra mode index) - 1 - K), as sketched below.
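A non-normative sketch of the two pruning steps just described (duplicate removal, then replacement against the default modes) might look like the following; all names are illustrative:

```python
def build_pruned_dm_list(dm_candidates, default_modes, max_mode_index):
    """Prune a DM candidate list as described above."""
    # Step 1: remove repeated instances of the same DM, keeping the first.
    seen, pruned = set(), []
    for dm in dm_candidates:
        if dm not in seen:
            seen.add(dm)
            pruned.append(dm)
    # Step 2: if the K-th default mode equals a DM candidate, replace that
    # candidate with the mode of index ((maximum intra mode index) - 1 - K).
    for k, mode in enumerate(default_modes):
        for i, dm in enumerate(pruned):
            if dm == mode:
                pruned[i] = max_mode_index - 1 - k
    return pruned
```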
By implementing one or more of the DM candidate list-based techniques described above, video encoder 20 and video decoder 30 may increase the number of possible chroma prediction modes. The increased number of chroma modes available via the DM candidate list-based techniques described above may improve coding efficiency while maintaining precision. As described above, in various examples, video decoder 30 may receive the entire DM candidate list via the encoded video bitstream, or alternatively may construct the DM candidate list and select the prediction mode for the chroma block from the DM candidate list using the signaled index. Because video decoder 30 may receive the explicitly signaled DM candidate list, or alternatively construct the DM candidate list, various DM candidate list-based techniques are described herein as being performed by video encoder 20 and optionally video decoder 30.
In some examples, video encoder 20 may fix the size of the DM candidate list (i.e., the total number of candidates included in the DM candidate list) within a particular range, such as within a pattern block, within a slice, within a picture, or within a sequence. In some such examples, if video decoder 30 is configured to construct a DM candidate list and use the signaled index to select candidates, video decoder 30 may also fix the size of the DM candidate list (i.e., the total number of candidates included in the DM candidate list) within a particular range, such as within a pattern block, within a slice, within a picture, or within a sequence.
In some examples, video encoder 20 may signal the size of the candidate list in a data structure containing metadata, which may be signaled out-of-band with respect to corresponding encoded video data. As some non-limiting examples, video encoder 20 may signal the size of the candidate list in any of a slice header, a Picture Parameter Set (PPS), or a Sequence Parameter Set (SPS). According to some examples, video encoder 20 (and optionally video decoder 30) may be configured to predefine the size of the candidate list such that the size of the candidate list is the same for all block sizes. Alternatively, video encoder 20 (and optionally video decoder 30) may be configured to predefine the size of the candidate list such that the size of the candidate list varies depending on the size of the block.
According to some examples, video encoder 20 (and optionally video decoder 30) may construct the DM candidate list to include (e.g., contain) at most three portions. In these examples, the three parts of the DM candidate list include each of: (i) a first portion including candidates for luma intra prediction modes associated with particular locations relative to corresponding luma blocks; (ii) a second portion that includes candidates derived from a function of all luma blocks within the corresponding luma block, e.g., the most commonly used luma intra prediction modes as described in one example above; and (iii) a third portion comprising candidates derived from the selected luma intra prediction mode with a particular offset of the mode index.
In one example, video encoder 20 (and optionally video decoder 30) may insert the candidates from the first two portions into the DM candidate list, in order, until the total number of candidates is equal to the predefined list size (i.e., the predefined total number of DM modes). After performing the pruning process with respect to the modes included in the DM candidate list, video encoder 20 (and optionally video decoder 30) may insert candidates from the third portion of the list if the size of the candidate list is still less than the predefined total number of DM modes. In one such example, video encoder 20 (and optionally video decoder 30) may insert the candidates from the three portions (or two portions, depending on the result of the pruning) into the candidate list in the order of the first portion, followed by the second portion, followed by the third portion. In another alternative example, video encoder 20 (and optionally video decoder 30) may insert the candidates from the second portion before the candidates from the first portion. In yet another alternative example, video encoder 20 (and optionally video decoder 30) may insert candidates from the second portion among the candidates from the first portion (e.g., by interleaving or alternating candidates of the first and second portions).
According to some examples, the candidates of the first portion of the DM candidate list are modes used for coding the corresponding luma block that are inherited from particular positions. For example, the first portion of the candidate list may include modes inherited from the following positions in the corresponding luma block: the center position, the top-left position, the top-right position, the bottom-left position, and the bottom-right position. That is, in this example, the first portion of the candidate list may include modes inherited from the center and the four corners of the corresponding luma block, as sketched below. In one such example, video encoder 20 (and optionally video decoder 30) may insert the modes inherited from these positions of the corresponding luma block into the DM candidate list in the following order: center, top-left, top-right, bottom-left, and bottom-right. In another such example, video encoder 20 (and optionally video decoder 30) may insert the modes into the DM candidate list in the following order: center, top-left, bottom-right, bottom-left, and top-right. In other examples, the order may vary, and it should be understood that the orders described above are non-limiting examples.
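For illustration, gathering the first-portion candidates might be sketched as follows; the helper luma_mode_at is a hypothetical accessor returning the luma intra mode covering a sample position, and all names are illustrative:

```python
def first_portion_dm_candidates(luma_mode_at, x2, y2, w2, h2):
    """Collect first-portion DM candidates from the five positions above.

    (x2, y2) : top-left position of the corresponding luma block
               (e.g., (2x, 2y) for the 4:2:0 format, per the notation above)
    (w2, h2) : luma block dimensions (e.g., 2W x 2H for 4:2:0)
    """
    positions = [
        (x2 + w2 // 2, y2 + h2 // 2),   # center (one possible convention)
        (x2,           y2),             # top-left
        (x2 + w2 - 1,  y2),             # top-right
        (x2,           y2 + h2 - 1),    # bottom-left
        (x2 + w2 - 1,  y2 + h2 - 1),    # bottom-right
    ]
    return [luma_mode_at(px, py) for px, py in positions]
```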
In one example, video encoder 20 (and optionally video decoder 30) may form the first portion of the DM candidate list to include the intra-prediction modes of all positions of the corresponding luma block. In this example, the second portion may become unnecessary, because the first portion includes all intra-prediction modes of the corresponding luma block. In addition, video encoder 20 (and optionally video decoder 30) may traverse all units within the corresponding luma block in some order. Alternatively or additionally, video encoder 20 (and optionally video decoder 30) may add additional modes to the DM candidate list in descending order of their number of occurrences within the corresponding luma block.
In one example, video encoder 20 (and optionally video decoder 30) may apply an offset to one or more of the candidates already inserted into the list in order to form the third portion. In addition, video encoder 20 (and optionally video decoder 30) may further perform pruning of the inserted candidates when forming the third portion. In an alternative example, video encoder 20 (and optionally video decoder 30) may form the third portion to include one or more intra chroma modes from neighboring blocks.

In accordance with some implementations of the techniques described herein, video encoder 20 (and optionally video decoder 30) may adaptively change the size of the candidate list from CU to CU, from PU to PU, or from TU to TU. In one example, video encoder 20 (and optionally video decoder 30) may only add candidates from the first portion, as described with respect to the three-portion DM candidate list formation implementation. Alternatively, video encoder 20 (and optionally video decoder 30) may only add candidates from the first portion and the second portion to the DM candidate list. In some examples, video encoder 20 (and optionally video decoder 30) may perform pruning to remove identical intra-prediction modes.
In the example where video encoder 20 prunes the DM candidate list, video encoder 20 may not signal the DM index if the number of candidates in the final pruned DM candidate list is equal to 1. In some examples, video encoder 20 (and optionally video decoder 30) may binarize the DM index values within the DM candidate list using truncated unary binarization. Alternatively, video encoder 20 (and optionally video decoder 30) may binarize the DM index values within the DM candidate list using unary binarization.
In some examples, video encoder 20 (and optionally video decoder 30) may set the context model index equal to the bin index. Alternatively, the total number of context models used to code the DM index values may be less than the maximum number of candidates. In this case, video encoder 20 may set the context model index equal to min(K, bin index), where K represents a positive integer. Alternatively, video encoder 20 may encode only the first few bins with context models and may encode the remaining bins in bypass mode. In this example, video decoder 30 may decode only the first few bins with context models and may decode the remaining bins in bypass mode.
Alternatively, video encoder 20 (and optionally video decoder 30) may determine the number of context-coded bins depending on the total number of DM candidates or one or more of the CU, PU, or TU sizes. Alternatively, for the first M bins (e.g., M equals 1), the context modeling may further depend on the total number of DM candidates in the final (e.g., reduced) DM candidate list or the CU/PU/TU size or split information of the corresponding luma block.
In some examples, video encoder 20 (and optionally video decoder 30) may further reorder the candidates in the candidate list prior to binarization, as sketched below. In one example, when the width of a CU/PU/TU is greater than the height of the CU/PU/TU, the reordering may be based on the intra prediction mode index difference between a candidate's actual intra mode and the horizontal intra prediction mode. The smaller the difference, the smaller the index assigned to the candidate in the DM candidate list. In another example, when the height of a CU/PU/TU is greater than the width of the CU/PU/TU, the reordering may be based on the intra prediction mode index difference between a candidate's actual intra mode and the vertical intra prediction mode. Also in this example, the smaller the difference, the smaller the index assigned to the candidate in the DM candidate list.
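A minimal sketch of this shape-dependent reordering follows, assuming the 67-mode numbering in which the horizontal and vertical modes have indices 18 and 50, respectively, and assuming angular candidates (names are illustrative):

```python
HOR_IDX, VER_IDX = 18, 50   # horizontal / vertical mode indices (67 modes)

def reorder_dm_candidates(candidates, width, height):
    """Smaller index difference to the preferred direction results in a
    smaller position in the reordered DM candidate list."""
    if width > height:
        return sorted(candidates, key=lambda m: abs(m - HOR_IDX))
    if height > width:
        return sorted(candidates, key=lambda m: abs(m - VER_IDX))
    return list(candidates)   # square blocks: keep the original order
```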
Alternatively, or in addition, video encoder 20 (and optionally video decoder 30) may perform pruning of all DM candidates in the list against the default modes. If one of the default modes, e.g., the K-th mode in the default mode list, is the same as one of the DM modes in the DM candidate list, video encoder 20 (and optionally video decoder 30) may replace this DM mode in the DM candidate list with an alternative mode. When replacing the pruned DM mode in the candidate list, video encoder 20 (and optionally video decoder 30) may set the alternative mode to the mode having an index equal to ((maximum intra mode index) - 1 - K).
According to some techniques of this disclosure, video encoder 20 and video decoder 30 may unify the luma and chroma intra prediction modes. That is, for each chroma block, video encoder 20 and/or video decoder 30 may select a prediction mode from the pool of available luma prediction modes, in addition to the Linear Model (LM) mode and other modes specific to coding chroma components. The pool of available luma prediction modes is described herein as including a total of "N" prediction modes, where "N" represents a positive integer value. In some examples, the value of "N" is equal to sixty-seven (67), corresponding to 67 different available luma prediction modes.
In addition, with respect to encoding and signaling of chroma intra prediction modes, video encoder 20 may also signal a Most Probable Mode (MPM) flag, and may signal an MPM index (an index corresponding to an MPM candidate in an MPM candidate list) depending on the value of the MPM flag. For example, video encoder 20 may construct the MPM candidate list by first adding one or more DM modes for the chroma block to the MPM candidate list. As described above, video encoder 20 may identify a plurality of DM modes for the chroma block. It should be appreciated, however, that in some scenarios, video encoder 20 may identify a single DM mode for the chroma block. After adding the DM mode(s) to the MPM candidate list, video encoder 20 may add other chroma modes from neighboring blocks to the MPM candidate list. Alternatively or additionally, video encoder 20 may add a default mode, such as by using the luma MPM candidate list construction process described in "Neighbor based intra most probable modes list derivation" of V. Seregin et al. (JVET-C0055, Geneva, May 2016) (hereinafter, "Seregin").
Alternatively, video encoder 20 may construct the chroma MPM candidate list in the same manner as for the luma mode MPM candidate list. For example, video encoder 20 may check a number of neighboring blocks in the order described in Seregin. In these implementations, video encoder 20 may process LM mode and/or other chroma specific intra prediction modes in the same manner that video encoder 20 processes other intra prediction modes. Furthermore, video encoder 20 may prune the MPM candidate list to remove redundancy resulting from the addition of the same intra-prediction mode from multiple sources.
In one example, video encoder 20 may first signal a flag to indicate the use of one or more chroma-specific modes that apply only to chroma components (e.g., LM mode and/or other prediction modes used only to code chroma components). Video encoder 20 may further signal an MPM flag if the selected prediction mode is not a chroma-specific mode (i.e., video encoder 20 sets the above flag to a disabled state). In this example implementation, video encoder 20 may not consider a chroma-specific mode (e.g., LM mode) when adding chroma prediction modes inherited from neighboring blocks to the MPM list, even in cases where the chroma-specific mode is the mode taken from the neighboring blocks.
Example use cases for this implementation are described below. Video encoder 20 may use the LM mode to intra-predict the chroma block and may therefore signal the LM flag set to the enabled state. Based on the chroma block having been encoded using the LM prediction mode, video encoder 20 may signal an MPM index that indicates a position within an MPM candidate list for the chroma block. This example use case illustrates that video encoder 20 may use a one-bit flag to first provide video decoder 30 with an indication of whether the prediction mode for the chroma block is a candidate in the MPM candidate list at all. If and only if the prediction mode for the chroma block is a candidate from the MPM candidate list, video encoder 20 may signal an index to indicate to video decoder 30 which mode of the MPM candidate list is to be used to predict the chroma block. In this way, video encoder 20 may save bandwidth by first using a one-bit flag, and then determining, based on the value of the flag, whether to signal the index value at all.
The decoder-side aspects of the above techniques are discussed below. Video decoder 30 may receive the MPM flag in the encoded video bitstream. Video decoder 30 may also receive, if the value of the MPM flag is set to the enabled state, an MPM index for the relevant chroma block that corresponds to the index of a particular MPM candidate in the MPM candidate list. For example, video decoder 30 may construct the MPM candidate list by first adding one or more DM modes for the chroma block to the MPM candidate list. As described above, video decoder 30 may identify a plurality of DM modes for reconstruction of the chroma block. It should be appreciated, however, that in some scenarios, video decoder 30 may identify a single DM mode for the chroma block. After adding the DM mode(s) to the MPM candidate list, video decoder 30 may add other chroma modes from neighboring blocks to the MPM candidate list. Alternatively or additionally, video decoder 30 may add a default mode, such as by using the luma MPM candidate list construction process described in Seregin.
Alternatively, video decoder 30 may construct the chroma MPM candidate list in the same manner as for the luma mode MPM candidate list. For example, video decoder 30 may check a number of neighboring blocks in the order described in Seregin. In these implementations, video decoder 30 may process LM mode and/or other chroma specific intra prediction modes in the same manner that video decoder 30 processes other intra prediction modes. Furthermore, video decoder 30 may prune the MPM candidate list to remove redundancy resulting from the addition of the same intra-prediction mode from multiple sources.
In one example, video encoder 20 may first signal a flag to indicate the use of one or more chroma-specific modes that apply only to chroma components (e.g., LM mode and/or other prediction modes used only to code chroma components). Video decoder 30 may further receive an MPM flag if the selected prediction mode is not a chroma-specific mode (i.e., if video decoder 30 determines that the flag is set to a disabled state). In this example implementation, video decoder 30 may not consider a chroma-specific mode (e.g., LM mode) when adding chroma prediction modes inherited from neighboring blocks to the MPM list, even in cases where the chroma-specific mode is the mode taken from the neighboring blocks.
Example use cases for this implementation are described below. Video decoder 30 may receive the LM flag set to the enabled state and may thus reconstruct the chroma block using LM mode intra prediction. Based on the chroma block having been encoded using the LM prediction mode, video decoder 30 may receive an MPM index that indicates a position within an MPM candidate list for the chroma block. This example use case illustrates that video decoder 30 may use a one-bit flag to first determine whether the prediction mode for the chroma block is a candidate in the MPM candidate list at all. If the prediction mode is not a candidate from the MPM candidate list, video decoder 30 avoids the need for video encoder 20 to signal an index indicating which mode of the MPM candidate list is used to predict the chroma block. In this way, video decoder 30 may save bandwidth by reducing the number of instances in which video encoder 20 needs to signal an index value, which is more bandwidth-intensive than signaling a one-bit flag.
In some examples, video encoder 20 and/or video decoder 30 may add other chroma-specific intra-prediction modes, in addition to the LM mode, to the MPM list, and add the remaining intra-prediction modes as default modes for the list. Alternatively, video encoder 20 may first signal the MPM flag, and video encoder 20 and/or video decoder 30 may always consider the chroma prediction modes of the neighboring blocks when constructing the MPM list, regardless of whether the neighboring blocks are predicted using the LM mode. In another example, if the LM mode is not added to the MPM list, video encoder 20 and/or video decoder 30 may add the LM mode as the first default mode. In another example, video encoder 20 and/or video decoder 30 may use only the LM mode and modes from the MPM candidate list, and may remove the default modes altogether. In some examples, video encoder 20 (and optionally video decoder 30) may add an existing default mode only when the total number of added default modes is less than a predetermined integer value represented by "K". In one such example, K is set to a value of four (4).
In some examples, when only one DM is allowed, instead of taking the luma intra prediction mode from the top-left corner of the corresponding luma block, video encoder 20 and/or video decoder 30 may use one or more of the following rules to select the luma intra prediction mode used as the DM mode. In one example of such a rule, the selected luma intra prediction mode is the most frequently used mode within the corresponding luma block. In one example, based on some scan order, video encoder 20 and/or video decoder 30 may traverse the intra-prediction mode of each unit within the corresponding luma block and record the number of occurrences of the existing luma prediction modes. Video encoder 20 and/or video decoder 30 may select the mode with the largest number of occurrences. That is, video encoder 20 and/or video decoder 30 may select the luma intra prediction mode that covers the largest size (i.e., area) of the corresponding luma block. When two prediction modes have the same number of occurrences in the corresponding luma block, video encoder 20 and/or video decoder 30 may select the prediction mode that is detected first in the scan order. Here, a unit is defined as the minimum PU/TU size for luma/chroma intra prediction. In some examples, the scan order may be a raster, zig-zag, or diagonal scan order, or a coding order.
Alternatively, video encoder 20 and/or video decoder 30 may start scanning from the center position of the luma block and traverse toward the boundary in some order. Alternatively or additionally, the scan/unit may depend on the PU/TU size. Alternatively, based on some scan order, video encoder 20 and/or video decoder 30 may traverse the intra-prediction mode of each PU/TU/CU within the corresponding luma block and record the number of occurrences of the existing luma prediction modes. Video encoder 20 and/or video decoder 30 may select the mode with the largest number of occurrences. When two modes have the same number of occurrences in the luma block, video encoder 20 and/or video decoder 30 may select the prediction mode that occurs first (i.e., is detected first) in the scan order. In some examples, the scan order may be a raster, zig-zag, or diagonal scan order, or a coding order. Alternatively, the scanning may depend on the PU/TU size.
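One way to realize the occurrence-counting rule is sketched below. Here, unitModes is assumed to hold the luma intra mode of each minimum-size unit of the corresponding luma block, listed in the chosen scan order, so that a tie is resolved in favor of the mode detected first:

    #include <algorithm>
    #include <vector>

    // Select the luma intra mode covering the largest area of the luma block;
    // non-empty input is assumed.
    int selectSingleDmMode(const std::vector<int>& unitModes, int numIntraModes) {
        std::vector<int> count(numIntraModes, 0);
        for (int mode : unitModes)
            count[mode]++;                      // occurrences per mode
        const int maxCount = *std::max_element(count.begin(), count.end());
        for (int mode : unitModes)              // ties: first mode in scan order
            if (count[mode] == maxCount)
                return mode;
        return 0;                               // unreachable for non-empty input
    }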
In another alternative, for the example described above with respect to a single allowed DM mode, if video encoder 20 and/or video decoder 30 determines that two or more modes have equal occurrences in the corresponding luma block, video encoder 20 and/or video decoder 30 may select one of the modes having equal occurrences in the luma block. The selection may depend on the mode index and/or PU/TU size of these multiple luma modes. Alternatively, for a particular block size, e.g., a block size greater than 32 x 32, video encoder 20 and/or video decoder 30 may only evaluate a portion (e.g., a partial subset) of the luma intra prediction modes of the corresponding luma block according to this single DM-based rule.
As another example of such a rule in the single-DM-mode context, video encoder 20 and/or video decoder 30 may select the luma intra prediction mode associated with the center position of the corresponding luma block. In one example, video encoder 20 and/or video decoder 30 may define the center position according to the coordinate tuple (2x + W - 1, 2y + H - 1) for the 4:2:0 color format. Alternatively, video encoder 20 and/or video decoder 30 may define the center position as follows (a code sketch consolidating these rules follows the list):
if W and H are both equal to 2, video encoder 20 and/or video decoder 30 may use position (2x,2y) as the center position.
Otherwise, if H is equal to 2, video encoder 20 and/or video decoder 30 may use position (2x + (2 × W/4/2-1) × 4,2y) as the center position.
Otherwise, if W is equal to 2, video encoder 20 and/or video decoder 30 may use position (2x,2y + (2 × H/4/2-1) × 4) as the center position.
Otherwise (e.g., neither H nor W equals 2), video encoder 20 and/or video decoder 30 may use position (2x + (2 × W/4/2-1) × 4, 2y + (2 × H/4/2-1) × 4) as the center position.
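The four cases above can be consolidated as in the following sketch, where (x, y) is the top-left position and W and H are the width and height of the chroma block, consistent with the 4:2:0 mapping used above; the integer divisions mirror the formulas as written:

    // Compute the luma center position used to fetch the single DM mode.
    void centerLumaPosition(int x, int y, int W, int H, int& cx, int& cy) {
        if (W == 2 && H == 2) {
            cx = 2 * x;
            cy = 2 * y;
        } else if (H == 2) {
            cx = 2 * x + (2 * W / 4 / 2 - 1) * 4;
            cy = 2 * y;
        } else if (W == 2) {
            cx = 2 * x;
            cy = 2 * y + (2 * H / 4 / 2 - 1) * 4;
        } else {  // neither W nor H equals 2
            cx = 2 * x + (2 * W / 4 / 2 - 1) * 4;
            cy = 2 * y + (2 * H / 4 / 2 - 1) * 4;
        }
    }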
In accordance with some examples of the techniques of this disclosure, instead of using the same default mode for all blocks, video encoder 20 and/or video decoder 30 may consider the mode derived from the corresponding luma block as the default mode. In one example, the total number of default modes is increased to include more modes derived from the corresponding luma block. In another example, only existing default modes are added when the total number of added default modes is less than K (in one non-limiting example, K is set to 4).
Fig. 12A and 12B illustrate adaptively ordered neighboring block selections for chroma prediction modes in accordance with one or more aspects of the present disclosure. According to some examples of the techniques of this disclosure, video encoder 20 and/or video decoder 30 may apply adaptive ordering of chroma modes such that the order may depend on chroma modes of neighboring blocks. In one example, video encoder 20 and/or video decoder 30 apply adaptive ordering only to particular modes, such as DM mode and/or LM mode. In another example, the neighboring blocks are five neighboring blocks, as depicted in fig. 12A. Alternatively, video encoder 20 and/or video decoder 30 may use only two neighboring blocks, e.g., a1 and B1 as shown in fig. 12A, or an upper block (a) and a left block (L) as shown in fig. 12B. In one example, when all available neighboring intra-coded blocks are coded with LM mode, video encoder 20 and/or video decoder 30 may put LM mode before DM mode. Alternatively, video encoder 20 and/or video decoder 30 may put LM mode before DM mode when at least one of the available neighboring intra-coded blocks is coded with LM mode.
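For the variant that moves the LM mode ahead of the DM mode only when every available intra-coded neighbor uses LM, the decision might be sketched as follows; the Neighbor structure and its fields are illustrative:

    #include <vector>

    struct Neighbor {
        bool available;
        bool intraCoded;
        bool usesLm;
    };

    // True when the LM mode should precede the DM mode in the chroma mode list.
    bool lmBeforeDm(const std::vector<Neighbor>& neighbors) {
        bool anyIntraNeighbor = false;
        for (const Neighbor& n : neighbors) {
            if (!n.available || !n.intraCoded)
                continue;
            anyIntraNeighbor = true;
            if (!n.usesLm)
                return false;    // an intra-coded neighbor that does not use LM
        }
        return anyIntraNeighbor; // all available intra-coded neighbors use LM
    }

The alternative in which at least one LM-coded neighbor suffices would simply return true upon the first neighbor with usesLm set.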
According to some examples of this disclosure, video encoder 20 and/or video decoder 30 may use the luma information to reorder the chroma syntax values prior to entropy coding. In one example, the NSST index of a luma block may be used to update the coding order of chroma NSST indices. In this case, video encoder 20 and/or video decoder 30 may first encode/decode a bin indicating whether the index of the chroma block is the same as the NSST index of the corresponding luma block. In another example, video encoder 20 and/or video decoder 30 may update the coding order of chroma adaptive multiple transform (AMT) indices using the AMT indices of luma blocks. In this case, video encoder 20 and/or video decoder 30 may first encode/decode a bin to indicate whether the index of the chroma block and the AMT index of the corresponding luma block are the same. Video encoder 20 and/or video decoder 30 may use a similar approach for any other syntax element that is applicable to both luma and chroma components but whose indices/modes may differ between the luma and chroma components.
According to some examples of this disclosure, video encoder 20 and/or video decoder 30 may derive multiple sets of LM parameters for one chroma block such that the derivation is based on the luma intra prediction modes of the corresponding luma block. In one example, video encoder 20 and/or video decoder 30 may derive up to K sets of parameters, where "K" represents an integer value. In one example, "K" is set to a value of two (2). In another example, video encoder 20 and/or video decoder 30 may classify neighboring luma/chroma samples into K sets based on the intra-prediction modes of the samples located in the corresponding luma block. Video encoder 20 and/or video decoder 30 may likewise classify the luma samples within the corresponding luma block into K sets based on the intra-prediction modes of the samples located in the corresponding luma block. In another example, when two intra-prediction modes are considered "far apart," e.g., where the absolute value of the mode index difference is greater than a threshold, video encoder 20 and/or video decoder 30 may consider the corresponding sub-block and neighboring samples as using different parameters.
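A sketch of the classification step for K = 2 parameter sets follows. The helper lumaModeOfColocatedUnit, the reference mode refMode, and the threshold are illustrative placeholders for the co-located luma mode lookup and the "far apart" test described above:

    #include <array>
    #include <cstdlib>
    #include <vector>

    int lumaModeOfColocatedUnit(int sampleIdx);  // hypothetical lookup (assumption)

    // Split neighboring sample indices into K = 2 sets; one (alpha, beta) LM
    // parameter pair is then derived per set (the derivation itself is omitted).
    std::array<std::vector<int>, 2> classifyNeighborSamples(int numSamples,
                                                            int refMode,
                                                            int threshold) {
        std::array<std::vector<int>, 2> sets;
        for (int s = 0; s < numSamples; ++s) {
            int mode = lumaModeOfColocatedUnit(s);
            int setId = (std::abs(mode - refMode) > threshold) ? 1 : 0;  // "far apart"
            sets[setId].push_back(s);
        }
        return sets;
    }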
According to some examples of this disclosure, video encoder 20 and/or video decoder 30 may use a composite DM mode for encoding/decoding the current chroma block. According to the composite DM mode of this disclosure, video encoder 20 may generate a prediction block using a weighted sum of prediction blocks generated from two or more identified intra-prediction modes. Video encoder 20 may identify the two or more intra-prediction modes as modes used for encoding the co-located luma block, for encoding neighboring chroma blocks, or for encoding neighboring blocks of the corresponding luma block. Video encoder 20 may then generate a prediction block for each of the identified intra-prediction modes, and may derive a weighted sum of the two or more generated prediction blocks as the prediction block for this composite DM mode.
In one example, the weights used to generate the prediction block for this composite DM mode depend on the size of the area of the corresponding luma block to which each identified intra-prediction mode is applied. Alternatively, the weight of the prediction block for each identified intra-prediction mode may depend on the location of the current pixel and on whether the currently identified intra-prediction mode covers the current pixel. In another alternative, the weights are the same for each identified intra-prediction mode. In another alternative, video encoder 20 and/or video decoder 30 may utilize a set of predefined weights. In yet another alternative, or in addition, video encoder 20 may signal an index for the weights for each CTU/CU/PU/TU. When signaling the default mode (a non-DM mode and non-LM mode as shown in Table 7.1), if the default mode has been identified for generating the composite DM mode, video encoder 20 may replace the default mode with another intra-prediction mode that is not identified for generating the composite DM mode.
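For the area-based weighting variant, the per-sample weighted combination could be sketched as follows; predBlocks[m] is assumed to hold the prediction samples produced by the m-th identified mode and area[m] the luma area covered by that mode:

    #include <vector>

    // Composite DM prediction: per-sample weighted average of the prediction
    // blocks of the identified intra modes, weighted by covered luma area.
    void compositeDmPrediction(const std::vector<std::vector<int>>& predBlocks,
                               const std::vector<int>& area,
                               std::vector<int>& outPred) {
        int totalArea = 0;
        for (int a : area)
            totalArea += a;
        for (size_t i = 0; i < outPred.size(); ++i) {
            long long acc = 0;
            for (size_t m = 0; m < predBlocks.size(); ++m)
                acc += static_cast<long long>(predBlocks[m][i]) * area[m];
            outPred[i] = static_cast<int>((acc + totalArea / 2) / totalArea);  // rounded
        }
    }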
Fig. 13A and 13B are conceptual diagrams showing examples of block positions that video encoder 20 and video decoder 30 may use to select chroma intra prediction modes according to the techniques described above based on multiple DM mode selections. One example implementation regarding a selection based on multiple DM modes for chroma coding is described below. As described above, video encoder 20 (and optionally video decoder 30) may perform the selection of the DM mode, according to aspects of this disclosure. That is, in some examples, video encoder 20 may explicitly signal the DM candidate list, thereby eliminating the need for video decoder 30 to also form the DM candidate list. In other examples, video encoder 20 may signal only the index of the selected candidate from the DM candidate list, enabling video decoder 30 to select a candidate from the DM candidate list that video decoder 30 also forms.
Fig. 13A shows the prediction modes used in the sub-blocks of the luma component (luma block 202). Fig. 13B depicts luma mode inheritance for chroma block 204 in accordance with HEVC techniques. As shown, according to the HEVC technique, the prediction mode from the upper-left sub-block of luma block 202 (i.e., mode L(1)) is inherited with respect to chroma block 204. As shown in fig. 13A, the luma modes for the sub-blocks located at the center (C0), top-left (TL), top-right (TR), bottom-left (BL), and bottom-right (BR) positions are obtained (e.g., by video encoder 20, and optionally video decoder 30). These modes are denoted DMC, DMTL, DMTR, DMBL, and DMBR, respectively. In some alternatives, video encoder 20 (and optionally video decoder 30) may replace the C0 selection with a selection of the mode used at positions C1 and/or C2 and/or C3. In addition, video encoder 20 (and optionally video decoder 30) may add the luma mode covering the largest area of luma block 202 as an additional DM mode to the DM candidate list. The luma mode covering the largest area of luma block 202 is denoted "DMM".
Video encoder 20 (and optionally video decoder 30) may construct the DM candidate list using one or more techniques discussed below. A number of candidates (represented by "N") from the candidate group including DMC, DMM, DMTL, DMTR, DMBL, and DMBR may be added to the DM candidate list according to a predetermined order. In one example, "N" is set to six (6) and the order may be as follows: DMC, DMM, DMTL, DMTR, DMBL, DMBR. In one alternative, "N" is set to five (5) and the order may be as follows: DMC, DMTL, DMTR, DMBL, DMBR. In forming the candidate list, video encoder 20 (and optionally video decoder 30) may prune each candidate relative to all, or to a partial subset (e.g., a proper subset), of the previously added candidates before adding each such candidate to the DM candidate list. While two example orders are discussed above, it should be appreciated that video encoder 20 (and optionally video decoder 30) may also use various other orders in accordance with aspects of this disclosure. Assuming that the total number of DM modes in the candidate list is "M" (where "M" is a positive integer) and the total number of default modes is represented by "F", a particular candidate of the DM candidate list is represented by DM_i, where the subscript "i" denotes an integer value ranging from 0 to M - 1.
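A sketch of the list construction with pruning against all previously added candidates follows; orderedModes is assumed to hold the N candidate modes in one of the predetermined orders above (e.g., DMC, DMM, DMTL, DMTR, DMBL, DMBR):

    #include <vector>

    // Build the DM candidate list, skipping modes equal to any earlier entry.
    std::vector<int> buildDmCandidateList(const std::vector<int>& orderedModes) {
        std::vector<int> dmList;
        for (int mode : orderedModes) {
            bool duplicate = false;
            for (int existing : dmList)
                if (existing == mode) { duplicate = true; break; }
            if (!duplicate)
                dmList.push_back(mode);
        }
        return dmList;  // contains M unique DM modes
    }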
Video encoder 20 (and optionally video decoder 30) may apply pruning between the DM candidates and the default modes. That is, in forming the DM candidate list, video encoder 20 (and optionally video decoder 30) may prune the DM candidates against the default modes. In one alternative, for each DM_i, video encoder 20 (and optionally video decoder 30) may compare DM_i against each of the default modes. If any default mode is found to be the same as DM_i, video encoder 20 (and optionally video decoder 30) may replace the first such default mode (i.e., the first default mode found to be the same as DM_i) with an alternate mode. For example, video encoder 20 (and optionally video decoder 30) may replace the pruned default mode with the mode having an index value equal to (K - 1 - i), where "K" is the total number of luma prediction modes for the corresponding luma block. Example pseudo code for these operations is given below:
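(The following C-style sketch is one possible reading of these operations; DM, defaultMode, M, F, and K are illustrative names for the DM candidate list, the default mode list, their respective sizes, and the total number of luma prediction modes.)

    // For each DM candidate DM[i], replace the first default mode equal to it
    // with the mode (K - 1 - i).
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < F; ++j) {
            if (defaultMode[j] == DM[i]) {
                defaultMode[j] = K - 1 - i;
                break;  // only the first matching default mode is replaced
            }
        }
    }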
For example, the default modes may be: mode 0 (planar), mode 50 (vertical), mode 18 (horizontal), and mode 1 (DC), and the DM candidate list may be {mode 0, mode 63, mode 50, mode 1}. After the pruning process, the default modes are replaced by the set {mode 66, mode 64, mode 18, mode 63}. In another alternative, video encoder 20 (and optionally video decoder 30) may apply full pruning, in which each default mode is pruned against all DM modes. That is, each default mode is compared with all of the DM modes. If the comparison indicates that one of the DM modes is the same as the default mode currently under examination, the default mode is replaced by the last non-DM mode. Example pseudo code for this example is given below:
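(Again as an illustrative C-style sketch with the same naming assumptions as above; isDmMode is a hypothetical membership test over the DM candidate list.)

    // Full pruning: each default mode is compared against all DM modes; a
    // match is replaced by the last mode that is not in the DM candidate list.
    int lastNonDm = K - 1;                      // scan downward from the largest index
    for (int j = 0; j < F; ++j) {
        bool inDmList = false;
        for (int i = 0; i < M; ++i)
            if (defaultMode[j] == DM[i]) { inDmList = true; break; }
        if (inDmList) {
            while (isDmMode(lastNonDm, DM, M))  // skip modes present in the DM list
                --lastNonDm;
            defaultMode[j] = lastNonDm--;       // assign, then move past this mode
        }
    }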
video encoder 20 may implement various aspects of the multiple DM mode based techniques of this disclosure to implement signaling of chroma modes. Video encoder 20 may encode the chroma mode according to a process that includes the following portions. As one part, video encoder 20 may encode and signal a one-bit flag to indicate the use of any of the prediction modes applicable only to the chroma components (e.g., LM, which is chroma coding specific). If the chroma block is encoded according to this chroma particular mode (thereby causing video encoder 20 to set the flag to the enabled state), video encoder 20 may additionally encode and signal an index of the particular mode.
In addition, video encoder 20 may encode and signal a flag to indicate the use of the mode derived from the corresponding luma block. That is, if video encoder 20 selects a prediction mode for encoding a chroma block based on a prediction mode for a corresponding luma block, video encoder 20 may set the flag to an enabled state. Subsequently, if the chroma block is encoded using a prediction mode inherited from the corresponding luma block, video encoder 20 may additionally encode and signal an index of a mode selected from the corresponding luma block.
If video encoder 20 determines that the chroma block is encoded according to a prediction mode that is neither a chroma-specific prediction mode nor derived from the luma block, video encoder 20 may encode and signal information identifying the remaining mode. Video encoder 20 may implement the above-listed portions/options of chroma encoding according to different orders. Examples of different orders are given below in Table 7.3 with Table 7.4, or in Table 8.
Table 7.3 - Specification of chroma intra prediction modes and associated names
TABLE 7.4 binary string for each chroma mode
TABLE 8 binary string for each chroma mode
As described above, aspects of this disclosure relate to the unification of luma and chroma modes. Example implementations of unified luma and chroma modes are described below. The total allowed number of Most Probable Mode (MPM) candidates is hereinafter represented by N_mpm. Video encoder 20 and/or video decoder 30 may construct a mode list of chroma intra modes to include the following:
-an LM mode; and
-MPM mode.
The MPM mode portion may include a DM candidate list portion and a chroma mode portion. Video encoder 20 (and optionally video decoder 30) may form the DM candidate list portion of the unified candidate list using the same techniques as described above with respect to multiple DM modes. With respect to the chroma mode portion of the MPM modes, video encoder 20 (and optionally video decoder 30) may derive chroma modes from neighboring blocks of the currently coded chroma block. For example, to derive chroma modes from neighboring blocks, video encoder 20 (and optionally video decoder 30) may reuse the MPM construction process for luma modes. If the total number of MPM candidates is still less than N_mpm after performing the list construction process described above, video encoder 20 (and optionally video decoder 30) may perform various steps in accordance with JVET-C0055 referenced above.
For example, if the total number of MPM candidates is less than N_mpm after performing the list construction process set forth above, video encoder 20 (and optionally video decoder 30) may add the following modes: left (L), above (A), planar, DC, below-left (BL), above-right (AR), and above-left (AL) modes. If the MPM candidate list is still incomplete (i.e., if the total number of MPM candidates is less than N_mpm), video encoder 20 (and optionally video decoder 30) may add the -1 and +1 offsets of the included angular modes. If the MPM candidate list is still incomplete (i.e., the total number of MPM candidates is less than N_mpm), video encoder 20 (and optionally video decoder 30) may add the default modes, i.e., the vertical, horizontal, 2, and diagonal modes.
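The fill order just described can be sketched as follows. The mode constants assume the 67-mode scheme (planar = 0, DC = 1, horizontal = 18, vertical = 50, diagonal = 66), and wrapping of the -1/+1 offsets at the ends of the angular range is omitted for brevity:

    #include <vector>

    // Fill the chroma MPM list up to nMpm entries, skipping duplicates.
    void fillChromaMpm(std::vector<int>& mpm, int nMpm,
                       int modeL, int modeA, int modeBL, int modeAR, int modeAL) {
        auto addIfNew = [&](int mode) {
            if (static_cast<int>(mpm.size()) >= nMpm) return;
            for (int m : mpm)
                if (m == mode) return;
            mpm.push_back(mode);
        };
        addIfNew(modeL); addIfNew(modeA); addIfNew(0); addIfNew(1);  // L, A, planar, DC
        addIfNew(modeBL); addIfNew(modeAR); addIfNew(modeAL);        // BL, AR, AL
        std::vector<int> base = mpm;                                 // -1/+1 of angular modes
        for (int m : base)
            if (m >= 2) { addIfNew(m - 1); addIfNew(m + 1); }
        addIfNew(50); addIfNew(18); addIfNew(2); addIfNew(66);       // default modes
    }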
Non-MPM modes that video encoder 20 and/or video decoder 30 may recognize include any remaining intra-prediction modes not included by the MPM candidate list construction process described above. A difference from the luma-based MPM list construction process described above (e.g., in the portion referencing JVET-C0055) is that, when a candidate is added, the added candidate must not be the LM mode. Alternatively or additionally, the planar and DC modes may be added after all spatial neighbors. Alternatively, video encoder 20 and/or video decoder 30 may implement one or more other MPM list construction techniques in place of the technique of JVET-C0055.
With respect to the unification of luma and chroma modes, video encoder 20 may implement the various chroma mode signaling techniques of this disclosure. Video encoder 20 may encode the chroma mode according to a process that includes the following portions. As one portion, video encoder 20 may encode and signal a one-bit flag to indicate the use of any of the prediction modes applicable only to the chroma components (e.g., the LM mode, which is specific to chroma coding). If the chroma block is encoded according to such a chroma-specific mode (thereby causing video encoder 20 to set the flag to the enabled state), video encoder 20 may additionally encode and signal an index of the particular mode.
In addition, video encoder 20 may encode and signal a flag to indicate the use of the mode included in the MPM candidate list. That is, if video encoder 20 selects a prediction mode for encoding a chroma block and the selected prediction mode is included in the MPM candidate list, video encoder 20 may set the flag to the enabled state. Subsequently, if the chroma block is encoded using a prediction mode included in the MPM candidate list, video encoder 20 may additionally encode and signal an index for the mode that indicates the position of the mode in the MPM candidate list.
If video encoder 20 determines that the chroma block is encoded according to neither a chroma-specific prediction mode nor a prediction mode included in the MPM candidate list, video encoder 20 may encode and signal information identifying the remaining mode. Video encoder 20 may implement the above-listed portions/options of chroma encoding according to different orders. Examples of different orders are given in Table 8.1 or Table 9 below.
TABLE 8.1 binary string for each chroma mode
If the mode list of chroma intra modes includes only the LM portion and the MPM portion (including multiple DM modes and modes from spatial neighbors, as in the luma MPM), video encoder 20 may implement signaling of chroma modes in another modified manner, as shown in Table 9 below:
TABLE 9
In another alternative, video encoder 20 (and optionally video decoder 30) may always add a default mode (e.g., planar, DC, horizontal, vertical) to the MPM candidate list. In one example, the N_mpm candidates of the MPM candidate list may first be constructed with one or more of the techniques described above. Next, any missing default modes may replace the last one or more MPM candidates.
Fig. 14 is a flow diagram depicting an example process 220 executable by processing circuitry of video decoder 30 according to an aspect of the disclosure. Process 220 may begin when the processing circuitry of video decoder 30 determines that a plurality of derived modes (DMs) available for predicting a luma block of the video data are also available for predicting a chroma block of the video data, the chroma block corresponding to the luma block (222). Video decoder 30 may form a candidate list of prediction modes for the chroma block, the candidate list including one or more DMs of the plurality of DMs that may be used to predict the chroma block (224). In some non-limiting examples, the processing circuitry of video decoder 30 may receive, in the encoded video bitstream, data indicative of each respective DM of the one or more DMs of the candidate list, and may reconstruct the received data indicative of each respective DM of the one or more DMs, thereby forming the candidate list. In other examples, the processing circuitry of video decoder 30 may construct the candidate list itself.
Processing circuitry of video decoder 30 may determine to decode the chroma block using any DM of the one or more DMs of the candidate list (226). In some non-limiting examples, the processing circuit of video decoder 30 may receive a one-bit flag in the encoded video bitstream indicating that the chroma block is encoded using one of the DMs. Based on the determination to decode the chroma block using any DM of the one or more DMs of the candidate list, processing circuitry of video decoder 30 may decode an indication identifying a selected DM of the candidate list that will be used to decode the chroma block (228). For example, processing circuitry of video decoder 30 may reconstruct data (received in the encoded video bitstream) indicative of an index value identifying a location of the selected DM in the candidate list. Processing circuitry of video decoder 30 may then decode the chroma block according to the selected DM (230). In various examples, video data including luma blocks and chroma blocks may be stored to a memory of video decoder 30.
In some examples, the one or more DMs included in the candidate list may include one or more of: a first prediction mode associated with a center position of a corresponding luma block; a second prediction mode associated with an upper left position of a corresponding luma block; a third prediction mode associated with an upper right position of the corresponding luma block; a fourth prediction mode associated with a lower left position of the corresponding luma block; or a fifth prediction mode associated with a lower right position of the corresponding luma block. In some examples, the candidate list may further include one or more chroma intra prediction modes different from each of the one or more DMs. In some such examples, each of the chroma intra prediction modes corresponds to a mode used to predict a neighboring chroma block of the chroma block. In some examples, at least one respective chroma intra prediction mode of the candidate list is a chroma specific prediction mode used only for predicting chroma data.
Fig. 15 is a flow diagram depicting an example process 240 executable by the processing circuitry of video encoder 20 according to an aspect of the disclosure. Process 240 may begin when the processing circuitry of video encoder 20 determines that a plurality of derived modes (DMs) available for predicting a luma block of the video data are also available for predicting a chroma block of the video data, the chroma block corresponding to the luma block (242). In various examples, the video data including the luma block and the chroma block may be stored to a memory of video encoder 20. Video encoder 20 may form a candidate list of prediction modes for the chroma block, the candidate list including one or more DMs of the plurality of DMs that may be used to predict the chroma block (244).
Processing circuitry of video encoder 20 may determine to encode the chroma block using any DM of the one or more DMs of the candidate list (246). Based on the determination to encode the chroma block using any DM of the one or more DMs of the candidate list, the processing circuitry of video encoder 20 may encode an indication that identifies the selected DM of the candidate list that will be used to decode the chroma block (248). For example, the processing circuitry of video encoder 20 may encode data indicating an index value that identifies the position of the selected DM in the candidate list and signal the encoded data in the encoded video bitstream. Processing circuitry of video encoder 20 may then encode the chroma block according to the selected DM (250). In some examples, the processing circuitry of video encoder 20 may signal a one-bit flag in the encoded video bitstream that indicates whether the chroma block is encoded using a Linear Model (LM) mode. In these examples, the processing circuitry of video encoder 20 may signal data indicative of each respective DM of the one or more DMs of the candidate list in the encoded video bitstream.
In some examples, the one or more DMs included in the candidate list may include one or more of: a first prediction mode associated with a center position of a corresponding luma block; a second prediction mode associated with an upper left position of a corresponding luma block; a third prediction mode associated with an upper right position of the corresponding luma block; a fourth prediction mode associated with a lower left position of the corresponding luma block; or a fifth prediction mode associated with a lower right position of the corresponding luma block. In some examples, the candidate list may further include one or more chroma intra prediction modes different from each of the one or more DMs. In some such examples, each of the chroma intra prediction modes corresponds to a mode used to predict a neighboring chroma block of the chroma block. In some examples, at least one respective chroma intra prediction mode of the candidate list is a chroma specific prediction mode used only for predicting chroma data. In some examples, processing circuitry of video encoder 20 may determine that at least two DMs of the one or more DMs are the same, and may include only one DM of the at least two identical DMs in the candidate list.
Fig. 16 is a flow diagram depicting an example process 260 executable by processing circuitry of video decoder 30 according to an aspect of this disclosure. Process 260 may begin when the processing circuitry of video decoder 30 forms a Most Probable Mode (MPM) candidate list for a chroma block of video data stored to a memory of video decoder 30, such that the MPM candidate list includes one or more derived modes (DMs) associated with a luma block of the video data that is associated with the chroma block, and a plurality of luma prediction modes usable for decoding luma components of the video data (262). In some examples, processing circuitry of video decoder 30 may add the one or more DMs to the MPM candidate list, and may add one or more chroma modes inherited from neighboring chroma blocks of the chroma block at positions of the MPM candidate list that occur after the positions of all of the one or more DMs in the MPM candidate list.
In some examples, the processing circuitry of video decoder 30 may omit any additional instances of the LM mode from the MPM candidate list in response to a determination that the LM mode is used to predict one or more neighboring chroma blocks of the chroma block. In some examples, the processing circuitry of video decoder 30 may receive a one-bit flag in the encoded video bitstream that indicates whether the chroma block is encoded using the LM mode. In one scenario, processing circuitry of video decoder 30 may determine that the received one-bit flag is set to a disabled state, may receive an MPM index corresponding to a particular mode of the MPM candidate list, and may select the particular mode corresponding to the received MPM index based on the received one-bit flag being set to the disabled state. In another scenario, processing circuitry of video decoder 30 may determine that the received one-bit flag is set to an enabled state and, based on the received one-bit flag being set to the enabled state, may select the LM mode from the MPM candidate list.
In some examples, processing circuitry of video decoder 30 may determine whether a number of default modes associated with the chroma block meets a predetermined threshold. Based on a determination that the number of default modes does not meet the predetermined threshold, processing circuitry of video decoder 30 may add each of the default modes to the MPM candidate list; based on a determination that the number of default modes meets the predetermined threshold, processing circuitry of video decoder 30 may omit all of the default modes from the MPM candidate list. Processing circuitry of video decoder 30 may select a mode from the MPM candidate list (264). Subsequently, processing circuitry of video decoder 30 may decode the chroma block according to the mode selected from the MPM candidate list (266).
In some examples, to form the MPM candidate list, processing circuitry of video decoder 30 may add the one or more DMs to the MPM candidate list, and may add one or more chroma modes inherited from neighboring chroma blocks of the chroma block at positions of the MPM candidate list that occur after the positions of all of the one or more DMs in the MPM candidate list. In some examples, to form the MPM candidate list, processing circuitry of video decoder 30 may add one or more Linear Model (LM) modes to the MPM candidate list. In one such example, processing circuitry of video decoder 30 may determine that the one or more LM modes include a first instance of a first LM mode and one or more additional instances of the first LM mode, and may omit the one or more additional instances of the first LM mode from the MPM candidate list in response to a determination that the first LM mode is used to predict one or more neighboring chroma blocks of the chroma block.
In some examples, the processing circuitry of video decoder 30 may receive a one-bit flag in the encoded video bitstream indicating whether the chroma block is encoded using the LM mode, wherein selecting the mode from the MPM candidate list is based on a value of the one-bit flag. In some such examples, processing circuitry of video decoder 30 may determine that the one or more LM modes include a plurality of LM modes, and may determine that the received one-bit flag is set to an enabled state. In some such examples, the processing circuitry of video decoder 30 may receive an LM index corresponding to a location of a particular LM mode of the plurality of LM modes in the MPM candidate list, and may select the particular LM mode corresponding to the received LM index for coding the chroma block based on the received one-bit flag being set to the enabled state. In some examples, to select a mode from the MPM candidate list, processing circuitry of video decoder 30 may determine that the received one-bit flag is set to a disabled state, may receive an MPM index corresponding to a particular mode of the MPM candidate list, and may select the particular mode corresponding to the received MPM index based on the received one-bit flag being set to the disabled state.
In some examples, processing circuitry of video decoder 30 may determine whether a number of default modes associated with the chroma block meets a predetermined threshold. In these examples, the processing circuitry of video decoder 30 may perform one of: (i) adding, based on a determination that the number of default modes does not meet the predetermined threshold, each of the default modes to the MPM candidate list; or (ii) omitting, based on a determination that the number of default modes meets the predetermined threshold, all of the default modes from the MPM candidate list.
Fig. 17 is a flow diagram depicting an example process 280 executable by the processing circuitry of video encoder 20 according to an aspect of this disclosure. Process 280 may begin when the processing circuitry of video encoder 20 forms a Most Probable Mode (MPM) candidate list for a chroma block of video data stored to a memory of video encoder 20, such that the MPM candidate list includes a Linear Model (LM) mode, one or more derived modes (DMs) associated with a luma block of the video data that is associated with the chroma block, and a plurality of luma prediction modes that may be used to decode the luma block (282). In some examples, processing circuitry of video encoder 20 may add the one or more DMs to the MPM candidate list, and may add one or more chroma modes inherited from neighboring chroma blocks of the chroma block at positions of the MPM candidate list that occur after the positions of all of the one or more DMs in the MPM candidate list.
In some examples, the processing circuitry of video encoder 20 may omit any additional instances of the LM mode from the MPM candidate list in response to a determination that the LM mode is used to predict one or more neighboring chroma blocks of the chroma block. In some examples, the processing circuitry of video encoder 20 may signal a one-bit flag in the encoded video bitstream that indicates whether the chroma block is encoded using the LM mode. In one scenario, the processing circuitry of video encoder 20 may set the one-bit flag to a disabled state based on a determination that the chroma block is not encoded using the LM mode. In this scenario, based on the determination that the chroma block is not encoded using the LM mode and a determination that the chroma block is encoded using a particular mode of the MPM candidate list, the processing circuitry of video encoder 20 may signal, in the encoded video bitstream, an MPM index corresponding to the particular mode of the MPM candidate list. In another scenario, the processing circuitry of video encoder 20 may set the one-bit flag to an enabled state based on a determination that the chroma block is encoded using the LM mode.
In some examples, processing circuitry of video encoder 20 may determine whether a number of default modes associated with the chroma block meets a predetermined threshold. Based on a determination that the number of default modes does not meet the predetermined threshold, processing circuitry of video encoder 20 may add each of the default modes to the MPM candidate list; based on a determination that the number of default modes meets the predetermined threshold, processing circuitry of video encoder 20 may omit all of the default modes from the MPM candidate list. Processing circuitry of video encoder 20 may select a mode from the MPM candidate list (284). Subsequently, the processing circuitry of video encoder 20 may encode the chroma block according to the mode selected from the MPM candidate list (286).
In some examples, to form the MPM candidate list, processing circuitry of video encoder 20 may add one or more Linear Model (LM) modes to the MPM candidate list. In some examples, the processing circuitry of video encoder 20 may signal a one-bit flag in the encoded video bitstream that indicates whether the chroma block is encoded using any of one or more LM modes of the MPM candidate list. In some examples, the processing circuitry of video encoder 20 may set the one-bit flag to the disabled state based on a determination that the chroma block is not encoded using any LM mode of the MPM candidate list, and may signal, in the encoded video bitstream, an MPM index corresponding to a particular mode of the MPM candidate list based on a determination that the chroma block is not encoded using any LM mode of the MPM candidate list and based on a determination that the chroma block is encoded using the particular mode of the MPM candidate list. In some examples, the processing circuitry of video encoder 20 may set the one-bit flag to the enabled state based on a determination that the chroma block is encoded using a particular LM mode of the one or more LM modes of the MPM candidate list.
In some examples, processing circuitry of video encoder 20 may determine whether a number of default modes associated with a chroma block meets a predetermined threshold. Subsequently, the processing circuitry of video encoder 20 may perform one of: (i) based on a determination that the number of default modes does not meet a predetermined threshold, adding each of the default modes to an MPM candidate list; or (ii) omit all default modes from the MPM candidate list based on a determination that the number of default modes meets a predetermined threshold.
It will be recognized that depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out entirely (e.g., not all described acts or events are necessary for the practice of the techniques). Further, in some examples, acts or events may be performed concurrently, e.g., via multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include: a computer-readable storage medium corresponding to a tangible medium such as a data storage medium; or communication media including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is not transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but rather refer to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a variety of apparatuses or devices, including wireless handsets, Integrated Circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as noted above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (22)

1. A method of coding video data, the method comprising:
forming a Most Probable Mode (MPM) candidate list for a chroma block of the video data such that the MPM candidate list includes one or more Derived Modes (DM) associated with a luma block of the video data associated with the chroma block and a plurality of luma prediction modes capable of being used to code a luma component of the video data;
selecting a mode from the MPM candidate list; and
coding the chroma block according to the mode selected from the MPM candidate list.
2. The method of claim 1, wherein forming the MPM candidate list comprises:
adding the one or more DMs to the MPM candidate list;
adding one or more chroma modes inherited from neighboring chroma blocks of the chroma block at positions of the MPM candidate list that occur after positions of all of the one or more DMs in the MPM candidate list.
3. The method of claim 1, wherein forming the MPM candidate list comprises:
adding one or more linear model, LM, modes to the MPM candidate list.
4. The method of claim 3, wherein forming the MPM candidate list comprises:
determining that the one or more LM modes include a first instance of a first LM mode and one or more additional instances of the first LM mode; and
omitting the one or more additional instances of the first LM mode from the MPM candidate list in response to a determination that the first LM mode is used to predict one or more neighboring chroma blocks of the chroma block.
5. The method of claim 1, wherein coding the chroma block comprises decoding the chroma block, the method further comprising receiving a one-bit flag in an encoded video bitstream that indicates whether the chroma block is encoded using the LM mode, wherein selecting the mode from the MPM candidate list is based on a value of the one-bit flag.
6. The method of claim 5, further comprising:
determining that the one or more LM modes include a plurality of LM modes;
determining that the received one-bit flag is set to an enable state;
receiving an LM index corresponding to a location of a particular LM mode of the plurality of LM modes in the MPM candidate list; and
based on the received one-bit flag being set to the enabled state, selecting the particular LM mode corresponding to the received LM index for coding the chroma block.
7. The method of claim 5, wherein selecting the mode from the MPM candidate list comprises:
determining that the received one-bit flag is set to a disabled state;
receiving an MPM index corresponding to a particular mode of the MPM candidate list; and
based on the received one-bit flag being set to the disabled state, selecting the particular mode corresponding to the received MPM index.
8. The method of claim 1, wherein coding the chroma block comprises encoding the chroma block, the method further comprising:
adding one or more linear model, LM, modes to the MPM candidate list; and
signaling, in an encoded video bitstream, a one-bit flag indicating whether the chroma block is encoded using any of the one or more LM modes of the MPM candidate list.
9. The method of claim 8, further comprising:
setting the one-bit flag to a disabled state based on a determination that the chroma block is not encoded using any LM mode of the MPM candidate list; and
signaling, in the encoded video bitstream, an MPM index corresponding to a particular mode of the MPM candidate list based on the determination that the chroma block is not encoded using any LM mode of the MPM candidate list and based on a determination that the chroma block is encoded using the particular mode of the MPM candidate list.
10. The method of claim 8, further comprising:
setting the one-bit flag to an enabled state based on a determination that the chroma block is encoded using a particular LM mode of the one or more LM modes of the MPM candidate list.
11. The method of claim 1, further comprising:
determining whether a number of default modes associated with the chroma block meets a predetermined threshold; and
performing one of:
based on a determination that the number of default modes does not meet the predetermined threshold, adding each of the default modes to the MPM candidate list, or
based on a determination that the number of default modes meets the predetermined threshold, omitting all of the default modes from the MPM candidate list.
12. An apparatus, comprising:
a memory configured to store video data; and
processing circuitry in communication with the memory, the processing circuitry configured to:
forming a Most Probable Mode (MPM) candidate list for a chroma block of the video data stored to the memory, such that the MPM candidate list includes one or more Derived Modes (DM) associated with a luma block of the video data associated with the chroma block and a plurality of luma prediction modes capable of being used to code a luma component of the video data;
selecting a mode from the MPM candidate list; and
coding the chroma block according to the mode selected from the MPM candidate list.
13. The device of claim 12, wherein to form the MPM candidate list, the processing circuitry is configured to:
adding the one or more DMs to the MPM candidate list;
adding one or more chroma modes inherited from neighboring chroma blocks of the chroma block at positions of the MPM candidate list that occur after positions of all of the one or more DMs in the MPM candidate list.
14. The device of claim 12, wherein to form the MPM candidate list, the processing circuitry is configured to:
adding one or more linear model, LM, modes to the MPM candidate list.
15. The device of claim 14, wherein to add the one or more LM modes, the processing circuitry is configured to:
determine that the one or more LM modes include a first instance of a first LM mode and one or more additional instances of the first LM mode; and
omit the one or more additional instances of the first LM mode from the MPM candidate list in response to a determination that the first LM mode is used to predict one or more neighboring chroma blocks of the chroma block.
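The pruning of claim 15 keeps only the first instance of an LM mode that several neighboring chroma blocks contribute. A minimal first-occurrence filter with hypothetical names is sketched below.

    #include <set>
    #include <vector>

    // Claim 15: keep the first instance of each LM mode and omit the one or
    // more additional instances before they reach the MPM candidate list.
    std::vector<int> pruneDuplicateLmModes(const std::vector<int>& lmModes) {
        std::set<int> seen;
        std::vector<int> pruned;
        for (int mode : lmModes) {
            if (seen.insert(mode).second)  // true only for the first instance
                pruned.push_back(mode);
        }
        return pruned;
    }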
16. The device of claim 12, wherein to code the chroma block, the processing circuitry is configured to decode the chroma block, wherein the processing circuitry is further configured to receive a one-bit flag in an encoded video bitstream that indicates whether the chroma block is encoded using any of one or more linear model (LM) modes of the MPM candidate list, and wherein to select the mode from the MPM candidate list, the processing circuitry is configured to select the mode based on a value of the one-bit flag.
17. The device of claim 16, wherein the processing circuitry is further configured to:
determine that the one or more LM modes include a plurality of LM modes;
determine that the received one-bit flag is set to an enabled state;
receive an LM index corresponding to a location of a particular LM mode of the plurality of LM modes in the MPM candidate list; and
based on the received one-bit flag being set to the enabled state, select the particular LM mode corresponding to the received LM index for coding the chroma block.
18. The device of claim 16, wherein to select the mode from the MPM candidate list, the processing circuitry is configured to:
determine that the received one-bit flag is set to a disabled state;
receive an MPM index corresponding to a particular mode of the MPM candidate list; and
based on the received one-bit flag being set to the disabled state, select the particular mode corresponding to the received MPM index.
19. The device of claim 12, wherein to code the chroma block, the processing circuitry is configured to encode the chroma block, and wherein the processing circuitry is further configured to: add one or more linear model (LM) modes to the MPM candidate list; and signal, in an encoded video bitstream, a one-bit flag that indicates whether the chroma block is encoded using any of the one or more LM modes of the MPM candidate list.
20. The device of claim 19, wherein the processing circuitry is further configured to:
set the one-bit flag to a disabled state based on a determination that the chroma block is not encoded using any LM mode of the MPM candidate list; and
signal, in the encoded video bitstream, an MPM index corresponding to a particular mode of the MPM candidate list based on the determination that the chroma block is not encoded using any LM mode of the MPM candidate list and based on a determination that the chroma block is encoded using the particular mode of the MPM candidate list.
21. The device of claim 19, wherein the processing circuitry is further configured to:
set the one-bit flag to an enabled state based on a determination that the chroma block is encoded using a particular LM mode of the one or more LM modes of the MPM candidate list.
22. The device of claim 12, wherein the processing circuitry is further configured to:
determine whether a number of default modes associated with the chroma block meets a predetermined threshold; and
perform one of:
based on a determination that the number of default modes does not meet the predetermined threshold, adding each of the default modes to the MPM candidate list, or
based on a determination that the number of default modes meets the predetermined threshold, omitting all of the default modes from the MPM candidate list.
HK19124169.4A, filed 2017-08-15 (earliest priority 2016-08-15): Intra video coding using a decoupled tree structure

Applications Claiming Priority (3)

Application Number  Priority Date
US62/375,383        2016-08-15
US62/404,572        2016-10-05
US15/676,345        2017-08-14

Publications (2)

Publication Number  Publication Date
HK40000797A         2020-02-14
HK40000797B         2024-05-17
