
HK40000966B - Linear model chroma intra prediction for video coding

Info

Publication number: HK40000966B
Application number: HK19124175.1A
Authority: HK (Hong Kong)
Prior art keywords: block, video data, samples, linear prediction, chroma
Other languages: Chinese (zh)
Other versions: HK40000966A (en)
Inventors: Kai Zhang (张凯), Jianle Chen (陈建乐), Li Zhang (张莉), Marta Karczewicz (马尔塔‧卡切维奇)
Original Assignee: Qualcomm Incorporated (高通股份有限公司)
Application filed by Qualcomm Incorporated
Publication of HK40000966A
Publication of HK40000966B
Description

Linear model chroma intra prediction for video coding
This application claims the benefit of U.S. Provisional Application No. 62/395,145, filed September 15, 2016, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to video coding.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.
Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as Coding Tree Units (CTUs), Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial or temporal prediction results in a predictive block for the block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. The intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients initially arranged in a two-dimensional array may be scanned in order to generate a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
Disclosure of Invention
In general, techniques are described for enhanced linear model chroma intra prediction. This disclosure describes techniques that include predicting chroma samples from a corresponding block of luma samples using two or more linear prediction models. In other examples, a luma sample block may be downsampled using one of a plurality of downsampling filters. The downsampled luma samples may then be used to predict corresponding chroma samples using linear model prediction techniques. In other examples, chroma samples may be predicted using a combination of linear model prediction and angular prediction.
In one example of this disclosure, a method of decoding video data comprises receiving an encoded block of luma samples for a first block of video data, decoding the encoded block of luma samples to generate reconstructed luma samples, and predicting chroma samples for the first block of video data using the reconstructed luma samples for the first block of video data and two or more linear prediction models.
In another example of this disclosure, a method of encoding video data includes encoding a block of luma samples for a first block of video data, reconstructing the encoded block of luma samples to generate reconstructed luma samples, and predicting chroma samples for the first block of video data using the reconstructed samples for the first block of video data and two or more linear prediction models.
In another example of this disclosure, an apparatus configured to decode video data comprises a memory configured to receive a first block of video data, and one or more processors configured to receive an encoded block of luma samples for the first block of video data, decode the encoded block of luma samples to generate reconstructed luma samples, and predict chroma samples for the first block of video data using the reconstructed luma samples for the first block of video data and two or more linear prediction models.
In another example of this disclosure, an apparatus configured to encode video data comprises a memory configured to receive a first block of video data, and one or more processors configured to encode a block of luma samples for the first block of video data, reconstruct the encoded block of luma samples to generate reconstructed luma samples, and predict chroma samples for the first block of video data using the reconstructed samples for the first block of video data and two or more linear prediction models.
In another example of this disclosure, an apparatus configured to decode video data comprises means for receiving an encoded block of luma samples for a first block of video data, means for decoding the encoded block of luma samples to generate reconstructed luma samples, and means for predicting chroma samples of the first block of video data using the reconstructed luma samples of the first block of video data and two or more linear prediction models.
In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors configured to decode video data to receive an encoded block of luma samples for a first block of video data, decode the encoded block of luma samples to generate reconstructed luma samples, and predict chroma samples of the first block of video data using the reconstructed luma samples of the first block of video data and two or more linear prediction models.
In one example, a method of coding video data includes determining luma samples for a first block of video data and predicting chroma samples for the first block of video data using the luma samples for the first block of video data and two or more prediction models. In one example, a device for coding video data comprises a memory that stores video data, and a video coder comprising one or more processors configured to determine luma samples for a first block of video data and predict chroma samples for the first block of video data using the luma samples for the first block of video data and two or more prediction models.
In one example, a method of coding video data includes determining luma samples for a first block of video data; determining a prediction model to predict chroma samples of the first block of video data; determining one of a plurality of downsampling filters to downsample the luma samples; down-sampling the luma samples using the determined down-sampling filter to generate down-sampled luma samples; and predicting chroma samples of the first block of video data using the downsampled luma samples of the first block of video data and the prediction model.
In one example, a method of coding video data comprises determining whether a current chroma block of the video data is coded using a linear model; coding the current chroma block of video data using the linear model if the current chroma block is coded using the linear model; if the current chroma block is not coded using a linear model, determining whether linear mode angular prediction is enabled; applying an angular mode prediction pattern and a linear model prediction pattern to samples of the current chroma block if linear mode angular prediction is enabled; and determining a final linear-mode angular prediction for the samples of the current chroma block as a weighted sum of the applied angular-mode prediction pattern and linear-model prediction pattern.
In one example, a device for coding video data comprises a memory that stores video data and a video coder comprising one or more processors configured to determine whether a current chroma block of the video data is coded using a linear model; code the current chroma block of video data using the linear model if the current chroma block is coded using the linear model; if the current chroma block is not coded using a linear model, determine whether linear mode angular prediction is enabled; apply an angular mode prediction pattern and a linear model prediction pattern to samples of the current chroma block if linear mode angular prediction is enabled; and determine a final linear-mode angular prediction for the samples of the current chroma block as a weighted sum of the applied angular-mode prediction pattern and linear-model prediction pattern.
In one example, a method of coding video data includes determining, relative to a current block of video data, a number of neighboring chroma blocks coded using a linear model coding mode, and dynamically changing a particular type of codeword used to indicate the linear model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear model coding mode.
In one example, a device for coding video data includes a memory that stores video data and a video coder that includes one or more processors configured to determine, relative to a current block of video data, a number of neighboring chroma blocks that are coded using a linear model coding mode, and dynamically change a particular type of codeword used to indicate the linear model coding mode based on the determined number of neighboring chroma blocks of video data that are coded using the linear model coding mode.
In one example, a method of coding video data comprises: determining a size of a current chroma block of the video data, comparing the size of the current chroma block to a threshold, applying a linear model mode of a plurality of linear model modes when the size of the current chroma block satisfies the threshold, and not applying the linear model mode of a plurality of linear model modes when the size of the current chroma block does not satisfy the threshold.
In one example, a device for coding video data comprises a memory that stores video data, and a video coder comprising one or more processors configured to determine a size of a current chroma block of the video data, compare the size of the current chroma block to a threshold, apply a linear model mode of a plurality of linear model modes when the size of the current chroma block satisfies the threshold, and not apply the linear model mode of a plurality of linear model modes when the size of the current chroma block does not satisfy the threshold.
In one example, a device configured to code video data comprises means for performing any combination of the methods described in this disclosure. In another example, a computer-readable medium is encoded with instructions that, when executed, cause one or more processors of a device configured to code video data to perform any combination of the methods described in this disclosure. In another example, any combination of the techniques described in this disclosure may be performed.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques for multi-model linear model chroma intra prediction described in this disclosure.
Fig. 2 is a block diagram illustrating an example video encoder that may implement the techniques for multi-model linear model chroma intra prediction described in this disclosure.
Fig. 3 is a block diagram of an example video decoder that may implement the techniques for multi-model linear model chroma intra prediction described in this disclosure.
Fig. 4 is a conceptual diagram of example locations of samples used to derive model parameters α and β for linear model chroma intra prediction.
Fig. 5 is a graph of an example of a linear regression between a luma (Y) component and a chroma (C) component.
Fig. 6 is a conceptual diagram of an example of luma sample downsampling.
Fig. 7A to 7E are graphs illustrating examples of classifying neighboring samples according to an example of the present invention.
Fig. 8A-8D are conceptual diagrams of neighboring chroma samples used to derive a linear model, according to an example of this disclosure.
FIG. 9 is a conceptual diagram of adjacent sample classification according to an example of this disclosure.
Fig. 10 is a conceptual diagram of two linear models for neighboring coded luma samples classified into 2 groups, according to an example of this disclosure.
Fig. 11 is a conceptual diagram of applying one of two linear models, model 1, to all pixels of a current block according to an example of the present invention.
Fig. 12 is a conceptual diagram of applying one of two linear models, model 2, to all pixels of a current block according to an example of the present invention.
FIG. 13 is a conceptual diagram of a prediction process according to an example of this disclosure.
Fig. 14A-14C are conceptual diagrams of luma sub-sampling filters according to examples of this disclosure.
Fig. 15 is a flow chart of signaling in LM Angle Prediction (LAP) mode according to an example of the present invention.
Fig. 16 is a block diagram of a LAP in accordance with an example of the invention.
Fig. 17 is a conceptual diagram of neighboring blocks of a current block.
FIG. 18 is a flow chart illustrating an example encoding method of the present invention.
FIG. 19 is a flow chart illustrating an example decoding method of the present invention.
FIG. 20 is a flow diagram illustrating an example method for encoding a current block.
Fig. 21 is a flow diagram illustrating an example method for decoding a current block of video data.
Detailed Description
The present disclosure relates to cross-component prediction in video codecs, and more particularly to techniques for Linear Model (LM) chroma intra prediction. In one example of the present invention, a multi-model LM (MMLM) technique is described. When using MMLM for intra-chroma prediction, a video coder (e.g., a video encoder or a video decoder) may use more than one linear model for predicting a block of a chroma component from a corresponding block of a luma component (e.g., a Coding Unit (CU) or a Prediction Unit (PU)). Neighboring luma samples and neighboring chroma samples of a current block may be classified into groups, and each group may be used as a training set to derive an independent linear model. In one example, samples of the corresponding luma block may be further classified based on the same rules used to classify the neighboring samples. The video coder may apply each linear model to the portions of the current luma block indicated by the classification to obtain partially predicted chroma blocks. The partially predicted chroma blocks from the multiple linear models may be combined to obtain the final predicted chroma block.
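To make the multi-model flow concrete, the following is a minimal sketch, not the disclosure's normative derivation, of a two-model MMLM predictor: the neighboring downsampled luma samples and their co-located chroma samples are split into two groups around the mean neighboring luma value, a least-squares linear model predC = α·recL + β is fit per group, and each downsampled luma sample of the current block is predicted with the model of the group it falls into. The threshold choice, the flat-model fallback, the floating-point math, and all function names are illustrative assumptions; a real coder would use integer arithmetic and clip the result to the valid sample range.

```cpp
#include <cstddef>
#include <vector>

// One linear model: predC = alpha * recL + beta.
struct LinearModel { double alpha; double beta; };

// Least-squares fit of alpha/beta from paired neighboring luma/chroma samples.
static LinearModel fitModel(const std::vector<int>& lumaN, const std::vector<int>& chromaN) {
    double sumL = 0, sumC = 0, sumLL = 0, sumLC = 0;
    const size_t n = lumaN.size();
    for (size_t i = 0; i < n; ++i) {
        sumL += lumaN[i]; sumC += chromaN[i];
        sumLL += double(lumaN[i]) * lumaN[i];
        sumLC += double(lumaN[i]) * chromaN[i];
    }
    LinearModel m{0.0, n ? sumC / n : 0.0};            // fallback: flat model
    const double denom = n * sumLL - sumL * sumL;
    if (denom != 0.0) {
        m.alpha = (n * sumLC - sumL * sumC) / denom;
        m.beta  = (sumC - m.alpha * sumL) / n;
    }
    return m;
}

// Multi-model LM prediction: split neighbors into two groups around the mean
// neighboring luma value, fit one model per group, and apply the matching
// model to each downsampled luma sample of the current block.
void predictChromaMMLM(const std::vector<int>& lumaNeigh,   // downsampled neighboring luma
                       const std::vector<int>& chromaNeigh, // co-located neighboring chroma
                       const std::vector<int>& lumaBlock,   // downsampled luma of current block
                       std::vector<int>& chromaPred) {
    long long sum = 0;
    for (int v : lumaNeigh) sum += v;
    const int threshold = lumaNeigh.empty() ? 0 : int(sum / (long long)lumaNeigh.size());

    std::vector<int> l1, c1, l2, c2;
    for (size_t i = 0; i < lumaNeigh.size(); ++i) {
        if (lumaNeigh[i] <= threshold) { l1.push_back(lumaNeigh[i]); c1.push_back(chromaNeigh[i]); }
        else                           { l2.push_back(lumaNeigh[i]); c2.push_back(chromaNeigh[i]); }
    }
    const LinearModel m1 = fitModel(l1, c1);
    const LinearModel m2 = fitModel(l2, c2);

    chromaPred.resize(lumaBlock.size());
    for (size_t i = 0; i < lumaBlock.size(); ++i) {
        const LinearModel& m = (lumaBlock[i] <= threshold) ? m1 : m2;
        chromaPred[i] = int(m.alpha * lumaBlock[i] + m.beta + 0.5);  // clipping omitted for brevity
    }
}
```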
In another example of this disclosure, techniques for a multi-filter LM mode are described. When using multi-filter LM (MFLM) chroma prediction techniques, a video coder may use more than one luma downsampling filter if the video data is not in 4:4:4 format. That is, if the chroma blocks are subsampled relative to the luma blocks (i.e., the video data is not 4:4:4), the video coder subsamples the luma block for purposes of cross-component chroma intra prediction. In this way, there is a one-to-one correspondence between luma and chroma samples. The MFLM techniques of this disclosure may be applied in addition to the downsampling filters defined in the example of the Joint Exploration Model (JEM-3.0) currently being developed by the Joint Video Exploration Team (JVET).
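As an illustration of choosing among luma downsampling filters for 4:2:0 content, the sketch below derives one downsampled luma value per chroma position from the co-located 2×2 luma region. Filter 0 follows the familiar 6-tap (1 2 1)/(1 2 1) cross-component downsampling filter used in HEVC/JEM-style LM prediction; the other entries are illustrative alternatives rather than the specific MFLM filter set of this disclosure, and picture-boundary clipping is omitted for brevity.

```cpp
// Downsample one luma position (x, y given in chroma coordinates) of a 4:2:0
// block. filterIdx selects among candidate filters; the reconstructed luma
// plane has the given stride and twice the chroma width/height.
int downsampleLuma(const int* luma, int stride, int x, int y, int filterIdx) {
    const int* p = luma + (2 * y) * stride + (2 * x);   // top-left of the 2x2 luma group
    switch (filterIdx) {
    case 0:
        // 6-tap filter over two luma rows, weights (1 2 1)/(1 2 1), normalized by 8.
        return (p[-1] + 2 * p[0] + p[1] +
                p[stride - 1] + 2 * p[stride] + p[stride + 1] + 4) >> 3;
    case 1:
        // Simple 2-tap vertical average of the two co-located luma samples.
        return (p[0] + p[stride] + 1) >> 1;
    default:
        // Pass-through: take the top-left luma sample of the 2x2 group.
        return p[0];
    }
}
```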
In another example of this disclosure, techniques for an LM angle prediction mode are described. When LM Angle Prediction (LAP) is used, some type of angular prediction (e.g., directional, DC, or planar prediction, or other non-cross-component intra prediction) and some type of LM prediction may be combined together to obtain the final prediction for a chroma block. Using the multi-model LM (MMLM) chroma prediction (with or without multi-filter LM (MFLM)) and/or LM Angle Prediction (LAP) techniques described herein, alone or in combination, may result in bit rate distortion (BD-rate) coding gains of approximately 0.4% and 3.5% on the luma and chroma components, respectively, with only a slight increase in coding time (e.g., 104% coding time).
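A minimal sketch of the LAP combination step is shown below: an already computed angular (or DC/planar) chroma prediction and an LM chroma prediction are blended sample by sample into the final prediction. The equal 1/2 weights are an assumption made only for illustration; the actual weighting is a design choice of the coder.

```cpp
#include <vector>

// Combine an angular (or DC/planar) chroma prediction with an LM chroma
// prediction into a final LAP prediction as a weighted sum.
void combineLAP(const std::vector<int>& angularPred,
                const std::vector<int>& lmPred,
                std::vector<int>& finalPred) {
    finalPred.resize(angularPred.size());
    for (size_t i = 0; i < angularPred.size(); ++i) {
        // (a + b + 1) >> 1 implements rounded integer averaging, i.e. weights of 1/2 each.
        finalPred[i] = (angularPred[i] + lmPred[i] + 1) >> 1;
    }
}
```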
Fig. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques for linear model chroma intra prediction described in this disclosure. As shown in fig. 1, system 10 includes a source device 12 that generates encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include routers, switches, base stations, or any other equipment that may be used to facilitate communication from source device 12 to destination device 14.
In some examples, the encoded data may be output from output interface 22 to computer-readable medium 16 configured as a storage device. Similarly, encoded data may be accessed from storage by input interface 28. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In yet another example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data via streaming or downloading from a storage device. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 14 may access the encoded video data over any standard data connection, including an internet connection. Such a connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions (e.g., dynamic adaptive streaming over HTTP (DASH)), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications, such as applications for video streaming transmission, video playback, video broadcasting, and/or video telephony.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 and/or video decoder 30 of destination device 14 may be configured to apply the techniques for enhanced linear model chroma intra prediction described in this disclosure. In other examples, source device 12 and destination device 14 may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external video camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
The illustrated system 10 of FIG. 1 is merely one example. The techniques for enhanced linear model chroma intra prediction described in this disclosure may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are typically performed by a video encoding device, the techniques may also be performed by a video encoder/decoder (commonly referred to as a "CODEC"). Furthermore, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetric manner such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 14 for, for example, video streaming, video playback, video broadcasting, or video telephony.
Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.
Computer-readable medium 16 may include transitory media, such as a wireless broadcast or wired network transmission, or storage media (i.e., non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via a network transmission. Similarly, a computing device of a medium production facility (e.g., a disc stamping facility) may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Thus, in various examples, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms.
Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20 that is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of blocks and other coded units. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate in accordance with a video coding standard, such as the High Efficiency Video Coding (HEVC) standard, also known as ITU-T H.265. In other examples, video encoder 20 and video decoder 30 may operate according to future video coding standards, including the standard currently being developed by JVET. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10 "Advanced Video Coding (AVC)", or extensions of such standards. However, the techniques of this disclosure are not limited to any particular coding standard and may be applied to future video coding standards. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. The MUX-DEMUX units, if applicable, may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder or decoder circuits, which may include fixed-function and/or programmable processing circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
In general, according to ITU-T h.265, a video picture may be divided into a series of Coding Tree Units (CTUs) (or Largest Coding Units (LCUs)) that may include both luma and chroma samples. Alternatively, the CTU may include monochrome data (i.e., only luma samples). Syntax data within the bitstream may define the size of the CTU, which is the largest coding unit in terms of the number of pixels. A slice includes a number of consecutive CTUs in coding order. A video picture may be partitioned into one or more slices. Each CTU may be split into Coding Units (CUs) according to a quadtree. In general, the quadtree data structure includes one node per CU, where the root node corresponds to a CTU. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of the four leaf nodes corresponding to one of the sub-CUs.
Each node in the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag, indicating whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it may be referred to as a leaf CU. In the present invention, the four sub-CUs of a leaf CU will also be referred to as leaf CUs even if there is no explicit splitting of the original leaf CU. For example, if a CU of size 16 × 16 is not split further, then the four 8 × 8 sub-CUs will also be referred to as leaf CUs, even though the 16 × 16 CU was never split.
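The recursive CU quadtree described above can be sketched as follows. The structure and split flag mirror the description above; the unconditional split-to-minimum-size driver is only for illustration, since a real encoder would decide each split flag from rate-distortion cost rather than always splitting.

```cpp
#include <array>
#include <memory>

// Recursive quadtree of coding units: a node either is a leaf CU or splits
// into four equally sized sub-CUs.
struct CUNode {
    int x = 0, y = 0, size = 0;                 // top-left position and width/height in luma samples
    bool split = false;                         // the split flag signaled for this node
    std::array<std::unique_ptr<CUNode>, 4> child;
};

// Split a node down to minSize for illustration purposes.
void splitToMinSize(CUNode& cu, int minSize) {
    if (cu.size <= minSize) return;             // leaf CU (not split further)
    cu.split = true;
    const int half = cu.size / 2;
    for (int i = 0; i < 4; ++i) {
        cu.child[i] = std::make_unique<CUNode>();
        cu.child[i]->x = cu.x + (i % 2) * half;
        cu.child[i]->y = cu.y + (i / 2) * half;
        cu.child[i]->size = half;
        splitToMinSize(*cu.child[i], minSize);
    }
}
```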
A CU serves a purpose similar to a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a CTU may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf-CU. Syntax data associated with a coded bitstream may define a maximum number of times a CTU may be split (referred to as a maximum CU depth), and may also define a minimum size of a coding node. Accordingly, the bitstream may also define a smallest coding unit (SCU). This disclosure uses the term "block" to refer to any of a CU, Prediction Unit (PU), or Transform Unit (TU) in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
A CU includes a coding node and Prediction Units (PUs) and Transform Units (TUs) associated with the coding node. The size of a CU corresponds to the size of the coding node and is generally square in shape. The size of a CU may range from 8 × 8 pixels up to the size of the CTU, with a maximum size of, for example, 64 × 64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. The partition mode may differ depending on whether the CU is skip or direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. PUs may be partitioned into non-square shapes. Syntax data associated with a CU may also describe partitioning of the CU into one or more TUs, e.g., according to a quadtree. A TU may be square or non-square (e.g., rectangular) in shape.
The HEVC standard allows for transformations according to TUs, which may be different for different CUs. A TU is typically sized based on the size of the PUs (or partitions of a CU) within a given CU defined for a partitioned CTU, although this may not always be the case. A TU is typically the same size as or smaller than a PU (or a partition of a CU, e.g., in the case of intra prediction). In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure referred to as a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as Transform Units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
A leaf-CU, when using inter prediction, may include one or more Prediction Units (PUs). In general, a PU represents a spatial region corresponding to all or a portion of the corresponding CU, and may include data used to retrieve and/or generate reference samples for the PU. In addition, the PU contains data related to prediction. When a CU is inter-mode encoded, one or more PUs of the CU may include data defining motion information, such as one or more motion vectors, or a PU may be skip mode coded. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., list 0 or list 1) of the motion vector.
A leaf-CU may also be intra-mode predicted. In general, intra-prediction may involve using intra-modes to predict a leaf-CU (or its partition). The video coder may select a set of neighboring previously coded pixels to the leaf-CU to use to predict the leaf-CU (or its partition).
A leaf-CU may also include one or more Transform Units (TUs). As discussed above, the transform units may be specified using RQTs (also referred to as TU quadtree structures). For example, the split flag may indicate whether a leaf CU is split into four transform units. Each TU may then be further split into other sub-TUs. When a TU is not further split, it may be referred to as a leaf-TU. In general, for intra coding, all leaf-TUs belonging to a leaf-CU share the same intra prediction mode. That is, the same intra prediction mode is typically applied to calculate the prediction values for all TUs of a leaf-CU. For intra coding, a video encoder may calculate a residual value for each leaf-TU as the difference between the portion of the CU corresponding to the TU and the original block using an intra-prediction mode. TUs are not necessarily limited to the size of a PU. Thus, TU may be larger or smaller than PU. For intra coding, partitions of a CU or the CU itself may be collocated with the corresponding leaf-TUs of the CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.
Furthermore, the TUs of a leaf-CU may also be associated with a respective quadtree data structure, referred to as a Residual Quadtree (RQT). That is, a leaf-CU may include a quadtree that indicates how the leaf-CU is partitioned into TUs. The root node of the TU quadtree generally corresponds to a leaf CU, while the root node of the CU quadtree generally corresponds to a CTU (or LCU). The un-split TUs of the RQT are referred to as leaf-TUs. In general, the terms CU and TU are used by this disclosure to refer to leaf-CU and leaf-TU, respectively, unless otherwise indicated.
A video sequence typically includes a series of video frames or pictures that start with a Random Access Point (RAP) picture. A video sequence may include syntax data in a Sequence Parameter Set (SPS) that includes characteristics of the video sequence. Each slice of a picture may include slice syntax data that describes the encoding mode of the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. The video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes, and may be different sizes according to a specified coding standard.
As an example, prediction may be performed for PUs of various sizes. Assuming that the size of a particular CU is 2N × 2N, intra prediction may be performed on PU sizes of 2N × 2N or N × N, and inter prediction may be performed on symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. Asymmetric partitioning for inter prediction may also be performed for PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a 2N × 2N CU that is horizontally partitioned with a 2N × 0.5N PU on top and a 2N × 1.5N PU on bottom.
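The partition geometries listed above can be summarized by a small helper that returns the width and height of each PU produced by a given mode of a 2N × 2N CU; the string mode names are an assumption chosen purely for readability.

```cpp
#include <string>
#include <utility>
#include <vector>

// Width/height of the PUs produced by each HEVC partition mode for a 2Nx2N CU.
// For the asymmetric modes, one direction is split 25% / 75% as described above.
std::vector<std::pair<int, int>> puSizes(const std::string& mode, int N) {
    if (mode == "2Nx2N") return { {2 * N, 2 * N} };
    if (mode == "2NxN")  return { {2 * N, N}, {2 * N, N} };
    if (mode == "Nx2N")  return { {N, 2 * N}, {N, 2 * N} };
    if (mode == "NxN")   return { {N, N}, {N, N}, {N, N}, {N, N} };
    if (mode == "2NxnU") return { {2 * N, N / 2}, {2 * N, 3 * N / 2} };  // top 25%, bottom 75%
    if (mode == "2NxnD") return { {2 * N, 3 * N / 2}, {2 * N, N / 2} };  // top 75%, bottom 25%
    if (mode == "nLx2N") return { {N / 2, 2 * N}, {3 * N / 2, 2 * N} };  // left 25%, right 75%
    if (mode == "nRx2N") return { {3 * N / 2, 2 * N}, {N / 2, 2 * N} };  // left 75%, right 25%
    return {};                                                            // unknown mode
}
```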
In this disclosure, "nxn" and "N by N" may be used interchangeably to refer to pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16 x 16 pixels or 16 by 16. In general, a 16 × 16 block will have 16 pixels (y-16) in the vertical direction and 16 pixels (x-16) in the horizontal direction. Likewise, an nxn block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After using intra-predictive or inter-predictive coding of PUs of the CU, video encoder 20 may calculate residual data for the TUs of the CU. The PU may comprise syntax data describing a method or mode of generating predictive pixel data in the spatial domain, also referred to as the pixel domain, and the TU may comprise coefficients in the transform domain after applying a transform, such as a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form the TUs to include quantized transform coefficients that represent residual data of the CU. That is, video encoder 20 may calculate residual data (in the form of a residual block), transform the residual block to generate a block of transform coefficients, and then quantize the transform coefficients to form quantized transform coefficients. Video encoder 20 may form TUs that include the quantized transform coefficients, as well as other syntax information (e.g., split information for the TUs).
As mentioned above, after any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to the process by which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be downscaled to an m-bit value during quantization, where n is greater than m.
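As a toy illustration of how quantization trades precision for fewer bits, the following scalar quantizer maps a transform coefficient to a coarser level by dividing by a step size, and dequantization recovers only an approximation. HEVC's actual quantizer uses QP-derived integer scaling tables, so this is a conceptual sketch only.

```cpp
// Round-to-nearest scalar quantization (rounding half away from zero).
int quantize(int coeff, int qStep) {
    return coeff >= 0 ? (coeff + qStep / 2) / qStep
                      : -((-coeff + qStep / 2) / qStep);
}

// Inverse quantization: reconstruct an approximation of the coefficient.
int dequantize(int level, int qStep) {
    return level * qStep;
}
```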
After quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array and lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other examples, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
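The sketch below serializes an N × N block of quantized coefficients along up-right anti-diagonals so that low-frequency (typically higher-energy) coefficients appear first in the one-dimensional vector. The exact scan orders used by HEVC differ and are chosen per block, so this ordering is shown only for illustration.

```cpp
#include <vector>

// Serialize an NxN block of quantized coefficients along anti-diagonals,
// starting from the DC coefficient at (0, 0).
std::vector<int> diagonalScan(const std::vector<std::vector<int>>& block) {
    const int n = static_cast<int>(block.size());
    std::vector<int> out;
    out.reserve(static_cast<size_t>(n) * n);
    for (int d = 0; d < 2 * n - 1; ++d)            // d = row + col index of the diagonal
        for (int row = 0; row < n; ++row) {
            const int col = d - row;
            if (col >= 0 && col < n) out.push_back(block[row][col]);
        }
    return out;
}
```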
To perform CABAC, video encoder 20 may assign contexts within the context model to symbols to be transmitted. The context may relate to, for example, whether neighboring values of a symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols and longer codes correspond to less probable symbols. In this way, bit savings may be achieved using VLC, relative to, for example, using equal length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
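The idea of matching codeword length to symbol probability can be illustrated with a toy unary-style code assignment; the real CAVLC tables are fixed by the standard, so the code below is only a conceptual sketch of why more probable symbols should receive shorter codewords.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assign prefix-free codewords "1", "01", "001", ... so that the most probable
// symbol gets the shortest code. Returns one codeword per input symbol.
std::vector<std::string> assignVlcCodes(const std::vector<double>& symbolProbs) {
    std::vector<size_t> order(symbolProbs.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return symbolProbs[a] > symbolProbs[b]; });
    std::vector<std::string> codes(symbolProbs.size());
    for (size_t rank = 0; rank < order.size(); ++rank)
        codes[order[rank]] = std::string(rank, '0') + "1";   // rank zeros followed by a 1
    return codes;
}
```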
In general, video decoder 30 performs a process substantially similar, although reciprocal to that performed by video encoder 20, to decode the encoded data. For example, video decoder 30 inverse quantizes and inverse transforms the coefficients of the received TU to render the residual block. Video decoder 30 uses the signaled prediction mode (intra-prediction or inter-prediction) to form the predicted block. Video decoder 30 then combines the predicted block (on a pixel-by-pixel basis) with the residual block to render the original block. Additional processing may be performed, such as performing a deblocking process to reduce visual artifacts along block boundaries. In addition, video decoder 30 may use CABAC to decode syntax elements in a manner that, while reciprocal to the CABAC encoding process of video encoder 20, is substantially similar thereto.
Video encoder 20 may further send syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, to video decoder 30, for example, in a picture header, a block header, a slice header, or in other syntax data, such as a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), or a Video Parameter Set (VPS).
Fig. 2 is a block diagram illustrating an example of a video encoder 20, which may implement the techniques for enhanced linear model chroma intra prediction described in this disclosure. Video encoder 20 may perform intra-coding and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of a number of spatial-based coding modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of a number of temporally based coding modes.
As shown in fig. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 2, video encoder 20 includes mode select unit 40, reference picture memory 64, which may also be referred to as a Decoded Picture Buffer (DPB), video data memory 65, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and a summer 62. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62 if desired. In addition to deblocking filters, additional filters (in-loop or post-loop) may also be used. Such filters are not shown for simplicity, but may filter the output of summer 50 (as an in-loop filter), if desired.
As shown in fig. 2, video encoder 20 receives video data and stores the received video data in video data memory 65. Video data memory 65 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 65 may be obtained, for example, from video source 18. Reference picture memory 64 may be a reference picture memory that stores reference video data for use in encoding video data by video encoder 20 (e.g., in intra or inter coding modes). Video data memory 65 and reference picture memory 64 may be formed by any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 65 and reference picture memory 64 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 65 may be on-chip with the other components of video encoder 20, or off-chip relative to those components.
During the encoding process, video encoder 20 receives a video frame or slice to be coded. A frame or slice may be divided into a plurality of video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive encoding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive encoding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
Furthermore, partition unit 48 may partition a block of video data into sub-blocks based on an evaluation of previous partitioning schemes in previous coding passes. For example, partitioning unit 48 may initially partition a frame or slice into CTUs, and partition each of the CTUs into sub-CUs based on a bitrate-distortion analysis (e.g., bitrate-distortion optimization). Mode select unit 40 may further generate a quadtree data structure that indicates partitioning of the CTUs into sub-CUs. Leaf-node CUs of the quadtree may include one or more PUs and one or more TUs.
Mode select unit 40 may select one of the prediction modes (intra or inter), e.g., based on the error result, and provide the resulting predicted block to summer 50 to generate residual data and to summer 62 to reconstruct the encoded block used as the reference frame. Among the possible intra-prediction modes, mode select unit 40 may determine to use a linear model chroma intra-prediction mode in accordance with the techniques of this disclosure. Mode select unit 40 also provides syntax elements (e.g., motion vectors, intra-mode indicators, partition information, and other such syntax information) to entropy encoding unit 56.
Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture (or other coded unit) relative to a current block being coded within the current picture (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values for a quarter-pixel position, an eighth-pixel position, or other fractional-pixel positions of a reference picture. Thus, motion estimation unit 42 may perform a motion search relative to full pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
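A sum-of-absolute-differences kernel of the kind used by the motion search to measure how closely a candidate predictive block matches the current block can be sketched as follows; the strides and 8-bit samples are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences between the current block and a candidate
// predictive block in the reference picture.
uint32_t sad(const uint8_t* cur, int curStride,
             const uint8_t* ref, int refStride,
             int width, int height) {
    uint32_t acc = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            acc += static_cast<uint32_t>(std::abs(int(cur[y * curStride + x]) -
                                                  int(ref[y * refStride + x])));
    return acc;
}
```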
Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the locations of the PUs to locations of predictive blocks of reference pictures. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44.
The motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vectors determined by motion estimation unit 42. Again, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate, in one of the reference picture lists, the predictive block to which the motion vector points. Summer 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to the luma component, and motion compensation unit 44 uses motion vectors calculated based on the luma component for both the chroma components and the luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
In lieu of inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction unit 46 may intra-predict the current block, as described above. In particular, intra-prediction unit 46 may determine the intra-prediction mode to be used to encode the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode selection unit 40 in some examples) may select an appropriate intra-prediction mode to use from the tested modes.
For example, intra-prediction unit 46 may calculate rate-distortion values using rate-distortion analysis for various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and an original, unencoded block, which is encoded to produce an encoded block, and the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortion and rate of various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
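The mode decision described above can be sketched as a Lagrangian cost minimization over the tested modes, choosing the mode with the smallest cost D + λ·R. The per-mode distortion and bit counts are assumed to come from the trial encodes described above, and λ is the encoder's rate-distortion trade-off parameter; the structure and names here are illustrative.

```cpp
#include <limits>
#include <vector>

// Result of one trial encode of the block with a given intra prediction mode.
struct ModeTrial { int mode; double distortion; double bits; };

// Pick the intra prediction mode with the lowest Lagrangian cost D + lambda * R.
int selectBestIntraMode(const std::vector<ModeTrial>& trials, double lambda) {
    int best = -1;
    double bestCost = std::numeric_limits<double>::max();
    for (const ModeTrial& t : trials) {
        const double cost = t.distortion + lambda * t.bits;
        if (cost < bestCost) { bestCost = cost; best = t.mode; }
    }
    return best;
}
```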
After selecting the intra-prediction mode for the block, intra-prediction unit 46 may provide information to entropy encoding unit 56 indicating the selected intra-prediction mode for the block. Entropy encoding unit 56 may encode information indicative of the selected intra-prediction mode. Video encoder 20 may include the following in the transmitted bitstream: configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables); definition of coding context of various blocks; and an indication of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to be used for each of the contexts.
As will be explained in more detail below, intra-prediction unit 46 may be configured to perform the enhanced linear model chroma intra-prediction techniques described in this disclosure.
Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents one or more components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, producing a video block that includes residual transform coefficient values. Instead of DCT, wavelet transform, integer transform, subband transform, Discrete Sine Transform (DST), or other types of transform may be used. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of transform coefficients. The transform may convert the residual information from a pixel domain to a transform domain, e.g., a frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting a quantization parameter.
After quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the contexts may be based on neighboring blocks. After entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30), or archived for later transmission or retrieval.
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain. In particular, summer 62 adds the reconstructed residual block to the motion compensated prediction block generated earlier by motion compensation unit 44 or intra-prediction unit 46 to generate a reconstructed video block for storage in reference picture memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
In this way, video encoder 20 of fig. 2 represents an example of a video encoder configured to encode a block of luma samples for a first block of video data, reconstruct an encoded block of luma samples to generate reconstructed luma samples, and predict chroma samples for the first block of video data using reconstructed samples for the first block of video data and two or more linear prediction models.
In one example, a method of coding video data includes determining luma samples of a first block of video data; determining a prediction model to predict chroma samples of the first block of video data; determining one of a plurality of downsampling filters to downsample the luma samples; down-sampling the luma samples using the determined down-sampling filter to generate down-sampled luma samples; and predicting chroma samples of the first block of video data using the downsampled luma samples of the first block of video data and the prediction model.
In one example, a method of coding video data comprises determining whether a current chroma block of the video data is coded using a linear model; coding the current chroma block of video data using a linear model if the current chroma block of video data is coded using the linear model; in the case that the current chroma block of video data is not coded using a linear model, the method further comprises determining whether linear mode angular prediction is enabled when the current block is determined not to be coded using the linear model; applying an angular mode prediction pattern and a linear model prediction pattern to samples of the current chroma block if linear mode angular prediction is enabled; and determining a final linear-mode angular prediction for the samples of the current chroma block as a weighted sum of the applied angular-mode prediction pattern and a linear-model prediction pattern.
In one example, a device for coding video data comprises a memory that stores video data and a video coder comprising one or more processors configured to determine whether a current chroma block of the video data is coded using a linear model; code the current chroma block of video data using a linear model if the current chroma block of video data is coded using the linear model; in a case that the current chroma block of video data is not coded using a linear model, the one or more processors are further configured to determine whether linear mode angular prediction is enabled when the current block is determined not to use the linear model; applying an angular mode prediction pattern and a linear model prediction pattern to samples of the current chroma block if linear mode angular prediction is enabled; and determining a final linear-mode angular prediction for the samples of the current chroma block as a weighted sum of the applied angular-mode prediction pattern and a linear-model prediction pattern.
In one example, a method of coding video data includes determining, relative to a current block of video data, a number of neighboring chroma blocks coded using a linear model coding mode, and dynamically changing a particular type of codeword used to indicate the linear model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear model coding mode.
In one example, a device for coding video data includes a memory that stores video data and a video coder that includes one or more processors configured to determine, relative to a current block of video data, a number of neighboring chroma blocks that are coded using a linear model coding mode, and dynamically change a particular type of codeword used to indicate the linear model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear model coding mode.
In one example, a method of coding video data comprises: determining a size of a current chroma block of the video data, comparing the size of the current chroma block to a threshold, applying a linear model mode of a plurality of linear model modes when the size of the current chroma block satisfies the threshold, and not applying the linear model mode of a plurality of linear model modes when the size of the current chroma block does not satisfy the threshold.
In one example, a device for coding video data comprises a memory that stores video data, and a video coder comprising one or more processors configured to determine a size of a current chroma block of the video data, compare the size of the current chroma block to a threshold, apply a linear model mode of a plurality of linear model modes when the size of the current chroma block satisfies the threshold, and not apply the linear model mode of a plurality of linear model modes when the size of the current chroma block does not satisfy the threshold.
Fig. 3 is a block diagram illustrating an example of a video decoder 30, which may implement the techniques for enhanced linear model chroma intra prediction described in this disclosure. In the example of fig. 3, video decoder 30 includes entropy decoding unit 70, motion compensation unit 72, intra prediction unit 74, inverse quantization unit 76, inverse transform unit 78, reference picture memory 82, video data memory 85, and summer 80. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with respect to video encoder 20 (fig. 2). Motion compensation unit 72 may generate prediction data based on the motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 may generate prediction data based on the intra-prediction mode indicator received from entropy decoding unit 70.
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Video decoder 30 stores the received encoded video bitstream in video data memory 85. Video data memory 85 may store video data, such as an encoded video bitstream, to be decoded by components of video decoder 30. The video data stored in video data memory 85 may be obtained from a storage medium, such as via computer-readable medium 16, or from a local video source, such as a video camera, or by accessing a physical data storage medium. Video data memory 85 may form a Coded Picture Buffer (CPB) that stores encoded video data from an encoded video bitstream. Reference picture memory 82 may be a reference picture memory that stores reference video data for decoding of video data by video decoder 30, e.g., in an intra-coding mode or an inter-coding mode. Video data memory 85 and reference picture memory 82 may be formed by any of a variety of memory devices, such as DRAM, SDRAM, MRAM, RRAM, or other types of memory devices. Video data memory 85 and reference picture memory 82 may be provided by the same memory device or separate memory devices. In various examples, video data memory 85 may be on-chip with other components of video decoder 30, or off-chip with respect to those components.
Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
When the video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for the video block of the current video slice based on the signaling intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video block is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 72 generates a predictive block for the video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive block may be generated by one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in reference picture memory 82.
Motion compensation unit 72 determines prediction information for video blocks of the current video slice by parsing the motion vectors and other syntax elements and uses the prediction information to generate predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine prediction modes (e.g., intra or inter prediction) used to code video blocks of a video slice, inter prediction slice types (e.g., B-slices or P-slices), construction information for one or more of the reference picture lists of the slice, motion vectors for each inter-coded video block of the slice, inter prediction states for each inter-coded video block of the slice, and other information used to decode video blocks in the current video slice.
Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may calculate interpolated values for sub-integer pixels of the reference block using interpolation filters as used by video encoder 20 during encoding of video blocks. In this case, motion compensation unit 72 may determine the interpolation filter used by video encoder 20 from the received syntax elements and use the interpolation filter to generate the predictive block.
As will be explained in more detail below, intra-prediction unit 74 may be configured to perform the enhanced linear model chroma intra-prediction techniques described in this disclosure.
Inverse quantization unit 76 inverse quantizes (i.e., de-quantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include use of a quantization parameter QPY calculated by video decoder 30 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied.
The inverse transform unit 78 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain.
After motion compensation unit 72 generates the predictive block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 78 with the corresponding predictive block generated by motion compensation unit 72. Summer 80 represents the component that performs this summation operation. If desired, deblocking filters may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other filters may also be used (within or after the coding loop) to smooth pixel transitions, or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 82, which stores reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores decoded video for later presentation on a display device (e.g., display device 32 of fig. 1).
In this way, video decoder 30 of fig. 3 represents an example of a video decoder configured to receive an encoded block of luma samples for a first block of video data, decode the encoded block of luma samples to generate reconstructed luma samples, and predict chroma samples of the first block of video data using the reconstructed luma samples of the first block of video data and two or more linear prediction models.
In one example, a method of coding video data includes determining luma samples of a first block of video data; determining a prediction model to predict chroma samples of the first block of video data; determining one of a plurality of downsampling filters to downsample the luma samples; down-sampling the luma samples using the determined down-sampling filter to generate down-sampled luma samples; and predicting chroma samples of the first block of video data using the downsampled luma samples of the first block of video data and the prediction model.
In one example, a method of coding video data comprises determining whether a current chroma block of the video data is coded using a linear model; coding the current chroma block of video data using a linear model if the current chroma block of video data is coded using the linear model; in the case that the current chroma block of video data is not coded using a linear model, the method further comprises determining whether linear mode angular prediction is enabled when the current block is determined not to be coded using the linear model; applying an angular mode prediction pattern and a linear model prediction pattern to samples of the current chroma block if linear mode angular prediction is enabled; and determining a final linear-mode angular prediction for the samples of the current chroma block as a weighted sum of the applied angular-mode prediction pattern and a linear-model prediction pattern.
In one example, a device for coding video data comprises a memory that stores video data and a video coder comprising one or more processors configured to determine whether a current chroma block of the video data is coded using a linear model; code the current chroma block of video data using a linear model if the current chroma block of video data is coded using the linear model; in a case that the current chroma block of video data is not coded using a linear model, the one or more processors are further configured to determine whether linear mode angular prediction is enabled when the current block is determined not to use the linear model; applying an angular mode prediction pattern and a linear model prediction pattern to samples of the current chroma block if linear mode angular prediction is enabled; and determining a final linear-mode angular prediction for the samples of the current chroma block as a weighted sum of the applied angular-mode prediction pattern and a linear-model prediction pattern.
In one example, a method of coding video data includes determining, relative to a current block of video data, a number of neighboring chroma blocks coded using a linear model coding mode, and dynamically changing a particular type of codeword used to indicate the linear model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear model coding mode.
In one example, a device for coding video data includes a memory that stores video data and a video coder including one or more processors configured to determine, relative to a current block of video data, a number of neighboring chroma blocks coded using a linear model coding mode, and dynamically change a particular type of codeword used to indicate the linear model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear model coding mode.
In one example, a method of coding video data comprises: determining a size of a current chroma block of the video data, comparing the size of the current chroma block to a threshold, applying a linear model mode of a plurality of linear model modes when the size of the current chroma block satisfies the threshold, and not applying the linear model mode of the plurality of linear model modes when the size of the current chroma block does not satisfy the threshold.
In one example, a device for coding video data comprises a memory that stores video data, and a video coder comprising one or more processors configured to determine a size of a current chroma block of the video data, compare the size of the current chroma block to a threshold, apply a linear model mode of a plurality of linear model modes when the size of the current chroma block satisfies the threshold, and not apply the linear model mode of a plurality of linear model modes when the size of the current chroma block does not satisfy the threshold.
Linear Model (LM) chroma intra prediction is described in Chen et al., "CE6.a.4: Chroma intra prediction by reconstructed luma samples," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 5th Meeting: Geneva, CH, 16-23 March 2011, document JCTVC-E0266, available at http://phenix.int-evry.fr/jct/doc_end_user/documents/5_Geneva/wg11/JCTVC-E0266-v4.zip. An LM mode has also been proposed for JVET and is described in Section 2.2.4 of Chen et al., "Algorithm Description of Joint Exploration Test Model 3," Joint Video Exploration Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 3rd Meeting: Geneva, CH, 26 May-1 June 2016, document JVET-C1001, available at http://phenix.int-evry.fr/jvet/doc_end_user/documents/3_Geneva/wg11/JVET-C1001-v3.zip. The LM mode assumes that there is a linear relationship between the luma component and the chroma components of a video block. When coding video data according to the LM mode, video encoder 20 (e.g., intra-prediction unit 46) and video decoder 30 (e.g., intra-prediction unit 74) may be configured to determine the relationship between luma samples and chroma samples by analyzing neighboring reconstructed pixels of a block of video data using a linear regression method. When the LM mode is used, video encoder 20 and video decoder 30 may be configured to predict chroma values (e.g., both Cr and Cb chroma samples) from reconstructed luma values of the same block as follows.
PredC[x,y]=α·RecL'[x,y]+β (1)
where PredC indicates the prediction of chroma samples in a block, and Rec′L indicates the reconstructed luma samples in the block. The parameters α and β are derived from causal reconstructed samples that neighbor the current block.
In some examples, the sampling ratio of the chroma components is half that of the luma component, and the chroma components have a 0.5 pixel phase difference in the vertical direction in YUV420 sampling (e.g., also referred to as 4:2:0 chroma subsampling). The reconstructed luma samples are downsampled in the vertical direction and subsampled in the horizontal direction to match the size and phase of the chroma signal (i.e., the expected number of chroma samples in the block) as follows:
RecL'[x,y]=(RecL[2x,2y]+RecL[2x,2y+1])>>1 (2)
where >> is a logical right shift.
One example of the LM mode utilizes a linear least squares solution between the causal reconstructed data of the downsampled luma component and the causal chroma component to derive the linear model parameters α and β. For example, the model parameters α and β can be derived as follows:
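The least-squares expressions referenced here do not survive in this text; the standard linear least-squares solution, consistent with the surrounding description (and assumed here to correspond to equations (3) and (4)), is

$$\alpha=\frac{I\cdot\sum_{i}Rec_{C}(i)\cdot Rec_{L}'(i)-\sum_{i}Rec_{C}(i)\cdot\sum_{i}Rec_{L}'(i)}{I\cdot\sum_{i}Rec_{L}'(i)\cdot Rec_{L}'(i)-\left(\sum_{i}Rec_{L}'(i)\right)^{2}}\qquad(3)$$

$$\beta=\frac{\sum_{i}Rec_{C}(i)-\alpha\cdot\sum_{i}Rec_{L}'(i)}{I}\qquad(4)$$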
where RecC(i) and Rec′L(i) indicate the reconstructed chroma samples and the downsampled luma samples that neighbor the target block, and I indicates the total number of samples of the neighboring data.
Fig. 4 is a conceptual diagram illustrating the positions of samples used to derive model parameters α and β. As illustrated in fig. 4, only the left and above causal samples, marked as gray circles, are involved in calculating model parameters α and β, so as to keep the total number of samples I at a power of 2. For a target N×N chroma block, when both the left and above causal samples are available, the total number of involved samples I is 2N; when only the left causal samples or only the above causal samples are available, the total number of involved samples I is N.
Fig. 5 is a graph of an example of a linear regression between a luma (Y) component and a chroma (C) component. As illustrated in fig. 5, according to one example, a linear regression method may be used to solve the linear relationship between the luma and chroma components. In fig. 5, a point on the graph corresponds to a pair of samples (Rec′L[x,y], RecC[x,y]).
FIG. 6 is a conceptual diagram illustrating an example of luma sample downsampling in JEM3.0. In the example of fig. 6, the triangle symbols represent downsampled luma values, while the circle symbols represent the original reconstructed luma samples (i.e., prior to any downsampling). The lines represent which of the original luma samples are used to generate each downsampled luma value according to each particular downsampling filter. In one example, JVET uses a more complex luma sample downsampling filter for the LM mode in JEM3.0, as illustrated in FIG. 6 of this disclosure, where
Rec′L[x,y]=(2·RecL[2x,2y]+2·RecL[2x,2y+1]+RecL[2x-1,2y]+RecL[2x+1,2y]+RecL[2x-1,2y+1]+RecL[2x+1,2y+1]+4)>>3
When the sample is located at the picture boundary, the two-tap filter, generally as in equation (2) above, may be applied instead.
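The following is a minimal sketch of this downsampling step; the function name, the row-major array layout, and the boundary fallback shown here are illustrative assumptions rather than part of any reference implementation.

#include <cstdint>
#include <vector>

// Sketch: downsample one luma position for LM prediction. Interior positions
// use the 6-tap filter [1 2 1; 1 2 1]/8 shown above; positions at the picture
// boundary fall back to the two-tap vertical filter of equation (2).
static int16_t downsampleLumaForLM(const std::vector<int16_t>& recL, int stride,
                                   int lumaWidth, int x, int y) {
    const int x2 = 2 * x, y2 = 2 * y;
    if (x2 - 1 < 0 || x2 + 1 >= lumaWidth) {
        return static_cast<int16_t>(
            (recL[y2 * stride + x2] + recL[(y2 + 1) * stride + x2]) >> 1);
    }
    return static_cast<int16_t>(
        (2 * recL[y2 * stride + x2] + 2 * recL[(y2 + 1) * stride + x2]
         + recL[y2 * stride + x2 - 1] + recL[y2 * stride + x2 + 1]
         + recL[(y2 + 1) * stride + x2 - 1] + recL[(y2 + 1) * stride + x2 + 1]
         + 4) >> 3);
}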
Previous techniques for LM chroma prediction use a single linear regression model to predict chroma values from reconstructed luma values. However, this approach may have drawbacks for certain video sequences. For example, the relationship between luma and chroma samples across all possible luma values may not be linear. Thus, in some examples, LM chroma prediction may introduce an undesirable amount of distortion into the decoded video. This may be particularly true for blocks of video data having a wide range of luma values. This disclosure describes techniques for performing LM chroma prediction, including techniques for luma sub-sampling and a combined LM chroma prediction and angular prediction mode. The techniques of this disclosure may improve the visual quality of video data encoded and decoded using the LM chroma prediction mode.
In some examples, this disclosure describes the concept of multiple luma sub-sampling filters. In one example, when LM chroma prediction mode is enabled, one or more sets of downsampling filters may be further signaled in a Sequence Parameter Set (SPS), Picture Parameter Set (PPS), or slice header. In one example, a Supplemental Enhancement Information (SEI) message syntax may be introduced to describe the downsampling filter. In one example, a default downsampling filter may be defined without signaling, such as a 6-tap filter [1,2, 1; 1,2,1]. In one example, video encoder 20 may signal an index for the filter in LM prediction mode in one PU/CU/max CU. In one example, the use of filter taps may be derived on-the-fly without signaling. For example, video decoder 30 may be configured to determine the use of filter taps from characteristics of the encoded video bitstream and/or the coding mode without explicit signaling.
As will be described in further detail below, this disclosure describes a multi-model LM (mmlm) method, a multi-filter LM (mflm) method, and an LM Angle Prediction (LAP) method, each of which may be utilized alone or in combination.
In one example, when utilizing the MMLM method, video encoder 20 and video decoder 30 may be configured to use more than one linear model (e.g., multiple linear models) for a single block/Coding Unit (CU)/Transform Unit (TU) to predict chroma components of a block from luma components of the block. Video encoder 20 and video decoder 30 may be configured to derive a plurality of linear models using neighboring luma samples and neighboring chroma samples.
Neighboring luma samples and neighboring chroma samples of a current block may be classified into groups based on the values of the samples. Each group is used as a training set to derive a different linear model (i.e., a specific α and β are derived for each group). In one example, moreover, video encoder 20 and video decoder 30 are configured to classify the samples of the corresponding luma block (i.e., the luma block corresponding to the current chroma block) based on the same rules used for the classification of the neighboring samples.
In one example, video encoder 20 and video decoder 30 are configured to apply each linear model to corresponding classified luma samples in order to obtain a partially predicted chroma block. Video encoder 20 and video decoder 30 are configured to combine each partially predicted chroma block obtained from each linear model to obtain a final predicted chroma block. In another example, video encoder 20 and video decoder 30 may be configured to apply each linear model to all luma samples of the current block to obtain a plurality of predicted chroma blocks. Video encoder 20 and video decoder 30 may then apply the weighted average to each of the plurality of predicted chroma blocks to obtain a final predicted chroma block.
In some examples, video encoder 20 and video decoder 30 may be configured to require the number of samples in a group after classification to be greater than or equal to a particular number (e.g., at least 2 samples per classified group). In one example, the minimum number of samples in one classification group is predefined and the same value is used for all block sizes. In another example, the minimum number of samples in one classification group may be variable, and may depend on the size of the current block, and/or may depend on other characteristics (e.g., the classification group including the minimum number of samples may depend on the prediction mode of the neighboring block). If the number of samples in a group is less than the minimum defined for a block, the samples in other groups may be changed to this group (e.g., samples from neighboring classification groups may be combined). For example, the samples in the group with the most samples may be changed to a group with less than a defined minimum number of samples for the block.
In one example, a sample in the group with the most samples (named group A) may be moved to a group (named group B) whose number of samples is lower than the defined minimum for the block, if that sample is the nearest sample to the existing samples in group B. In one example, "nearest" may refer to nearest in pixel position. In another example, "nearest" may refer to nearest in intensity (e.g., chroma or luma value). In another example, the defined minimum number of samples may depend on the width and/or height of the coding block.
In one example, the classification of neighboring luma and chroma samples may be based on the intensity of the samples (e.g., luma and/or chroma values of the neighboring samples) and/or the location of the neighboring luma and/or chroma samples. In one example, video encoder 20 may be configured to signal a syntax element to video decoder 30 that indicates the classification method to be used.
In one example, the number of categories may be predefined and fixed for all video sequences. In one example, video encoder 20 may be configured to signal the number of classes to video decoder 30 in an encoded video bitstream in one or more of a PPS, SPS, and/or slice header. In one example, the number of classes may depend on the block size, e.g., the width and/or height of the current luma/chroma block. Examples of the M classes of MMLM are given below:
PredC[x,y] = α1·Rec′L[x,y] + β1, if Rec′L[x,y] ≤ T1
PredC[x,y] = α2·Rec′L[x,y] + β2, if T1 < Rec′L[x,y] ≤ T2
…
PredC[x,y] = αM·Rec′L[x,y] + βM, if Rec′L[x,y] > TM-1
In the above example, T1 to TM-1 are the thresholds that separate the classification groups, and the linear model applied to the m-th group is PredC[x,y] = αm·Rec′L[x,y] + βm. In the above example, a threshold may be defined as a value of a luma sample. Neighboring luma samples (Rec′L[x,y]) with values between two consecutive thresholds (e.g., Tm-1 < Rec′L[x,y] ≤ Tm) are classified into the m-th group (where m is from 1 to M, inclusive). In one example, T0 may be defined as a negative value, such as -1. The (M-1) thresholds, denoted (T1 … TM-1), may be signaled from video encoder 20 to video decoder 30. In other examples, the thresholds may be predefined and stored at each of video encoder 20 and video decoder 30.
In one example, video encoder 20 and video decoder 30 may be configured to calculate the threshold value depending on: all or a partial subset of neighboring coded luma/chroma samples, and/or coded luma samples in the current block.
Figures 7A-7E are graphs depicting the classification of neighboring samples into groups and the determination of a linear model for each group, according to an example of this disclosure. The classification of neighboring samples into two groups is illustrated in fig. 7A, the classification of neighboring samples into three groups is illustrated in fig. 7B, and the classification of neighboring samples into two or more non-contiguous groups is illustrated in fig. 7C-7E. In some examples, the definition or calculation of the threshold may be different depending on different values of M (e.g., different thresholds depending on the number of groups and thus on the number of linear models).
In one example, as illustrated in fig. 7A, when M is equal to 2, neighboring samples may be classified into two groups. Neighboring samples with Rec′L[x,y] ≤ threshold may be classified into group 1, and neighboring samples with Rec′L[x,y] > threshold may be classified into group 2. Video encoder 20 and video decoder 30 may be configured to derive two linear models (one for each group) as follows:
PredC[x,y] = α1·Rec′L[x,y] + β1, if Rec′L[x,y] ≤ threshold
PredC[x,y] = α2·Rec′L[x,y] + β2, if Rec′L[x,y] > threshold
in one example according to fig. 7A (i.e., where two groups are classified), video encoder 20 and video decoder 30 may be configured to calculate the threshold as an average of neighboring coded (also indicated as "reconstructed") luma samples. As discussed above, video encoder 20 and video decoder 30 may be configured to downsample reconstructed luma samples if the chroma components are subsampled (e.g., using a chroma subsampling format other than 4:4: 4). In another example, video encoder 20 and video decoder 30 may be configured to calculate the threshold as a median of the neighboring coded luma samples. In another example, video encoder 20 and video decoder 30 may be configured to calculate the threshold as an average of minV and maxV, where minV and maxV are the minimum and maximum values, respectively, of neighboring coded luma samples (which may be downsampled if not in 4:4:4 format). In another example, video encoder 20 and video decoder 30 may be configured to calculate the threshold as an average of neighboring coded luma samples and coded luma samples of the current block (which may be downsampled if not in 4:4:4 format). In another example, video encoder 20 and video decoder 30 may be configured to calculate the threshold as a median of neighboring coded luma samples and coded luma samples of the current block (which may be downsampled if not in 4:4:4 format). In another example, video encoder 20 and video decoder 30 may be configured to calculate the threshold as an average of minV and maxV, where minV and maxV are the minimum and maximum of neighboring coded luma samples and coded luma samples (which may be downsampled if not in 4:4:4 format) in the current block, respectively.
In one example, as illustrated in fig. 7B, when M is equal to 3, neighboring samples may be classified into three groups. Neighboring samples (e.g., luma samples) with Rec′L[x,y] ≤ threshold 1 may be classified into group 1; neighboring samples with threshold 1 < Rec′L[x,y] ≤ threshold 2 may be classified into group 2; and neighboring samples with Rec′L[x,y] > threshold 2 may be classified into group 3. Video encoder 20 and video decoder 30 may be configured to derive three linear models as:
PredC[x,y] = α1·Rec′L[x,y] + β1, if Rec′L[x,y] ≤ threshold 1
PredC[x,y] = α2·Rec′L[x,y] + β2, if threshold 1 < Rec′L[x,y] ≤ threshold 2
PredC[x,y] = α3·Rec′L[x,y] + β3, if Rec′L[x,y] > threshold 2
in one example, video encoder 20 and video decoder 30 may be configured to calculate the threshold using the method described above for the case when M is equal to 2. Video encoder 20 and video decoder 30 may be further configured to calculate that threshold 1 (e.g., as shown in fig. 7B) is the average of minV and the threshold. Video encoder 20 and video decoder 30 may be configured to calculate threshold 2 (e.g., as shown in fig. 7B) as an average of maxV and the threshold. The values of minV and maxV may be the minimum and maximum values, respectively, of neighboring coded luma samples, which may be downsampled if not in the 4:4:4 format.
In another example, video encoder 20 and video decoder 30 may be configured to calculate threshold 1 as 1/3 of sumV and threshold 2 as 2/3 of sumV, where sumV is the cumulative sum of the neighboring coded luma samples (which may be downsampled if not in a 4:4:4 format).
In another example, video encoder 20 and video decoder 30 may be configured to calculate threshold 1 as a value between S[N/3] and S[N/3+1], and threshold 2 as a value between S[2×N/3] and S[2×N/3+1]. In this example, N may be the total number of neighboring coded luma samples (which may be downsampled if not in a 4:4:4 format). S[0], S[1], … S[N-2], S[N-1] may be the ascending sorted sequence of the neighboring coded luma samples, which may be downsampled if not in a 4:4:4 format.
In another example, video encoder 20 and video decoder 30 may be configured to calculate the threshold using any of the methods described above for the case when M equals 2. Video encoder 20 and video decoder 30 may be further configured to calculate threshold 1 as an average of minV and the threshold, and threshold 2 as an average of maxV and the threshold. In this example, the values of minV and maxV may be the minimum and maximum values, respectively, over both: the neighboring coded luma samples (which may be downsampled if not in a 4:4:4 format), and the coded luma samples in the current block (which may be downsampled if not in a 4:4:4 format).
In another example, video encoder 20 and video decoder 30 may be further configured to calculate threshold 1 as 1/3 of sumV and threshold 2 as 2/3 of sumV. In this example, sumV may be the cumulative sum of both: the neighboring coded luma samples (which may be downsampled if not in a 4:4:4 format), and the coded luma samples in the current block (which may be downsampled if not in a 4:4:4 format).
In another example, video encoder 20 and video decoder 30 may be configured to calculate threshold 1 as a value between S[N/3] and S[N/3+1], and threshold 2 as a value between S[2×N/3] and S[2×N/3+1]. In this example, N may be the total number of: the neighboring coded luma samples (which may be downsampled if not in a 4:4:4 format), and the coded luma samples in the current block (which may be downsampled if not in a 4:4:4 format). S[0], S[1], … S[N-2], S[N-1] may be the ascending sorted sequence of: the neighboring coded luma samples (which may be downsampled if not in a 4:4:4 format), and the coded luma samples in the current block (which may be downsampled if not in a 4:4:4 format).
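A minimal sketch of this sorted-sequence alternative is shown below; taking the midpoint between the two bracketing values, and the function name, are illustrative assumptions within the freedom the text allows, and the sketch assumes enough samples that the indices stay in range.

#include <algorithm>
#include <vector>

// Sketch: derive threshold 1 and threshold 2 for the three-group case from
// the ascending sorted sequence S of the relevant coded luma samples.
static void deriveThresholdsFromSortedSamples(std::vector<int> samples,
                                              int& threshold1, int& threshold2) {
    std::sort(samples.begin(), samples.end());   // S[0] ... S[N-1], ascending
    const int N = static_cast<int>(samples.size());
    // Any value between S[N/3] and S[N/3+1] is allowed; the midpoint is used here.
    threshold1 = (samples[N / 3] + samples[N / 3 + 1]) / 2;
    threshold2 = (samples[2 * N / 3] + samples[2 * N / 3 + 1]) / 2;
}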
In one example, the derived linear relationships (represented as lines in fig. 7A-7E) of the groups may form a continuous piecewise-linear model, as can be seen in fig. 7A and 7B, where the linear models of adjacent groups yield the same value at the threshold(s), as shown in equations (8) and (9) below:
α1·Rec′L[x,y] + β1 = α2·Rec′L[x,y] + β2, if Rec′L[x,y] = threshold (8)
in FIG. 7A, and
α1·Rec′L[x,y] + β1 = α2·Rec′L[x,y] + β2 if Rec′L[x,y] = threshold 1, and α2·Rec′L[x,y] + β2 = α3·Rec′L[x,y] + β3 if Rec′L[x,y] = threshold 2 (9)
in fig. 7B.
In another example, the derived linear relationships of the groups may form a discontinuous piecewise-linear model, as in fig. 7C and 7E, where the linear models of adjacent groups do not yield the same value at the threshold(s), as shown in equations (10) and (11) below
α1·Rec′L[x,y] + β1 ≠ α2·Rec′L[x,y] + β2, if Rec′L[x,y] = threshold (10)
in FIG. 7C, and
α1·Rec′L[x,y] + β1 ≠ α2·Rec′L[x,y] + β2 if Rec′L[x,y] = threshold 1, and/or α2·Rec′L[x,y] + β2 ≠ α3·Rec′L[x,y] + β3 if Rec′L[x,y] = threshold 2 (11)
in fig. 7E.
To convert a discontinuous piecewise-linear model (e.g., the discontinuous piecewise-linear model shown in fig. 7C) to a continuous piecewise-linear model, video encoder 20 and video decoder 30 may be configured to generate a transition zone between two thresholds. The segment of the linear model in the transition zone connects the original linear models. In this case, the non-continuous two-model relationship results in a three-model relationship after the transformation (as shown in fig. 7D). Video encoder 20 and video decoder 30 may be configured to derive the boundaries of the transition zone (Z0 to Z1 in fig. 7D) based on the value of the original threshold used for classification and the values of the neighboring samples and/or the values of the current block samples.
In an example with a transition zone, the linear model prediction is formed as a weighted combination of the two original linear models whenever Rec′L[x,y] is in the transition zone [Z0, Z1].
In one example of the above-described method,
s = Z1 - Z0, ω1 = Z1 - Rec′L[x,y], ω2 = s - ω1.
In another example of the above-described method,
s = 2^n = Z1 - Z0, ω1 = Z1 - Rec′L[x,y], ω2 = s - ω1.
the transformed continuous piecewise-linear model may be used to replace a non-continuous piecewise-linear model, or may be inserted as an additional LM prediction mode.
With the MMLM techniques of this disclosure, more neighboring luma and/or chroma samples can be used to derive a linear model relative to earlier LM prediction mode techniques. Fig. 8A shows neighboring chroma samples used in the previous example of the LM mode. The same adjacent chroma samples may be used for the MMLM technique of this disclosure. Fig. 8B-8D are conceptual diagrams of other example groups of neighboring chroma samples used to derive a linear model in MMLM mode, according to examples of this disclosure. In fig. 8B-8D, more neighboring samples are used to derive a linear model with MMLM relative to fig. 8A. The black dots in fig. 8A-8D represent adjacent chroma samples used to derive two or more linear models in accordance with the MMLM technique of the present invention. White dots outside the block show other adjacent chroma samples that are unused. The white dots inside the frame represent the chroma samples of the block to be predicted. The corresponding down-sampled luma samples may also be used to derive a linear model.
FIG. 9 is a conceptual diagram of neighboring sample classification for one example of an MMLM technique in accordance with this disclosure. FIG. 9 illustrates a 4x4 currently coded chroma block (RecC) together with its coded neighboring chroma samples and the corresponding coded luma samples (Rec′L), which may be downsampled if not in a 4:4:4 format. According to one example, in MMLM mode, video encoder 20 and video decoder 30 may be configured to classify the neighboring coded luma samples into groups. In the example of fig. 9, the neighboring coded luma samples are classified into two groups. Neighboring luma samples with Rec′L[x,y] ≤ threshold may be classified into group 1, and neighboring samples with Rec′L[x,y] > threshold may be classified into group 2. In this example, the threshold may be, for example, 17. Video encoder 20 and video decoder 30 may be configured to classify the neighboring chroma samples according to the classification of the corresponding neighboring luma samples. That is, a chroma sample is classified into the same group as the corresponding luma sample at the same location.
As shown in fig. 9, each of the luma samples in the current block and the neighboring luma samples has an associated luma value, depicted with each circle. Neighboring luma values less than or equal to the threshold (in this case 17) are colored black (group 1). Luma values greater than the threshold remain white, i.e., uncolored (group 2). Neighboring chroma samples are classified into group 1 and group 2 based on the classification of the corresponding luma samples at the same locations.
Fig. 10 is a conceptual diagram of two linear models derived from neighboring coded luma samples classified into 2 groups. After the neighboring samples are classified into two groups (e.g., as shown in fig. 9), video encoder 20 and video decoder 30 may be configured to separately derive two independent linear models for the two groups, as depicted in fig. 10. In this example, the two linear models obtained for the two classes take the form PredC[x,y] = α1·Rec′L[x,y] + β1 for group 1 and PredC[x,y] = α2·Rec′L[x,y] + β2 for group 2.
The parameters of the linear models may be derived in the same manner as described above, with the parameters being derived for each linear model using samples for a particular classification group of the model.
Fig. 11 is a conceptual diagram in which one of two linear models, model 1, is applied to all pixels of a current block. Fig. 12 is a conceptual diagram of applying the other one of two linear models, model 2, to all pixels of a current block. In one example, video encoder 20 and video decoder 30 may be configured to apply one of model 1 or model 2 to all samples (Rec ″) of the downsampled luma block corresponding to the current coded chroma block, respectively, as shown in fig. 11 and 12'L) To obtain predicted chroma samples (Pred) of the current block c). In one example, video encoder 20 and video decoder 30 may be configured to form the predicted chroma blocks in parallel by two models. Then, a final prediction may be obtained by selecting a particular predicted chroma sample from the two predicted blocks based on the group classification for each position (i.e., based on the group classification of each luma value at each chroma position).
In another example, video encoder 20 and video decoder 30 may be configured to apply both model 1 and model 2 to all samples (Rec ') of the downsampled luma block corresponding to the current coded chroma block'L) To obtain two versions (Pred) of the predicted chroma samples of the current blockc). Video encoder 20 and video decoder 30 are further configured to calculate a weighted average of the two versions of the predicted chroma samples. The weighted average of the two prediction blocks (model 1 or model 2) may be considered as the final prediction block of the current chroma block. Any weighting may be used. As an example, a 0.5/0.5 weighting may be used.
FIG. 13 is a conceptual diagram of another example prediction technique of the MMLM technique according to this disclosure. As illustrated in FIG. 13, video encoder 20 and video decoder 30 may first classify the reconstructed luma samples (Rec′L) in the current block. Video encoder 20 and video decoder 30 may be further configured to apply a first linear model (e.g., model 1 of fig. 10) to the luma samples in the first classification group (represented by the black circles in fig. 13). Video encoder 20 and video decoder 30 may be further configured to apply a second linear model (e.g., model 2 of fig. 10) to the luma samples in the second classification group (represented by the white circles in fig. 13).
In the example of fig. 13, the coded luma samples (downsampled if not in a 4:4:4 format) may be classified into two groups depending on the intensity (e.g., value) of the samples. Luma samples having a value less than or equal to the threshold (e.g., Rec′L[x,y] ≤ threshold) may be classified into group 1, and luma samples having values greater than the threshold (e.g., Rec′L[x,y] > threshold) may be classified into group 2. In this example, the threshold may be 17, which is calculated using neighboring coded luma samples, as described above. In one example, the classification method for the reconstructed luma samples in the current block is the same as the classification method for the coded neighboring luma samples.
As illustrated in fig. 13, video encoder 20 and video decoder 30 may be configured to apply model 1 to the coded luma samples (downsampled if not in a 4:4:4 format) of the current block in the first classification group (black circles) to derive the corresponding predicted chroma samples in the current block. Likewise, video encoder 20 and video decoder 30 may be configured to apply model 2 to the coded luma samples (downsampled if not in a 4:4:4 format) of the current block in the second classification group (white circles) to derive the corresponding predicted chroma samples in the current block. Thus, the predicted chroma samples in the current block are derived from two linear models. When there are more groups, more linear models may be used to obtain the predicted chroma samples.
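A minimal sketch of this per-sample application of the two models is shown below, assuming the threshold and the two (α, β) pairs have already been derived as described above; the fixed-point scaling and the function name are illustrative assumptions.

#include <cstdint>
#include <vector>

// Sketch: apply model 1 or model 2 to each downsampled luma sample of the
// current block, depending on its classification, to form the chroma prediction.
static void predictChromaMMLM(const std::vector<int16_t>& recLp,  // downsampled luma of the block
                              std::vector<int16_t>& predC,        // output chroma prediction
                              int threshold,
                              int alpha1, int beta1,              // model 1 (luma <= threshold)
                              int alpha2, int beta2,              // model 2 (luma >  threshold)
                              int shift) {                        // fixed-point precision of alpha
    predC.resize(recLp.size());
    for (size_t i = 0; i < recLp.size(); ++i) {
        const int l = recLp[i];
        const bool inGroup1 = (l <= threshold);
        const int a = inGroup1 ? alpha1 : alpha2;
        const int b = inGroup1 ? beta1 : beta2;
        predC[i] = static_cast<int16_t>(((a * l) >> shift) + b);
    }
}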
In one example, video encoder 20 may signal to video decoder 30 the number of groups into which the luma samples should be classified. If the number is 1, then the original LM mode is utilized. In another example, LM modes having different numbers of groups may be considered different LM modes. For example, the LM-MM1 mode uses 1 group, the LM-MM2 mode uses 2 groups, and the LM-MM3 mode uses 3 groups. LM-MM1 may be identical to the original LM mode, whereas LM-MM2 and LM-MM3 may be performed in accordance with the techniques of this disclosure. In yet another example, video decoder 30 may derive the number of groups without video encoder 20 signaling the number of groups.
In another example of the present disclosure, a multi-filter lm (mflm) mode is described. In the MFLM mode, more than one luma downsampling filter may be defined if the video data is not in a 4:4:4 chroma subsampling format. For example, an additional downsampling filter other than the downsampling filter defined in JEM-3.0 (shown in FIG. 6 of this disclosure) may be used. The filter may be of the form:
Rec′L[x,y]=a·RecL[2x,2y]+b·RecL[2x,2y+1]+c·RecL[2x-1,2y]+d·RecL[2x+1,2y]+e·RecL[2x-1,2y+1]+f·RecL[2x+1,2y+1]+g, (12)
wherein the filter weights a, b, c, d, e, f, g are real numbers.
or, alternatively,
Rec′L[x,y]=(a·RecL[2x,2y]+b·RecL[2x,2y+1]+c·RecL[2x-1,2y]+d·RecL[2x+1,2y]+e·RecL[2x-1,2y+1]+f·RecL[2x+1,2y+1]+g)/h, (13)
wherein the weights a, b, c, d, e, f, g, h of the filter are integers.
Or
Rec′L[x,y]=(a·RecL[2x,2y]+b·RecL[2x,2y+1]+c·RecL[2x-1,2y]+d·RecL[2x+1,2y]+e·RecL[2x-1,2y+1]+f·RecL[2x+1,2y+1]+g)>>h, (14)
Wherein the weights a, b, c, d, e, f, g, h of the filter are integers.
FIGS. 14A-C are conceptual diagrams of a luma sub-sampling filter according to an example of this disclosure. In the example of fig. 14A-14C, the triangle symbols represent downsampled luma values, while the circle symbols represent the original reconstructed luma samples (i.e., prior to any downsampling). The lines represent which of the original luma samples were used to generate downsampled luma values according to each particular downsampling filter. The equations for the various downsampling filters depicted in FIGS. 14A-14C are shown as follows:
(a)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x+1,2y]+1)>>1;
(b)Rec′L[x,y]=(RecL[2x+1,2y]+RecL[2x+1,2y+1]+1)>>1;
(c)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+1)>>1;
(d)Rec′L[x,y]=(RecL[2x,2y+1]+RecL[2x+1,2y+1]+1)>>1;
(e)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x+1,2y+1]+1)>>1;
(f)Rec′L[x,y]=(RecL[2x,2y+1]+RecL[2x+1,2y]+1)>>1;
(g)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+RecL[2x-1,2y]+RecL[2x-1,2y+1]+2)>>2;
(h)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+RecL[2x+1,2y]+RecL[2x+1,2y+1]+2)>>2;
(i)Rec′L[x,y]=(2·RecL[2x,2y]+RecL[2x+1,2y]+RecL[2x-1,2y]+2)>>2;
(j)Rec′L[x,y]=(2·RecL[2x,2y+1]+RecL[2x+1,2y+1]+RecL[2x-1,2y+1]+2)>>2;
(k)Rec′L[x,y]=(RecL[2x-1,2y]+RecL[2x-1,2y+1]+1)>>1;
(l)Rec′L[x,y]=(RecL[2x,2y+1]+RecL[2x-1,2y+1]+1)>>1;
(m)Rec′L[x,y]=(RecL[2x-1,2y]+RecL[2x,2y+1]+1)>>1;
(n)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x-1,2y+1]+1)>>1;
(o)Rec′L[x,y]=(RecL[2x,2y]+RecL[2x-1,2y]+1)>>1;
(p)Rec′L[x,y]=(2·RecL[2x+1,2y]+RecL[2x+1,2y]+RecL[2x+1,2y+1]+2)>>2;
(q)Rec′L[x,y]=(2·RecL[2x+1,2y+1]+RecL[2x,2y+1]+RecL[2x+1,2y]+2)>>2;
(r)Rec′L[x,y]=(5·RecL[2x,2y+1]+RecL[2x-1,2y+1]+RecL[2x+1,2y+1]+RecL[2x,2y]+4)>>3;
If the video sequence is not in a 4:4:4 chroma subsampling format (i.e., if the chroma components are subsampled), video encoder 20 and video decoder 30 may be configured to perform MFLM using the original LM mode (e.g., the single-model LM mode) and one or more of the filters shown in fig. 14A-14C (or any set of filters other than the one defined in JEM-3.0 and shown in fig. 6). Furthermore, the MFLM techniques of the present disclosure may be used in conjunction with the MMLM techniques described above.
In some examples, video encoder 20 and video decoder 30 may be pre-configured to use one of several candidate downsampling filters, such as 5 filters. Video encoder 20 may determine the best filter for a given video sequence (e.g., based on a rate-distortion test) and signal the filter index to video decoder 30 in the encoded video bitstream. The filter index may be signaled at any syntax element level: the sequence level (e.g., in the VPS/SPS), the picture level (e.g., in the PPS), the slice level (e.g., in a slice header or slice segment header), the coding tree unit level (in a CTU), the coding unit level (in a CU), the prediction unit level (in a PU), the transform unit level (in a TU), or any other level.
In one example, five candidate filters may be shown as follows:
the filter 0: rec'L[x,y]=(2·RecL[2x,2y]+2·RecL[2x,2y+1]+RecL[2x-1,2y]+RecL[2x+1,2y]+RecL[2x-1,2y+1]+RecL[2x+1,2y+1]+4)>>3,
The filter 1: rec'L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+RecL[2x+1,2y]+RecL[2x+1,2y+1]+2)>>2;
And (3) a filter 2: rec'L[x,y]=(RecL[2x,2y]+RecL[2x+1,2y]+1)>>1;
And (3) a filter: rec'L[x,y]=(RecL[2x+1,2y]+RecL[2x+1,2y+1]+1)>>1;
And (4) the filter: rec'L[x,y]=(RecL[2x,2y+1]+RecL[2x+1,2y+1]+1)>>1;
Filter 0 is the original 6-tap filter in JEM-3.0.
LM modes having different filters may be considered different LM modes, such as LM-MF0, LM-MF1, LM-MF2, LM-MF3, and LM-MF 4. In the above example, LM-MF0 is identical to the original LM mode. In another example, video decoder 30 may derive the downsampling filter without video encoder 20 signaling the downsampling filter. The filtered result may be clipped to a range of valid luma values.
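The selection among these candidate filters can be sketched as follows; the index name lmFilterIdx, the array layout, and the assumption of an interior (non-boundary) position are illustrative.

#include <cstdint>
#include <vector>

// Sketch: apply the signaled candidate downsampling filter (filters 0..4 above)
// at chroma position (x, y); assumes columns 2x-1 and 2x+1 are inside the picture.
static int mflmDownsample(const std::vector<int16_t>& recL, int stride,
                          int x, int y, int lmFilterIdx) {
    const int x2 = 2 * x, y2 = 2 * y;
    const int p00 = recL[y2 * stride + x2],       p10 = recL[y2 * stride + x2 + 1];
    const int p01 = recL[(y2 + 1) * stride + x2], p11 = recL[(y2 + 1) * stride + x2 + 1];
    const int m00 = recL[y2 * stride + x2 - 1],   m01 = recL[(y2 + 1) * stride + x2 - 1];
    switch (lmFilterIdx) {
    case 0:  return (2 * p00 + 2 * p01 + m00 + p10 + m01 + p11 + 4) >> 3;  // original JEM-3.0 6-tap
    case 1:  return (p00 + p01 + p10 + p11 + 2) >> 2;
    case 2:  return (p00 + p10 + 1) >> 1;
    case 3:  return (p10 + p11 + 1) >> 1;
    case 4:  return (p01 + p11 + 1) >> 1;
    default: return p00;  // unknown index: pass the co-located sample through
    }
}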
Fig. 15 is a flow chart of signaling in LM Angle Prediction (LAP) mode according to an example of the present invention. With LM Angle Prediction (LAP), some type of angular prediction, which may include directional, DC, planar, or other non-cross-component intra prediction, may be combined with LM prediction techniques, including the MMLM techniques of this disclosure, to obtain a final prediction of a chroma block. A syntax element, such as a flag named LAP_flag, may be signaled if the current chroma block is coded by conventional intra prediction rather than in any LM mode. Assuming that the prediction mode of the current chroma block is mode X, X may be some type of intra angular prediction (including planar mode and DC mode). Note that if the current chroma block is signaled as DM mode, the current chroma block is also considered to be in angular mode, since DM mode is identical to some angular prediction mode of the corresponding luma block.
An example of signaling LAP prediction modes is illustrated in fig. 15. Video decoder 30 may determine whether the LM mode was used to encode the current chroma block (120). If so, video decoder 30 continues to decode the current chroma block using the LM mode used by video encoder 20 (124). If not, video decoder 30 reads and parses LAP _ flag (122). If the LAP _ flag indicates that the LAP prediction mode is to be used (e.g., LAP _ flag is 1), video decoder 30 decodes the current chroma block using the LAP prediction mode (128). If the LAP _ flag indicates that the LAP prediction mode is not used (e.g., LAP _ flag is 0), video decoder 30 decodes the current chroma block using angular prediction (126).
With LAP, two prediction patterns are first generated for the chroma block, and then the two prediction patterns are combined together. One prediction pattern may be generated using one of several angular prediction modes (e.g., angular mode X). The other prediction pattern may be generated by an LM mode, such as the LM-MM2 mode described above.
Fig. 16 is a block diagram of LAP in accordance with an example of the invention. As illustrated in fig. 16, in one example of LAP, the first prediction for each sample in the current block may be generated by angular prediction mode X, which is denoted as P1(x, y). Then, a second prediction for each sample in the current block may be generated by the LM-MM2 mode, which is denoted as P2(x, y). The final LM-angle prediction may then be calculated as
P(x,y)=w1(x,y)×P1(x,y)+w2(x,y)×P2(x,y), (15)
Where (x, y) represents the coordinates of the samples in the block, and w1(x, y) and w2(x, y) are real numbers. In one example, w1 and w2 may have values of 0.5. In equation (15), w1(x, y) and w2(x, y) may satisfy:
w1(x,y)+w2(x,y)=1。 (16)
in another example of the above-described method,
P(x,y)=(w1(x,y)×P1(x,y)+w2(x,y)×P2(x,y)+a)/b, (17)
wherein w1(x, y), w2(x, y), a and b are integers.
In equation (17), w1(x, y) and w2(x, y) may satisfy:
w1(x,y)+w2(x,y)=b。 (18)
in another example of the above-described method,
P(x,y)=(w1(x,y)×P1(x,y)+w2(x,y)×P2(x,y)+a)>>b, (19)
wherein w1(x, y), w2(x, y), a and b are integers.
In equation (19), w1(x, y) and w2(x, y) may satisfy:
w1(x,y)+w2(x,y)=2^b. (20)
in one example, w1(x, y) and w2(x, y) may be different for different (x, y). In another example, w1(x, y) and w2(x, y) may remain unchanged for all (x, y). In one example, for all (x, y),
P(x,y)=(P1(x,y)+P2(x,y)+1)>>1. (21)
for all (x, y).
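A sketch of this equal-weight special case, together with the clipping to the valid chroma range mentioned below, is shown here; the function name and the derivation of the clipping range from the bit depth are illustrative assumptions.

#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: combine the angular prediction P1 and the LM prediction P2 with
// equal weights, as in equation (21), and clip to the valid chroma range.
static void lapCombine(const std::vector<int16_t>& P1, const std::vector<int16_t>& P2,
                       std::vector<int16_t>& P, int bitDepth) {
    const int maxVal = (1 << bitDepth) - 1;
    P.resize(P1.size());
    for (size_t i = 0; i < P1.size(); ++i) {
        const int v = (P1[i] + P2[i] + 1) >> 1;
        P[i] = static_cast<int16_t>(std::clamp(v, 0, maxVal));
    }
}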
In one example, LAP_flag may be coded using CABAC. The coding context may depend on the coded/decoded LAP_flags of the neighboring blocks. For example, there may be three contexts for LAP_flag: LAPctx[0], LAPctx[1], and LAPctx[2]. Fig. 17 is a conceptual diagram of neighboring blocks of a current block. The variable ctx is calculated as ctx = LAP_flag_A + LAP_flag_B, where LAP_flag_A and LAP_flag_B are the LAP_flags of neighboring blocks A and B or neighboring blocks A1 and B1, respectively, as illustrated in fig. 17. In one example, P(x, y) may be clipped to the range of valid chroma values.
Using the proposed method of the present invention, there may be more types of LM modes used than those specified in JEM-3.0. This disclosure further describes efficient ways to code chroma intra prediction modes for a particular block. In general, video encoder 20 and video decoder 30 may be configured to code the LM prediction mode used (including possible MMLM, MFLM, or combined MMLM and MFLM modes) depending on the chroma intra prediction modes of neighboring blocks and/or other information of the current block. Video encoder 20 and video decoder 30 may be configured to code the LM prediction mode used such that the most probable mode to be used is coded by the smallest codeword used to specify the mode. In this way, fewer bits may be used to indicate the LM mode. Which modes are specified by the smallest codeword may be adaptive based on the chroma intra prediction modes of neighboring blocks and/or other information of the current block.
In one example, some LM modes, such as LM, LM-MM2 (2 linear models), LM-MM3 (3 linear models), LM-MF1, LM-MF2, LM-MF3, and LM-MF4, are candidate LM modes. The mode LM-MFX may indicate a particular LM mode that uses a particular subset of the luma downsampling filters. An LM-MF mode can use a single-linear-model LM mode or an MMLM according to the techniques of this disclosure. In this example, there are 7 candidate LM modes, and a non-LM mode is appended to represent the case where the current block is coded with an angular mode and not an LM mode. If non-LM is signaled, then the angular mode is signaled as in JEM-3.0, or in any other way. The proposed LM mode signaling method is not limited to the particular LM prediction modes described. The coding methods, including codeword mapping and binarization, etc., may be applied to any other kind of LM mode or chroma intra prediction mode signaling. Video encoder 20 and video decoder 30 first code DM_flag. If the chroma prediction mode is not the DM mode, the proposed LM_coding() module is invoked to indicate the current chroma prediction mode. If the LM_coding() module codes the non-LM mode, then the Chroma_intra_mode_coding() module is invoked to code the angular chroma prediction mode. Exemplary decode logic is as follows.
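The exemplary decode logic referenced above does not survive in this text; a sketch consistent with the description, with the decode helpers represented as placeholder callbacks, might look as follows.

#include <functional>

// Placeholder mode identifiers and helper names; they are illustrative only.
enum class ChromaMode { DM, LM, LM_MM2, LM_MM3, LM_MF1, LM_MF2, LM_MF3, LM_MF4, NonLM, Angular };

static ChromaMode parseChromaMode(const std::function<bool()>& decodeDmFlag,
                                  const std::function<ChromaMode()>& lmCoding,
                                  const std::function<ChromaMode()>& chromaIntraModeCoding) {
    if (decodeDmFlag())                 // DM_flag is coded first
        return ChromaMode::DM;
    ChromaMode mode = lmCoding();       // LM, LM-MM2, LM-MM3, LM-MF1..LM-MF4, or non-LM
    if (mode == ChromaMode::NonLM)
        mode = chromaIntraModeCoding(); // fall back to angular chroma mode coding
    return mode;
}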
To signal the 8 possible modes (including non-LM modes), 8 symbols with different codewords or binarization, i.e., 0,1, …,6,7, can be used to represent the 8 possible modes. The symbols with the smaller numbers should not have a code length (in bits) that is longer than the code length of the symbols with the larger numbers. The symbols may be binarized in any manner, such as fixed length codes, unary codes, truncated unary codes, exponential golomb codes, and so on. Another exemplary binarization of each symbol is as follows:
0:00
1:01
2:100
3:101
4:1100
5:1101
6:1110
7:1111
in another example, the codeword for each symbol is as follows:
0:0
1:100
2:101
3:1100
4:1101
5:1110
6:11110
7:11111
in one example, video encoder 20 and video decoder 30 may be configured to perform a default mapping between symbols and modes, i.e., a mapping between coded values and coding modes. For example, the default mapping list may be:
0:LM
1:LM-MM2
2:LM-MM3
3:LM-MF1
4:LM-MF2
5:LM-MF3
6:LM-MF4
7: non-LM
According to one example, the mapping may be fixed. In another example, the mapping may be dynamic according to decoded information of neighboring blocks and/or decoded information of the current block. In one example, a symbol of a mode non-LM may be inserted into the mapping list depending on the number of neighboring chroma blocks coded in LM mode, denoted K. In one example, neighboring chroma blocks may be defined as five blocks used in the merge candidate list construction process, i.e., a0, a1, B0, B1, and B2 as shown in fig. 17. Then, the symbol mapping list may be:
-if K ═ 0, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF 4;
-if 0< K < ═ 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF 4;
-if K >3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: a non-LM;
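A sketch of how this first adaptive mapping could be constructed is shown below: only the position of the non-LM symbol changes with K, the number of neighboring chroma blocks coded in LM mode. The enum and function names are illustrative.

#include <vector>

enum ChromaMode { LM, LM_MM2, LM_MM3, LM_MF1, LM_MF2, LM_MF3, LM_MF4, NON_LM };

// Builds the symbol-to-mode list for the K-dependent example above; the non-LM
// symbol is placed earlier in the list when fewer neighboring blocks use LM modes.
std::vector<ChromaMode> buildSymbolMap(int K) {
    std::vector<ChromaMode> list = { LM, LM_MM2, LM_MM3, LM_MF1, LM_MF2, LM_MF3, LM_MF4 };
    int nonLmPos = (K == 0) ? 1 : (K <= 3 ? 3 : 7);  // position of the non-LM symbol
    list.insert(list.begin() + nonLmPos, NON_LM);
    return list;
}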
In another example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks that are not coded in LM mode, denoted K'. Then, the symbol mapping list may be:
- if K' == 5, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if 2 <= K' < 5, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if K' < 2, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In another example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks, denoted K', that are not coded in intra mode and not in LM mode. Then, the symbol mapping list may be:
- if K' >= 3, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if 2 <= K' < 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if 1 <= K' < 2, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: non-LM, 6: LM-MF3, 7: LM-MF4;
- if K' == 0, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In another example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks, denoted K', that are not coded in intra mode and not in LM mode. Then, the symbol mapping list may be:
- if K' >= 3, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if 1 <= K' < 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if K' == 0, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: non-LM, 6: LM-MF3, 7: LM-MF4;
In another example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks, denoted K', that are not coded in intra mode and not in LM mode. Then, the symbol mapping list may be:
- if K' >= 3, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if 2 <= K' < 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if K' < 2, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In another example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks, denoted K', that are not coded in intra mode and not in LM mode. Then, the symbol mapping list may be:
- if K' >= 3, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if 1 <= K' < 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
- if K' == 0, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In some examples, use of the proposed LM modes may depend on the block size. In one example, if the size of the current chroma block is M×N, LM-X does not apply when M×N <= T. T may be a fixed number, or the value of T may be signaled from video encoder 20 to video decoder 30. LM-X may be any proposed new LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4.
In another example, if the size of the current chroma block is M×N, LM-X does not apply when M + N <= T. T may be a fixed number, or the value of T may be signaled from video encoder 20 to video decoder 30. LM-X may be any proposed new LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4.
In yet another example, if the size of the current chroma block is M×N, LM-X does not apply when Min(M, N) <= T. T may be a fixed number, or the value of T may be signaled from video encoder 20 to video decoder 30. LM-X may be any proposed new LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4.
In yet another example, if the size of the current chroma block is M×N, LM-X does not apply when Max(M, N) <= T. T may be a fixed number, or the value of T may be signaled from video encoder 20 to video decoder 30. LM-X may be any proposed new LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4.
The use of the proposed LAP may also depend on the block size. In one example, LAP does not apply when M×N <= T. T may be a fixed number, or the value of T may be signaled from video encoder 20 to video decoder 30. In another example, LAP does not apply when M + N <= T. In yet another example, LAP does not apply when Min(M, N) <= T. In yet another example, LAP does not apply when Max(M, N) <= T. For example, T may be an integer, such as 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, and so on.
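A sketch of one possible size gate for these restrictions follows; which of the four tests is used, and the value of T, are configuration choices (T may be fixed or signaled), and the function name is illustrative.

#include <algorithm>

// Returns whether an LM-X mode or LAP may be used for an M x N chroma block.
bool lmOrLapAllowed(int M, int N, int T) {
    return M * N > T;                 // variant 1: disabled when M*N <= T
    // return M + N > T;              // variant 2: disabled when M+N <= T
    // return std::min(M, N) > T;     // variant 3: disabled when min(M,N) <= T
    // return std::max(M, N) > T;     // variant 4: disabled when max(M,N) <= T
}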
FIG. 18 is a flow chart illustrating an example encoding method of the present invention. The techniques of fig. 18 may be performed by one or more components of video encoder 20.
In one example of this disclosure, video encoder 20 may be configured to encode a block of luma samples for a first block of video data (132), reconstruct an encoded block of luma samples to generate reconstructed luma samples (134), and predict chroma samples for the first block of video data using the reconstructed luma samples for the first block of video data and two or more linear prediction models (136).
In another example of this disclosure, video encoder 20 may be configured to determine parameters for each of the two or more linear prediction models using luma samples and chroma samples from a block of video data that neighbors the first block of video data. In one example, video encoder 20 may be configured to classify reconstructed luma samples greater than a first threshold as being in a first sample group of a plurality of sample groups, classify reconstructed luma samples less than or equal to the first threshold as being in a second sample group of the plurality of sample groups, apply a first linear prediction model of the two or more linear prediction models to reconstructed luma samples in the first sample group, apply a second linear prediction model of the two or more linear prediction models to reconstructed luma samples in the second sample group, the second linear prediction model being different from the first linear prediction model, and determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model. In one example, the first threshold depends on neighboring coded luma and chroma samples.
In another example of this disclosure, video encoder 20 may be configured to downsample the reconstructed luma samples. In another example of this disclosure, video encoder 20 may be configured to determine one of a plurality of downsampling filters to use to downsample the reconstructed luma samples, downsample the reconstructed luma samples using the determined downsampling filter to generate downsampled luma samples, and predict chroma samples of the first block of video data using the downsampled luma samples and the two or more linear prediction models.
In another example of this disclosure, video encoder 20 may be configured to determine whether chroma samples of a second block of video data are coded using a linear prediction model of two or more linear prediction models. In the case that chroma samples of the second block of video data are not coded using a linear prediction model, video encoder 20 may be configured to determine that a linear mode angular prediction mode is enabled, apply an angular mode prediction pattern to chroma samples of the second block of video data to generate first predicted chroma values, apply a linear model prediction pattern to corresponding luma samples of the second block of video data to generate second predicted chroma values, and determine a final block of the second block of video data having predicted chroma values by determining a weighted average of the first predicted chroma values and the second predicted chroma values.
In another example of this disclosure, video encoder 20 may be configured to determine, relative to the first block of video data, a number of neighboring chroma blocks coded using a linear prediction model coding mode, and dynamically change a particular type of codeword used to indicate the linear prediction model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear prediction model coding mode. In one example, video encoder 20 may be configured to use a first symbol mapping list based on the number of neighboring chroma blocks of video data coded using the linear prediction model coding mode being zero, use a second symbol mapping list based on the number of neighboring chroma blocks of video data coded using the linear prediction model coding mode being below a threshold, and use a third symbol mapping list based on the number of neighboring chroma blocks of video data coded using the linear prediction model coding mode being greater than the threshold.
FIG. 19 is a flow diagram illustrating an example decoding method of the present invention. The techniques of FIG. 19 may be performed by one or more components of video decoder 30.
In one example of this disclosure, video decoder 30 may be configured to receive an encoded block of luma samples for a first block of video data (142), decode the encoded block of luma samples to generate reconstructed luma samples (144), and predict chroma samples for the first block of video data using the reconstructed luma samples for the first block of video data and two or more linear prediction models (146).
In another example of this disclosure, video decoder 30 may be configured to determine parameters for each of the two or more linear prediction models using luma samples and chroma samples from a block of video data that neighbors the first block of video data. In one example, video decoder 30 may be configured to classify reconstructed luma samples greater than a first threshold as being in a first sample group of a plurality of sample groups, classify reconstructed luma samples less than or equal to the first threshold as being in a second sample group of the plurality of sample groups, apply a first linear prediction model of the two or more linear prediction models to reconstructed luma samples in the first sample group, apply a second linear prediction model of the two or more linear prediction models to reconstructed luma samples in the second sample group, the second linear prediction model being different from the first linear prediction model, and determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model. In one example, the first threshold depends on neighboring coded luma and chroma samples.
In another example of this disclosure, video decoder 30 may be configured to down-sample the reconstructed luma samples. In another example of this disclosure, video decoder 30 may be configured to determine one of a plurality of downsampling filters to use to downsample reconstructed luma samples, downsample the reconstructed luma samples using the determined downsampling filters to generate downsampled luma samples, and predict chroma samples of the first video data block using the downsampled luma samples and two or more linear prediction models.
In another example of this disclosure, video decoder 30 may be configured to determine whether chroma samples of a second block of video data are coded using a linear prediction model of the two or more linear prediction models. In the case that chroma samples of the second block of video data are not coded using a linear prediction model, video decoder 30 may be configured to determine that a linear mode angular prediction mode is enabled, apply an angular mode prediction pattern to chroma samples of the second block of video data to generate first predicted chroma values, apply a linear model prediction pattern to corresponding luma samples of the second block of video data to generate second predicted chroma values, and determine a final block of the second block of video data having predicted chroma values by determining a weighted average of the first predicted chroma values and the second predicted chroma values.
In another example of this disclosure, video decoder 30 may be configured to determine, relative to the first block of video data, a number of neighboring chroma blocks coded using a linear prediction model coding mode, and dynamically change a particular type of codeword used to indicate the linear prediction model coding mode based on the determined number of neighboring chroma blocks of video data coded using the linear prediction model coding mode. In one example, video decoder 30 may be configured to use a first symbol mapping list based on the number of neighboring chroma blocks of video data coded using the linear prediction model coding mode being zero, use a second symbol mapping list based on the number of neighboring chroma blocks of video data coded using the linear prediction model coding mode being below a threshold, and use a third symbol mapping list based on the number of neighboring chroma blocks of video data coded using the linear prediction model coding mode being greater than the threshold.
FIG. 20 is a flow diagram illustrating an example method for encoding a current block. The current block may include the current CU or a portion of the current CU. Although described with respect to video encoder 20 (fig. 1 and 2), it should be understood that other devices may be configured to perform a method similar to that of fig. 20.
In this example, video encoder 20 initially predicts the current block (150). For example, video encoder 20 may calculate one or more Prediction Units (PUs) for the current block. Video encoder 20 may then calculate a residual block for the current block, e.g., to generate a Transform Unit (TU) (152). To calculate the residual block, video encoder 20 may calculate the difference between the original un-coded block and the predicted block for the current block. Video encoder 20 may then transform and quantize the coefficients of the residual block (154). Video encoder 20 may then scan the quantized transform coefficients of the residual block (156). During or after scanning, video encoder 20 may entropy encode the coefficients (158). For example, video encoder 20 may encode the coefficients using CAVLC or CABAC. Video encoder 20 may then output entropy coded data for the coefficients of the block (160).
Fig. 21 is a flow diagram illustrating an example method for decoding a current block of video data. The current block may include the current CU or a portion of the current CU. Although described with respect to video decoder 30 (fig. 1 and 3), it should be understood that other devices may be configured to perform a method similar to that of fig. 21.
Video decoder 30 may predict the current block (200), e.g., using an intra or inter prediction mode, to calculate a predicted block for the current block. Video decoder 30 may also receive entropy coded data for the current block, such as entropy coded data for coefficients of a residual block corresponding to the current block (202). Video decoder 30 may entropy decode the entropy coded data to recover coefficients of the residual block (204). Video decoder 30 may then inverse scan the recovered coefficients (206) to generate a block of quantized transform coefficients. Video decoder 30 may then inverse quantize and inverse transform the coefficients to generate a residual block (208). Video decoder 30 may finally decode the current block by combining the predicted block and the residual block (210).
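A sketch of the final reconstruction step (210) follows; entropy decoding, inverse scan, inverse quantization, and the inverse transform are omitted, and the function name is illustrative.

#include <algorithm>
#include <cstddef>
#include <vector>

// Combines the predicted block with the decoded residual block and clips the
// result to the valid sample range [0, maxVal].
std::vector<int> reconstructBlock(const std::vector<int>& pred,
                                  const std::vector<int>& residual, int maxVal) {
    std::vector<int> recon(pred.size());
    for (std::size_t i = 0; i < pred.size(); ++i)
        recon[i] = std::min(std::max(pred[i] + residual[i], 0), maxVal);
    return recon;
}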
The following summarizes examples of the invention discussed above. The above examples of LM prediction using multi-model methods, multi-filter methods, and LM angular prediction may be applied individually or in any combination. There may be more than one linear model between the luma and chroma components in a coding block/Coding Unit (CU)/Transform Unit (TU). Neighboring luma and chroma samples of the current block may be classified into groups, and each group may be used as a training set to derive a linear model (i.e., specific alpha and beta values are derived for a specific group). In one example, the classification of a sample may be based on the intensity or position of the sample. In another example, the classification method may be signaled from the encoder to the decoder.
In one example, as shown in fig. 7A, the neighboring samples may be classified into two groups. Neighboring samples with Rec′L[x,y] ≤ threshold may be classified into group 1, and neighboring samples with Rec′L[x,y] > threshold may be classified into group 2. In one example, the threshold may be calculated depending on the neighboring coded luma/chroma samples and the coded luma samples in the current block. In one example, the threshold may be calculated as the average of the neighboring coded luma samples (which may be downsampled if not in the 4:4:4 format). In another example, the threshold may be calculated as the median value of the neighboring coded luma samples (which may be downsampled if not in the 4:4:4 format). In yet another example, the threshold may be calculated as the average of minV and maxV, where minV and maxV are the minimum and maximum values, respectively, of the neighboring coded luma samples (which may be downsampled if not in the 4:4:4 format). In another example, the threshold may be calculated as the average of the neighboring coded luma samples and the coded luma samples in the current block (which may be downsampled if not in the 4:4:4 format). In another example, the threshold may be calculated as the median of the neighboring coded luma samples and the coded luma samples in the current block (which may be downsampled if not in the 4:4:4 format). In another example, the threshold may be calculated as the average of minV and maxV, where minV and maxV are the minimum and maximum values, respectively, of the neighboring coded luma samples and the coded luma samples in the current block (which may be downsampled if not in the 4:4:4 format). In one example, the threshold may be signaled from encoder 20 to decoder 30.
In one example, as illustrated in fig. 7B, neighboring samples may be classified into three groups. Neighboring samples with Rec′L[x,y] ≤ threshold 1 may be classified into group 1; neighboring samples with threshold 1 < Rec′L[x,y] ≤ threshold 2 may be classified into group 2; and neighboring samples with Rec′L[x,y] > threshold 2 may be classified into group 3. In one example, threshold 1 and threshold 2 may be calculated depending on the neighboring coded luma/chroma samples and the coded luma samples in the current block. In one example, the threshold may first be calculated as described above. Threshold 1 may then be calculated as the average of minV and the threshold, and threshold 2 may be calculated as the average of maxV and the threshold, where minV and maxV are the minimum and maximum values, respectively, of the neighboring coded luma samples (which may be downsampled if not in the 4:4:4 format). In another example, threshold 1 may be calculated as 1/3 of sumV and threshold 2 as 2/3 of sumV, where sumV is the cumulative sum of the neighboring coded luma samples (which may be downsampled if not in the 4:4:4 format). In another example, threshold 1 may be calculated as a value between S[N/3] and S[N/3+1], and threshold 2 as a value between S[2*N/3] and S[2*N/3+1], where N is the total number of neighboring coded luma samples (which may be downsampled if not in the 4:4:4 format) and S[0], S[1], … S[N-2], S[N-1] is the ascending sorted sequence of those samples. In another example, the threshold may first be calculated as described above; threshold 1 may then be calculated as the average of minV and the threshold, and threshold 2 as the average of maxV and the threshold, where minV and maxV are the minimum and maximum values, respectively, of the neighboring coded luma samples and the coded luma samples in the current block (each of which may be downsampled if not in the 4:4:4 format). In another example, threshold 1 may be calculated as 1/3 of sumV and threshold 2 as 2/3 of sumV, where sumV is the cumulative sum of the neighboring coded luma samples and the coded luma samples in the current block (each of which may be downsampled if not in the 4:4:4 format). In another example, threshold 1 may be calculated as a value between S[N/3] and S[N/3+1], and threshold 2 as a value between S[2*N/3] and S[2*N/3+1], where N is the total number of neighboring coded luma samples and coded luma samples in the current block (each of which may be downsampled if not in the 4:4:4 format) and S[0], S[1], … S[N-2], S[N-1] is the ascending sorted sequence of those samples. In one example, threshold 1 and threshold 2 may be signaled from encoder 20 to decoder 30. In one example, more neighboring samples may be used to derive the above linear models, e.g., as in the examples shown in figs. 8A-8D.
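A sketch of one of these threshold derivations follows: the single threshold is taken as the mean of the neighboring (downsampled) luma samples, and for three groups, threshold 1 and threshold 2 are the averages of (minV, threshold) and (maxV, threshold). Integer arithmetic and the struct/function names are illustrative; the input is assumed non-empty.

#include <algorithm>
#include <numeric>
#include <vector>

struct Thresholds { int t; int t1; int t2; };

// Derives the two-group threshold t and the three-group thresholds t1, t2 from
// the neighboring coded (and, if needed, downsampled) luma samples.
Thresholds deriveThresholds(const std::vector<int>& neighLuma) {
    int sum  = std::accumulate(neighLuma.begin(), neighLuma.end(), 0);
    int t    = sum / static_cast<int>(neighLuma.size());          // mean of neighbors
    int minV = *std::min_element(neighLuma.begin(), neighLuma.end());
    int maxV = *std::max_element(neighLuma.begin(), neighLuma.end());
    return { t, (minV + t) / 2, (maxV + t) / 2 };
}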
In one example, model 1 or model 2 derived in MMLM may be applied to all pixels in the current block, as illustrated in fig. 11 and fig. 12, respectively. In another example, the pixels in the current block may be classified first, and then some of the pixels apply model 1 while the other pixels apply model 2, as illustrated in fig. 13. In one example, it may be required that the classification method be the same for the coded neighboring luma samples and for the coded luma samples in the current block.
In one example, as illustrated in fig. 13, coded luma samples in the current block (which may be downsampled if not in 4:4:4 format) belonging to group 1 may apply model 1 to derive the corresponding predicted chroma samples in the current block, while coded luma samples in the current block (which may be downsampled if not in 4:4:4 format) belonging to group 2 may apply model 2 to derive the corresponding predicted chroma samples in the current block. In this way, predicted chroma samples in the current block may be derived according to two linear models. When there are more groups, more linear models may be used to obtain the predicted chroma samples.
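A sketch of this two-model prediction follows. The alpha/beta pair of each linear model is assumed to have already been derived from the corresponding group of neighboring training samples (that derivation is not shown), and the struct/function names are illustrative.

#include <algorithm>
#include <cstddef>
#include <vector>

struct LinearModel { double alpha; double beta; };

// Applies model 1 to downsampled luma samples at or below the threshold (group 1)
// and model 2 to samples above it (group 2), clipping to the valid chroma range.
std::vector<int> predictChromaMMLM(const std::vector<int>& dsLuma, int threshold,
                                   LinearModel m1, LinearModel m2, int maxChroma) {
    std::vector<int> pred(dsLuma.size());
    for (std::size_t i = 0; i < dsLuma.size(); ++i) {
        const LinearModel& m = (dsLuma[i] <= threshold) ? m1 : m2;
        int v = static_cast<int>(m.alpha * dsLuma[i] + m.beta + 0.5);
        pred[i] = std::min(std::max(v, 0), maxChroma);
    }
    return pred;
}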
In one example, the number of samples in a group after classification may be required to be greater than a certain number, such as 2 or 3. In one example, if the number of samples in a group is less than the required number, samples in other groups may be moved to this group. For example, samples in the group with the most samples may be moved to a group with fewer than the required number of samples. In one example, a sample in the group with the most samples (named group A) may be moved to a group with fewer than the required number of samples (named group B) if the sample is the nearest sample to an existing sample in group B. "Nearest" may refer to nearest in pixel position, or nearest in intensity. In one example, encoder 20 may signal to decoder 30 the number of groups into which the samples should be classified. If the number is 1, it is the original LM mode. In another example, LM modes with different numbers of groups may be considered different LM modes, e.g., LM-MM1 for 1 group, LM-MM2 for 2 groups, and LM-MM3 for 3 groups. LM-MM1 is identical to the original LM mode. In another example, decoder 30 may derive the number of groups without encoder 20 signaling the number of groups.
In one example, if the video is not in 4:4:4 format, there may be more than one candidate luma downsampling filter, rather than only the downsampling filter defined in JEM-3.0 and illustrated in fig. 6. In one example, the filter may have the form:
a.Rec′L[x,y]=a·RecL[2x,2y]+b·RecL[2x,2y+1]+c·RecL[2x-1,2y]+d·RecL[2x+1,2y]+e·RecL[2x-1,2y+1]+f·RecL[2x+1,2y+1]+g,
wherein a, b, c, d, e, f and g are real numbers.
b.Rec′L[x,y]=(a·RecL[2x,2y]+b·RecL[2x,2y+1]+c·RecL[2x-1,2y]+d·RecL[2x+1,2y]+e·RecL[2x-1,2y+1]+f·RecL[2x+1,2y+1]+g)/h,
Wherein a, b, c, d, e, f, g and h are integers.
c.Rec′L[x,y]=(a·RecL[2x,2y]+b·RecL[2x,2y+1]+c·RecL[2x-1,2y]+d·RecL[2x+1,2y]+e·RecL[2x-1,2y+1]+f·RecL[2x+1,2y+1]+g)>>h,
Wherein a, b, c, d, e, f, g and h are integers.
Examples of possible filters are illustrated in figs. 14A-14C, such as the following:
a.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x+1,2y]+1)>>1;
b.Rec′L[x,y]=(RecL[2x+1,2y]+RecL[2x+1,2y+1]+1)>>1;
c.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+1)>>1;
d.Rec′L[x,y]=(RecL[2x,2y+1]+RecL[2x+1,2y+1]+1)>>1;
e.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x+1,2y+1]+1)>>1;
f.Rec′L[x,y]=(RecL[2x,2y+1]+RecL[2x+1,2y]+1)>>1;
g.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+RecL[2x-1,2y]+RecL[2x-1,2y+1]+2)>>2;
h.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x,2y+1]+RecL[2x+1,2y]+RecL[2x+1,2y+1]+2)>>2;
i.Rec′L[x,y]=(2·RecL[2x,2y]+RecL[2x+1,2y]+RecL[2x-1,2y]+2)>>2;
j.Rec′L[x,y]=(2·RecL[2x,2y+1]+RecL[2x+1,2y+1]+RecL[2x-1,2y+1]+2)>>2;
k.Rec′L[x,y]=(RecL[2x-1,2y]+RecL[2x-1,2y+1]+1)>>1;
l.Rec′L[x,y]=(RecL[2x,2y+1]+RecL[2x-1,2y+1]+1)>>1;
m.Rec′L[x,y]=(RecL[2x-1,2y]+RecL[2x,2y+1]+1)>>1;
n.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x-1,2y+1]+1)>>1;
o.Rec′L[x,y]=(RecL[2x,2y]+RecL[2x-1,2y]+1)>>1;
p.Rec′L[x,y]=(2·RecL[2x+1,2y]+RecL[2x+1,2y]+RecL[2x+1,2y+1]+2)>>2;
q.Rec′L[x,y]=(2·RecL[2x+1,2y+1]+RecL[2x,2y+1]+RecL[2x+1,2y]+2)>>2;
r.Rec′L[x,y]=(5·RecL[2x,2y+1]+RecL[2x-1,2y+1]+RecL[2x+1,2y+1]+RecL[2x,2y]+4)>>3;
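As an illustration, two of the filters listed above — filter (a) and filter (h) — might be implemented as follows for a full-resolution luma plane stored row-major with a given stride; the function names are illustrative and no boundary handling is shown.

#include <vector>

// Filter (a): Rec'L[x,y] = (RecL[2x,2y] + RecL[2x+1,2y] + 1) >> 1
int downsampleFilterA(const std::vector<int>& luma, int stride, int x, int y) {
    return (luma[2 * y * stride + 2 * x] + luma[2 * y * stride + 2 * x + 1] + 1) >> 1;
}

// Filter (h): average of the 2x2 block of full-resolution luma samples.
int downsampleFilterH(const std::vector<int>& luma, int stride, int x, int y) {
    int p00 = luma[2 * y * stride + 2 * x];
    int p10 = luma[2 * y * stride + 2 * x + 1];
    int p01 = luma[(2 * y + 1) * stride + 2 * x];
    int p11 = luma[(2 * y + 1) * stride + 2 * x + 1];
    return (p00 + p10 + p01 + p11 + 2) >> 2;
}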
In one example, if the sequence is not in 4:4:4 format, the LM mode may operate with any downsampling filter other than the filter defined in JEM-3.0 and shown in fig. 6 of this disclosure. In one example, decoder 30 may derive the downsampling filter without encoder 20 signaling the downsampling filter. In an example, the filtered results may be clipped to the range of valid chroma values. A type of angular prediction and a type of LM prediction can also be combined together to obtain the final prediction. A flag named LAP_flag may be signaled if the current chroma block is coded by intra prediction but not in any LM mode. In one example, if the prediction mode of the current chroma block is mode X, X may be a type of angular intra prediction (including planar mode and DC mode). It should be noted that if the current chroma block is signaled as DM mode, it is also considered an angular mode, since it is identical to a type of angular prediction mode of the corresponding luma block. In one example, two prediction patterns may first be generated for the chroma block and then combined together. One prediction pattern may be generated by the angular mode X. The other prediction pattern may be generated by a type of LM mode, for example the LM-MM2 mode.
As illustrated in fig. 16, first, a prediction for each sample in the current block may be generated by the angular prediction mode X, denoted P1(x, y). Then, a prediction for each sample in the current block may be generated by the LM-MM2 mode, denoted P2(x, y). Then, the final LM-angular prediction may be calculated as P(x, y) = w1(x, y)·P1(x, y) + w2(x, y)·P2(x, y), where (x, y) represents the coordinates of the sample in the block, and w1(x, y) and w2(x, y) are real numbers that satisfy w1(x, y) + w2(x, y) = 1. In another example, the final LM-angular prediction may be calculated as P(x, y) = (w1(x, y)·P1(x, y) + w2(x, y)·P2(x, y) + a)/b,
where w1(x, y), w2(x, y), a and b are integers, and w1(x, y) and w2(x, y) may satisfy w1(x, y) + w2(x, y) = b.
In another example, the final LM-angular prediction may be calculated as P(x, y) = (w1(x, y)·P1(x, y) + w2(x, y)·P2(x, y) + a) >> b,
where w1(x, y), w2(x, y), a and b are integers, and w1(x, y) and w2(x, y) may satisfy w1(x, y) + w2(x, y) = 2^b. In one example, w1(x, y) and w2(x, y) may be different for different (x, y). In another example, w1(x, y) and w2(x, y) may be the same for all (x, y). In one example, for all (x, y), P(x, y) = (P1(x, y) + P2(x, y) + 1) >> 1.
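A sketch of the equal-weight variant, P(x, y) = (P1(x, y) + P2(x, y) + 1) >> 1, follows; P1 is the angular prediction and P2 the LM (e.g., LM-MM2) prediction for the same chroma block, and the function name is illustrative.

#include <algorithm>
#include <cstddef>
#include <vector>

// Combines the angular and LM predictions with equal weights and clips the
// result to the valid chroma range [0, maxChroma].
std::vector<int> combineLAP(const std::vector<int>& p1, const std::vector<int>& p2,
                            int maxChroma) {
    std::vector<int> out(p1.size());
    for (std::size_t i = 0; i < p1.size(); ++i)
        out[i] = std::min(std::max((p1[i] + p2[i] + 1) >> 1, 0), maxChroma);
    return out;
}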
In one example, LAP_flag may be coded by CABAC. The coding context may depend on the coded/decoded LAP_flags of the neighboring blocks. For example, there may be three contexts for LAP_flag: LAPctx[0], LAPctx[1], and LAPctx[2]. The variable ctx may be calculated as ctx = LAP_flag_A + LAP_flag_B, where LAP_flag_A and LAP_flag_B are the LAP_flags of the neighboring blocks A and B, respectively, as illustrated in fig. 17. In one example, P(x, y) may be clipped to the range of valid chroma values. In one example, coding of the LM mode may depend on the chroma intra prediction modes of neighboring blocks. In one example, some LM modes, such as LM, LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4, may be candidate LM modes. In this example, there are 7 candidate LM modes, and a non-LM mode is appended to represent the case where the current block is coded by an angular (non-LM) mode. If non-LM is signaled, the angular mode may be signaled as in JEM-3.0, or in any other non-LM signaling method.
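The LAP_flag context derivation described above reduces to a small helper; a sketch, with an illustrative function name:

// ctx = LAP_flag_A + LAP_flag_B selects one of LAPctx[0], LAPctx[1], LAPctx[2].
int lapFlagContext(bool lapFlagA, bool lapFlagB) {
    return (lapFlagA ? 1 : 0) + (lapFlagB ? 1 : 0);
}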
In example coding logic, the DM_flag may be coded first. If the chroma prediction mode is not the DM mode, the proposed LM_coding() module may be called to indicate the current chroma prediction mode. If the LM_coding() module codes a non-LM mode, then the Chroma_intra_mode_coding() module may be called to code the angular chroma prediction mode.
In one example, to signal N possible modes (including non-LM), N symbols (0, 1, …, N-1) with different codewords, or binarizations, may be used to represent the N possible modes. Symbols with smaller numbers may not have a code length that is longer than the code length of symbols with larger numbers. The symbols may be binarized in any manner, such as fixed length codes, unary codes, truncated unary codes, exponential Golomb codes, and so on. In one example, there may be a default mapping between symbols and modes. In one example, the mapping may be fixed or dynamic according to decoded neighboring blocks.
In one example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks coded in LM mode, denoted K. In one example, the symbol mapping list may be:
if K == 0, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
if 0 < K <= 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
if K > 3, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In one example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks that are not coded in LM mode, denoted K', and the symbol mapping list may be:
if K' == 5, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
if 2 < K' < 5, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
if K' <= 2, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In one example, the symbol of the non-LM mode may be inserted into the mapping list depending on the number of neighboring chroma blocks that are not coded in LM mode, denoted K', and the symbol mapping list may be:
if K' == 5, then 0: LM, 1: non-LM, 2: LM-MM2, 3: LM-MM3, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
if 2 < K' < 5, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: non-LM, 4: LM-MF1, 5: LM-MF2, 6: LM-MF3, 7: LM-MF4;
if K' <= 2, then 0: LM, 1: LM-MM2, 2: LM-MM3, 3: LM-MF1, 4: LM-MF2, 5: LM-MF3, 6: LM-MF4, 7: non-LM;
In one example, the use of the proposed LM modes may depend on the block size. In one example, if the size of the current chroma block is M×N, LM-X may not be applicable when M×N <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30. For example, LM-X may be any LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4. In one example, LM-X may not be applicable when M + N <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30. For example, LM-X may be any LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4. In one example, LM-X may not be applicable when Min(M, N) <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30. For example, LM-X may be any LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4. In another example, LM-X may not be applicable when Max(M, N) <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30, and LM-X may be any LM mode, such as LM-MM2, LM-MM3, LM-MF1, LM-MF2, LM-MF3, and LM-MF4. In one example, LAP may not be applicable when M×N <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30. In one example, LAP may not be applicable when M + N <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30. In another example, LAP may not be applicable when Min(M, N) <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30. In one example, LAP may not be applicable when Max(M, N) <= T, where T may be a fixed number or may be signaled from encoder 20 to decoder 30.
It should be recognized that depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, can be added, merged, or left out entirely (e.g., not all described acts or events are necessary to practice the techniques). Further, in some instances, acts or events may be performed concurrently, rather than sequentially, e.g., via multi-threaded processing, interrupt processing, or multiple processors.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or program code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media, such as data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is non-transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, optical cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, optical cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (27)

1. A method of decoding video data, the method comprising:
receiving an encoded block of luma samples for a first block of video data;
decoding the encoded blocks of luma samples to generate reconstructed luma samples;
classifying reconstructed luma samples greater than a first threshold as being in a first sample group of a plurality of sample groups;
classifying reconstructed luma samples below the first threshold as being in a second sample group of the plurality of sample groups; and
predicting chroma samples of the first block of video data by:
applying a first linear prediction model of two or more linear prediction models to reconstructed luma samples of the first group of samples;
applying a second linear prediction model of the two or more linear prediction models to reconstructed luma samples of the second group of samples, the second linear prediction model being different from the first linear prediction model; and
determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model.
2. The method of claim 1, further comprising:
parameters for each of the two or more linear prediction models are determined using luma samples and chroma samples from a block of video data that neighbors the first block of video data.
3. The method of claim 1, wherein the first threshold is dependent on neighboring coded luma samples and chroma samples.
4. The method of claim 1, further comprising:
down-sampling the reconstructed luma samples.
5. The method of claim 1, further comprising:
determining one of a plurality of downsampling filters to use to downsample the reconstructed luma samples;
down-sampling the reconstructed luma samples using the determined down-sampling filter to generate down-sampled luma samples; and
predicting chroma samples of the first block of video data using the downsampled luma samples and the two or more linear prediction models.
6. The method of claim 1, further comprising:
determining whether chroma samples of a second block of video data are coded using a linear prediction model of the two or more linear prediction models, the second block of video data being different from the first block of video data;
In a case that the chroma samples of the second block of video data are not coded using the linear prediction model, the method further comprises:
determining that a linear mode angular prediction mode is enabled;
applying an angular mode prediction pattern to the chroma samples of the second block of video data to generate first predicted chroma values;
applying a linear model prediction pattern to corresponding luma samples of the second block of video data to generate second predicted chroma values; and
determining a final block of the second block of video data having predicted chroma values by determining a weighted average of the first predicted chroma values and the second predicted chroma values.
7. The method of claim 1, further comprising:
determining, with respect to the first block of video data, a number of neighboring chroma blocks coded using a linear prediction model coding mode; and
dynamically changing a particular type of codeword used to indicate the linear prediction model coding mode based on a determined number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode.
8. The method of claim 7, wherein dynamically changing the codeword comprises:
Using a first symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being zero;
using a second symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being below a threshold; and
using a third symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being greater than the threshold.
9. A method of encoding video data, the method comprising:
encoding a block of luma samples of a first block of video data;
reconstructing the encoded blocks of luma samples to generate reconstructed luma samples;
classifying reconstructed luma samples greater than a first threshold value as being in a first sample group of a plurality of sample groups;
classifying reconstructed luma samples below the first threshold as being in a second sample group of the plurality of sample groups; and
predicting chroma samples of the first block of video data by:
applying a first linear prediction model of two or more linear prediction models to reconstructed luma samples of the first group of samples;
Applying a second linear prediction model of the two or more linear prediction models to reconstructed luma samples of the second group of samples, the second linear prediction model being different from the first linear prediction model; and
determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model.
10. The method of claim 9, further comprising:
parameters for each of the two or more linear prediction models are determined using luma samples and chroma samples from a block of video data that neighbors the first block of video data.
11. The method of claim 9, wherein the first threshold is dependent on neighboring coded luma samples and chroma samples.
12. The method of claim 9, further comprising:
down-sampling the reconstructed luma samples.
13. The method of claim 9, further comprising:
determining one of a plurality of downsampling filters to use to downsample the reconstructed luma samples;
down-sampling the reconstructed luma samples using the determined down-sampling filter to generate down-sampled luma samples; and
Predicting chroma samples of the first block of video data using the downsampled luma samples and the two or more linear prediction models.
14. The method of claim 9, further comprising:
determining whether chroma samples of a second block of video data are coded using a linear prediction model of the two or more linear prediction models, the second block of video data being different from the first block of video data;
in a case that the chroma samples of the second block of video data are not coded using the linear prediction model, the method further comprises:
determining that a linear mode angular prediction mode is enabled;
applying an angular mode prediction pattern to the chroma samples of the second block of video data to generate first predicted chroma values;
applying a linear model prediction pattern to corresponding luma samples of the second block of video data to generate second predicted chroma values; and
determining a final block of the second block of video data having predicted chroma values by determining a weighted average of the first predicted chroma values and the second predicted chroma values.
15. The method of claim 9, further comprising:
Determining, with respect to the first block of video data, a number of neighboring chroma blocks coded using a linear prediction model coding mode; and
dynamically changing a particular type of codeword used to indicate the linear prediction model coding mode based on a determined number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode.
16. The method of claim 15, wherein dynamically changing the codeword comprises:
using a first symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being zero;
using a second symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being below a threshold; and
using a third symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being greater than the threshold.
17. An apparatus configured to decode video data, the apparatus comprising:
a memory configured to receive a first block of video data; and
one or more processors configured to:
Receiving an encoded block of luma samples for the first block of video data;
decoding the encoded blocks of luma samples to generate reconstructed luma samples;
classifying reconstructed luma samples greater than a first threshold value as being in a first sample group of a plurality of sample groups;
classifying reconstructed luma samples below the first threshold as being in a second sample group of the plurality of sample groups; and
predicting chroma samples of the first block of video data by:
applying a first linear prediction model of two or more linear prediction models to reconstructed luma samples of the first group of samples;
applying a second linear prediction model of the two or more linear prediction models to reconstructed luma samples of the second group of samples, the second linear prediction model being different from the first linear prediction model; and
determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model.
18. The apparatus of claim 17, wherein the one or more processors are further configured to:
Parameters for each of the two or more linear prediction models are determined using luma samples and chroma samples from a block of video data that neighbors the first block of video data.
19. The apparatus of claim 17, wherein the first threshold is dependent on neighboring coded luma samples and chroma samples.
20. The apparatus of claim 17, wherein the one or more processors are further configured to:
down-sampling the reconstructed luma samples.
21. The apparatus of claim 17, wherein the one or more processors are further configured to:
determining one of a plurality of downsampling filters to use to downsample the reconstructed luma samples;
down-sampling the reconstructed luma samples using the determined down-sampling filter to generate down-sampled luma samples; and
predicting chroma samples of the first block of video data using the downsampled luma samples and the two or more linear prediction models.
22. The apparatus of claim 17, wherein the one or more processors are further configured to:
Determining whether chroma samples of a second block of video data are coded using a linear prediction model of the two or more linear prediction models, the second block of video data being different from the first block of video data;
in a case that the chroma samples of the second block of video data are not coded using the linear prediction model, the one or more processors are further configured to:
determining that a linear mode angular prediction mode is enabled;
applying an angular mode prediction pattern to the chroma samples of the second block of video data to generate first predicted chroma values;
applying a linear model prediction pattern to corresponding luma samples of the second block of video data to generate second predicted chroma values; and
determining a final block of the second block of video data having predicted chroma values by determining a weighted average of the first predicted chroma values and the second predicted chroma values.
23. The apparatus of claim 17, wherein the one or more processors are further configured to:
determining, with respect to the first block of video data, a number of neighboring chroma blocks coded using a linear prediction model coding mode; and
Dynamically changing a codeword to indicate a particular type of the linear prediction model coding mode based on the determined number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode.
24. The apparatus of claim 23, wherein to dynamically change the codeword, the one or more processors are further configured to:
using a first symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being zero;
using a second symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being below a threshold; and
using a third symbol mapping list based on the number of neighboring chroma blocks of the video data coded using the linear prediction model coding mode being greater than the threshold.
25. An apparatus configured to encode video data, the apparatus comprising:
a memory configured to receive a first block of video data; and
one or more processors configured to:
encoding a block of luma samples of a first block of video data;
Reconstruct the encoded blocks of luma samples to generate reconstructed luma samples;
classifying reconstructed luma samples greater than a first threshold as being in a first sample group of a plurality of sample groups;
classifying reconstructed luma samples below the first threshold as being in a second sample group of the plurality of sample groups; and
predicting chroma samples of the first block of video data by:
applying a first linear prediction model of two or more linear prediction models to reconstructed luma samples of the first group of samples;
applying a second linear prediction model of the two or more linear prediction models to reconstructed luma samples of the second group of samples, the second linear prediction model being different from the first linear prediction model; and
determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model.
26. An apparatus configured to decode video data, the apparatus comprising:
means for receiving an encoded block of luma samples for a first block of video data;
Means for decoding the encoded blocks of luma samples to generate reconstructed luma samples;
means for classifying reconstructed luma samples greater than a first threshold as being in a first sample group of a plurality of sample groups;
means for classifying reconstructed luma samples below the first threshold as being in a second sample group of the plurality of sample groups; and
means for predicting chroma samples of the first block of video data by:
applying a first linear prediction model of two or more linear prediction models to reconstructed luma samples of the first group of samples;
applying a second linear prediction model of the two or more linear prediction models to reconstructed luma samples of the second group of samples, the second linear prediction model being different from the first linear prediction model; and
determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model.
27. A computer-readable storage medium storing instructions that, when executed, cause one or more processors configured to decode video data to:
Receiving an encoded block of luma samples for a first block of video data;
decoding the encoded blocks of luma samples to generate reconstructed luma samples;
classifying reconstructed luma samples greater than a first threshold value as being in a first sample group of a plurality of sample groups;
classifying reconstructed luma samples below the first threshold as being in a second sample group of the plurality of sample groups; and
predicting chroma samples of the first block of video data by:
applying a first linear prediction model of two or more linear prediction models to reconstructed luma samples of the first group of samples;
applying a second linear prediction model of the two or more linear prediction models to reconstructed luma samples of the second group of samples, the second linear prediction model being different from the first linear prediction model; and
determine predicted chroma samples in the first block of video data based on the applied first linear prediction model and the applied second linear prediction model.
HK19124175.1A 2016-09-15 2017-09-15 Linear model chroma intra prediction for video coding HK40000966B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62/395,145 2016-09-15
US15/705,029 2017-09-14

Publications (2)

Publication Number Publication Date
HK40000966A HK40000966A (en) 2020-02-21
HK40000966B true HK40000966B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN109716771B (en) Linear Model Chroma Intra Prediction for Video Coding
JP7284137B2 (en) Block information encoding/decoding method using quadtree and apparatus using the method
CA3000373C (en) Video intra-prediction using position dependent prediction combination for video coding
TWI782904B (en) Merging filters for multiple classes of blocks for video coding
JP5922244B2 (en) Sample adaptive offset merged with adaptive loop filter in video coding
JP5833249B2 (en) Adaptive centerband offset filter for video coding
KR101661828B1 (en) Adaptive overlapped block motion compensation
JP6293756B2 (en) Adjustment of transform criteria in scalable video coding
KR102301450B1 (en) Device and method for scalable coding of video information
TWI527440B (en) Low-complexity support of multiple layers for hevc extensions in video coding
US20150071357A1 (en) Partial intra block copying for video coding
US20130163664A1 (en) Unified partition mode table for intra-mode coding
JP6199371B2 (en) Inter-layer texture prediction for video coding
CN111149361B (en) Adaptive group of pictures structure with future reference frames in a random access configuration for video coding
CN104429076A (en) Generalized residual prediction for scalable video coding and 3d video coding
EP2883347A1 (en) Weighted difference prediction under the framework of generalized residual prediction
JP6224851B2 (en) System and method for low complexity coding and background detection
JP2024501465A (en) Adaptive loop filter with fixed filter
HK40000966B (en) Linear model chroma intra prediction for video coding
HK40000966A (en) Linear model chroma intra prediction for video coding
HK40018386A (en) Improved intra prediction in video coding